.extract_peptide_groups

proteopy.pp.extract_peptide_groups(adata, peptide_col='peptide_id', group_by=None, inplace=True)

Adds two columns to adata.var: 'peptide_group_id' and 'peptide_group_nr'. peptide_group_id lists the peptide_ids of all peptides in the same group, that is, peptides that overlap by substring containment (one sequence is contained in another), joined by ';'. peptide_group_nr is a unique integer identifier for each group, numbered across all groups in order of appearance.

Parameters:
  • adata (AnnData) – Must have adata.var[peptide_col] with peptide sequences (already normalized).

  • peptide_col (str) – Column in adata.var containing peptide sequences.

  • group_by (str or None, optional) – Column in adata.var to partition peptides before grouping. When set (e.g. 'protein_id'), substring containment is only evaluated among peptides that share the same value in this column. When None, all peptides are grouped globally.

  • inplace (bool) – If True, modifies adata in place. If False, returns a modified copy.

Returns:

None if inplace is True, otherwise a modified copy of adata.

Return type:

None or AnnData
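
The grouping described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper name sketch_peptide_groups and its partitions and start arguments are hypothetical, and it assumes that groups are formed transitively (two peptides that are each contained in a third end up in the same group), that members are joined in order of appearance, and that numbering starts at 0. The function documented here may differ on any of these points.

```python
from collections import defaultdict


def sketch_peptide_groups(peptides, partitions=None, start=0):
    """Illustrative only: group peptides linked by substring containment.

    peptides   -- list of peptide sequences (stand-in for adata.var[peptide_col])
    partitions -- optional list of labels (stand-in for adata.var[group_by]);
                  containment is only checked within equal labels
    start      -- first group number (assumption; not stated in the docs)
    """
    n = len(peptides)
    if partitions is None:
        partitions = [None] * n
    parent = list(range(n))  # union-find forest over peptide indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the earlier index as root so the root marks first appearance.
            parent[max(ri, rj)] = min(ri, rj)

    # Link every pair in the same partition where one sequence contains the other.
    for i in range(n):
        for j in range(i + 1, n):
            if partitions[i] != partitions[j]:
                continue
            a, b = peptides[i], peptides[j]
            if a in b or b in a:
                union(i, j)

    # Collect members of each group in order of appearance.
    members = defaultdict(list)
    for i in range(n):
        members[find(i)].append(peptides[i])

    group_ids, group_nrs, numbering = [], [], {}
    for i in range(n):
        root = find(i)
        if root not in numbering:
            numbering[root] = start + len(numbering)
        # dict.fromkeys drops duplicate sequences while keeping order.
        group_ids.append(";".join(dict.fromkeys(members[root])))
        group_nrs.append(numbering[root])
    return group_ids, group_nrs


peptides = ["PEPTIDE", "EPTI", "AAAK", "TIDE", "AAAKR"]

# Global grouping (analogous to group_by=None).
ids, nrs = sketch_peptide_groups(peptides)
# nrs -> [0, 0, 1, 0, 1]
# ids[0] -> "PEPTIDE;EPTI;TIDE"

# Partitioned grouping (analogous to group_by='protein_id').
proteins = ["P1", "P2", "P1", "P1", "P1"]
ids_by, nrs_by = sketch_peptide_groups(peptides, partitions=proteins)
# nrs_by -> [0, 1, 2, 0, 2]
# ids_by[1] -> "EPTI"
```

In the partitioned call, "EPTI" is still a substring of "PEPTIDE", but because the two carry different partition labels they are never compared, so "EPTI" forms its own group. The pairwise comparison is quadratic in the number of peptides per partition, which is acceptable for a sketch but may not reflect how the library scales.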