.peptides_on_prot_sequence

proteopy.pl.peptides_on_prot_sequence(adata, protein_id, group_by=None, ref_sequence=None, alt_pep_sequence_key=None, add_sequences=None, allow_overlaps=False, allow_multi_match=False, color_scheme=None, title=None, order=None, figsize=None, show=True, save=None, ax=None)[source]

Plot peptide coverage on a protein reference sequence.

Filters peptides belonging to the given protein_id and draws them as broken bars on top of a grey reference bar. Extra sequences (e.g., domains or post-translational modification sites) can be overlaid via the add_sequences parameter.

Parameters:
  • adata (AnnData) – Peptide-level AnnData.

  • protein_id (str) – Value in adata.var["protein_id"] to select peptides for.

  • group_by (str | None) – Column in adata.var used to assign peptides to colored rows. When None, all peptides are placed in a single row labeled protein_id.

  • ref_sequence (str | None) – Full reference protein sequence as an amino-acid string.

  • alt_pep_sequence_key (str | None) – Column in adata.var whose values are used as peptide amino-acid strings. When None, adata.var_names (i.e., peptide_id) are used.

  • add_sequences (dict[str, dict] | None) – Additional named sequences to overlay (e.g., domains). Each value must be a dict with a "group" key and either a "seq" or "seq_coord" key. Coordinates are 0-based throughout. "seq_coord" must be a half-open [start, end) tuple where start >= 0 and end <= len(ref_sequence).

  • allow_overlaps (bool) – When False, raise ValueError if two sequences in the same group overlap positionally.

  • allow_multi_match (bool) – When True, a peptide sequence matching the reference at multiple positions is shown as one bar per match, labeled "<name> (match N)". When False, a ValueError is raised for ambiguous matches.

  • color_scheme (str | dict | list | Colormap | callable | None) – Color specification for groups.

  • title (str | None) – Axes title.

  • order (list[str] | None) – Explicit ordering of group rows.

  • figsize (tuple[float, float] | None) – Figure dimensions (width, height) in inches.

  • show (bool) – Call plt.show() at the end.

  • save (str | Path | None) – Path at which to save the figure (300 dpi, bbox_inches="tight"). When None, the figure is not saved.

  • ax (matplotlib.axes.Axes | None) – Matplotlib Axes object to plot onto. If None, a new figure and axes are created. The function always returns the Axes object used for plotting.

Returns:

The Axes object used for plotting.

Return type:

matplotlib.axes.Axes

Raises:
  • ValueError – If adata is not at peptide level.

  • ValueError – If ref_sequence is None.

  • ValueError – If no peptides match protein_id in adata.var["protein_id"].

  • ValueError – If a peptide sequence is not found in the reference.

  • ValueError – If a peptide matches the reference at multiple positions and allow_multi_match=False.

  • ValueError – If two sequences in the same group overlap and allow_overlaps=False.

  • KeyError – If alt_pep_sequence_key or group_by is not found in adata.var.

Examples

Build a minimal peptide-level AnnData with two proteins. "ELPGW" belongs to P2 and is excluded by the filter.

>>> import numpy as np
>>> import pandas as pd
>>> import anndata as ad
>>> import proteopy as pr
>>> ref = "MKTAYIAKQRQISFVKSHFSRQDEEP"
>>> peptides = [
...     "KTAYIAK", "QRQISFVK", "SHFSRQ", "ELPGW",
... ]
>>> var = pd.DataFrame(
...     {
...         "peptide_id": peptides,
...         "protein_id": ["P1", "P1", "P1", "P2"],
...         "proteoform": ["A", "A", "B", "A"],
...     },
...     index=peptides,
... )
>>> obs = pd.DataFrame(
...     {"sample_id": ["S1", "S2"]},
...     index=["S1", "S2"],
... )
>>> adata = ad.AnnData(
...     X=np.ones((2, 4)),
...     obs=obs,
...     var=var,
... )

Filter to P1; group peptides by proteoform.

>>> pr.pl.peptides_on_prot_sequence(
...     adata,
...     protein_id="P1",
...     ref_sequence=ref,
...     group_by="proteoform",
... )