.long
- proteopy.read.long(intensities, level=None, *, sample_annotation=None, var_annotation=None, column_map=None, sep=None, fill_na=None, zero_to_na=False, sort_obs_by_annotation=False, verbose=False)
Read long-format peptide or protein tabular data into an AnnData container.
The intensities table must be in long format with one row per (sample, feature) measurement. Required columns differ by level:

- Peptide level: sample_id, intensity, and peptide_id must be present. protein_id may come from the intensities table or from var_annotation; see below.
- Protein level: sample_id, intensity, and protein_id must all be present.
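The per-level column requirements above can be checked before calling the reader. The following is a minimal sketch, not proteopy's own validation: the names REQUIRED_COLUMNS and missing_columns are hypothetical, and the function only looks at column names, so it works on any iterable of strings, including DataFrame.columns.

```python
# Hypothetical helper, not part of proteopy: report which required
# columns are absent for a given level.
REQUIRED_COLUMNS = {
    "peptide": ("sample_id", "intensity", "peptide_id"),
    "protein": ("sample_id", "intensity", "protein_id"),
}


def missing_columns(columns, level):
    """Return the required column names that are not in `columns`."""
    if level not in REQUIRED_COLUMNS:
        raise ValueError(f"level must be 'peptide' or 'protein', got {level!r}")
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS[level] if name not in present]


cols = ["sample_id", "peptide_id", "intensity"]
print(missing_columns(cols, "peptide"))  # nothing missing at peptide level
print(missing_columns(cols, "protein"))  # protein_id is missing at protein level
```

At peptide level protein_id is deliberately left out of the required set here, because it may instead be supplied through var_annotation.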
At peptide level, protein_id is resolved in two steps. If the intensities table already contains protein_id, it is used directly. Otherwise, var_annotation must be supplied and contain both peptide_id and protein_id.

sample_annotation, when supplied, must contain a sample_id column and is merged into adata.obs.

var_annotation, when supplied, must contain a peptide_id column (peptide level) or a protein_id column (protein level) and is merged into adata.var.

Column names that differ from the defaults above can be mapped to the canonical names via column_map.

- Parameters:
intensities (str | Path | pd.DataFrame) – Long-form intensities data. Accepts a file path (str or Path) or a pandas.DataFrame.
level ({"peptide", "protein"}, default None) – Select whether to process peptide- or protein-level inputs. This argument is required.
sample_annotation (str | Path | pd.DataFrame, optional) – Optional obs annotations. Accepts a file path or DataFrame.
var_annotation (str | Path | pd.DataFrame, optional) – Optional var annotations. Accepts a file path or DataFrame. Interpreted as peptide annotations when level="peptide" and as protein annotations when level="protein".
column_map (dict, optional) – Optional mapping that specifies custom column names for the expected keys: peptide_id, protein_id, sample_id, intensity.
sep (str, optional) – Delimiter passed to pandas.read_csv. If None (the default), the separator is auto-detected from the file extension. Ignored when input is a DataFrame.
fill_na (float, optional) – Optional replacement value for missing intensity entries.
zero_to_na (bool, optional) – If True, zeros in the AnnData X matrix will be replaced with np.nan.
sort_obs_by_annotation (bool, default False) – When True, reorder observations to match the order of samples in the annotation (if supplied) or the original intensity table.
verbose (bool, optional) – If True, print status messages.
- Returns:
Structured representation of the long-form intensities ready for downstream analysis.
- Return type:
AnnData
Examples
Example 1: Minimal peptide-level read with protein_id in the intensities DataFrame.

>>> import pandas as pd
>>> import proteopy as pr
>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "protein_id": [
...         "PROT1", "PROT1", "PROT1", "PROT1",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities, level="peptide",
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'
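To see why four long-format rows produce a 2 × 2 container, it can help to reshape the same data by hand. The sketch below uses only pandas and is an illustration of the long-to-wide idea; whether read.long performs the reshape this way internally, and how it treats duplicate (sample, peptide) pairs, is not stated in this reference.

```python
import pandas as pd

long_df = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2"],
    "peptide_id": ["PEP1", "PEP2", "PEP1", "PEP2"],
    "intensity": [12450.0, 8730.0, 15320.0, 6890.0],
})

# Rows become samples (observations), columns become peptides (variables).
wide = long_df.pivot(index="sample_id", columns="peptide_id", values="intensity")

print(wide.shape)              # two samples by two peptides
print(wide.loc["S2", "PEP1"])  # intensity of PEP1 in sample S2
```

Note that DataFrame.pivot raises a ValueError when the same (sample_id, peptide_id) pair occurs more than once, which is one reason the input is expected to hold one row per measurement.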
Example 2: Peptide-level read with protein_id supplied via var_annotation instead of the intensities DataFrame.

>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> var_ann = pd.DataFrame({
...     "peptide_id": ["PEP1", "PEP2"],
...     "protein_id": ["PROT1", "PROT1"],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="peptide",
...     var_annotation=var_ann,
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'
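The two-step protein_id lookup described earlier (use the column from the intensities table if present, otherwise take it from var_annotation) can be mimicked with a plain pandas merge. This is a hedged sketch under that reading of the text; resolve_protein_id is a hypothetical name and the error messages are illustrative, not those raised by proteopy.

```python
import pandas as pd


def resolve_protein_id(intensities, var_annotation=None):
    """Hypothetical sketch of the two-step lookup; not proteopy's code."""
    # Step 1: the intensities table already carries protein_id.
    if "protein_id" in intensities.columns:
        return intensities
    # Step 2: fall back to var_annotation, which must hold both columns.
    if var_annotation is None:
        raise ValueError("protein_id is missing and no var_annotation was given")
    missing = {"peptide_id", "protein_id"} - set(var_annotation.columns)
    if missing:
        raise ValueError(f"var_annotation lacks columns: {sorted(missing)}")
    mapping = var_annotation[["peptide_id", "protein_id"]].drop_duplicates()
    return intensities.merge(mapping, on="peptide_id", how="left")


intensities = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2"],
    "peptide_id": ["PEP1", "PEP2", "PEP1", "PEP2"],
    "intensity": [12450.0, 8730.0, 15320.0, 6890.0],
})
var_ann = pd.DataFrame({
    "peptide_id": ["PEP1", "PEP2"],
    "protein_id": ["PROT1", "PROT1"],
})

resolved = resolve_protein_id(intensities, var_ann)
print(resolved["protein_id"].tolist())
```

A left merge keeps every intensity row, so peptides absent from the annotation would end up with a missing protein_id rather than being dropped; how proteopy handles that case is not covered here.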
Example 3: Peptide-level read with non-standard column names remapped via column_map.

>>> intensities = pd.DataFrame({
...     "run": ["S1", "S1", "S2", "S2"],
...     "seq": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "prot": [
...         "PROT1", "PROT1", "PROT1", "PROT1",
...     ],
...     "quant": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="peptide",
...     column_map={
...         "sample_id": "run",
...         "peptide_id": "seq",
...         "protein_id": "prot",
...         "intensity": "quant",
...     },
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'
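In Example 3 the keys of column_map are the canonical names and the values are the custom column names, so renaming a table by hand needs the inverse direction. The sketch below shows one way to build that inverse with a couple of sanity checks; CANONICAL and rename_mapping are hypothetical names, and the checks are assumptions about sensible input rather than documented proteopy behaviour.

```python
# Canonical column names listed in the column_map parameter description.
CANONICAL = {"sample_id", "peptide_id", "protein_id", "intensity"}


def rename_mapping(column_map):
    """Invert a canonical->custom map into a custom->canonical map."""
    unknown = set(column_map) - CANONICAL
    if unknown:
        raise KeyError(f"unexpected keys in column_map: {sorted(unknown)}")
    inverted = {custom: canonical for canonical, custom in column_map.items()}
    # Two canonical names pointing at one custom column would collide.
    if len(inverted) != len(column_map):
        raise ValueError("two canonical names map to the same custom column")
    return inverted


column_map = {
    "sample_id": "run",
    "peptide_id": "seq",
    "protein_id": "prot",
    "intensity": "quant",
}
renames = rename_mapping(column_map)
print(renames)
```

With pandas, the resulting dictionary can be passed to DataFrame.rename(columns=renames) to obtain a table that uses the canonical names.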