.remove_zero_variance_vars
- proteopy.pp.remove_zero_variance_vars(adata, group_by=None, atol=1e-08, inplace=True, verbose=False)
Remove variables with near-zero or zero variance, skipping NaN values.
Variables whose variance is at or below atol are removed. Variables that are entirely NaN (globally, or within any group when group_by is set) are treated as zero variance and are also removed.
- Parameters:
adata (AnnData) – AnnData annotated data matrix.
group_by (str | None, optional) – Column in adata.obs used to compute variance per group. When set, a variable is removed if its variance is <= atol, or it is all-NaN, in any group.
atol (float, optional) – Absolute tolerance threshold. Variables with variance <= atol are considered zero-variance and removed.
inplace (bool, optional) – If True, modify adata in place and return None. Otherwise, return a filtered copy.
verbose (bool, optional) – Print how many variables were present, removed, and remaining.
- Returns:
None when inplace=True; otherwise, a new anndata.AnnData containing only the variables with variance > atol (in every group when group_by is set).
- Return type:
None | AnnData
- Raises:
TypeError – If any argument has an incorrect type.
ValueError – If atol is negative or the group_by column contains NaN values.
KeyError – If group_by is not a column in adata.obs.
- Warns:
UserWarning – Issued when one or more variables are removed because they are entirely NaN (globally or within at least one group).
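The removal rule described above can be illustrated without AnnData or proteopy. The following is a minimal NumPy sketch under stated assumptions, not the library's implementation: zero_variance_keep_mask is a hypothetical helper that takes a dense samples × variables array and returns a boolean mask of variables to keep. It assumes population variance (ddof=0, the np.nanvar default); the variance estimator proteopy actually uses is not stated here. It mirrors the documented ValueError for a negative atol and the UserWarning for all-NaN variables.

```python
import warnings

import numpy as np


def zero_variance_keep_mask(X, atol=1e-08):
    """Return a boolean mask of columns (variables) to keep.

    Hypothetical helper sketching the documented rule: a column is dropped
    if its NaN-skipping variance is <= atol or if it is entirely NaN.
    Assumes population variance (ddof=0); proteopy's choice may differ.
    """
    if atol < 0:
        raise ValueError("atol must be non-negative")
    X = np.asarray(X, dtype=float)

    # Columns with no observed values are treated as zero variance.
    all_nan = np.isnan(X).all(axis=0)

    # Only call nanvar on columns with at least one value, so it does not
    # warn about empty slices; all-NaN columns keep a variance of 0.
    variances = np.zeros(X.shape[1])
    if (~all_nan).any():
        variances[~all_nan] = np.nanvar(X[:, ~all_nan], axis=0)

    if all_nan.any():
        warnings.warn(
            f"{int(all_nan.sum())} variable(s) are entirely NaN and will be removed.",
            UserWarning,
        )

    return ~all_nan & (variances > atol)


# Usage on the matrix from the first example below (columns p1..p4);
# a UserWarning is issued for the all-NaN column p3.
X = np.array([
    [1.0, 5.0, np.nan, 7.0],
    [2.0, 5.0, np.nan, 7.0],
    [3.0, 5.0, np.nan, 8.0],
])
print(zero_variance_keep_mask(X).tolist())  # [True, False, False, True]
```

All-NaN columns are excluded before calling np.nanvar because, for an all-NaN slice, np.nanvar returns NaN and emits a RuntimeWarning.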
Examples
Build a small protein-level dataset with four variables: p1 varies, p2 is constant, p3 is all-NaN, and p4 varies.
>>> import numpy as np
>>> import pandas as pd
>>> import anndata as ad
>>> import proteopy as pr
>>> X = np.array([
...     [1.0, 5.0, np.nan, 7.0],
...     [2.0, 5.0, np.nan, 7.0],
...     [3.0, 5.0, np.nan, 8.0],
... ])
>>> obs = pd.DataFrame(
...     {"sample_id": ["s1", "s2", "s3"]},
...     index=["s1", "s2", "s3"],
... )
>>> var = pd.DataFrame(
...     {"protein_id": ["p1", "p2", "p3", "p4"]},
...     index=["p1", "p2", "p3", "p4"],
... )
>>> adata = ad.AnnData(X=X, obs=obs, var=var)
>>> pr.pp.remove_zero_variance_vars(adata)
>>> adata.var_names.tolist()
['p1', 'p4']
With group_by, a variable is removed if it has zero variance or is all-NaN in any group. Here p2 is constant in group A, p3 is all-NaN in group A, and p4 is constant in both groups:
>>> X_grp = np.array([
...     [1.0, 5.0, np.nan, 9.0],
...     [2.0, 5.0, np.nan, 9.0],
...     [3.0, 7.0, 8.0, 9.0],
...     [4.0, 8.0, 8.0, 9.0],
... ])
>>> obs_grp = pd.DataFrame(
...     {"sample_id": ["s1", "s2", "s3", "s4"],
...      "group": ["A", "A", "B", "B"]},
...     index=["s1", "s2", "s3", "s4"],
... )
>>> adata = ad.AnnData(X=X_grp, obs=obs_grp, var=var)
>>> pr.pp.remove_zero_variance_vars(adata, group_by="group")
>>> adata.var_names.tolist()
['p1']
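The grouped rule amounts to applying the global check separately within each group and keeping a variable only if it passes in every group. A minimal NumPy sketch, not proteopy's implementation: grouped_keep_mask is a hypothetical helper that takes group labels as a plain sequence (one per row) instead of an adata.obs column, and it assumes population variance (ddof=0). It mirrors the documented ValueError for NaN group labels.

```python
import numpy as np


def grouped_keep_mask(X, groups, atol=1e-08):
    """Return a boolean mask of columns that pass the variance rule in every group.

    Hypothetical sketch of the documented group_by behaviour: a column is
    dropped if, in any group, its NaN-skipping variance (ddof=0, assumed)
    is <= atol or all of its values are NaN.
    """
    if atol < 0:
        raise ValueError("atol must be non-negative")
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups, dtype=object)
    if len(groups) != X.shape[0]:
        raise ValueError("groups must have one label per row of X")
    # Mirrors the documented ValueError for NaN values in the group column.
    if any(g is None or (isinstance(g, float) and np.isnan(g)) for g in groups):
        raise ValueError("group labels must not contain NaN")

    keep = np.ones(X.shape[1], dtype=bool)
    for label in np.unique(groups):
        block = X[groups == label]
        all_nan = np.isnan(block).all(axis=0)
        variances = np.zeros(X.shape[1])
        if (~all_nan).any():
            variances[~all_nan] = np.nanvar(block[:, ~all_nan], axis=0)
        # A column survives only if it passes in this group and all previous ones.
        keep &= ~all_nan & (variances > atol)
    return keep


# Usage on the matrix from the grouped example above (columns p1..p4).
X_grp = np.array([
    [1.0, 5.0, np.nan, 9.0],
    [2.0, 5.0, np.nan, 9.0],
    [3.0, 7.0, 8.0, 9.0],
    [4.0, 8.0, 8.0, 9.0],
])
print(grouped_keep_mask(X_grp, ["A", "A", "B", "B"]).tolist())  # [True, False, False, False]
```

Because every group must pass, adding a grouping column can only remove more variables than the global check, never fewer, for a variable whose variance is already <= atol within a group.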