.remove_zero_variance_vars

proteopy.pp.remove_zero_variance_vars(adata, group_by=None, atol=1e-08, inplace=True, verbose=False)

Remove variables with zero or near-zero variance, ignoring NaN values when computing the variance.

Variables whose variance is at or below atol are removed. Variables that are entirely NaN (globally, or within any group when group_by is set) are treated as zero-variance and are also removed.
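The removal rule described above can be sketched with plain NumPy. This is only an illustration, not proteopy's implementation: the helper name zero_variance_mask is hypothetical, and the sketch assumes the population variance (ddof=0) as computed by np.nanvar, which the documentation does not specify.

```python
import warnings

import numpy as np


def zero_variance_mask(X, atol=1e-8):
    """Return a boolean mask that is True for columns that would be removed.

    Hypothetical helper for illustration; assumes population variance (ddof=0).
    """
    X = np.asarray(X, dtype=float)
    # Columns with no observed value at all count as zero-variance.
    all_nan = np.isnan(X).all(axis=0)
    with warnings.catch_warnings():
        # np.nanvar emits a RuntimeWarning and returns NaN for all-NaN columns.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        var = np.nanvar(X, axis=0)
    # NaN <= atol evaluates to False, so all-NaN columns are flagged separately.
    return all_nan | (var <= atol)


X = np.array([
    [1.0, 5.0, np.nan, 7.0],
    [2.0, 5.0, np.nan, 7.0],
    [3.0, 5.0, np.nan, 8.0],
])
mask = zero_variance_mask(X)
print(mask.tolist())  # [False, True, True, False]
```

Columns flagged True correspond to the variables that would be dropped; the remaining columns have variance strictly greater than atol.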

Parameters:
  • adata (AnnData) – AnnData annotated data matrix.

  • group_by (str | None, optional) – Column in adata.obs that defines the groups within which variance is computed. When set, a variable is removed if, in any group, its variance is <= atol or its values are all NaN.

  • atol (float, optional) – Absolute tolerance threshold. Variables with variance <= atol are considered zero-variance and removed.

  • inplace (bool, optional) – If True, modify adata in place and return None. Otherwise, return a filtered copy.

  • verbose (bool, optional) – If True, print the number of variables before filtering, the number removed, and the number remaining.

Returns:

None when inplace=True; otherwise a new anndata.AnnData containing only the retained variables, i.e. those with variance > atol (in every group when group_by is set).

Return type:

None | AnnData

Raises:
  • TypeError – If any argument has an incorrect type.

  • ValueError – If atol is negative or the group_by column contains NaN values.

  • KeyError – If group_by is not a column in adata.obs.

Warns:

UserWarning – Issued when one or more variables are removed because they are entirely NaN (globally or within at least one group).
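Because this is a warning rather than an exception, callers who want to detect all-NaN removals need to capture it explicitly. The sketch below shows two standard-library patterns. The function filter_with_all_nan_warning is a hypothetical stand-in for a call such as pr.pp.remove_zero_variance_vars(adata), and its message text is a placeholder, not the library's actual wording.

```python
import warnings


def filter_with_all_nan_warning():
    """Hypothetical stand-in for a filtering call that issues a UserWarning."""
    warnings.warn("placeholder: all-NaN variables were removed", UserWarning)


# Pattern 1: record warnings and inspect them afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is suppressed
    filter_with_all_nan_warning()
all_nan_warnings = [w for w in caught if issubclass(w.category, UserWarning)]

# Pattern 2: escalate the warning to an error, e.g. in a strict pipeline.
escalated = False
with warnings.catch_warnings():
    warnings.simplefilter("error", category=UserWarning)
    try:
        filter_with_all_nan_warning()
    except UserWarning:
        escalated = True
```

The first pattern suits logging or reporting; the second suits pipelines in which an entirely missing variable should stop processing.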

Examples

Build a small protein-level dataset with four variables: p1 varies, p2 is constant, p3 is all-NaN, and p4 varies.

>>> import numpy as np
>>> import pandas as pd
>>> import anndata as ad
>>> import proteopy as pr
>>> X = np.array([
...     [1.0, 5.0, np.nan, 7.0],
...     [2.0, 5.0, np.nan, 7.0],
...     [3.0, 5.0, np.nan, 8.0],
... ])
>>> obs = pd.DataFrame(
...     {"sample_id": ["s1", "s2", "s3"]},
...     index=["s1", "s2", "s3"],
... )
>>> var = pd.DataFrame(
...     {"protein_id": ["p1", "p2", "p3", "p4"]},
...     index=["p1", "p2", "p3", "p4"],
... )
>>> adata = ad.AnnData(X=X, obs=obs, var=var)
>>> pr.pp.remove_zero_variance_vars(adata)
>>> adata.var_names.tolist()
['p1', 'p4']

With group_by, a variable is removed if it has zero variance or is all-NaN in any group. Here p2 is constant in group A, p3 is all-NaN in group A, and p4 is constant in both groups:

>>> X_grp = np.array([
...     [1.0, 5.0, np.nan, 9.0],
...     [2.0, 5.0, np.nan, 9.0],
...     [3.0, 7.0,  8.0,   9.0],
...     [4.0, 8.0,  8.0,   9.0],
... ])
>>> obs_grp = pd.DataFrame(
...     {"sample_id": ["s1", "s2", "s3", "s4"],
...      "group": ["A", "A", "B", "B"]},
...     index=["s1", "s2", "s3", "s4"],
... )
>>> adata = ad.AnnData(X=X_grp, obs=obs_grp, var=var)
>>> pr.pp.remove_zero_variance_vars(adata, group_by="group")
>>> adata.var_names.tolist()
['p1']