.williams_2018

proteopy.datasets.williams_2018(fill_na=None)

Load Williams 2018 mouse multi-tissue proteomics dataset.

Download the peptide-level SWATH-MS dataset from Williams et al. (2018) [1], process it, and return it as an AnnData object. The dataset quantifies protein expression across five tissues in eight genetically diverse BXD mouse strains. Only the whole-cell fraction is included, and peptide intensities from different charge states are summed per peptide sequence. By default, missing values are represented as np.nan.
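The charge-state summation mentioned above can be illustrated with a minimal sketch. It is not proteopy's implementation: the record layout, the sample identifiers, the peptide sequences and the helper name sum_charge_states are all hypothetical, and missing values are left out for simplicity.

```python
from collections import defaultdict

# Hypothetical precursor-level records:
# (sample_id, peptide sequence, charge state, intensity)
records = [
    ("S1", "AEFVEVTK", 2, 1200.0),
    ("S1", "AEFVEVTK", 3, 300.0),
    ("S1", "LVNELTEFAK", 2, 850.0),
    ("S2", "AEFVEVTK", 2, 990.0),
]


def sum_charge_states(rows):
    """Sum intensities over charge states per (sample, peptide sequence)."""
    totals = defaultdict(float)
    for sample_id, sequence, _charge, intensity in rows:
        # The charge state is ignored on purpose: all charge states of the
        # same sequence in the same sample collapse into one value.
        totals[(sample_id, sequence)] += intensity
    return dict(totals)


peptide_level = sum_charge_states(records)
print(peptide_level[("S1", "AEFVEVTK")])  # 1500.0
```

After this step each (sample, peptide sequence) pair carries a single intensity, which corresponds to one cell of a samples × peptides matrix.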

Sample annotation (.obs) includes:
  • sample_id: Unique sample identifier
  • tissue: Tissue type (Brain, BAT, Heart, Liver, Quad)
  • mouse_id: BXD mouse strain identifier

Variable annotation (.var) includes:
  • peptide_id: Peptide sequence (matches .var_names)
  • protein_id: UniProt protein identifier
  • gene_id: Gene symbol

Data are sourced from the Elsevier supplementary archive (DOI: 10.1074/mcp.RA118.000554).

Parameters:

fill_na (float | int | None, optional) – If not None, replace np.nan values in .X with this value. Defaults to None, which leaves missing values as np.nan.
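As a rough illustration of what fill_na does to the intensity matrix, the sketch below applies the same kind of replacement to a small dense NumPy array. The helper fill_missing is hypothetical and is not part of proteopy; it only assumes that .X behaves like a dense float array.

```python
import numpy as np


def fill_missing(X, fill_na=None):
    """Return a float copy of X with NaN replaced by fill_na.

    If fill_na is None, the copy is returned with NaN left in place.
    """
    X = np.array(X, dtype=float)  # np.array copies by default
    if fill_na is not None:
        X[np.isnan(X)] = fill_na
    return X


# Toy intensity matrix (2 samples x 2 peptides) with two missing values.
X = np.array([[1.5, np.nan], [np.nan, 4.0]])
filled = fill_missing(X, fill_na=0)
print(filled.tolist())  # [[1.5, 0.0], [0.0, 4.0]]
```

Loading with the default fill_na=None and filling later keeps the distinction between missing and measured values available for filtering or imputation.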

Returns:

AnnData object with peptide-level quantification data. .X contains peptide intensities (samples × peptides).

Return type:

ad.AnnData

Raises:

urllib.error.URLError – If the download from the Elsevier CDN fails.
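Because the loader depends on a network download, callers may want to retry on a transient failure. The following is one possible sketch, not something proteopy provides: load_with_retry, retries and delay are hypothetical names, and loader stands for any zero-argument callable.

```python
import time
import urllib.error


def load_with_retry(loader, retries=3, delay=2.0):
    """Call loader(); on URLError wait and retry, re-raising after the last attempt."""
    for attempt in range(1, retries + 1):
        try:
            return loader()
        except urllib.error.URLError:
            if attempt == retries:
                raise  # out of attempts: surface the original error
            time.sleep(delay)  # fixed pause before the next attempt
```

Under these assumptions it could be used as adata = load_with_retry(pr.datasets.williams_2018). Any exception other than URLError propagates immediately.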

Examples

>>> import proteopy as pr
>>> adata = pr.datasets.williams_2018()
>>> adata
AnnData object with n_obs x n_vars
    obs: 'sample_id', 'tissue', 'mouse_id'
    var: 'peptide_id', 'protein_id', 'gene_id'

References