.karayel_2020

proteopy.download.karayel_2020(intensities_path='karayel-2020_ms-proteomics_human-erythropoiesis_intensities.tsv', var_annotation_path='karayel-2020_ms-proteomics_human-erythropoiesis_protein-annotation.tsv', sample_annotation_path='karayel-2020_ms-proteomics_human-erythropoiesis_sample-annotation.tsv', *, sep=None, fill_na=None, force=False)

Save the Karayel 2020 erythropoiesis dataset to disk.

Download and process the protein-level DIA-MS dataset from Karayel et al. [1] and save it as three tabular files: intensities in long format, protein annotations, and sample annotations.

The study quantified ~7,400 proteins from CD34+ hematopoietic stem/progenitor cells (HSPCs) isolated from healthy donors, across five sequential erythroid differentiation stages with four biological replicates each (20 samples total). Cells were FACS-sorted using CD235a, CD49d, and Band 3 surface markers. The differentiation stages are:

  • Progenitor: CFU-E progenitor cells (CD34+ HSPCs, negative fraction)

  • ProE&EBaso: Proerythroblasts and early basophilic erythroblasts

  • LBaso: Late basophilic erythroblasts

  • Poly: Polychromatic erythroblasts

  • Ortho: Orthochromatic erythroblasts

Data are sourced from the PRIDE archive (PXD017276). Protein quantities marked as Filtered in the original data are converted to np.nan. Samples collected at day 7 are excluded.
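The cleaning described above can be pictured with a small pandas sketch. It is not the function's actual implementation: the toy table, the column names, and the `Day7` prefix used to identify day-7 samples are all assumptions made for illustration. Only the `Filtered` marker, the conversion to `np.nan`, and the long-format column names come from this page.

```python
import numpy as np
import pandas as pd

# Toy wide table; protein IDs, sample names, and values are made up.
raw = pd.DataFrame(
    {
        "protein_id": ["P1", "P2"],
        "Progenitor_rep1": ["1200.5", "Filtered"],
        "Ortho_rep1": ["Filtered", "310.0"],
        "Day7_rep1": ["900.0", "850.0"],
    }
)

# Exclude day-7 samples (the "Day7" prefix is a hypothetical naming scheme).
kept = raw.loc[:, ~raw.columns.str.startswith("Day7")]

# Turn "Filtered" entries into missing values, then cast to float.
sample_cols = [c for c in kept.columns if c != "protein_id"]
cleaned = kept.copy()
values = cleaned[sample_cols]
cleaned[sample_cols] = values.mask(values == "Filtered").astype(float)

# Reshape to the long layout used for the intensities file.
long = cleaned.melt(
    id_vars="protein_id", var_name="sample_id", value_name="intensity"
)
print(long)
```

Entries that were `Filtered` end up as `NaN` in the `intensity` column, which is the state the `fill_na` parameter then acts on.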

Parameters:
  • intensities_path (str | Path, optional) – Destination path for the intensities file. Columns: sample_id, protein_id, intensity.

  • var_annotation_path (str | Path, optional) – Destination path for the protein annotation file. Columns: protein_id, gene_id.

  • sample_annotation_path (str | Path, optional) – Destination path for the sample annotation file. Columns: sample_id, cell_type, replicate.

  • sep (str | None, optional) – Column separator for all output files. When None, the separator is inferred from each file extension via detect_separator_from_extension() (.tsv → tab, .csv → comma).

  • fill_na (float | int | None, optional) – If not None, replace NaN values in the long-format intensities DataFrame with this value before saving.

  • force (bool, optional) – If True, overwrite existing files at the output paths. Otherwise, raise FileExistsError when a destination file already exists.
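The interplay of sep, fill_na, and force can be sketched with a small stand-alone writer. Both `infer_sep` and `write_table` are hypothetical helpers written for this illustration. They are not part of proteopy, and the real `detect_separator_from_extension()` may behave differently, in particular for extensions other than `.tsv` and `.csv`, where this sketch simply raises `ValueError`.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def infer_sep(path):
    """Hypothetical stand-in: map a file extension to a column separator."""
    suffix = Path(path).suffix.lower()
    mapping = {".tsv": "\t", ".csv": ","}
    if suffix not in mapping:
        # Assumption: unknown extensions are rejected in this sketch.
        raise ValueError(f"Cannot infer separator from extension {suffix!r}")
    return mapping[suffix]


def write_table(df, path, sep=None, fill_na=None, force=False):
    """Hypothetical writer following the documented parameter semantics."""
    path = Path(path)
    # Refuse to overwrite unless force=True.
    if path.exists() and not force:
        raise FileExistsError(f"{path} exists; pass force=True to overwrite")
    # Replace missing values only when a fill value is given.
    if fill_na is not None:
        df = df.fillna(fill_na)
    # An explicit sep wins; otherwise infer it from the extension.
    chosen_sep = sep if sep is not None else infer_sep(path)
    df.to_csv(path, sep=chosen_sep, index=False)


toy = pd.DataFrame(
    {
        "sample_id": ["s1", "s2"],
        "protein_id": ["P1", "P1"],
        "intensity": [1.5, np.nan],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "intensities.csv"
    write_table(toy, out, fill_na=0)  # comma inferred from ".csv"
    first_write = out.read_text()
    try:
        write_table(toy, out)  # second write without force
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True
```

In this sketch the second call fails because the file already exists and force is left at False, which matches the behavior described for the force parameter.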

Returns:

Writes files to disk; does not return a value.

Return type:

None

Examples

>>> import proteopy as pr
>>> pr.download.karayel_2020(
...     intensities_path="intensities.tsv",
...     var_annotation_path="protein_annotations.tsv",
...     sample_annotation_path="sample_annotations.tsv",
... )
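Once the files are written, the long-format intensities can be read back with pandas and pivoted into a samples-by-proteins matrix. The sketch below uses an in-memory toy table instead of the downloaded file so that it runs without network access. The sample and protein identifiers are made up, and only the column names (sample_id, protein_id, intensity) are taken from the parameter descriptions above.

```python
import io

import pandas as pd

# Toy stand-in for the tab-separated intensities file (values are made up).
toy_tsv = (
    "sample_id\tprotein_id\tintensity\n"
    "s1\tP1\t10.0\n"
    "s1\tP2\t20.0\n"
    "s2\tP1\t30.0\n"
    "s2\tP2\t\n"  # an empty field is read back as NaN
)

long = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# One row per sample, one column per protein.
wide = long.pivot(index="sample_id", columns="protein_id", values="intensity")
print(wide)
```

For the real output, `io.StringIO(toy_tsv)` would be replaced by the path passed as intensities_path.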

References