sdrf-annotated-datasets
April 29, 2026
Community SDRF annotations for public proteomics datasets (ProteomeXchange and related accessions).
The SDRF specification lives in bigbio/proteomics-sample-metadata.
License: Apache 2.0 · Contributing: CONTRIBUTING.md · Agent context: llms.txt, AGENTS.md
Key links
| Resource | URL |
|---|---|
| Specification | https://github.com/bigbio/proteomics-sample-metadata/blob/master/sdrf-proteomics/README.adoc |
| Public site | https://sdrf.quantms.org/ |
| Templates | https://github.com/bigbio/sdrf-templates |
| Validator CLI (parse_sdrf) | https://github.com/bigbio/sdrf-pipelines |
| Agentic toolkit | https://github.com/bigbio/sdrf-skills |
Dataset layout
Files follow the pattern datasets/{ACCESSION}/{ACCESSION}.sdrf.tsv:
datasets/PXD000070/PXD000070.sdrf.tsv
datasets/MSV000078494/MSV000078494.sdrf.tsv
Additional .sdrf.tsv files may appear in the same folder when a project requires split designs.
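The layout rule above can be sketched as a small path check. This is an illustrative helper, not part of the repository's tooling: the function names and the accession pattern (uppercase letters followed by digits, which covers the two examples shown) are assumptions.

```python
import re
from pathlib import PurePosixPath

# ASSUMPTION: accessions look like uppercase letters followed by digits
# (e.g. PXD000070, MSV000078494). Adjust if other accession styles are accepted.
ACCESSION = re.compile(r"[A-Z]+\d+")


def primary_sdrf_path(accession: str) -> str:
    """Return the expected primary file path for an accession."""
    return f"datasets/{accession}/{accession}.sdrf.tsv"


def follows_layout(path: str) -> bool:
    """True if path is datasets/{ACCESSION}/<something>.sdrf.tsv.

    Any .sdrf.tsv file in the accession folder is accepted, because
    split designs may add extra files next to the primary one.
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 3 or parts[0] != "datasets":
        return False
    folder, filename = parts[1], parts[2]
    return ACCESSION.fullmatch(folder) is not None and filename.endswith(".sdrf.tsv")
```

For example, `follows_layout(primary_sdrf_path("PXD000070"))` holds, while a path under sandbox/ or a file placed directly in datasets/ is rejected.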
Sandbox
Work-in-progress annotations live under sandbox/.
Move a folder to datasets/ and open a PR once it passes parse_sdrf validate-sdrf.
CI only validates datasets/; sandbox/ is exempt so drafts don't block merges.
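The promotion step might be scripted roughly as follows. This is a minimal sketch under stated assumptions: drafts are assumed to sit in sandbox/{ACCESSION}/, the `--sdrf_file` flag name should be confirmed with `parse_sdrf validate-sdrf --help`, and `validate_command` and `promote` are hypothetical helper names, not repository scripts.

```python
import shutil
import subprocess
from pathlib import Path


def validate_command(sdrf_file: Path) -> list[str]:
    """Build the validator invocation for one SDRF file."""
    # ASSUMPTION: --sdrf_file is the flag that takes the file path;
    # verify against the installed sdrf-pipelines version.
    return ["parse_sdrf", "validate-sdrf", "--sdrf_file", str(sdrf_file)]


def promote(repo_root: Path, accession: str, run=subprocess.run) -> bool:
    """Move sandbox/{accession} to datasets/{accession} if every SDRF file validates.

    Returns False (and moves nothing) when there are no SDRF files, the
    target folder already exists, or any file fails validation.
    """
    source = repo_root / "sandbox" / accession  # ASSUMPTION: one folder per accession
    target = repo_root / "datasets" / accession
    sdrf_files = sorted(source.glob("*.sdrf.tsv"))
    if not sdrf_files or target.exists():
        return False
    for sdrf_file in sdrf_files:
        # A non-zero exit code is treated as a validation failure.
        if run(validate_command(sdrf_file)).returncode != 0:
            return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return True
```

The `run` parameter defaults to `subprocess.run` and exists only so the helper can be exercised without the validator installed. Opening the PR remains a manual step.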
Contributing
Open a pull request to add or improve annotated SDRF files. See CONTRIBUTING.md for layout rules and review etiquette.
Agent-assisted annotation
Use sdrf-skills as the primary toolkit. Key rules:
- Anchor every row in public evidence (PX page, submitted metadata, publication). Don't invent sample names or file names.
- Keep PRs small: one accession or a closely related batch.
- Run validation locally (parse_sdrf validate-sdrf) before opening a PR.
- Declare assistance in the PR description so reviewers can calibrate review depth.
For agent-specific instructions see AGENTS.md.
CI validation
GitHub Actions runs parse_sdrf validate-sdrf on every PR and push touching datasets/**.
The validator is installed from the main branch of bigbio/sdrf-pipelines.
Re-run all checks manually via workflow_dispatch in the Actions tab.
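To predict whether a change set would trigger the check, the `datasets/**` path filter can be approximated as below. This is a rough sketch that treats the filter as "any file under datasets/". The function names are hypothetical, and whether CI validates only the changed files or every file in datasets/ is not stated here, so the second helper is illustrative only.

```python
from typing import Iterable, List


def triggers_validation(changed_paths: Iterable[str]) -> bool:
    """True if at least one changed path lies under datasets/."""
    return any(path.startswith("datasets/") for path in changed_paths)


def changed_sdrf_files(changed_paths: Iterable[str]) -> List[str]:
    """Changed SDRF files under datasets/, in sorted order.

    ASSUMPTION: only files ending in .sdrf.tsv are of interest to the validator.
    """
    return sorted(
        path
        for path in changed_paths
        if path.startswith("datasets/") and path.endswith(".sdrf.tsv")
    )
```

A change that only touches sandbox/ yields `False` from `triggers_validation`, which matches the exemption for drafts.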
How to cite
- Dai C, Füllgrabe A, Pfeuffer J, Solovyeva EM, Deng J, Moreno P, Kamatchinathan S, Kundu DJ, George N, Fexova S, Grüning B, Föll MC, Griss J, Vaudel M, Audain E, Locard-Paulet M, Turewicz M, Eisenacher M, Uszkoreit J, Van Den Bossche T, Schwämmle V, Webel H, Schulze S, Bouyssié D, Jayaram S, Duggineni VK, Samaras P, Wilhelm M, Choi M, Wang M, Kohlbacher O, Brazma A, Papatheodorou I, Bandeira N, Deutsch EW, Vizcaíno JA, Bai M, Sachsenberg T, Levitsky LI, Perez-Riverol Y. A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun. 2021 Oct 6;12(1):5854. doi: 10.1038/s41467-021-26111-3. PMID: 34615866; PMCID: PMC8494749. Manuscript
- Perez-Riverol, Yasset, European Bioinformatics Community for Mass Spectrometry. "Towards a sample metadata standard in public proteomics repositories." Journal of Proteome Research (2020) Manuscript.