T/BCR-seq-analysis
December 1, 2024
T/B cell receptor sequencing analysis notes
Please check awesome vdj too!
tutorials
- Review: Adaptive immune receptor repertoire analysis
- Single-cell immune repertoire analysis | Nature Methods
- A clonotype nomenclature for T cell receptors
- T Cell Clonal Analysis Using Single-cell RNA Sequencing and Reference Maps
- biostar post on integration scTCR with Seurat
- https://repseq-tutorial.readthedocs.io/en/latest/prerequisites.html
- Welcome to the Immcantation Portal. Use the Docker version of Immcantation if you have installation problems. 10x scBCR tutorial using Immcantation: https://immcantation.readthedocs.io/en/stable/tutorials/10x_tutorial.html
- scirpy "getting started" tutorial and case study reanalysing 140k T-cells from Wu et al. (2020).
- Tutorial: a statistical genetics guide to identifying HLA alleles driving complex disease
papers
- Can we predict T cell specificity with digital biology and machine learning?
- Review: High-throughput and single-cell T cell receptor sequencing technologies
- Disease diagnostics using machine learning of immune receptors
- Rep-Seq: uncovering the immunological repertoire through next-generation sequencing
- Single Cell T Cell Receptor Sequencing: Techniques and Future Challenges
- T-cell repertoire analysis and metrics of diversity and clonality
- TCR-Vγδ usage distinguishes protumor from antitumor intestinal γδ T cell subsets
- De novo prediction of cancer-associated T cell receptors for noninvasive cancer detection
- TCR-engineered T cell therapy in solid tumors: State of the art and perspectives
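The diversity and clonality metrics named in the list above can be illustrated with a short sketch. It assumes clonotypes are identified by their CDR3 string alone and uses a common definition of clonality (1 minus normalized Shannon entropy); the function names and the toy sequences are hypothetical, and the papers above may define these metrics differently.

```python
import math
from collections import Counter


def clone_counts(cdr3_sequences):
    """Count how many times each clonotype (here: a CDR3 string) occurs."""
    return Counter(cdr3_sequences)


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    counts = [c for c in counts if c > 0]
    if not counts:
        raise ValueError("need at least one clonotype with a positive count")
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def clonality(counts):
    """1 - normalized entropy: 0 for a perfectly even repertoire, near 1 when one clone dominates."""
    counts = [c for c in counts if c > 0]
    if len(counts) == 1:
        return 1.0  # assumed convention: a single clonotype is fully clonal
    return 1.0 - shannon_entropy(counts) / math.log(len(counts))


def simpson_index(counts):
    """Probability that two reads drawn with replacement belong to the same clonotype."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    # Toy repertoire (made-up CDR3 sequences, not real data).
    cells = ["CASSLGQYF"] * 6 + ["CASSPGTYF"] * 3 + ["CASSIRSSYEQYF"]
    counts = list(clone_counts(cells).values())
    print("entropy:", shannon_entropy(counts))
    print("clonality:", clonality(counts))
    print("simpson:", simpson_index(counts))
```

In practice a clonotype is usually defined by more than the CDR3 (for example V/J genes and paired chains), so the counting step is the part most likely to need changing.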
simulation
- Echidna: Integrated simulations of single-cell immune receptor repertoires and transcriptomes
Tools
"Cool! I would start with immunarch, VDJTools, and the new scRepertoire package" -- Wʏᴀᴛᴛ MᴄDᴏɴɴᴇʟʟ from 10x genomcis
- dandelion: a Python package for analyzing single-cell BCR/TCR data from the 10x Genomics 5’ solution.
- TRUST4, developed in Shirley Liu's group. Use it to extract TCR/BCR information from bulk RNA-seq or 5' scRNA-seq data.
- Benchmarking computational methods for B-cell receptor reconstruction from single-cell RNA-seq data
- We are happy to report a dramatic speedup for one of the core computations for adaptive immune receptor repertoire (AIRR) analysis: the discovery and counting of receptors that overlap between repertoires! Check out our CompAIRR. With repertoires of sequences each, CompAIRR ran in 17 minutes while the fastest existing tool took 10 days, amounting to a ~1000x speedup.
- ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity
- Pyrepseq: the immune repertoire analysis toolkit has a function to cluster TCRs very quickly. Paper: Lightning-fast adaptive immune receptor similarity search by symmetric deletion lookup. Clustering TCR sequences in one second? Or TCRs in 10 seconds
- tcrdist3 is a Python API-enabled toolkit for analyzing T-cell receptor repertoires
- TCRex: a web tool for the prediction of TCR–epitope recognition
- ImRex: TCR-epitope recognition prediction using combined sequence input representation for convolutional neural networks.
- NetTCR-2.0: Sequence-based prediction of peptide-TCR binding
- GraphMHC: neoantigen prediction model applying the graph neural network to molecular structure. A hybrid graph attention network + CNN to predict peptides that bind MHC proteins.
- enclone from 10x. We should give this a try if we want to cluster TCR and BCR clonotypes.
- migec: A RepSeq processing swiss-knife.
- MiXCR is a universal software for fast and accurate analysis of T- and B-cell receptor repertoire sequencing data.
- tcR: an R package for T cell receptor repertoire advanced data analysis
- ImReP is a computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data.
- GLIPH: Grouping of Lymphocyte Interactions by Paratope Hotspots. Paper: https://www.nature.com/nature/journal/v547/n7661/full/nature22976.html
- TcellMatch: Predicting T-cell to epitope specificity. TcellMatch is a collection of models to predict antigen specificity of single T cells based on CDR3 sequences and other single-cell modalities, such as RNA counts and surface protein counts.
- scirpy: A scanpy extension for single-cell TCR analysis.
- Tessa is a Bayesian model to integrate T cell receptor (TCR) sequence profiling with transcriptomes of T cells. Enabled by recently developed single-cell sequencing techniques, which provide both the TCR sequence and the RNA sequences of each T cell concurrently, Tessa maps the functional landscape of the TCR repertoire and generates insights into the human immune response to diseases.
- DeepTCR: Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data https://twitter.com/John_Will_I_Am/status/1570837756787691527 https://www.science.org/doi/10.1126/sciadv.abq5089
- T1K: Efficient and accurate KIR and HLA genotyping with massively parallel sequencing data
- Full resolution HLA and KIR genes annotation for human genome assemblies
- Design of high specificity binders for peptide-MHC-I complexes https://www.biorxiv.org/content/10.1101/2024.11.28.625793v1
- Why do TCR analysis tools (tcrdist, nettcr, etc.) rely on substitution matrices made for evolution, like BLOSUM? And could we improve them with a dedicated substitution matrix? tcrBLOSUM: an amino acid substitution matrix for sensitive alignment of distant epitope-specific TCRs https://academic.oup.com/bib/article/26/1/bbae602/7906917?login=false
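Two ideas mentioned in the tool list, counting receptors shared between repertoires and finding near-identical sequences by symmetric deletion lookup, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how CompAIRR or Pyrepseq are implemented: receptors are plain CDR3 strings, the neighbor search is limited to edit distance 1, and all function names and toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def exact_overlap(rep_a, rep_b):
    """Number of distinct sequences present in both repertoires."""
    return len(set(rep_a) & set(rep_b))


def deletion_variants(seq):
    """The sequence itself plus every string obtained by deleting one character."""
    return {seq} | {seq[:i] + seq[i + 1:] for i in range(len(seq))}


def within_edit_distance_1(a, b):
    """True if a and b differ by at most one substitution, insertion, or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def near_pairs(seqs):
    """All pairs of distinct sequences within edit distance 1.

    Sequences that share a deletion variant land in the same bucket, so only
    those candidates are compared instead of every pair in the input.
    """
    index = defaultdict(set)
    for s in set(seqs):
        for v in deletion_variants(s):
            index[v].add(s)
    pairs = set()
    for bucket in index.values():
        for a, b in combinations(sorted(bucket), 2):
            # Sharing a variant only guarantees distance <= 2, so verify.
            if within_edit_distance_1(a, b):
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    # Toy CDR3 sequences (made up for illustration).
    rep = ["CASSLGQYF", "CASSLGQFF", "CASSLGYF", "CASSPGTYF"]
    print(exact_overlap(rep, ["CASSLGQYF", "CASSIRSSYEQYF"]))
    print(sorted(near_pairs(rep)))
```

The bucket step is what avoids the all-against-all comparison; the explicit distance check afterwards is needed because two sequences can share a deletion variant while being two edits apart.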
machine learning
- Structure-based prediction of T cell receptor recognition of unseen epitopes using TCRen https://www.nature.com/articles/s43588-024-00653-0
- TAPIR: a T-cell receptor language model for predicting rare and novel targets
- STAPLER: Efficient learning of TCR-peptide specificity prediction from full-length TCR-peptide data
- Structure-based prediction of T cell receptor:peptide-MHC interactions. Preprint from Philip Bradley where he creates a version of AlphaFold to model TCR:peptide-MHC interactions. The benchmark is far from perfect, but the paper shows that deep learning-based structural modelling is a possible strategy to predict TCR specificity.
- Uni-Fold: an open-source platform for developing protein models beyond AlphaFold. https://github.com/dptech-corp/Uni-Fold
- Equidock: docking protein receptor and ligand https://github.com/octavian-ganea/equidock_public News: https://news.mit.edu/2022/ai-predicts-protein-docking-0201
- AlphaFill: enriching AlphaFold models with ligands and cofactors
- DeepMind AlphaFold for antibody discovery: What's the status?
- Why AlphaFold won’t revolutionise drug discovery
- We made AlphaFold dream of new protein assemblies, used #ProteinMPNN to bring it back to reality. https://twitter.com/BasileWicky/status/1570564831522213888
- Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures.
- Learning inverse folding from millions of predicted structures https://twitter.com/alexrives/status/1513603415959556096
- PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction https://arxiv.org/abs/2206.12240
- Fast, accurate ranking of engineered proteins by receptor binding propensity using structural modeling https://twitter.com/DingXiaozhe/status/1618257727515676672
database
- Stitchr: stitching coding TCR nucleotide sequences from V/J/CDR3 information
- The IPD-IMGT/HLA Database provides a specialist database for sequences of the human major histocompatibility complex (MHC) and includes the official sequences named by the WHO Nomenclature Committee For Factors of the HLA System. The IPD-IMGT/HLA Database is part of the international ImMunoGeneTics project (IMGT).
- hlabud provides methods to retrieve sequence alignment data from IMGTHLA and convert the data into convenient R matrices ready for downstream analysis.
- 7 million pairs! A great resource for TCR-antigen interaction. TRAIT: A Comprehensive Database for T-cell Receptor-Antigen Interactions https://www.biorxiv.org/content/10.1101/2024.11.20.624436v1
- TCRdb: A comprehensive database of human T-cell receptor (TCR) sequences
- Immuno-Navigator: A database for gene coexpression in the immune system
- McPAS-TCR: A manually curated catalogue of pathology-associated T-cell receptor sequences
- VDJdb
- @OPIGlets has built lots of lovely stuff including SAbPred, OAS, TAP http://opig.stats.ox.ac.uk/resources
- Observed TCR Space (OTS): 5.33M redundant/1.63M non-redundant alpha/beta TCR sequences deriving from 50 separate studies. https://opig.stats.ox.ac.uk/webapps/ots/
- The Observed Antibody Space database (OAS) https://opig.stats.ox.ac.uk/webapps/oas/documentation/
- SAbDab is a database containing all the antibody structures available in the PDB, annotated and presented in a consistent fashion. https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab
- iReceptor Gateway: a science gateway that enables the discovery, analysis, and download of AIRR-seq data (antibody/B-cell and T-cell receptor repertoires) from the 10 remote repositories in the AIRR Data Commons (ADC) https://gateway.ireceptor.org/login