Cell Dynamics Pipeline Implementation
December 8, 2023 ยท View on GitHub
This pipeline implements our TrajectoryNet pipeline for discovering gene regulatory structure from single cell dynamics in continuous time and space. We first learn likely individual cell trajectories in a maximum likelihood fashion. We then learn a graph between genes using a granger causality based score to control the edge weights. We reference this graph and known regulatory interactions to discovery likely regulatory relationships in the mesenchymal to epithelial transition.
In brief, TrajectoryNet is a Continuous Normalizing Flow model which can perform dynamic optimal transport using energy regularization and / or a combination of velocity, density, and growth regularizations to better match cellular trajectories.
Our setting is similar to that of WaddingtonOT <https://broadinstitute.github.io/wot/>_. In that we have access to a bunch of
population measurements of cells over time and would like to model the dynamics
of cells over that time period. TrajectoryNet is trained end-to-end and is
continuous both in gene space and in time.
Installation
TrajectoryNet is available in pypi. Install by running the following
.. code-block:: bash
pip install -r requirements.txt
This code was tested with python 3.7 and 3.8.
Basic Usage
Run with
.. code-block:: bash
python -m TrajectoryNet.main --dataset SCURVE
To run TrajectoryNet on a small S Curve example. To use a
custom dataset expose the coordinates and timepoint information according
to the example jupyter notebooks in the /notebooks/ folder.
If you have an AnnData <https://anndata.readthedocs.io>_ object then take a look at
notebooks/Example_Anndata_to_TrajectoryNet.ipynb <https://github.com/KrishnaswamyLab/Cell-Dynamics-Pipeline/blob/master/notebooks/Example_Anndata_to_TrajectoryNet.ipynb>,
which shows how to load one of the example scvelo <https://scvelo.readthedocs.io> anndata objects into
TrajectoryNet. Alternatively you can use the custom (compressed) format for
TrajectoryNet as described below. Running this code can take on the order of a day for a computer with a single GPU.
For this format TrajectoryNet requires the following:
- An embedding matrix titled
[embedding_name](Cells x Dimensions) - A sample labels array titled
sample_labels(Cells)
To run TrajectoryNet with a custom dataset use:
.. code-block:: bash
python -m TrajectoryNet.main --dataset [PATH_TO_NPZ_FILE] --embedding_name [EMBEDDING_NAME]
python -m TrajectoryNet.eval --dataset [PATH_TO_NPZ_FILE] --embedding_name [EMBEDDING_NAME]
See notebooks/EB-Eval.ipynb for an example on how to use TrajectoryNet on
a PCA embedding to get trajectories in the gene space.
TrajectoryNet save backwards_trajectories.npy which is a numpy file of trajectories. This file can be loaded and further processed in the Cell Dynamics Pipeline to learn Granger causal gene structures. This step can be found in notebooks/granger-analysis.ipynb <https://github.com/KrishnaswamyLab/Cell-Dynamics-Pipeline/blob/master/notebooks/granger-analysis.ipynb>_. This takes the data and learned trajectories and produces gene x gene graphs using the Total Granger Causal Score (TGCS).
Data
All processed data can be found in the release <https://github.com/KrishnaswamyLab/Cell-Dynamics-Pipeline/releases/tag/v0.0.1>_.