
April 10, 2024

Introduction

Test datasets have been generated for each of the 4 species supported by Cactus. This makes it possible to check that the pipeline works equally well for each species. Downloading one of the smallest datasets (fly or worm) should be sufficient for users to get a sense of how the pipeline works. Here are the sizes of the test datasets:

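| species | raw_files | sampled_files | sampled_atac | sampled_mrna | tar |
|---|---|---|---|---|---|
| fly | 17 GB | 377 MB | 329 MB | 48 MB | 360 MB |
| worm | 68 GB | 1.4 GB | 1.3 GB | 41 MB | 1.3 GB |
| mouse | 52 GB | 6 GB | 5.9 GB | 142 MB | 5.7 GB |
| human | 118 GB | 9 GB | 8.8 GB | 190 MB | 8.5 GB |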

NOTE: The sampled mRNA-Seq datasets are of comparable size across the 4 species, whereas the sampled ATAC-Seq datasets are much larger for human and mouse than for worm and fly. This reflects the large difference in genome size, but not in transcriptome size, between these species, as shown here:

| species | genome | transcriptome |
|---|---|---|
| fly | 100 MB | 53 MB |
| worm | 144 MB | 89 MB |
| human | 2731 MB | 158 MB |
| mouse | 3100 MB | 261 MB |

The table below shows the time needed to run each test dataset on a local server with 47 CPUs and 300 GB of RAM:

| Species | Duration | CPU_hours | Tasks |
|---|---|---|---|
| fly | 22m | 9.8 | 1496 |
| worm | 11m | 4.2 | 874 |
| human | 2h 12m | 51.5 | 981 |
| mouse | 3h 27m | 77.5 | 801 |
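
As a rough sanity check on these timings, dividing the CPU hours by the wall-clock duration gives the average number of CPUs kept busy during a run. The sketch below does this with awk. The helper name `avg_cpus` is hypothetical, and the input values are the table figures as read here, with durations converted to minutes.

```shell
#!/usr/bin/env bash

# Average number of busy CPUs = CPU hours / wall-clock hours.
# avg_cpus is an illustrative helper, not part of Cactus.
avg_cpus() {
  local minutes=$1 cpu_hours=$2
  awk -v m="$minutes" -v c="$cpu_hours" 'BEGIN { printf "%.1f\n", c / (m / 60) }'
}

# Columns: species, wall-clock minutes, CPU hours (copied from the table above).
while read -r species minutes cpu_hours; do
  printf '%-6s %s\n' "$species" "$(avg_cpus "$minutes" "$cpu_hours")"
done <<'EOF'
fly 22 9.8
worm 11 4.2
human 132 51.5
mouse 207 77.5
EOF
```

With these inputs, each run kept roughly 22 to 27 of the 47 available CPUs busy on average.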

Downloading test datasets

The test datasets can be downloaded with this command:

nextflow run jsalignon/cactus/scripts/download/download.nf --test_datasets --species worm -r main -latest

The parameters for this command are:

  • --species: any of the 4 species supported by Cactus (worm, fly, mouse or human)
  • --threads: the number of threads used by pigz to uncompress the references archive files
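
To fetch several test datasets in one go, the command above can be wrapped in a small loop. This is only a sketch: the function name `download_test_dataset`, the `DRY_RUN` switch and the thread count of 4 are assumptions made for illustration, while the nextflow arguments are the ones described above.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the download command shown above.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
download_test_dataset() {
  local species=$1 threads=${2:-4}   # 4 threads is an arbitrary example value
  local cmd=(nextflow run jsalignon/cactus/scripts/download/download.nf
             --test_datasets --species "$species" --threads "$threads"
             -r main -latest)
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# The two smallest datasets; add mouse and human if disk space allows.
for species in fly worm; do
  download_test_dataset "$species" 4
done
```

Setting `DRY_RUN=0` in the environment runs the commands instead of printing them.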

NOTE: Each test dataset contains 3 folders: data, design and parameters, as described in the Inputs section.

NOTE: A template script to run all test datasets with all tool managers can be found here.

Details on the test datasets origins and labels

Worm and human (GSE98758)

GEO Status: Public on Aug 29, 2018

GEO Title: Genome-wide DNA accessibility maps and differential gene expression using ChIP-seq, ATAC-seq and RNA-seq for the human secondary fibroblast cell line hiF-T and whole worms with and without knockdown of FACT complex

GEO Summary: To assess the mechanisms by which FACT depletion leads to increased sensitivity of cells to be reprogrammed, we measured the chromatin accessibility landscape using ATAC-seq following mock treatment, SSRP1 knockdown, or SUPT16H knockdown in human fibroblasts and mock, hmg-3 or hmg-4 knockdown in whole worms, and differential gene expression in hmg-3 knockout mutants or following mock, hmg-4, or spt-16 knockdown by RNAseq.

GEO Design: Examination of two FACT complex components in human cells and worms with ChIP-seq, ATAC-seq and RNA-seq

Citation: Kolundzic E, Ofenbauer A, Bulut SI, Uyar B et al. FACT Sets a Barrier for Cell Fate Reprogramming in Caenorhabditis elegans and Human Cells. Dev Cell 2018 Sep 10;46(5):611-626.e12. PMID: 30078731

Abstract: The chromatin regulator FACT (facilitates chromatin transcription) is essential for ensuring stable gene expression by promoting transcription. In a genetic screen using Caenorhabditis elegans, we identified that FACT maintains cell identities and acts as a barrier for transcription factor-mediated cell fate reprogramming. Strikingly, FACT’s role as a barrier to cell fate conversion is conserved in humans as we show that FACT depletion enhances reprogramming of fibroblasts. Such activity is unexpected because FACT is known as a positive regulator of gene expression, and previously described reprogramming barriers typically repress gene expression. While FACT depletion in human fibroblasts results in decreased expression of many genes, a number of FACT-occupied genes, including reprogramming-promoting factors, show increased expression upon FACT depletion, suggesting a repressive function of FACT. Our findings identify FACT as a cellular reprogramming barrier in C. elegans and humans, revealing an evolutionarily conserved mechanism for cell fate protection.

Homologs:

| Worm | Humans |
|---|---|
| hmg-4, hmg-3 | SSRP1 |
| spt-16 | SUPT16H |

Worm samples:

| sample_id | library_strategy | sample_title | srr_id | gsm_id | library_layout |
|---|---|---|---|---|---|
| input | ATAC-seq | input_control | SRR5000684 | GSM2385318 | PAIRED |
| ctl_1 | RNA-Seq | ctrl_rep1 | SRR5860412 | GSM2715402 | SINGLE |
| ctl_2 | RNA-Seq | ctrl_rep2 | SRR5860413 | GSM2715403 | SINGLE |
| ctl_3 | RNA-Seq | ctrl_rep3 | SRR5860414 | GSM2715404 | SINGLE |
| hmg4_1 | RNA-Seq | hmg4_rep1 | SRR5860415 | GSM2715405 | SINGLE |
| hmg4_2 | RNA-Seq | hmg4_rep2 | SRR5860416 | GSM2715406 | SINGLE |
| hmg4_3 | RNA-Seq | hmg4_rep3 | SRR5860417 | GSM2715407 | SINGLE |
| spt16_1 | RNA-Seq | spt16_rep1 | SRR5860418 | GSM2715408 | SINGLE |
| spt16_2 | RNA-Seq | spt16_rep2 | SRR5860419 | GSM2715409 | SINGLE |
| spt16_3 | RNA-Seq | spt16_rep3 | SRR5860420 | GSM2715410 | SINGLE |
| ctl_1 | ATAC-seq | Rluc_rep1 | SRR5860424 | GSM2715414 | PAIRED |
| ctl_2 | ATAC-seq | Rluc_rep2 | SRR5860425 | GSM2715415 | PAIRED |
| ctl_3 | ATAC-seq | Rluc_rep3 | SRR5860426 | GSM2715416 | PAIRED |
| spt16_1 | ATAC-seq | spt-16_rep1 | SRR5860430 | GSM2715420 | PAIRED |
| spt16_2 | ATAC-seq | spt-16_rep2 | SRR5860431 | GSM2715421 | PAIRED |
| spt16_3 | ATAC-seq | spt-16_rep3 | SRR5860432 | GSM2715422 | PAIRED |
| hmg4_1 | ATAC-seq | hmg-4_rep1 | SRR5860433 | GSM2715423 | PAIRED |
| hmg4_2 | ATAC-seq | hmg-4_rep2 | SRR5860434 | GSM2715424 | PAIRED |
| hmg4_3 | ATAC-seq | hmg-4_rep3 | SRR5860435 | GSM2715425 | PAIRED |

Human samples:

| sample_id | library_strategy | sample_title | srr_id | gsm_id | library_layout |
|---|---|---|---|---|---|
| ctl_1 | ATAC-seq | mock_ATAC-seq_rep1 | SRR5521292 | GSM2611319 | PAIRED |
| ctl_2 | ATAC-seq | mock_ATAC-seq_rep2 | SRR5521293 | GSM2611320 | PAIRED |
| ssrp1_1 | ATAC-seq | SSRP1_ATAC-seq_rep1 | SRR5521294 | GSM2611321 | PAIRED |
| ssrp1_2 | ATAC-seq | SSRP1_ATAC-seq_rep2 | SRR5521295 | GSM2611322 | PAIRED |
| supt16h_1 | ATAC-seq | SUPT16H_ATAC-seq_rep1 | SRR5521296 | GSM2611323 | PAIRED |
| supt16h_2 | ATAC-seq | SUPT16H_ATAC-seq_rep2 | SRR5521297 | GSM2611324 | PAIRED |
| ctl_1 | RNA-Seq | pri_mockTotal_A:_human_mock_RNA-seq_rep1 | SRR7101006 | GSM3127942 | PAIRED |
| ctl_2 | RNA-Seq | pri_mockTotal_B:_human_mock_RNA-seq_rep2 | SRR7101007 | GSM3127943 | PAIRED |
| ctl_3 | RNA-Seq | pri_mockTotal_C:_human_mock_RNA-seq_rep3 | SRR7101008 | GSM3127944 | PAIRED |
| ssrp1_1 | RNA-Seq | pri_ssrp1Total_A:_human_Ssrp1_kd_RNA-seq_rep1 | SRR7101009 | GSM3127945 | PAIRED |
| ssrp1_2 | RNA-Seq | pri_ssrp1Total_B:_human_Ssrp1_kd_RNA-seq_rep2 | SRR7101010 | GSM3127946 | PAIRED |
| ssrp1_3 | RNA-Seq | pri_ssrp1Total_C:_human_Ssrp1_kd_RNA-seq_rep3 | SRR7101011 | GSM3127947 | PAIRED |
| supt16h_1 | RNA-Seq | pri_supt16hTotal_A:_human_Supt16h_kd_RNA-seq_rep1 | SRR7101012 | GSM3127948 | PAIRED |
| supt16h_2 | RNA-Seq | pri_supt16hTotal_B:_human_Supt16h_kd_RNA-seq_rep2 | SRR7101013 | GSM3127949 | PAIRED |
| supt16h_3 | RNA-Seq | pri_supt16hTotal_C:_human_Supt16h_kd_RNA-seq_rep3 | SRR7101014 | GSM3127950 | PAIRED |

Fly (GSE149339)

GEO Status: Public on May 10, 2020

GEO Title: Pioneer factor GAF cooperates with PBAP and NURF to regulate transcription

GEO Summary: The Drosophila pioneer factor GAF is known to be essential for RNA Pol II promoter-proximal pausing and the removal of nucleosomes from a set of target promoters with GAGAG motifs. We and others have speculated that GAF recruits the ISWI family ATP-dependent chromatin remodeling complex NURF, on the basis that NURF and GAF are both required to remodel nucleosomes on an hsp70 promoter in vitro and that GAF interacts physically with NURF. However, GAF was also recently shown to interact with PBAP, a SWI/SNF family remodeler. To test which of these remodeling complexes GAF works with, we depleted GAF, NURF301, BAP170, and NURF301+BAP170 in Drosophila S2 cells using RNAi. We used a combination of PRO-seq, ATAC-seq, 3'RNA-seq, and CUT&RUN to demonstrate that while GAF and PBAP synergistically open chromatin at target promoters which allows Pol II recruitment and pausing to proceed, GAF and NURF also synergistically position the +1 nucleosome to ensure efficient pause release and transition to productive elongation.

GEO Design: We treated two independent replicates of Drosophila S2 cells with dsRNA to LACZ (control), GAF, NURF301, BAP170 (the unique subunits of the NURF and PBAP complexes, respectively), and NURF301+BAP170. After 5 days, we harvested cells, validated knockdowns, and performed PRO-seq, ATAC-seq and 3'RNA-seq. We also performed CUT&RUN for both GAF and NURF301 in untreated S2 cells.

Citation: Judd, J., Duarte, F. M. & Lis, J. T. Pioneer-like factor GAF cooperates with PBAP (SWI/SNF) and NURF (ISWI) to regulate transcription. Genes Dev. 35, 147–156 (2021).

Abstract: Transcriptionally silent genes must be activated throughout development. This requires nucleosomes be removed from promoters and enhancers to allow transcription factor (TF) binding and recruitment of coactivators and RNA polymerase II (Pol II). Specialized pioneer TFs bind nucleosome-wrapped DNA to perform this chromatin opening by mechanisms that remain incompletely understood. Here, we show that GAGA factor (GAF), a Drosophila pioneer-like factor, functions with both SWI/SNF and ISWI family chromatin remodelers to allow recruitment of Pol II and entry to a promoter-proximal paused state, and also to promote Pol II's transition to productive elongation. We found that GAF interacts with PBAP (SWI/SNF) to open chromatin and allow Pol II to be recruited. Importantly, this activity is not dependent on NURF as previously proposed; however, GAF also synergizes with NURF downstream from this process to ensure efficient Pol II pause release and transition to productive elongation, apparently through its role in precisely positioning the +1 nucleosome. These results demonstrate how a single sequence-specific pioneer TF can synergize with remodelers to activate sets of genes. Furthermore, this behavior of remodelers is consistent with findings in yeast and mice, and likely represents general, conserved mechanisms found throughout eukarya.

Samples:

| sample_id | library_strategy | sample_title | srr_id | gsm_id | library_layout |
|---|---|---|---|---|---|
| ctl_1 | ATAC-seq | LACZ_ATACseq_Rep1 | SRR11607688 | GSM4498282 | PAIRED |
| ctl_1 | ATAC-seq | LACZ_ATACseq_Rep1 | SRR11607689 | GSM4498282 | PAIRED |
| ctl_2 | ATAC-seq | LACZ_ATACseq_Rep2 | SRR11607690 | GSM4498283 | PAIRED |
| ctl_2 | ATAC-seq | LACZ_ATACseq_Rep2 | SRR11607691 | GSM4498283 | PAIRED |
| gaf_1 | ATAC-seq | GAF_ATACseq_Rep1 | SRR11607692 | GSM4498284 | PAIRED |
| gaf_1 | ATAC-seq | GAF_ATACseq_Rep1 | SRR11607693 | GSM4498284 | PAIRED |
| gaf_2 | ATAC-seq | GAF_ATACseq_Rep2 | SRR11607675 | GSM4498285 | PAIRED |
| gaf_2 | ATAC-seq | GAF_ATACseq_Rep2 | SRR11607694 | GSM4498285 | PAIRED |
| b170_1 | ATAC-seq | BAP170_ATACseq_Rep1 | SRR11607676 | GSM4498286 | PAIRED |
| b170_1 | ATAC-seq | BAP170_ATACseq_Rep1 | SRR11607677 | GSM4498286 | PAIRED |
| b170_2 | ATAC-seq | BAP170_ATACseq_Rep2 | SRR11607678 | GSM4498287 | PAIRED |
| b170_2 | ATAC-seq | BAP170_ATACseq_Rep2 | SRR11607679 | GSM4498287 | PAIRED |
| n301_1 | ATAC-seq | NURF301_ATACseq_Rep1 | SRR11607680 | GSM4498288 | PAIRED |
| n301_1 | ATAC-seq | NURF301_ATACseq_Rep1 | SRR11607681 | GSM4498288 | PAIRED |
| n301_2 | ATAC-seq | NURF301_ATACseq_Rep2 | SRR11607682 | GSM4498289 | PAIRED |
| n301_2 | ATAC-seq | NURF301_ATACseq_Rep2 | SRR11607683 | GSM4498289 | PAIRED |
| n301b170_1 | ATAC-seq | NURF301BAP170_ATACseq_Rep1 | SRR11607684 | GSM4498290 | PAIRED |
| n301b170_1 | ATAC-seq | NURF301BAP170_ATACseq_Rep1 | SRR11607685 | GSM4498290 | PAIRED |
| n301b170_2 | ATAC-seq | NURF301BAP170_ATACseq_Rep2 | SRR11607686 | GSM4498291 | PAIRED |
| n301b170_2 | ATAC-seq | NURF301BAP170_ATACseq_Rep2 | SRR11607687 | GSM4498291 | PAIRED |
| gaf_2 | RNA-Seq | GAF_RNAseq_Rep2 | SRR11607698 | GSM4498295 | SINGLE |
| b170_1 | RNA-Seq | BAP170_RNAseq_Rep1 | SRR11607699 | GSM4498296 | SINGLE |
| b170_2 | RNA-Seq | BAP170_RNAseq_Rep2 | SRR11607700 | GSM4498297 | SINGLE |
| n301_1 | RNA-Seq | NURF301_RNAseq_Rep1 | SRR11607701 | GSM4498298 | SINGLE |
| n301_2 | RNA-Seq | NURF301_RNAseq_Rep2 | SRR11607702 | GSM4498299 | SINGLE |
| n301b170_1 | RNA-Seq | NURF301BAP170_RNAseq_Rep1 | SRR11607703 | GSM4498300 | SINGLE |
| n301b170_2 | RNA-Seq | NURF301BAP170_RNAseq_Rep2 | SRR11607704 | GSM4498301 | SINGLE |
| ctl_1 | RNA-Seq | LACZ_RNAseq_Rep1 | SRR11607695 | GSM4498292 | SINGLE |
| ctl_2 | RNA-Seq | LACZ_RNAseq_Rep2 | SRR11607696 | GSM4498293 | SINGLE |
| gaf_1 | RNA-Seq | GAF_RNAseq_Rep1 | SRR11607697 | GSM4498294 | SINGLE |
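
In the fly table, each ATAC-seq sample is listed twice because a single GSM entry is associated with two SRR runs. A quick way to see this grouping is to collect the run IDs per sample, as in the sketch below. The function name `runs_per_sample` is hypothetical, and the input is assumed to be whitespace-separated `sample_id srr_id` pairs; how Cactus itself combines these runs is not shown here.

```shell
#!/usr/bin/env bash

# Illustrative helper: group SRR run IDs by sample_id.
# Input (stdin): one "sample_id srr_id" pair per line.
# Output: one line per sample, in first-seen order, with comma-separated runs.
runs_per_sample() {
  awk '
    {
      if ($1 in runs) {
        runs[$1] = runs[$1] "," $2
      } else {
        runs[$1] = $2
        order[++n] = $1
      }
    }
    END { for (i = 1; i <= n; i++) print order[i], runs[order[i]] }
  '
}

# First rows of the fly ATAC-seq table above.
runs_per_sample <<'EOF'
ctl_1 SRR11607688
ctl_1 SRR11607689
ctl_2 SRR11607690
ctl_2 SRR11607691
EOF
```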

Mouse (GSE193393)

GEO Status: Public on Jun 23, 2022

GEO Title: PHF20 Activates Autophagy Genes through Enhancer Activation via H3K36me2 Binding Activity

GEO Summary: Autophagy is a catabolic pathway that maintains cellular homeostasis under various stress conditions, including nutrient-deprived conditions. To elevate autophagic flux to a sufficient level under stress conditions, transcriptional activation of autophagy genes occurs to replenish autophagy components. Here, using a combination of RNA-seq, ATAC-seq and ChIP-seq, we found that plant homeodomain finger protein 20 (PHF20), which is an epigenetic reader possessing methyl binding activity, plays a key role in controlling the expression of autophagy genes. PHF20 activates autophagy genes through enhancer activation via H3K36me2 binding activity as an epigenetic reader, and our findings emphasize the importance of nuclear regulation of autophagy.

GEO Design: mRNA-seq, ATAC-seq and ChIP-seq experiments under normal and 24hrs of glucose starvation condition in WT and Phf20-/- MEFs

Citation: Park SW, Kim J, Oh S, Lee J et al. PHF20 is crucial for epigenetic control of starvation-induced autophagy through enhancer activation. Nucleic Acids Res 2022 Aug 12;50(14):7856-7872. PMID: 35821310

Samples:

| sample_id | library_strategy | sample_title | srr_id | gsm_id | library_layout |
|---|---|---|---|---|---|
| Wt_1 | ATAC-seq | WT_control_ATAC-seq_rep1 | SRR17483668 | GSM5776726 | PAIRED |
| Wt_2 | ATAC-seq | WT_control_ATAC-seq_rep2 | SRR17483667 | GSM5776727 | PAIRED |
| WtStarv_1 | ATAC-seq | WT_GlcStarv_ATAC-seq_rep1 | SRR17483666 | GSM5776728 | PAIRED |
| WtStarv_2 | ATAC-seq | WT_GlcStarv_ATAC-seq_rep2 | SRR17483665 | GSM5776729 | PAIRED |
| Phf_1 | ATAC-seq | Phf20-/-_control_ATAC-seq_rep1 | SRR17483664 | GSM5776730 | PAIRED |
| Phf_2 | ATAC-seq | Phf20-/-_control_ATAC-seq_rep2 | SRR17483663 | GSM5776731 | PAIRED |
| PhfStarv_1 | ATAC-seq | Phf20-/-_GlcStarv_ATAC-seq_rep1 | SRR17483662 | GSM5776732 | PAIRED |
| PhfStarv_2 | ATAC-seq | Phf20-/-_GlcStarv_ATAC-seq_rep2 | SRR17483661 | GSM5776733 | PAIRED |
| Wt_1 | RNA-Seq | WT_control_RNA-seq_rep1 | SRR17535397 | GSM5799507 | PAIRED |
| Wt_2 | RNA-Seq | WT_control_RNA-seq_rep2 | SRR17535396 | GSM5799508 | PAIRED |
| WtStarv_1 | RNA-Seq | WT_GlcStarv_RNA-seq_rep1 | SRR17535395 | GSM5799509 | PAIRED |
| WtStarv_2 | RNA-Seq | WT_GlcStarv_RNA-seq_rep2 | SRR17535394 | GSM5799510 | PAIRED |
| Phf_1 | RNA-Seq | Phf20-/-_control_RNA-seq_rep1 | SRR17535392 | GSM5799511 | PAIRED |
| Phf_2 | RNA-Seq | Phf20-/-_control_RNA-seq_rep2 | SRR17535391 | GSM5799512 | PAIRED |
| PhfStarv_1 | RNA-Seq | Phf20-/-_GlcStarv_RNA-seq_rep1 | SRR17535390 | GSM5799513 | PAIRED |
| PhfStarv_2 | RNA-Seq | Phf20-/-_GlcStarv_RNA-seq_rep2 | SRR17535393 | GSM5799514 | PAIRED |