3_Inputs.md
April 10, 2024
This section covers the input files needed by Cactus. There are three kinds of input files:
- Data (.fastq.gz files): raw sequencing output files
- Design (.tsv files): describe the design of the experiment, that is, how fastq files relate to samples and conditions, which comparisons to perform, and which groups of comparisons to plot together in the heatmaps
- Parameters (.yml file): sets the parameters to use for the current analysis run. This is the only input file needed for a Cactus call, as it specifies the paths to the other files.
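As a rough illustration of the three kinds listed above, the following Python sketch maps a file name to its kind using only the extensions given in the list. The `INPUT_KINDS` table and the `input_kind` helper are hypothetical names introduced for this example; they are not part of Cactus.

```python
from pathlib import PurePath

# Extensions come from the list above; the mapping itself is illustrative only.
INPUT_KINDS = {
    ".fastq.gz": "data",
    ".tsv": "design",
    ".yml": "parameters",
}


def input_kind(path):
    """Return 'data', 'design', 'parameters', or 'unknown' for a file path."""
    name = PurePath(path).name
    for suffix, kind in INPUT_KINDS.items():
        if name.endswith(suffix):
            return kind
    return "unknown"


# Hypothetical paths, used only to show the helper in action.
for example in ["data/atac/reads_R1.fastq.gz", "design/comparisons.tsv", "parameters/run.yml"]:
    print(example, "->", input_kind(example))
```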
Here is an example of input files in the run directory:
.
├── parameters
│   └── run.yml
├── data
│   ├── atac
│   │   ├── sample_1000K_reads_atac_SRX2333004_SRR5000684_R1.fastq.gz
│   │   ├── sample_1000K_reads_atac_SRX2333004_SRR5000684_R2.fastq.gz
│   │   ├── sample_1000K_reads_atac_SRX3029124_SRR5860424_R1.fastq.gz
│   │   ├── ...
│   └── mrna
│       ├── sample_50K_reads_mrna_SRX3029112_SRR5860412.fastq.gz
│       ├── sample_50K_reads_mrna_SRX3029113_SRR5860413.fastq.gz
│       ├── ...
└── design
    ├── atac_fastq.tsv
    ├── comparisons.tsv
    ├── groups.tsv
    ├── mrna_fastq.tsv
    ├── regions_to_remove.tsv
    └── genes_to_remove.tsv
Note: An additional .cactus.config file, located in the root folder, sets the global configuration of Cactus for all of the user's runs.
Note: The directory structure can be changed arbitrarily, as file paths are specified in the .yml input file.
Note: The Cactus run will create two additional directories: the results directory and the work directory (a temporary directory created by Nextflow).
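Before launching a run, it can help to check that the run directory matches the example layout shown earlier. The Python sketch below does this under two assumptions: that the example layout is used unchanged (Cactus itself only follows the paths given in the .yml file), and that every file in the example is expected to be present. The `check_example_layout` helper is a hypothetical name and not part of Cactus.

```python
from pathlib import Path

# Relative paths copied from the example tree. They are NOT mandated by Cactus:
# the real locations are whatever the .yml parameters file points to.
EXAMPLE_FILES = [
    "parameters/run.yml",
    "design/atac_fastq.tsv",
    "design/comparisons.tsv",
    "design/groups.tsv",
    "design/mrna_fastq.tsv",
    "design/regions_to_remove.tsv",
    "design/genes_to_remove.tsv",
]
EXAMPLE_DATA_DIRS = ["data/atac", "data/mrna"]


def check_example_layout(run_dir):
    """Return a list of problems; an empty list means everything was found."""
    root = Path(run_dir)
    problems = [
        f"missing file: {rel}" for rel in EXAMPLE_FILES if not (root / rel).is_file()
    ]
    for rel in EXAMPLE_DATA_DIRS:
        data_dir = root / rel
        # Each data directory should hold at least one .fastq.gz file.
        if not data_dir.is_dir() or not any(data_dir.glob("*.fastq.gz")):
            problems.append(f"no .fastq.gz files in: {rel}")
    return problems


if __name__ == "__main__":
    # Check the current directory and report anything that differs from the example.
    for problem in check_example_layout("."):
        print(problem)
```

If the layout has been customized, the two lists at the top would need to be edited to match the paths used in the .yml file.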