3_Inputs.md

April 10, 2024

This section covers the input files needed by Cactus. There are three kinds of input files:

  • Data (.fastq.gz files): raw sequencing output files
  • Design (.tsv files): describe the design of the experiment, that is, how fastq files relate to samples and conditions, which comparisons to perform, and which groups of comparisons to plot together in the heatmaps
  • Parameters (.yml file): sets the parameters to use for the current analysis run. This is the only input file needed for a Cactus call.

Here is an example of input files in the run directory:

.
├── parameters
│   └── run.yml
├── data
│   ├── atac
│   │   ├── sample_1000K_reads_atac_SRX2333004_SRR5000684_R1.fastq.gz
│   │   ├── sample_1000K_reads_atac_SRX2333004_SRR5000684_R2.fastq.gz
│   │   ├── sample_1000K_reads_atac_SRX3029124_SRR5860424_R1.fastq.gz
│   │   ├── ...
│   └── mrna
│       ├── sample_50K_reads_mrna_SRX3029112_SRR5860412.fastq.gz
│       ├── sample_50K_reads_mrna_SRX3029113_SRR5860413.fastq.gz
│       ├── ...
└── design
    ├── atac_fastq.tsv
    ├── comparisons.tsv
    ├── groups.tsv
    ├── mrna_fastq.tsv
    ├── regions_to_remove.tsv
    └── genes_to_remove.tsv
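
As a quick sanity check before launching a run, the layout above can be verified with a short script. This is only a minimal sketch and not part of Cactus: the helper names `find_missing_inputs` and `list_fastq_files` are hypothetical, and the file list simply mirrors the example tree, so it should be adapted if the run uses other names or paths.

```python
from pathlib import Path

# Relative paths copied from the example layout; a real run may use other names.
EXPECTED_FILES = [
    "parameters/run.yml",
    "design/atac_fastq.tsv",
    "design/comparisons.tsv",
    "design/groups.tsv",
    "design/mrna_fastq.tsv",
    "design/regions_to_remove.tsv",
    "design/genes_to_remove.tsv",
]


def find_missing_inputs(run_dir):
    """Return the expected input files that are absent from run_dir (hypothetical helper)."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def list_fastq_files(run_dir):
    """Return the .fastq.gz files found under run_dir/data, as sorted relative paths."""
    root = Path(run_dir)
    data_dir = root / "data"
    return sorted(p.relative_to(root).as_posix() for p in data_dir.rglob("*.fastq.gz"))


if __name__ == "__main__":
    # Check the current directory, assumed here to be the run directory.
    for rel in find_missing_inputs("."):
        print(f"missing: {rel}")
    for fastq in list_fastq_files("."):
        print(f"found data file: {fastq}")
```

Run from the run directory, the script prints one line per missing design or parameter file and one line per fastq file found under `data`.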

Note: An additional .cactus.config file, located in the root folder, sets the global configuration of Cactus for all of the user's runs.

Note: The directory structure can be changed freely, since file paths are specified in the .yml input file.

Note: The Cactus run will create two additional directories: the results directory and the work directory (a temporary directory created by Nextflow).