Post-Quantification Workflows
March 28, 2026 · View on GitHub
Step 1: Build the Analysis Database (majec_build_db)
Consolidate output files from one or more pipeline runs into a single, queryable SQLite database. This becomes the input for all downstream tools.
majec_build_db \
--run_manifests chunk1_run_manifest.json chunk2_run_manifest.json \
--metadata_file sample_metadata.tsv \
--output_db my_project.db \
--force
| Flag | Description |
|---|---|
--run_manifests | One or more _run_manifest.json files from majec_run_pipeline. Automatically merged and validated. |
--metadata_file | Tab-separated sample metadata (recommended). Enables downstream statistical comparisons. |
--output_db | Path for the SQLite database file. |
--force | Overwrite an existing database. |
Metadata File Format
#experimental_variables: treatment, cell_line
#id_variables: sample_id, rep
sample_id cell_line treatment rep batch
MCF7_Control_R1 MCF7 Control 1 Day1
MCF7_DrugA_R1 MCF7 DrugA 1 Day1
- Header lines starting with
#define variable types for downstream tools. experimental_variables: factors for comparison (e.g., in DESeq2).id_variables: sample identifiers and replicates.sample_idmust match sample names from the pipeline run.- Include batch columns if applicable — enables batch correction via
--batch_columninmajec_prepare_deseq2.
Step 2: DESeq2 Analysis (majec_prepare_deseq2)
Generates a ready-to-run DESeq2 analysis package from the database.
majec_prepare_deseq2 \
--db my_project.db \
--comparisons_file comparisons.txt \
--output_dir deseq2_analysis \
--level gene \
--confidence_mode variance
Defining Comparisons
Each line in the comparisons file defines one contrast:
# Format: Name; Case_Criteria; Control_Criteria
MCF7_DrugA_vs_Ctrl; cell_line=MCF7;treatment=DrugA; cell_line=MCF7;treatment=Control
All_Drugs_vs_Ctrl; treatment=DrugA,DrugB; treatment=Control
Criteria use key=value syntax. Multiple values for one key are comma-separated; multiple keys are semicolon-separated. You can also select specific samples with sample_id=Sample_A01,Sample_A04.
Cohort
The --cohort_str or --cohort_file flag defines the superset of samples included in the DESeq2 model. All samples in your comparisons are automatically included; the cohort flag adds extra samples to serve as background for variance estimation. If unspecified, the cohort is just the samples in your comparisons.
Analysis Levels
--level | Description |
|---|---|
gene | Standard gene-level differential expression. |
transcript | Transcript-level differential expression. |
junction | Junction-level counts. |
delta_psi | Differential splicing — generates a normalized junction usage matrix and tests for changes in relative splice site utilization between conditions. |
Confidence Modes
--confidence_mode | Description |
|---|---|
none | Standard DESeq2 on raw counts. |
append | Standard DESeq2, then annotate results with MAJEC confidence metrics (distinguishability, discord, evidence fractions). Recommended for exploration and filtering. |
variance | Incorporate reliability scores directly into the DESeq2 model by inflating variance for low-confidence features. Down-weights ambiguous genes/transcripts during statistical testing. |
Output
The command creates a directory containing:
_counts_matrix.tsv— count matrix_coldata.tsv— sample metadata for DESeq2_variance_matrix.tsv— variance inflation factors (if--confidence_mode variance)_run_deseq2.R— ready-to-run R script
Run with Rscript _run_deseq2.R.
Step 3: Visualization (majec_visualize)
Generate interactive, multi-panel HTML reports for individual genes.
majec_visualize \
--db my_project.db \
--gene MYH9 \
--group_A_str "cell_line=MCF7;treatment=DrugA" \
--group_B_str "cell_line=MCF7;treatment=Control" \
--output_dir myh9_report \
--export_excel \
--save_svg
Report Contents
- Junction arc plot: all isoforms with arc thickness representing junction usage per condition.
- Differential junction heatmap: delta PSI for every junction across every isoform.
- Subset analysis plots: junction and exonic territory evidence used to resolve subset/superset ambiguity.
- Per-sample penalty heatmaps: raw counts, applied penalties, and final corrected counts per isoform per sample. Combined into a single HTML with keyboard navigation.
- Excel export (
--export_excel): companion spreadsheet with all raw and aggregated data.