Post-Quantification Workflows

March 28, 2026

Step 1: Build the Analysis Database (majec_build_db)

Consolidate output files from one or more pipeline runs into a single, queryable SQLite database. This becomes the input for all downstream tools.

majec_build_db \
    --run_manifests chunk1_run_manifest.json chunk2_run_manifest.json \
    --metadata_file sample_metadata.tsv \
    --output_db my_project.db \
    --force
Flag               Description
--run_manifests    One or more _run_manifest.json files from majec_run_pipeline. Automatically merged and validated.
--metadata_file    Tab-separated sample metadata (recommended). Enables downstream statistical comparisons.
--output_db        Path for the SQLite database file.
--force            Overwrite an existing database.
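When a project was quantified in many chunks, listing every manifest by hand is error-prone. A minimal sketch, assuming all manifests sit under one directory (the name pipeline_runs below is hypothetical) and that their paths contain no whitespace, collects them with find:

```shell
#!/bin/sh
# Sketch: gather every *_run_manifest.json under a directory so a
# multi-chunk project can be passed to majec_build_db in one call.
# Assumes manifest paths contain no whitespace (they are word-split below).

collect_manifests() {
    dir=$1
    # find + sort gives a stable, reproducible order across invocations
    found=$(find "$dir" -type f -name '*_run_manifest.json' | sort)
    if [ -z "$found" ]; then
        echo "no *_run_manifest.json files under $dir" >&2
        return 1
    fi
    printf '%s\n' "$found"
}

# Example invocation (commented out so the sketch runs without MAJEC installed):
# majec_build_db \
#     --run_manifests $(collect_manifests pipeline_runs) \
#     --metadata_file sample_metadata.tsv \
#     --output_db my_project.db \
#     --force
```

Sorting keeps the manifest order stable between runs; whether majec_build_db is sensitive to that order is not stated here.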

Metadata File Format

#experimental_variables: treatment, cell_line
#id_variables: sample_id, rep
sample_id	cell_line	treatment	rep	batch
MCF7_Control_R1	MCF7	Control	1	Day1
MCF7_DrugA_R1	MCF7	DrugA	1	Day1
  • Header lines starting with # define variable types for downstream tools.
  • experimental_variables: factors for comparison (e.g., in DESeq2).
  • id_variables: sample identifiers and replicates.
  • sample_id must match sample names from the pipeline run.
  • Include batch columns if applicable; they enable batch correction via --batch_column in majec_prepare_deseq2.
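Because sample_id must match the pipeline's sample names and the # headers name columns used downstream, it can help to lint the file before building the database. The awk sketch below checks that every declared variable exists as a column and that sample IDs are unique. The header parsing rules are inferred from the example above rather than taken from majec_build_db itself, and check_metadata is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sanity-check a MAJEC-style metadata TSV before building the DB.
# Parsing of the "#name: a, b" directives is an assumption based on the example.

check_metadata() {
    awk -F'\t' '
        # Directive lines such as "#experimental_variables: treatment, cell_line"
        /^#/ {
            line = $0
            if (sub(/^#[^:]*:[ \t]*/, "", line)) {
                sub(/[ \t\r]+$/, "", line)
                n = split(line, vars, "[ \t]*,[ \t]*")
                for (i = 1; i <= n; i++) if (vars[i] != "") declared[vars[i]] = 1
            }
            next
        }
        # First non-comment line: the column header row
        !have_header {
            sub(/\r$/, "")
            for (i = 1; i <= NF; i++) col[$i] = i
            have_header = 1
            if (!("sample_id" in col)) { print "missing sample_id column"; bad = 1 }
            for (v in declared) if (!(v in col)) { print "declared variable not a column: " v; bad = 1 }
            next
        }
        # Data rows: every sample_id must be unique
        ("sample_id" in col) {
            id = $(col["sample_id"])
            if (seen[id]++) { print "duplicate sample_id: " id; bad = 1 }
        }
        END { exit bad }
    ' "$1"
}

# Usage: check_metadata sample_metadata.tsv && echo "metadata looks consistent"
```

The function exits non-zero and prints one line per problem, so it can gate a build script before majec_build_db is called.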

Step 2: DESeq2 Analysis (majec_prepare_deseq2)

Generates a ready-to-run DESeq2 analysis package from the database.

majec_prepare_deseq2 \
    --db my_project.db \
    --comparisons_file comparisons.txt \
    --output_dir deseq2_analysis \
    --level gene \
    --confidence_mode variance

Defining Comparisons

Each line in the comparisons file defines one contrast:

# Format: Name; Case_Criteria; Control_Criteria
MCF7_DrugA_vs_Ctrl; cell_line=MCF7;treatment=DrugA; cell_line=MCF7;treatment=Control
All_Drugs_vs_Ctrl; treatment=DrugA,DrugB; treatment=Control

Criteria use key=value syntax. Multiple values for one key are comma-separated, and a sample matches if it has any of them; multiple keys are semicolon-separated, and a sample must match all of them. You can also select specific samples directly with sample_id=Sample_A01,Sample_A04.
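The examples above imply a two-level syntax. The sketch below makes one reading of it concrete. It assumes (from the examples, not from MAJEC's parser) that a semicolon followed by a space separates Name, Case and Control, while a bare semicolon separates keys inside one criteria string, with keys combined as AND and comma-separated values as OR. split_comparisons and select_samples are hypothetical helpers.

```shell
#!/bin/sh
# Sketch of the comparison/criteria semantics; separator rules are assumptions.

# Print "name<TAB>case<TAB>control" for each non-comment line.
split_comparisons() {
    awk '
        { sub(/[ \t\r]+$/, "") }                  # drop trailing whitespace
        /^[ \t]*#/ || NF == 0 { next }            # skip comments and blank lines
        {
            n = split($0, f, ";[ \t]+")
            if (n != 3) { print "skipping malformed line: " $0 | "cat 1>&2"; next }
            printf "%s\t%s\t%s\n", f[1], f[2], f[3]
        }
    ' "$1"
}

# Print sample_ids from a metadata TSV that satisfy one criteria string.
select_samples() {
    awk -F'\t' -v crit="$2" '
        BEGIN {
            n = split(crit, pairs, ";")
            nk = 0
            for (i = 1; i <= n; i++) {
                p = pairs[i]
                gsub(/^[ \t]+|[ \t]+$/, "", p)
                eq = index(p, "=")
                if (eq == 0) continue
                nk++
                key[nk] = substr(p, 1, eq - 1)
                v = substr(p, eq + 1)
                gsub(/[ \t]*,[ \t]*/, ",", v)
                # Wrap in commas so a value such as Drug does not match inside DrugA
                vals[nk] = "," v ","
            }
        }
        /^#/ { next }
        !have_header { for (i = 1; i <= NF; i++) col[$i] = i; have_header = 1; next }
        {
            ok = 1                                # AND across keys ...
            for (i = 1; i <= nk && ok; i++)       # ... OR across listed values
                if (!(key[i] in col) || index(vals[i], "," $(col[key[i]]) ",") == 0) ok = 0
            if (ok && ("sample_id" in col)) print $(col["sample_id"])
        }
    ' "$1"
}
```

For instance, select_samples sample_metadata.tsv 'cell_line=MCF7;treatment=DrugA' should print MCF7_DrugA_R1 for the metadata rows shown in Step 1.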

Cohort

The --cohort_str or --cohort_file flag defines the superset of samples included in the DESeq2 model. All samples in your comparisons are automatically included; the cohort flag adds extra samples to serve as background for variance estimation. If unspecified, the cohort is just the samples in your comparisons.
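The cohort rule is a set union. A sketch, assuming one sample_id per line in two plain files (an illustrative format; MAJEC itself takes --cohort_str or --cohort_file), shows which samples enter the model and which serve only as background:

```shell
#!/bin/sh
# Sketch of the cohort rule: model samples = comparison samples UNION cohort samples.
# Input files hold one sample_id per line (hypothetical format for illustration).

# Print the full set of samples that would enter the DESeq2 model.
model_samples() {
    comparison_ids=$1
    cohort_ids=${2:-/dev/null}   # no cohort given -> comparison samples only
    sort -u "$comparison_ids" "$cohort_ids"
}

# Print samples that enter the model only as background for variance estimation.
background_only() {
    tmp_a=$(mktemp); tmp_b=$(mktemp)
    sort -u "$1" > "$tmp_a"
    sort -u "${2:-/dev/null}" > "$tmp_b"
    comm -13 "$tmp_a" "$tmp_b"   # lines present only in the cohort list
    rm -f "$tmp_a" "$tmp_b"
}
```

Samples listed in both files are counted once, which matches the statement that comparison samples are included automatically.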

Analysis Levels

--level       Description
gene          Standard gene-level differential expression.
transcript    Transcript-level differential expression.
junction      Differential analysis of junction-level counts.
delta_psi     Differential splicing: generates a normalized junction usage matrix and tests for changes in relative splice site utilization between conditions.
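For delta_psi, relative splice site utilization can be pictured as each junction's share of all reads at a splice site it shares with competing junctions. The awk sketch below computes that share per condition from pooled counts and reports the difference. The input layout, the pooling across replicates, and the NA handling are all assumptions; MAJEC's actual normalization and statistical test are not reproduced.

```shell
#!/bin/sh
# Sketch: junction usage = count / total count of junctions sharing the same site.
# Input (hypothetical TSV, no header): junction_id, site_id, count_A, count_B,
# where count_A and count_B are counts pooled over the samples of each group.

delta_psi() {
    awk -F'\t' -v OFS='\t' '
        # Pass 1: total counts per shared splice site in each condition
        NR == FNR { totA[$2] += $3; totB[$2] += $4; next }
        # Pass 2: usage fraction per junction, then the A minus B difference
        {
            if (totA[$2] == 0 || totB[$2] == 0) { print $1, "NA", "NA", "NA"; next }
            psiA = $3 / totA[$2]
            psiB = $4 / totB[$2]
            printf "%s\t%.3f\t%.3f\t%.3f\n", $1, psiA, psiB, psiA - psiB
        }
    ' "$1" "$1"
}
```

With junctions J1 (30 vs 10 reads) and J2 (10 vs 30 reads) sharing one site, usage flips from 0.750 to 0.250 for J1, a delta of 0.500, even though total reads at the site are unchanged. This is why usage-based testing can detect splicing changes that count-based testing may miss.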

Confidence Modes

--confidence_mode    Description
none                 Standard DESeq2 on raw counts.
append               Standard DESeq2, then annotate results with MAJEC confidence metrics (distinguishability, discord, evidence fractions). Recommended for exploration and filtering.
variance             Incorporate reliability scores directly into the DESeq2 model by inflating variance for low-confidence features. Down-weights ambiguous genes/transcripts during statistical testing.
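How MAJEC injects the inflation into DESeq2 is not described here. The following only illustrates why variance inflation down-weights a feature, under the assumption that it acts like a multiplicative factor f_i >= 1 applied uniformly to the variance of feature i's counts. In a GLM such as DESeq2's, the coefficient covariance is (X^T W X)^{-1} with weights inversely proportional to count variance, so scaling every variance by f_i leaves the estimate unchanged but scales its covariance by f_i:

```latex
\operatorname{Var}'(y_{is}) = f_i \operatorname{Var}(y_{is}) \quad (f_i \ge 1)
\;\Longrightarrow\;
\operatorname{SE}'(\hat\beta_i) = \sqrt{f_i}\,\operatorname{SE}(\hat\beta_i),
\qquad
z_i' = \frac{\hat\beta_i}{\operatorname{SE}'(\hat\beta_i)} = \frac{z_i}{\sqrt{f_i}}
```

For example, f_i = 4 halves the Wald statistic z_i, so a low-confidence feature needs a larger fold change to reach the same significance.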

Output

The command creates a directory containing:

  • _counts_matrix.tsv — count matrix
  • _coldata.tsv — sample metadata for DESeq2
  • _variance_matrix.tsv — variance inflation factors (if --confidence_mode variance)
  • _run_deseq2.R — ready-to-run R script

Run with Rscript _run_deseq2.R.
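Before launching R it can be worth confirming the package is complete. A sketch, assuming the files share some prefix before the suffixes listed above (the prefix is not spelled out here, so the hypothetical helpers match on suffix only) and that the R script is meant to run from inside the output directory:

```shell
#!/bin/sh
# Sketch: verify a majec_prepare_deseq2 output directory, then run its R script.
# The variance matrix is not required because it only exists in variance mode.

check_deseq2_package() {
    dir=$1
    status=0
    for suffix in _counts_matrix.tsv _coldata.tsv _run_deseq2.R; do
        # ls fails (non-zero) when the glob matches nothing
        if ! ls "$dir"/*"$suffix" >/dev/null 2>&1; then
            echo "missing *$suffix in $dir" >&2
            status=1
        fi
    done
    return $status
}

run_deseq2_package() {
    check_deseq2_package "$1" || return 1
    # Run from inside the directory in case the script uses relative paths (assumption)
    (cd "$1" && Rscript ./*_run_deseq2.R)
}

# Usage (requires R with DESeq2 installed):
# run_deseq2_package deseq2_analysis
```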


Step 3: Visualization (majec_visualize)

Generate interactive, multi-panel HTML reports for individual genes.

majec_visualize \
    --db my_project.db \
    --gene MYH9 \
    --group_A_str "cell_line=MCF7;treatment=DrugA" \
    --group_B_str "cell_line=MCF7;treatment=Control" \
    --output_dir myh9_report \
    --export_excel \
    --save_svg
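To produce reports for several genes against the same contrast, the command above can be wrapped in a loop. This sketch uses only the flags shown; the helper name, the gene list, and the lowercase <gene>_report directory naming are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: one majec_visualize report per gene for a fixed A-vs-B contrast.
# Arguments: database, group A criteria, group B criteria, then gene names.

visualize_genes() {
    db=$1; group_a=$2; group_b=$3; shift 3
    for gene in "$@"; do
        # e.g. MYH9 -> myh9_report (naming convention chosen for illustration)
        outdir=$(printf '%s_report' "$gene" | tr '[:upper:]' '[:lower:]')
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'majec_visualize --db %s --gene %s --group_A_str "%s" --group_B_str "%s" --output_dir %s\n' \
                "$db" "$gene" "$group_a" "$group_b" "$outdir"
        else
            majec_visualize \
                --db "$db" \
                --gene "$gene" \
                --group_A_str "$group_a" \
                --group_B_str "$group_b" \
                --output_dir "$outdir" \
                --export_excel \
                --save_svg
        fi
    done
}

# Example:
# visualize_genes my_project.db "cell_line=MCF7;treatment=DrugA" "cell_line=MCF7;treatment=Control" MYH9 ACTB
```

Quoting the criteria strings matters because they contain semicolons, which the shell would otherwise treat as command separators.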

Report Contents

  • Junction arc plot: all isoforms with arc thickness representing junction usage per condition.
  • Differential junction heatmap: delta PSI for every junction across every isoform.
  • Subset analysis plots: junction and exonic territory evidence used to resolve subset/superset ambiguity.
  • Per-sample penalty heatmaps: raw counts, applied penalties, and final corrected counts per isoform per sample. Combined into a single HTML file with keyboard navigation.
  • Excel export (--export_excel): companion spreadsheet with all raw and aggregated data.