Modkit

June 8, 2026

A bioinformatics tool for working with modified bases from Oxford Nanopore sequencing. Its primary purpose is to quickly convert modBAM files to bedMethyl files using best practices, but it can also manipulate modBAM files and generate summary statistics. Detailed documentation and a quick-start guide can be found in the online documentation.

Installation

Pre-compiled binaries are provided for Linux from the release page. We recommend the use of these in most circumstances.

Building from source

The provided packages should be used where possible, but we understand that some users may wish to compile the software from its source code. To build modkit from source, use cargo:

git clone https://github.com/nanoporetech/modkit.git
cd modkit
cargo install --path modkit
# or
cargo install --git https://github.com/nanoporetech/modkit.git

macOS (Apple Silicon) with acceleration for open-chromatin predict

A script is provided to compile modkit on Apple Silicon Macs with Metal GPU (MPS) acceleration. This is only necessary if you plan to use modkit open-chromatin predict; otherwise a normal cargo installation is recommended. Download the script and run it with the desired installation directory, modkit version, and Python provider (system, conda, pyenv, or uv).

bash mac_compile_modkit.sh ~/tools

This installs all dependencies (Homebrew, Rust, PyTorch) and compiles modkit automatically. See the macOS installation guide for full details, Python version control options, and troubleshooting.

Usage

Modkit comprises a suite of tools for manipulating modified-base data stored in BAM files. Modified base information is stored in the MM and ML tags (see section 1.7 of the SAM tags specification). These tags are produced by contemporary basecallers of data from Oxford Nanopore Technologies sequencing platforms.
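To make the tag layout concrete, the sketch below decodes one MM entry and its ML values by hand, following the encoding described in the SAM tags specification. The function name `decode_mm_entry` and the example read are hypothetical. The sketch assumes a read in its original sequenced orientation and a single modification code per entry. It illustrates the encoding only and is not how modkit itself parses tags.

```python
def decode_mm_entry(seq, mm_entry, ml_values):
    """Decode one MM entry (e.g. "C+m,1,1;") into (read position, probability) pairs."""
    header, *deltas = mm_entry.rstrip(";").split(",")
    header = header.rstrip("?.")  # drop the optional mode flag, which this sketch ignores
    base, strand, code = header[0], header[1], header[2:]
    if strand != "+" or len(code) != 1:
        raise ValueError("this sketch handles only single-code, '+' strand entries")
    # Indices in the read of every occurrence of the canonical base.
    candidates = [i for i, b in enumerate(seq) if b == base]
    calls, idx = [], -1
    for delta, ml in zip(deltas, ml_values):
        idx += int(delta) + 1  # skip `delta` occurrences, then take the next one
        # An ML value N covers probabilities in [N/256, (N+1)/256); report the lower bound.
        calls.append((candidates[idx], ml / 256))
    return calls


# Hypothetical read with cytosines at positions 1, 4, 5 and 8.
calls = decode_mm_entry("ACGTCCGACG", "C+m,1,1;", [200, 26])
print(calls)
```

Each number in the MM entry counts how many occurrences of the canonical base to skip before the next call, which is why the decoder walks the list of cytosine positions rather than raw read coordinates.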

Constructing bedMethyl tables

A primary use of modkit is to create summary counts of modified and unmodified bases in an extended bedMethyl format. bedMethyl files tabulate the counts of base modifications from every sequencing read over each reference genomic position.

For best performance use the --modified-bases option with the base modifications you intend to analyze.

For example:

# --log is optional but recommended; --bgzf is optional
modkit pileup \
  path/to/reads.bam \
  path/to/output.bed.gz \
  --modified-bases 5mC 5hmC \
  --reference path/to/reference.fasta \
  --log path/to/log.txt \
  --bgzf

A single file (described below) with base count summaries will be created. The --log option specifies an optional log file output.

The program performs best-practices filtering and manipulation of the raw data stored in the input file. For further details see filtering modified-base calls.
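The column definitions later in this document note that the pass threshold can be computed from the data, usually failing the lowest 10th percentile of calls. The sketch below shows that idea on a hypothetical list of call confidences. The function name `estimate_threshold`, the nearest-rank percentile rule, and the sample values are assumptions for illustration; modkit's own sampling and estimation procedure may differ.

```python
def estimate_threshold(confidences, percentile=0.10):
    """Return the confidence at the given percentile (simple nearest-rank rule).

    Assumes a non-empty list of confidences.
    """
    ordered = sorted(confidences)
    index = min(int(len(ordered) * percentile), len(ordered) - 1)
    return ordered[index]


# Hypothetical per-call confidences (probability of the most likely class).
confidences = [0.51, 0.62, 0.70, 0.81, 0.88, 0.90, 0.93, 0.95, 0.97, 0.99]

threshold = estimate_threshold(confidences)
passing = [c for c in confidences if c >= threshold]
n_fail = len(confidences) - len(passing)
print(threshold, n_fail)
```

Calls below the threshold are the ones counted as failed rather than as modified or canonical.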

For user convenience the counting process can be modulated using several additional transforms and filters. The most basic of these is to report only counts from reference CpG dinucleotides. This option requires a reference sequence in order to locate the CpGs in the reference:

modkit pileup path/to/reads.bam output/path/pileup.bed \
  --cpg \
  --modified-bases 5mC 5hmC \
  --ref path/to/reference.fasta
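The reference is needed because CpG sites are located by scanning the reference sequence itself. A minimal sketch of that lookup is shown below. The function name `cpg_positions` and the toy sequence are hypothetical, coordinates are 0-based, and this is not modkit's implementation.

```python
def cpg_positions(reference):
    """Return the 0-based cytosine position on each strand for every CG in the reference."""
    seq = reference.upper()
    sites = []
    start = seq.find("CG")
    while start != -1:
        # On the + strand the cytosine is at `start`; on the - strand the
        # cytosine pairs with the G, one base downstream.
        sites.append({"+": start, "-": start + 1})
        start = seq.find("CG", start + 1)
    return sites


sites = cpg_positions("ACGTCGG")
print(sites)
```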

For more details on pileup and other commands, please see the online documentation.

For more information on the individual options see the Advanced Usage help document.

Advanced usage examples

For complete usage instructions please see the command-line help of the program or the Advanced usage help documentation. Some more commonly required examples are provided below.

To combine multiple base modification calls into one, for example to combine basecalls for both 5hmC and 5mC into a count for "all cytosine modifications" (with code C), the --combine-mods option can be used:

# optionally add --cpg and/or --combine-strands
modkit pileup \
  path/to/reads.bam \
  output/path/pileup.bed \
  --modified-bases 5mC 5hmC \
  --combine-mods \
  --ref path/to/reference.fasta

If you have a modBAM with phased reads containing an HP tag, these can be partitioned into separate bedMethyl files on output by passing the --phased flag.

modkit pileup \
  path/to/reads.bam \
  output/directory/ \
  --cpg \
  --modified-bases 5mC 5hmC \
  --phased \
  --ref path/to/reference.fasta

The output will be 3 files: hp1.bedmethyl, hp2.bedmethyl, and combined.bedmethyl. hp1.bedmethyl and hp2.bedmethyl contain counts for records with HP=1 and HP=2 tags, respectively. combined.bedmethyl contains counts for all modBAM records.
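The partitioning described above can be pictured as follows. This is a sketch over hypothetical in-memory records (dicts with a `name` and an optional `HP` value), not modkit's code; the function name `partition_by_haplotype` is an assumption, and it only models HP values 1 and 2.

```python
from collections import defaultdict


def partition_by_haplotype(records):
    """Group record names into 'combined', 'hp1' and 'hp2' according to their HP tag."""
    groups = defaultdict(list)
    for record in records:
        # Every record contributes to the combined output.
        groups["combined"].append(record["name"])
        hp = record.get("HP")
        if hp in (1, 2):
            groups[f"hp{hp}"].append(record["name"])
    return dict(groups)


# Hypothetical records; read3 has no HP tag and appears only in 'combined'.
records = [
    {"name": "read1", "HP": 1},
    {"name": "read2", "HP": 2},
    {"name": "read3"},
]
groups = partition_by_haplotype(records)
print(groups)
```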

Description of bedMethyl output

Below is a description of the bedMethyl columns generated by modkit pileup. A brief description of the bedMethyl specification can be found on Encode.

Definitions:

  • Nmod - Number of calls passing filters that were classified as a residue with a specified base modification.
  • Ncanonical - Number of calls passing filters that were classified as the canonical base rather than modified. The exact base must be inferred from the modification code. For example, if the modification code is m (5mC) then the canonical base is cytosine. If the modification code is a, the canonical base is adenosine.
  • Nother_mod - Number of calls passing filters that were classified as modified, but where the modification is different from the listed base (and the corresponding canonical base is equal). For example, for a given cytosine there may be 3 reads with h calls, 1 with a canonical call, and 2 with m calls. In the bedMethyl row for h, Nother_mod would be 2. In the m row, Nother_mod would be 3.
  • Nvalid_cov - The valid coverage. Nvalid_cov = Nmod + Nother_mod + Ncanonical, also used as the score in the bedMethyl.
  • Ndiff - Number of reads with a base other than the canonical base for this modification. For example, in a row for h the canonical base is cytosine; if there are 2 reads with C->A substitutions, Ndiff will be 2.
  • Ndelete - Number of reads with a deletion at this reference position.
  • Nfail - Number of calls where the probability of the call was below the threshold. The threshold can be set on the command line or computed from the data (usually failing the lowest 10th percentile of calls).
  • Nnocall - Number of reads aligned to this reference position, with the correct canonical base, but without a base modification call. This can happen, for example, if the model requires a CpG dinucleotide and the read has a CG->CH substitution such that no modification call was produced by the basecaller.
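The worked example in the definitions (3 reads with h calls, 1 canonical call, and 2 with m calls at one cytosine) can be checked with a few lines of code. The function name `bedmethyl_counts` and the use of `-` to mark a canonical call are choices made for this sketch. It covers only calls that pass filters, so Nfail, Ndiff, Ndelete and Nnocall are not modelled.

```python
from collections import Counter


def bedmethyl_counts(calls, mod_code):
    """Compute the count columns for one bedMethyl row from passing calls at one position."""
    tally = Counter(calls)
    n_mod = tally[mod_code]
    n_canonical = tally["-"]
    # Any other modification code on the same canonical base counts as "other mod".
    n_other_mod = sum(n for code, n in tally.items() if code not in (mod_code, "-"))
    n_valid_cov = n_mod + n_other_mod + n_canonical
    return {
        "n_mod": n_mod,
        "n_canonical": n_canonical,
        "n_other_mod": n_other_mod,
        "n_valid_cov": n_valid_cov,
        "fraction_modified": n_mod / n_valid_cov,
    }


calls = ["h", "h", "h", "-", "m", "m"]
h_row = bedmethyl_counts(calls, "h")
m_row = bedmethyl_counts(calls, "m")
print(h_row)
print(m_row)
```

Both rows share the same valid coverage of 6, while Nmod and Nother_mod swap roles between the h and m rows.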

bedMethyl column descriptions

| column | name | description | type |
|--------|------|-------------|------|
| 1 | chrom | name of reference sequence from BAM header | str |
| 2 | start position | 0-based start position | int |
| 3 | end position | 0-based exclusive end position | int |
| 4 | modified base code | single letter code for modified base | str |
| 5 | score | Equal to Nvalid_cov. | int |
| 6 | strand | '+' for positive strand, '-' for negative strand, '.' when strands are combined | str |
| 7 | start position | included for compatibility | int |
| 8 | end position | included for compatibility | int |
| 9 | color | included for compatibility, always 255,0,0 | str |
| 10 | Nvalid_cov | See definitions above. | int |
| 11 | fraction modified | Nmod / Nvalid_cov | float |
| 12 | Nmod | See definitions above. | int |
| 13 | Ncanonical | See definitions above. | int |
| 14 | Nother_mod | See definitions above. | int |
| 15 | Ndelete | See definitions above. | int |
| 16 | Nfail | See definitions above. | int |
| 17 | Ndiff | See definitions above. | int |
| 18 | Nnocall | See definitions above. | int |
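For downstream analysis, a row can be read into named fields following the 18 columns listed above. The sketch below is one way to do that. The names `BedMethylRow` and `parse_bedmethyl_line` and the example line are hypothetical, and it assumes whitespace-separated fields. Column 11 is read as-is.

```python
from typing import NamedTuple


class BedMethylRow(NamedTuple):
    chrom: str
    start: int
    end: int
    mod_code: str
    score: int
    strand: str
    n_valid_cov: int
    fraction_modified: float
    n_mod: int
    n_canonical: int
    n_other_mod: int
    n_delete: int
    n_fail: int
    n_diff: int
    n_nocall: int


def parse_bedmethyl_line(line):
    """Parse one bedMethyl line; columns 7-9 exist only for compatibility and are skipped."""
    f = line.split()
    if len(f) != 18:
        raise ValueError(f"expected 18 columns, found {len(f)}")
    return BedMethylRow(
        f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
        int(f[9]), float(f[10]), *map(int, f[11:18]),
    )


# Hypothetical row for illustration only.
line = "chr1\t100\t101\th\t6\t+\t100\t101\t255,0,0\t6\t0.5\t3\t1\t2\t0\t1\t0\t0"
row = parse_bedmethyl_line(line)

# Sanity checks that follow from the definitions above.
assert row.score == row.n_valid_cov
assert row.n_valid_cov == row.n_mod + row.n_other_mod + row.n_canonical
print(row)
```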

Description of columns in modkit summary:

Totals table

The lines of the totals table are prefixed with a # character.

| row | name | description | type |
|-----|------|-------------|------|
| 1 | bases | comma-separated list of canonical bases with modification calls. | str |
| 2 | total_reads_used | total number of reads from which base modification calls were extracted | int |
| 3+ | count_reads_{base} | total number of reads that contained base modifications for {base} | int |
| 4+ | filter_threshold_{base} | filter threshold used for {base} | float |

Modification calls table

The modification calls table follows immediately after the totals table.

| column | name | description | type |
|--------|------|-------------|------|
| 1 | base | canonical base with modification call | char |
| 2 | code | base modification code, or - for canonical | char |
| 3 | pass_count | total number of passing (confidence >= threshold) calls for the modification in column 2 | int |
| 4 | pass_frac | fraction of passing (>= threshold) calls for the modification in column 2 | float |
| 5 | all_count | total number of calls for the modification code in column 2 | int |
| 6 | all_frac | fraction of all calls for the modification in column 2 | float |
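The two tables can be separated when reading the summary output by checking for the # prefix. The sketch below is a minimal parser under some assumptions: fields are whitespace-separated, a header row starting with `base` may or may not be present, and the sample text is invented for illustration rather than taken from a real run. The function name `parse_summary` is hypothetical.

```python
def parse_summary(text):
    """Split modkit summary text into a totals dict and a list of modification-call rows."""
    totals, calls = {}, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("#"):
            # Totals line: "# name value"
            body = line.strip().lstrip("#").split(None, 1)
            totals[body[0]] = body[1].strip() if len(body) > 1 else ""
        elif fields[0] == "base":
            continue  # skip a header row, if one is present (assumption)
        else:
            base, code, pass_count, pass_frac, all_count, all_frac = fields
            calls.append({
                "base": base,
                "code": code,
                "pass_count": int(pass_count),
                "pass_frac": float(pass_frac),
                "all_count": int(all_count),
                "all_frac": float(all_frac),
            })
    return totals, calls


# Invented sample text for illustration.
sample = """\
# bases C
# total_reads_used 100
# count_reads_C 100
# filter_threshold_C 0.75
C - 900 0.9 1000 0.8
C m 100 0.1 250 0.2
"""
totals, calls = parse_summary(sample)
print(totals)
print(calls)
```

Totals values are kept as strings because their types differ by row (list, int, float); convert them as needed.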

Licence and Copyright

(c) 2023 Oxford Nanopore Technologies Plc.

Modkit is distributed under the terms of the Oxford Nanopore Technologies, Ltd. Public License, v. 1.0. If a copy of the License was not distributed with this file, You can obtain one at http://nanoporetech.com