K2Rmini: filter a set of reads using k-mers
May 13, 2026
K2Rmini (or K-mer to Reads mini) is a tool to filter the reads contained in a FASTA/Q file based on a set of k-mers of interest.
Under the hood, it uses simd-minimizers to quickly prefilter reads based on their minimizers, and filters the remaining candidates using the k-mer set. On an Apple M1, K2Rmini is able to filter long reads at ~2 Gbp/s.
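The two-stage idea (cheap minimizer prefilter, then exact k-mer check) can be illustrated with a minimal Python sketch. This is not the K2Rmini implementation: it assumes lexicographic minimizers over forward-strand k-mers only, uses plain Python sets instead of simd-minimizers and hashes, and every function name below is hypothetical.

```python
def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minimizer(kmer, m):
    """Lexicographically smallest m-mer of a k-mer (assumed ordering)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def build_index(patterns, k, m):
    """Collect the k-mers of interest and the minimizers they contain."""
    kmer_set = {km for p in patterns for km in kmers(p, k)}
    mini_set = {minimizer(km, m) for km in kmer_set}
    return kmer_set, mini_set


def keep_read(read, kmer_set, mini_set, k, m, min_hits):
    """Two-stage filter: minimizer bound first, exact k-mer count second."""
    read_kmers = list(kmers(read, k))
    # Stage 1: a k-mer can only be in kmer_set if its minimizer is in
    # mini_set, so this count is an upper bound on the matching k-mers.
    upper_bound = sum(minimizer(km, m) in mini_set for km in read_kmers)
    if upper_bound < min_hits:
        return False
    # Stage 2: exact verification against the k-mer set.
    hits = sum(km in kmer_set for km in read_kmers)
    return hits >= min_hits
```

The prefilter never rejects a read that the exact check would accept, because the minimizer count can only overestimate the number of matching k-mers.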
Installation
If you have not installed Rust yet, please visit rustup.rs to install it.
git clone https://github.com/Malfoy/K2Rmini.git
cd K2Rmini
RUSTFLAGS="-C target-cpu=native" cargo install --path .
This will compile the K2Rmini and k2rminimulti binaries and add them to your path.
Usage
Usage: K2Rmini [OPTIONS] -p <PATTERNS> <FILE>
Arguments:
<FILE> FASTA/Q file to filter (possibly compressed)
Options:
-p <PATTERNS> FASTA/Q file containing k-mers of interest (possibly compressed)
-t, --threshold <THRESHOLD> K-mer threshold, either relative (float) or absolute (int) [default: 0.5]
-o <OUTPUT> Output file for filtered sequences [default: stdout]
-k <K> K-mer size [default: 31]
-m <M> Minimizer size, must be ≤ k, up to 29 [default: 21]
-T, --threads <THREADS> Number of threads [default: all]
-h, --help Print help
-V, --version Print version
K2Rmini has 3 main arguments:
- a FASTA/Q file containing the sequences that you want to filter; this file can be compressed using gzip/xz/zstd
- a FASTA/Q file (flagged with -p) containing the k-mers of interest used for filtering: sequences containing enough of these k-mers will be output, while others will be discarded
- a selection threshold (flagged with -t): a sequence is discarded if its number of desired k-mers is below this threshold; the threshold can be relative (e.g. at least 90% of the sequence's k-mers are desired k-mers) or absolute (e.g. at least 2 desired k-mers)
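To make the relative/absolute distinction concrete, here is a small Python sketch of how a threshold string could be interpreted. It is an illustration only: the function names are hypothetical, and both the "try integer first" parsing and the rounding up of relative thresholds are assumptions, not documented K2Rmini behavior.

```python
import math


def parse_threshold(text):
    """Return ("absolute", int) or ("relative", float) from a -t value."""
    try:
        return ("absolute", int(text))
    except ValueError:
        value = float(text)
        if not 0.0 < value <= 1.0:
            raise ValueError("relative threshold must be in (0, 1]")
        return ("relative", value)


def required_hits(threshold, read_length, k):
    """Number of desired k-mers a read needs in order to be kept."""
    kind, value = threshold
    if kind == "absolute":
        return value
    # A read of length L contains L - k + 1 k-mers (0 if shorter than k).
    n_kmers = max(read_length - k + 1, 0)
    return math.ceil(value * n_kmers)  # rounding up is an assumption
```

With k = 31, a 100 bp read has 70 k-mers, so under these assumptions a threshold of 0.5 requires 35 matching k-mers while a threshold of 2 requires 2 regardless of read length.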
It also provides options to write the output to a file (-o), set the k-mer size (-k) or set the number of threads (-T).
You shouldn't need to change the minimizer size (-m), except if k is smaller than 25.
Example: selecting reads with ≥90% of desired k-mers
Let's say we want to filter the reads in reads.fa to only keep those that share at least 90% of their k-mers with the reference in reference.fa. This can be achieved with:
K2Rmini -p reference.fa -t 0.9 reads.fa
Example: selecting reads with ≥2 desired k-mers
Let's say this time we have a list of k-mers of size 63 stored in patterns.fa and we want to select the reads in reads.fa that contain at least two of them. This can be achieved with:
K2Rmini -p patterns.fa -k 63 -t 2 reads.fa
K2Rminimulti
k2rminimulti filters reads using several query files at once. It is useful when a read must share enough k-mers with several independent query sets, for example at least X k-mers with Q1.fa and at least Y k-mers with Q2.fa.
A read is kept only if it satisfies every constraint. In other words, constraints are combined with AND semantics.
Usage: k2rminimulti [OPTIONS] --constraint <PATTERNS> <THRESHOLD> <FILE>
Arguments:
<FILE> FASTA/Q file to filter (possibly compressed)
Options:
-c, --constraint <PATTERNS> <THRESHOLD> FASTA/Q file containing k-mers of interest and its threshold; may be repeated
-o <OUTPUT> Output file for filtered sequences [default: stdout]
-k <K> K-mer size [default: 31]
-m <M> Minimizer size, must be <= k, up to 29 [default: 21]
-T, --threads <THREADS> Number of threads [default: all]
-h, --help Print help
-V, --version Print version
k2rminimulti has 2 main arguments:
- a FASTA/Q file containing the sequences that you want to filter; this file can be compressed using gzip/xz/zstd
- one or more constraints, each flagged with -c/--constraint, made of a FASTA/Q query file and the threshold associated with that query
The threshold syntax is the same as for K2Rmini: an integer is interpreted as an absolute number of shared k-mers, while a float in (0, 1] is interpreted as a fraction of the read's k-mers.
Example: selecting reads matching two query files
Let's say we want to keep reads from reads.fa only when they share at least 10 k-mers with Q1.fa and at least 5 k-mers with Q2.fa:
k2rminimulti -c Q1.fa 10 -c Q2.fa 5 reads.fa
This is equivalent to the logical condition:
shared_kmers(read, Q1.fa) >= 10 AND shared_kmers(read, Q2.fa) >= 5
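The same condition can be written as a short Python sketch. It assumes each query has already been loaded into a set of k-mer strings and only handles absolute thresholds; shared_kmers and satisfies_all are hypothetical helpers named after the expression above, not functions exposed by k2rminimulti.

```python
def shared_kmers(read, query_kmers, k):
    """Count the k-mers of read that appear in query_kmers."""
    return sum(read[i:i + k] in query_kmers
               for i in range(len(read) - k + 1))


def satisfies_all(read, constraints, k):
    """constraints is a list of (query_kmers, min_shared) pairs.

    The read is kept only if every pair is satisfied (AND semantics).
    """
    return all(shared_kmers(read, query, k) >= min_shared
               for query, min_shared in constraints)
```

A read that meets the first constraint but misses the second is rejected, since all() stops at the first failing constraint.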
Example: mixing absolute and relative thresholds
k2rminimulti -c Q1.fa 10 -c Q2.fa 5 -c Q3.fa 0.25 reads.fa
This keeps reads sharing at least 10 k-mers with Q1.fa, at least 5 with Q2.fa, and at least 25% of their k-mers with Q3.fa. The constraints are combined with AND semantics.
Implementation notes
k2rminimulti indexes all query files into shared maps:
- the k-mer map stores a 32-bit k-mer hash as key and a 32-bit query presence mask as value
- the minimizer map stores a 32-bit minimizer key and a 32-bit query presence mask as value
Each bit in the presence mask corresponds to one query file, so k2rminimulti supports up to 32 query files. During filtering, a read is first prefiltered using minimizers, then candidate reads are checked using their k-mer hashes.
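The presence-mask idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it keys the map by the k-mer string itself rather than by a 32-bit hash, omits the minimizer map, and uses hypothetical function names.

```python
MAX_QUERIES = 32  # one bit per query file in a 32-bit mask


def build_mask_index(queries, k):
    """Map each k-mer to a bitmask; bit i is set if query i contains it.

    queries is a list of query files, each given as a list of sequences.
    """
    if len(queries) > MAX_QUERIES:
        raise ValueError("at most 32 query files are supported")
    index = {}
    for bit, sequences in enumerate(queries):
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                index[kmer] = index.get(kmer, 0) | (1 << bit)
    return index


def count_hits_per_query(read, index, n_queries, k):
    """One map lookup per k-mer updates the counters of every query."""
    counts = [0] * n_queries
    for i in range(len(read) - k + 1):
        mask = index.get(read[i:i + k], 0)
        bit = 0
        while mask:
            if mask & 1:
                counts[bit] += 1
            mask >>= 1
            bit += 1
    return counts
```

Sharing one map across queries means each k-mer of a read is looked up once, whatever the number of query files; the per-query counts can then be compared against the per-query thresholds.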
Because both k-mers and minimizers are represented with 32-bit keys, hash/key collisions can create false positives. This matches the current 32-bit k-mer hash behavior of K2Rmini, while also applying 32-bit keys to minimizers in k2rminimulti.
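As a rough guide to how often such collisions occur, the sketch below computes an upper bound under the assumption that the hash spreads keys uniformly over the 32-bit space (an idealization; the actual hash function is not described here). The function names are hypothetical.

```python
def false_positive_rate(n_keys, bits=32):
    """Upper bound on the chance that an absent k-mer collides with a key.

    n_keys stored keys occupy at most n_keys of the 2**bits possible values.
    """
    return min(n_keys / 2 ** bits, 1.0)


def expected_spurious_hits(n_keys, n_absent_kmers, bits=32):
    """Expected number of falsely matching k-mers in one read."""
    return n_absent_kmers * false_positive_rate(n_keys, bits)
```

Under this assumption, an index holding one million keys gives each unrelated k-mer a collision chance of about 0.02%, so a small absolute threshold on very long reads is the setting most exposed to false positives.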
Benchmarks
Benchmarks and plots against other sequence filtering tools are available in the experiments repository.
Citation
Accelerating k-mer based sequence filtering. I. Martayan, L. Vandamme, B. Constantinides, B. Cazaux, C. Paperman and A. Limasset. https://doi.org/10.1101/2025.06.16.659853