tsvkit

April 8, 2026 · View on GitHub

tsvkit is a fast, ergonomic toolkit for working with TSV tables. Written in Rust, it brings familiar data-wrangling verbs (join, cut, filter, mutate, summarize, reshape, slice, pretty-print) to the command line with consistent column selection, rich expressions, and streaming-friendly performance. The CLI is inspired by projects such as csvtk, csvkit, datamash, awk, xsv, and mlr, and many options are intentionally compatible with csvtk so existing users can adapt quickly.

Table of Contents

Overview

tsvkit combines versatile column selection with an expression engine for statistics, filtering, and data transformation. This makes it straightforward to join multiple files and select column from each file to generate data matrix (e.g. gene count table), filter row based on selected columns, compute multi-column summaries, or pipe TSV/Excel data through complex workflows without leaving the shell. Multi-sheet Excel workbooks are supported alongside .tsv, .tsv.gz, and .tsv.xz files.

Key features

  • Stream-friendly processing; every command reads from files or standard input and writes to standard output.
  • Column selectors that accept names, 1-based indices, ranges, regexes, and multi-file specifications.
  • Expression language with arithmetic, comparisons, logical operators, list membership, regex matching, and numeric helper functions.
  • Aggregations for grouped summaries (summarize) and row-wise calculations (mutate).
  • Excel tooling to inspect, preview, export, and assemble .xlsx workbooks.

Installation

cargo build --release
# binary available at target/release/tsvkit

Run any command with --help to see detailed usage and concrete examples:

tsvkit --help
tsvkit join --help

Sample data

Curated example tables live under examples/ and power the walkthroughs below.

FileDescription
abundance.tsvMock metagenomic counts with taxonomic kingdoms for quick pivot/stack demos.
cytokines.tsvWide cytokine panel with one row per sample.
expression.tsvLong-form gene expression measurements (sample_id, gene, expression).
metadata.tsvMinimal ID → group lookup for join/filter illustrations.
profiles.tsvWide expression profiles suitable for melt/pivot walkthroughs.
qc.tsvSequencing QC metrics (reads, mapped percentage, duplication rate).
samples.tsvSample-level metrics (purity, yields, contamination) for six RNA-seq libraries.
scores.tsvTidy student scores table for summarize/group-by coverage.
subjects.tsvSubject demographics linked to samples via subject_id.
bioinfo_example.xlsxTwo-sheet workbook (Samples, Cytokines) built from the TSVs above for the Excel tooling.

Quick start pipeline

Join sample and subject metadata, derive a total cytokine score, filter high-purity case samples, and pretty-print the result:

cat examples/cytokines.tsv \
  | tsvkit mutate -e 'total=sum($IL6:$IL10)' -e 'log_total=log2($total)' \
  | tsvkit join -f 'sample_id;sample_id' -k 0 - examples/samples.tsv \
  | tsvkit filter -e '$group == "case" & $purity >= 0.94' \
  | tsvkit cut -f 'sample_id:IL10,log_total:purity' \
  | tsvkit pretty

Tip: wrap every -e expression in single quotes so your shell keeps $column selectors intact. Inside an expression, always prefix column references with $ (e.g. $total, $1).

Output

+-----------+-----+-----+------+------+-----------+------------+-------+-----------+--------+
| sample_id | IL6 | TNF | IFNG | IL10 | log_total | subject_id | group | timepoint | purity |
+-----------+-----+-----+------+------+-----------+------------+-------+-----------+--------+
| S01       | 4.2 | 3.1 | 6.8  | 2.4  | 4.044394  | P001       | case  | baseline  | 0.94   |
| S03       | 4.9 | 3.6 | 7.4  | 2.6  | 4.209453  | P001       | case  | week4     | 0.96   |
+-----------+-----+-----+------+------+-----------+------------+-------+-----------+--------+

The same pipeline works if the cytokine table is compressed (examples/cytokines.tsv.gz).

Command overview

The list below provides a one-line description of every tsvkit subcommand. Each item links to the detailed section later in this guide.

  • info — inspect table shape, inferred column types, and sample values.
  • cut — select or reorder columns via names, indices, or ranges.
  • filter — keep rows matching an expression (math, logic, regex, functions).
  • join — join multiple TSVs on key columns with parallel loading and fill values.
  • mutate — create or overwrite columns using expressions, aggregates, and string helpers.
  • summarize — group rows and compute aggregates (mean, median, quantiles, etc.).
  • sort — sort rows by one or more keys with numeric/text modifiers.
  • melt — convert wide tables into tidy long form with variable/value pairs.
  • pivot — convert long form back to wide with optional fill value for missing cells.
  • slice — extract rows by 1-based indices or ranges.
  • head — preview the first rows as a boxed table (single or multiple files).
  • pretty — render aligned, boxed tables for quick inspection or sharing.
  • transpose — transpose rows and columns.
  • excel — inspect, preview, export, or build .xlsx workbooks.
  • csv — convert delimited text to TSV with custom separators.

Core concepts

These conventions appear across the toolkit; understanding them once makes each subcommand predictable.

Column selectors

Selectors are reused in cut, filter, join, mutate, summarize, and others.

PatternMeaningExample
nameColumn by header name.sample_id,purity
index1-based column index.1,4,9
-indexColumn counted from the end (1 = last).-1,-2
start:endInclusive range by name or index. Supports open ends.IL6:IL10, 2:5, :IL10, IL6:
~"regex"Columns whose names match the regular expression. Requires headers.~"^sample_"
:Select every column in order.-f ':'
mixedCombine names, indices, ranges, and regexes.sample_id,3:5,~"_pct$"
multi-fileSeparate selectors for each input with semicolons (primarily join).sample_id;subject_id
range in expressionsPrefixed with $ to access a slice of values.$IL6:$IL10

Wrap selectors in backticks or braces to treat punctuation literally. For example, -f '`IL6:IL10`,`total,reads`' or -f '{IL6:IL10},{total,reads}' selects columns named IL6:IL10 and total,reads instead of expanding a range or splitting on the comma.

Negative indices are also valid inside ranges: :-2 selects every column except the final two, while -3: keeps the last three columns. Regex selectors deduplicate by first match; add --allow-dups (or -D) on cut/summarize when you need repeated columns.

Regex selectors require a header row. When -H/--no-header is active, using ~"..." results in an error with guidance to remove the regex or restore headers.

Anywhere you access column values inside an expression, prefix the selector with $ ($purity, $1, $IL6:$IL10).

Streaming and file handling

  • Every command accepts files or - (stdin) and auto-detects .tsv, .tsv.gz, and .tsv.xz inputs.
  • Add -H/--no-header when your data lacks a header row; selectors fall back to 1-based indices.
  • Use -C/--comment-char, -E/--ignore-empty-row, and -I/--ignore-illegal-row on any subcommand to control how input lines are filtered before processing.
  • tsvkit join parallelizes input loading; control the worker count with -t/--threads (defaults to the lesser of 8 and the available CPU cores).
  • --fill TEXT lets join, melt, pivot (and others) swap empty cells for a custom placeholder.

Expression language essentials

The same expression language powers filter -e, mutate -e name=EXPR, and regex substitutions. Wrap expressions in single quotes to protect $columns from the shell.

Operators and comparisons

Symbol / keywordDescriptionWorks on
+ - * / ^Arithmetic operators (^ is exponentiation, right-associative).Numbers
== != < <= > >=Comparisons.Numbers or strings
& / andLogical AND.Booleans
| / orLogical OR.Booleans
! / notLogical negation.Booleans
~Regex match. Right-hand side can be literal text or a $range.Strings
!~Regex does not match.Strings
inMembership test against a list literal or numeric range.$group in ["case","control"]
!inNegated membership test.$status !in ["fail","missing"]

Reference columns whose names contain operators or punctuation with ${column-name} inside expressions (e.g. ${dna-} - $rna_ug). This prevents the parser from treating the characters as arithmetic.

List literals use square brackets: [1,2,3], ["case","control"], [IL6:IL10]. Combine them with in/!in to test membership, or pass them to helper functions that accept lists.

Numeric helper functions

FunctionDescription
abs(expr)Absolute value
sqrt(expr)Square root
exp(expr) / exp2(expr)Exponential (e^x) / base-2 exponential
ln(expr)Natural logarithm
log(expr) / log10(expr)Base-10 logarithm
log2(expr)Base-2 logarithm
len(expr)Character count using Unicode code points.
is_na(expr)Returns 1 when the expression is blank/NA/NaN, otherwise 0.
upper(expr)Convert text to uppercase.
lower(expr)Convert text to lowercase.
cap(expr)Capitalize only the first character.

Functions accept column references (abs($purity - 1)), constants, or subexpressions. Empty or non-numeric values yield blanks.

Conditional and regex helpers

  • case_when(condition -> result, ..., _ -> default) evaluates each boolean condition in order and returns the matching result. The final _ branch acts as the default.
  • switch(value, [match1, match2] -> result, ..., _ -> default) compares value to one or more literal matches (strings or numbers) and returns the corresponding result.
  • re(value, pattern) evaluates a regex against value, returning 1 or 0. When the pattern matches, capture groups become available as $1, $2, etc. for the remainder of the expression (use $0-style numeric selectors sparingly when you rely on captures).

Example:

case_when(
    re($sample, "^ERR(\\d+)$") -> \$1,
    re($sample, "^SRR")          -> "SRA",
    _                              -> $sample
)

Row-wise aggregation helpers

Available within mutate expressions via functions such as sum($col1:$col5); see the Mutate section for the full list.

Command reference

Each subsection highlights the core options, shows realistic invocations, and calls out relevant selectors or expressions.

info

Get a quick, structured summary of any TSV: the overall shape plus one row per column with inferred types and sample values. The preview column defaults to the first three rows, but you can raise or lower it with -n (e.g. -n 5). Combine with -H when the input has no header row so the summary omits the name column.

tsvkit info examples/samples.tsv

Output

#shape(6, 9)
index   name            type    first3
1       sample_id       str     [S01, S02, S03]
2       subject_id      str     [P001, P002, P001]
3       group           str     [case, control, case]
4       timepoint       str     [baseline, baseline, week4]
5       purity          num     [0.94, 0.90, 0.96]
6       dna_ug          num     [25.3, 22.8, 27.4]
7       rna_ug          num     [18.1, 17.5, 19.8]
8       contamination_pct       num     [0.02, 0.03, 0.01]
9       tech            str     [sRNA-seq, sRNA-seq, sRNA-seq]

cut

Select or reorder columns by name, index, or range.

tsvkit cut -f 'sample_id,group,purity,tech' examples/samples.tsv

Ranges expand consecutive columns automatically:

tsvkit cut -f 'sample_id,IL6:IL10' examples/cytokines.tsv

Regex selectors pick up columns whose headers match a pattern. Combine them with names, indices, and ranges in any order:

tsvkit cut -f '1,group,~"^IL",~"_pct$"' examples/qc.tsv

Injecting the source file basename (__base__) or filename with path (__file__):

tsvkit cut -f '__base__,1:' examples/qc*.tsv

Explanation:

  • __base__ injects the source filename as the first column
  • 1: selects all existing columns from each input file

You can now inject file-derived values directly in -f with template selectors:

tsvkit cut -f '{file},{base:},1:2' examples/qc.tsv
tsvkit cut -f 'sample={base:#sample_!lower},1:' sample_A.tsv

Use --inject-col-names (aliases: --file-col, --fc) to rename injected columns. If you have multiple injected selectors in -f, pass comma-separated names in order:

tsvkit cut --inject-col-names sample -f '__base__,1:' examples/qc*.tsv
tsvkit cut --inject-col-names file_name,sample -f '{base:},sample={base:#sample_!upper},1:2' sample_A.tsv

Template tokens:

  • {file} / {__file__} full path
  • {base} / {__base__} basename
  • {dir} / {__dir__} parent dir
  • {base:} basename without all extensions
  • {base.} basename without last extension
  • {file%} basename of {file}
  • {file/} directory of {file}
  • {file^suffix} remove a literal trailing suffix when present
  • {base:#prefix} remove a literal prefix when present (example: {base:#sample_})
  • case controls: append !upper, !lower, or !cap (for example {base:!upper})

Shell tip: ! is interpreted by many shells in double quotes. Use single quotes for selector/template expressions like '{base:!lower},1:3' or 'sample={base:!lower},1:'. If you must use double quotes, escape it: "{base:\!lower}".

Negative selectors in cut -f:

  • -1 = last column
  • -2 = second-last column
  • -2: = from second-last to the final column

Matches deduplicate by default; add -D/--allow-dups to keep every occurrence when multiple selectors target the same column.

Useful operational flags:

  • -H/--no-header for headerless TSVs (selectors are index-based)
  • -C/--comment-char to skip comment lines (default #)
  • -E/--ignore-empty-row to skip blank rows
  • -I/--ignore-illegal-row to skip rows with inconsistent column counts

filter

Filter rows with boolean logic, arithmetic, column ranges, regexes, and list membership tests.

tsvkit filter -e '$group == "case" & $purity >= 0.94' examples/samples.tsv
tsvkit filter -e '$status !in ["fail","missing","error"] & $tech ~ "sRNA"' examples/samples.tsv

Case helpers are supported in filter expressions:

tsvkit filter -e 'cap(\$1) == "HELLO"' data.tsv
tsvkit filter -e 'upper($group) == "CASE"' examples/samples.tsv

Expression building blocks for filter

Building blockExamplesNotes
Column values$purity, $1, $IL6:$IL10Ranges produce a list; use in regex matches or aggregate helpers.
Literals1.25, "case"Strings use double quotes; escape inner quotes with \".
Arithmetic($rna_ug - $dna_ug) / $rna_ugStandard precedence applies (parentheses for clarity).
Comparisons$purity >= 0.9, $group != "control"Works on numeric or string data.
Logical($purity >= 0.9) & ($group == "case")&, |, and ! (or and, or, not).
Numeric functionslog2($total), sqrt($reads)See Expression language essentials.
Row-wise aggregatorssum($dna_ug:$rna_ug), mode(\$1,\$3), countunique($gene:)Same catalog as summarize: totals, quantiles (q* / p*), variance/SD, products, entropy, argmin/argmax, membership stats. Works with ranges, lists, and open selectors.
Regex match$tech ~ "sRNA", $notes !~ "(?i)fail"Patterns follow Rust regex syntax. (?i) enables case-insensitive matching.
Regex across ranges$gene:$notes ~ "kinase", ~ "control"When the left-hand side is omitted, ~ scans all columns.
Membership$group in ["case","control"], $rank in [1:3]Right-hand side must be a list literal or numeric range.

Regex usage at a glance

PatternDescription
$col ~ "^ABC"Keep rows where the column starts with ABC.
$col !~ "xyz$"Exclude rows where the column ends with xyz.
$A:$C ~ "kinase"Match if any column in the range contains kinase.
~ "(?i)na"Match if any column (entire row) contains na, case-insensitive.

join

Merge tables on shared keys. Provide selectors with -f/--fields; when all inputs use the same key you can specify it once.

tsvkit join -f subject_id examples/samples.tsv examples/subjects.tsv

Control join type with -k (-k 0 = full outer). Use -F/--select to specify output columns (defaults to all non-key columns); syntax mirrors -f. --fill TEXT supplies placeholders for missing combinations, while --sorted streams pre-sorted data. tsvkit join trims unused columns before indexing, and -t/--threads (default up to 8) balances throughput and resource usage.

Use --add-header to override emitted non-key header names with per-file/per-column templates:

tsvkit join \
  -f 'subject_id;subject_id' \
  -F 'group,purity;sex,age' \
  --add-header '{base:}_group,{base:}_purity;{base:}_sex,{base:}_age' \
  examples/samples.tsv examples/subjects.tsv

Formatting rules: split files with ;, columns with ,, and keep counts aligned with -F for each file. Template tokens are shared with cut -f '{...}' template selectors.

When using -H (no input header) together with --add-header, join emits a header row:

  • join-key columns are named index1, index2, ..., indexN by default
  • non-key columns use your --add-header templates.

Use --key-header (alias --index-name) to rename join-key columns explicitly. Provide comma-separated names and match the number of join columns.

Example:

tsvkit join -H \
  -f '1;1' \
  --key-header 'sample_id' \
  --add-header 'patient_{base:#sample_}' \
  sample_A.tsv sample_B.tsv

mutate

Create derived columns or rewrite values using expressions.

tsvkit mutate \
  -e 'total=sum($IL6:$IL10)' \
  -e 'log_total=log2($total)' \
  -e 'label=sub($sample_id,"S","Sample_")' \
  examples/cytokines.tsv

Use case_when, switch, and the re() helper for richer branching logic and regex capture reuse:

tsvkit mutate \
  -e 'label = case_when(
        re($sample, "^ERR(\d+)$") -> \$1,
        re($sample, "^SRR")        -> "SRA",
        _                            -> $sample
      )' \
  -e 'bucket = case_when(len($clean) == 0 -> "empty", len($clean) < 5 -> "short", _ -> "long")' \
  examples/samples.tsv

Apply in-place edits with the sed-style form:

tsvkit mutate -e 's/$group/ctrl/control/' examples/samples.tsv

Multiple expressions can be packed into one -e clause using ;:

tsvkit mutate -e 'v1=\$7/\$8;v2=\$11/\$12' data.tsv

Create new columns from regex replacement with:

tsvkit mutate -e 'new=s/\$2/aa[0-9]+/bb/' data.tsv
tsvkit mutate -e 'new2=${1/aa/bb}' data.tsv

String case helpers can be used directly in mutate expressions:

tsvkit mutate -e 'v1=cap(\$2)' -e 'v2=upper($group)' data.tsv

Mutation building blocks

FormMeaningExample
name=EXPRAppend a new column containing the evaluated expression.mean_signal=mean($sig1:$sig4)
existing=EXPROverwrite an existing column with the expression result.purity=round($purity,2) (via custom helper script)
s/$selectors/pattern/replacement/Regex substitution on one or more columns ($ optional).s/$group/ctrl/control/
new=s/$selector/pattern/replacement/Create a new column from one source column via regex replacement.new=s/\$2/aa/bb/
new=${selector/pattern/replacement}Braced shorthand for assignment substitution.new=${1/aa/bb}

Row-wise aggregators shared by filter and mutate

CategoryFunctions (aliases)Description
Totals & centerssum, mean/avg, median/med, trimmean, iqrNumeric summaries that ignore blanks; iqr computes q3 - q1.
Dispersionsd/std/stddev, var/variance, entropySpread metrics and Shannon entropy of the value distribution.
Extremes & positionsmin, max, absmin, absmax, argmin, argmaxarg* return the 1-based position among numeric entries.
Membership & countscount, first, last, rand/random, unique, collapse, countunique/distinct, mode, antimodeOperate on the original strings (including duplicates and blanks).
Productsprod/productMultiply all numeric inputs (skips blanks and NaNs).
Quantilesq* (q1, q0.9, q_0_25), p* (p95, p99.5)Fractions 0–1 and percents 0–100; underscores may replace dots.

Aggregators accept any range, list, or open-ended selector (sum(\$1,\$3:)). Non-numeric cells are skipped for numeric summaries. Results are appended as new columns unless you assign them back to an existing name.

summarize

Group rows and compute descriptive statistics. Without -g/--group, the entire table is treated as a single group.

tsvkit summarize \
  -g group \
  -s 'purity=mean,sd' \
  -s 'dna_ug:contamination_pct=q1,q3' \
  examples/samples.tsv

Regex selectors work here as well, so you can summarize whole families of columns in one shot:

tsvkit summarize -g patient -s '~"^sample_"=mean,sd' -D cohort.tsv

Aggregators supported by summarize

Counts & membership

Aggregator (aliases)DescriptionOutput type
countNumber of rows in the group (ignores blanks).Numeric
firstFirst non-empty value encountered.Original type
lastLast non-empty value encountered.Original type
rand / randomRandom value from the group.Original type
uniqueComma-separated list of distinct values in encounter order.String
collapseConcatenate every value (comma-separated, includes duplicates).String
countunique / distinctCount of distinct values.Numeric
modeMost frequent value (ties resolved by first occurrence).Original type
antimodeLeast frequent value (ties resolved by first occurrence).Original type
entropyShannon entropy calculated from value frequencies.Numeric

Numeric summaries

Aggregator (aliases)Description
sumSum of numeric values.
mean / avgArithmetic mean.
median / medMedian (50th percentile).
trimmeanMean of values after trimming 25% from each tail.
iqrInterquartile range (q3 - q1).
sd / std / stddevSample standard deviation.
var / varianceSample variance.
min / maxMinimum / maximum value.
absmin / absmaxValue with the smallest / largest absolute magnitude (returned with original sign).
prod / productProduct of numeric values.
argmin / argmax1-based row index within the group where the min/max occurs.

Quantiles

PatternDescription
q1, q2, q3, q4Quartiles (q2 equals the median).
q0, q0.25, q0.75, q0.9Fractional quantiles between 0 and 1 (underscores allowed instead of dots).
p0, p25, p95, p99.5Percentiles between 0 and 100.

Quantile aggregators accept any q* (fraction) or p* (percent) token. Values may include decimals (q0.05, p99.5) or integers. Non-numeric cells are ignored for numeric summaries and quantiles. absmin, absmax, mode, antimode, and entropy inspect the original string values, so they work even without numeric conversion.

sort

Sort rows by one or more keys. Modifiers: :n (numeric), :nr (numeric descending), :r (reverse text).

tsvkit sort -k purity:nr -k contamination_pct examples/samples.tsv

melt

Convert wide tables into tidy long form. Add --fill TEXT to substitute blanks with a chosen value.

tsvkit melt -i sample_id -v IL6:IL10 examples/cytokines.tsv

pivot

Promote long-form values to columns. -c/--column also accepts the short alias -f, and --fill TEXT sets a default value for missing combinations.

tsvkit pivot -i gene -c sample_id -v expression examples/expression.tsv

slice

Take specific rows (1-based indices or ranges, including open-ended forms like :10, 10:, or even : for everything). Negative indices count from the end, so :-2 emits every row except the final two and -3: keeps the last three rows.

tsvkit slice -r 1,4:5 examples/samples.tsv

Print the first rows from TSV input (default -n 10). With one input (including stdin), output is plain TSV with no file banner; with multiple files, each block is prefixed by # <file>.

tsvkit head -n 5 examples/samples.tsv examples/subjects.tsv
cat examples/samples.tsv | tsvkit head -n 3

pretty

Render aligned, boxed output for quick inspection.

tsvkit filter -e '$group == "case"' examples/samples.tsv | tsvkit pretty
  • --round DIGITS (or -r) rounds numeric cells to the requested precision. Tiny magnitudes automatically switch to scientific notation so columns stay legible even when values approach zero.

transpose

Transpose a table (rows become columns). With headers, the header row is included in transposition; use -H for headerless input.

tsvkit transpose examples/samples.tsv

excel

Inspect .xlsx workbooks, preview sheets, export ranges as TSV, or assemble new workbooks from TSV inputs. Unless -H/--no-header is supplied, the first row of each sheet is treated as the header row; use that flag when you need to preview or export raw rows.

  • List sheets (--sheets) with row/column counts and inferred column types:

    tsvkit excel --sheets examples/bioinfo_example.xlsx
    

    Output

    1     Samples rows=21 cols=4  types=[string,string,mixed,mixed]
    2     Variants        rows=21 cols=4  types=[string,mixed,mixed,mixed]
    3     Expression      rows=21 cols=3  types=[string,mixed,mixed]
    4     Pathways        rows=11 cols=3  types=[string,string,mixed]
    5     QC      rows=21 cols=4  types=[string,mixed,mixed,mixed]
    6     ClinMetadata    rows=21 cols=4  types=[string,mixed,string,string]
    7     Taxonomy        rows=21 cols=4  types=[string,mixed,mixed,mixed]
    8     Coverage        rows=11 cols=2  types=[string,mixed]
    9     Proteomics      rows=21 cols=3  types=[string,mixed,mixed]
    10    Metabolites     rows=21 cols=2  types=[string,mixed]
    
  • Preview (--preview) the first rows of every sheet (header + N rows by default). Use -s to focus on one sheet, -n to change the window, --formulas to show Excel formulas instead of values, --dates raw|excel|iso to control date rendering (iso is the default), and --pretty to render the preview with aligned borders. Add -H/--no-header when the sheet lacks a header row so the preview shows raw rows only:

    tsvkit excel --preview reports.xlsx -n 5 -s Summary
    tsvkit excel --preview reports.xlsx --formulas --dates raw
    tsvkit excel --preview reports.xlsx --pretty
    tsvkit excel --preview examples/bioinfo_example.xlsx -n 3 --pretty
    

    Output (Only the first three sheets are shown below; the actual output has ten.)

    #1 Samples
    +----------+-------+--------+--------+
    | SampleID | Group | Purity | DNA_ug |
    +----------+-------+--------+--------+
    | S001     | Case  | 0.744  | 3.05   |
    | S002     | Case  | 0.837  | 8.59   |
    | S003     | Case  | 0.703  | 1.15   |
    +----------+-------+--------+--------+
    
    #2 Variants
    +----------+------+--------+------+
    | SampleID | SNPs | Indels | CNVs |
    +----------+------+--------+------+
    | S001     | 1217 | 130    | 13   |
    | S002     | 3191 | 241    | 19   |
    | S003     | 2463 | 210    | 8    |
    +----------+------+--------+------+
    
    #3 Expression
    +-------+-----------+--------------+
    | Gene  | Expr_Case | Expr_Control |
    +-------+-----------+--------------+
    | Gene1 | 12.46     | 9.49         |
    | Gene2 | 5.6       | 14.43        |
    | Gene3 | 6.99      | 5.05         |
    +-------+-----------+--------------+
    ...
    
  • Dump (--dump) a sheet (or subset) to TSV. Columns accept names, indices, or Excel letters/ranges (e.g. A:C,Expr, :C, C:). Rows accept 1-based indices or inclusive ranges (1,10:20,:25,100:). --na replaces blanks, --escape-* makes TSV-safe output, and the same --values/--formulas + --dates controls apply. Use -H/--no-header when the sheet lacks a header row so column names fall back to indices:

    tsvkit excel --dump examples/bioinfo_example.xlsx -s Samples -f 'SampleID,Group,Purity' -r 2:3 | tsvkit pretty
    

    Output

    +----------+-------+--------+
    | SampleID | Group | Purity |
    +----------+-------+--------+
    | S002     | Case  | 0.837  |
    | S003     | Case  | 0.703  |
    +----------+-------+--------+
    
  • Load (--load) one or more TSV files into a new workbook. Each TSV can be followed by -s SHEETNAME. Use -H when the TSV lacks headers, --fields to supply header names in that case, --types infer|string to control numeric inference, --dates excel|iso|raw to influence how date strings are written, --na to treat specific tokens as blanks, and --max-rows-per-sheet to split very tall sheets (defaults to Excel's 1,048,576 rows):

    tsvkit excel --load examples/samples.tsv -s Samples --load examples/cytokines.tsv -s Cytokines -o examples/testout.xlsx
    

Only .xlsx files are supported at the moment. Sheets created via --load are renamed Name (2), Name (3), … when row limits force splits. When -s is omitted the sheet falls back to its 1-based position (1, 2, …) in the load order, so a mixture of named and unnamed inputs still yields deterministic sheet names.

csv

Convert delimited text into TSV. Use --delim to specify the input delimiter (default ,) and -H when the source has no header row. The converter also mirrors common TSV-reader switches:

  • -C/--comment-char skips comment lines.
  • -E/--ignore-empty-row drops blank lines.
  • -I/--ignore-illegal-row skips rows whose width differs from the header/first row.
  • --na STR substitutes a string for empty fields (left blank by default).
  • --lazy-quotes treats stray quotes literally rather than erroring out.

Compressed inputs work via the same auto-detection as other commands.

tsvkit csv examples/data.csv > examples/data.tsv
tsvkit csv examples/semicolon.csv --delim ';' -H > tmp.tsv

Additional tips

  • tsvkit automatically detects .tsv, .tsv.gz, and .tsv.xz. Pipe from curl/zcat for other formats.
  • Numeric functions treat empty cells as missing; regex syntax follows Rust's regex crate.
  • For massive joins, pre-sort inputs and use join --sorted to keep memory usage flat.
  • Combine mutate, filter, and summarize to build complete pipelines directly on the command line.

Contributing

Issues and pull requests are welcome!