qsv Command Help

June 24, 2026 ยท View on GitHub

Auto-generated from qsv command USAGE text. See README for full documentation.

CommandDescription
apply
๐Ÿ“‡๐Ÿง ๐Ÿค–๐Ÿš€๐Ÿ”ฃ๐Ÿ‘†โ›ฉ๏ธ
Apply series of string, date, math & currency transformations to given CSV column/s. It also has some basic NLP functions (similarity, sentiment analysis, profanity, eudex, language & name gender) detection. Its summarize subcommand condenses a column or group of columns using an OpenAI API-compatible LLM (local or commercial) with customizable, Mini Jinja-templated per-record prompts.
applydp
๐Ÿ“‡๐Ÿš€๐Ÿ”ฃ๐Ÿ‘† CKAN
applydp is a slimmed-down version of apply with only Datapusher+ relevant subcommands/operations (qsvdp binary variant only).
beheadDrop headers from a CSV.
blake3
๐Ÿš€
Compute or check BLAKE3 hashes of files.
cat
๐Ÿ—„๏ธ
Concatenate CSV files by row or by column.
cleanRemove qsv-generated cache files (.idx index, stats & frequency caches) to reduce clutter & simplify data packaging. With --stale, only removes caches whose source changed or is gone. Opt-in flags also clean schema, validate & moarstats outputs.
clipboard
๐Ÿ–ฅ๏ธ
Provide input from the clipboard or save output to the clipboard.
color
๐Ÿคฏ๐Ÿปโ€โ„๏ธ๐Ÿ–ฅ๏ธ
Outputs tabular data as a pretty, colorized table that always fits into the terminal. Apart from CSV and its dialects, Arrow, Avro/IPC, Parquet, JSON array & JSONL formats are supported with the "polars" feature.
count
๐Ÿ“‡๐Ÿปโ€โ„๏ธ๐ŸŽ๏ธ
Count the rows and optionally compile record width statistics of a CSV file. (11.87 seconds for a 15gb, 28m row NYC 311 dataset without an index. Instantaneous with an index.) If the polars feature is enabled, uses Polars' multithreaded, mem-mapped CSV reader for fast counts even without an index
datefmt
๐Ÿ“‡๐Ÿš€๐Ÿ‘†
Formats recognized date fields (19 formats recognized) to a specified date format using strftime date format specifiers.
dedup
๐Ÿคฏ๐Ÿš€๐Ÿ‘†
Remove duplicate rows (See also extdedup, extsort, sort & sortcheck commands).
describegpt
๐Ÿ“‡๐Ÿ—ƒ๏ธ๐Ÿค–๐ŸŒ๐Ÿช„๐Ÿ“šโ›ฉ๏ธ CKAN
Infer a "neuro-symbolic" Data Dictionary, Description & Tags or ask questions about a CSV with a configurable, Mini Jinja prompt file, using any OpenAI API-compatible LLM, including local LLMs. (e.g. Markdown, JSON, TOON, JSON Schema, Semantic Markdown, OKF, Everything, Spanish, Mandarin, Controlled Tags; --prompt "What are the top 10 complaint types by community board & borough by year?" - deterministic, hallucination-free SQL RAG result; iterative, session-based SQL RAG refinement - refined SQL RAG result)
diff
๐Ÿš€
Find the difference between two CSVs with ludicrous speed! e.g. compare two CSVs with 1M rows x 9 columns in under 600ms!
editReplace the value of a cell specified by its row and column.
enum
๐Ÿ‘†
Add a new column enumerating rows by adding a column of incremental or uuid identifiers. Can also be used to copy a column or fill a new column with a constant value.
excel
๐Ÿš€
Exports a specified Excel/ODS sheet to a CSV file.
exclude
๐Ÿ“‡๐Ÿ‘†
Removes a set of CSV data from another set based on the specified columns.
explode
๐Ÿ”ฃ๐Ÿ‘†
Explode rows into multiple ones by splitting a column value based on the given separator. The inverse of implode.
extdedup
๐Ÿ‘†
Remove duplicate rows from an arbitrarily large CSV/text file using a memory-mapped, on-disk hash table. Unlike the dedup command, this command does not load the entire file into memory nor does it sort the deduped file.
extsort
๐Ÿ“‡๐Ÿš€๐Ÿ‘†
Sort an arbitrarily large CSV/text file using a multithreaded external merge sort algorithm.
fetch
๐Ÿ“‡๐Ÿง ๐ŸŒ
Send/Fetch data to/from web services for every row using HTTP Get. Comes with HTTP/2 adaptive flow control, jaq JSON query language support, dynamic throttling (RateLimit) & caching with available persistent caching using Redis or a disk-cache.
fetchpost
๐Ÿ“‡๐Ÿง ๐ŸŒโ›ฉ๏ธ
Similar to fetch, but uses HTTP Post (HTTP GET vs POST methods). Supports HTML form (application/x-www-form-urlencoded), JSON (application/json) and custom content types - with the ability to render payloads using CSV data using the Mini Jinja template engine.
fill
๐Ÿ‘†
Fill empty values.
fixlengthsForce a CSV to have same-length records by either padding or truncating them.
flattenA flattened view of CSV records. Useful for viewing one record at a time. e.g. qsv slice -i 5 data.csv | qsv flatten.
fmtReformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.)
foreachExecute a shell command once per record in a given CSV file.
frequency
๐Ÿ“‡๐Ÿ˜ฃ๐ŸŽ๏ธ๐Ÿ‘†๐Ÿช„Luau
Build frequency distribution tables of each column. Uses multithreading to go faster if an index is present (Examples: CSV JSON TOON).
get
๐Ÿ“‡๐Ÿง ๐ŸŒ CKAN
Get tabular data from local files, URLs (http/https & dathere://) & CKAN (ckan://) into a managed, queryable disk cache - with conditional revalidation (ETag/Last-Modified), transparent zstd compression, BLAKE3 hashing & automatic indexing. Cached resources are reusable by ANY qsv command via the dc: prefix (e.g. qsv stats dc:data.csv), with stale entries auto-refreshed. Efficiently seeds luau lookup tables, validate dynamicEnum reference data & speeds up Datapusher+ harvesting.
geocode
๐Ÿ“‡๐Ÿง ๐Ÿš€๐ŸŒ๐Ÿ”ฃ๐Ÿ‘†๐ŸŒŽ
Geocodes a location against an updatable local copy of the Geonames cities & the Maxmind GeoLite2 databases โ€” with caching and multi-threading, this offline path geocodes up to 360,000 records/sec! Can also geocode online (forward & reverse) via the OpenCage geocoder.
geoconvert
๐ŸŒŽ
Convert between various spatial formats and CSV/SVG including GeoJSON, SHP, and more.
headers
๐Ÿ—„๏ธ
Show the headers of a CSV. Or show the intersection of all headers between many CSV files.
implode
๐Ÿ˜ฃ๐Ÿ‘†
Implode rows by grouping on key column(s) and joining a value column with a given separator. The inverse of explode.
indexCreate an index for a CSV. This is very quick (even the 15gb, 28m row NYC 311 dataset takes all of 14 seconds to index) & provides constant time indexing/random access into the CSV. With an index, count, sample & slice work instantaneously; random access mode is enabled in luau; and multithreading is enabled for the frequency, split, stats & schema commands.
inputRead CSV data with special commenting, quoting, trimming, line-skipping & non-UTF8 encoding handling rules. Typically used to "normalize" a CSV for further processing with other qsv commands.
join
๐Ÿ“‡๐Ÿ˜ฃ๐Ÿ‘†
Inner, outer, right, cross, anti & semi joins. Automatically creates a simple, in-memory hash index to make it fast.
joinp
๐Ÿปโ€โ„๏ธ๐Ÿš€๐Ÿช„
Inner, outer, right, cross, anti, semi, non-equi & asof joins using the Pola.rs engine. Unlike the join command, joinp can process files larger than RAM, is multithreaded, has join key validation, a maintain row order option, pre and post-join filtering, join keys unicode normalization, supports "special" non-equi joins and asof joins (which is particularly useful for time series data) & its output columns can be coalesced.
json
๐Ÿ‘†
Convert JSON array to CSV.
jsonl
๐Ÿš€๐Ÿ”ฃ
Convert newline-delimited JSON (JSONL/NDJSON) to CSV. See tojsonl command to convert CSV to JSONL.
lens
๐Ÿ—ƒ๏ธ๐Ÿปโ€โ„๏ธ๐Ÿ–ฅ๏ธ
Interactively view, search & filter tabular data files using the csvlens engine. Apart from CSV and its dialects, Arrow, Avro/IPC, Parquet, JSON array & JSONL formats are supported with the "polars" feature.
luau
๐Ÿ“‡๐ŸŒ๐Ÿ”ฃ๐Ÿ“š CKAN Luau
Create multiple new computed columns, filter rows, compute aggregations and build complex data pipelines by executing a Luau 0.725 expression/script for every row of a CSV file (sequential mode), or using random access with an index (random access mode). Can process a single Luau expression or full-fledged data-wrangling scripts using lookup tables with discrete BEGIN, MAIN and END sections. It is not just another qsv command, it is qsv's Domain-specific Language (DSL) with numerous qsv-specific helper functions to build production data pipelines.
moarstats
๐Ÿ“‡๐ŸŽ๏ธ
Add up to an additional 55 statistical measures, including extended outlier, robust & bivariate statistics to an existing stats CSV file. (example).
partition
๐Ÿ‘†
Partition a CSV based on a column value.
pivotp
๐Ÿปโ€โ„๏ธ๐Ÿš€๐Ÿช„
Pivot CSV data. Features "smart" aggregation auto-selection based on data type & stats.
pragmastat
๐Ÿ“‡๐Ÿคฏ๐ŸŽฒ๐Ÿช„
Compute pragmatic statistics using the Pragmastat library. Uses the stats cache to auto-filter non-numeric columns and support Date/DateTime columns.
proInteract with the qsv pro API.
profile
๐Ÿ“‡๐Ÿง ๐Ÿค–๐Ÿ“šโ›ฉ๏ธ CKAN
Extract, derive & infer metadata from a CSV (local path or URL) - using the statistical profile of a dataset, mapped and driven by a configurable metadata scheming YAML spec (DCAT-US v3, DCAT-AP v3 and Croissant 1.1 bundled; Geoconnex when built with the geoconnex feature), with optional CKAN/DCAT metadata discovery for URL inputs. This enables FAIRification at scale.
prompt
๐Ÿปโ€โ„๏ธ๐Ÿ–ฅ๏ธ
Open a file dialog to either pick a file as input or save output to a file.
pseudo
๐Ÿ”ฃ๐Ÿ‘†
Pseudonymise the value of the given column by replacing them with an incremental identifier.
py
๐Ÿ“‡๐Ÿ”ฃ
Create a new computed column or filter rows by evaluating a Python expression on every row of a CSV file. Python's f-strings is particularly useful for extended formatting, with the ability to evaluate Python expressions as well. Requires Python 3.11 or greater.
renameRename the columns of a CSV efficiently.
replace
๐Ÿ“‡๐ŸŽ๏ธ๐Ÿ‘†
Replace CSV data using a regex. Applies the regex to each field individually.
reverse
๐Ÿ“‡๐Ÿคฏ
Reverse order of rows in a CSV. Unlike the sort --reverse command, it preserves the order of rows with the same key. If an index is present, it works with constant memory. Otherwise, it will load all the data into memory.
safenames
CKAN
Modify headers of a CSV to only have "safe" names - guaranteed "database-ready"/"CKAN-ready" names.
sample
๐Ÿ“‡๐ŸŽ๏ธ๐ŸŒ๐ŸŽฒ๐Ÿช„
Randomly draw rows (with optional seed) from a CSV using ten different sampling methods - reservoir (default), indexed, bernoulli, systematic, stratified, weighted, varopt, mergeable-reservoir, cluster & timeseries sampling. The --varopt & --mergeable-reservoir modes support mergeable sketch I/O (--sketch-out/--sketch-in) so sharded inputs can be sampled and combined without re-reading the corpus. Supports sampling from CSVs on remote URLs. Uses the stats cache to skip unnecessary scanning and inform its sampling strategies.
schema
๐Ÿ“‡๐Ÿ˜ฃ๐Ÿปโ€โ„๏ธ๐ŸŽ๏ธ๐Ÿ‘†๐Ÿช„
Infer either a JSON Schema Validation Draft 2020-12 (Example) or Polars Schema (Example) from CSV data. In JSON Schema Validation mode, it produces a .schema.json file replete with inferred data type & domain/range validation rules derived from stats. Uses multithreading to go faster if an index is present. See validate command to use the generated JSON Schema to validate if similar CSVs comply with the schema. With the --polars option, it produces a .pschema.json file that all polars commands (sqlp, joinp & pivotp) use to determine the data type of each column & to optimize performance. Both schemas are editable and can be fine-tuned. For JSON Schema, to refine the inferred validation rules. For Polars Schema, to change the inferred Polars data types.
scoresql
๐Ÿปโ€โ„๏ธ๐Ÿช„
Analyze a SQL query against CSV file caches (stats, moarstats, frequency) to produce a performance score with actionable optimization suggestions BEFORE running the query. Supports Polars (default) and DuckDB modes.
search
๐Ÿ“‡๐ŸŽ๏ธ๐Ÿ‘†
Run a regex over a CSV. Applies the regex to selected fields & shows only matching rows.
searchset
๐Ÿ“‡๐ŸŽ๏ธ๐Ÿ‘†
Run multiple regexes over a CSV in a single pass. Applies the regexes to each field individually & shows only matching rows.
select
๐Ÿ‘†
Select, re-order, reverse, duplicate or drop columns.
slice
๐Ÿ“‡๐Ÿ—ƒ๏ธ๐ŸŽ๏ธ
Slice rows from any part of a CSV. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).
snappy
๐Ÿš€๐ŸŒ
Does streaming compression/decompression of the input using Google's Snappy framing format (more info).
sniff
๐Ÿ“‡๐Ÿค–๐ŸŒ CKAN
Quickly sniff & infer CSV metadata (delimiter, header row, preamble rows, quote character, flexible, is_utf8, average record length, number of records, content length & estimated number of records if sniffing a CSV on a URL, number of fields, field names & data types). It is also a general mime type detector.
sort
๐Ÿคฏ๐Ÿš€๐Ÿ‘†๐ŸŽฒ
Sorts CSV data in lexicographical, natural, numerical, reverse, unique or random (with optional seed) order (Also see extsort & sortcheck commands).
sortcheck
๐Ÿ‘†
Check if a CSV is sorted. With the --json options, also retrieve record count, sort breaks & duplicate count.
split
๐Ÿ“‡๐ŸŽ๏ธ
Split one CSV file into many CSV files. It can split by number of rows, number of chunks or file size. Uses multithreading to go faster if an index is present when splitting by rows or chunks.
sqlp
๐Ÿ“‡๐Ÿ—„๏ธ๐Ÿปโ€โ„๏ธ๐Ÿš€๐Ÿช„
Run Polars SQL (a PostgreSQL dialect) queries against several CSVs, Parquet, JSONL and Arrow files - converting queries to blazing-fast Polars LazyFrame expressions, processing larger than memory CSV files. Query results can be saved in CSV, JSON, JSONL, Parquet, Apache Arrow IPC and Apache Avro formats.
stats
๐Ÿ“‡๐Ÿคฏ๐ŸŽ๏ธ๐Ÿ‘†๐Ÿช„
Compute up to 48 summary statistics & make GUARANTEED data type inferences (Null, String, Float, Integer, Date, DateTime, Boolean) for each column in a CSV (Example). Uses multithreading to go faster if an index is present. With an index, can compile "streaming" stats on a 1M row sample of NYC's 311 data in less than 0.25 seconds vs 2.24 seconds without one.
synthesize
๐Ÿ“‡๐ŸŽฒ๐Ÿค–
Generate a synthetic CSV that is statistically faithful to a source CSV. Runs stats + frequency on the source so synthesized columns reproduce its per-column attributes โ€” frequency-weighted sampling for categorical columns, quartile-bucketed numeric/date generation, null-ratio preservation. With a Data Dictionary from describegpt --dictionary --infer-content-type, semantic Content Types pick realistic fake-rs fakers (names, emails, addresses, UUIDs, etc.) for non-enumerable columns. A dictionary relationships array preserves inter-column structure within each row โ€” joint (functional dependencies like city/state/zip), ordered (monotonic chains like created_date โ‰ค closed_date) and correlated (numeric correlation via a Gaussian copula). Fully reproducible with --seed.
table
๐Ÿคฏ
Align output of a CSV using elastic tabstops for viewing; or to create an "aligned TSV" file or Fixed Width Format file. To interactively view a CSV, use the lens command.
template
๐Ÿ“‡๐Ÿš€๐Ÿ”ฃ๐Ÿ“šโ›ฉ๏ธ CKAN
Renders a template using CSV data with the Mini Jinja template engine (Example).
to
๐Ÿ—„๏ธ๐Ÿปโ€โ„๏ธ๐Ÿš€
Convert CSV files to Parquet, PostgreSQL, SQLite, Excel (XLSX), LibreOffice Calc (ODS) and Data Package.
tojsonl
๐Ÿ“‡๐Ÿ˜ฃ๐Ÿ—ƒ๏ธ๐Ÿš€๐Ÿ”ฃ๐Ÿช„
Smartly converts CSV to a newline-delimited JSON (JSONL/NDJSON). By scanning the CSV first, it "smartly" infers the appropriate JSON data type for each column. See jsonl command to convert JSONL to CSV.
transpose
๐Ÿคฏ๐Ÿ‘†
Transpose rows/columns of a CSV.
validate
๐Ÿ“‡๐Ÿ—„๏ธ๐Ÿš€๐ŸŒ๐Ÿ“š CKAN
Validate CSV data blazingly-fast using JSON Schema Validation (Draft 2020-12) (e.g. up to 780,031 rows/second[^1] using NYC's 311 schema generated by the schema command) & put invalid records into a separate file along with a detailed validation error report. Supports several custom JSON Schema formats & keywords: * currency custom format with ISO-4217 validation * dynamicEnum custom keyword that supports enum validation against a CSV on the filesystem or a URL (http/https/ckan & dathere URL schemes supported) * uniqueCombinedWith custom keyword to validate uniqueness across multiple columns for composite key validation. If no JSON schema file is provided, validates if a CSV conforms to the RFC 4180 standard and is UTF-8 encoded.
viz
๐Ÿช„๐Ÿ‘†
Generate interactive charts (bar, line, scatter, histogram, box, pie, heatmap, candlestick/ohlc, sankey, radar, geographic maps) and an auto-dashboard (viz smart) from CSV data using plotly. viz smart "automagically" picks an appropriate chart per column from the dataset's statistics & frequency distributions (box plots for continuous columns from precomputed quartiles; frequency bars for low-cardinality/boolean columns; a correlation heatmap when there are 2+ eligible continuous numeric columns; a map panel when a lat/lon column pair is detected). Outputs self-contained, interactive HTML (charts work offline; map basemaps fetch their tiles over the network unless the white-bg style is used) - or static PNG/SVG/PDF/JPEG/WebP with the viz_static feature - and can --open the result in your browser.

Legend

โœจ: enabled by a feature flag.
๐Ÿ“‡: uses an index when available.
๐Ÿคฏ: loads entire CSV into memory, though dedup, stats & transpose have "streaming" modes as well.
๐Ÿ˜ฃ: uses additional memory proportional to the cardinality of the columns in the CSV.
๐Ÿง : expensive operations are memoized with available inter-session Redis/Disk caching for fetch commands.
๐Ÿ—„๏ธ: Extended input support.
๐Ÿ—ƒ๏ธ: Limited Extended input support.
๐Ÿปโ€โ„๏ธ: command powered/accelerated by polars 0.54.4:py-1.42.0 vectorized query engine.
๐Ÿค–: command uses Natural Language Processing or Generative AI.
๐ŸŽ๏ธ: multithreaded and/or faster when an index (๐Ÿ“‡) is available.
๐Ÿš€: multithreaded even without an index.
CKAN : has CKAN-aware integration options.
๐ŸŒ: has web-aware options.
๐Ÿ”ฃ: requires UTF-8 encoded input.
๐Ÿ‘†: has powerful column selector support. See select for syntax.
๐Ÿช„: "automagical" commands that uses stats and/or frequency tables to work "smarter" & "faster".
๐Ÿ“š: has lookup table support, enabling runtime "lookups" against local or remote reference CSVs.
๐ŸŒŽ: has geospatial capabilities.
โ›ฉ๏ธ: uses Mini Jinja template engine.
Luau : uses Luau 0.725 as an embedded scripting DSL.
๐ŸŽฒ: randomly generated or randomized output with a --seed option for reproducibility.
๐Ÿ–ฅ๏ธ: part of the User Interface (UI) feature group.


README