sniff

May 31, 2026

Quickly sniff & infer CSV metadata (delimiter, header row, preamble rows, quote character, flexible, is_utf8, average record length, number of records, content length & estimated number of records if sniffing a CSV on a URL, number of fields, field names & data types). It is also a general mime type detector.

Source: src/cmd/sniff.rs | 📇🤖🌐 CKAN


Description

Quickly sniff the first n rows and infer CSV metadata (delimiter, header row, number of preamble rows, quote character, flexible, is_utf8, average record length, number of records, content length and estimated number of records if sniffing a URL, file size, number of fields, field names & data types).
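Dialect sniffing of this kind can be illustrated with Python's standard-library `csv.Sniffer`. This is a swapped-in technique for illustration only, not qsv's own sniffer (which also detects preamble rows and infers field types). The `sniff_dialect` helper is hypothetical, and its 1000-row default merely mirrors the documented sampling default.

```python
import csv
import io
from itertools import islice


def sniff_dialect(text: str, sample_rows: int = 1000) -> dict:
    """Guess delimiter, quote char and header presence from the first rows.

    Illustration only: relies on Python's csv.Sniffer, not qsv's sniffer.
    """
    # Keep only the first `sample_rows` lines, like sampling the head of a file.
    sample = "".join(islice(io.StringIO(text), sample_rows))
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample)  # raises csv.Error if it cannot decide
    return {
        "delimiter": dialect.delimiter,
        "quote_char": dialect.quotechar,
        "has_header": sniffer.has_header(sample),
    }


data = "name;age;city\nAlice;30;Paris\nBob;25;Berlin\nCarol;41;Rome\n"
print(sniff_dialect(data))
# {'delimiter': ';', 'quote_char': '"', 'has_header': True}
```

`csv.Sniffer` only guesses a header by comparing the first row against the types and lengths of the rows beneath it, which is one reason small samples can mislead any sniffer.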

sniff is also a mime type detector, returning the detected mime type, file size and last modified date. With --no-infer, it skips schema inference entirely, which makes it useful for accelerated CKAN harvesting and for checking stale or broken resource URLs.
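For CKAN harvesting, the --harvest-mode flag is documented as shorthand for `--quick --timeout 10 --stats-types --json` plus a templated `--user-agent`. The sketch below only spells out that equivalence and how the `$QSV_*` placeholders might be filled. It is hypothetical: qsv performs the substitution internally, and `expand_harvest_mode` is not part of qsv.

```python
from string import Template

# Hypothetical sketch: the flags --harvest-mode is documented to stand for.
# The $QSV_* placeholders follow the --user-agent option's documentation.
HARVEST_USER_AGENT = "CKAN-harvest/$QSV_VERSION ($QSV_TARGET; $QSV_BIN_NAME)"


def expand_harvest_mode(env: dict[str, str]) -> list[str]:
    """Return the equivalent flag list, with user-agent placeholders filled."""
    # safe_substitute leaves any placeholder missing from `env` untouched.
    user_agent = Template(HARVEST_USER_AGENT).safe_substitute(env)
    return [
        "--quick",
        "--timeout", "10",
        "--stats-types",
        "--json",
        "--user-agent", user_agent,
    ]
```

Using `safe_substitute` rather than `substitute` keeps the sketch tolerant of missing values instead of raising `KeyError`.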

When qsv is compiled with the optional magika feature, it uses Magika, Google's AI-powered content detection library, to identify file types with high accuracy. Magika detects over 200 content types, including CSV, Parquet, MS Office/OpenDocument files, JSON, PDF, PNG, JPEG and more. See https://opensource.googleblog.com/2025/11/announcing-magika-10-now-faster-smarter.html.

When the magika feature is not enabled in a build (e.g., MUSL builds, qsvlite, qsvdp), sniff falls back to the file-format library, which provides basic MIME type detection.
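Basic MIME detection typically works by matching "magic" byte signatures at the start of a file. The sketch below illustrates only that idea. It is not the file-format library's API or signature table, and `SIGNATURES` and `detect_mime` are hypothetical. It also hints at why CSV and TSV can only be reported as plain text without schema inference: they carry no signature.

```python
# Hypothetical, much-simplified signature table: (leading bytes, MIME type).
# Real detectors such as file-format or Magika know far more formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PAR1", "application/vnd.apache.parquet"),
    (b"PK\x03\x04", "application/zip"),  # also the container for XLSX/ODS files
    (b"\x1f\x8b", "application/gzip"),
]


def detect_mime(head: bytes) -> str:
    """Guess a MIME type from the first bytes of a file."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    # No binary signature matched. CSV/TSV have no magic bytes, so the best a
    # signature check can say is "this decodes as text".
    try:
        head.decode("utf-8")
    except UnicodeDecodeError:
        # Note: a multibyte character cut off at the end of `head` also lands
        # here; a real detector would tolerate that.
        return "application/octet-stream"
    return "text/plain"
```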

Note

This command "sniffs" a CSV's schema by sampling the first n rows (default: 1000) of a file. Its inferences are sometimes wrong if the file is too small to reveal a pattern or if the CSV has unusual formatting, such as atypical delimiters or quotes.

In such cases, selectively use the --sample, --delimiter and --quote options to improve the accuracy of the sniffed schema.

If you want more robust, guaranteed schemata, use the "schema" or "stats" commands instead, as they scan the entire file. However, they only work on local, well-formed CSVs. sniff, by contrast, works with remote files and various CSV dialects, and is very fast regardless of file size.
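The gap between sampling and a full scan can be shown with a column whose only non-integer value appears late in the file. The `infer_type` function below is hypothetical and far cruder than qsv's type inference; it only demonstrates how a head-of-file sample and a full scan can disagree.

```python
def infer_type(values: list[str]) -> str:
    """Crude type inference over a column: Integer, Float or String."""

    def parses_as(cast, value: str) -> bool:
        try:
            cast(value)
        except ValueError:
            return False
        return True

    if all(parses_as(int, v) for v in values):
        return "Integer"
    if all(parses_as(float, v) for v in values):
        return "Float"
    return "String"


# A column whose only non-integer value sits after the first 1000 rows.
column = [str(i) for i in range(5000)] + ["3.14"]

print(infer_type(column[:1000]))  # head-of-file sample -> Integer
print(infer_type(column))         # full scan -> Float
```

The --sample option notes that an indexed local file is sampled across the whole file rather than only its head, which targets exactly this kind of late-appearing value.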

Examples

Sniff a local CSV file

qsv sniff data.csv

Sniff a remote TSV file over HTTPS

qsv sniff https://example.com/data.tsv

Get the mime type of a remote file without inferring the CSV schema

qsv sniff --no-infer https://example.com/data.xlsx

Sniff the first 20 percent of an SSV (semicolon-separated) file

qsv sniff --sample 0.20 data.ssv

For more examples, see tests.

See also https://github.com/dathere/qsv/wiki/Selection-and-Inspection#sniff
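The last example passes a fraction to --sample. Per the option description, 0 means the whole file, a value strictly between 0 and 1 is a fraction of the file, and anything else is a row count. The sketch below assumes the total row count is already known, which is generally not the case for a URL (sniff then only estimates the record count). `resolve_sample_rows` and its rounding and capping rules are assumptions, not qsv's code.

```python
import math


def resolve_sample_rows(sample: float, total_rows: int) -> int:
    """Map a --sample value to a row count (hypothetical helper).

    Assumptions, not taken from qsv: fractions are rounded down and row
    counts are capped at the number of rows actually present.
    """
    if sample < 0:
        raise ValueError("sample must not be negative")
    if sample == 0:
        return total_rows                       # 0 => sample the entire file
    if sample < 1:
        return math.floor(total_rows * sample)  # 0 < sample < 1 => a fraction
    return min(int(sample), total_rows)         # otherwise => first n rows
```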

Usage

qsv sniff [options] [<input>]
qsv sniff --help

Arguments

| Argument | Description |
|----------|-------------|
| `<input>` | The file to sniff. This can be a local file, stdin or a URL (http and https schemes supported). |

Sniff Options

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| `--sample` | float | First n rows to sample to sniff out the metadata. When the sample size is between 0 and 1 (exclusive), it is treated as a percentage of the CSV to sample (e.g. 0.20 is 20 percent). When it is zero, the entire file is sampled. When the input is a URL, the sample size dictates how many lines to sample without having to download the entire file. Ignored when --no-infer is enabled. When sniffing a local file that has a CSV index, the sample budget is instead drawn as a DISTRIBUTED sample (the first & last 5 rows, 5 rows each around the 25th, 50th & 75th percentiles, and the rest random across the whole file) rather than just the first n rows. This improves type/date inference for values that only appear late in the file. Run `qsv index` to create an index. | 1000 |
| `--prefer-dmy` | flag | Prefer to parse dates in dmy format. Otherwise, use mdy format. Ignored when --no-infer is enabled. | |
| `-d, --delimiter` | string | The delimiter for reading CSV data. Specify this when the delimiter is known beforehand, as the delimiter inferencing algorithm can sometimes fail. Must be a single ASCII character. | |
| `--quote` | string | The quote character for reading CSV data. Specify this when the quote character is known beforehand, as the quote character inferencing algorithm can sometimes fail. Must be a single ASCII character, typically double quote ("), single quote ('), or backtick (`). | |
| `--json` | flag | Return results in JSON format. | |
| `--pretty-json` | flag | Return results in pretty-printed JSON format. | |
| `--save-urlsample` | string | Save the URL sample to a file. Valid only when the input is a URL. | |
| `--timeout` | integer | Timeout in seconds when sniffing URLs. If 0, no timeout is used. | 30 |
| `--user-agent` | string | Specify a custom user agent to use when sniffing a CSV on a URL. It supports the following variables: `$QSV_VERSION`, `$QSV_TARGET`, `$QSV_BIN_NAME`, `$QSV_KIND` and `$QSV_COMMAND`. Try to follow the syntax described at https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent | |
| `--stats-types` | flag | Use the same data type names as stats (Unsigned, Signed => Integer; Text => String; everything else the same). | |
| `--no-infer` | flag | Do not infer the schema. Only return the file's mime type, size and last modified date. Use this to run sniff as a general mime type detector. Note that CSV and TSV files will only be detected as mime type text/plain in this mode. | |
| `--just-mime` | flag | Only return the file's mime type. Use this to run sniff as a general mime type detector. Synonym for --no-infer. | |
| `-Q, --quick` | flag | When sniffing a non-CSV remote file, only download the first chunk of the file before attempting to detect the mime type. This is faster but less accurate, as some mime types cannot be detected from the first chunk alone. | |
| `--harvest-mode` | flag | A convenience flag for using sniff in CKAN harvesters. It is equivalent to `--quick --timeout 10 --stats-types --json` and `--user-agent "CKAN-harvest/$QSV_VERSION ($QSV_TARGET; $QSV_BIN_NAME)"`. | |
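The indexed sampling strategy in the --sample description can be sketched as row-index selection: the first and last 5 rows, 5 rows around each of the 25th, 50th and 75th percentiles, then random rows until the budget is spent. This is a hypothetical reconstruction from that description, not qsv's implementation; centering each percentile window, the seeded RNG and the `distributed_sample` name are assumptions. A CSV index is what makes jumping straight to such rows cheap.

```python
import random


def distributed_sample(total_rows: int, budget: int, seed: int = 0) -> list[int]:
    """Pick row indices: head, tail, percentile windows, then random fill.

    Hypothetical reconstruction of the documented strategy, not qsv's code.
    """
    budget = min(budget, total_rows)
    chosen: set[int] = set()

    def add_window(start: int, count: int = 5) -> None:
        # Clamp the window to valid row indices and never exceed the budget.
        for i in range(max(start, 0), min(start + count, total_rows)):
            if len(chosen) < budget:
                chosen.add(i)

    add_window(0)                   # first 5 rows
    add_window(total_rows - 5)      # last 5 rows
    for pct in (0.25, 0.50, 0.75):  # 5 rows centered on each percentile
        add_window(int(total_rows * pct) - 2)

    # Spend what is left of the budget on random rows from the whole file.
    # (Listing every unchosen index is O(n); fine for a sketch.)
    remaining = [i for i in range(total_rows) if i not in chosen]
    rng = random.Random(seed)
    chosen.update(rng.sample(remaining, budget - len(chosen)))
    return sorted(chosen)
```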

Common Options

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| `-h, --help` | flag | Display this message. | |
| `-p, --progressbar` | flag | Show progress bars. Only valid for URL input. | |
