Examples
May 14, 2026
Rescue from SmarterCSV::Error (recommended): SmarterCSV auto-detects row and column separators. In rare cases detection fails and raises an exception (e.g. NoColSepDetected). Rescuing from SmarterCSV::Error ensures your application handles unexpected CSV formats gracefully.
- CSV → Array of Hashes
- Parsing a CSV String
- Key Mapping and Column Selection
- Encoding and Preamble Skip
- Value Converters
- Header Validation
- Bad Row Handling
- Writing CSV
- Using each and each_chunk Enumerators
- Importing into a Database
- Batch Processing with Sidekiq
- Resumable CSV Import with Rails ActiveJob
- Instrumentation
- Streaming Inputs (Non-Seekable IO)
- Resumable Import (Plain Ruby)
- CSV Files with Comment Lines
- Tab-Separated Values (TSV)
- Multi-Line Fields
- Filtering and Transforming a CSV File
Example 1: CSV → Array of Hashes
Each hash only contains keys for columns with non-nil, non-empty values — columns with blank entries are omitted automatically:
$ cat pets.csv
first name,last name,dogs,cats,birds,fish
Dan,McAllister,2,,,
Lucy,Laweless,,5,,
Miles,O'Brian,,,,21
Nancy,Homes,2,,1,
$ irb
> require 'smarter_csv'
> pets_by_owner = SmarterCSV.process('pets.csv')
=> [ {first_name: "Dan", last_name: "McAllister", dogs: 2},
{first_name: "Lucy", last_name: "Laweless", cats: 5},
{first_name: "Miles", last_name: "O'Brian", fish: 21},
{first_name: "Nancy", last_name: "Homes", dogs: 2, birds: 1}
]
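Because blank columns are omitted, a key such as :cats may simply be absent from a row. The following is a minimal sketch, using the hashes from the output above as literal data, of reading such rows with Hash#fetch and a default. The pet_columns list and the total_pets lambda are hypothetical names introduced for illustration.

```ruby
# Rows copied from the example output above; blank columns are absent, not nil.
pets_by_owner = [
  { first_name: "Dan",   last_name: "McAllister", dogs: 2 },
  { first_name: "Lucy",  last_name: "Laweless",   cats: 5 },
  { first_name: "Miles", last_name: "O'Brian",    fish: 21 },
  { first_name: "Nancy", last_name: "Homes",      dogs: 2, birds: 1 },
]

pet_columns = %i[dogs cats birds fish]

# Hypothetical helper: sum the pet columns, treating a missing key as 0.
total_pets = ->(row) { pet_columns.sum { |col| row.fetch(col, 0) } }

totals = pets_by_owner.to_h { |row| [row[:first_name], total_pets.call(row)] }
```

Using fetch with a default avoids nil arithmetic errors when a column was blank for a given row.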
Example 2: Parsing a CSV String
Use SmarterCSV.parse to parse a CSV string directly — no file needed. Useful in tests, API responses, or when the CSV arrives as a string in memory:
csv_string = <<~CSV
name,age,city
Alice,30,New York
Bob,25,Chicago
CSV
data = SmarterCSV.parse(csv_string)
# => [{name: "Alice", age: 30, city: "New York"}, {name: "Bob", age: 25, city: "Chicago"}]
See The Basic Read API and Migrating from Ruby CSV.
Example 3: Key Mapping and Column Selection
Rename headers and drop unwanted columns in one pass:
options = {
  key_mapping: {
    first_name: :fname,
    last_name:  :lname,
    dob:        :birth_date,
    ssn:        nil, # drop this column entirely
  },
}
data = SmarterCSV.process('people.csv', options)
# => [{fname: "Alice", lname: "Smith", birth_date: "1990-05-14"}, ...]
# ↑ :ssn is gone; original CSV headers remapped to your domain names
Keep only specific columns using headers: { only: }:
data = SmarterCSV.process('people.csv', headers: { only: [:name, :email] })
# => [{name: "Alice", email: "alice@example.com"}, ...]
See Header Transformations and Column Selection.
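To make the effect of key_mapping concrete, here is a plain-Ruby sketch of the renaming applied to a single row hash. It is not SmarterCSV's implementation: the apply_mapping lambda is hypothetical, and the sketch assumes that keys without a mapping entry pass through unchanged.

```ruby
# Same mapping as in the example above.
key_mapping = { first_name: :fname, last_name: :lname, dob: :birth_date, ssn: nil }

# Hypothetical stand-in for the renaming step, applied to one row.
apply_mapping = lambda do |row|
  row.each_with_object({}) do |(key, value), out|
    next if key_mapping.key?(key) && key_mapping[key].nil? # nil target drops the column
    out[key_mapping.fetch(key, key)] = value               # assumption: unmapped keys pass through
  end
end

row = {
  first_name: "Alice", last_name: "Smith", dob: "1990-05-14",
  ssn: "000-00-0000", email: "alice@example.com",
}
mapped = apply_mapping.call(row)
```

In this sketch, mapped contains :fname, :lname, :birth_date and :email, and :ssn is gone.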
Example 4: Encoding and Preamble Skip
Handle non-UTF-8 files and metadata rows before the header:
# Bank statement export: Windows-1252, 3 preamble rows, then header
data = SmarterCSV.process('statement.csv',
file_encoding: 'windows-1252',
skip_lines: 3)
# European lab instrument export: semicolon-separated, Latin-1
data = SmarterCSV.process('results.csv',
file_encoding: 'iso-8859-1',
col_sep: :auto) # :auto detects the semicolon
See Row and Column Separators and Real-World CSV Files.
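Why the encoding option matters can be seen with plain Ruby strings. The sketch below uses no SmarterCSV calls; it only shows that a byte sequence written as Windows-1252 is not valid UTF-8 until it is transcoded. The sample word is an illustrative assumption.

```ruby
# "café" as stored in a Windows-1252 (or Latin-1) file: é is the single byte 0xE9.
raw = "caf\xE9".b

# Interpreted as UTF-8, the lone 0xE9 byte is invalid.
valid_as_utf8 = raw.dup.force_encoding('UTF-8').valid_encoding?

# Declaring the real source encoding first makes a clean transcode possible.
utf8 = raw.dup.force_encoding('Windows-1252').encode('UTF-8')
```

After transcoding, the é occupies two bytes, so utf8 is five bytes long while raw is four.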
Example 5: Value Converters
Transform raw strings into typed values — dates, booleans, currency:
require 'date'
data = SmarterCSV.process('records.csv',
  value_converters: {
    # Parse US date format
    dob: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
    # Strip currency symbol and convert to Float
    price: ->(v) { v&.delete('$,')&.to_f },
    # Boolean from various representations
    active: ->(v) { v&.match?(/\Atrue\z/i) },
  })
data.first[:dob] # => #<Date: 1990-05-14>
data.first[:price] # => 44.5
data.first[:active] # => true
Combining with nil_values_matching to clean sentinel values before conversion:
data = SmarterCSV.process('export.csv',
nil_values_matching: /\A(N\/A|NULL|#N\/A)\z/i,
value_converters: {
score: ->(v) { v&.to_f }, # v is nil for N/A rows — guard with &.
})
See Value Converters.
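Since converters are ordinary lambdas, they can be exercised in isolation before being wired into an import. The sketch below reuses the :dob and :price converters from above. The wider :active converter (accepting true/yes/y/1) is a hypothetical variant, and calling to_s first is a defensive assumption in case the value arrives as a non-String.

```ruby
require 'date'

converters = {
  dob:    ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
  price:  ->(v) { v&.delete('$,')&.to_f },
  # Hypothetical wider boolean converter: true/yes/y/1, case-insensitive.
  active: ->(v) { v.nil? ? nil : v.to_s.match?(/\A(true|yes|y|1)\z/i) },
}

dob     = converters[:dob].call('05/14/1990')
price   = converters[:price].call('$1,044.50')
active  = converters[:active].call('Yes')
missing = converters[:price].call(nil) # nil stays nil thanks to &.
```

Keeping each converter nil-safe matters because blank or sentinel values may reach the lambda as nil.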
Example 6: Header Validation
Raise early if required columns are missing, before processing any data rows:
begin
  data = SmarterCSV.process('transactions.csv',
    required_keys: [:account_id, :amount, :currency])
rescue SmarterCSV::MissingKeys => e
  puts "CSV is missing required columns: #{e.keys.join(', ')}"
  # => "CSV is missing required columns: currency"
end
See Header Validations.
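Conceptually, the check amounts to a set difference between the required keys and the headers found in the file. The plain-Ruby sketch below illustrates that idea only; the header list is a hypothetical example, and this is not how the library itself is implemented.

```ruby
required = %i[account_id amount currency]
headers  = %i[account_id amount booked_at] # hypothetical header row of the file

# Keys that are required but not present in the header row.
missing = required - headers

message = "CSV is missing required columns: #{missing.join(', ')}"
```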
Example 7: Bad Row Handling
Collect parse errors without stopping the import:
reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
good_rows = reader.process
bad = reader.errors[:bad_rows]
puts "Imported #{good_rows.size} rows, #{bad.size} bad rows"
bad.each do |rec|
puts "Line #{rec[:file_line_number]}: #{rec[:error_message]}"
puts " Raw: #{rec[:raw_line]}"
end
Cap the number of tolerated bad rows and limit field sizes to guard against malformed input:
SmarterCSV.process('untrusted.csv',
on_bad_row: :skip,
bad_row_limit: 10,
field_size_limit: 4096)
See Bad Row Quarantine.
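Collected bad rows are often written somewhere for later inspection. A minimal sketch, assuming records shaped like the ones printed above (:file_line_number, :error_message, :raw_line): the sample records, their messages, and the quarantine path are made up for illustration.

```ruby
require 'json'
require 'tmpdir'

# Sample records in the shape used above; the contents are invented.
bad = [
  { file_line_number: 4, error_message: 'unclosed quote', raw_line: '"Alice,30' },
  { file_line_number: 9, error_message: 'extra column',   raw_line: 'Bob,25,Chicago,??' },
]

quarantine_path = File.join(Dir.mktmpdir, 'bad_rows.jsonl') # hypothetical location

# One JSON document per line keeps the file appendable and easy to grep.
File.open(quarantine_path, 'w') do |f|
  bad.each { |rec| f.puts(JSON.generate(rec)) }
end

reloaded = File.readlines(quarantine_path, chomp: true)
               .map { |line| JSON.parse(line, symbolize_names: true) }
```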
Example 8: Writing CSV
records = [
{ name: "Alice", age: 30, city: "New York" },
{ name: "Bob", age: 25, city: "Chicago" },
]
SmarterCSV.generate('output.csv') do |csv|
records.each { |r| csv << r }
end
# output.csv:
# name,age,city
# Alice,30,New York
# Bob,25,Chicago
Writing with header renaming and value converters:
require 'date'
SmarterCSV.generate('report.csv',
map_headers: { name: 'Full Name', dob: 'Date of Birth' },
value_converters: { dob: ->(v) { v&.strftime('%m/%d/%Y') } },
) do |csv|
User.find_each { |u| csv << { name: u.full_name, dob: u.dob } }
end
See The Basic Write API.
Example 9: Using each and each_chunk Enumerators
The modern API gives you full Enumerable power without loading the whole file:
# each — one hash per row
reader = SmarterCSV::Reader.new('data.csv')
reader.each { |hash| MyModel.upsert(hash) }
puts reader.headers.inspect # accessible after processing
# Enumerable methods
active_users = reader.select { |h| h[:status] == 'active' }
names = reader.map { |h| h[:name] }
# Lazy — stop early without reading the whole file
first_ten_active = reader.lazy.select { |h| h[:active] }.first(10)
# each_slice — manual batching without chunk_size
reader.each_slice(500) { |batch| MyModel.insert_all(batch) }
See Batch Processing and The Basic Read API.
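The early-exit behaviour of lazy can be checked without any CSV file. The sketch below substitutes a plain Enumerator for the reader (an assumption made purely for illustration) and counts how many rows are actually produced before first returns.

```ruby
rows_read = 0

# Stand-in for a row source: yields up to 1,000 hashes and counts each one produced.
rows = Enumerator.new do |yielder|
  1.upto(1_000) do |i|
    rows_read += 1
    yielder << { id: i, active: i.even? }
  end
end

# lazy pulls rows one at a time and stops as soon as three matches are found.
first_three = rows.lazy.select { |h| h[:active] }.first(3)
```

Only six rows are pulled from the source here, because the third active row is the sixth one produced.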
Example 10: Importing into a Database
filename = '/tmp/some.csv'
options = { key_mapping: { unwanted_row: nil, old_row_name: :new_name } }
n = SmarterCSV.process(filename, options) do |array|
MyModel.create(array.first)
end
# => returns number of rows processed
Example 11: Batch Processing with Sidekiq
Processing in chunks reduces memory usage and enables parallel processing. The block receives the chunk and, as an optional second parameter, the chunk index:
filename = '/tmp/input.csv'
n = SmarterCSV.process(filename, chunk_size: 100) do |chunk, chunk_index|
  puts "Queueing chunk #{chunk_index} with #{chunk.size} records..."
  Sidekiq::Client.push_bulk(
    'class' => SidekiqWorkerClass,
    'args' => chunk.map { |row| [row] }, # push_bulk expects one argument array per job
  )
end
# => returns number of chunks
See Batch Processing.
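Sidekiq's push_bulk expects 'args' to be an array of argument arrays, one per job, and job arguments travel as JSON. A minimal plain-Ruby sketch of shaping a chunk accordingly: the sample chunk is invented, and converting keys to strings is a suggested precaution rather than something the text above requires.

```ruby
# A chunk as the block would receive it: an array of row hashes.
chunk = [
  { name: "Alice", age: 30 },
  { name: "Bob",   age: 25 },
]

# One argument list per job; string keys so each hash looks the same
# before and after JSON serialization.
bulk_args = chunk.map { |row| [row.transform_keys(&:to_s)] }
```

Each worker then receives a single hash argument with string keys.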
Example 12: Resumable CSV Import with Rails ActiveJob (Rails 8.1+)
Rails 8.1 introduced ActiveJob::Continuable, which lets a job pause and resume from exactly where it stopped — for example during a deployment or queue drain.
# app/jobs/import_csv_job.rb
class ImportCsvJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(file_path)
    step :import_rows do |step|
      SmarterCSV.process(file_path, chunk_size: 500) do |chunk, chunk_index|
        next if chunk_index < step.cursor.to_i # skip already-processed chunks on resume
        MyModel.import!(chunk)
        step.set! chunk_index + 1
      end
    end
  end
end
- step.cursor starts as nil (→ 0), so the first run processes all chunks.
- If interrupted after chunk 7, Rails persists the cursor as 8.
- On the next run chunks 0–7 are skipped quickly via next; processing resumes from chunk 8.
Requires Rails 8.1+ and a queue adapter that supports graceful shutdown (Sidekiq, Solid Queue).
Example 13: Instrumentation
SmarterCSV.process('large_import.csv',
  chunk_size: 1000,
  on_start: ->(info) {
    Rails.logger.info "Import started: #{info[:input]} (#{info[:file_size]} bytes)"
  },
  on_chunk: ->(info) {
    Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows"
  },
  on_complete: ->(stats) {
    Rails.logger.info "Done: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s"
  },
) { |chunk| MyModel.insert_all(chunk) }
Example 14: Streaming Inputs (Non-Seekable IO)
(1.17.0+) SmarterCSV reads from gzipped files, HTTP responses, S3 objects, or piped STDIN — no need to materialize the file on disk first.
require 'zlib'
Zlib::GzipReader.open('huge.csv.gz') do |io|
SmarterCSV.process(io) { |row| MyModel.upsert(row.first) }
end
See Real-World CSV Files → I/O Patterns for gzip, S3, HTTP, STDIN, and IO.popen worked examples.
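To try the gzip example locally, a small fixture can be generated with the standard library. The sketch below uses only Zlib; the file name and contents are hypothetical. It writes a compressed CSV and reads it back line by line, the same kind of forward-only stream that would be handed to SmarterCSV.process.

```ruby
require 'zlib'
require 'tmpdir'

path = File.join(Dir.mktmpdir, 'sample.csv.gz') # hypothetical fixture path

# Write a tiny gzipped CSV.
Zlib::GzipWriter.open(path) do |gz|
  gz.puts 'name,age'
  gz.puts 'Alice,30'
  gz.puts 'Bob,25'
end

# Read it back as a forward-only stream, one line at a time.
lines = []
Zlib::GzipReader.open(path) do |io|
  io.each_line { |line| lines << line.chomp }
end
```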
Example 15: Resumable Import (Plain Ruby)
A non-Rails counterpart to Example 12 — track the chunk cursor in a JSON file so an interrupted import resumes where it left off.
See Batch Processing → Resumable Import (Plain Ruby) for the worked example.
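The cursor idea can be sketched independently of CSV parsing. The version below is an assumption-laden illustration, not the worked example referenced above: the cursor file location, the 'next_chunk' key, and the lambdas are hypothetical, and an in-memory array of slices stands in for the chunks a reader would yield.

```ruby
require 'json'
require 'tmpdir'

cursor_path = File.join(Dir.mktmpdir, 'import_cursor.json') # hypothetical location

# Read the index of the next chunk to process; 0 when no cursor file exists yet.
load_cursor = lambda do
  File.exist?(cursor_path) ? JSON.parse(File.read(cursor_path)).fetch('next_chunk', 0) : 0
end
save_cursor = ->(n) { File.write(cursor_path, JSON.generate({ 'next_chunk' => n })) }

chunks    = (1..10).each_slice(3).to_a # stand-in for CSV chunks (4 chunks)
processed = []

run_import = lambda do |fail_at: nil|
  start = load_cursor.call
  chunks.each_with_index do |_chunk, index|
    next if index < start                      # already handled in an earlier run
    raise 'simulated crash' if index == fail_at
    processed << index                         # real work would happen here
    save_cursor.call(index + 1)                # persist only after the chunk succeeds
  end
end

begin
  run_import.call(fail_at: 2) # first run dies before chunk 2
rescue RuntimeError
  # interruption: the cursor file still says "next_chunk: 2"
end
run_import.call               # second run resumes at chunk 2
```

Saving the cursor only after a chunk succeeds means an interrupted chunk is retried rather than skipped, so the per-chunk work should be idempotent.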
Example 16: CSV Files with Comment Lines
Strip lines matching a pattern (e.g. #-prefixed comments in DB dumps and log exports) using comment_regexp:
SmarterCSV.process('data.csv', comment_regexp: /\A#/)
See Header Transformations → CSV Files with Comment Lines for the worked example.
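Which lines a pattern like /\A#/ matches can be checked in plain Ruby. This sketch only exercises the regular expression against invented sample lines; how SmarterCSV applies it internally is not shown here.

```ruby
comment_regexp = /\A#/

lines = [
  "# exported by nightly job",
  "name,age",
  "Alice,30",
  "  # indented note",
  "Bob,25 # trailing",
]

# Keep only the lines the pattern does not match.
kept = lines.reject { |line| line.match?(comment_regexp) }

# A wider, hypothetical pattern that also catches indented comment lines.
kept_wide = lines.reject { |line| line.match?(/\A\s*#/) }
```

With /\A#/, only lines that start with # in the first column match; the indented note and the trailing # both survive.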
Example 17: Tab-Separated Values (TSV)
SmarterCSV.process('data.tsv') # auto-detected
SmarterCSV.process('data.tsv', col_sep: "\t") # explicit
See Row and Column Separators → Tab-Separated Values (TSV) for details.
Example 18: Multi-Line Fields
Newlines inside "..." are preserved as part of the field — common in addresses, CRM notes, and free-text comments. No configuration needed.
See Real-World CSV Files → Multi-Line Quoted Fields for the worked example.
Example 19: Filtering and Transforming a CSV File
The Ruby CSV library has CSV.filter for "read CSV, mutate each row, write CSV." In SmarterCSV this is a two-line composition of SmarterCSV.each and SmarterCSV.generate:
SmarterCSV.generate('out.csv') do |csv|
SmarterCSV.each('in.csv') do |row|
row[:price] = (row[:price] * 1.1).round(2)
row.delete(:internal_notes)
csv << row
end
end
The explicit csv << row is the win over CSV.filter — emission is intentional, not a side effect of mutating the block argument.
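Because blank columns are omitted from row hashes, row[:price] can be nil for some rows, and multiplying nil raises NoMethodError. A minimal sketch of a nil-safe version of the transformation follows; the reprice lambda and the sample rows are hypothetical.

```ruby
# Hypothetical helper: apply the price factor only when a numeric price is present.
reprice = lambda do |row, factor|
  row = row.dup # leave the caller's hash untouched
  row[:price] = (row[:price] * factor).round(2) if row[:price].is_a?(Numeric)
  row.delete(:internal_notes)
  row
end

rows = [
  { sku: "A1", price: 10, internal_notes: "check margin" },
  { sku: "B2" },               # blank price: the key is absent
  { sku: "C3", price: 19.99 },
]

out = rows.map { |row| reprice.call(row, 1.1) }
```

Inside the filter loop, csv << reprice.call(row, 1.1) would then replace the in-place mutation.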
Pipeline (STDIN → STDOUT)
# cat in.csv | ruby filter.rb > out.csv
SmarterCSV.generate($stdout) do |csv|
SmarterCSV.each($stdin) { |row| csv << row }
end
Skipping rows
SmarterCSV.generate('out.csv') do |csv|
SmarterCSV.each('in.csv') do |row|
next if row[:status] == 'archived' # just skip — no emit
csv << row
end
end
Compressed in, compressed out
require 'zlib'
Zlib::GzipWriter.open('out.csv.gz') do |gz_out|
  SmarterCSV.generate(gz_out) do |csv|
    Zlib::GzipReader.open('in.csv.gz') do |gz_in|
      SmarterCSV.each(gz_in) { |row| csv << row }
    end
  end
end
Both endpoints are non-seekable streams — a pattern CSV.filter cannot handle, since it requires seekable input/output.
Header renaming on the way through
SmarterCSV.generate('out.csv', headers: [:given_name, :family_name, :email]) do |csv|
SmarterCSV.each('in.csv',
key_mapping: { first_name: :given_name, last_name: :family_name }
) { |row| csv << row }
end
Use key_mapping: on the read side to rename columns and headers: on the write side to enforce output column order.