Examples

May 14, 2026

Rescue from SmarterCSV::Error (recommended): SmarterCSV auto-detects row and column separators. In rare cases detection fails and raises an exception (e.g. NoColSepDetected). Rescuing from SmarterCSV::Error ensures your application handles unexpected CSV formats gracefully.

CSV → Array of Hashes
Parsing a CSV String
Key Mapping and Column Selection
Encoding and Preamble Skip
Value Converters
Header Validation
Bad Row Handling
Writing CSV
Using each and each_chunk Enumerators
Importing into a Database
Batch Processing with Sidekiq
Resumable CSV Import with Rails ActiveJob
Instrumentation
Streaming Inputs (Non-Seekable IO)
Resumable Import (Plain Ruby)
CSV Files with Comment Lines
Tab-Separated Values (TSV)
Multi-Line Fields
Filtering and Transforming a CSV File

Example 1: CSV → Array of Hashes

Each hash only contains keys for columns with non-nil, non-empty values — columns with blank entries are omitted automatically:

$ cat pets.csv
first name,last name,dogs,cats,birds,fish
Dan,McAllister,2,,,
Lucy,Laweless,,5,,
Miles,O'Brian,,,,21
Nancy,Homes,2,,1,

$ irb
> require 'smarter_csv'
> pets_by_owner = SmarterCSV.process('pets.csv')
 => [ {first_name: "Dan",   last_name: "McAllister", dogs: 2},
      {first_name: "Lucy",  last_name: "Laweless",   cats: 5},
      {first_name: "Miles", last_name: "O'Brian",    fish: 21},
      {first_name: "Nancy", last_name: "Homes",      dogs: 2, birds: 1}
    ]
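
To make the "blank columns are omitted" rule concrete, here is a rough plain-Ruby approximation of how one line of pets.csv maps to the hash shown above. It is a sketch, not SmarterCSV's implementation: `row_to_hash` is a hypothetical helper, it uses a naive `split(',')` that ignores quoting, and it only converts unsigned integers.

```ruby
# Plain-Ruby approximation of the result above; NOT SmarterCSV's implementation.
# Naive split(','): quoted fields that contain commas are not handled.
def row_to_hash(header_line, data_line)
  keys = header_line.chomp.split(',').map do |header|
    header.strip.downcase.gsub(/\s+/, '_').to_sym   # "first name" -> :first_name
  end
  values = data_line.chomp.split(',', -1)           # -1 keeps trailing empty fields

  keys.zip(values).each_with_object({}) do |(key, value), hash|
    next if value.nil? || value.strip.empty?        # blank column: omit the key
    hash[key] = value.match?(/\A\d+\z/) ? value.to_i : value
  end
end

header_line = 'first name,last name,dogs,cats,birds,fish'

row_to_hash(header_line, 'Dan,McAllister,2,,,')
# => {first_name: "Dan", last_name: "McAllister", dogs: 2}
row_to_hash(header_line, 'Nancy,Homes,2,,1,')
# => {first_name: "Nancy", last_name: "Homes", dogs: 2, birds: 1}
```

Because absent keys simply return nil, downstream code can use `row[:cats]` or `row.fetch(:cats, 0)` without checking for empty strings.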

Example 2: Parsing a CSV String

Use SmarterCSV.parse to parse a CSV string directly — no file needed. Useful in tests, API responses, or when the CSV arrives as a string in memory:

csv_string = <<~CSV
  name,age,city
  Alice,30,New York
  Bob,25,Chicago
CSV

data = SmarterCSV.parse(csv_string)
# => [{name: "Alice", age: 30, city: "New York"}, {name: "Bob", age: 25, city: "Chicago"}]

See The Basic Read API and Migrating from Ruby CSV.

Example 3: Key Mapping and Column Selection

Rename headers and drop unwanted columns in one pass:

options = {
  key_mapping: {
    first_name: :fname,
    last_name:  :lname,
    dob:        :birth_date,
    ssn:        nil,          # drop this column entirely
  },
}
data = SmarterCSV.process('people.csv', options)
# => [{fname: "Alice", lname: "Smith", birth_date: "1990-05-14"}, ...]
#  ↑ :ssn is gone; original CSV headers remapped to your domain names
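
The effect of such a mapping on a single row can be sketched on a plain Hash. This is an illustration of the semantics only, not the library's code: `apply_key_mapping` is a hypothetical helper, and it assumes that columns absent from the mapping pass through unchanged.

```ruby
# Illustration only: mimics the renaming and dropping shown above on a plain Hash.
# Assumption: keys that are not listed in the mapping keep their original name.
def apply_key_mapping(row, mapping)
  row.each_with_object({}) do |(key, value), out|
    new_key = mapping.fetch(key, key)          # unmapped keys keep their name
    out[new_key] = value unless new_key.nil?   # nil target: drop the column
  end
end

mapping = { first_name: :fname, last_name: :lname, dob: :birth_date, ssn: nil }
row     = { first_name: 'Alice', last_name: 'Smith', dob: '1990-05-14',
            ssn: '000-00-0000', email: 'alice@example.com' }

apply_key_mapping(row, mapping)
# => {fname: "Alice", lname: "Smith", birth_date: "1990-05-14", email: "alice@example.com"}
```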

Keep only specific columns using headers: { only: }:

data = SmarterCSV.process('people.csv', headers: { only: [:name, :email] })
# => [{name: "Alice", email: "alice@example.com"}, ...]

See Header Transformations and Column Selection.

Example 4: Encoding and Preamble Skip

Handle non-UTF-8 files and metadata rows before the header:

# Bank statement export: Windows-1252, 3 preamble rows, then header
data = SmarterCSV.process('statement.csv',
  file_encoding: 'windows-1252',
  skip_lines:    3)

# European lab instrument export: semicolon-separated, Latin-1
data = SmarterCSV.process('results.csv',
  file_encoding: 'iso-8859-1',
  col_sep:       :auto)   # :auto detects the semicolon

See Row and Column Separators and Real-World CSV Files.

Example 5: Value Converters

Transform raw strings into typed values — dates, booleans, currency:

require 'date'

data = SmarterCSV.process('records.csv',
  value_converters: {
    # Parse US date format
    dob:    ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },

    # Strip currency symbol and convert to Float
    price:  ->(v) { v&.delete('$,')&.to_f },

    # Boolean: true only for "true" (any letter case); any other non-nil value becomes false
    active: ->(v) { v&.match?(/\Atrue\z/i) },
  })

data.first[:dob]    # => #<Date: 1990-05-14>
data.first[:price]  # => 44.5
data.first[:active] # => true
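
Since the converters are ordinary lambdas, they can be checked without a CSV file. The sketch below collects the same three lambdas in a constant (the name `CONVERTERS` is only for illustration) and calls them with sample inputs, including nil.

```ruby
require 'date'

# Same lambdas as in the example above, named so they can be tested and reused.
CONVERTERS = {
  dob:    ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
  price:  ->(v) { v&.delete('$,')&.to_f },
  active: ->(v) { v&.match?(/\Atrue\z/i) },
}.freeze

CONVERTERS[:dob].call('05/14/1990')    # => Date for 1990-05-14
CONVERTERS[:dob].call(nil)             # => nil
CONVERTERS[:price].call('$1,044.50')   # => 1044.5
CONVERTERS[:price].call(nil)           # => nil
CONVERTERS[:active].call('TRUE')       # => true
CONVERTERS[:active].call('yes')        # => false (only "true" matches)
CONVERTERS[:active].call(nil)          # => nil
```

The same hash could then be passed as `value_converters: CONVERTERS`, keeping the conversion rules in one tested place.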

Combining with nil_values_matching to clean sentinel values before conversion:

data = SmarterCSV.process('export.csv',
  nil_values_matching: /\A(N\/A|NULL|#N\/A)\z/i,
  value_converters: {
    score: ->(v) { v&.to_f },   # v is nil for N/A rows — guard with &.
  })

See Value Converters.

Example 6: Header Validation

Raise early if required columns are missing, before processing any data rows:

begin
  data = SmarterCSV.process('transactions.csv',
    required_keys: [:account_id, :amount, :currency])
rescue SmarterCSV::MissingKeys => e
  puts "CSV is missing required columns: #{e.keys.join(', ')}"
  # => "CSV is missing required columns: currency"
end

See Header Validations.

Example 7: Bad Row Handling

Collect parse errors without stopping the import:

reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
good_rows = reader.process

bad = reader.errors[:bad_rows]
puts "Imported #{good_rows.size} rows, #{bad.size} bad rows"
bad.each do |rec|
  puts "Line #{rec[:file_line_number]}: #{rec[:error_message]}"
  puts "  Raw: #{rec[:raw_line]}"
end

Cap the number of tolerated bad rows and limit field sizes to guard against malformed input:

SmarterCSV.process('untrusted.csv',
  on_bad_row:       :skip,
  bad_row_limit:    10,
  field_size_limit: 4096)

See Bad Row Quarantine.

Example 8: Writing CSV

records = [
  { name: "Alice", age: 30, city: "New York" },
  { name: "Bob",   age: 25, city: "Chicago"  },
]

SmarterCSV.generate('output.csv') do |csv|
  records.each { |r| csv << r }
end
# output.csv:
# name,age,city
# Alice,30,New York
# Bob,25,Chicago

Writing with header renaming and value converters:

require 'date'

SmarterCSV.generate('report.csv',
  map_headers:      { name: 'Full Name', dob: 'Date of Birth' },
  value_converters: { dob: ->(v) { v&.strftime('%m/%d/%Y') } },
) do |csv|
  User.find_each { |u| csv << { name: u.full_name, dob: u.dob } }
end

See The Basic Write API.

Example 9: Using `each` and `each_chunk` Enumerators

The modern API gives you full Enumerable power without loading the whole file:

# each — one hash per row
reader = SmarterCSV::Reader.new('data.csv')
reader.each { |hash| MyModel.upsert(hash) }
puts reader.headers.inspect   # accessible after processing

# Enumerable methods
active_users = reader.select { |h| h[:status] == 'active' }
names        = reader.map    { |h| h[:name] }

# Lazy — stop early without reading the whole file
first_ten_active = reader.lazy.select { |h| h[:active] }.first(10)

# each_slice — manual batching without chunk_size
reader.each_slice(500) { |batch| MyModel.insert_all(batch) }

See Batch Processing and The Basic Read API.

Example 10: Importing into a Database

filename = '/tmp/some.csv'
options = { key_mapping: { unwanted_row: nil, old_row_name: :new_name } }

n = SmarterCSV.process(filename, options) do |array|
  MyModel.create(array.first)
end
# => returns number of rows processed

Example 11: Batch Processing with Sidekiq

Processing in chunks reduces memory usage and enables parallel processing. The block receives the chunk and, as an optional second parameter, the chunk index:

filename = '/tmp/input.csv'

n = SmarterCSV.process(filename, chunk_size: 100) do |chunk, chunk_index|
  puts "Queueing chunk #{chunk_index} with #{chunk.size} records..."
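  Sidekiq::Client.push_bulk(
    'class' => SidekiqWorkerClass,
    # push_bulk expects one argument array per job; string keys keep the payload JSON-safe
    'args'  => chunk.map { |row| [row.transform_keys(&:to_s)] },
  )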
end
# => returns number of chunks

See Batch Processing.

Example 12: Resumable CSV Import with Rails ActiveJob (Rails 8.1+)

Rails 8.1 introduced ActiveJob::Continuable, which lets a job pause and resume from exactly where it stopped — for example during a deployment or queue drain.

# app/jobs/import_csv_job.rb
class ImportCsvJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(file_path)
    step :import_rows do |step|
      SmarterCSV.process(file_path, chunk_size: 500) do |chunk, chunk_index|
        next if chunk_index < step.cursor.to_i  # skip already-processed chunks on resume

        MyModel.import!(chunk)
        step.set! chunk_index + 1
      end
    end
  end
end

step.cursor starts as nil (→ 0), so the first run processes all chunks.
If interrupted after chunk 7, Rails persists the cursor as 8.
On the next run chunks 0–7 are skipped quickly via next; processing resumes from chunk 8.

Requires Rails 8.1+ and a queue adapter that supports graceful shutdown (Sidekiq, Solid Queue).

Example 13: Instrumentation

SmarterCSV.process('large_import.csv',
  chunk_size: 1000,

  on_start: ->(info) {
    Rails.logger.info "Import started: #{info[:input]} (#{info[:file_size]} bytes)"
  },

  on_chunk: ->(info) {
    Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows"
  },

  on_complete: ->(stats) {
    Rails.logger.info "Done: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s"
  },
) { |chunk| MyModel.insert_all(chunk) }
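
Outside Rails, the same hooks can be built around any logger. The sketch below uses Ruby's standard Logger and exercises the lambdas by hand; `build_hooks` is a hypothetical helper, and the payloads are made-up sample values that only reuse the keys from the example above.

```ruby
require 'logger'
require 'stringio'

# Sketch: the hooks are ordinary lambdas, so they can be built once and
# exercised by hand. build_hooks is a hypothetical helper, not a library API.
def build_hooks(logger)
  {
    on_start:    ->(info)  { logger.info("Import started: #{info[:input]} (#{info[:file_size]} bytes)") },
    on_chunk:    ->(info)  { logger.debug("Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows") },
    on_complete: ->(stats) { logger.info("Done: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s") },
  }
end

log_output = StringIO.new
hooks = build_hooks(Logger.new(log_output))

# Made-up payloads, standing in for what an import would report.
hooks[:on_start].call({ input: 'large_import.csv', file_size: 2048 })
hooks[:on_chunk].call({ chunk_number: 1, rows_in_chunk: 1000 })
hooks[:on_complete].call({ total_rows: 1000, duration: 0.123456 })

puts log_output.string
```

Assuming the options are accepted as an ordinary hash, as in the call above, the result could presumably be merged into the other options, for example `SmarterCSV.process(path, chunk_size: 1000, **build_hooks(logger))`.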

See Instrumentation Hooks.

Example 14: Streaming Inputs (Non-Seekable IO)

(1.17.0+) SmarterCSV reads from gzipped files, HTTP responses, S3 objects, or piped STDIN — no need to materialize the file on disk first.

require 'zlib'
Zlib::GzipReader.open('huge.csv.gz') do |io|
  SmarterCSV.process(io) { |row| MyModel.upsert(row.first) }
end

See Real-World CSV Files → I/O Patterns for gzip, S3, HTTP, STDIN, and IO.popen worked examples.

Example 15: Resumable Import (Plain Ruby)

A non-Rails counterpart to Example 12 — track the chunk cursor in a JSON file so an interrupted import resumes where it left off.

See Batch Processing → Resumable Import (Plain Ruby) for the worked example.
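
As a rough idea of the shape such a cursor can take, here is a minimal sketch using only the standard library. The helper names, the `next_chunk` JSON key, and the choice to delete the cursor file on completion are all assumptions made for illustration; the skip logic mirrors Example 12.

```ruby
require 'json'
require 'tmpdir'

# Sketch of a file-backed chunk cursor; all names here are hypothetical.
def load_cursor(path)
  File.exist?(path) ? JSON.parse(File.read(path)).fetch('next_chunk', 0) : 0
end

def save_cursor(path, next_chunk)
  tmp = "#{path}.tmp"
  File.write(tmp, JSON.generate('next_chunk' => next_chunk))
  File.rename(tmp, path)   # rename so a crash never leaves a half-written cursor
end

# `chunks` is any enumerable of row arrays; the block does the real work.
def process_resumably(chunks, cursor_path)
  start_at = load_cursor(cursor_path)
  chunks.each_with_index do |chunk, chunk_index|
    next if chunk_index < start_at              # finished in an earlier run
    yield chunk, chunk_index
    save_cursor(cursor_path, chunk_index + 1)   # persist only after success
  end
  File.delete(cursor_path) if File.exist?(cursor_path)   # done: clear the cursor
end

Dir.mktmpdir do |dir|
  chunks = (1..10).each_slice(4)   # stand-in for CSV chunks of 4, 4, and 2 rows
  process_resumably(chunks, File.join(dir, 'import_cursor.json')) do |chunk, chunk_index|
    puts "chunk #{chunk_index}: #{chunk.size} rows"
  end
end
```

With SmarterCSV, the same load, skip, and save steps would sit inside the `SmarterCSV.process(path, chunk_size: 500) do |chunk, chunk_index|` block. Saving the cursor only after the block returns means an interrupted chunk is retried, so the per-chunk work should be idempotent.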

Example 16: CSV Files with Comment Lines

Strip lines matching a pattern (e.g. #-prefixed comments in DB dumps and log exports) using comment_regexp:

SmarterCSV.process('data.csv', comment_regexp: /\A#/)

See Header Transformations → CSV Files with Comment Lines for the worked example.

Example 17: Tab-Separated Values (TSV)

SmarterCSV.process('data.tsv')                  # auto-detected
SmarterCSV.process('data.tsv', col_sep: "\t")   # explicit

See Row and Column Separators → Tab-Separated Values (TSV) for details.

Example 18: Multi-Line Fields

Newlines inside "..." are preserved as part of the field — common in addresses, CRM notes, and free-text comments. No configuration needed.

See Real-World CSV Files → Multi-Line Quoted Fields for the worked example.

Example 19: Filtering and Transforming a CSV File

The Ruby CSV library has CSV.filter for "read CSV, mutate each row, write CSV." In SmarterCSV this is a two-line composition of SmarterCSV.each and SmarterCSV.generate:

SmarterCSV.generate('out.csv') do |csv|
  SmarterCSV.each('in.csv') do |row|
    row[:price] = (row[:price] * 1.1).round(2)
    row.delete(:internal_notes)
    csv << row
  end
end

The explicit csv << row is the win over CSV.filter — emission is intentional, not a side effect of mutating the block argument.

Pipeline (STDIN → STDOUT)

# cat in.csv | ruby filter.rb > out.csv
SmarterCSV.generate($stdout) do |csv|
  SmarterCSV.each($stdin) { |row| csv << row }
end

Skipping rows

SmarterCSV.generate('out.csv') do |csv|
  SmarterCSV.each('in.csv') do |row|
    next if row[:status] == 'archived'   # just skip — no emit
    csv << row
  end
end
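
When the per-row logic grows, it may help to pull it out of the block so it can be tested on plain hashes. A minimal sketch, where `transform_row` is a hypothetical helper and a nil return value means "skip this row":

```ruby
# Sketch: per-row logic from the examples above as a standalone method.
# transform_row is a hypothetical name; returning nil means "do not emit".
def transform_row(row)
  return nil if row[:status] == 'archived'

  row = row.dup                                        # leave the caller's hash untouched
  row[:price] = (row[:price] * 1.1).round(2) if row[:price]
  row.delete(:internal_notes)
  row
end

transform_row({ name: 'Widget', price: 10.0, internal_notes: 'check stock' })
# => {name: "Widget", price: 11.0}
transform_row({ name: 'Old', price: 5.0, status: 'archived' })
# => nil
```

Inside the read loop, the body then reduces to `out = transform_row(row)` followed by `csv << out if out`.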

Compressed in, compressed out

require 'zlib'
Zlib::GzipWriter.open('out.csv.gz') do |gz_out|
  SmarterCSV.generate(gz_out) do |csv|
    Zlib::GzipReader.open('in.csv.gz') do |gz_in|
      SmarterCSV.each(gz_in) { |row| csv << row }
    end
  end
end

Both endpoints are non-seekable streams, so rows flow from the gzip reader to the gzip writer without an uncompressed copy ever touching disk.

Header renaming on the way through

SmarterCSV.generate('out.csv', headers: [:given_name, :family_name, :email]) do |csv|
  SmarterCSV.each('in.csv',
    key_mapping: { first_name: :given_name, last_name: :family_name }
  ) { |row| csv << row }
end

Use key_mapping: on the read side to rename columns and headers: on the write side to enforce output column order.
