Data Transformations

June 19, 2026


SmarterCSV automatically normalizes the values in each row. All transformations are configurable — most are enabled by default because they're the right behavior for the vast majority of CSV files.

Transformation Pipeline

Transformations run in this order for every row:

| Step | Option | Default | What it does |
|------|--------|---------|--------------|
| 1 | strip_whitespace | true | Strips leading/trailing whitespace from all values (and headers) at parse time |
| 2 | nil_values_matching | nil | Sets values matching the regexp to nil |
| 3 | remove_empty_values | true | Removes keys whose value is nil or blank |
| 4 | remove_zero_values | false | Removes keys whose value is numeric zero |
| 5 | convert_values_to_numeric | true | Converts numeric-looking strings to Integer or Float |
| 6 | value_converters | nil | Applies per-key custom converter lambdas or classes |
| 7 | remove_empty_hashes | true | Drops rows that are entirely empty after all transformations |

Steps 2–6 run on each field, in order. Because value_converters receive the value after numeric conversion, a converter that expects a String must guard against receiving an Integer or Float.
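
The ordering of steps 2–6 can be sketched in plain Ruby. This is an illustration of the documented order only, not SmarterCSV's implementation: `transform_row`, its keyword arguments, and the simplified numeric patterns are all hypothetical.

```ruby
# Illustrative only: mirrors the documented order of steps 2-6 for one row.
# `transform_row` and its keywords are hypothetical, not part of SmarterCSV.
INTEGER_RE = /\A[+-]?\d+\z/
FLOAT_RE   = /\A[+-]?\d+\.\d+\z/
ZERO_RE    = /\A[+-]?0+(\.0+)?\z/

def transform_row(row, nil_matching: nil, remove_empty: true,
                  remove_zero: false, to_numeric: true, converters: {})
  row.each_with_object({}) do |(key, value), out|
    # Step 2: values matching the regexp become nil
    value = nil if nil_matching && value.is_a?(String) && value.match?(nil_matching)
    # Step 3: drop nil or empty values
    next if remove_empty && (value.nil? || value.empty?)
    # Step 4: drop zeros (the value is still a String at this point)
    next if remove_zero && value.is_a?(String) && value.match?(ZERO_RE)
    # Step 5: numeric-looking strings become Integer or Float
    if to_numeric && value.is_a?(String)
      if value.match?(INTEGER_RE)
        value = Integer(value, 10)
      elsif value.match?(FLOAT_RE)
        value = Float(value)
      end
    end
    # Step 6: per-key converter sees the already-converted value
    value = converters[key].call(value) if converters.key?(key)
    out[key] = value
  end
end

row = { name: "Alice", score: "42", discount: "0", notes: "", status: "N/A" }
result = transform_row(row, nil_matching: /\AN\/A\z/, remove_zero: true,
                       converters: { score: ->(v) { v * 2 } })
# => {name: "Alice", score: 84}
```

Note that the :score converter receives the Integer 42, not the String "42", which is why `v * 2` yields 84 rather than "4242".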


strip_whitespace

Default: true

Strips leading and trailing whitespace from all header names and all field values at parse time, before any other transformation runs.

# CSV with padded values:
# name,  score
# Alice ,  42
# Bob   ,  0

data = SmarterCSV.process(file)
# => [{name: "Alice", score: 42}, {name: "Bob", score: 0}]
#  ↑ "Alice " stripped to "Alice", "  42" stripped to "42" then converted

data = SmarterCSV.process(file, strip_whitespace: false)
# => [{"name"=>"Alice ", " score"=>"  42"}, ...]
#  ↑ whitespace preserved in both headers and values

nil_values_matching

Default: nil (disabled)

Set values matching the given regular expression to nil. Combined with the default remove_empty_values: true, matching values are removed from the result hash. With remove_empty_values: false, the key is retained with a nil value — useful when you need to distinguish "field was absent" from "field had a sentinel value".

# Treat common null sentinels as nil and remove them
data = SmarterCSV.process(file, nil_values_matching: /\A(NULL|N\/A|NA|#N\/A|\(null\))\z/i)

# Nil-ify but retain the key (don't remove)
data = SmarterCSV.process(file,
  nil_values_matching: /\A(NULL|N\/A)\z/i,
  remove_empty_values: false)
# => [{name: "Alice", score: nil}]  ← key retained with nil value

# Remove Excel error values
data = SmarterCSV.process(file, nil_values_matching: /\A(#VALUE!|#REF!|#DIV\/0!|NaN)\z/)

Deprecated: remove_values_matching: still works but emits a deprecation warning. Use nil_values_matching: instead.
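
The patterns above are anchored with \A and \z so that only whole-field sentinels match. A pattern can be checked in plain Ruby before it is passed as nil_values_matching; the sketch below involves no SmarterCSV call, and the sample values are made up for illustration.

```ruby
# Same pattern as in the first example above.
NULL_PATTERN = /\A(NULL|N\/A|NA|#N\/A|\(null\))\z/i

samples = ["NULL", "null", "n/a", "#N/A", "(null)", "NATO", "N/A pending", ""]
matches = samples.select { |value| value.match?(NULL_PATTERN) }
# => ["NULL", "null", "n/a", "#N/A", "(null)"]

# Without anchors, ordinary data that merely contains a sentinel would match.
UNANCHORED = /NULL|NA/i
"NATO".match?(UNANCHORED)    # => true
"NATO".match?(NULL_PATTERN)  # => false
```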


remove_empty_values

Default: true

Removes key/value pairs where the value is nil or an empty string after strip_whitespace and nil_values_matching have run. This is why SmarterCSV result hashes only contain keys with actual values — sparse CSV rows don't produce hashes cluttered with nil entries.

# CSV: name,score,notes
#      Alice,42,
#      Bob,,great player

data = SmarterCSV.process(file)
# => [{name: "Alice", score: 42}, {name: "Bob", notes: "great player"}]
#  ↑ empty :notes and :score keys are dropped automatically

data = SmarterCSV.process(file, remove_empty_values: false)
# => [{name: "Alice", score: 42, notes: nil}, {name: "Bob", score: nil, notes: "great player"}]

remove_zero_values

Default: false

When enabled, removes key/value pairs where the value is numeric zero (0, 0.0, "0", "0.0"). Useful when zero and absent mean the same thing in your domain.

# CSV: product,quantity,discount
#      Widget,10,0
#      Gadget,0,5

data = SmarterCSV.process(file, remove_zero_values: true)
# => [{product: "Widget", quantity: 10}, {product: "Gadget", discount: 5}]
#  ↑ :discount=>0 and :quantity=>0 removed

convert_values_to_numeric

Default: true

Converts string values that look like integers or floats to the appropriate numeric type. This is one of the most common sources of silent data loss if not configured carefully — fields like ZIP codes, phone numbers, and account numbers with leading zeros will be silently corrupted if not excluded.

data = SmarterCSV.process(file)
# "42"     => 42    (Integer)
# "3.14"   => 3.14  (Float)
# "01234"  => 1234  ← leading zero lost! exclude this column

# Exclude specific columns from numeric conversion
data = SmarterCSV.process(file,
  convert_values_to_numeric: { except: [:zip, :phone, :account_number] })
# => [{zip: "01234", phone: "800-555-0100", amount: 99.99}]

# Only convert specific columns (all others stay as strings)
data = SmarterCSV.process(file,
  convert_values_to_numeric: { only: [:quantity, :price] })
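
One way to decide which columns belong in the except: list is to scan unconverted rows for leading zeros. The sketch below is a hypothetical helper, not part of SmarterCSV; it assumes the rows are hashes of raw strings, such as those from a run with convert_values_to_numeric: false.

```ruby
# Hypothetical helper: find columns whose raw values start with a zero
# followed by more digits (ZIP codes, account numbers, ...).
LEADING_ZERO = /\A[+-]?0\d+\z/

def leading_zero_columns(rows)
  rows.flat_map do |row|
    row.select { |_key, value| value.is_a?(String) && value.match?(LEADING_ZERO) }.keys
  end.uniq
end

raw_rows = [
  { zip: "01234", quantity: "10", price: "0.50" },
  { zip: "98101", quantity: "0",  price: "12.00" },
]
keep_as_strings = leading_zero_columns(raw_rows)
# => [:zip]

# The result could then be passed on a second run:
#   SmarterCSV.process(file, convert_values_to_numeric: { except: keep_as_strings })
```

A plain "0" and decimals such as "0.50" are not flagged, since converting them loses no digits.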

Scientific notation (e.g. "1.5e3", "6.022e23") is recognized and converted too. Bare-dot forms like ".5" and "3." are left as Strings (they are not valid numbers here). Integers and floats convert identically on the C-accelerated and pure-Ruby paths.
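
As a rough mental model of these rules, the regexp below accepts plain integers, decimals with digits on both sides of the dot, and an optional exponent. It is an assumed pattern written for illustration, not taken from the SmarterCSV source, and the library may treat signs or other edge cases differently.

```ruby
# Assumed pattern for "numeric-looking"; not SmarterCSV's actual implementation.
NUMERIC_LOOKING = /\A[+-]?\d+(\.\d+)?([eE][+-]?\d+)?\z/

def numeric_looking?(value)
  value.match?(NUMERIC_LOOKING)
end

numeric_looking?("42")            # => true
numeric_looking?("3.14")          # => true
numeric_looking?("6.022e23")      # => true
numeric_looking?(".5")            # => false (bare leading dot)
numeric_looking?("3.")            # => false (bare trailing dot)
numeric_looking?("800-555-0100")  # => false
```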


decimal_precision

Default: :auto

Controls how decimal values (those with a . or an exponent) are converted. Integers are unaffected — they are always returned as Integer.

| Value | Result |
|-------|--------|
| :auto | Float, unless the value carries more than 16 significant digits, in which case BigDecimal. |
| :float | Always Float (correctly rounded; matches String#to_f). |
| :bigdecimal | Always BigDecimal (full precision). |

# :auto (default) — keeps full precision only when needed
SmarterCSV.process(file)
# "3.14"                 => 3.14                              (Float)
# "1234567890.123456789" => 0.1234567890123456789e10          (BigDecimal — >16 sig digits)

# :float — always Float (faster, may lose precision on long decimals)
SmarterCSV.process(file, decimal_precision: :float)
# "1234567890.123456789" => 1234567890.1234567               (Float)

# :bigdecimal — always BigDecimal
SmarterCSV.process(file, decimal_precision: :bigdecimal)
# "3.14" => 0.314e1 (BigDecimal)

Unlike Ruby's standard-library CSV — whose :numeric/:float converters use Float() and silently lose precision — :auto preserves high-precision decimals as BigDecimal. Decimal values are decoded on the C path with the Eisel-Lemire algorithm (correctly rounded, identical to String#to_f).
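
The :auto rule can be approximated in plain Ruby to see why the 16-digit threshold matters. This is a sketch under assumptions: `significant_digits` and `convert_decimal_auto` are hypothetical names, and the way SmarterCSV actually counts digits (for example, trailing zeros) may differ.

```ruby
require 'bigdecimal'

# Hypothetical digit count: drop the exponent, sign, dot, and leading zeros.
def significant_digits(str)
  mantissa = str.sub(/[eE].*\z/, '').delete('+-.')
  mantissa.sub(/\A0+/, '').length
end

# Hypothetical stand-in for decimal_precision: :auto.
def convert_decimal_auto(str)
  significant_digits(str) > 16 ? BigDecimal(str) : Float(str)
end

long = "1234567890.123456789"        # 19 significant digits
convert_decimal_auto("3.14").class   # => Float
convert_decimal_auto(long).class     # => BigDecimal

# BigDecimal round-trips the original text; Float cannot hold all 19 digits.
BigDecimal(long).to_s("F") == long   # => true
Float(long).to_s == long             # => false
```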


remove_empty_hashes

Default: true

After all per-field transformations, removes rows that have no remaining key/value pairs. This handles blank lines and rows where every field was empty or matched nil_values_matching.

# CSV with a blank line between records:
# name,score
# Alice,42
#
# Bob,99

data = SmarterCSV.process(file)
# => [{name: "Alice", score: 42}, {name: "Bob", score: 99}]
#  ↑ blank line silently dropped

data = SmarterCSV.process(file, remove_empty_hashes: false)
# => [{name: "Alice", score: 42}, {}, {name: "Bob", score: 99}]

Custom Transformations — value_converters

For type conversions beyond numeric (dates, booleans, currency, etc.), use value_converters. They run as the last per-field step, after numeric conversion; only remove_empty_hashes runs later. See Value Converters for full documentation.

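require 'date'  # needed for Date.strptime

data = SmarterCSV.process(file, value_converters: {
  # Parse US-style dates; pass nil through unchanged
  date:   ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
  # true only for "true" (case-insensitive); nil stays nil
  active: ->(v) { v&.match?(/\Atrue\z/i) },
})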
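
Because converters run after numeric conversion, a value such as "1" or "99.99" may already be an Integer or Float when the lambda sees it. The converters below are a minimal sketch of guarding against that; `TO_CENTS`, `TO_BOOL`, and the :price and :active keys are hypothetical names chosen for illustration.

```ruby
# Hypothetical converter: accepts nil, an already-converted number, or a
# currency string such as "$1,234.50", and returns integer cents.
TO_CENTS = lambda do |v|
  case v
  when nil            then nil
  when Integer, Float then (v * 100).round
  else (Float(v.delete('$,')) * 100).round
  end
end

# Hypothetical converter: to_s makes Integer 1 and String "1" behave alike.
TO_BOOL = ->(v) { v.to_s.match?(/\A(true|yes|1)\z/i) }

TO_CENTS.call(99.99)        # => 9999
TO_CENTS.call("$1,234.50")  # => 123450
TO_BOOL.call(1)             # => true
TO_BOOL.call("no")          # => false

# They would be passed like any other converter:
#   SmarterCSV.process(file, value_converters: { price: TO_CENTS, active: TO_BOOL })
```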

PREVIOUS: Column Selection | NEXT: Value Converters | UP: README