Data Transformations

June 19, 2026

SmarterCSV automatically normalizes the values in each row. All transformations are configurable — most are enabled by default because they're the right behavior for the vast majority of CSV files.

Transformation Pipeline

Transformations run in this order for every row:

| Step | Option | Default | What it does |
|------|--------|---------|--------------|
| 1 | `strip_whitespace` | `true` | Strips leading/trailing whitespace from all values (and headers) at parse time |
| 2 | `nil_values_matching` | `nil` | Sets values matching the regexp to `nil` |
| 3 | `remove_empty_values` | `true` | Removes keys whose value is `nil` or blank |
| 4 | `remove_zero_values` | `false` | Removes keys whose value is numeric zero |
| 5 | `convert_values_to_numeric` | `true` | Converts numeric-looking strings to `Integer` or `Float` |
| 6 | `value_converters` | `nil` | Applies per-key custom converter lambdas or classes |
| 7 | `remove_empty_hashes` | `true` | Drops rows that are entirely empty after all transformations |

Steps 2–6 run per field, in order. `value_converters` receive the value after numeric conversion, so guard against receiving an `Integer` or `Float` if your converter expects a String.
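The step order in the table can be sketched in plain Ruby. This is only an illustration of the documented ordering, not SmarterCSV's implementation: `transform_row` and its keyword arguments are hypothetical names, and the numeric and zero checks are simplified assumptions.

```ruby
# Illustrative sketch of the documented step order; not SmarterCSV's code.
# `transform_row` and its keyword arguments are hypothetical names.
def transform_row(row, nil_matching: nil, remove_zero: false, converters: {})
  result = {}
  row.each do |key, raw|
    value = raw.nil? ? nil : raw.strip                       # 1. strip_whitespace
    if value && nil_matching && value.match?(nil_matching)   # 2. nil_values_matching
      value = nil
    end
    next if value.nil? || value.empty?                       # 3. remove_empty_values
    next if remove_zero && value.match?(/\A0+(\.0+)?\z/)      # 4. remove_zero_values
    if value.match?(/\A-?\d+\z/)                             # 5. convert_values_to_numeric
      value = value.to_i                                     #    (simplified patterns)
    elsif value.match?(/\A-?\d+\.\d+\z/)
      value = value.to_f
    end
    value = converters[key].call(value) if converters.key?(key) # 6. value_converters
    result[key] = value
  end
  result.empty? ? nil : result                               # 7. remove_empty_hashes
end

rows = [
  { name: " Alice ", score: "42", notes: "N/A" },
  { name: "", score: "", notes: "" },
]
data = rows.map { |r| transform_row(r, nil_matching: /\AN\/A\z/i) }.compact
# data == [{ name: "Alice", score: 42 }]
```

Because step 6 comes after step 5 in this sketch, a converter registered for `:score` would be called with the Integer `42`, not the String `"42"`.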

`strip_whitespace`

Default: true

Strips leading and trailing whitespace from all header names and all field values at parse time, before any other transformation runs.

# CSV with padded values:
# name,  score
# Alice ,  42
# Bob   ,  0

data = SmarterCSV.process(file)
# => [{name: "Alice", score: 42}, {name: "Bob", score: 0}]
#  ↑ "Alice " stripped to "Alice", "  42" stripped to "42" then converted

data = SmarterCSV.process(file, strip_whitespace: false)
# => [{"name"=>"Alice ", " score"=>"  42"}, ...]
#  ↑ whitespace preserved in both headers and values

`nil_values_matching`

Default: nil

Sets values matching the given regular expression to `nil`. Combined with the default `remove_empty_values: true`, matching values are removed from the result hash. With `remove_empty_values: false`, the key is retained with a `nil` value, which is useful when you need to distinguish "field was absent" from "field had a sentinel value".

# Treat common null sentinels as nil and remove them
data = SmarterCSV.process(file, nil_values_matching: /\A(NULL|N\/A|NA|#N\/A|\(null\))\z/i)

# Nil-ify but retain the key (don't remove)
data = SmarterCSV.process(file,
  nil_values_matching: /\A(NULL|N\/A)\z/i,
  remove_empty_values: false)
# => [{name: "Alice", score: nil}]  ← key retained with nil value

# Remove Excel error values
data = SmarterCSV.process(file, nil_values_matching: /\A(#VALUE!|#REF!|#DIV\/0!|NaN)\z/)

Deprecated: `remove_values_matching:` still works but emits a deprecation warning. Use `nil_values_matching:` instead.
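Before passing a pattern to `nil_values_matching`, it can help to check in plain Ruby which raw strings it would match. The sketch below reuses the sentinel pattern from the first example above; the sample values are made up for illustration.

```ruby
# Check which raw strings a candidate pattern would turn into nil.
# The \A and \z anchors make the pattern match whole field values only.
SENTINELS = /\A(NULL|N\/A|NA|#N\/A|\(null\))\z/i

samples = ["NULL", "null", "n/a", "#N/A", "(null)", "NAN", "Nullable", ""]
matched, kept = samples.partition { |s| s.match?(SENTINELS) }
# matched == ["NULL", "null", "n/a", "#N/A", "(null)"]
# kept    == ["NAN", "Nullable", ""]
```

Without the anchors, values such as "Nullable" or "NAN" would also match, so ordinary data could be set to `nil` by accident.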

`remove_empty_values`

Default: true

Removes key/value pairs where the value is `nil` or an empty string after `strip_whitespace` and `nil_values_matching` have run. This is why SmarterCSV result hashes only contain keys with actual values: sparse CSV rows don't produce hashes cluttered with `nil` entries.

# CSV: name,score,notes
#      Alice,42,
#      Bob,,great player

data = SmarterCSV.process(file)
# => [{name: "Alice", score: 42}, {name: "Bob", notes: "great player"}]
#  ↑ empty :notes and :score keys are dropped automatically

data = SmarterCSV.process(file, remove_empty_values: false)
# => [{name: "Alice", score: 42, notes: nil}, {name: "Bob", score: nil, notes: "great player"}]
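Since empty fields are dropped by default, code that reads the result hashes has to cope with missing keys. A minimal sketch using only standard `Hash` methods, with the row literal copied from the default output above:

```ruby
# Bob's row as returned with the default remove_empty_values: true
row = { name: "Bob", notes: "great player" }

missing_score = row[:score]          # nil: Hash#[] returns nil for an absent key
has_score     = row.key?(:score)     # false: the key itself is gone
score_or_zero = row.fetch(:score, 0) # 0: explicit default for an absent key

# Hash#fetch without a default raises KeyError, which surfaces missing fields early.
begin
  row.fetch(:score)
  raised = false
rescue KeyError
  raised = true
end
```

`fetch` with a default keeps downstream arithmetic safe, while `fetch` without one suits fields that must always be present.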

`remove_zero_values`

Default: false

When enabled, removes key/value pairs where the value is numeric zero (0, 0.0, "0", "0.0"). Useful when zero and absent mean the same thing in your domain.

# CSV: product,quantity,discount
#      Widget,10,0
#      Gadget,0,5

data = SmarterCSV.process(file, remove_zero_values: true)
# => [{product: "Widget", quantity: 10}, {product: "Gadget", discount: 5}]
#  ↑ :discount=>0 and :quantity=>0 removed

`convert_values_to_numeric`

Default: true

Converts string values that look like integers or floats to the appropriate numeric type. This is a common source of silent data corruption: fields such as ZIP codes, phone numbers, and account numbers lose their leading zeros unless you exclude them from conversion.

data = SmarterCSV.process(file)
# "42"     => 42    (Integer)
# "3.14"   => 3.14  (Float)
# "01234"  => 1234  ← leading zero lost! exclude this column

# Exclude specific columns from numeric conversion
data = SmarterCSV.process(file,
  convert_values_to_numeric: { except: [:zip, :phone, :account_number] })
# => [{zip: "01234", phone: "800-555-0100", amount: 99.99}]

# Only convert specific columns (all others stay as strings)
data = SmarterCSV.process(file,
  convert_values_to_numeric: { only: [:quantity, :price] })

Scientific notation (e.g. "1.5e3", "6.022e23") is recognized and converted too. Bare-dot forms like ".5" and "3." are left as Strings (they are not valid numbers here). Integers and floats convert identically on the C-accelerated and pure-Ruby paths.
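One way to find columns that need an `except:` entry is to scan unconverted rows for digit strings with leading zeros. The sketch below works on plain hashes of Strings, such as rows parsed with `convert_values_to_numeric: false`; `columns_with_leading_zeros` is a hypothetical helper and the sample rows are made up.

```ruby
# Digit-only strings that start with 0 and have at least one more digit, e.g. "01234".
LEADING_ZERO = /\A0\d+\z/

# Hypothetical helper: returns the keys whose values would lose leading zeros.
def columns_with_leading_zeros(rows)
  rows.flat_map do |row|
    row.select { |_key, value| value.is_a?(String) && value.match?(LEADING_ZERO) }.keys
  end.uniq
end

raw_rows = [
  { zip: "01234", quantity: "10",  price: "0.50" },
  { zip: "94105", quantity: "007", price: "12.00" },
]

except = columns_with_leading_zeros(raw_rows)
# except == [:zip, :quantity]
```

The resulting list could then be passed as `convert_values_to_numeric: { except: except }`. Decimals such as "0.50" are deliberately not flagged, since their leading zero carries no information.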

`decimal_precision`

Default: :auto

Controls how decimal values (those with a `.` or an exponent) are converted. Integers are unaffected and are always returned as `Integer`.

| Value | Result |
|-------|--------|
| `:auto` | `Float`, unless the value carries more than 16 significant digits, in which case `BigDecimal`. |
| `:float` | Always `Float` (correctly rounded; matches `String#to_f`). |
| `:bigdecimal` | Always `BigDecimal` (full precision). |

# :auto (default) — keeps full precision only when needed
SmarterCSV.process(file)
# "3.14"                 => 3.14                              (Float)
# "1234567890.123456789" => 0.1234567890123456789e10          (BigDecimal — >16 sig digits)

# :float — always Float (faster, may lose precision on long decimals)
SmarterCSV.process(file, decimal_precision: :float)
# "1234567890.123456789" => 1234567890.1234567               (Float)

# :bigdecimal — always BigDecimal
SmarterCSV.process(file, decimal_precision: :bigdecimal)
# "3.14" => 0.314e1 (BigDecimal)

Unlike Ruby's standard-library CSV — whose :numeric/:float converters use Float() and silently lose precision — :auto preserves high-precision decimals as BigDecimal. Decimal values are decoded on the C path with the Eisel-Lemire algorithm (correctly rounded, identical to String#to_f).
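The precision difference can be reproduced with Ruby's own `Float` and `BigDecimal`. The `auto_decimal` function below is a hypothetical stand-in for the `:auto` rule described above; how SmarterCSV actually counts significant digits is an assumption here (leading zeros are ignored, trailing zeros are counted).

```ruby
require 'bigdecimal'

raw = "1234567890.123456789" # 19 significant digits

as_float   = raw.to_f
as_decimal = BigDecimal(raw)

decimal_round_trips = as_decimal.to_s("F") == raw # true: every digit survives
float_round_trips   = as_float.to_s == raw        # false: a Float cannot hold 19 digits

# Hypothetical stand-in for :auto: more than 16 significant digits => BigDecimal.
def auto_decimal(str, max_digits: 16)
  mantissa = str.split(/e/i).first                  # drop any exponent part
  digits   = mantissa.delete("^0-9").sub(/\A0+/, "") # digits only, no leading zeros
  digits.length > max_digits ? BigDecimal(str) : str.to_f
end

short_value = auto_decimal("3.14") # Float
long_value  = auto_decimal(raw)    # BigDecimal
```

Keeping short decimals as `Float` avoids `BigDecimal` overhead for values a `Float` already represents well.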

`remove_empty_hashes`

Default: true

After all per-field transformations, removes rows that have no remaining key/value pairs. This handles blank lines and rows where every field was empty or matched nil_values_matching.

# CSV with a blank line between records:
# name,score
# Alice,42
#
# Bob,99

data = SmarterCSV.process(file)
# => [{name: "Alice", score: 42}, {name: "Bob", score: 99}]
#  ↑ blank line silently dropped

data = SmarterCSV.process(file, remove_empty_hashes: false)
# => [{name: "Alice", score: 42}, {}, {name: "Bob", score: 99}]

Custom Transformations — `value_converters`

For type conversions beyond numeric (dates, booleans, currency, etc.), use `value_converters`. They run last among the per-field steps, after numeric conversion. See Value Converters for full documentation.

require 'date' # Date.strptime is defined by the date library

data = SmarterCSV.process(file, value_converters: {
  date:   ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
  active: ->(v) { v&.match?(/\Atrue\z/i) },
})
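Because converters see values after numeric conversion, a column that looks numeric may reach them as an `Integer` or `Float`. A minimal sketch of converters that tolerate both forms; the column formats (a compact `YYYYMMDD` date and a `1`/`0` flag) and the lambda names are made-up examples.

```ruby
require 'date'

# A value such as "20260619" may already have been converted to the Integer 20260619,
# so normalize with to_s before parsing.
compact_date = lambda do |value|
  return nil if value.nil?
  Date.strptime(value.to_s, '%Y%m%d')
end

# Accepts "true", "yes", "1" (any case) as well as the Integer 1.
to_bool = ->(value) { %w[true yes 1].include?(value.to_s.downcase) }

from_integer = compact_date.call(20260619)
from_string  = compact_date.call("20260619")
# from_integer == from_string == Date.new(2026, 6, 19)
```

These could be passed as `value_converters: { shipped_on: compact_date, active: to_bool }`, where the key names are hypothetical. Excluding the column with `convert_values_to_numeric: { except: [...] }` is an alternative that keeps the value a String.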
