FusionCore NCLT Benchmarks

May 19, 2026

FusionCore (FC) vs robot_localization (RL) EKF on the NCLT dataset (University of Michigan North Campus Long-Term). Twelve sequences across all seasons, one shared config file, no per-sequence tuning.


Results: 10/12 FC wins

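| Sequence | Season | Duration | GPS Fixes | Max Blackout | FC ATE 3D | RL-EKF ATE 3D | Winner |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2012-01-08 | Winter | 92 min | 22,041 | 203s | 18.6 m | 41.2 m | FC +55% |
| 2012-02-04 | Winter | 77 min | 18,808 | 184s | 49.7 m | 265.5 m | FC +81% |
| 2012-03-31 | Spring | 87 min | 20,482 | 262s | 22.0 m | 156.5 m | FC +86% |
| 2012-05-11 | Spring | 84 min | 21,621 | 120s | 9.7 m | 11.5 m | FC +16% |
| 2012-06-15 | Summer | 55 min | 12,399 | 462s | 49.2 m | 18.2 m | RL +63% |
| 2012-08-20 | Summer | 83 min | 20,025 | 228s | 98.3 m | 10.6 m | RL +89% |
| 2012-09-28 | Fall | 77 min | 19,191 | 196s | 10.8 m | 55.7 m | FC +81% |
| 2012-10-28 | Fall | 85 min | 21,060 | 256s | 29.9 m | 60.0 m | FC +50% |
| 2012-11-04 | Fall | 79 min | 17,840 | 400s | 60.1 m | 122.0 m | FC +51% |
| 2012-12-01 | Winter | 75 min | 17,941 | 173s | 21.0 m | 90.7 m | FC +77% |
| 2013-02-23 | Winter | 78 min | 19,333 | 240s | 59.4 m | 82.2 m | FC +28% |
| 2013-04-05 | Spring | 68 min | 16,297 | 275s | 12.1 m | 268.9 m | FC +96% |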

ATE = absolute trajectory error, SE3-aligned to RTK GPS ground truth. GPS Fixes = mode-3 (3D) fixes only, as published by nclt_player.
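The Winner column appears to report how much lower the winning filter's ATE is relative to the losing filter's. A minimal Python sketch of that arithmetic follows, assuming the margin is (loser - winner) / loser rounded to the nearest percent. The formula is inferred from the table values, not taken from the benchmark scripts, and `winner_margin` is a hypothetical helper.

```python
def winner_margin(fc_ate_m: float, rl_ate_m: float) -> str:
    """Label the filter with the lower ATE and its relative improvement."""
    if fc_ate_m <= rl_ate_m:
        name, best, worst = "FC", fc_ate_m, rl_ate_m
    else:
        name, best, worst = "RL", rl_ate_m, fc_ate_m
    # Relative reduction in ATE, expressed against the losing filter.
    margin_pct = round(100.0 * (worst - best) / worst)
    return f"{name} +{margin_pct}%"


# ATE values copied from the results table.
print(winner_margin(18.6, 41.2))   # 2012-01-08
print(winner_margin(98.3, 10.6))   # 2012-08-20
```

Read this way, "FC +55%" on 2012-01-08 means FC's ATE is 55% lower than RL-EKF's, not that FC is 55% more accurate in some other sense.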

RL-UKF: NaN divergence on all sequences (known numerical instability under sim-time playback, confirmed by RL maintainer).


Full metrics

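| Sequence | Filter | ATE 3D | ATE XY | Within 5m | Within 10m | Drift (m/km) | RPE@10m |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2012-01-08 | FusionCore | 18.6 m | 16.9 m | 26.7% | 74.5% | 2.55 | 22.5 m |
| | RL-EKF | 41.2 m | 41.0 m | 22.8% | 76.5% | 5.64 | 25.3 m |
| 2012-02-04 | FusionCore | 49.7 m | 31.5 m | 6.4% | 33.4% | 5.96 | 30.0 m |
| | RL-EKF | 265.5 m | 265.4 m | 0.0% | 0.1% | 31.84 | 44.3 m |
| 2012-03-31 | FusionCore | 22.0 m | 20.2 m | 19.9% | 67.3% | 2.27 | 21.4 m |
| | RL-EKF | 156.5 m | 156.3 m | 0.2% | 0.6% | 16.16 | 42.7 m |
| 2012-05-11 | FusionCore | 9.7 m | 4.9 m | 45.9% | 82.6% | 1.05 | 19.0 m |
| | RL-EKF | 11.5 m | 9.0 m | 56.2% | 90.1% | 1.25 | 20.2 m |
| 2012-06-15 | FusionCore | 49.2 m | 48.4 m | 2.4% | 20.0% | 8.40 | 22.4 m |
| | RL-EKF | 18.2 m | 17.1 m | 42.8% | 78.4% | 3.11 | 22.3 m |
| 2012-08-20 | FusionCore | 98.3 m | 97.9 m | 0.1% | 13.8% | 13.08 | 53.7 m |
| | RL-EKF | 10.6 m | 9.9 m | 59.4% | 89.3% | 1.40 | 19.1 m |
| 2012-09-28 | FusionCore | 10.8 m | 7.5 m | 31.4% | 76.9% | 1.50 | 23.7 m |
| | RL-EKF | 55.7 m | 55.5 m | 1.7% | 25.1% | 7.73 | 28.0 m |
| 2012-10-28 | FusionCore | 29.9 m | 21.1 m | 19.9% | 59.7% | 3.69 | 40.6 m |
| | RL-EKF | 60.0 m | 59.6 m | 0.1% | 3.6% | 7.40 | 27.8 m |
| 2012-11-04 | FusionCore | 60.1 m | 59.2 m | 3.8% | 29.5% | 9.86 | 32.3 m |
| | RL-EKF | 122.0 m | 121.9 m | 0.0% | 0.0% | 20.02 | 37.0 m |
| 2012-12-01 | FusionCore | 21.0 m | 14.6 m | 24.3% | 65.4% | 2.90 | 32.9 m |
| | RL-EKF | 90.7 m | 90.5 m | 5.3% | 20.6% | 12.53 | 42.1 m |
| 2013-02-23 | FusionCore | 59.4 m | 58.5 m | 1.6% | 16.2% | 6.67 | 24.1 m |
| | RL-EKF | 82.2 m | 81.8 m | 0.0% | 0.6% | 9.23 | 35.0 m |
| 2013-04-05 | FusionCore | 12.1 m | 10.1 m | 32.8% | 81.5% | 2.26 | 30.2 m |
| | RL-EKF | 268.9 m | 268.7 m | 0.0% | 0.0% | 50.11 | 27.3 m |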

What drives the results

Why RL-EKF fails on 10 sequences

The drift rate column is the clearest signal. RL drift rates of 31.84 m/km (2012-02-04), 50.11 m/km (2013-04-05), and 20.02 m/km (2012-11-04) mean the filter is operating without GPS for large portions of those runs. A Segway at 1.5 m/s accumulating 31 m/km is in pure dead-reckoning almost the entire time.

The cause is always the same: the nclt_player publishes position_covariance var_xy = 9 (3m sigma) because that is the Novatel SPAN-CPT specification in ideal conditions. Measured against the RTK ground truth, actual GPS noise looks like this:

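| Sequence | Median error | p95 error | p99 error |
| --- | --- | --- | --- |
| 2012-01-08 | 3.7 m | 20.1 m | 49.7 m |
| 2012-02-04 | 5.6 m | 46.6 m | 234.9 m |
| 2012-03-31 | 5.7 m | 14.7 m | 32.7 m |
| 2012-05-11 | 3.3 m | 13.3 m | 47.7 m |
| 2012-06-15 | 2.6 m | 9.7 m | 21.3 m |
| 2012-08-20 | 3.4 m | 12.7 m | 55.0 m |
| 2012-09-28 | 3.5 m | 12.8 m | 43.2 m |
| 2012-10-28 | 4.6 m | 16.0 m | 48.9 m |
| 2012-11-04 | 5.7 m | 53.1 m | 79.2 m |
| 2012-12-01 | 4.7 m | 20.7 m | 80.4 m |
| 2013-02-23 | 5.4 m | 33.0 m | 73.6 m |
| 2013-04-05 | 3.7 m | 19.9 m | 87.8 m |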

The driver states 3m sigma. Median actual error is 2.6-5.7m (already at or above the stated 1-sigma on most sequences). p95 ranges from 9.7m to 53.1m. RL's gate is calibrated to the stated 3m; it rejects most fixes on sequences like 2012-02-04 and 2012-11-04. GPS is effectively off for those sequences, and the filter runs in dead-reckoning.

The contrast on 2012-05-11 (RL drift: 1.25 m/km vs 31.84 m/km on 2012-02-04) is the clearest evidence. Same robot, same campus, same config. The only difference is GPS data quality on that day. When GPS covariance is accurate, both filters perform comparably (9.7m vs 11.5m). The advantage opens when the covariance is wrong.

FusionCore's adaptive.gnss: true adjusts GPS measurement noise in real time from the innovation sequence. When actual GPS noise is higher than the driver reports, the adaptive window inflates the noise model and keeps chi2 statistics calibrated. RL has no equivalent.
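The source does not show how adaptive.gnss is implemented. One common way to adapt measurement noise from the innovation sequence is sliding-window covariance matching, where the noise variance is estimated as the mean squared innovation minus the filter's predicted measurement variance. The scalar sketch below illustrates that generic technique only. The class name, window length, and floor are hypothetical and are not FusionCore's API.

```python
from collections import deque


class AdaptiveNoise1D:
    """Sliding-window innovation covariance matching for one measurement axis."""

    def __init__(self, initial_var: float, window: int = 50, floor: float = 1e-3):
        self.var = initial_var          # current measurement noise variance R
        self.floor = floor              # keeps R strictly positive
        self._sq_innov = deque(maxlen=window)

    def update(self, innovation: float, predicted_var: float) -> float:
        """innovation = z - h(x); predicted_var = H P H^T (state uncertainty only)."""
        self._sq_innov.append(innovation * innovation)
        mean_sq = sum(self._sq_innov) / len(self._sq_innov)
        # E[nu^2] = H P H^T + R, so R is approximately mean(nu^2) - H P H^T.
        self.var = max(mean_sq - predicted_var, self.floor)
        return self.var


# Driver claims var = 9 m^2 (3 m sigma), but innovations behave like ~10 m noise.
est = AdaptiveNoise1D(initial_var=9.0, window=4)
for nu in (10.0, -10.0, 10.0, -10.0):
    r = est.update(nu, predicted_var=1.0)
print(r)  # 99.0
```

With the inflated variance, the same 10 m innovation sits close to one sigma instead of more than three, so it is no longer a gating candidate.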

What would improve RL: a single global change (not per-sequence tuning) would help substantially. Increasing position_covariance var_xy in the nclt_player from 9 to 25 (5m sigma, reflecting actual NCLT GPS accuracy in urban conditions) should bring RL's catastrophic losses (265m, 268m, 156m) down significantly. Because RL has no adaptive noise mechanism, however, the calibration burden would return whenever the dataset or environment changes.
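To make the effect of the stated covariance concrete, the sketch below computes the largest radial GPS error a fixed gate would accept for var_xy = 9 and var_xy = 25. It assumes an isotropic 2D position innovation, treats the 3.72 pose rejection threshold from the Methodology section as a Mahalanobis distance, and ignores the filter's own predicted covariance, which in practice widens the gate somewhat. `gate_radius_m` is a hypothetical helper.

```python
import math

POSE_REJECTION_THRESHOLD = 3.72  # Mahalanobis distance, from the Methodology section


def gate_radius_m(var_xy: float, threshold: float = POSE_REJECTION_THRESHOLD) -> float:
    """Largest radial GPS error accepted when only measurement noise is considered."""
    return threshold * math.sqrt(var_xy)


for var_xy in (9.0, 25.0):
    print(f"var_xy={var_xy:g}: fixes beyond {gate_radius_m(var_xy):.1f} m are rejected")
```

Under these simplifications the gate radius grows from about 11.2 m to about 18.6 m. Compared against the p95 column of the GPS error table, the smaller radius sits below p95 on every sequence except 2012-06-15.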

What drives FC performance variation

The single best predictor of FC ATE is the longest GPS blackout in the sequence:

| Max blackout | Sequences | FC ATE range |
| --- | --- | --- |
| < 200s | 2012-01-08, 2012-12-01 | 18-21m |
| 200-300s | 2012-03-31, 2012-05-11, 2012-09-28, 2012-10-28, 2013-04-05 | 10-30m |
| 300-480s | 2012-02-04, 2012-06-15, 2012-11-04, 2013-02-23 | 49-60m |
| Adversarial | 2012-08-20 (228s but 105 corrupt fixes at boundary) | 98.3m |

FC drift rate is consistent at 1-4 m/km on clean sequences. Spikes above 6 m/km (2012-06-15, 2012-08-20, 2012-11-04, 2013-02-23) signal heading error accumulated during coast mode.


The two FC losses

2012-06-15 (FC 49.2m, RL 18.2m)

This is the lowest-density GPS sequence in the set: 12,399 mode-3 fixes vs 16,000-22,000 on the other sequences, with one GPS blackout of 462 seconds (7.7 minutes).

During the blackout, FC dead-reckons on encoder and IMU. Coast mode inflates Q_position (coast_q_factor=10) and down-weights IMU WZ (coast_imu_wz_scale=500) so encoder WZ dominates heading. The encoder WZ bias (B_EWZ) is calibrated from GPS heading cross-covariance before the blackout and subtracted during it. However, any residual B_EWZ error compounds over 7.7 minutes. Even a small uncorrected heading-rate error, integrated at 100 Hz, makes heading error grow linearly with time, so lateral position error grows quadratically.
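The quadratic growth can be checked numerically. The sketch below integrates cross-track error for straight-line travel with a constant residual yaw-rate bias and compares it with the small-angle closed form 0.5 * v * b * t^2. The 1.5 m/s speed is the Segway speed quoted earlier. The 0.01 deg/s bias is a hypothetical value chosen only for illustration, not a measured B_EWZ residual.

```python
import math


def lateral_error_m(speed_mps: float, yaw_rate_bias_rad_s: float, duration_s: float,
                    rate_hz: float = 100.0) -> float:
    """Integrate cross-track error for straight-line travel with a constant yaw-rate bias."""
    dt = 1.0 / rate_hz
    heading_err = 0.0
    lateral = 0.0
    for _ in range(int(round(duration_s * rate_hz))):
        heading_err += yaw_rate_bias_rad_s * dt           # heading error grows linearly
        lateral += speed_mps * math.sin(heading_err) * dt  # cross-track component
    return lateral


bias = math.radians(0.01)  # hypothetical residual bias: 0.01 deg/s
for t in (115.5, 231.0, 462.0):
    numeric = lateral_error_m(1.5, bias, t)
    closed_form = 0.5 * 1.5 * bias * t * t  # small-angle approximation
    print(f"{t:6.1f} s: {numeric:5.1f} m (approx {closed_form:5.1f} m)")
```

Under these assumed numbers, each doubling of blackout duration roughly quadruples the lateral error, and the 462 s case lands near 28 m.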

RL-EKF wins here because its 2D mode has a simpler state and accumulates less uncertainty over the blackout. This is a structural advantage for RL on GPS-sparse, flat-terrain sequences with very long blackouts.

Path to fixing this:

  • Reduce coast_imu_wz_scale from 500 to 50-100 for blackouts exceeding 200s. At 500x, the IMU WZ is essentially ignored during coast. Both sensors sharing heading responsibility reduces B_EWZ sensitivity.
  • Magnetometer integration closes the observability gap completely: an absolute heading reference during GPS absence makes B_GZ and B_EWZ irrelevant. This is the architecturally correct fix and is on the roadmap.
  • Duration-dependent coast_q_factor: the current fixed 10x multiplier was tuned for the majority of sequences. For blackouts > 300s, a nonlinear ramp (aggressive early, conservative late) may reduce heading drift without sacrificing re-acquisition.

2012-08-20 (FC 98.3m, RL 10.6m)

The raw GPS stream (gps.csv) contains 105 mode-3 fixes that are 720-840m off the RTK ground truth. The ground-truth preprocessor excludes them from gps_rtk.csv, but they are valid mode-3 fixes in the real data stream. They cluster in a 24-second window at the end of the second GPS blackout (211s at t=62.5 min).

This is adversarial for any chi2-based gating scheme. During blackout recovery, FC's coast mode relaxes the chi2 gate to accept the first valid returning fix after genuine drift. A cluster of corrupt fixes arriving at exactly the re-acquisition moment exploits this window.

Per-minute error analysis:

| Time | FC error | Status |
| --- | --- | --- |
| 0-42 min | 1-10m | Normal GPS coverage |
| 43-46 min | spike to ~100m, recovers in 2-3 min | Blackout 1 (228s): boundary GPS errors up to ~70m |
| 47-62 min | 3-10m | Full recovery |
| 63-67 min | spike to ~788m, recovers in 2 min | Blackout 2 (211s): 105 adversarial fixes at boundary |
| 68-82 min | 5-10m | Full recovery, remaining 15 minutes on-par with RL |

The 98m ATE RMSE is driven entirely by those two transients. RL-EKF wins here because its tight gate (which fails on 10 other sequences) accidentally rejects these outliers too.

Path to fixing this:

  • Velocity sanity check: A GPS fix 720m from the dead-reckoned position at the end of the 211s blackout, evaluated over one 5 Hz fix interval (0.2s), implies ~3600 m/s of motion. A hard max_implied_speed check (e.g., 20 m/s) operating before the chi2 gate rejects this trivially and has zero effect on normal operation.
  • Cluster consistency gate: A single outlier at 720m is handled by chi2. Five consecutive fixes all landing 720-840m from the predicted position with geometric consistency (tight cluster, not random scatter) is a distinguishable pattern. A secondary check on cluster coherence would catch this without affecting single-fix behavior.
  • Gate hysteresis on recovery: Instead of a step change in chi2 threshold at recovery, a linear ramp from relaxed back to tight over the first N returned fixes makes it harder for a cluster to slip through entirely.
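Two of the proposals above, the implied-speed check and the ramped recovery gate, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FusionCore code. All function names are hypothetical, the implied speed is taken as the jump between predicted position and fix divided by one 5 Hz fix interval, and the relaxed gate value of 50.0 is an arbitrary placeholder.

```python
def implied_speed_mps(jump_m: float, dt_s: float) -> float:
    """Speed the robot would need to cover the jump between prediction and fix."""
    return jump_m / dt_s


def passes_speed_check(jump_m: float, dt_s: float, max_implied_speed: float = 20.0) -> bool:
    """Hard pre-gate applied before the chi2 test."""
    return implied_speed_mps(jump_m, dt_s) <= max_implied_speed


def recovery_gate(fixes_since_recovery: int, relaxed: float, tight: float,
                  ramp_fixes: int) -> float:
    """Linearly tighten the chi2 threshold over the first ramp_fixes returned fixes."""
    frac = min(max(fixes_since_recovery / ramp_fixes, 0.0), 1.0)
    return relaxed + frac * (tight - relaxed)


FIX_INTERVAL_S = 0.2  # 5 Hz GPS

print(implied_speed_mps(720.0, FIX_INTERVAL_S))   # the adversarial jump
print(passes_speed_check(720.0, FIX_INTERVAL_S))  # rejected before the chi2 gate
print([round(recovery_gate(n, relaxed=50.0, tight=13.8, ramp_fixes=5), 2)
       for n in range(6)])
```

The time baseline is a design choice. With a 0.2 s baseline, a 20 m/s limit corresponds to a 4 m jump, which is close to the median GPS error measured on this dataset, so a practical version would probably need a noise allowance or a longer baseline.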

FC performance tier breakdown

Excellent (< 20m ATE): 2012-05-11 (9.7m), 2012-09-28 (10.8m), 2013-04-05 (12.1m), 2012-01-08 (18.6m). Common: high GPS fix count (19k-22k), max blackout under 200s, no adversarial data.

Good (20-35m ATE): 2012-03-31 (22.0m), 2012-12-01 (21.0m), 2012-10-28 (29.9m). Common: moderate GPS density, one or two blackouts under 260s, clean GPS at boundaries.

Moderate (35-65m ATE): 2012-02-04 (49.7m), 2012-06-15 (49.2m), 2013-02-23 (59.4m), 2012-11-04 (60.1m). Common: long blackouts (240-462s) or low GPS density, heading drift accumulates.

Poor (> 65m ATE): 2012-08-20 (98.3m). Specific cause: adversarial GPS cluster at blackout boundary. Structurally different failure mode from all other sequences.


Methodology

Filters compared:

  • FusionCore: 23-state UKF, full 3D, adaptive noise from innovation sequence, GPS chi2 gating with coast mode, gyro + accel + encoder WZ bias estimation, inertial coast mode, ZUPT.
  • RL-EKF: robot_localization EKF with two_d_mode: true, GPS via navsat_transform with a fixed RTK datum. Chi2 gating set to an equivalent confidence level, expressed as Mahalanobis distances (the square root of the chi2 quantile): odom0_twist_rejection_threshold: 4.03 (sqrt of chi2(3, 0.999) = 16.27), odom1_pose_rejection_threshold: 3.72 (sqrt of chi2(2, 0.999) = 13.82).
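The two rejection thresholds quoted for RL-EKF match the square root of the chi-square 0.999 quantile for 3 and 2 degrees of freedom. The standard-library sketch below reproduces them using closed-form CDFs and bisection. `chi2_cdf` and `mahalanobis_threshold` are hypothetical helpers written for this check.

```python
import math


def chi2_cdf(x: float, dof: int) -> float:
    """Closed-form chi-square CDF for the two cases used here."""
    if dof == 2:
        return 1.0 - math.exp(-x / 2.0)
    if dof == 3:
        return (math.erf(math.sqrt(x / 2.0))
                - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only dof 2 and 3 are implemented in this sketch")


def mahalanobis_threshold(confidence: float, dof: int) -> float:
    """Square root of the chi-square quantile, found by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < confidence:
            lo = mid
        else:
            hi = mid
    return math.sqrt(0.5 * (lo + hi))


print(f"{mahalanobis_threshold(0.999, 3):.2f}")  # twist gate, 3 DOF
print(f"{mahalanobis_threshold(0.999, 2):.2f}")  # pose gate, 2 DOF
```

The printed values are 4.03 and 3.72, so both gates correspond to the same 99.9% confidence level despite the different numbers.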

Sensor inputs (identical for both filters):

  • IMU: Microstrain 3DM-GX3-45, 100 Hz, raw specific force (gravity not removed by driver)
  • Wheel odometry: Segway RMP encoders, 100 Hz, from odometry_mu_100hz.csv
  • GPS: Novatel SPAN-CPT, 5 Hz, ~3m CEP on the urban Michigan campus, published as NavSatFix with position_covariance var_xy=9

Ground truth: RTK GPS (gps_rtk.csv), projected to local ENU. Only fixes with RTK mode >= 3 are used.

Evaluation: evo's evo_ape with --align (SE3 alignment). ATE is computed after finding the best rigid-body transform between the filter trajectory and the RTK ground truth.
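For reference, the usual definition of aligned ATE as an RMSE is written out below. It assumes the reported figure is the RMSE of the translational error, which the "98m ATE RMSE" wording elsewhere in this document suggests. The symbols are introduced here for illustration only.

$$
\mathrm{ATE}_{\mathrm{RMSE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{p}^{\mathrm{gt}}_i - \left(\mathbf{R}\,\mathbf{p}^{\mathrm{est}}_i + \mathbf{t}\right)\right\rVert^{2}}
$$

Here $\mathbf{p}^{\mathrm{est}}_i$ and $\mathbf{p}^{\mathrm{gt}}_i$ are time-matched filter and RTK positions, and $(\mathbf{R}, \mathbf{t})$ is the rigid-body transform that minimizes the sum. ATE XY would then presumably be the same quantity restricted to the horizontal components.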

Config: Single YAML for all sequences: fusioncore_datasets/config/nclt_fusioncore.yaml. No per-sequence modifications.


Reproduce

Prerequisites

  • NCLT data: download from http://robots.engin.umich.edu/nclt/ and place each sequence under benchmarks/nclt/<date>/raw files/
  • ROS 2 Jazzy sourced
  • evo installed: pip install evo --break-system-packages
  • FusionCore built: colcon build --packages-select fusioncore_core fusioncore_ros fusioncore_datasets

Run one sequence (full length, auto-stops)

bash benchmarks/run_one.sh 2012-01-08

Takes 15-50 minutes depending on sequence length (running at 3x real time). Results write to benchmarks/nclt/2012-01-08/results_full/.

Run all sequences sequentially

bash benchmarks/run_all.sh

Runs all sequences in chronological order. Plan for 6-8 hours total.


Directory structure

benchmarks/
  README.md               <- this file
  run_one.sh              <- run one sequence end-to-end
  run_all.sh              <- run all sequences sequentially
  nclt/
    2012-01-08/
      raw files/          <- NCLT CSV data (not committed, download separately)
        ms25.csv          <- IMU (100 Hz)
        ms25_euler.csv    <- IMU Euler angles
        odometry_mu_100hz.csv  <- wheel encoder (100 Hz)
        gps.csv           <- GPS fixes (5 Hz, raw including outliers)
        gps_rtk.csv       <- RTK GPS (5 Hz, clean ground truth)
      bag_full/           <- ROS 2 bag from the run (not committed, large)
      fusioncore.tum      <- FC trajectory (not committed, regenerate)
      rl_ekf.tum          <- RL-EKF trajectory (not committed, regenerate)
      ground_truth.tum    <- RTK ground truth (not committed, regenerate)
      results_full/
        BENCHMARK.md      <- metrics table (committed)
        trajectories.png  <- trajectory overlay plot (committed)
        ate_over_time.png <- ATE vs time plot (committed)
        error_distribution.png <- error histogram (committed)
        launch.log        <- full launch output (committed)

.tum files, .mcap bags, and bag_full/ directories are in .gitignore. Only results_full/ contents are committed.
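Because the .tum trajectories have to be regenerated locally, a quick way to sanity-check one is to parse it directly. The sketch below assumes the standard TUM trajectory layout of eight whitespace-separated fields per line (timestamp tx ty tz qx qy qz qw). `Pose` and `parse_tum` are hypothetical helpers, and the sample rows are made up for illustration.

```python
from typing import List, NamedTuple


class Pose(NamedTuple):
    t: float
    x: float
    y: float
    z: float
    qx: float
    qy: float
    qz: float
    qw: float


def parse_tum(text: str) -> List[Pose]:
    """Parse TUM trajectory text, skipping blank lines and '#' comments."""
    poses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
        poses.append(Pose(*map(float, fields)))
    return poses


# Made-up sample rows, not real benchmark output.
sample = """# timestamp tx ty tz qx qy qz qw
0.00 0.0 0.0 0.0 0 0 0 1
0.01 0.015 0.0 0.0 0 0 0 1
"""
poses = parse_tum(sample)
print(len(poses), poses[-1].x)
```

To check a real run, read the file contents (for example benchmarks/nclt/2012-01-08/fusioncore.tum) and pass the text to `parse_tum`.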