FusionCore NCLT Benchmarks
May 19, 2026 ยท View on GitHub
FusionCore vs robot_localization EKF on the NCLT dataset (University of Michigan North Campus Long-Term). Twelve sequences across all seasons, same single config file, no per-sequence tuning.
Results: 10/12 FC wins
| Sequence | Season | Duration | GPS Fixes | Max Blackout | FC ATE 3D | RL-EKF ATE 3D | Winner |
|---|---|---|---|---|---|---|---|
| 2012-01-08 | Winter | 92 min | 22,041 | 203s | 18.6 m | 41.2 m | FC +55% |
| 2012-02-04 | Winter | 77 min | 18,808 | 184s | 49.7 m | 265.5 m | FC +81% |
| 2012-03-31 | Spring | 87 min | 20,482 | 262s | 22.0 m | 156.5 m | FC +86% |
| 2012-05-11 | Spring | 84 min | 21,621 | 120s | 9.7 m | 11.5 m | FC +16% |
| 2012-06-15 | Summer | 55 min | 12,399 | 462s | 49.2 m | 18.2 m | RL +63% |
| 2012-08-20 | Summer | 83 min | 20,025 | 228s | 98.3 m | 10.6 m | RL +89% |
| 2012-09-28 | Fall | 77 min | 19,191 | 196s | 10.8 m | 55.7 m | FC +81% |
| 2012-10-28 | Fall | 85 min | 21,060 | 256s | 29.9 m | 60.0 m | FC +50% |
| 2012-11-04 | Fall | 79 min | 17,840 | 400s | 60.1 m | 122.0 m | FC +51% |
| 2012-12-01 | Winter | 75 min | 17,941 | 173s | 21.0 m | 90.7 m | FC +77% |
| 2013-02-23 | Winter | 78 min | 19,333 | 240s | 59.4 m | 82.2 m | FC +28% |
| 2013-04-05 | Spring | 68 min | 16,297 | 275s | 12.1 m | 268.9 m | FC +96% |
ATE = absolute trajectory error, SE3-aligned to RTK GPS ground truth. GPS Fixes = mode-3 (3D) fixes only, as published by nclt_player.
RL-UKF: NaN divergence on all sequences (known numerical instability under sim-time playback, confirmed by RL maintainer).
Full metrics
| Sequence | Filter | ATE 3D | ATE XY | Within 5m | Within 10m | Drift m/km | RPE@10m |
|---|---|---|---|---|---|---|---|
| 2012-01-08 | FusionCore | 18.6 m | 16.9 m | 26.7% | 74.5% | 2.55 | 22.5 m |
| RL-EKF | 41.2 m | 41.0 m | 22.8% | 76.5% | 5.64 | 25.3 m | |
| 2012-02-04 | FusionCore | 49.7 m | 31.5 m | 6.4% | 33.4% | 5.96 | 30.0 m |
| RL-EKF | 265.5 m | 265.4 m | 0.0% | 0.1% | 31.84 | 44.3 m | |
| 2012-03-31 | FusionCore | 22.0 m | 20.2 m | 19.9% | 67.3% | 2.27 | 21.4 m |
| RL-EKF | 156.5 m | 156.3 m | 0.2% | 0.6% | 16.16 | 42.7 m | |
| 2012-05-11 | FusionCore | 9.7 m | 4.9 m | 45.9% | 82.6% | 1.05 | 19.0 m |
| RL-EKF | 11.5 m | 9.0 m | 56.2% | 90.1% | 1.25 | 20.2 m | |
| 2012-06-15 | FusionCore | 49.2 m | 48.4 m | 2.4% | 20.0% | 8.40 | 22.4 m |
| RL-EKF | 18.2 m | 17.1 m | 42.8% | 78.4% | 3.11 | 22.3 m | |
| 2012-08-20 | FusionCore | 98.3 m | 97.9 m | 0.1% | 13.8% | 13.08 | 53.7 m |
| RL-EKF | 10.6 m | 9.9 m | 59.4% | 89.3% | 1.40 | 19.1 m | |
| 2012-09-28 | FusionCore | 10.8 m | 7.5 m | 31.4% | 76.9% | 1.50 | 23.7 m |
| RL-EKF | 55.7 m | 55.5 m | 1.7% | 25.1% | 7.73 | 28.0 m | |
| 2012-10-28 | FusionCore | 29.9 m | 21.1 m | 19.9% | 59.7% | 3.69 | 40.6 m |
| RL-EKF | 60.0 m | 59.6 m | 0.1% | 3.6% | 7.40 | 27.8 m | |
| 2012-11-04 | FusionCore | 60.1 m | 59.2 m | 3.8% | 29.5% | 9.86 | 32.3 m |
| RL-EKF | 122.0 m | 121.9 m | 0.0% | 0.0% | 20.02 | 37.0 m | |
| 2012-12-01 | FusionCore | 21.0 m | 14.6 m | 24.3% | 65.4% | 2.90 | 32.9 m |
| RL-EKF | 90.7 m | 90.5 m | 5.3% | 20.6% | 12.53 | 42.1 m | |
| 2013-02-23 | FusionCore | 59.4 m | 58.5 m | 1.6% | 16.2% | 6.67 | 24.1 m |
| RL-EKF | 82.2 m | 81.8 m | 0.0% | 0.6% | 9.23 | 35.0 m | |
| 2013-04-05 | FusionCore | 12.1 m | 10.1 m | 32.8% | 81.5% | 2.26 | 30.2 m |
| RL-EKF | 268.9 m | 268.7 m | 0.0% | 0.0% | 50.11 | 27.3 m |
What drives the results
Why RL-EKF fails on 10 sequences
The drift rate column is the clearest signal. RL drift rates of 31.84 m/km (2012-02-04), 50.11 m/km (2013-04-05), and 20.02 m/km (2012-11-04) mean the filter is operating without GPS for large portions of those runs. A Segway at 1.5 m/s accumulating 31 m/km is in pure dead-reckoning almost the entire time.
The cause is always the same: the nclt_player publishes position_covariance var_xy = 9 (3m sigma) because that is the Novatel SPAN-CPT specification in ideal conditions. Measured against the RTK ground truth, actual GPS noise looks like this:
| Sequence | Median error | p95 error | p99 error |
|---|---|---|---|
| 2012-01-08 | 3.7 m | 20.1 m | 49.7 m |
| 2012-02-04 | 5.6 m | 46.6 m | 234.9 m |
| 2012-03-31 | 5.7 m | 14.7 m | 32.7 m |
| 2012-05-11 | 3.3 m | 13.3 m | 47.7 m |
| 2012-06-15 | 2.6 m | 9.7 m | 21.3 m |
| 2012-08-20 | 3.4 m | 12.7 m | 55.0 m |
| 2012-09-28 | 3.5 m | 12.8 m | 43.2 m |
| 2012-10-28 | 4.6 m | 16.0 m | 48.9 m |
| 2012-11-04 | 5.7 m | 53.1 m | 79.2 m |
| 2012-12-01 | 4.7 m | 20.7 m | 80.4 m |
| 2013-02-23 | 5.4 m | 33.0 m | 73.6 m |
| 2013-04-05 | 3.7 m | 19.9 m | 87.8 m |
The driver states 3m sigma. Median actual error is 2.6-5.7m (already at or above the stated 1-sigma on most sequences). p95 ranges from 9.7m to 53.1m. RL's gate is calibrated to the stated 3m; it rejects most fixes on sequences like 2012-02-04 and 2012-11-04. GPS is effectively off for those sequences, and the filter runs in dead-reckoning.
The contrast on 2012-05-11 (RL drift: 1.25 m/km vs 31.84 m/km on 2012-02-04) is the clearest evidence. Same robot, same campus, same config. The only difference is GPS data quality on that day. When GPS covariance is accurate, both filters perform comparably (9.7m vs 11.5m). The advantage opens when the covariance is wrong.
FusionCore's adaptive.gnss: true adjusts GPS measurement noise in real time from the innovation sequence. When actual GPS noise is higher than the driver reports, the adaptive window inflates the noise model and keeps chi2 statistics calibrated. RL has no equivalent.
What would improve RL: A single global change (not per-sequence tuning) would help substantially: increasing position_covariance var_xy in the nclt_player from 9 to 25 (5m sigma, reflecting actual NCLT GPS accuracy in urban conditions). This would bring RL's catastrophic losses (265m, 268m, 156m) down significantly. However, RL has no equivalent to FusionCore's adaptive.gnss: true. The calibration burden would remain whenever the dataset or environment changes.
What drives FC performance variation
The single best predictor of FC ATE is the longest GPS blackout in the sequence:
| Max blackout | Sequences | FC ATE range |
|---|---|---|
| < 200s | 2012-01-08, 2012-12-01 | 18-21m |
| 200-300s | 2012-03-31, 2012-05-11, 2012-09-28, 2012-10-28, 2013-04-05 | 10-30m |
| 300-480s | 2012-02-04, 2012-06-15, 2012-11-04, 2013-02-23 | 49-60m |
| Adversarial | 2012-08-20 (228s but 105 corrupt fixes at boundary) | 98.3m |
FC drift rate is consistent at 1-4 m/km on clean sequences. Spikes above 6 m/km (2012-06-15, 2012-08-20, 2012-11-04, 2013-02-23) signal heading error accumulated during coast mode.
The two FC losses
2012-06-15 (FC 49.2m, RL 18.2m)
The lowest-density GPS sequence in the set. 12,399 mode-3 fixes vs 17,000-22,000 on other sequences. One GPS blackout of 462 seconds (7.7 minutes).
During the blackout, FC dead-reckons on encoder and IMU. Coast mode inflates Q_position (coast_q_factor=10) and down-weights IMU WZ (coast_imu_wz_scale=500) so encoder WZ dominates heading. The encoder WZ bias (B_EWZ) is calibrated from GPS heading cross-covariance before the blackout and subtracted during it. However, any residual B_EWZ error compounds over 7.7 minutes. At 100 Hz with even a small uncorrected heading rate error, lateral position error grows quadratically.
RL-EKF wins here because its 2D mode has a simpler state and accumulates less uncertainty over the blackout. This is a structural advantage for RL on GPS-sparse, flat-terrain sequences with very long blackouts.
Path to fixing this:
- Reduce
coast_imu_wz_scalefrom 500 to 50-100 for blackouts exceeding 200s. At 500x, the IMU WZ is essentially ignored during coast. Both sensors sharing heading responsibility reduces B_EWZ sensitivity. - Magnetometer integration closes the observability gap completely: an absolute heading reference during GPS absence makes B_GZ and B_EWZ irrelevant. This is the architecturally correct fix and is on the roadmap.
- Duration-dependent
coast_q_factor: the current fixed 10x multiplier was tuned for the majority of sequences. For blackouts > 300s, a nonlinear ramp (aggressive early, conservative late) may reduce heading drift without sacrificing re-acquisition.
2012-08-20 (FC 98.3m, RL 10.6m)
The raw GPS stream contains 105 mode-3 fixes 720-840m off the RTK ground truth in gps.csv. The ground-truth preprocessor excludes them from gps_rtk.csv but they are valid mode-3 fixes in the real data stream. They cluster in a 24-second window at the end of the second GPS blackout (211s at t=62.5 min).
This is adversarial for any chi2-based gating scheme. During blackout recovery, FC's coast mode relaxes the chi2 gate to accept the first valid returning fix after genuine drift. A cluster of corrupt fixes arriving at exactly the re-acquisition moment exploits this window.
Per-minute error analysis:
| Time | FC error | Status |
|---|---|---|
| 0-42 min | 1-10m | Normal GPS coverage |
| 43-46 min | spike to ~100m, recovers in 2-3 min | Blackout 1 (228s): boundary GPS errors up to ~70m |
| 47-62 min | 3-10m | Full recovery |
| 63-67 min | spike to ~788m, recovers in 2 min | Blackout 2 (211s): 105 adversarial fixes at boundary |
| 68-82 min | 5-10m | Full recovery, remaining 15 minutes on-par with RL |
The 98m ATE RMSE is driven entirely by those two transients. RL-EKF wins here because its tight gate (which fails on 10 other sequences) accidentally rejects these outliers too.
Path to fixing this:
- Velocity sanity check: A GPS fix 720m from the dead-reckoned position after a 211s blackout implies ~3600 m/s of motion. A hard
max_implied_speedcheck (e.g., 20 m/s) operating before the chi2 gate rejects this trivially and has zero effect on normal operation. - Cluster consistency gate: A single outlier at 720m is handled by chi2. Five consecutive fixes all landing 720-840m from the predicted position with geometric consistency (tight cluster, not random scatter) is a distinguishable pattern. A secondary check on cluster coherence would catch this without affecting single-fix behavior.
- Gate hysteresis on recovery: Instead of a step change in chi2 threshold at recovery, a linear ramp from relaxed back to tight over the first N returned fixes makes it harder for a cluster to slip through entirely.
FC performance tier breakdown
Excellent (< 20m ATE): 2012-05-11 (9.7m), 2012-09-28 (10.8m), 2013-04-05 (12.1m), 2012-01-08 (18.6m) Common: high GPS fix count (19k-22k), max blackout under 200s, no adversarial data.
Good (20-35m ATE): 2012-03-31 (22.0m), 2012-12-01 (21.0m), 2012-10-28 (29.9m) Common: moderate GPS density, one or two blackouts under 260s, clean GPS at boundaries.
Moderate (35-65m ATE): 2012-02-04 (49.7m), 2012-06-15 (49.2m), 2013-02-23 (59.4m), 2012-11-04 (60.1m) Common: long blackouts (240-462s) or low GPS density, heading drift accumulates.
Poor (> 65m ATE): 2012-08-20 (98.3m) Specific cause: adversarial GPS cluster at blackout boundary. Structurally different failure mode from all other sequences.
Methodology
Filters compared:
- FusionCore: 23-state UKF, full 3D, adaptive noise from innovation sequence, GPS chi2 gating with coast mode, gyro + accel + encoder WZ bias estimation, inertial coast mode, ZUPT.
- RL-EKF: robot_localization EKF with
two_d_mode: true, GPS vianavsat_transformwith a fixed RTK datum. Chi2 gating set to equivalent confidence level:odom0_twist_rejection_threshold: 4.03(chi2(3, 0.999)),odom1_pose_rejection_threshold: 3.72(chi2(2, 0.999)).
Sensor inputs (identical to both filters):
- IMU: Microstrain 3DM-GX3-45, 100 Hz, raw specific force (gravity not removed by driver)
- Wheel odometry: Segway RMP encoders, 100 Hz, from
odometry_mu_100hz.csv - GPS: Novatel SPAN-CPT, 5 Hz, ~3m CEP in urban Michigan campus, published as NavSatFix with
position_covariance var_xy=9
Ground truth: RTK GPS (gps_rtk.csv), projected to local ENU. Only fixes with RTK mode >= 3 are used.
Evaluation: evo evo_ape with --align (SE3 alignment). ATE computed after finding the best rigid-body transform between filter trajectory and RTK ground truth.
Config: Single YAML for all sequences: fusioncore_datasets/config/nclt_fusioncore.yaml. No per-sequence modifications.
Reproduce
Prerequisites
- NCLT data: download from http://robots.engin.umich.edu/nclt/
Place each sequence under
benchmarks/nclt/<date>/raw files/ - ROS 2 Jazzy sourced
evoinstalled:pip install evo --break-system-packages- FusionCore built:
colcon build --packages-select fusioncore_core fusioncore_ros fusioncore_datasets
Run one sequence (full length, auto-stops)
bash benchmarks/run_one.sh 2012-01-08
Takes 15-50 minutes depending on sequence length (running at 3x real time). Results write to benchmarks/nclt/2012-01-08/results_full/.
Run all sequences sequentially
bash benchmarks/run_all.sh
Runs all sequences in chronological order. Plan for 6-8 hours total.
Directory structure
benchmarks/
README.md <- this file
run_one.sh <- run one sequence end-to-end
run_all.sh <- run all sequences sequentially
nclt/
2012-01-08/
raw files/ <- NCLT CSV data (not committed, download separately)
ms25.csv <- IMU (100 Hz)
ms25_euler.csv <- IMU Euler angles
odometry_mu_100hz.csv <- wheel encoder (100 Hz)
gps.csv <- GPS fixes (5 Hz, raw including outliers)
gps_rtk.csv <- RTK GPS (5 Hz, clean ground truth)
bag_full/ <- ROS 2 bag from the run (not committed, large)
fusioncore.tum <- FC trajectory (not committed, regenerate)
rl_ekf.tum <- RL-EKF trajectory (not committed, regenerate)
ground_truth.tum <- RTK ground truth (not committed, regenerate)
results_full/
BENCHMARK.md <- metrics table (committed)
trajectories.png <- trajectory overlay plot (committed)
ate_over_time.png <- ATE vs time plot (committed)
error_distribution.png <- error histogram (committed)
launch.log <- full launch output (committed)
.tum files, .mcap bags, and bag_full/ directories are in .gitignore. Only results_full/ contents are committed.