FusionCore NCLT Benchmarks

May 19, 2026 · View on GitHub

FusionCore vs robot_localization EKF on the NCLT dataset (University of Michigan North Campus Long-Term). Twelve sequences across all seasons, same single config file, no per-sequence tuning.

Results: 10/12 FC wins

Sequence	Season	Duration	GPS Fixes	Max Blackout	FC ATE 3D	RL-EKF ATE 3D	Winner
2012-01-08	Winter	92 min	22,041	203s	18.6 m	41.2 m	FC +55%
2012-02-04	Winter	77 min	18,808	184s	49.7 m	265.5 m	FC +81%
2012-03-31	Spring	87 min	20,482	262s	22.0 m	156.5 m	FC +86%
2012-05-11	Spring	84 min	21,621	120s	9.7 m	11.5 m	FC +16%
2012-06-15	Summer	55 min	12,399	462s	49.2 m	18.2 m	RL +63%
2012-08-20	Summer	83 min	20,025	228s	98.3 m	10.6 m	RL +89%
2012-09-28	Fall	77 min	19,191	196s	10.8 m	55.7 m	FC +81%
2012-10-28	Fall	85 min	21,060	256s	29.9 m	60.0 m	FC +50%
2012-11-04	Fall	79 min	17,840	400s	60.1 m	122.0 m	FC +51%
2012-12-01	Winter	75 min	17,941	173s	21.0 m	90.7 m	FC +77%
2013-02-23	Winter	78 min	19,333	240s	59.4 m	82.2 m	FC +28%
2013-04-05	Spring	68 min	16,297	275s	12.1 m	268.9 m	FC +96%

ATE = absolute trajectory error, SE3-aligned to RTK GPS ground truth. GPS Fixes = mode-3 (3D) fixes only, as published by nclt_player.

RL-UKF: NaN divergence on all sequences (known numerical instability under sim-time playback, confirmed by RL maintainer).

Full metrics

Sequence	Filter	ATE 3D	ATE XY	Within 5m	Within 10m	Drift m/km	RPE@10m
2012-01-08	FusionCore	18.6 m	16.9 m	26.7%	74.5%	2.55	22.5 m
	RL-EKF	41.2 m	41.0 m	22.8%	76.5%	5.64	25.3 m
2012-02-04	FusionCore	49.7 m	31.5 m	6.4%	33.4%	5.96	30.0 m
	RL-EKF	265.5 m	265.4 m	0.0%	0.1%	31.84	44.3 m
2012-03-31	FusionCore	22.0 m	20.2 m	19.9%	67.3%	2.27	21.4 m
	RL-EKF	156.5 m	156.3 m	0.2%	0.6%	16.16	42.7 m
2012-05-11	FusionCore	9.7 m	4.9 m	45.9%	82.6%	1.05	19.0 m
	RL-EKF	11.5 m	9.0 m	56.2%	90.1%	1.25	20.2 m
2012-06-15	FusionCore	49.2 m	48.4 m	2.4%	20.0%	8.40	22.4 m
	RL-EKF	18.2 m	17.1 m	42.8%	78.4%	3.11	22.3 m
2012-08-20	FusionCore	98.3 m	97.9 m	0.1%	13.8%	13.08	53.7 m
	RL-EKF	10.6 m	9.9 m	59.4%	89.3%	1.40	19.1 m
2012-09-28	FusionCore	10.8 m	7.5 m	31.4%	76.9%	1.50	23.7 m
	RL-EKF	55.7 m	55.5 m	1.7%	25.1%	7.73	28.0 m
2012-10-28	FusionCore	29.9 m	21.1 m	19.9%	59.7%	3.69	40.6 m
	RL-EKF	60.0 m	59.6 m	0.1%	3.6%	7.40	27.8 m
2012-11-04	FusionCore	60.1 m	59.2 m	3.8%	29.5%	9.86	32.3 m
	RL-EKF	122.0 m	121.9 m	0.0%	0.0%	20.02	37.0 m
2012-12-01	FusionCore	21.0 m	14.6 m	24.3%	65.4%	2.90	32.9 m
	RL-EKF	90.7 m	90.5 m	5.3%	20.6%	12.53	42.1 m
2013-02-23	FusionCore	59.4 m	58.5 m	1.6%	16.2%	6.67	24.1 m
	RL-EKF	82.2 m	81.8 m	0.0%	0.6%	9.23	35.0 m
2013-04-05	FusionCore	12.1 m	10.1 m	32.8%	81.5%	2.26	30.2 m
	RL-EKF	268.9 m	268.7 m	0.0%	0.0%	50.11	27.3 m

The drift rate column is the clearest signal. RL drift rates of 31.84 m/km (2012-02-04), 50.11 m/km (2013-04-05), and 20.02 m/km (2012-11-04) mean the filter is operating without GPS for large portions of those runs. A Segway at 1.5 m/s accumulating 31 m/km is in pure dead-reckoning almost the entire time.

The cause is always the same: the nclt_player publishes position_covariance var_xy = 9 (3m sigma) because that is the Novatel SPAN-CPT specification in ideal conditions. Measured against the RTK ground truth, actual GPS noise looks like this:

Sequence	Median error	p95 error	p99 error
2012-01-08	3.7 m	20.1 m	49.7 m
2012-02-04	5.6 m	46.6 m	234.9 m
2012-03-31	5.7 m	14.7 m	32.7 m
2012-05-11	3.3 m	13.3 m	47.7 m
2012-06-15	2.6 m	9.7 m	21.3 m
2012-08-20	3.4 m	12.7 m	55.0 m
2012-09-28	3.5 m	12.8 m	43.2 m
2012-10-28	4.6 m	16.0 m	48.9 m
2012-11-04	5.7 m	53.1 m	79.2 m
2012-12-01	4.7 m	20.7 m	80.4 m
2013-02-23	5.4 m	33.0 m	73.6 m
2013-04-05	3.7 m	19.9 m	87.8 m

The driver states 3m sigma. Median actual error is 2.6-5.7m (already at or above the stated 1-sigma on most sequences). p95 ranges from 9.7m to 53.1m. RL's gate is calibrated to the stated 3m; it rejects most fixes on sequences like 2012-02-04 and 2012-11-04. GPS is effectively off for those sequences, and the filter runs in dead-reckoning.

The contrast on 2012-05-11 (RL drift: 1.25 m/km vs 31.84 m/km on 2012-02-04) is the clearest evidence. Same robot, same campus, same config. The only difference is GPS data quality on that day. When GPS covariance is accurate, both filters perform comparably (9.7m vs 11.5m). The advantage opens when the covariance is wrong.

FusionCore's adaptive.gnss: true adjusts GPS measurement noise in real time from the innovation sequence. When actual GPS noise is higher than the driver reports, the adaptive window inflates the noise model and keeps chi2 statistics calibrated. RL has no equivalent.

What would improve RL: A single global change (not per-sequence tuning) would help substantially: increasing position_covariance var_xy in the nclt_player from 9 to 25 (5m sigma, reflecting actual NCLT GPS accuracy in urban conditions). This would bring RL's catastrophic losses (265m, 268m, 156m) down significantly. However, RL has no equivalent to FusionCore's adaptive.gnss: true. The calibration burden would remain whenever the dataset or environment changes.

What drives FC performance variation

The single best predictor of FC ATE is the longest GPS blackout in the sequence:

Max blackout	Sequences	FC ATE range
< 200s	2012-01-08, 2012-12-01	18-21m
200-300s	2012-03-31, 2012-05-11, 2012-09-28, 2012-10-28, 2013-04-05	10-30m
300-480s	2012-02-04, 2012-06-15, 2012-11-04, 2013-02-23	49-60m
Adversarial	2012-08-20 (228s but 105 corrupt fixes at boundary)	98.3m

FC drift rate is consistent at 1-4 m/km on clean sequences. Spikes above 6 m/km (2012-06-15, 2012-08-20, 2012-11-04, 2013-02-23) signal heading error accumulated during coast mode.

The two FC losses

2012-06-15 (FC 49.2m, RL 18.2m)

The lowest-density GPS sequence in the set. 12,399 mode-3 fixes vs 17,000-22,000 on other sequences. One GPS blackout of 462 seconds (7.7 minutes).

During the blackout, FC dead-reckons on encoder and IMU. Coast mode inflates Q_position (coast_q_factor=10) and down-weights IMU WZ (coast_imu_wz_scale=500) so encoder WZ dominates heading. The encoder WZ bias (B_EWZ) is calibrated from GPS heading cross-covariance before the blackout and subtracted during it. However, any residual B_EWZ error compounds over 7.7 minutes. At 100 Hz with even a small uncorrected heading rate error, lateral position error grows quadratically.

RL-EKF wins here because its 2D mode has a simpler state and accumulates less uncertainty over the blackout. This is a structural advantage for RL on GPS-sparse, flat-terrain sequences with very long blackouts.

Path to fixing this:

Reduce coast_imu_wz_scale from 500 to 50-100 for blackouts exceeding 200s. At 500x, the IMU WZ is essentially ignored during coast. Both sensors sharing heading responsibility reduces B_EWZ sensitivity.
Magnetometer integration closes the observability gap completely: an absolute heading reference during GPS absence makes B_GZ and B_EWZ irrelevant. This is the architecturally correct fix and is on the roadmap.
Duration-dependent coast_q_factor: the current fixed 10x multiplier was tuned for the majority of sequences. For blackouts > 300s, a nonlinear ramp (aggressive early, conservative late) may reduce heading drift without sacrificing re-acquisition.

2012-08-20 (FC 98.3m, RL 10.6m)

The raw GPS stream contains 105 mode-3 fixes 720-840m off the RTK ground truth in gps.csv. The ground-truth preprocessor excludes them from gps_rtk.csv but they are valid mode-3 fixes in the real data stream. They cluster in a 24-second window at the end of the second GPS blackout (211s at t=62.5 min).

This is adversarial for any chi2-based gating scheme. During blackout recovery, FC's coast mode relaxes the chi2 gate to accept the first valid returning fix after genuine drift. A cluster of corrupt fixes arriving at exactly the re-acquisition moment exploits this window.

Per-minute error analysis:

Time	FC error	Status
0-42 min	1-10m	Normal GPS coverage
43-46 min	spike to ~100m, recovers in 2-3 min	Blackout 1 (228s): boundary GPS errors up to ~70m
47-62 min	3-10m	Full recovery
63-67 min	spike to ~788m, recovers in 2 min	Blackout 2 (211s): 105 adversarial fixes at boundary
68-82 min	5-10m	Full recovery, remaining 15 minutes on-par with RL

The 98m ATE RMSE is driven entirely by those two transients. RL-EKF wins here because its tight gate (which fails on 10 other sequences) accidentally rejects these outliers too.

Path to fixing this:

Velocity sanity check: A GPS fix 720m from the dead-reckoned position after a 211s blackout implies ~3600 m/s of motion. A hard max_implied_speed check (e.g., 20 m/s) operating before the chi2 gate rejects this trivially and has zero effect on normal operation.
Cluster consistency gate: A single outlier at 720m is handled by chi2. Five consecutive fixes all landing 720-840m from the predicted position with geometric consistency (tight cluster, not random scatter) is a distinguishable pattern. A secondary check on cluster coherence would catch this without affecting single-fix behavior.
Gate hysteresis on recovery: Instead of a step change in chi2 threshold at recovery, a linear ramp from relaxed back to tight over the first N returned fixes makes it harder for a cluster to slip through entirely.

FC performance tier breakdown

Excellent (< 20m ATE): 2012-05-11 (9.7m), 2012-09-28 (10.8m), 2013-04-05 (12.1m), 2012-01-08 (18.6m) Common: high GPS fix count (19k-22k), max blackout under 200s, no adversarial data.

Good (20-35m ATE): 2012-03-31 (22.0m), 2012-12-01 (21.0m), 2012-10-28 (29.9m) Common: moderate GPS density, one or two blackouts under 260s, clean GPS at boundaries.

Moderate (35-65m ATE): 2012-02-04 (49.7m), 2012-06-15 (49.2m), 2013-02-23 (59.4m), 2012-11-04 (60.1m) Common: long blackouts (240-462s) or low GPS density, heading drift accumulates.

Poor (> 65m ATE): 2012-08-20 (98.3m) Specific cause: adversarial GPS cluster at blackout boundary. Structurally different failure mode from all other sequences.

Methodology

Filters compared:

FusionCore: 23-state UKF, full 3D, adaptive noise from innovation sequence, GPS chi2 gating with coast mode, gyro + accel + encoder WZ bias estimation, inertial coast mode, ZUPT.
RL-EKF: robot_localization EKF with two_d_mode: true, GPS via navsat_transform with a fixed RTK datum. Chi2 gating set to equivalent confidence level: odom0_twist_rejection_threshold: 4.03 (chi2(3, 0.999)), odom1_pose_rejection_threshold: 3.72 (chi2(2, 0.999)).

Sensor inputs (identical to both filters):

IMU: Microstrain 3DM-GX3-45, 100 Hz, raw specific force (gravity not removed by driver)
Wheel odometry: Segway RMP encoders, 100 Hz, from odometry_mu_100hz.csv
GPS: Novatel SPAN-CPT, 5 Hz, ~3m CEP in urban Michigan campus, published as NavSatFix with position_covariance var_xy=9

Ground truth: RTK GPS (gps_rtk.csv), projected to local ENU. Only fixes with RTK mode >= 3 are used.

Evaluation: evo evo_ape with --align (SE3 alignment). ATE computed after finding the best rigid-body transform between filter trajectory and RTK ground truth.

Config: Single YAML for all sequences: fusioncore_datasets/config/nclt_fusioncore.yaml. No per-sequence modifications.

Reproduce

Prerequisites

NCLT data: download from http://robots.engin.umich.edu/nclt/ Place each sequence under benchmarks/nclt/<date>/raw files/
ROS 2 Jazzy sourced
evo installed: pip install evo --break-system-packages
FusionCore built: colcon build --packages-select fusioncore_core fusioncore_ros fusioncore_datasets

Run one sequence (full length, auto-stops)

bash benchmarks/run_one.sh 2012-01-08

Takes 15-50 minutes depending on sequence length (running at 3x real time). Results write to benchmarks/nclt/2012-01-08/results_full/.

Run all sequences sequentially

bash benchmarks/run_all.sh

Runs all sequences in chronological order. Plan for 6-8 hours total.

Directory structure

benchmarks/
  README.md               <- this file
  run_one.sh              <- run one sequence end-to-end
  run_all.sh              <- run all sequences sequentially
  nclt/
    2012-01-08/
      raw files/          <- NCLT CSV data (not committed, download separately)
        ms25.csv          <- IMU (100 Hz)
        ms25_euler.csv    <- IMU Euler angles
        odometry_mu_100hz.csv  <- wheel encoder (100 Hz)
        gps.csv           <- GPS fixes (5 Hz, raw including outliers)
        gps_rtk.csv       <- RTK GPS (5 Hz, clean ground truth)
      bag_full/           <- ROS 2 bag from the run (not committed, large)
      fusioncore.tum      <- FC trajectory (not committed, regenerate)
      rl_ekf.tum          <- RL-EKF trajectory (not committed, regenerate)
      ground_truth.tum    <- RTK ground truth (not committed, regenerate)
      results_full/
        BENCHMARK.md      <- metrics table (committed)
        trajectories.png  <- trajectory overlay plot (committed)
        ate_over_time.png <- ATE vs time plot (committed)
        error_distribution.png <- error histogram (committed)
        launch.log        <- full launch output (committed)

.tum files, .mcap bags, and bag_full/ directories are in .gitignore. Only results_full/ contents are committed.