FusionSQL

May 31, 2026

Text2SQL evaluation, FusionDataset construction, and shift-aware regression for Text-to-SQL.

Motivation

Citation

@inproceedings{fusionsql,
  author       = {Trinh Pham and Thanh Tam Nguyen and Viet Huynh and Hongzhi Yin and Quoc Viet Hung Nguyen},
  title        = {An Efficient and Effective Evaluator for Text2SQL Models on Unseen and Unlabeled Data},
  booktitle    = {ICDE},
  publisher    = {IEEE},
  year         = {2026},
}

What is it?

FusionSQL provides:

  • A portable evaluator that reports execution accuracy for Spider, Spider 2.0, BIRD, SParC, CoSQL, and WikiSQL.
  • A pipeline to construct a synthetic FusionDataset of databases, SQLs, and paraphrased questions.
  • Shift descriptors (Frechet-like, Mahalanobis, Sliced-Wasserstein) between a target workload and the training set.
  • An MLP regressor that learns to predict execution accuracy for a given base model with minimal MAE.

All metrics and reports are based on execution accuracy by design.
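
Execution accuracy compares the result of running a predicted query with the result of running the gold query on the same database. The following is a minimal sketch of that idea using only the standard library; the function names, the toy `singer` table, and the choice to compare results as unordered multisets are assumptions for illustration, not the evaluator's actual implementation.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (gold, pred) pairs whose result multisets are equal."""
    hits = 0
    for gold_sql, pred_sql in pairs:
        gold = run_query(conn, gold_sql)
        pred = run_query(conn, pred_sql)
        hits += int(gold is not None and gold == pred)
    return hits / len(pairs) if pairs else 0.0


# Toy in-memory database (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer(id INTEGER, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 25), (3, 'Cy', 41);
    """
)
pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 28", "SELECT name FROM singer WHERE age >= 30"),
    # Different results: counted as wrong.
    ("SELECT COUNT(*) FROM singer", "SELECT COUNT(id) FROM singer WHERE age < 40"),
    # Prediction fails to execute: counted as wrong.
    ("SELECT name FROM singer", "SELECT nme FROM singer"),
]
acc = execution_accuracy(conn, pairs)
print(f"ExecAcc = {acc:.3f}")
```

Comparing multisets ignores row order, so a stricter comparison would be needed for queries where `ORDER BY` matters.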

FusionSQL Framework

Project layout

  • fusion_evaluator/
    • data/ dataset loaders and adapters
    • sql/ SQL normalization and parsing (sqlglot)
    • exec/ SQLite execution with caching
    • metrics/ execution-accuracy metrics
    • evaluator.py orchestrator
    • cli.py evaluation entrypoint
  • figure/ diagrams
  • outputs/ reports and caches

Getting Started

0) Dependencies

  • Python 3.10+
  • SQLite (bundled with Python through the standard-library sqlite3 module)
  • Recommended: a GPU with CUDA if embedding large datasets

Install Python dependencies:

pip install -r requirements.txt

Torch wheels differ by platform and GPU. If the default install fails or is slow, install a matching build by following the official PyTorch installation instructions.

1) Datasets and expected layout

  • Spider / Spider 2.0 / BIRD / SParC / CoSQL

    • Gold and predictions are JSON/JSONL with fields: question, query (gold) or prediction (pred), and db_id.
    • Databases are under db_root/DBID/DB.sqlite.
  • WikiSQL

    • Gold/pred are JSONL; tables file is tables.jsonl (id, header, rows).
    • We materialize one SQLite per table into an output directory.
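
As a rough illustration of the WikiSQL step, the sketch below writes one SQLite file per table record with the `id`, `header`, and `rows` fields. The table name `t`, the positional column names `col0`, `col1`, ..., the output file naming, and the sample record are all hypothetical choices; the repository's materializer may use different conventions.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def materialize_table(table, out_dir):
    """Write one table record (id, header, rows) to its own SQLite file."""
    db_path = Path(out_dir) / f"{table['id']}.sqlite"
    # Positional names avoid quoting problems with free-text headers (assumption).
    cols = [f"col{i}" for i in range(len(table["header"]))]
    create_sql = "CREATE TABLE t (" + ", ".join(cols) + ")"
    insert_sql = "INSERT INTO t VALUES (" + ", ".join("?" for _ in cols) + ")"
    conn = sqlite3.connect(str(db_path))
    with conn:  # commits on success
        conn.execute(create_sql)
        conn.executemany(insert_sql, table["rows"])
    conn.close()
    return db_path


# One made-up line in the style of tables.jsonl.
line = '{"id": "demo-table-1", "header": ["State", "Seats"], "rows": [["Ohio", 16], ["Utah", 4]]}'

with tempfile.TemporaryDirectory() as tmp:
    path = materialize_table(json.loads(line), tmp)
    file_name = path.name
    conn = sqlite3.connect(str(path))
    total = conn.execute("SELECT SUM(col1) FROM t").fetchone()[0]
    conn.close()

print(file_name, total)
```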

Download links:

Place gold/pred files accordingly and provide --db_root pointing to per-DB folders with DB.sqlite for Spider/Spider2/BIRD/SParC/CoSQL.
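
A small sketch of how such files could be read and how a database path could be resolved is shown below. It assumes that the SQLite file is named after its `db_id` (as in Spider) and that gold and prediction records are aligned by position; the helper names `load_records` and `db_path` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load a JSON array file or a JSONL file into a list of dicts."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def db_path(db_root, db_id):
    """Resolve db_root/DBID/DB.sqlite, assuming the file is named after db_id."""
    return Path(db_root) / db_id / f"{db_id}.sqlite"


with tempfile.TemporaryDirectory() as tmp:
    gold_file = Path(tmp) / "gold.json"    # JSON array
    pred_file = Path(tmp) / "pred.jsonl"   # one JSON object per line
    gold_file.write_text(
        json.dumps([{"question": "How many singers are there?",
                     "query": "SELECT COUNT(*) FROM singer",
                     "db_id": "concert_singer"}]),
        encoding="utf-8",
    )
    pred_file.write_text(
        json.dumps({"question": "How many singers are there?",
                    "prediction": "SELECT COUNT(*) FROM singer",
                    "db_id": "concert_singer"}) + "\n",
        encoding="utf-8",
    )
    gold = load_records(gold_file)
    pred = load_records(pred_file)

resolved = db_path("path/to/spider/database", gold[0]["db_id"])
print(resolved.as_posix())
```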

FusionDataset

Construct a synthetic, diverse dataset from CSV sources:

python -m fusion_evaluator.fusion_dataset.cli \
  --sources /path/to/csv_sources ... \
  --out_root outputs/fusion_dataset \
  --max_tables 1000

Optional LLM-driven question generation and rewrites (provide both to enable):

python -m fusion_evaluator.fusion_dataset.cli \
  --sources /path/to/csv_sources \
  --out_root outputs/fusion_dataset \
  --prompts fusion_evaluator/fusion_dataset/prompts.yaml \
  --hf_model Qwen/Qwen2.5-72B-Instruct \
  --device cuda --torch_dtype fp16 \
  --q_per_sql 4 \
  --enable_rewrites --rw_per_cat 2

This will:

  • acquire CSVs, filter tables (language, structure, near-dup),
  • synthesize relational DBs (SQLite under outputs/fusion_dataset/databases),
  • generate SQLs and paraphrased questions with distractors (LLM-backed if provided),
  • optionally produce rewritten Q/A pairs for semantic rewriting, numeric condition transforms, and query logic adjustments,
  • write outputs/fusion_dataset/fusion_dataset.jsonl.
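
To make the CSV-to-database step more concrete, here is a minimal sketch that loads one CSV into a SQLite table with the standard library. It only covers the simplest case (one CSV becomes one table, all values stored as text); the function name `csv_to_sqlite` and the toy data are assumptions, and the actual pipeline's filtering and relational synthesis are not reproduced here.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(conn, table_name, csv_text):
    """Create a table from the CSV header and insert all remaining rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Double quotes protect simple header names; headers containing quotes
    # would need extra escaping.
    cols = ", ".join('"' + name + '"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "' + table_name + '" (' + cols + ")")
    conn.executemany('INSERT INTO "' + table_name + '" VALUES (' + marks + ")", reader)
    return len(header)


# Toy CSV content (made-up values).
csv_text = "city,population\nHanoi,8000000\nHue,650000\n"

conn = sqlite3.connect(":memory:")
n_cols = csv_to_sqlite(conn, "cities", csv_text)
rows = conn.execute(
    "SELECT city FROM cities WHERE CAST(population AS INTEGER) > 1000000"
).fetchall()
print(n_cols, rows)
```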

FusionSQL

We embed SQLs (or questions) with a Hugging Face model, compute shift descriptors between a training workload and FusionDataset, and fit an MLP to predict execution accuracy.
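
To give a sense of what a shift descriptor measures, the sketch below computes two simplified quantities between a source and a target set of embeddings: a Fréchet-like distance and a Mahalanobis-style distance. Both use a diagonal covariance so that the example stays dependency-free; this simplification, the function names, and the synthetic Gaussian data are assumptions, and the repository's descriptors may be defined differently.

```python
import math
import random
from statistics import fmean, pvariance


def column_stats(rows):
    """Per-dimension mean and (population) variance of a list of vectors."""
    cols = list(zip(*rows))
    return [fmean(c) for c in cols], [pvariance(c) for c in cols]


def frechet_like(source, target):
    """Fréchet distance between Gaussians, restricted to diagonal covariances.

    With diagonal covariances, Tr(S + T - 2 (S T)^(1/2)) reduces to
    sum_i (sqrt(s_i) - sqrt(t_i))^2.
    """
    mu_s, var_s = column_stats(source)
    mu_t, var_t = column_stats(target)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_s, var_t))
    return mean_term + cov_term


def mahalanobis_diag(source, target, eps=1e-9):
    """Distance of the target mean from the source mean, scaled by source variance."""
    mu_s, var_s = column_stats(source)
    mu_t, _ = column_stats(target)
    return math.sqrt(sum((t - s) ** 2 / (v + eps) for s, t, v in zip(mu_s, mu_t, var_s)))


rng = random.Random(0)
source = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
target = [[rng.gauss(0.5, 1.0) for _ in range(4)] for _ in range(200)]  # shifted mean

fd_same = frechet_like(source, source)
fd_shift = frechet_like(source, target)
md_same = mahalanobis_diag(source, source)
md_shift = mahalanobis_diag(source, target)
print(fd_same, fd_shift, md_same, md_shift)
```

Both quantities are zero when the two sets coincide and grow as the target distribution moves away from the source.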

1) Compute embeddings directly

python -m fusion_evaluator.evaluator_training.cli embed \
  --input outputs/fusion_dataset/fusion_dataset.jsonl \
  --output outputs/fusion_dataset/fusion_emb.npy \
  --model Qwen/Qwen2.5-72B-Instruct \
  --field sql \
  --device cuda \
  --batch_size 8 \
  --max_length 256 \
  --torch_dtype fp16

You can pass any compatible encoder from Hugging Face. Common choices include:

  • Qwen/Qwen2.5-72B-Instruct
  • meta-llama/Llama-3.1-70B-Instruct
  • deepseek-ai/deepseek-coder-33b-instruct
  • XGenerationLab/XiYanSQL-QwenCoder-14B-2502
  • cycloneboy/CscSQL-Grpo-Qwen2.5-Coder-7B-Instruct
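
Turning a model's per-token hidden states into one vector per SQL string requires a pooling step. The README does not say which pooling the embed command uses; masked mean pooling is one common choice, sketched here with plain lists. The function name and the toy vectors are illustrative only.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return [sum(col) / len(kept) for col in zip(*kept)]


# Three token positions with 2-dimensional vectors; the last one is padding.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

sentence_vector = mean_pool(token_vectors, attention_mask)
print(sentence_vector)
```

The padding position is excluded, so the large values in the third vector do not affect the pooled result.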

2) Train the regressor (from precomputed embeddings)

python -m fusion_evaluator.evaluator_training.cli train \
  --source_embeddings path/to/source.npy \
  --target_embeddings path/to/fusion.npy \
  --observed_metric 0.712 \
  --slices 34 \
  --hybrid_swd --pca_k 10 --rand_r 24 --pca_subsample 8192 \
  --out outputs/regressor.joblib
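
The --slices flag refers to the number of one-dimensional projections used by a sliced Wasserstein distance. The sketch below shows the basic technique with random unit directions only. Note that --slices 34 equals --pca_k 10 plus --rand_r 24, which suggests that the hybrid mode mixes PCA directions with random ones; that reading is a guess, and the PCA part is not reproduced here. All function names and data are illustrative.

```python
import math
import random


def random_unit_vectors(count, dim, rng):
    """Draw `count` random directions on the unit sphere in `dim` dimensions."""
    vectors = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def sliced_wasserstein(source, target, directions):
    """Average 1-D Wasserstein-1 distance over the given directions.

    For two samples of equal size, the 1-D distance is the mean absolute
    difference between the sorted projections.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    total = 0.0
    for d in directions:
        proj_s = sorted(sum(x * w for x, w in zip(row, d)) for row in source)
        proj_t = sorted(sum(x * w for x, w in zip(row, d)) for row in target)
        total += sum(abs(a - b) for a, b in zip(proj_s, proj_t)) / len(proj_s)
    return total / len(directions)


rng = random.Random(0)
source = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
target = [[x + 0.5 for x in row] for row in source]  # same shape, shifted by 0.5
directions = random_unit_vectors(34, 8, rng)

swd_same = sliced_wasserstein(source, source, directions)
swd_shift = sliced_wasserstein(source, target, directions)
print(swd_same, swd_shift)
```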

3) End-to-end training with FusionDataset

python -m fusion_evaluator.evaluator_training.pipeline train \
    --dataset spider \
    --gold path/to/spider_dev_gold.json \
    --pred path/to/spider_dev_preds.jsonl \
    --db_root path/to/spider/database \
    --fusion_jsonl outputs/fusion_dataset/fusion_dataset.jsonl \
    --exec_accuracy 0.712 \
    --model_name Qwen/Qwen2.5-72B-Instruct \
    --hybrid_swd --pca_k 10 --rand_r 24 --pca_subsample 8192 \
    --slices 34 \
    --out outputs/regressor_spider_qwen.joblib

4) Inference with FusionSQL

python -m fusion_evaluator.evaluator_training.pipeline infer \
    --dataset spider \
    --gold path/to/spider_dev_gold.json \
    --pred path/to/spider_dev_preds.jsonl \
    --db_root path/to/spider/database \
    --fusion_jsonl outputs/fusion_dataset/fusion_dataset.jsonl \
    --model_name Qwen/Qwen2.5-72B-Instruct \
    --hybrid_swd --pca_k 10 --rand_r 24 --pca_subsample 8192 \
    --slices 34 \
    --model outputs/regressor_spider_qwen.joblib

The regressor predicts execution accuracy for the target workload and chosen base model.
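
The quality of such predictions is summarized as mean absolute error (MAE) between predicted and observed execution accuracy. A minimal sketch, using made-up numbers and reporting the error in percentage points:

```python
def mean_absolute_error(predicted, observed):
    """Mean of |predicted - observed| over paired values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical predicted vs. observed execution accuracies (fractions in [0, 1]).
predicted = [0.70, 0.55, 0.62]
observed = [0.712, 0.50, 0.65]

mae_points = 100 * mean_absolute_error(predicted, observed)
print(f"MAE = {mae_points:.2f} percentage points")
```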

5) Sampling-based shift + true execution accuracy (small example)

This helper script repeatedly samples target subsets (e.g., 100 subsets of 500 examples), computes shift descriptors between the training workload and each subset, and measures the true execution accuracy of each subset by generating SQL with a model and executing it against the databases. It saves the shift vectors and their accuracies, then fits a 3-layer MLP regressor.
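
The subset sampling can be sketched as follows. The function name and the target-set size of 1500 are placeholders; only the subset size, the number of sets, and the seed mirror the flags used in the example command.

```python
import random


def sample_subsets(n_examples, subset_size, num_sets, seed=0):
    """Draw `num_sets` index subsets, each sampled without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_examples), subset_size)) for _ in range(num_sets)]


# 1500 is a placeholder for the size of the target set.
subsets = sample_subsets(n_examples=1500, subset_size=500, num_sets=100, seed=0)
print(len(subsets), len(subsets[0]))
```

Indices are unique within a subset, but the same example may appear in several subsets.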

Example (BIRD dev):

python -m fusion_evaluator.evaluator_training.shift_sampling_train \
  --db_root fusion_evaluator/data/bird/dev/dev_databases \
  --source fusion_evaluator/data/spider/sft_spider_train_text2sql.json \
  --target fusion_evaluator/data/bird/sft_bird_dev_text2sql.json \
  --target_limit 500 \
  --num_sets 100 \
  --seed 0 \
  --device cuda --batch_size 8 --torch_dtype fp16

What it does:

  • Builds prompts from question + schema (same format as shift_from_json.py).
  • Uses Qwen/Qwen2.5-3B-Instruct to generate SQL.
  • Computes execution accuracy by running SQL against SQLite databases under --db_root.
  • Samples 100 subsets of size 500 (no replacement per subset).
  • Computes 100 shift vectors and their 100 accuracies.
  • Trains a 3-layer MLP regressor (256, 128, 64) on these vectors.
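
To illustrate the regressor's shape, the sketch below builds an untrained MLP with hidden layers of 256, 128, and 64 units and runs one forward pass on a 5-dimensional shift vector. The ReLU activation, the weight initialization, and the single scalar output are assumptions, and no training is performed; this shows the structure only, not the repository's regressor.

```python
import random


def init_mlp(sizes, rng):
    """One (weights, biases) pair per layer; weights[j][i] links input i to unit j."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Apply each layer in turn, with ReLU on hidden layers only."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]


# 5 inputs (one shift vector), three hidden layers, one output.
sizes = [5, 256, 128, 64, 1]
layers = init_mlp(sizes, random.Random(0))
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
prediction = forward(layers, [0.1, 0.4, 0.2, 0.0, 0.3])  # made-up shift vector
print(n_params, prediction)
```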

Outputs:

  • outputs/shift_samples/shift_samples.npz containing:
    • deltas: (num_sets, 5) shift vectors
    • accuracies: (num_sets,) true execution accuracies
    • sample_indices: (num_sets, target_limit) indices into the target set
  • outputs/shift_samples/shift_mlp.joblib trained regressor

Notes:

  • For Spider, set --db_root to fusion_evaluator/data/spider/database (or test_database if needed).
  • To use a different generation model, set --model.
  • To embed with a different model than generation, set --embed_model.

Additional usage (Spider, Spider 2.0, BIRD, SParC, CoSQL, WikiSQL)

# Spider
python -m fusion_evaluator.cli \
  --dataset spider \
  --gold path/to/dev_gold.json \
  --pred path/to/predictions.jsonl \
  --db_root path/to/spider/database \
  --out outputs/spider_report.json

# Spider 2.0
python -m fusion_evaluator.cli \
  --dataset spider2 \
  --gold path/to/spider2_gold.json \
  --pred path/to/spider2_preds.jsonl \
  --db_root path/to/spider2/database \
  --out outputs/spider2_report.json

# BIRD
python -m fusion_evaluator.cli \
  --dataset bird \
  --gold path/to/bird_gold.jsonl \
  --pred path/to/bird_preds.jsonl \
  --db_root path/to/bird/database \
  --out outputs/bird_report.json

# SParC
python -m fusion_evaluator.cli \
  --dataset sparc \
  --gold path/to/sparc_dev.json \
  --pred path/to/preds.jsonl \
  --db_root path/to/spider/database \
  --out outputs/sparc_report.json

# CoSQL
python -m fusion_evaluator.cli \
  --dataset cosql \
  --gold path/to/cosql_dev.json \
  --pred path/to/preds.jsonl \
  --db_root path/to/spider/database \
  --out outputs/cosql_report.json

# WikiSQL
python -m fusion_evaluator.cli \
  --dataset wikisql \
  --gold path/to/wikisql_gold.jsonl \
  --pred path/to/wikisql_preds.jsonl \
  --wikisql_tables path/to/tables.jsonl \
  --wikisql_db_out databases/wikisql \
  --out outputs/wikisql_report.json

Output:

  • JSON report at --out with summary and per-sample metrics.
  • Console table: ExecAcc.

Reported Results

FusionSQL-TL denotes FusionSQL Transfer Learning. FusionSQL-ML denotes FusionSQL Meta-learning.

Table III. MAE (↓) of dataset-level accuracy estimation for source-target transfers

Each cell reports mean ± 95% CI in percentage points. Best is in bold, second-best is underlined.

| Transfer | Method | Qwen2.5-72B | Llama-3.1-70B | DeepSeek-33B | XiYanSQL-14B | CSC-SQL-7B | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Spider → BIRD | ATC-MC | 13.9 ± 1.1 | 14.6 ± 1.2 | 15.2 ± 1.2 | 17.4 ± 1.4 | 18.3 ± 1.5 | 15.9 ± 1.3 |
| | ATC-NE | 15.0 ± 1.2 | 15.7 ± 1.3 | 16.5 ± 1.3 | 18.6 ± 1.5 | 19.8 ± 1.6 | 17.1 ± 1.4 |
| | DoC (τ=0.8) | 15.5 ± 1.3 | 16.0 ± 1.3 | 17.3 ± 1.4 | 19.2 ± 1.6 | 20.5 ± 1.6 | 17.7 ± 1.4 |
| | DoC (τ=0.9) | 16.7 ± 1.4 | 17.3 ± 1.4 | 18.6 ± 1.5 | 20.3 ± 1.7 | 21.7 ± 1.7 | 18.9 ± 1.5 |
| | PseAutoEval | 11.6 ± 0.9 | 12.2 ± 1.0 | 13.1 ± 1.0 | 15.1 ± 1.2 | 16.3 ± 1.3 | 13.7 ± 1.1 |
| | BugJudge | 14.8 ± 1.2 | 15.4 ± 1.2 | 16.2 ± 1.3 | 18.1 ± 1.4 | 19.0 ± 1.5 | 16.7 ± 1.3 |
| | ArenaCmp | 9.7 ± 0.8 | 10.4 ± 0.9 | 11.2 ± 0.9 | 12.6 ± 1.0 | 13.5 ± 1.1 | 11.5 ± 0.9 |
| | FusionSQL-TL | 3.4 ± 1.2 | 4.0 ± 1.2 | 4.6 ± 1.3 | 5.2 ± 1.4 | 5.6 ± 1.4 | 4.6 ± 1.3 |
| | FusionSQL (Ours) | 3.1 ± 0.5 | 3.7 ± 0.5 | 4.2 ± 0.6 | 4.8 ± 0.7 | 5.1 ± 0.7 | 4.2 ± 0.6 |
| WikiSQL → Spider | ATC-MC | 12.2 ± 1.0 | 13.1 ± 1.1 | 13.8 ± 1.2 | 15.2 ± 1.3 | 16.1 ± 1.4 | 14.1 ± 1.2 |
| | ATC-NE | 13.4 ± 1.1 | 14.0 ± 1.2 | 15.1 ± 1.3 | 16.3 ± 1.4 | 17.5 ± 1.5 | 15.3 ± 1.3 |
| | DoC (τ=0.8) | 14.6 ± 1.2 | 15.3 ± 1.3 | 16.5 ± 1.4 | 17.8 ± 1.5 | 19.0 ± 1.6 | 16.6 ± 1.4 |
| | DoC (τ=0.9) | 15.8 ± 1.3 | 16.4 ± 1.3 | 17.7 ± 1.4 | 19.1 ± 1.6 | 20.3 ± 1.6 | 17.9 ± 1.4 |
| | PseAutoEval | 11.1 ± 0.9 | 11.8 ± 1.0 | 12.6 ± 1.0 | 13.7 ± 1.1 | 14.9 ± 1.2 | 12.8 ± 1.0 |
| | BugJudge | 13.6 ± 1.1 | 14.2 ± 1.1 | 15.1 ± 1.2 | 16.5 ± 1.3 | 17.6 ± 1.4 | 15.4 ± 1.2 |
| | ArenaCmp | 9.2 ± 0.8 | 9.9 ± 0.8 | 10.7 ± 0.9 | 12.0 ± 1.0 | 12.8 ± 1.0 | 10.9 ± 0.9 |
| | FusionSQL-TL | 3.6 ± 1.2 | 4.1 ± 1.2 | 4.7 ± 1.3 | 5.1 ± 1.3 | 5.6 ± 1.4 | 4.6 ± 1.3 |
| | FusionSQL (Ours) | 3.2 ± 0.5 | 3.8 ± 0.5 | 4.3 ± 0.6 | 4.7 ± 0.7 | 5.2 ± 0.6 | 4.2 ± 0.6 |
| SParC → CoSQL (in-domain) | ATC-MC | 6.5 ± 0.6 | 7.2 ± 0.7 | 7.8 ± 0.8 | 8.3 ± 0.8 | 9.0 ± 0.9 | 7.8 ± 0.8 |
| | ATC-NE | 7.1 ± 0.6 | 7.8 ± 0.7 | 8.4 ± 0.7 | 9.0 ± 0.8 | 9.6 ± 0.9 | 8.4 ± 0.7 |
| | DoC (τ=0.8) | 7.7 ± 0.6 | 8.3 ± 0.7 | 8.8 ± 0.7 | 9.3 ± 0.8 | 9.9 ± 0.8 | 8.8 ± 0.7 |
| | DoC (τ=0.9) | 8.8 ± 0.7 | 9.3 ± 0.7 | 9.8 ± 0.8 | 10.4 ± 0.9 | 10.9 ± 0.9 | 9.8 ± 0.8 |
| | PseAutoEval | 5.5 ± 0.5 | 6.1 ± 0.5 | 6.7 ± 0.6 | 7.2 ± 0.6 | 7.8 ± 0.7 | 6.7 ± 0.6 |
| | BugJudge | 6.1 ± 0.6 | 6.7 ± 0.6 | 7.3 ± 0.7 | 7.9 ± 0.7 | 8.4 ± 0.8 | 7.3 ± 0.7 |
| | ArenaCmp | 3.9 ± 0.4 | 4.4 ± 0.4 | 4.9 ± 0.5 | 5.4 ± 0.5 | 5.9 ± 0.5 | 4.9 ± 0.5 |
| | FusionSQL-TL | 1.5 ± 1.2 | 1.7 ± 1.2 | 2.0 ± 1.3 | 2.2 ± 1.3 | 2.4 ± 1.3 | 2.0 ± 1.3 |
| | FusionSQL (Ours) | 1.6 ± 0.3 | 1.8 ± 0.3 | 2.1 ± 0.3 | 2.3 ± 0.4 | 2.5 ± 0.4 | 2.1 ± 0.3 |
| Spider → SynSQL-2.5M | ATC-MC | 10.9 ± 0.9 | 11.7 ± 1.0 | 12.3 ± 1.0 | 13.8 ± 1.1 | 14.7 ± 1.2 | 12.7 ± 1.0 |
| | ATC-NE | 12.1 ± 1.0 | 12.9 ± 1.1 | 13.5 ± 1.1 | 14.9 ± 1.2 | 15.8 ± 1.3 | 13.8 ± 1.1 |
| | DoC (τ=0.8) | 12.9 ± 1.0 | 13.6 ± 1.1 | 14.7 ± 1.2 | 16.0 ± 1.3 | 17.2 ± 1.4 | 14.9 ± 1.2 |
| | DoC (τ=0.9) | 14.1 ± 1.1 | 14.8 ± 1.2 | 15.9 ± 1.3 | 17.2 ± 1.4 | 18.4 ± 1.5 | 16.1 ± 1.3 |
| | PseAutoEval | 9.5 ± 0.8 | 10.1 ± 0.9 | 10.8 ± 0.9 | 12.0 ± 1.0 | 13.1 ± 1.1 | 11.1 ± 0.9 |
| | BugJudge | 12.4 ± 1.0 | 13.2 ± 1.1 | 14.0 ± 1.1 | 15.5 ± 1.2 | 16.6 ± 1.3 | 14.3 ± 1.1 |
| | ArenaCmp | 8.4 ± 0.7 | 9.1 ± 0.8 | 9.8 ± 0.8 | 11.1 ± 0.9 | 11.9 ± 1.0 | 10.1 ± 0.8 |
| | FusionSQL-TL | 3.1 ± 1.2 | 3.5 ± 1.2 | 4.0 ± 1.3 | 4.4 ± 1.3 | 4.9 ± 1.4 | 4.0 ± 1.3 |
| | FusionSQL (Ours) | 2.8 ± 0.4 | 3.2 ± 0.5 | 3.7 ± 0.5 | 4.1 ± 0.6 | 4.5 ± 0.6 | 3.7 ± 0.5 |
| WikiSQL → Spider 2.0 | ATC-MC | 18.0 ± 1.5 | 18.7 ± 1.5 | 19.6 ± 1.6 | 21.0 ± 1.7 | 22.2 ± 1.8 | 19.9 ± 1.6 |
| | ATC-NE | 19.4 ± 1.6 | 20.1 ± 1.7 | 21.3 ± 1.8 | 22.6 ± 1.9 | 23.9 ± 2.0 | 21.5 ± 1.8 |
| | DoC (τ=0.8) | 20.5 ± 1.7 | 21.3 ± 1.8 | 22.7 ± 1.9 | 24.0 ± 2.0 | 25.4 ± 2.1 | 22.8 ± 1.9 |
| | DoC (τ=0.9) | 21.7 ± 1.8 | 22.5 ± 1.9 | 23.9 ± 2.0 | 25.2 ± 2.1 | 26.6 ± 2.2 | 23.9 ± 2.0 |
| | PseAutoEval | 16.3 ± 1.3 | 17.0 ± 1.4 | 17.7 ± 1.4 | 18.8 ± 1.5 | 20.1 ± 1.6 | 18.0 ± 1.4 |
| | BugJudge | 17.3 ± 1.4 | 18.1 ± 1.5 | 19.3 ± 1.6 | 20.7 ± 1.7 | 22.0 ± 1.8 | 19.5 ± 1.6 |
| | ArenaCmp | 12.6 ± 1.0 | 13.4 ± 1.1 | 14.5 ± 1.2 | 15.8 ± 1.3 | 16.9 ± 1.4 | 14.6 ± 1.2 |
| | FusionSQL-TL | 4.5 ± 1.3 | 5.1 ± 1.4 | 5.6 ± 1.4 | 6.1 ± 1.5 | 6.6 ± 1.5 | 5.6 ± 1.4 |
| | FusionSQL (Ours) | 4.2 ± 0.6 | 4.8 ± 0.7 | 5.3 ± 0.7 | 5.8 ± 0.8 | 6.3 ± 0.8 | 5.3 ± 0.7 |

Table IV. MAE (↓) for generalizing FusionSQL to unseen Text2SQL models

Columns are the unseen model pool. Each cell reports mean ± 95% CI in percentage points. Best is in bold.

| Transfer | Method | CodeLlama-34B | StarCoder2-15B | Mistral-7B | DeepSeek-Coder-6.7B | Phi-3-mini | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Spider → BIRD | BugJudge | 13.8 ± 1.0 | 13.5 ± 1.1 | 14.0 ± 1.0 | 13.9 ± 0.9 | 13.6 ± 1.0 | 13.8 ± 1.0 |
| | ArenaCmp | 11.1 ± 0.8 | 10.8 ± 0.9 | 11.4 ± 0.9 | 11.2 ± 0.9 | 10.9 ± 0.8 | 11.1 ± 0.9 |
| | FusionSQL-ML (Ours) | 6.7 ± 0.5 | 6.5 ± 0.6 | 6.8 ± 0.7 | 6.7 ± 0.6 | 6.6 ± 0.5 | 6.7 ± 0.6 |
| WikiSQL → Spider | BugJudge | 12.7 ± 1.0 | 12.4 ± 1.1 | 12.9 ± 1.0 | 12.8 ± 0.9 | 12.5 ± 1.0 | 12.7 ± 1.0 |
| | ArenaCmp | 10.4 ± 0.8 | 10.1 ± 0.9 | 10.6 ± 1.0 | 10.4 ± 0.9 | 10.2 ± 0.8 | 10.3 ± 0.9 |
| | FusionSQL-ML (Ours) | 6.0 ± 0.4 | 5.8 ± 0.5 | 6.1 ± 0.6 | 6.0 ± 0.5 | 5.9 ± 0.4 | 6.0 ± 0.5 |
| SParC → CoSQL | BugJudge | 11.5 ± 0.8 | 11.3 ± 0.9 | 11.6 ± 1.0 | 11.5 ± 0.9 | 11.2 ± 0.8 | 11.4 ± 0.9 |
| | ArenaCmp | 9.6 ± 0.7 | 9.4 ± 0.8 | 9.7 ± 0.9 | 9.6 ± 0.8 | 9.3 ± 0.7 | 9.5 ± 0.8 |
| | FusionSQL-ML (Ours) | 5.1 ± 0.4 | 4.9 ± 0.5 | 5.1 ± 0.6 | 5.0 ± 0.5 | 4.9 ± 0.4 | 5.0 ± 0.5 |
| Spider → SynSQL-2.5M | BugJudge | 13.3 ± 1.0 | 13.0 ± 1.1 | 13.4 ± 1.0 | 13.2 ± 0.9 | 13.1 ± 1.0 | 13.2 ± 1.0 |
| | ArenaCmp | 10.9 ± 0.8 | 10.6 ± 0.9 | 11.0 ± 1.0 | 10.9 ± 0.9 | 10.7 ± 0.8 | 10.8 ± 0.9 |
| | FusionSQL-ML (Ours) | 6.5 ± 0.5 | 6.3 ± 0.6 | 6.6 ± 0.7 | 6.5 ± 0.6 | 6.4 ± 0.5 | 6.5 ± 0.6 |
| WikiSQL → Spider 2.0 | BugJudge | 14.6 ± 1.0 | 14.2 ± 1.1 | 14.7 ± 1.2 | 14.5 ± 1.1 | 14.3 ± 1.0 | 14.5 ± 1.1 |
| | ArenaCmp | 12.0 ± 0.9 | 11.7 ± 1.0 | 12.1 ± 1.1 | 12.0 ± 1.0 | 11.8 ± 0.9 | 11.9 ± 1.0 |
| | FusionSQL-ML (Ours) | 7.0 ± 0.5 | 6.8 ± 0.6 | 7.1 ± 0.7 | 7.0 ± 0.6 | 6.9 ± 0.5 | 7.0 ± 0.6 |

Table VI. MAE (↓) on classic Text2SQL models such as ATHENA++

Each cell reports mean ± 95% CI in percentage points. Best is in bold, second-best is underlined.

| Dataset | Method | ATHENA | ATHENA++ | SQLizer | Avg. |
| --- | --- | --- | --- | --- | --- |
| Spider | BugJudge | 12.0 ± 1.0 | 11.8 ± 0.9 | 12.1 ± 1.0 | 12.0 ± 1.0 |
| | ArenaCmp | 10.8 ± 0.9 | 10.6 ± 0.8 | 10.9 ± 0.9 | 10.8 ± 0.9 |
| | FusionSQL-TL | 14.3 ± 1.1 | 14.1 ± 1.1 | 14.4 ± 1.2 | 14.3 ± 1.2 |
| | FusionSQL-LLM | 12.8 ± 1.1 | 12.6 ± 1.0 | 12.9 ± 1.1 | 12.8 ± 1.1 |
| | FusionSQL | 8.3 ± 0.6 | 8.2 ± 0.7 | 8.4 ± 0.8 | 8.3 ± 0.7 |
| Spider 2.0 | BugJudge | 12.8 ± 1.1 | 12.6 ± 1.0 | 12.9 ± 1.1 | 12.8 ± 1.1 |
| | ArenaCmp | 11.6 ± 0.8 | 11.4 ± 0.9 | 11.7 ± 1.0 | 11.6 ± 0.9 |
| | FusionSQL-TL | 15.1 ± 1.1 | 14.9 ± 1.2 | 15.2 ± 1.3 | 15.1 ± 1.2 |
| | FusionSQL-LLM | 13.6 ± 1.0 | 13.4 ± 1.0 | 13.7 ± 1.2 | 13.6 ± 1.1 |
| | FusionSQL | 9.0 ± 0.6 | 8.9 ± 0.7 | 9.1 ± 0.8 | 9.0 ± 0.7 |
| SynSQL-2.5M | BugJudge | 13.0 ± 1.1 | 12.8 ± 1.1 | 13.1 ± 1.1 | 13.0 ± 1.1 |
| | ArenaCmp | 11.8 ± 0.8 | 11.6 ± 0.9 | 11.9 ± 1.0 | 11.8 ± 0.9 |
| | FusionSQL-TL | 15.3 ± 1.2 | 15.1 ± 1.2 | 15.4 ± 1.3 | 15.3 ± 1.3 |
| | FusionSQL-LLM | 13.7 ± 1.1 | 13.5 ± 1.1 | 13.8 ± 1.2 | 13.7 ± 1.1 |
| | FusionSQL | 9.1 ± 0.6 | 9.0 ± 0.7 | 9.2 ± 0.8 | 9.1 ± 0.7 |
| CoSQL | BugJudge | 11.5 ± 0.8 | 11.3 ± 0.9 | 11.6 ± 1.0 | 11.5 ± 0.9 |
| | ArenaCmp | 10.2 ± 0.8 | 10.0 ± 0.7 | 10.3 ± 0.8 | 10.2 ± 0.8 |
| | FusionSQL-TL | 13.8 ± 1.1 | 13.6 ± 1.1 | 13.9 ± 1.2 | 13.8 ± 1.2 |
| | FusionSQL-LLM | 12.3 ± 1.1 | 12.1 ± 1.0 | 12.4 ± 1.1 | 12.3 ± 1.1 |
| | FusionSQL | 7.9 ± 0.5 | 7.8 ± 0.6 | 8.0 ± 0.7 | 7.9 ± 0.6 |
| BIRD | BugJudge | 13.2 ± 1.1 | 13.0 ± 1.0 | 13.3 ± 1.1 | 13.2 ± 1.1 |
| | ArenaCmp | 12.0 ± 0.9 | 11.8 ± 0.8 | 12.1 ± 0.9 | 12.0 ± 0.9 |
| | FusionSQL-TL | 15.5 ± 1.3 | 15.3 ± 1.2 | 15.6 ± 1.3 | 15.5 ± 1.3 |
| | FusionSQL-LLM | 13.9 ± 1.2 | 13.7 ± 1.1 | 14.0 ± 1.2 | 13.9 ± 1.2 |
| | FusionSQL | 9.2 ± 0.6 | 9.1 ± 0.7 | 9.3 ± 0.8 | 9.2 ± 0.7 |
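
The result tables report each cell as mean ± 95% CI. How the intervals were obtained is not stated here; one standard way to produce such a summary from repeated runs is the normal approximation sketched below. The function name and the five MAE values are made up for illustration.

```python
import math
from statistics import fmean, stdev


def mean_with_ci95(values):
    """Return (mean, half-width) using the normal approximation 1.96 * s / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a spread")
    mean = fmean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical MAE values (percentage points) from five repeated runs.
runs = [3.0, 3.2, 2.8, 3.1, 2.9]
mean, half_width = mean_with_ci95(runs)
print(f"{mean:.1f} ± {half_width:.1f}")
```

With few runs, a Student-t quantile or a bootstrap would give a more conservative interval than the 1.96 factor used here.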

If you run into issues or need helper scripts for dataset downloads/materialization, open an issue or reach out.

