Changelog
April 23, 2026
All notable changes to this project are documented here. The format is based on Keep a Changelog and this project follows Semantic Versioning.
[Unreleased]
Added
- First-class iSCO sampler support via `qqa.discrete_langevin` (paper-faithful alias `qqa.isco_anneal`). Faithful, GPU-parallel implementation of Algorithm 1 + Appendix C (PAS-MH-Step) of Sun, Goshvadi, Nova, Schuurmans, Dai, "Revisiting Sampling for Combinatorial Optimization", ICML 2023 (pmlr-v202-sun23c). Every MH step samples a Poisson-length path `L ~ Poisson(μ)` truncated at `L ≥ 1`, picks `L` sites without replacement via Gumbel-top-L with logits `−Δ_j/(2τ)`, applies the path-auxiliary MH correction over the ordered permutation σ (Eq. 30), and adapts μ toward the paper's 0.574 acceptance target (Eq. 31). Works on single-instance (`Q_mat`) and batched-instance (`Q_tensor`) QUBOs; spin / categorical / structured-shape relaxations are rejected at the API boundary with an actionable `NotImplementedError`. Returns an `ISCOResult` that mirrors `SAResult` / `PAResult` (`best_sol` / `best_obj` / `runtime` / `history` / `score` / `polished_sol`) plus iSCO-specific diagnostics (`accept_rate`, `mu_final`, `mean_path_length`, `t_max_used`). Cross-checked against the DISCS reference implementation (`samplers/path_auxiliary.py`) and the Zhang et al. `discrete-langevin` reference. See the new "iSCO baseline (Sun et al., ICML 2023)" section in the README and citations `sun2023revisiting` + `goshvadi2023discs`.
- Empirical detailed-balance test for iSCO (`tests/test_isco.py::test_isco_detailed_balance_on_tiny_qubo`): enumerates a 16-state (4-bit) QUBO, runs the full PAS-MH kernel for 4000 inner steps × 200 chains at fixed temperature, and asserts TV(empirical, exact Boltzmann) < 0.02. Ships as a permanent guard against silent MH-correction regressions; an offline sweep across `N ∈ {3, 4, 5}` × `seed ∈ {0, 7, 42}` × `μ ∈ {1, 2, 3}` × `{float32, float64}` shows the post-fix sampler converges to TV ≤ 0.0064 in every cell.
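The Gumbel-top-L site selection described above can be sketched in a few lines; a minimal illustration of the trick (hypothetical `delta` values, not the library's internal code):

```python
import torch

def gumbel_top_L(logits: torch.Tensor, L: int) -> torch.Tensor:
    """Sample an ordered set of L distinct indices where index j is picked
    with probability proportional to softmax(logits)[j] (Gumbel-top-k trick)."""
    gumbels = -torch.log(-torch.log(torch.rand_like(logits)))
    return torch.topk(logits + gumbels, k=L).indices

# Logits as in the entry above: -Delta_j / (2 * tau), with illustrative
# per-site objective changes Delta_j.
delta = torch.tensor([0.5, -1.0, 2.0, 0.0, -0.5])
tau = 1.0
sites = gumbel_top_L(-delta / (2 * tau), L=3)  # 3 distinct site indices
```

Because `topk` returns distinct positions, the sample is without replacement by construction, and the returned order is the order the path visits the sites.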
Fixed
- iSCO `_plackett_luce_logprob` NaN bug (silent detailed-balance violation). The Plackett–Luce log-prob recursion used `diff.clamp(max=-1e-12)` to keep `log1p(-exp(diff))` finite, but `1e-12` is far below float32's machine ε (≈ 1.19e-7), so `exp(diff)` rounds to exactly `1.0` and the recursion hit `log(0) = -inf` whenever `sigma` contained the repeated indices that `_reverse_path` writes into the masked tail (i.e. every chain with `L_per_chain < L_max`, which is every short chain in any batch with variable Poisson path length). Subsequent summation via `* mask.to(dtype)` then produced `(-inf) * 0 = NaN`, making `log(u) < log_alpha` evaluate to `False` everywhere and silently rejecting every multi-flip proposal in the affected chain. Empirical TV(empirical, Boltzmann) on a 4-bit enumerable QUBO was ~0.51; after the fix it is ~0.001–0.002. Two surgical changes: (a) a dtype-aware clamp (`eps_clamp = -1e-6` for float32, `-1e-12` for float64); (b) `torch.where(mask, value, 0)` instead of `* mask`, so masked positions can never contaminate the sum via `inf * 0`. Regression tests `test_plackett_luce_logprob_handles_repeated_indices_in_float32` and `test_isco_detailed_balance_on_tiny_qubo` ensure this stays fixed. Lessons L48–L50 in `tasks/lessons.md`.
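Both halves of the failure mode are easy to reproduce in isolation; a minimal demonstration with illustrative tensors (not the actual kernel):

```python
import math
import torch

# In float32, exp(-1e-12) rounds to exactly 1.0 (1e-12 << machine eps),
# so the old clamp still produced log1p(-1.0) = log(0) = -inf:
diff = torch.tensor(-1e-12, dtype=torch.float32)
tail_term = torch.log1p(-torch.exp(diff))  # -inf

# Multiplying by a 0/1 mask then poisons the sum: (-inf) * 0 = nan.
terms = torch.tensor([-0.5, -1.2, float("-inf")])
mask = torch.tensor([True, True, False])
bad = (terms * mask.to(terms.dtype)).sum()                      # nan
good = torch.where(mask, terms, torch.zeros_like(terms)).sum()  # -1.7
```

The `torch.where` form zeroes the masked positions *before* the sum, so an `inf` in the masked tail can never reach it.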
[0.6.0] - 2026-04-20
Added
- `qqa.polish.apply_polish_if_improves`: single entry point for the greedy 1-flip QUBO polish post-processing. `qqa.anneal`, `qqa.simulated_annealing`, `qqa.population_annealing` and both PI-GNN trainers now route through this helper, so every backend has the same "monotone free improvement" contract without five copies of the same `if polish and Q_mat is not None: …` block.
- Shared test fixtures in `tests/conftest.py`: `APP` and `PAGE_DIR` path constants, a `make_problem_config(kind, size, **extra)` factory and a `set_slider` helper. Test modules now import these directly, eliminating twelve copies of the same `problem_config` literal in `test_gui_apptest.py`.
- `app/_common.retheme_plotly(fig)`: replaces the `_retheme` clone previously defined once per Streamlit page. Import it alongside `plotly_layout` so every chart stays in step with the active theme.
- `app/_common.as_numpy(x)` (public alias of the former `_as_np`): imported by `_solution_viz.py` so the two modules share a single tensor-to-numpy conversion path.
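The "monotone free improvement" contract can be sketched as greedy 1-flip descent on `E(x) = xᵀQx`; a hypothetical minimal implementation for symmetric `Q` (the library's actual helper may differ):

```python
import torch

def greedy_one_flip_polish(x: torch.Tensor, Q: torch.Tensor) -> torch.Tensor:
    """Greedy 1-flip descent on E(x) = x^T Q x for binary x in {0,1}^N.
    Flips the single most-improving bit until no flip lowers the objective,
    so the result is never worse than the input (monotone by construction).
    Assumes Q is symmetric."""
    x = x.clone()
    q_diag = torch.diagonal(Q)
    while True:
        d = 1.0 - 2.0 * x  # flip direction per bit: 0 -> +1, 1 -> -1
        # Exact objective change of flipping each bit, for all bits at once.
        delta = d * (2.0 * (Q @ x) - 2.0 * q_diag * x + q_diag)
        i = torch.argmin(delta)
        if delta[i] >= 0:          # no improving flip remains
            return x
        x[i] = 1.0 - x[i]
```

Each iteration costs one matvec, and termination is guaranteed because the objective strictly decreases on every accepted flip.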
Changed
- Benchmark suite refreshed: the project now tracks the "qqa4co-bench" HF dataset (coloring / mis-rrg / ea3d / balanced-partition / MaxCut G-set families), wired through the `qqa.bench` public API and the `qqa bench run|plot|list|setup` CLI.
- `SpinRelaxation.perturb_` now inherits from `BinaryRelaxation` — both relaxations share the same latent cube `[0, 1]` and therefore the same noise + `clamp_schedule`. Removes a silent copy-paste drift risk.
- `qqa.bench` collapsed `_load_bench_discs` and `_load_plot_benchmarks` onto a shared `_load_scripts_module(name)` helper so the two `sys.path` / `importlib` call sites no longer drift.
- The `tests/` directory is now on the pytest `pythonpath` so test modules can `from conftest import …` the shared helpers.
Removed
- `qqa.sa._qubo_glauber_sweep` deprecated alias dropped — it forwarded to `_qubo_seq_glauber_sweep` and was only referenced by an in-tree diagnostic script (updated). The buggy parallel-update semantics it warned about have been gone since 0.4.0.
[0.5.3] - 2026-04-20
Added
- Backend-aware Visualize layout: the Streamlit Visualize page now shows PQQA-only tabs for PQQA runs (family tree, PCA embedding, diversity, parallel coordinates) and PA-only tabs for PA runs (ESS, free-energy trajectory, equilibration diagnostic, Thermodynamics, Lineage vs energy, Ancestry Sankey). Empty "No snapshots recorded" placeholders are gone.
- Up-front PA capability probe in the Solve page: problems that PA cannot sample (categorical / structured binary, e.g. TSP, QAP, Coloring, NQueens) now trigger a clear warning banner and disable the Run button, instead of surfacing a cryptic `einsum` / `NotImplementedError` mid-run.
- Three PA-specific visualisation tabs: Thermodynamics (Q vs β, internal energy, specific heat), Lineage vs energy, Ancestry Sankey.
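The Thermodynamics quantities follow the standard population estimators; a sketch using the textbook definitions `U = ⟨E⟩` and `C = β² Var(E)` (fluctuation-dissipation), not the page's actual plotting code:

```python
import torch

def thermo_estimates(energies: torch.Tensor, beta: float):
    """Population estimates at inverse temperature beta:
    internal energy U = <E>, specific heat C = beta^2 * Var(E)."""
    U = energies.mean()
    C = beta ** 2 * energies.var(unbiased=False)
    return U, C
```

Evaluated at every β on the PA schedule, these two numbers give the internal-energy and specific-heat curves the tab plots.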
Changed
- `qqa.simulated_annealing` / `qqa.population_annealing` now accept `polish=True/False` and expose a `polished_sol` field, matching the contract `qqa.anneal` has always had. The 1-flip polish is default-on across all backends so the `best_obj` score card reflects the same post-processing everywhere.
- `_validate_chain_problem` (used by both SA and PA) now rejects structured `BinaryRelaxation` (non-flat `shape_fn`, e.g. TSP) with an actionable error steering users to `qqa.anneal`.
[0.5.2] - 2026-04-20
Added
- `qqa.bench` public Python API (`run`, `plot`, `list_suites`, `resolve_suite`) mirroring the `qqa bench` CLI so notebooks can dispatch a benchmark without subprocess boilerplate.
- Polished benchmark report figure (`scripts/plot_benchmarks.py`) and the corresponding `qqa bench plot` CLI flow.
Changed
- HF Hub dataset renamed to `qqa4co-bench` (was `discs-benchmarks`); `scripts/setup_discs_data.sh` and all docs follow suit.
[0.5.1] - 2026-04-19
Added
- `qqa.population_annealing`: Population Annealing backend with parallel chain sampling, importance resampling between inverse temperatures, full free-energy / log-Z estimates and an optional genealogy / ancestry record. A `PAResult` dataclass and the `qqa solve --backend pa` CLI expose the new path.
- MaxCut G-set benchmark family via `scripts/fetch_gset_data.py` + `scripts/maxcut_gset_g70.py`.
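The step between inverse temperatures follows the textbook Population Annealing recipe; a hypothetical sketch (the backend may differ in detail, e.g. in its resampling scheme):

```python
import math
import torch

def pa_step(energies: torch.Tensor, beta_old: float, beta_new: float,
            log_Z: float):
    """One PA temperature step: importance-reweight the population,
    accumulate the log-Z ratio, and resample the population multinomially."""
    R = energies.shape[0]
    log_w = -(beta_new - beta_old) * energies  # unnormalised log weights
    # log of the mean weight is the free-energy (log-Z) increment
    log_Z = log_Z + torch.logsumexp(log_w, dim=0).item() - math.log(R)
    # resample replica indices proportionally to the weights
    idx = torch.multinomial(torch.softmax(log_w, dim=0), R, replacement=True)
    return idx, log_Z
```

Summing the increments over the whole β schedule yields the log-Z estimate; recording `idx` at each step is exactly the genealogy / ancestry record mentioned above.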
[0.5.0] - 2026-04-19
Added
- Streamlit Compare page now offers a PQQA vs SA shootout mode that runs both backends on the same problem instance and reports the per-backend best objective, runtime and a "SA time to PQQA best" speed-up factor side-by-side, including a convergence plot.
Changed
- Internal refactor: `qqa.utils` now exposes `require_cuda_if_requested(device)` and `safe_score_summary(problem, sol, fallback_obj)` helpers. The QQA, SA and PI-GNN/CPRA trainers now route their CUDA-availability check and `problem.score_summary` fallback through these shared helpers, removing duplicated inline `try`/`except` blocks while preserving the exact user-facing error messages and result dictionaries.
- Marked the legacy graph-evaluation helpers in `qqa.utils` (`approximate_mis`, `mis_stats`, `max_cut_stats`, `_gen_combinations`) as superseded by `problem.score_summary`. They are kept for backward compatibility but are no longer used internally.
Documentation
- Repo-wide audit of the QQA / CPRA paper citations. Three places had silently swapped the QQA paper (Ichikawa & Arai, ICLR 2025) with the CPRA paper (Ichikawa & Iwashita, TMLR 2025) — fixed in the `src/qqa/__init__.py` docstring, `notebooks/cra_pignn_example.ipynb` and `notebooks/cpra_pignn_example.ipynb`. Adopted the TMLR-published title for CPRA ("Continuous Parallel Relaxation for Finding Diverse Solutions in Combinatorial Optimization Problems"); the older arXiv-preprint title ("Continuous Tensor Relaxation …") is no longer used.
- Added a Codecov coverage badge to `README.md` and a placeholder for the Zenodo DOI badge (to be uncommented and DOI-substituted as soon as the first release is minted).
- Fixed the `CITATION.cff` `preferred-citation` block: the title now correctly matches the URL (both point at the QQA ICLR 2025 paper); arXiv:2409.02135 was added as an explicit identifier so citation tooling (Zenodo, ORCID, OpenAlex) resolves to the same artefact.
Infrastructure
- `publish.yml` Trusted Publishing wired up end-to-end on PyPI: the GitHub Actions environment `pypi` is now connected to the registered Trusted Publisher, so future tagged releases upload automatically without manual `twine` invocations.
- Broadened PyPI classifiers in `pyproject.toml` (`Environment :: Console`, `Environment :: GPU :: NVIDIA CUDA`, `Intended Audience :: Education / Developers`, OS-specific tags, `Topic :: Mathematics / Physics`, `Typing :: Typed`) for better PyPI discoverability.
[0.4.0] - 2026-04-19
Added
- `qqa.simulated_annealing`: GPU-parallel Simulated Annealing baseline with two execution paths:
  - QUBO fast path (Glauber-like parallel update, single matmul per sweep) for any problem exposing `Q_mat`.
  - Generic single-spin sequential Metropolis fallback for non-QUBO problems.
- New `SAResult` dataclass mirroring `AnnealResult` for interchangeable downstream tooling.
- CLI: `qqa solve --backend sa` with `--sa-num-sweeps`, `--sa-beta-start`, `--sa-beta-end`, `--sa-schedule`.
- `qqa.utils.enable_tf32` helper to opt into TF32 matmul / cuDNN on Ampere+ GPUs.
- `anneal(..., mixed_precision="bf16")` opt-in for bfloat16 autocast on the QQA forward pass (CUDA only; falls back to fp32 silently elsewhere).
- `train_cra_pi_gnn` / `train_cpra_pi_gnn`: new `early_stop_disc_patience` argument that terminates training when the best discrete objective stops improving.
- CPRA `multi_problem` batching: when every replica problem has a same-shape `Q_mat`, the trainer stacks them into one tensor and computes all replica costs in a single batched `einsum`, replacing the previous Python-level per-replica loop.
- `docs/explanation/algorithm.md`: SA section documenting the parallel-Glauber fast path and when to reach for SA vs QQA / CRA / CPRA.
- `notebooks/benchmark_sa_vs_qqa_vs_pignn.ipynb`: head-to-head benchmark notebook comparing all four solver families on a common MIS instance with a controlled compute budget.
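The batched replica cost amounts to a single contraction; an illustrative sketch with hypothetical shapes:

```python
import torch

R, N = 16, 32
Q = torch.randn(R, N, N)   # one same-shape QUBO matrix per replica, stacked
X = torch.rand(R, N)       # one relaxed solution vector per replica

# All R costs x_r^T Q_r x_r in one batched contraction, no Python loop:
costs = torch.einsum("rn,rnm,rm->r", X, Q, X)

# Equivalent per-replica loop (the path this change replaces):
loop = torch.stack([X[r] @ Q[r] @ X[r] for r in range(R)])
```

The einsum form launches one fused kernel instead of `R` small matvecs, which is where the GPU speed-up at `R = 16` comes from.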
Changed
- `HistoryRecorder` now buffers per-epoch metrics as GPU scalars and performs a single bulk `cpu()` transfer in `on_train_end`, eliminating per-epoch host-device synchronisation. The public `result.history` shape is unchanged.
- `qqa.anneal` and the PI-GNN trainers now use `optimizer.zero_grad(set_to_none=True)` (PyTorch 2.x best practice).
- `SpinRelaxation.project` no longer allocates two `ones_like(x)` intermediates per call; it uses scalar-broadcast `torch.where`.
- `CategoricalRelaxation.penalty` no longer triggers a redundant `forward`: the relaxation now exposes `penalty_from_forward` so `anneal` reuses the already-normalised tensor.
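The `SpinRelaxation.project` change is the standard scalar-broadcast pattern; an illustrative comparison (the 0.5 threshold is assumed for illustration, not taken from the library):

```python
import torch

x = torch.rand(8)

# Before: two full-size intermediate tensors allocated per call.
before = torch.where(x >= 0.5, torch.ones_like(x), -torch.ones_like(x))

# After: Python scalars broadcast inside torch.where, no intermediates.
after = torch.where(x >= 0.5, 1.0, -1.0)
```

Both forms produce identical ±1 tensors; the scalar overload simply skips the two `ones_like` allocations.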
Performance
- ~15 % wall-clock reduction on CPU for `qqa.anneal`-driven workloads (`HistoryRecorder` + `set_to_none` + `SpinRelaxation` together).
- CPRA `multi_problem` runs are 2–4× faster on GPU at `R = 16`, thanks to the batched `einsum` path.
Notes
- No public API removed. `qqa.anneal`, `qqa.pignn.train_*` and the `AnnealResult` dataclass are unchanged. New keyword arguments (`mixed_precision`, `early_stop_disc_patience`) are opt-in and default to the prior behaviour.
[0.3.0] - 2026-04-18
Added
- Spin problem family in `qqa.problems`: `Ising1D`, `EdwardsAnderson`, `SherringtonKirkpatrick`, `BinaryPerceptron` (teacher-student), `HopfieldMemory`.
- New `SpinRelaxation` that maps `[0, 1] → ±1` with a differentiable forward.
- Visualization (`qqa.visualization`):
  - Dual backend (`"matplotlib"` default, `"plotly"` optional).
  - `plot_best_trajectory`, `plot_schedule`, `plot_run_comparison`, `plot_parallel_coordinates`, `plot_solution_heatmap`.
- CLI (`qqa` entry point): `qqa version`, `qqa solve`, `qqa bench`, `qqa gui`.
- Streamlit GUI (`qqa gui` / `uv run streamlit run app/streamlit_app.py`): problem definition → live annealing → visualization → comparison.
- Example notebooks: MIS, coloring, MaxCut, 3D Edwards–Anderson, SK, binary perceptron, Hopfield memory, parallel benchmark.
- Docs site via MkDocs + Material with auto API reference.
- Tooling: GitHub Actions CI, `pre-commit`, `CONTRIBUTING.md`, `CITATION.cff`.
Changed
- `qqa.problems` is now a subpackage (`qubo.py`, `categorical.py`, `spin.py`). Public symbols (`MaximumIndependentSet`, `Coloring`, ...) are preserved via re-export, so existing code keeps working.
Deprecated
- `qqa.legacy.*` wrappers still work and emit `DeprecationWarning`; use `qqa.anneal` instead.
[0.2.0]
- Initial unified `qqa.anneal` API, package reorganization under `src/qqa`, `uv` / `pyproject.toml`-based install, smoke tests and demo scripts.
[0.1.0]
- Original research release accompanying the ICLR 2025 paper.