ViSQOL (Python)

May 27, 2026

A pure Python implementation of Google's ViSQOL (Virtual Speech Quality Objective Listener) for objective audio/speech quality assessment.

ViSQOL compares a reference audio signal with a degraded version and outputs a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score on a scale of 1.0 – 5.0.

Features

- Two modes: Audio mode (music/general audio at 48 kHz) and Speech mode (speech at 16 kHz)
- High accuracy: 12/12 conformance tests pass against the official C++ implementation
  - Audio mode: 9/10 tests produce identical MOS scores (diff = 0.000000), 1 test diff = 0.000117
  - Speech mode (polynomial): diff = 0.001057
  - Speech mode (lattice TFLite): diff = 0.002341
- Two speech quality mappers matching C++ ViSQOL:
  - Lattice (default): deep-lattice TFLite network (--use_lattice_model=true in C++); requires the optional [lattice] extra
  - Polynomial (fallback): legacy exponential fit (--use_lattice_model=false in C++)
- Pure Python: no C/C++ compilation required (the optional [lattice] extra adds the Google ai-edge-litert TFLite runtime as a binary wheel)
- Minimal dependencies: 4 core pip packages (numpy, scipy, soundfile, libsvm-official)
- Optional Numba acceleration: pip install visqol-python[accel] for JIT-compiled Gammatone filterbank (parallel) and a fused NSIM + DP patch matching kernel
- Optional pyFFTW backend: pip install visqol-python[fftw] routes alignment / xcorr FFTs through FFTW3 for a ~16× overall speedup, RTF 0.036 (vs C++ estimate 0.093)
- Batch & parallel evaluation: measure_batch(parallel=True) for multi-process execution across CPU cores
- Fully typed: PEP 561 py.typed, strict mypy, ruff-enforced code style

Installation

pip install visqol-python

For speech mode that matches the C++ default (deep-lattice TFLite mapper):

pip install "visqol-python[lattice]"   # requires Python ≥ 3.10; quotes keep zsh from globbing the brackets

For Numba-accelerated Gammatone filtering and the fused NSIM + DP kernel:

pip install "visqol-python[accel]"

For FFTW3-backed alignment FFTs via pyFFTW:

pip install "visqol-python[fftw]"

Install everything (lattice + numba + fftw):

pip install "visqol-python[all]"

Or install from source:

git clone https://github.com/talker93/visqol-python.git
cd visqol-python
pip install -e ".[dev]"

Note on speech mode parity: Without the [lattice] extra, speech mode falls back to the polynomial mapping (equivalent to running C++ ViSQOL with --use_lattice_model=false). On degraded speech, the polynomial can over-predict MOS by 1–2 points compared with the C++ default. Install [lattice] whenever you need scores that line up with the C++ default behaviour (see issue #1).
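Since the fallback happens silently when the runtime is missing, it can be useful to check which mapper to expect before comparing scores with C++ output. The following is a minimal sketch, not part of visqol-python: the helper names are hypothetical, and it assumes the ai-edge-litert wheel exposes a top-level module named ai_edge_litert.

```python
import importlib.util


def lattice_runtime_available() -> bool:
    """Return True if the TFLite runtime from the [lattice] extra is importable.

    Assumption: the ai-edge-litert wheel installs a top-level module called
    `ai_edge_litert`; adjust the name if your installed version differs.
    """
    return importlib.util.find_spec("ai_edge_litert") is not None


def expected_speech_mapper() -> str:
    """Name the speech-mode mapper one would expect on this machine."""
    return "lattice" if lattice_runtime_available() else "polynomial"


if __name__ == "__main__":
    print(f"Speech-mode mapper expected here: {expected_speech_mapper()}")
```

If this reports "polynomial" on a machine where you need C++-default parity, installing the [lattice] extra is the likely fix.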

Quick Start

Python API

from visqol import VisqolApi

# Audio mode (default) - for music and general audio
api = VisqolApi()
api.create(mode="audio")
result = api.measure("reference.wav", "degraded.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

# Speech mode - for speech signals
api = VisqolApi()
api.create(mode="speech")
result = api.measure("ref_speech.wav", "deg_speech.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")
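Because measure() takes file paths, it can help to sanity-check a pair before a long run. The sketch below uses only the standard-library wave module, so it handles plain PCM WAV files only. The helpers wav_info and preflight and the 1-second duration threshold are hypothetical and not part of visqol-python; how the library itself treats mismatched inputs is not covered here.

```python
import wave
from pathlib import Path


def wav_info(path: str) -> tuple[int, int, float]:
    """Return (sample_rate_hz, channels, duration_s) for a PCM WAV file."""
    with wave.open(str(Path(path)), "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnchannels(), wf.getnframes() / rate


def preflight(reference: str, degraded: str) -> list[str]:
    """Collect human-readable warnings about a reference/degraded pair."""
    ref_rate, _, ref_dur = wav_info(reference)
    deg_rate, _, deg_dur = wav_info(degraded)
    warnings = []
    if ref_rate != deg_rate:
        warnings.append(f"sample rates differ: {ref_rate} Hz vs {deg_rate} Hz")
    if abs(ref_dur - deg_dur) > 1.0:  # hypothetical threshold of 1 second
        warnings.append(f"durations differ: {ref_dur:.2f} s vs {deg_dur:.2f} s")
    return warnings
```

An empty list means neither check fired; anything else is worth a look before trusting the resulting score.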

Using NumPy Arrays

import numpy as np
import soundfile as sf
from visqol import VisqolApi

ref, sr = sf.read("reference.wav")
deg, _  = sf.read("degraded.wav")

api = VisqolApi()
api.create(mode="audio")
result = api.measure_from_arrays(ref, deg, sample_rate=sr)
print(f"MOS-LQO: {result.moslqo:.4f}")

Batch Evaluation

from visqol import VisqolApi

api = VisqolApi()
api.create(mode="audio")

file_pairs = [
    ("ref1.wav", "deg1.wav"),
    ("ref2.wav", "deg2.wav"),
    ("ref3.wav", "deg3.wav"),
]

# Sequential with progress callback
results = api.measure_batch(
    file_pairs,
    progress_callback=lambda done, total: print(f"{done}/{total}"),
)

# Multi-process parallel (max_workers caps the number of worker processes)
results = api.measure_batch(file_pairs, parallel=True, max_workers=4)

for pair, result in zip(file_pairs, results):
    if isinstance(result, Exception):
        print(f"{pair}: FAILED — {result}")
    else:
        print(f"{pair}: MOS-LQO = {result.moslqo:.4f}")
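For larger batches, a summary is often more useful than one line per pair. Here is a hedged sketch of one way to aggregate the list returned by measure_batch. It relies only on what the loop above shows: each entry is either an Exception or an object with a moslqo attribute. FakeResult is a stand-in for illustration, and its scores are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FakeResult:
    """Stand-in for SimilarityResult; only the `moslqo` field is modelled."""
    moslqo: float


def summarize(pairs, results):
    """Split batch output into scores and failures, then average the scores."""
    scores, failures = {}, {}
    for pair, result in zip(pairs, results):
        if isinstance(result, Exception):
            failures[pair] = str(result)
        else:
            scores[pair] = result.moslqo
    mean_mos = mean(list(scores.values())) if scores else None
    return scores, failures, mean_mos


if __name__ == "__main__":
    pairs = [("ref1.wav", "deg1.wav"), ("ref2.wav", "deg2.wav"), ("ref3.wav", "deg3.wav")]
    # Illustrative values only, not measured scores.
    results = [FakeResult(4.0), FileNotFoundError("deg2.wav"), FakeResult(3.0)]
    scores, failures, mean_mos = summarize(pairs, results)
    print(f"{len(scores)} ok, {len(failures)} failed, mean MOS-LQO = {mean_mos:.2f}")
```

Failed pairs stay out of the mean, so one unreadable file does not skew the average.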

Command Line

# Audio mode (default)
python -m visqol -r reference.wav -d degraded.wav

# Speech mode
python -m visqol -r reference.wav -d degraded.wav --speech_mode

# Verbose output (per-patch details)
python -m visqol -r reference.wav -d degraded.wav -v

CLI options:

Flag	Description
`-r`, `--reference`	Path to reference WAV file (required)
`-d`, `--degraded`	Path to degraded WAV file (required)
`--speech_mode`	Use speech mode (16 kHz)
`--no_lattice_model`	Speech mode: disable lattice TFLite mapper, use polynomial fallback
`--lattice_model`	Custom path to lattice `.tflite` model (speech mode)
`--unscaled_speech`	Don't scale polynomial speech MOS to 5.0 (polynomial only)
`--model`	Custom SVR model file path (audio mode only)
`--search_window`	Search window radius (default: 60)
`-v`, `--verbose`	Show detailed per-patch results
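When scripting the CLI from Python, building the argument list in one place keeps the flags consistent. The sketch below uses only flags listed in the table; build_visqol_cmd is a hypothetical helper, and the format of the CLI's output is not assumed.

```python
import shlex
import sys


def build_visqol_cmd(reference, degraded, speech=False, lattice=True, verbose=False):
    """Assemble an argv list for `python -m visqol` from the documented flags."""
    cmd = [sys.executable, "-m", "visqol", "-r", str(reference), "-d", str(degraded)]
    if speech:
        cmd.append("--speech_mode")
        if not lattice:
            # Only meaningful in speech mode: fall back to the polynomial mapper.
            cmd.append("--no_lattice_model")
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    cmd = build_visqol_cmd("reference.wav", "degraded.wav", speech=True)
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.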

Output

The measure() method returns a SimilarityResult object with:

Field	Description
`moslqo`	MOS-LQO score (1.0 – 5.0)
`vnsim`	Mean NSIM across all patches
`fvnsim`	Per-frequency-band mean NSIM
`fstdnsim`	Per-frequency-band std of NSIM
`fvdegenergy`	Per-frequency-band degraded energy
`patch_sims`	List of per-patch similarity details
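The per-band fields can point to where in the spectrum a degradation sits. As a rough sketch, assuming fvnsim behaves like a sequence of floats with one entry per Gammatone band (the band ordering is not specified here), the lowest-scoring bands can be ranked as follows. The helper and the sample values are illustrative only.

```python
def weakest_bands(fvnsim, count=3):
    """Return (band_index, nsim) pairs for the `count` lowest per-band NSIM values."""
    ranked = sorted(enumerate(fvnsim), key=lambda item: item[1])
    return ranked[:count]


if __name__ == "__main__":
    # Illustrative per-band values, not output from a real measurement.
    example_fvnsim = [0.98, 0.95, 0.71, 0.90, 0.64, 0.99]
    for band, nsim in weakest_bands(example_fvnsim, count=2):
        print(f"band {band}: NSIM = {nsim:.2f}")
```

With a real result you would pass result.fvnsim instead of the example list.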

Modes

Audio Mode (default)

- Target sample rate: 48 kHz
- 32 Gammatone frequency bands (50 Hz – 15 000 Hz)
- Quality mapping: SVR (Support Vector Regression) model
- Best for: music, environmental audio, codecs

Speech Mode

- Target sample rate: 16 kHz
- 21 Gammatone frequency bands (50 Hz – 8 000 Hz)
- VAD (Voice Activity Detection) based patch selection
- Quality mapping (choose one):
  - Deep-lattice TFLite (default): same mapper as C++ ViSQOL's default --use_lattice_model=true; requires pip install visqol-python[lattice]
  - Exponential polynomial (fallback): same as C++ --use_lattice_model=false; used automatically when the lattice runtime is not installed
- Toggle from Python: api.create(mode="speech", use_lattice_model=False)
- Toggle from CLI: --no_lattice_model
- Best for: speech, VoIP, telephony
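The two mode descriptions can be captured as data, which makes one practical consequence easy to check: a file whose Nyquist frequency (half its sample rate) lies below the mode's top band cannot carry content in the upper bands. The sketch below copies its numbers from the lists above; ModeSpec, MODES, and covers_band_range are hypothetical names, not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSpec:
    sample_rate_hz: int
    bands: int
    min_freq_hz: int
    max_freq_hz: int


# Values copied from the Modes section; the structure itself is illustrative.
MODES = {
    "audio": ModeSpec(sample_rate_hz=48_000, bands=32, min_freq_hz=50, max_freq_hz=15_000),
    "speech": ModeSpec(sample_rate_hz=16_000, bands=21, min_freq_hz=50, max_freq_hz=8_000),
}


def covers_band_range(mode: str, file_rate_hz: int) -> bool:
    """True if the file's Nyquist frequency reaches the mode's highest band."""
    return file_rate_hz / 2 >= MODES[mode].max_freq_hz


if __name__ == "__main__":
    for mode, rate in [("audio", 48_000), ("audio", 16_000), ("speech", 16_000)]:
        print(f"{mode} mode, {rate} Hz file: full band range = {covers_band_range(mode, rate)}")
```

For example, a 16 kHz recording has content only up to 8 kHz, which matches speech mode but leaves the upper audio-mode bands empty.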

Performance

Measured on Apple M-series, Python 3.13, audio mode on the guitar48_stereo 12.5 s conformance case (3-run average):

Configuration	RTF	Typical Time	Speedup vs pure Python
Pure Python + NumPy/SciPy	0.58	~7 s	1.0×
+ `[accel]` (Numba JIT)	0.067	~0.84 s	8.7×
+ `[accel] [fftw]` (Numba + FFTW3)	0.036	~0.45 s	16×

RTF (Real-Time Factor) is processing time divided by audio duration, so a value below 1.0 means faster than real time. With Numba + pyFFTW, the Python implementation runs about 2.6× faster than the estimated C++ speed (C++ RTF ≈ 0.093).
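The figures in the table follow from simple ratios. As a quick check, taking RTF as processing time divided by audio duration and using the 12.5 s clip length and the timings quoted above:

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return processing_s / audio_s


def speedup(baseline_rtf: float, rtf: float) -> float:
    """How many times faster `rtf` is than `baseline_rtf`."""
    return baseline_rtf / rtf


if __name__ == "__main__":
    audio_s = 12.5  # duration of the guitar48_stereo case
    print(f"Numba + FFTW RTF: {real_time_factor(0.45, audio_s):.3f}")
    print(f"Speedup vs pure Python: {speedup(0.58, 0.036):.1f}x")
    print(f"Speedup vs C++ estimate: {speedup(0.093, 0.036):.1f}x")
```

These reproduce the table's 0.036 RTF, the roughly 16× speedup over pure Python, and the 2.6× figure relative to the C++ estimate.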

Stage-level breakdown of the v3.6.0 fully-accelerated path:

Stage	Time	%
Gammatone filterbank	0.179 s	40%
DP Patch matching (fused NSIM kernel)	0.131 s	29%
Global alignment (pyFFTW rfft/irfft)	0.091 s	20%
Fine alignment + NSIM	0.043 s	10%
Other (SPL, postproc, SVR, …)	0.003 s	< 1%

Project Structure

visqol-python/
├── visqol/                    # Main package
│   ├── __init__.py            # Package exports & version
│   ├── api.py                 # Public API (VisqolApi)
│   ├── visqol_manager.py      # Pipeline orchestrator
│   ├── visqol_core.py         # Core algorithm
│   ├── audio_utils.py         # Audio I/O & SPL normalization
│   ├── signal_utils.py        # Envelope, cross-correlation
│   ├── analysis_window.py     # Hann window
│   ├── gammatone.py           # ERB + Gammatone filterbank + spectrogram
│   ├── patch_creator.py       # Patch creation (Image + VAD modes)
│   ├── patch_selector.py      # DP-based optimal patch matching
│   ├── alignment.py           # Global alignment via cross-correlation
│   ├── nsim.py                # NSIM similarity metric
│   ├── quality_mapper.py      # SVR & exponential quality mapping
│   ├── numba_accel.py         # Optional Numba JIT kernels (DP, NSIM, Gammatone)
│   ├── __main__.py            # CLI entry point
│   ├── py.typed               # PEP 561 type marker
│   └── model/                 # Bundled SVR model
│       └── libsvm_nu_svr_model.txt
├── tests/                     # Tests & benchmarks (pytest)
│   ├── conftest.py            # Shared fixtures & CLI options
│   ├── test_quick.py          # Smoke tests (no external data needed)
│   ├── test_conformance.py    # Full conformance tests (needs testdata)
│   ├── test_parallel_correctness.py  # Numba parallel correctness tests
│   └── bench_*.py             # Performance benchmarks
├── .github/workflows/
│   ├── ci.yml                 # CI: lint + type-check + matrix test (Python × NumPy)
│   └── publish.yml            # Auto-publish to PyPI on tag push
├── pyproject.toml             # Package metadata & build config
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
└── README.md

Conformance Test Results

Tested against the official C++ ViSQOL v3.3.3 expected values:

Test Case	Mode	Expected MOS	Python MOS	Δ
strauss_lp35	Audio	1.3889	1.3889	0.000000
steely_lp7	Audio	2.2502	2.2502	0.000000
sopr_256aac	Audio	4.6823	4.6823	0.000000
ravel_128opus	Audio	4.4651	4.4651	0.000000
moonlight_128aac	Audio	4.6843	4.6843	0.000000
harpsichord_96mp3	Audio	4.2237	4.2237	0.000000
guitar_64aac	Audio	4.3497	4.3497	0.000000
glock_48aac	Audio	4.3325	4.3325	0.000000
contrabassoon_24aac	Audio	2.3469	2.3468	0.000117
castanets_identity	Audio	4.7321	4.7321	0.000000
speech_CA01 (polynomial)	Speech	3.3745	3.3756	0.001057
speech_CA01 (lattice)	Speech	3.3130	3.3153	0.002341

Both speech values come from running the C++ ViSQOL binary directly with the corresponding --use_lattice_model flag, so they represent ground-truth parity targets.
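A conformance check of this kind reduces to comparing absolute differences against a tolerance. The sketch below applies that idea to the three non-identical rows of the table, using the 4-decimal values as printed. The 0.005 tolerance is a hypothetical choice for illustration, not the threshold used by the project's test suite.

```python
# (case, expected C++ MOS, Python MOS), copied from the table at 4 decimals.
CASES = [
    ("contrabassoon_24aac", 2.3469, 2.3468),
    ("speech_CA01 (polynomial)", 3.3745, 3.3756),
    ("speech_CA01 (lattice)", 3.3130, 3.3153),
]

TOLERANCE = 0.005  # hypothetical threshold, not the project's actual setting


def out_of_tolerance(cases, tol=TOLERANCE):
    """Return (name, abs_diff) for every case whose MOS difference exceeds `tol`."""
    return [
        (name, abs(expected - actual))
        for name, expected, actual in cases
        if abs(expected - actual) > tol
    ]


if __name__ == "__main__":
    bad = out_of_tolerance(CASES)
    print("all cases within tolerance" if not bad else bad)
```

Tightening the tolerance to 0.002 would flag only the lattice speech case, whose difference is the largest in the table.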

References

Google ViSQOL (C++) — the original implementation this project is ported from
Hines, A., Gillen, E., Kelly, D., Skoglund, J., Kokaram, A., & Harte, N. (2015). ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs. The Journal of the Acoustical Society of America.
Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O'Gorman, F., & Hines, A. (2020). ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).

License

Apache License 2.0. See LICENSE for details.

This project is a Python port of Google's ViSQOL, which is also licensed under Apache 2.0.