Getting Started with pvx

May 25, 2026

This guide is for first-time users who want to understand what pvx does, why it exists, and how to get useful results without treating digital signal processing (DSP) as magic. It is practical first, mystical later.

0. Quick Setup (Install + PATH)

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e .
pvx --help

Same setup with uv:

uv venv .venv
source .venv/bin/activate
uv pip install -e .
uv run pvx --help

If pvx is not found, add the project virtualenv's bin directory to your PATH environment variable (zsh shown):

printf 'export PATH="%s/.venv/bin:$PATH"\n' "$(pwd)" >> ~/.zshrc
source ~/.zshrc
pvx --help

No-PATH fallback:

.venv/bin/pvx voc input.wav --stretch 1.2 --output output.wav
# or
python3 -m pvx.cli.pvx voc input.wav --stretch 1.2 --output output.wav

No-PATH fallback with uv:

uv run pvx voc input.wav --stretch 1.2 --output output.wav

Man-page generation:

python3 scripts/scripts_install_man_pages.py
MANPATH="$(pwd)/man:$MANPATH" man pvx

0.3 Launch-Ready Helper Commands

Before announcing or demoing, run these:

pvx doctor
pvx quickstart input.wav --output output.wav
pvx safe input.wav --material mix --output output_safe.wav
pvx transforms
pvx smoke --output smoke_out.wav
pvx augment data/*.wav --output-dir aug_out --variants-per-input 4 --intent asr_robust --seed 1337
pvx augment-manifest validate aug_out/augment_manifest.jsonl --strict

What each does:

  • pvx doctor: checks the environment, the PATH environment variable, and optional dependencies.
  • pvx quickstart: prints the minimal launch command sequence.
  • pvx safe: runs pvx voc with conservative quality-first defaults.
  • pvx transforms: explains transform options and runtime availability.
  • pvx smoke: executes a fast synthetic end-to-end sanity render.
  • pvx augment: builds deterministic augmentation datasets and writes manifests for machine-learning pipelines.
  • pvx augment-manifest: validates, merges, and summarizes augmentation manifests.

0.2 Running Any Command with uv

uv can run every command in this guide without changing the DSP arguments.

  • pvx ... -> uv run pvx ...
  • python3 some_script.py ... -> uv run python3 some_script.py ...
  • python3 -m module.path ... -> uv run python3 -m module.path ...

Example:

pvx voc input.wav --stretch 1.25 --output out.wav
uv run pvx voc input.wav --stretch 1.25 --output out.wav

If you are already muttering at your shell, that is normal.

0.1 First 3 Commands to Run

# 1) Stretch without pitch change
pvx voc input.wav --stretch 1.25 --output input_stretch.wav

# 2) Pitch change without duration change
pvx voc input.wav --stretch 1.0 --pitch -3 --output input_pitch.wav

# 3) Same pitch shift, but with formant protection
pvx voc input.wav --stretch 1.0 --pitch -3 --pitch-mode formant-preserving --output input_pitch_formant.wav

What to listen for:

  • input_stretch.wav: longer timing, same note center
  • input_pitch.wav: lower note center, possible timbral darkening
  • input_pitch_formant.wav: lower note center with more stable vowel/timbre identity
  • if all three sound identical, double-check that you did not typo the output path at 2 a.m.

1. What Problem pvx Solves

Audio workflows often need one or more of the following, with controllable artifacts:

  • change duration without changing pitch
  • change pitch without changing duration
  • apply time-varying pitch/time trajectories from a map
  • preserve transients and formants where possible
  • process many files consistently from the command line

pvx gives you explicit control over those operations using phase-vocoder/short-time Fourier transform (STFT) methods plus specialized companion tools (freeze, harmonize, morph, retune, denoise, dereverb).

Design priority:

  • quality first: preserve musical intent and minimize artifacts
  • speed second: optimize runtime after quality targets are achieved
  • translation: we would rather be right than merely fast

Analog Tape Methods for Pitch/Time Shifting

Historically, reel-to-reel varispeed coupled time and pitch. Analog systems like the Eltro information rate changer and related Anton Springer regulator concepts were early attempts to separate them using segmented rotating-head playback methods instead of simple fixed-speed tape transport.

Why this matters for pvx:

  • artifact control is largely a continuity problem (then: segment stitching; now: phase/transient/stereo continuity),
  • high-quality time/pitch work usually benefits from conservative, staged processing.

2. Basic DSP Terms (Plain Language)

  • Sample rate: how many audio samples per second (e.g. 48,000 Hz).
  • Frame: a short window of audio processed as one block.
  • STFT: repeated short Fourier transforms over overlapping frames.
  • Bin: one frequency slot in an STFT frame.
  • Window: taper applied to each frame to reduce spectral leakage.
  • Hop size: frame advance between adjacent STFT frames.
  • Phase coherence: consistency of phase evolution across frames/bins.
  • Transient: short attack event (drum hit, consonant burst).
  • Formant: resonant spectral envelope that carries vowel/timbre identity.
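To make the frame, hop, and bin terms concrete, here is a small bookkeeping sketch. The function names are hypothetical and are not part of pvx, and it assumes no edge padding, which a real STFT implementation may handle differently.

```python
def stft_geometry(num_samples, sample_rate, n_fft, hop_size):
    """Basic STFT bookkeeping for a signal, assuming no edge padding."""
    if num_samples < n_fft:
        num_frames = 0
    else:
        # One frame at offset 0, then one more for every full hop that still fits.
        num_frames = 1 + (num_samples - n_fft) // hop_size
    return {
        "num_frames": num_frames,
        "overlap_fraction": 1.0 - hop_size / n_fft,
        "num_bins": n_fft // 2 + 1,  # non-negative frequencies of a real signal
    }


def bin_center_hz(k, sample_rate, n_fft):
    """Center frequency of bin k."""
    return k * sample_rate / n_fft


# One second of 48 kHz audio, 2048-sample frames, 512-sample hop.
geo = stft_geometry(num_samples=48000, sample_rate=48000, n_fft=2048, hop_size=512)
print(geo)
print(bin_center_hz(100, 48000, 2048))
```

With these numbers each frame overlaps its neighbor by 75%, and adjacent bins sit 23.4375 Hz apart.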

2.1 Supported File Types

pvx audio I/O is provided by soundfile/libsndfile, so exact format support depends on your runtime build.

  • Quick summary: wav, flac, aiff, ogg, and caf are supported for stream output and are safe defaults.
  • Full current format table: FILE_TYPES.md

3. Stretch vs Pitch Shift

  • Stretch controls duration.
    • stretch = 2.0 means output is twice as long.
    • stretch = 0.5 means output is half as long.
  • Pitch shift controls musical frequency position.
    • +12 semitones = one octave up.
    • -12 semitones = one octave down.

In pvx voc, duration and pitch can be controlled independently:

pvx voc input.wav --stretch 1.25 --pitch 0 --output stretched.wav
pvx voc input.wav --stretch 1.0 --pitch 3 --output pitched.wav
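The arithmetic behind these two controls is the standard equal-tempered relation: a shift of s semitones corresponds to a frequency ratio of 2^(s/12). The helper names below are illustrative only and are not pvx functions.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency ratio for an equal-tempered shift in semitones."""
    return 2.0 ** (semitones / 12.0)


def ratio_to_semitones(ratio):
    """Inverse of semitones_to_ratio."""
    return 12.0 * math.log2(ratio)


def stretched_duration(seconds, stretch):
    """Output duration for a given stretch factor."""
    return seconds * stretch


print(semitones_to_ratio(12))         # one octave up
print(semitones_to_ratio(3))          # the +3 semitone example above
print(stretched_duration(4.0, 1.25))  # a 4-second clip at stretch 1.25
```

A 4-second clip at stretch 1.25 comes out 5 seconds long, and +3 semitones is a ratio of roughly 1.189.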

3.1 Denoising with Phase-Consistent STFT Processing

pvx denoise uses short-time Fourier transform (STFT) spectral subtraction while preserving phase continuity in overlap-add resynthesis.

Starter profiles:

# Speech-safe (conservative)
pvx denoise speech.wav --noise-seconds 0.4 --reduction-db 5 --floor 0.2 --smooth 9 --output speech_clean.wav

# Music-safe (preserve more ambience/harmonics)
pvx denoise mix.wav --noise-seconds 0.3 --reduction-db 4 --floor 0.25 --smooth 7 --output mix_clean.wav

If you hear chirpy/watery artifacts:

  • reduce --reduction-db
  • increase --smooth
  • raise --floor slightly
  • use --noise-file roomtone.wav when the beginning of the source is not a clean noise-only segment
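To see why the floor matters, here is a textbook magnitude-domain spectral subtraction step for one frame. This is a generic sketch, not pvx's implementation: the function name and the over_subtract parameter are assumptions, and how --reduction-db maps to the subtraction amount is not shown here.

```python
def spectral_subtract(mags, noise_mags, over_subtract=1.0, floor=0.2):
    """Subtract a noise estimate per bin, never dropping below floor * original."""
    cleaned = []
    for mag, noise in zip(mags, noise_mags):
        reduced = mag - over_subtract * noise
        # The floor keeps weak bins from collapsing to zero, which is what
        # tends to produce isolated "chirpy" tones between frames.
        cleaned.append(max(reduced, floor * mag))
    return cleaned


frame_mags = [1.0, 0.5, 0.1]   # example bin magnitudes
noise_mags = [0.1, 0.1, 0.1]   # noise estimate for the same bins
result = spectral_subtract(frame_mags, noise_mags)
print(result)
```

The third bin sits at the noise level, so plain subtraction would zero it; the floor leaves a small residue instead. Raising the floor trades a little more residual noise for fewer watery artifacts.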

3.2 Time-Varying Parameter Control (CSV/JSON)

You can pass a control file directly to many core phase-vocoder numeric flags.

Examples:

pvx voc input.wav --stretch controls/stretch.csv --interp linear --output out.wav
pvx voc input.wav --pitch-shift-ratio controls/pitch.json --interp polynomial --order 3 --output out.wav
pvx voc input.wav --n-fft controls/nfft.csv --hop-size controls/hop.csv --output out.wav

Common flags that accept scalar values or control files (.csv / .json):

  • time/pitch trajectory: --stretch, --time-stretch, --pitch-shift-ratio, --pitch-shift-semitones, --pitch-shift-cents
  • frame/spectral resolution: --n-fft, --win-length, --hop-size, --kaiser-beta
  • phase/transient/stereo shaping: --ambient-phase-mix, --transient-threshold, --transient-sensitivity, --transient-protect-ms, --transient-crossfade-ms, --coherence-strength
  • multistage/Fourier-sync tuning: --extreme-stretch-threshold, --max-stage-stretch, --fourier-sync-min-fft, --fourier-sync-max-fft, --fourier-sync-smooth
  • onset/formant shaping: --onset-credit-pull, --onset-credit-max, --formant-lifter, --formant-strength, --formant-max-gain-db

Control interpolation options:

  • --interp none (stairstep / no interpolation)
  • --interp linear (default)
  • --interp nearest
  • --interp cubic
  • --interp polynomial --order N (any integer N >= 1, default N=3; effective degree is min(N, control_points-1))

Polynomial order examples:

  • --interp polynomial --order 1
  • --interp polynomial --order 2
  • --interp polynomial --order 3
  • --interp polynomial --order 5

Point-style CSV:

time_sec,value
0.0,1.0
0.5,1.2
1.0,1.8
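As a rough picture of what the interpolation modes do with a point-style file, the sketch below parses the CSV above and evaluates it at arbitrary times. It covers only none, nearest, and linear, and it assumes values are held constant before the first point and after the last one; pvx's own loader and edge behavior may differ.

```python
import csv
import io
from bisect import bisect_right

POINTS_CSV = """time_sec,value
0.0,1.0
0.5,1.2
1.0,1.8
"""


def load_points(text):
    """Parse point-style CSV text into a time-sorted list of (time, value)."""
    rows = csv.DictReader(io.StringIO(text))
    return sorted((float(r["time_sec"]), float(r["value"])) for r in rows)


def evaluate(points, t, mode="linear"):
    """Evaluate the control curve at time t (edges are held, by assumption)."""
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t) - 1  # last point at or before t
    (t0, v0), (t1, v1) = points[i], points[i + 1]
    if mode == "none":       # stairstep: hold the previous value
        return v0
    if mode == "nearest":    # snap to whichever point is closer in time
        return v0 if (t - t0) <= (t1 - t) else v1
    if mode == "linear":     # straight line between neighbors
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError(f"unsupported mode: {mode}")


points = load_points(POINTS_CSV)
for mode in ("none", "nearest", "linear"):
    print(mode, evaluate(points, 0.75, mode))
```

At t = 0.75 the stairstep mode still reports 1.2, while linear is halfway to 1.8.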

Segment-style CSV:

start_sec,end_sec,value
0.0,0.4,1.0
0.4,0.8,1.2
0.8,1.2,1.6

JSON points:

{
  "interpolation": "linear",
  "points": [
    {"time_sec": 0.0, "value": 1.0},
    {"time_sec": 0.5, "value": 1.2},
    {"time_sec": 1.0, "value": 1.8}
  ]
}

JSON schema quick reference:

| Key | Required | Type | Meaning |
| --- | --- | --- | --- |
| interpolation / interp | no | string | Interpolation override (none, linear, nearest, cubic, exponential, s_curve, smootherstep, polynomial) |
| order | no | integer | Polynomial order for polynomial interpolation |
| points | yes (point mode) | array | Time/value points |
| points[].time_sec | yes | number | Timestamp in seconds |
| points[].value | yes | number/string | Parameter value |
| segments | yes (segment mode) | array | Piecewise constant control regions |
| segments[].start_sec, segments[].end_sec | yes | number | Segment boundaries in seconds |
| segments[].value | yes | number/string | Segment value |
| parameters | no | object | Multi-parameter container keyed by parameter name |
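If you generate control files from scripts, a quick pre-flight check can catch missing keys before a long render. The sketch below checks only the single-parameter keys listed in the quick reference. The function is hypothetical, the rule that end_sec must exceed start_sec is an assumption, and the parameters container is deliberately not handled.

```python
import json


def validate_control(doc):
    """Return a list of problems found in a parsed single-parameter control document."""
    problems = []
    if ("points" in doc) == ("segments" in doc):
        problems.append("expected exactly one of 'points' or 'segments'")
    for i, point in enumerate(doc.get("points", [])):
        for key in ("time_sec", "value"):
            if key not in point:
                problems.append(f"points[{i}] is missing '{key}'")
    for i, seg in enumerate(doc.get("segments", [])):
        for key in ("start_sec", "end_sec", "value"):
            if key not in seg:
                problems.append(f"segments[{i}] is missing '{key}'")
        # Assumption: a segment must have positive length.
        if "start_sec" in seg and "end_sec" in seg and seg["end_sec"] <= seg["start_sec"]:
            problems.append(f"segments[{i}] has end_sec <= start_sec")
    if "order" in doc and not isinstance(doc["order"], int):
        problems.append("'order' must be an integer")
    return problems


good = json.loads('{"interpolation": "linear", "points": [{"time_sec": 0.0, "value": 1.0}]}')
bad = json.loads('{"segments": [{"start_sec": 0.4, "end_sec": 0.4}], "order": "3"}')
print(validate_control(good))
print(validate_control(bad))
```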

Notes:

  • per-parameter dynamic controls cannot be combined with legacy --pitch-map / --pitch-map-stdin in the same command
  • dynamic --time-stretch cannot be combined with --target-duration

Interpolation graph examples (one example curve per mode; images not reproduced here):

  • none (stairstep)
  • nearest
  • linear
  • cubic
  • exponential
  • s_curve (smoothstep)
  • smootherstep
  • polynomial order 1, 2, 3, and 5

Core processing function charts (images not reproduced here):

  • pitch ratio vs semitones
  • dynamics transfer curves
  • soft clip transfer functions
  • morph blend magnitude curves
  • mask exponent response

4. Visual Mental Model of STFT

Time-domain signal
x[n] ------------------------------------------------------------>

Frame + windowing
         [w[n] * x[n:n+N]]
              [w[n] * x[n+H:n+H+N]]
                   [w[n] * x[n+2H:n+2H+N]]

Per-frame transform
         FFT -> X0[k]
         FFT -> X1[k]
         FFT -> X2[k]

Process in spectral domain
         modify magnitude/phase trajectories

Synthesis
         IFFT + overlap-add -> y[n]

Key tradeoff:

  • larger N (FFT) -> better frequency resolution, worse time localization
  • smaller N -> better time localization, rougher low-frequency precision
  • yes, this is the classic engineering compromise you were hoping to avoid
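The diagram above can be exercised end to end in a few lines. This is a deliberately tiny, pure-Python sketch (naive DFT, N = 8, H = 4, periodic Hann window) meant only to show that window, transform, inverse transform, and overlap-add give back the input when nothing is modified. It is not how pvx computes its STFT, and all names are illustrative.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window; at hop N/2 the shifted copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    n_fft = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)]


def idft(spectrum):
    n_fft = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft)).real / n_fft
            for n in range(n_fft)]


def stft_roundtrip(x, n_fft=8, hop=4):
    """Window, transform, inverse-transform, and overlap-add without modification."""
    window = hann(n_fft)
    y = [0.0] * len(x)
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = [window[n] * x[start + n] for n in range(n_fft)]
        spectrum = dft(frame)      # a real tool would modify magnitude/phase here
        restored = idft(spectrum)
        for n in range(n_fft):
            y[start + n] += restored[n]
    return y


x = [math.sin(0.3 * n) for n in range(32)]
y = stft_roundtrip(x)
max_error = max(abs(x[n] - y[n]) for n in range(4, 28))
print(max_error)
```

Only the interior samples are compared, because the first and last hop are covered by a single window and are attenuated. Artifacts appear once the spectral step changes magnitudes or phases so that neighboring frames no longer agree where they overlap.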

5. Example Audio Scenarios

  • Dialogue cleanup + mild timing correction: pvx voc + pvx denoise + pvx deverb.
  • Ambient texture from short sample: pvx voc --preset ambient --target-duration ....
  • Vocal harmonies: pvx harmonize with interval and pan controls.
  • Timeline-locked effects: pvx conform / pvx warp with CSV maps.

5.1 Use-Case Matrix (Where to start quickly)

| Use case | First command to try |
| --- | --- |
| Speech slowdown for transcription | pvx voc speech.wav --preset vocal_studio --stretch 1.25 --output speech_slow.wav |
| Podcast timing cleanup | pvx voc voice.wav --stretch 0.95 --output voice_tight.wav |
| Vocal pitch correction | pvx retune vocal.wav --scale major --root C --output-dir out --suffix _retune |
| Harmonic backing voices | pvx harmonize lead.wav --intervals 0,4,7 --gains 1,0.8,0.65 --output-dir out |
| Unison widen synth | pvx unison synth.wav --voices 7 --detune-cents 16 --width 1.1 --output-dir out |
| Long ambient from short source | pvx voc oneshot.wav --preset extreme_ambient --target-duration 600 --output ambient.wav |
| Drum-safe stretch | pvx voc drums.wav --preset drums_safe --stretch 1.3 --output drums_stretch.wav |
| Stereo image stability | pvx voc mix.wav --stretch 1.15 --stereo-mode mid_side_lock --coherence-strength 0.9 --output mix_lock.wav |
| Noise reduction pre-pass | pvx denoise field.wav --reduction-db 8 --output-dir out --suffix _den |
| Room tail reduction | pvx deverb room.wav --strength 0.5 --output-dir out --suffix _dry |
| CSV time/pitch choreography | pvx conform source.wav --map map_conform.csv --output-dir out |
| Automated quality comparison | uv run python3 benchmarks/run_bench.py --quick --out-dir benchmarks/out --gate --baseline benchmarks/baseline_small.json |

6. Step-by-Step Walkthrough (One Small WAV)

Assume sample.wav is about 2–5 seconds long.

Step A: Baseline stretch

pvx voc sample.wav --stretch 1.2 --output sample_stretch.wav

Expected:

  • 20% longer duration
  • same pitch
  • slight smearing possible on sharp transients

Step B: Add transient protection

pvx voc sample.wav --stretch 1.2 --transient-preserve --phase-locking identity --output sample_stretch_transient.wav

Expected:

  • attacks should feel tighter than Step A
  • fewer “swishy” transients

Step C: Pitch shift with formant preservation

pvx voc sample.wav --stretch 1.0 --pitch -4 --pitch-mode formant-preserving --output sample_pitch_formant.wav

Expected:

  • pitch lowered by 4 semitones
  • vowel/timbre identity more stable than plain shift

Step D: Auto profile planning

pvx voc sample.wav --auto-profile --auto-transform --explain-plan

Expected:

  • JSON plan with resolved profile, transform, and core processing settings
  • no audio output (plan only)

7. What Outputs Should Sound Like

  • sample_stretch.wav: same pitch, longer phrase timing.
  • sample_stretch_transient.wav: similar to above, but cleaner attack edges.
  • sample_pitch_formant.wav: lower note center with less “character collapse.”

If artifacts are strong:

  • use a less extreme stretch or pitch ratio
  • increase overlap (use a smaller --hop-size)
  • try --preset vocal for speech/singing
  • test --multires-fusion for mixed content

8. New Quality Modes (Beginner Version)

pvx voc sample.wav \
  --transient-mode hybrid \
  --transient-sensitivity 0.6 \
  --transient-protect-ms 30 \
  --transient-crossfade-ms 10 \
  --stretch 1.25 \
  --output sample_hybrid.wav

What to expect:

  • steadier harmonic regions from phase-vocoder processing
  • cleaner attacks from WSOLA handling around detected onsets

pvx voc stereo_mix.wav \
  --stereo-mode mid_side_lock \
  --coherence-strength 0.9 \
  --stretch 1.2 \
  --output stereo_locked.wav

What to expect:

  • less left/right image wobble
  • more stable center image after heavy stretch
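For readers new to mid/side, the standard encoding behind the name is sketched below: mid carries what the two channels share, side carries their difference, and the pair converts back to left/right without loss. How pvx's mid_side_lock mode uses this representation internally is not described here, so treat this as background rather than as its algorithm.

```python
def to_mid_side(left, right):
    """Encode left/right sample lists as mid (sum) and side (difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Decode mid/side back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left_in = [0.5, -0.25, 0.0]
right_in = [0.25, 0.75, 0.0]
mid, side = to_mid_side(left_in, right_in)
left_out, right_out = to_left_right(mid, side)
print(mid, side)
```

A sound panned dead center has side equal to zero, which is why keeping the mid channel stable tends to keep the center image from wandering.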

9. Intent Presets

Use presets when you do not want to tune many flags:

  • --preset vocal_studio: formant-aware vocal defaults + transient hybrid handling
  • --preset drums_safe: WSOLA-heavy transient safety for percussive content
  • --preset extreme_ambient: extreme long-form ambient settings
  • --preset stereo_coherent: stereo coupling defaults

Legacy presets remain available: vocal, ambient, extreme.

10. Simpler One-Line Pipelines

If you do not want long Unix pipe chains, use managed helpers:

pvx follow guide.wav target.wav --output target_follow.wav --emit pitch_to_stretch --pitch-conf-min 0.75
pvx chain sample.wav --pipeline "voc --stretch 1.2 | formant --mode preserve" --output sample_chain.wav
pvx stream sample.wav --output sample_stream.wav --chunk-seconds 0.2 --time-stretch 2.0
pvx stream sample.wav --mode wrapper --output sample_stream_wrapper.wav --chunk-seconds 0.2 --time-stretch 2.0
pvx stretch-budget sample.wav --disk-budget 20GB --bit-depth 16 --requested-stretch 1000000

  • pvx follow replaces long sidechain pipes for pitch/control-map-driven workflows.
  • pvx chain runs serial stages with managed intermediate files.
  • pvx stream defaults to a stateful chunk processor for smoother long-form continuity.
  • pvx stream --mode wrapper keeps legacy segmented-wrapper behavior.
  • pvx stretch-budget estimates maximum practical stretch before you commit to a long render.

10.1 Estimate Stretch Budget Before Extreme Jobs

Use this helper before very large ratios (100x, 1000x, 1000000x) so disk limits are explicit.

pvx stretch-budget input.wav --disk-budget 20GB --bit-depth 16
pvx stretch-budget input.wav --disk-budget 20GB --requested-stretch 1000000 --fail-if-exceeds --json

What it uses:

  • input shape (frames/channels/sample rate)
  • output storage assumption (--output-format, --bit-depth / --subtype)
  • budget (--disk-budget or free space at --budget-path)
  • headroom (--safety-margin, default 0.90)
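The estimate is easy to reproduce by hand from those inputs. The sketch below assumes uncompressed PCM output, ignores container headers, and treats 20GB as 20 × 10^9 bytes; the function is hypothetical, and pvx's own accounting may differ in those details.

```python
def max_stretch(frames, channels, bytes_per_sample, disk_budget_bytes, safety_margin=0.90):
    """Largest stretch factor whose PCM output fits in the usable disk budget."""
    input_bytes = frames * channels * bytes_per_sample
    usable_bytes = disk_budget_bytes * safety_margin
    return usable_bytes / input_bytes


# A 5-second, 48 kHz, stereo, 16-bit source against a 20 GB budget.
limit = max_stretch(
    frames=5 * 48000,
    channels=2,
    bytes_per_sample=2,
    disk_budget_bytes=20 * 10**9,
)
print(round(limit))
print(1_000_000 <= limit)  # would a requested 1000000x stretch fit?
```

For this source the ceiling is about 18750x, so a requested 1000000x stretch would not fit.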

Recommendation:

  • for production, prefer --target-duration over arbitrary huge ratios.
  • if you run extreme jobs, combine --stretch-mode multistage with --auto-segment-seconds, --checkpoint-dir, and --resume.

10.2 Feature Tracking for Sidechain Control

pvx pitch-track now emits feature vectors (not just pitch map fields), including:

  • pitch/voicing (f0_hz, pitch_ratio, confidence, voicing_prob, pitch_stability)
  • loudness/dynamics (rms, rms_db, short_lufs_db, crest_factor_db)
  • spectral features (spectral_centroid_hz, spectral_flatness, spectral_flux, rolloff_hz)
  • formants and cepstra (formant_f1_hz..formant_f3_hz, mfcc_01..mfcc_N)
  • rhythm markers (tempo_bpm, beat_phase, downbeat_phase, onset_strength, transient_mask)
  • MPEG-7-style descriptors (mpeg7_* columns, including audio spectrum envelope bands)

Use with control-bus routes:

pvx pitch-track guide.wav --feature-set all --mfcc-count 13 --output - \
  | pvx voc target.wav --control-stdin \
      --route 'pitch_ratio=affine(mfcc_01,0.002,1.0)' \
      --route 'pitch_ratio=clip(pitch_ratio,0.5,2.0)' \
      --route 'stretch=affine(spectral_flux,0.03,1.0)' \
      --route 'stretch=clip(stretch,0.85,1.5)' \
      --output target_feature_follow.wav
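To reason about what these routes do to a single feature frame, the arithmetic can be mocked up as below. This assumes affine(x, scale, offset) means x * scale + offset and that routes apply in the order given; both are guesses from the argument shapes, not confirmed pvx semantics.

```python
def affine(x, scale, offset):
    """Assumed meaning of affine(): scale the feature, then add an offset."""
    return x * scale + offset


def clip(x, low, high):
    """Clamp x into the closed range [low, high]."""
    return max(low, min(high, x))


def route_frame(features):
    """Apply the four routes from the command above to one feature frame."""
    pitch_ratio = affine(features["mfcc_01"], 0.002, 1.0)
    pitch_ratio = clip(pitch_ratio, 0.5, 2.0)
    stretch = affine(features["spectral_flux"], 0.03, 1.0)
    stretch = clip(stretch, 0.85, 1.5)
    return {"pitch_ratio": pitch_ratio, "stretch": stretch}


controls = route_frame({"mfcc_01": 50.0, "spectral_flux": 20.0})
print(controls)
```

The clip stage is what keeps a noisy feature from producing extreme ratios: here a raw stretch of 1.6 is pulled back to 1.5.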

For a larger gallery (single-feature, multi-feature, MFCC/MPEG-7 vector, and multi-guide workflows), see:

You can also print built-in command snippets directly from the command-line interface (CLI):

pvx follow --example
pvx follow --example all
pvx follow --example formant_onset

11. Output Policy Controls

All audio-output tools now share deterministic output policy flags:

  • --bit-depth {inherit,16,24,32f}
  • --dither {none,tpdf} and --dither-seed
  • --true-peak-max-dbtp
  • --metadata-policy {none,sidecar,copy}
  • --subtype for explicit low-level output subtype override
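The reason a dither seed makes output deterministic is easiest to see in code. Below is a textbook TPDF dither followed by 16-bit quantization, with the random generator seeded so repeated runs give identical samples. The function name and scaling are assumptions for illustration, not pvx's writer.

```python
import random


def quantize_16bit_tpdf(samples, seed=0):
    """Quantize floats in [-1, 1] to 16-bit integers with seeded TPDF dither."""
    rng = random.Random(seed)
    out = []
    for x in samples:
        # The sum of two uniform values has a triangular distribution (+/- 1 LSB).
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        value = round(x * 32767.0 + noise)
        out.append(max(-32768, min(32767, value)))  # guard against overflow
    return out


signal = [0.0, 0.25, -0.5, 0.999, -1.0]
first = quantize_16bit_tpdf(signal, seed=1337)
second = quantize_16bit_tpdf(signal, seed=1337)
print(first)
print(first == second)
```

Same input plus same seed yields bit-identical output, which is what makes renders reproducible across runs.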

12. Next Steps

13. Extra Beginner Recipes (Quick Wins)

# Speech slowdown
pvx voc speech.wav --preset vocal_studio --stretch 1.30 --output speech_slow.wav

# Drum-safe stretch
pvx voc drums.wav --preset drums_safe --stretch 1.20 --output drums_safe.wav

# Stereo coherence lock
pvx voc mix.wav --stretch 1.2 --stereo-mode mid_side_lock --coherence-strength 0.9 --output mix_lock.wav

# Freeze a transient into a pad
pvx freeze hit.wav --freeze-time 0.22 --duration 12 --output hit_pad.wav

# Morph two sources
pvx morph a.wav b.wav --alpha 0.45 --blend-mode carrier_a_envelope_b --output a_b_morph.wav

# True A->B trajectory morph in one command
pvx morph A.wav B.wav --alpha controls/alpha_curve.csv --interp linear --blend-mode linear --output A_to_B_morph.wav

# Major-scale retune
pvx retune vocal.wav --root C --scale major --strength 0.8 --output vocal_c_major.wav

# Alternate concert pitch retune (A4 = 432 Hz)
pvx retune vocal.wav --root A --scale minor --a4-reference-hz 432 --output vocal_a432.wav

# Explicit root fundamental retune (C4 ~= 261.6256 Hz)
pvx retune vocal.wav --root-hz 261.6256 --scale major --output vocal_c4_root.wav

# Auto-recommend root fundamental from the source
pvx retune vocal.wav --recommend-root --scale minor --output vocal_auto_root.wav

# Denoise then dereverb
pvx denoise noisy.wav --reduction-db 8 --stdout | pvx deverb - --strength 0.3 --output noisy_clean.wav

# Denoise then stretch
pvx denoise noisy.wav --reduction-db 6 --stdout | pvx voc - --stretch 2.0 --output noisy_clean_stretch.wav

# Generate an LFO control map (triangle) and apply it as time-varying stretch
pvx lfo --wave triangle --duration 12 --frequency-hz 0.4 --center 1.0 --amplitude 0.2 --key stretch --output controls/stretch_tri.csv
pvx voc input.wav --stretch controls/stretch_tri.csv --interp linear --output input_tri_stretch.wav

Run them in order, listen after each step, and resist changing ten parameters at once unless chaos is the objective.
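If you want to inspect or hand-edit what a triangle LFO control map looks like, the sketch below generates one with the same center, amplitude, frequency, and duration as the LFO recipe above. The phase convention (starting at the center and rising), the 0.25-second step, and the reuse of the time_sec,value point-style header are all assumptions; the file pvx lfo actually writes may differ.

```python
import csv
import io


def triangle(phase):
    """Unit triangle wave in [-1, 1]; phase in cycles, starting at 0 and rising."""
    p = phase % 1.0
    if p < 0.25:
        return 4.0 * p
    if p < 0.75:
        return 2.0 - 4.0 * p
    return 4.0 * p - 4.0


def lfo_points(duration, frequency_hz, center, amplitude, step=0.25):
    """Sample the LFO every `step` seconds from 0 through `duration`."""
    count = int(round(duration / step)) + 1
    return [(i * step, center + amplitude * triangle(frequency_hz * i * step))
            for i in range(count)]


def to_csv(points):
    """Serialize points as point-style CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["time_sec", "value"])
    for t, v in points:
        writer.writerow([f"{t:.3f}", f"{v:.4f}"])
    return buf.getvalue()


lfo = lfo_points(duration=12, frequency_hz=0.4, center=1.0, amplitude=0.2)
csv_text = to_csv(lfo)
print(csv_text.splitlines()[:4])
```

With center 1.0 and amplitude 0.2 the stretch value swings between 0.8 and 1.2, completing one cycle every 2.5 seconds.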

Attribution