Acknowledgments
May 25, 2026 · View on GitHub
StemForge is built on the shoulders of many outstanding open-source projects. We are grateful to every team listed below for making their work freely available.
Demucs — Meta (Facebook AI Research)
Hybrid Transformer source separation powering the Separate tab (htdemucs, htdemucs_ft, mdx_extra, mdx_extra_q).
- Repository: https://github.com/facebookresearch/demucs
- Paper: Rouard, Massa & Défossez — Hybrid Transformers for Music Source Separation (ICASSP 2023)
- License: MIT
BS-Roformer / MelBand-Roformer — Community
High-quality separation models used alongside Demucs in the Separate tab.
bs-roformer Python package — Lucidrains
- Repository: https://github.com/lucidrains/BS-RoFormer
ViperX vocal model (SDR 12.97) — TRvlvr / UVR community
- Model repository: https://github.com/TRvlvr/model_repo
ZFTurbo 4-stem BS-Roformer & Music-Source-Separation-Training — Roman Solovyev (ZFTurbo)
- Repository: https://github.com/ZFTurbo/Music-Source-Separation-Training
- Paper: Solovyev et al. — Benchmarks and leaderboards for sound demixing tasks (2023)
jarredou 6-stem BS-Roformer (guitar + piano)
- Model repository: https://huggingface.co/jarredou/BS-ROFO-SW-Fixed
Basic Pitch — Spotify
Polyphonic audio-to-MIDI transcription for instrument stems in the MIDI tab.
- Repository: https://github.com/spotify/basic-pitch
- Paper: Bittner et al. — A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation (ICASSP 2022)
- License: Apache 2.0
Whisper — OpenAI
Speech recognition model used (via faster-whisper) for vocal pitch-to-MIDI extraction.
- Repository: https://github.com/openai/whisper
- Paper: Radford et al. — Robust Speech Recognition via Large-Scale Weak Supervision (2022)
- License: MIT
faster-whisper — SYSTRAN
CTranslate2-accelerated Whisper inference powering the Vocal MIDI pipeline.
- Repository: https://github.com/SYSTRAN/faster-whisper
- License: MIT
Qwen3-ASR — Alibaba Cloud (Tongyi Lab)
Multilingual automatic speech recognition model used as the GPU-backed engine in the Lyrics transcription feature on the MIDI tab. Trained specifically for speech and singing voice recognition across 52 languages.
- Model: https://huggingface.co/Qwen/Qwen3-ASR-1.7B
- Paper: Shi et al. — Qwen3-ASR Technical Report (arXiv:2601.21337, 2026)
- License: Apache 2.0 (see
licenses/LICENSE-Qwen3-ASR) - Toolkit: https://github.com/QwenLM/Qwen3-ASR
Stable Audio Open — Stability AI
Text-conditioned audio generation model powering the Synth tab.
- Repository: https://huggingface.co/stabilityai/stable-audio-open-1.0
- Paper: Evans et al. — Stable Audio Open (2024)
- License: Stability AI Community License
ACE-Step — ACE Studio / Timedomain
Full song generation from lyrics and style descriptions, powering the Compose tab.
- Repository: https://github.com/AceStudioAI/ACE-Step
- Paper: ACE-Step: A Step Towards Music Generation Foundation Model (2025)
- License: Apache 2.0
Applio / RVC — IAHispano & RVC-Project
Retrieval-based Voice Conversion inference code (vendored) powering the Voice mode in the Compose tab. StemForge vendors Applio's inference-only subtree for audio-in → audio-out voice transformation.
- Applio repository: https://github.com/IAHispano/Applio
- RVC project: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- License: MIT
RMVPE — lj1995
Robust pitch estimation model used as the default F0 extraction method for voice conversion.
- Repository: https://github.com/Dream-High/RMVPE
- Paper: Wei et al. — RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music (2023)
FAISS — Meta (Facebook AI Research)
Similarity search library used for speaker embedding retrieval in the RVC pipeline.
- Repository: https://github.com/facebookresearch/faiss
- License: MIT
ContentVec — auspicious3000
Self-supervised speech representation model used as the speaker embedding extractor in RVC.
- Repository: https://github.com/auspicious3000/contentvec
- Paper: Qian et al. — ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers (ICML 2022)
music21 — MIT / Michael Scott Asato Cuthbert
Music analysis and notation toolkit powering MIDI cleanup, key detection, transposition, and sheet music export in the MIDI tab.
- Repository: https://github.com/cuthbertLab/music21
- Paper: Cuthbert & Ariza — music21: A Toolkit for Computer-Aided Musicology (2010)
- License: BSD 3-Clause
OpenSheetMusicDisplay (OSMD)
Browser-based MusicXML rendering (via VexFlow) for in-app sheet music preview.
- Repository: https://github.com/opensheetmusicdisplay/opensheetmusicdisplay
- License: MIT
LilyPond (optional)
Music engraving program used for PDF sheet music export via subprocess.
- Website: https://lilypond.org
- License: GPL 3.0 (external binary, not bundled)
PyTorch — Meta (Facebook AI Research)
Deep learning framework underlying all inference pipelines.
- Repository: https://github.com/pytorch/pytorch
- License: BSD-3-Clause
Hugging Face Diffusers
Diffusion pipeline framework used to load and run Stable Audio Open.
- Repository: https://github.com/huggingface/diffusers
- License: Apache 2.0
Hugging Face Transformers
Tokenizer and model infrastructure used by the generation pipelines.
- Repository: https://github.com/huggingface/transformers
- License: Apache 2.0
librosa
Audio analysis and feature extraction used in the audio profiler and resampling utilities.
- Repository: https://github.com/librosa/librosa
- Paper: McFee et al. — librosa: Audio and Music Signal Analysis in Python (SciPy 2015)
- License: ISC
FluidSynth
Software synthesizer used for MIDI preview rendering and Mix tab audio.
- Repository: https://github.com/FluidSynth/fluidsynth
- License: LGPL-2.1
wavesurfer.js — katspaugh
Waveform visualization in the browser, used for all audio players and the global transport bar.
- Repository: https://github.com/katspaugh/wavesurfer.js
- License: BSD-3-Clause
FastAPI — Sebastián Ramírez (tiangolo)
Web framework powering the StemForge backend API.
- Repository: https://github.com/fastapi/fastapi
- License: MIT
Uvicorn — Encode
ASGI server running the FastAPI application.
- Repository: https://github.com/encode/uvicorn
- License: BSD-3-Clause
uv — Astral
Blazing-fast Python package manager and resolver used for deterministic environments.
- Repository: https://github.com/astral-sh/uv
- License: MIT / Apache 2.0
Additional dependencies
StemForge also relies on many other excellent open-source libraries including NumPy, SciPy, soundfile, mido, pretty_midi, einops, safetensors, accelerate, pydub, soxr, ai-edge-litert (TFLite runtime), torchcrepe, torchfcpe, noisereduce and stftpitchshift. Thank you to all their maintainers and contributors.
If you believe your project should be listed here and is not, please open an issue and we will add it.