Audiogen Codec (agc)

February 20, 2024 ยท View on GitHub

We are announcing the open source release of Audiogen Codec (agc) ๐ŸŽ‰. A low compression 48khz stereo neural audio codec for general audio, optimizing for audio fidelity ๐ŸŽต.

It comes in two flavors:

  • agc-continuous ๐Ÿ”„ KL regularized, 32 channels, 100hz.
  • agc-discrete ๐Ÿ”ข 24 stages of residual vector quantization, 50hz.

AGC (Audiogen Codec) is a convolutional autoencoder based on the DAC architecture, which holds SOTA ๐Ÿ†. We found that training with EMA and adding a perceptual loss term with CLAP features improved performance. These codecs, being low compression, outperform Meta's EnCodec and DAC on general audio as validated from internal blind ELO games ๐ŸŽฒ.

We trained (relatively) very low compression codecs in the pursuit of solving a core issue regarding general music and audio generation, low acoustic quality and audible artifacts, which hinder industry use for these models ๐Ÿšซ๐ŸŽถ. Our hope is to encourage researchers to build hierarchical generative audio models that can efficiently use high sequence length representations without sacrificing semantic abilities ๐Ÿง .

This codec will power Audiogen's upcoming models. Stay tuned! ๐Ÿš€

ELO Image

Installation

pip install audiogen-agc

Usage

from agc import AGC

agc = AGC.from_pretrained("Audiogen/agc-continuous") # or "agc-discrete"

audio = torch.randn(1, 2, 480000) # 48khz stereo

z = agc.encode(audio) # (1, 32, 6000) or (1, 24, 3000)

reconstructed_audio = agc.decode(z) # (1, 2, 480000)

Misc

Example colab: https://colab.research.google.com/drive/1MXeBYMY-dZ3Yas-5rXzggMONIlDDQ5VG#scrollTo=9mtfSc-r4dkn (credit: Christoph from LAION)

Examples

https://audiogen.notion.site/Audiogen-Codec-Examples-546fe64596f54e20be61deae1c674f20