Meta open-sources EnCodec, a new audio compression technology

Meta announced EnCodec, a new open-source audio compression technology, in a blog post, claiming that its compressed files are about 10 times smaller than MP3 at 64 kbps. Meta's Fundamental Artificial Intelligence Research (FAIR) team built a three-part system for AI-driven audio hypercompression and trained it end to end to compress audio data to a target size. The compressed data can then be decoded using a neural network.

According to Meta, EnCodec achieves a compression rate roughly 10 times that of MP3 at 64 kbps, without a loss of quality.
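To put the claimed ratio in concrete terms, the arithmetic below is a rough sketch that converts it into a target bitrate and file size. Only the 64 kbps baseline and the approximate factor of 10 come from the article. The 3-minute duration and the constant-bitrate assumption are illustrative choices, not figures from Meta.

```python
def file_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Size of a constant-bitrate stream: kbit/s -> bit/s -> byte/s, times seconds."""
    return bitrate_kbps * 1000 / 8 * duration_s


MP3_KBPS = 64.0      # baseline bitrate quoted in the article
RATIO = 10.0         # approximate ratio claimed by Meta
DURATION_S = 180.0   # hypothetical 3-minute track (assumption)

target_kbps = MP3_KBPS / RATIO
mp3_size = file_size_bytes(MP3_KBPS, DURATION_S)
target_size = file_size_bytes(target_kbps, DURATION_S)

print(f"target bitrate: {target_kbps:.1f} kbps")
print(f"MP3 at 64 kbps: {mp3_size / 1000:.0f} kB")
print(f"10x smaller:    {target_size / 1000:.0f} kB")
```

Under these assumptions, a track that takes about 1.44 MB as a 64 kbps MP3 would take about 144 kB at the implied 6.4 kbps.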

The three parts of EnCodec include:

  • Encoder: Takes uncompressed audio data and converts it to a higher-dimensional representation with a lower frame rate.

  • Quantizer: Compresses this representation to the target size. The quantizer is trained to produce the desired size (or set of sizes) while preserving the information that matters most for reconstructing the original signal. The compressed representation is what gets stored on disk or sent over a network, and is the equivalent of an .mp3 file on a computer.

  • Decoder: Converts the compressed signal back to a waveform that is as similar as possible to the original.

Because perfect reconstruction is impossible at low bit rates, the key to lossy compression is to identify changes that are imperceptible to humans. To this end, EnCodec uses a discriminator to improve the perceptual quality of the generated samples. This sets up a cat-and-mouse game: the discriminator's job is to distinguish real samples from reconstructed ones, while the compression model tries to fool the discriminator by making its reconstructions perceptually closer to the originals.
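The division of labor among the three parts can be illustrated with a deliberately tiny sketch. This is not EnCodec's implementation: the real encoder and decoder are trained neural networks, and the quantizer is trained as well. Here the "encoder" merely groups samples into frames, the codebook is written by hand, and all function names and values are hypothetical.

```python
import math
from typing import List, Sequence


def encode(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Toy encoder: group samples into frames (fewer steps, more dimensions per step)."""
    usable = len(samples) - len(samples) % frame_len  # drop a trailing partial frame
    return [list(samples[i:i + frame_len]) for i in range(0, usable, frame_len)]


def quantize(frames: List[List[float]], codebook: List[List[float]]) -> List[int]:
    """Toy quantizer: replace each frame by the index of its nearest codebook entry."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]


def decode(codes: List[int], codebook: List[List[float]]) -> List[float]:
    """Toy decoder: look up each index and concatenate the entries into a waveform."""
    out: List[float] = []
    for code in codes:
        out.extend(codebook[code])
    return out


# Hand-written codebook (assumption); a real system would learn these entries.
CODEBOOK = [[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]]

waveform = [0.4, 0.6, -0.3, 0.7, -0.6, -0.4, 0.5, -0.5]
frames = encode(waveform, frame_len=2)
codes = quantize(frames, CODEBOOK)         # this list is what would be stored or sent
reconstructed = decode(codes, CODEBOOK)    # lossy approximation of the waveform
bits_per_frame = math.log2(len(CODEBOOK))  # 4 entries -> 2 bits per frame

print(codes)
print(reconstructed)
```

Only the list of indices needs to be stored or transmitted, so the codebook size and the frame rate together set the bitrate. The reconstruction is close to the input but not identical, which is the sense in which the compression is lossy.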

Meta said that the technology does not yet cover video, but that extending it to video is planned. The goal is to improve experiences such as video conferencing, streaming movies, and playing games with friends in VR.

