
Open-source CVPR 2025 AI model from Sony AI and UIUC that generates frame-synchronized audio from video and text inputs.
Category
Audio Editing
MMAudio is fully free and open-source. Code and model weights are available on GitHub at hkchengrex/MMAudio. Online demos that require no installation are available via Hugging Face and Replicate. No licensing fee is charged, but users should review the repository license and the training dataset terms before any commercial deployment.
| Plan | Details |
|---|---|
| Free | Fully free and open-source. Available on GitHub, Hugging Face, and Replicate. Local installation requires a GPU with 8GB+ VRAM, Python, PyTorch 2.5.1+, and a Linux environment. |
Quick Summary
MMAudio is an open-source AI model, accepted at CVPR 2025, developed by researchers at the University of Illinois Urbana-Champaign and Sony AI. It generates synchronized audio tracks from video input and optional text prompts. Its core architectural contribution is multimodal joint training, in which the model is trained simultaneously on video-audio and text-audio datasets, combined with a conditional synchronization module that aligns the generated audio with video frames at sub-frame precision. It is designed for researchers, video creators, game developers, and technical users who need high-quality AI-generated audio that follows the visual content and timing of a video clip without manual sound design.
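This listing does not describe MMAudio's internals beyond the summary above. As a rough illustration of what frame-level alignment means, the hypothetical sketch below pairs every audio frame with the video frame that is on screen at the same instant, so that visual features and audio frames share one timeline. The function name `align_to_audio_frames` and all frame rates are assumptions chosen for illustration. This is not MMAudio's code or its actual frame rates, and the real conditional synchronization module is a learned network component rather than a simple lookup.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def align_to_audio_frames(
    video_feats: Sequence[T],
    video_fps: float,
    audio_frame_rate: float,
    duration_s: float,
) -> List[T]:
    """Hypothetical helper: pair every audio frame with the video frame on
    screen at the same instant (sample-and-hold upsampling).

    Illustrative only; this is not MMAudio's implementation, and the rates
    passed in are assumptions chosen by the caller.
    """
    if not video_feats:
        raise ValueError("need at least one video frame")
    n_audio = int(round(duration_s * audio_frame_rate))  # audio frames in the clip
    aligned = []
    for i in range(n_audio):
        # Audio frame i starts at i / audio_frame_rate seconds; multiplying by
        # video_fps gives the index of the video frame shown at that moment.
        j = int(i * video_fps / audio_frame_rate)
        j = min(j, len(video_feats) - 1)  # clamp past the last video frame
        aligned.append(video_feats[j])
    return aligned


# Toy example: a 2-second clip with 2 video frames/s, upsampled to 4 audio frames/s.
print(align_to_audio_frames(["f0", "f1", "f2", "f3"], 2, 4, 2))
# ['f0', 'f0', 'f1', 'f1', 'f2', 'f2', 'f3', 'f3']
```

Because the audio timeline here is denser than the video timeline, each video frame's features are repeated across the audio frames it covers. A learned synchronization module can then refine timing within that shared grid, which a fixed lookup like this cannot do.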
Associated Tags
AI video to audio, audio synchronization AI, open-source audio AI, generative audio model, sound generation AI, CVPR 2025 paper, Sony AI research
How professionals leverage MMAudio – Open-Source AI Video-to-Audio Synthesis with Frame-Level Synchronization

Reviewed by Sohail Akhtar
Lead Editor & Founder