by peab on 2/9/25, 8:34 PM with 55 comments
by ipsum2 on 2/9/25, 10:02 PM
> Vocal Synthesis: This allows one to generate new audio that sounds like someone singing. One can write lyrics, as well as melody, and have the AI generate audio that matches them. You could even specify what you want the voice to sound like. Google has also presented models capable of vocal synthesis, such as SingSong.
Google's SingSong paper does the exact opposite: given human vocals, it produces a musical accompaniment.
by chaosprint on 2/9/25, 10:48 PM
In 2019, I built this thing called RaveForce [github.com/chaosprint/RaveForce]. It was a fun project.
Back then, GANSynth was a big deal and looked amazing. But the sound quality… felt a bit lossy, you know? And MIDI generation, well, didn't really feel like "music generation" to me.
Now, I'm thinking about these things differently. Maybe the sound quality thing is like MP3 at first, then it becomes "good enough" – like a "retina moment" for audio? Diffusion models seem to be pushing this idea too. And MIDI, if used the right way, could be a really powerful tool.
Vocal synthesis and conversion are super cool. Feels like plugins, but next level. Really useful.
But what I really want to see is AI understanding music from the ground up. Like a robot learning how synth parameters work, so we could get 8-bit music the way DRL had its breakthroughs. Not just training on tons of copyrighted music, making variations, and selling them, which is very cheap.
by pier25 on 2/9/25, 9:02 PM
IMO this would be much more useful.
by TheAceOfHearts on 2/9/25, 9:04 PM
[0] https://suno.com/song/0caf26e0-073e-4480-91c4-71ae79ec0497
by vunderba on 2/9/25, 9:25 PM
> Stem Splitting: This allows one to take an existing song, and split the audio into distinct tracks, such as vocals, guitar, drums and bass. Demucs by Meta is an AI model for stem splitting.
+1 for Demucs (free and open source).
Our band went back and used Demucs-GUI on a bunch of our really old pre-DAW stuff - all we had was the final WAVs and it did a really good job splitting out drums, piano, bass, vocals, etc. with the htdemucs_6s model. There was some slight bleed between some of the stems but other than that it was seamless.
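For anyone who wants to try the same workflow without the GUI, Demucs also ships a command-line interface. This is a minimal sketch assuming `demucs` is installed via pip; the file name is a placeholder, but `-n htdemucs_6s` is the same 6-stem model mentioned above:

```shell
# Install Demucs (Meta's stem-splitting model), e.g.:
#   pip install demucs

# Split a final mix into 6 stems with the htdemucs_6s model:
# drums, bass, other, vocals, guitar, piano.
demucs -n htdemucs_6s old_mix.wav

# By default the stems are written as WAVs under
# ./separated/htdemucs_6s/old_mix/
```

The first run downloads the model weights, and separation is much faster on a GPU than on CPU.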
by xvector on 2/9/25, 9:25 PM
If this happens, main character syndrome may get a bit worse :)
by echelon on 2/9/25, 8:52 PM
AI models are tools, and engineers and artists should use them to do more per unit time.
Text-prompted final results are lame and boring, but complex workflows orchestrated by domain practitioners are incredible.
We're entering an era where small teams will have big reach. Small studio movies will rival Pixar, electronic musicians will be able to conquer any genre, and indie game studios will take on AAA game releases.
The problem will be discovery. There will be a long tail of content that caters to diverse audiences, but not everyone will make it.