from Hacker News

Audapolis: Edit audio files by transcript, not waveform

by mavsman on 7/22/24, 4:25 PM with 83 comments

by vunderba on 7/22/24, 6:13 PM
I remember when Adobe demoed this idea of editing waveforms via the recognized text back in 2016, and it was pretty mind-blowing for the time.
https://youtu.be/I3l4XLZ59iw
EDIT: I could also definitely see Audapolis being useful if you could integrate it into a podcast's post-processing flow (volume normalization, de-essing) by recognizing certain verbal tics, such as "ummmm...", and automatically removing them from the audio.
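A minimal sketch of that filler-removal idea, assuming the speech recognizer returns per-word start and end timestamps. The `Word` type, the `FILLER` pattern, and `keep_segments` are hypothetical names for illustration, not part of Audapolis or any other tool mentioned here. It only computes which spans of audio to keep; the actual cutting would be left to an audio library.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


# Hypothetical filler pattern: "um", "ummmm", "uh", "er", "erm", ...
FILLER = re.compile(r"^(u+m+|u+h+|e+r+m*)$")


def keep_segments(words, total_duration):
    """Return (start, end) spans of audio to keep after dropping filler words."""
    segments = []
    cursor = 0.0  # start of the span currently being kept
    for w in words:
        # Normalize: lowercase and strip punctuation such as "..."
        token = re.sub(r"[^a-z]", "", w.text.lower())
        if FILLER.match(token):
            if w.start > cursor:
                segments.append((cursor, w.start))
            cursor = max(cursor, w.end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments


words = [
    Word("So", 0.0, 0.3),
    Word("ummmm...", 0.3, 1.1),
    Word("hello", 1.1, 1.5),
]
print(keep_segments(words, total_duration=1.5))
```

These are hard cuts at word boundaries; a real post-processing flow would probably want short crossfades at each join so the edits are not audible.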
by bluelightning2k on 7/22/24, 7:40 PM
A genuinely free alternative to Descript sounds very useful.
I've always liked the idea of Descript and was considering building something similar before it came out. The problem is that my use case is a couple of videos a year, so it doesn't fit with an expensive monthly subscription.
by hammeiam on 7/22/24, 6:00 PM
I've spent some of my free time over the past couple of months working on something similar. It's in a decent state, but I need help from somebody who understands the .fcpxml format so you can export your edits to DaVinci and FCP.
Take a look at https://matcha.video
by petarb on 7/22/24, 5:11 PM
This is awesome to see as an open source project.
This functionality is one of my favorite things about editing videos in Descript. It’s so much easier than chopping up waveforms in Audacity.
by corn13read2 on 7/23/24, 6:32 AM
This is pretty dated and doesn't support Whisper, which is currently the de facto speech recognition model.
by raymond_goo on 7/22/24, 9:21 PM
Demo Video: https://pajowu.de/audapolis_intro.mp4
by Machado117 on 7/23/24, 9:56 AM
The other day I was using the Voice Memos app on iOS 18 and was surprised to find that it also supports editing the recording by transcript.
by alsetmusic on 7/22/24, 5:46 PM
One of the hosts of a podcast that I listen to has had positive things to say about Descript.[0] Just mentioning it because he's been talking about it for a few years, so I expect it's had a good amount of feature development over time.
[0] descript.com/
by pryelluw on 7/22/24, 6:44 PM
If the maintainer is reading, having a demo video would be nice.
by leetrout on 7/22/24, 6:27 PM
Hindenburg also added this capability.
> Hindenburg’s manuscript feature gives you a complete overview of your audio. You can select the text just as you would in a text document and watch as your edits are made in real-time. If you need to export your text in a specific format, no problem. Hindenburg supports the most common text and transcription export formats.
https://hindenburg.com/
by emadda on 7/22/24, 6:05 PM
Nice, are there plans to notarize the Mac app?
I built something similar here: https://bigwav.app
by geekodour on 7/22/24, 5:49 PM
This looks great! Will try it out. I built a similar but very scrappy tool[0] for the same use case last year. I probably wouldn't have built it if I had found this.
[0] https://github.com/geekodour/wscribe-editor
by jdprgm on 7/22/24, 8:51 PM
This really needs a video demo or at least a more in-depth text description of the features. I will download it later to try, but I'm curious: does this just do simple hard cuts on the audio based on the text, or is there any AI magic for blending sentence timing, if that makes sense?
A number of comments turned me onto Descript. I made a similar comment on another audio thread recently: it drives me absolutely insane how all audio tools with any AI are web-based monthly SaaS instead of an offline, private, GPU-based upfront purchase.
by generalizations on 7/23/24, 1:35 AM
Combine this with the tech to generate new audio matching the speaker's voice profile, and you've really got something cool.
by jiehong on 7/22/24, 5:13 PM
That’s awesome!
Is 1 emoji for each commit title a new trend?
by j45 on 7/23/24, 2:18 PM
This is exciting to see, but it seems the last release was a year ago.
Can anyone clarify if this project is active?
by StarterPro on 7/23/24, 3:43 PM
Call me a jerk, but anyone who is editing audio seriously probably wants the waveform, no?
by frakkingcylons on 7/22/24, 6:55 PM
Somewhat off-topic: I saw the funding note at the bottom. It’s pretty cool that the German government is giving some funding to projects like this. I wonder how much the US is doing in that regard, like if there’s a list of projects that tax dollars go towards.
by iainctduncan on 7/22/24, 5:12 PM
IMHO you should really change the headline on this. I'm an audio person, and my first thought was "that's stupid, words are awful at describing sound". But then I looked, and editing transcriptions of voice recordings by word is actually a great idea. That was not the impression the headline gave me, FWIW!
by MForster on 7/22/24, 6:53 PM
And here I was expecting that I could edit the text and the app would change the audio file to say what I had typed...