from Hacker News

Launch HN: Wondercraft (YC S22) – Use text-to-speech to create podcasts easily

by diminikolaou on 8/11/23, 12:46 PM with 117 comments

Hi HN! We’re Dimitris and Youssef, founders of Wondercraft (https://www.wondercraft.ai/), a platform that leverages AI voices to make podcast creation simple. This video shows how it works: https://www.loom.com/share/fa8ac8eba8b9440dbe0321ccb8ba9426?....

“Hacker News Recap” (https://www.wondercraft.ai/podcasts/hacker-news-recap), a podcast produced using our platform, has been running for 4 months and currently gets close to 23k listens per month. We’ve made its analytics publicly available: https://op3.dev/show/f77aea62-97e5-5cce-92c6-9464e51c30c6.

Having previously attempted to start a podcast, we were well aware of the difficulties. Figuring out what equipment and software you need to buy is a daunting start. Editing is a lengthy and tedious process, technical difficulties often occur during recording, and planning logistics around recording is a hassle. As a result, content release is infrequent, which leads to lackluster growth.

At the same time, podcast consumption is experiencing exponential growth. There are 500M podcast listeners around the world, double the number of 5 years ago. Apart from the growth in listeners, podcasts are the medium that is most likely to influence behavior, which is why the number of businesses with podcasts has grown 5x over the past 5 years. Finally, the last piece that led to the creation of Wondercraft is that text-to-speech models saw a big improvement about 6 months ago, with ElevenLabs releasing models whose output is almost indistinguishable from human speech (see HN thread here: https://news.ycombinator.com/item?id=34361651).

Wondercraft integrates realistic text-to-speech with an infrastructure that simplifies podcast creation. For example, you can integrate music, publish your podcast / create an RSS feed, generate a video for your episode, get assistance with script generation, auto-generate show notes and a transcript, and translate your podcast, all in one place. All text-based tasks (e.g. script assistance, show note generation, etc.) are completed using a chain of custom prompts to LLMs. All text-to-speech is done through custom voices that are either synthetically generated or professionally cloned from voice actors, using the ElevenLabs platform. Tasks such as episode translation involve the use of both LLMs and ElevenLabs. Video generation runs on Remotion, and the RSS feed is an XML creation and updating routine.
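
To make the "XML creation and updating routine" idea concrete, here is a minimal sketch of how a podcast RSS feed could be built and extended with Python's standard library. This is not Wondercraft's actual code: the function names (`create_feed`, `add_episode`, `render`), the example URLs, and the choice of `audio/mpeg` as the enclosure type are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def create_feed(title, link, description):
    """Create a bare RSS 2.0 document with a single <channel> (hypothetical helper)."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    return rss


def add_episode(rss, title, audio_url, size_bytes, published):
    """Append one <item> with an audio <enclosure> to the existing channel."""
    channel = rss.find("channel")
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    # Assumption: the audio URL is stable enough to double as the episode GUID.
    ET.SubElement(item, "guid").text = audio_url
    # RSS expects RFC 822 style dates; format_datetime produces that format.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    ET.SubElement(
        item,
        "enclosure",
        {"url": audio_url, "length": str(size_bytes), "type": "audio/mpeg"},
    )
    return item


def render(rss):
    """Serialize the feed to a string that could be written to a file or served."""
    return ET.tostring(rss, encoding="unicode")


feed = create_feed("Example Show", "https://example.com", "A demo feed")
add_episode(
    feed,
    "Episode 1",
    "https://example.com/ep1.mp3",
    12345,
    datetime(2023, 8, 11, 12, 46, tzinfo=timezone.utc),
)
print(render(feed))
```

Updating the feed then amounts to calling `add_episode` again and re-serializing. A real feed would likely also need the podcast-specific tags that directories such as Apple Podcasts expect, which this sketch leaves out.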

Since launching, we’ve had more than 13k users sign up to create their podcast. Use cases that we’re seeing include: businesses repurposing their blogs and generating video content for their socials; writers/bloggers/newsletters reaching their audience through another medium; news outlets and publications adding a news rundown podcast to their lineup; businesses creating internal educational/cultural material; and podcast studios using Wondercraft to serve client needs faster.

Wondercraft is not a tool for fully AI generated content. Rather, we save people time by transferring content they’ve created (e.g. an article they’ve written) to another medium. This technology is best suited for news rundowns and narrational format podcasts (often used by businesses talking about a niche topic). And while interview and conversational formats will sound better person-to-person, the logistical and (often) sound quality issues remain, so we’re testing out an “Async Podcasts” feature, where an interviewee can respond to questions async in writing, share a photo and (optionally) a clip of their voice, and a podcast will be created out of it.

We’d love to hear any thoughts, comments or experiences you may have had in relation to leveraging text to speech for podcast creation. Thank you for taking the time to read!

  • by Kwpolska on 8/11/23, 5:58 PM

    People like podcasts because they are interesting stories told by humans. Good podcasts have a lot of creativity behind them. Your HN Recap podcast uses a bland voice that sometimes struggles with tech terms, and the auto-generated summaries often feature deep details and miss the intention of the story. Auto-generated content on YouTube is usually misleading spam. How will you prevent your auto-generated podcasts from flooding podcast aggregators with such content?
  • by zurfer on 8/11/23, 1:52 PM

    I love it. I am also a regular listener to "PG Essays"[1]. I would never have read as many of his essays as I'm now listening to.

    [1] https://podcasts.google.com/feed/aHR0cHM6Ly9hcGkyLndvbmRlcmN...

  • by vishnuharidas on 8/11/23, 2:02 PM

    I am a regular listener of the podcast "Hacker News Recap" (linked in the post description) and I always doubt if that's a real human reading the script. It is not a simple text-to-speech thing. Instead, it feels like a real human talking, with real emotion in it. I am already in love with Anna, the voice behind the HN Recap podcast!

    Also, I played around with their podcast generation tool, where it neatly built a podcast from my blog posts. This is a good example of what Generative AI can do in the media domain. Congrats on the production launch! Keep it up!

  • by jedberg on 8/11/23, 3:51 PM

    This is a great product: I've listened to the HN podcast and it's great.

    > podcast consumption is experiencing exponential growth

    I find this so interesting! I know my personal podcast consumption has fallen off a cliff since the pandemic started. I pretty much only listen to podcasts when I commute, and I stopped commuting then. I assumed that everyone did that but I guess I was wrong.

  • by aloknnikhil on 8/11/23, 10:59 PM

    I personally wouldn’t use this. I don’t know if your point about information being “locked” in written form is even being addressed here. There are so many audio books out there but I personally only really enjoy audio books delivered by the author themselves or someone who can actually capture the nuances in the text. So I think you’ll just end up moving this information from being locked in prose to being locked in sound, unless you can accurately capture the tone, nuances and the context around the whole text.
  • by dutchbrit on 8/12/23, 11:08 AM

    I was listening to an audiobook the other day on a commute that was also done with AI. The main issue I had was focus: the voice was very monotone to begin with, and at one point it pronounced “it” as “IT”. I didn’t finish listening… That said, the voice isn’t that bad in the HN example.
  • by mcpackieh on 8/11/23, 6:22 PM

    How will you prevent your service from being used to flood the world with worthless algorithmically generated slop?
  • by RankingMember on 8/11/23, 3:36 PM

    This is both brilliant and scary. I anticipate that the amount of web-scraped stuff about to land on Spotify's podcasts tab is going to be insane.
  • by porkbeer on 8/11/23, 6:53 PM

    Great, more robot voices. No thanks. The point of a podcast is the human part. I can just have gpt blather to me thru tts if i wanted fake podcasts. I regret saying this, but your tech will actively make the world worse.
  • by benzible on 8/11/23, 3:31 PM

    Different but related idea, this creates a personal podcast feed: https://reca.st
  • by monological on 8/11/23, 5:30 PM

    Podshorty does something kind of similar, but it takes any YouTube link, summarizes it and generates a podcast using the voices of the original speakers. Also creates transcripts so you can follow along. https://www.podshorty.com
  • by another-dave on 8/11/23, 6:28 PM

    Seems cool! One thing I noticed listening to one of the PG essays was that it changed voices for one of the pull quotes, which was a nice touch!

    Might be cool to have a feature that read out the source too, like someone would if a human was reading a quote from a book. Hard to control for everyone's different annotation style though I'd imagine.

  • by Imply8215 on 8/11/23, 1:29 PM

    Translating a podcast into so many languages with two clicks increases our reach so much. Great stuff Wondercraft, keep it up
  • by cca778 on 8/11/23, 7:18 PM

    Recently I have produced some short video lectures to distribute to research partners. I can write reasonably well in English, but my speaking is terrible. I managed to prepare fine-tuned English subtitles.

    A text-to-speech tool can help create English audio tracks for those producing original content in other languages.

  • by ilovetts on 8/12/23, 12:55 AM

    Hello, the TTS voice is fantastic. Any plan to make it available to developers on iOS, Android, Windows etc? The bundled TTS voices aren’t great on these platforms.
  • by kyriakosel on 8/12/23, 1:13 PM

    There are too many books that I would want to listen to but that don't have audiobooks. I'd definitely give it a try with that in mind.
  • by rw2 on 8/11/23, 4:20 PM

    Great product, first of all. I can really see a use for it. Are you afraid that this is too easy to clone?

    Someone with speechify: https://speechify.com/

    And who is willing to write code against the Spotify API can do this.

  • by colesantiago on 8/11/23, 3:19 PM

    This looks great and exciting, congrats on the launch.

    I am so happy that this exists. I was considering creating a podcast, but it involved too much effort: I had to do and redo takes, and I had other priorities.

    Will be considering using Wondercraft, and others if they exist, entirely for this now.

  • by GordonS on 8/11/23, 2:21 PM

    Not sure if I'm just not seeing it, but I can't find any information on pricing, or whether there's a free tier?

    (there's a "start for free" button, but that could mean anything, and it wants me to create an account)

  • by sakopov on 8/12/23, 6:12 AM

    Congrats on the launch! Looks like ElevenLabs is your direct competitor. How do you plan to differentiate? So far their pricing is a little better and they also provide the ability to create a custom voice model.
  • by causi on 8/11/23, 6:49 PM

    I wish there was more in this space geared toward audiobooks. There are so many brilliant novels in my collection that never got an audiobook release and it'd be amazing to be able to generate my own.
  • by swyx on 8/12/23, 3:44 PM

    congrats on launching! i've been a vocal fan for a little bit: https://twitter.com/swyx/status/1661848597728575489

    however when i tried signing up for your pod to make my own, i was disappointed that it would only take manually entered content. i want to hook it up to my twitter or rss or discord feed, and have you Do The Thing. please!

  • by hexage1814 on 8/12/23, 1:39 AM

    It's amazing how good your service is. Not only the text to speech, but like even the emphasis it knows how to give on each word. Fantastic times, fantastic times indeed.
  • by snissn on 8/12/23, 6:20 PM

    Your latest hn podcast post has a low level noise that makes it unlistenable to me. You should be able to use non ai tools to remove the background hiss/hum
  • by ksajadi on 8/11/23, 11:42 PM

    I look at it from the point of view of a podcast consumer, not a producer. If the content is good and the voice quality is natural then this can only help unlock more good content, by lowering the barrier to entry. Can't see why that is a bad thing. Sure, there will always be the equivalent of content farms, but remember that content farms exist because of Google. Without Google traffic the incentive to create useless content diminishes. Podcasts are not like that. You might be tricked into listening to one episode of a podcast with AI-generated content (not just AI-voiced), but in all likelihood you won't subscribe, removing the incentive.
  • by jermaustin1 on 8/11/23, 3:21 PM

    I might have missed it on the site, but is there any plan for multiple voices on a single podcast? And any type of annotations to add emotion to the voice (scared, excited, angry)?
  • by magdyks on 8/11/23, 12:50 PM

    I’ve been listening and really loving the hacker news recap. Keep up the good work and please let me listen in different languages!
  • by oo0shiny on 8/11/23, 4:00 PM

    This looks like a really cool idea. A question though: who holds the rights to the created audio? The user, or Wondercraft?
  • by languagehacker on 8/11/23, 5:25 PM

    I was half expecting this to be Wondery spinning off whatever they do to make all their narrators sound like robots.
  • by hdivider on 8/11/23, 3:25 PM

    This is a welcome tool indeed. Question: how would you describe how you're different to Descript?
  • by rgrieselhuber on 8/11/23, 2:41 PM

    This is pretty awesome. Are any languages other than English supported?
  • by funthree on 8/11/23, 2:12 PM

    1hr and 5hr is not long enough
  • by 0921kiyo on 8/11/23, 1:51 PM

    Congrats on the launch! The output quality is very good!