from Hacker News

Create AI videos by simply typing in text

by vladoh on 5/29/21, 9:02 PM with 110 comments

  • by LordDragonfang on 5/30/21, 12:11 AM

    Synthesia is also the name of a much more established, extremely popular midi/piano visualisation software[1]. If you've ever looked up "<song> piano tutorial" on youtube, you've probably seen that program.

    It's a shame they chose that name, since it was such a great play on words for the midi software (synesthesia is sound into colorful visuals, and midi uses synths) whereas this product has basically no relation.

    [1] https://synthesiagame.com/

  • by stevenicr on 5/29/21, 10:44 PM

    Avoid getting your video rejected. Please make sure you adhere to our content guidelines. Please keep your script professional and business related. Political, sexual, personal, criminal and discriminatory content will not be tolerated or approved.

    Ahh.. the anchor fm problem.. guess I'll need an open source version.

    I started toying with libreBot I think it's called - which allows you to do anything you want with these things if you self-host license for a grand I think it was.

    This synthesia didn't even get the first sentence I tried. It also requires a 'business email' and agreeing to terms that include "I agree to receive occasional product information as per Synthesia Privacy Policy *"

    trying hard to keep the genie in the bottle aren't they.

  • by shannifin on 5/30/21, 1:04 AM

    While the tech is impressive in itself, it still doesn't look to be something I'd pay for. The lip sync is annoyingly off, and the bland expressions that come from not understanding context make the communication even worse. If having a visual talking head is that important for a project, it still seems better to just hire someone.

    (On a side note, I'm not sure I understand the appeal of emotionally bland fake-smile talking heads in general, even when they're real.)

  • by question000 on 5/30/21, 1:42 AM

    Can you think of one good use for this product?

    No, I'm not asking if you think you can use this to make money; I'm asking whether you personally want to sit through a video of a robot telling you to do things. Are we supposed to believe this is preferable to simply reading this or hearing recorded audio? This is flat-out consumer hostility, basically telling your customers to talk to a sock puppet instead of a real person. I hope this fails; I would pay money to make this illegal.

  • by xiphias2 on 5/29/21, 10:52 PM

    Here's the cookie text if you are too lazy to read it...it sounds a bit creepy: https://share.synthesia.io/a4159eee-f70b-4318-a8bc-ec0fdf6af...
  • by erichurkman on 5/29/21, 10:59 PM

    Are sales spam emails going to start including personalized videos? I guess I'll look forward to the "Hello dollar sign firstname. I'm dollar sign agentname. My colleague recommended I connect with you, as you both work at dollar sign employer" template misfires.
  • by istorical on 5/29/21, 10:30 PM

    So where's the version that allows NSFW content? Can't be the only one who wanted to test this with erotica.
  • by firefoxd on 5/29/21, 10:16 PM

    Impressive. Funny enough I've started to see those faces appear on YouTube. The intention may be to create these corporate style videos, but I'm counting down the minutes until my aunt starts forwarding questionable things on WhatsApp.
  • by anonytrary on 5/29/21, 11:18 PM

    https://share.synthesia.io/d8860a05-2870-4315-9316-b03cbc76a...

    Animations are pretty good. Pronunciation could use some work. There also does not seem to be a way to influence the inflection, which is an absolutely crucial component for sales pitches. It's not so much what you say, but how you say it. Also, the right people have to sell the right things. Words coming from Elon's mouth in regards to cryptocurrency have a far greater effect on market behavior than the exact same words coming from this AI person's mouth.

  • by K0balt on 5/30/21, 12:27 PM

    Uncanny valley meets mixed messages and bad delivery.

    The incoherent facial expressions actually manage to confuse the message more than the dissociated pronunciation.... "witch is know small feet".

    This tech is a neat trick at this stage but is less useful than just leaving the text as text, in fact adding negative value to an already fully functional process.

    Fiverr is a better option, and I would not recommend that.

    For an interesting and highly unethical experiment, someone should raise a thousand infants with this drivel and see what happens...I’m going to posit that the result is not good. Children’s narration is exactly where this is headed though; I can see this as a multimillion-view, no-effort YouTube babysitter.

    Children find a pleasant, smiling female face soothing...so this is going to be another way that the dollar and human laziness will use AI to make the world a slightly worse place.

  • by going_to_800 on 5/30/21, 9:19 AM

    What awful comments here, you're all criticizing something really exciting. Of course AI can't beat real humans, what do you expect? But it's the closest we've ever been, especially since it is available to consumers. People in sales and marketing know how valuable this is for improving conversion rates... if you're not in those fields, it's not for you. Saying something is useless just because you have no knowledge of other domains is highly ignorant.
  • by nemothekid on 5/30/21, 6:17 AM

    Wow this feels like a blast from the past. There used to be a service that did exactly this (little help chats with "AI" generated voices), in the mid 2000s but instead of having human avatars they were animated. Seeing the woman speak immediately unlocked a memory in my kid brain.
  • by Swizec on 5/29/21, 10:31 PM

    Fantastic technology and I love that the videos look and sound super lifelike. The face looks like most instagram influencers with vanilla broad-appeal pretty faces, which I guess is the style these days.

    But what’s the point?

    If you’re gonna send someone a soulless corporate drone video, is that really better than a soulless corporate email? I thought the goal of doing video was that it’s more personable and human ... an AI video doesn’t quite hit those goals does it?

  • by geuis on 5/30/21, 3:49 AM

    Here’s a sample video with a custom script produced earlier https://share.synthesia.io/4b75b584-9b3b-4a96-86c2-6b34b8711...
  • by cs702 on 5/29/21, 10:30 PM

    Pretty good.... but not quite there yet, in my humble opinion.

    The lips, eyes, and facial features move in natural ways, but the head remains frozen in a somewhat unnatural manner. It's just inside the uncanny valley, with barely perceptible creepiness.

    I would hope to see improvements to make face/neck movements look more natural, to overcome these issues over time!

  • by 2bitencryption on 5/29/21, 10:27 PM

    There's something quite cyberpunk about smiling AI-generated corporate headshot faces extolling the wonders of <insert product here>. And I don't mean that in a good or bad way. I imagine we'll start seeing these all over the place quite soon.

    I mean, combine it with GPT-3 and you've got something that's nearly science fiction. Really interested to see where this goes.

  • by Cyril_HN on 5/29/21, 10:37 PM

    The eyes aren't quite right and sometimes the voice is a little off, but I probably wouldn't notice in a real world setting without prior knowledge.
  • by artur_makly on 5/30/21, 1:00 AM

    I want to see her on my wall, every day, bald, with green eyes. Spouting Shakespearean slurs at Alexa, then following up with some Rumi poetry, and a dash of Alan Watts..all powered by a Markov chain.
  • by andersco on 5/30/21, 1:11 AM

    Very close but not quite human. A textbook example of the uncanny valley https://en.m.wikipedia.org/wiki/Uncanny_valley
  • by hyperpallium2 on 5/30/21, 4:59 AM

    rel. given a script, "generating all aspects of a cinematic scene, including staging, acting, editing, framing and lighting in Assassin's Creed Odyssey."

    https://youtube.com/watch?v=DFM5zbekZ7c hour-long dev talk (GDC)

  • by codeulike on 5/29/21, 11:02 PM

    Their David Beckham video is pretty good https://www.synthesia.io/post/david-beckham
  • by cupcake-unicorn on 5/30/21, 2:15 AM

    What's the point of using AI if it needs to be manually reviewed? I suppose the outputs are manually reviewed as well to keep the AI from going rogue?
  • by p-sharma on 5/30/21, 9:18 AM

    People don't want to talk to computers, that's why chatbots (in their current form) fail one after the other. People also don't want to listen to emotionless robots. As long as this technology is not 100% accurately mimicking a human, the Uncanny valley effect will kick in and just leave an uncomfortable feeling.
  • by bredren on 5/30/21, 4:31 AM

    Here is an instructional reading of advice I gave my friend over text on how to use enzymatic cleaner should his new kittens have an accident:

    https://share.synthesia.io/2761933d-4ec7-48c7-b67e-85fc9d686...

  • by herval on 5/30/21, 8:01 PM

    I know I'll probably sound a bit Luddite by saying this, but just the examples already make me cringe: a welcoming video for a corporation saying "we're looking forward to have you here", narrated by a _bot_, is as dehumanizing as it gets. :(
  • by ilaksh on 5/29/21, 11:31 PM

    Interesting. I hope the models were paid adequately, considering that they can now use them effectively for free infinitely.

    Reminds me of the movie The Congress.

    Obviously this technology has a long way to go, but it seems that actors should feel less secure about their jobs being resistant to automation.

  • by FraserGreenlee on 5/29/21, 10:15 PM

    These videos are incredibly lifelike. I can see many virtual companions being made with this.
  • by MarkMc on 5/30/21, 1:59 AM

    Impressive, but not quite good enough to avoid the 'uncanny valley' - the lips are not perfectly synced to the audio. Also it should allow a way to stress certain words in the input script.
  • by aishwaryaashok on 5/31/21, 9:47 AM

    So, a bit curious about how this factors in emotions and depth that could vary depending on the nature of the video [onboarding vs launch videos, say]? And how to not run out of options for voice/person selection? It shouldn't end up being like stock images (same face used across multiple brands). How well does brand identity get maintained for, say, paying customers?
  • by andrewmcwatters on 5/30/21, 3:54 AM

    Ah dang, I pasted some literal Lorem Ipsum in to see how it would sound from the AI, and it just puts you through an invite funnel. Oh well.
  • by YeGoblynQueenne on 5/30/21, 7:42 PM

    >> Synthesia lets you create great business videos in minutes. Say goodbye to actors, film crews and expensive equipment.

    Yay! At last! And when we've automated away everyone's work, also say goodbye to synthesia and every other automation service, because there's no business left to use it. Woo-hoo, future world, here I come!

  • by system2 on 5/30/21, 4:21 AM

    1 - We will review your video
    2 - You will receive your video in your email
    3 - You will receive an account creation invite

    What a great sample.

  • by evan_ on 5/30/21, 2:04 AM

    A really creepy use case for this would be to combine it with one of those IP-to-company name lists. If you visit a vendor it could play a video greeting you by mentioning your business name. “Click here to learn what we can do for Acme Industries!”

    Again, super creepy and not really clear if it would drive engagement.

  • by dalmo3 on 6/1/21, 1:11 AM

    Wow, the Portuguese pronunciation, intonation and lipsync are incredibly accurate, 10x more so than the English voice. I wonder if that's true for other latin-ish languages and if that means those languages are easier to learn.
  • by pedalpete on 5/30/21, 12:15 AM

    I think in general the quality is quite good, but the characters lack personality. I think that is the opportunity. Create something with more lively movement. Think the Sham-wow guy.

    Anybody can stand blankly in front of a camera without emotion. But this is an impressive start.

  • by Meph504 on 5/30/21, 12:11 AM

    Will not demo anything that requires me to put in that much of my data to try their product.
  • by jordhy on 5/30/21, 1:25 PM

    I love it 1000%. Need to create videos for a new crypto. This helps translate the videos to 10 different languages and kick off a global service. It's not perfect but it's fast and looks very professional.
  • by mensetmanusman on 5/29/21, 10:32 PM

    Groups like nxivm are going to do strange things with this tech in the future.
  • by smusamashah on 5/30/21, 12:56 AM

    They require agreeing to receive promotional emails before creating the video.
  • by boboche on 5/30/21, 1:13 AM

    Would have been interesting to try out but unfortunately, the email prompt ended my evaluation. A lot of people will probably stop there and move on as well.
  • by ravenstine on 5/29/21, 11:54 PM

    Aw man, it kind of made it seem like it would be generated fast, but then you find out after putting in your information that it requires manual review.
  • by anotheryou on 5/30/21, 7:39 PM

    I'm more stunned by the good speech synthesis than by the already good visuals.

    Does anyone know what's under the hood for the text to speech?

  • by junon on 5/30/21, 7:14 AM

    No thanks. I don't like having to give you all of this personal information you really don't need in order to try your product.
  • by 0xx on 5/30/21, 1:58 PM

    Founder here. AMA :)

    To answer a few recurring questions in the thread

    ---> Use case.

    Video is a way more effective way to communicate than text. Not for the HN crowd, but if you're a blue collar worker a 2 minute video in your native language is much preferred to a 5 page pdf for training.

    Anyone who has tried to record a simple corporate video knows the pain of cameras, film crews, 25 takes to get one that works and post production. Cumbersome, slow and multidisciplinary. By the time the video is done the content is out of date.

    Synthetic video is not yet at the quality of real video. Eventually it will be. But the mistake many are making here is comparing it to real video; it should be compared with text.

    In X years we'll be able to make Hollywood films on a laptop without needing anything but time and imagination. Just like we can digitally compose music in Ableton, create images in Photoshop and type novels on keyboards rather than with pen and paper.

    My (obviously biased;)) belief is that synthetic media will eventually become foundational technology that will move media production from cameras/microphones to APIs. We'll be able to do all kinds of things we couldn't do before.

    Eg. personalized and interactive rich media, video-driven chatbots and eventually Hollywood blockbusters made by your favourite YouTuber from his or her bedroom.

    ---> Uncanny valley

    Simulating real video is incredibly hard. We're constantly improving and launching more expressive synthesis soon.

    From our tests with some of our largest clients 8/10 people don't realise it's a synthetic video (unless they are asked to look for it).

    ---> Tech

    Has been developed over the last 3 yrs. Origins/team from Stanford/UCL/TUM.

    Learning: Going from research to a working, scalable product is hard and takes time. But very rewarding when it works.

    [1] https://www.youtube.com/watch?v=ohmajJTcpNk [2] https://www.youtube.com/watch?v=qc5P2bvfl44

    ---> Bad uses

    Bad actors will do bad things with synthetic media. Like with any other technology from smartphones to cars. We're moderating all content and building safeguards and verification + working with FAANG and others on detection and provenance technology.

    Recommended read - deepfakes perfectly follow the story arc of any new, powerful technology: https://journals.sagepub.com/doi/full/10.1177/17456916209193...

    ---> Actors

    Real actors get rev share + an upfront fee from every video generated with their likeness. Like being a stock photo actor.

  • by jelling on 5/30/21, 2:39 AM

    I’m deeply interested in synthetic media but it’s hard to believe there is a shortage of people who want to be video presenters.
  • by devops000 on 5/29/21, 10:44 PM

    I created a step-by-step tutorial, but the voice still sounds too robotic. Unfortunately it doesn't inspire trust in users.
  • by darepublic on 5/30/21, 3:41 AM

    Gonna have dynamic open world video games too, where custom cut scenes can play based on your character's actions.
  • by lxe on 5/30/21, 2:27 AM

    Is this based on a paper/demo previously posted on HN? I vaguely remember seeing the faces elsewhere.
  • by Gualdrapo on 5/29/21, 11:05 PM

    It forces you to select that option to receive promotional emails from them before submitting a script.
  • by doener on 5/30/21, 11:41 AM

    This site does not let me try the demo without giving them permission to send spam emails.
  • by joshribakoff on 5/31/21, 12:54 AM

    After filling out the recaptcha I cannot scroll to the submit button on mobile safari.
  • by alexfromapex on 5/30/21, 4:29 PM

    Warning: you have to agree to receive marketing emails from them
  • by Exuma on 5/30/21, 12:48 AM

    Just give me the ability to be offensive. Who are you to stop me?
  • by flemhans on 5/29/21, 10:48 PM

    Scripts require manual review. It's not automated
  • by cush on 5/30/21, 2:02 AM

    The sample videos made me incredibly uncomfortable
  • by rkagerer on 5/30/21, 6:04 AM

    You want my email to try it out? Hard pass.
  • by ratsimihah on 5/30/21, 4:49 AM

    The lack of empathy in her voice is chilling
  • by aalfson on 5/30/21, 12:17 AM

    This is really cool.
  • by gibba999 on 5/30/21, 12:20 AM

    $3/minute of video seems a bit steep. $180/hour of video.