by robinwarren on 12/1/24, 10:19 PM with 6 comments
by 0_____0 on 12/2/24, 1:41 AM
The result here seems a bit like what I imagine "Audio Description" for visual media would read like as a transcript. Maybe it would work better if the model was trained on novels, rather than a more general corpus?
by dyauspitr on 12/2/24, 4:20 AM
What is the point of this? To see if AI can weave a story based on a sequence of stills? If that’s the case I would love to see what it could do with a series of security cameras as it follows a person around a highly surveilled city like London or Shenzen.