by selalipop on 8/1/24, 7:04 PM with 0 comments
The pipeline for generation is fairly straightforward:
- Prompt expansion: The user's input is analyzed to infer their underlying intent, then expanded into a longer instruction so their character takes more than a single action per page.
- Character simulation: Each character registered in the scene has a structured background that helps the LLM understand how they react to different situations, and a bucket of actions they'll take is generated for the final output step.
- Story understanding: A basic RAG pipeline ensures details that have fallen out of the context window are still captured correctly.
- Image generation: Every 5 pages, the scene is summarized into an image-generation prompt and an image is rendered.
- Final output: The output follows a formatting schema that denotes who speaks each line, when actions take place, and so on. The goal is to eventually narrate each line with an accurate voice for the character who's speaking.
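The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class and function names are hypothetical, the LLM and retrieval calls are replaced with trivial stand-ins, and the structured output is shown as a plain dict with per-line speaker tags.

```python
from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    background: str  # structured background fed to the LLM
    action_bucket: list = field(default_factory=list)

def expand_prompt(user_input: str) -> str:
    # Stand-in for an LLM call that infers the user's intent and
    # expands it so a character takes multiple actions per page.
    return f"Intent({user_input}): take several actions this page."

def simulate_characters(chars: list, expanded: str) -> None:
    # Stand-in for per-character simulation: fill each character's
    # bucket of actions for the final output step.
    for c in chars:
        c.action_bucket = [f"{c.name} reacts to: {expanded}"]

def retrieve_story_facts(memory: list, query: str) -> list:
    # Naive RAG stand-in: keyword match over details that have
    # fallen out of the context window.
    return [m for m in memory if query.lower() in m.lower()]

def generate_page(page_num: int, chars: list, memory: list, user_input: str) -> dict:
    expanded = expand_prompt(user_input)
    simulate_characters(chars, expanded)
    page = {
        "page": page_num,
        # Formatting schema: each line tagged with its speaker, so a
        # matching narration voice can be chosen per character later.
        "lines": [{"speaker": c.name, "text": c.action_bucket[0]}
                  for c in chars],
        "recalled": retrieve_story_facts(memory, user_input),
    }
    if page_num % 5 == 0:
        # Every 5th page, summarize the scene into an image prompt.
        page["image_prompt"] = f"Illustration of the scene on page {page_num}"
    return page
```

In a real system each stand-in would be an LLM or image-model call, but the control flow (expand, simulate, retrieve, format, periodically illustrate) is the part the list above describes.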
You can also add characters and notes to the simulation by clicking Story Settings at the bottom