by grbsh on 2/3/24, 7:39 PM with 0 comments
I started with a single prompt to GPT4. The results were pretty terrible in terms of layout (many overlapping / obstructed elements), but (I think) show promise. Also, generation takes a really long time, because SVG path data comes out at roughly 1-2 characters per token.
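To give a feel for why the single-prompt approach is slow: the cost is dominated by output tokens, and path-heavy SVG tokenizes badly. A rough back-of-envelope sketch (the chars-per-token and tokens-per-second numbers below are illustrative guesses, not measurements):

```python
def estimate_generation_seconds(svg_chars: int,
                                chars_per_token: float = 1.5,
                                tokens_per_second: float = 20.0) -> float:
    """Estimate wall-clock time to stream an SVG of `svg_chars` characters.

    Assumes SVG path data tokenizes at ~1.5 chars/token and the model
    streams ~20 tokens/s -- both hypothetical numbers for illustration.
    """
    tokens = svg_chars / chars_per_token
    return tokens / tokens_per_second

# A 15 KB SVG under these assumptions takes on the order of minutes:
print(round(estimate_generation_seconds(15_000)))  # prints 500
```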
Next, I implemented visual critique and self-refinement. I wrote a rubric with a bunch of questions about layout and educational / communicative value. I take the initially generated SVG, convert it to PNG, and send it to gpt-4-vision-preview along with the rubric, asking for a critique and a regeneration.
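The critique-and-refine loop looks roughly like this. Here `generate_svg`, `render_png`, and `critique` are hypothetical stand-ins for the real GPT-4 call, the SVG-to-PNG rasterization, and the gpt-4-vision-preview rubric call; this is a sketch of the control flow, not the actual implementation:

```python
from typing import Callable, Tuple

def refine(generate_svg: Callable[[str], str],
           render_png: Callable[[str], bytes],
           critique: Callable[[bytes], Tuple[float, str]],
           prompt: str,
           max_rounds: int = 2,
           good_enough: float = 0.8) -> str:
    """Generate an SVG, then critique/regenerate until the rubric score clears
    a threshold or we run out of rounds. Thresholds are placeholder values."""
    svg = generate_svg(prompt)
    for _ in range(max_rounds):
        score, feedback = critique(render_png(svg))
        if score >= good_enough:
            break
        # Fold the vision model's rubric feedback back into the prompt.
        svg = generate_svg(f"{prompt}\n\nCritique of last attempt:\n{feedback}")
    return svg
```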
The generation + visual self-critique and refinement takes about 3 minutes per SVG, so I generated about 300 examples, eliminated the bottom 50% by rubric score, and fine-tuned gpt-3.5-turbo on the rest. The fine-tuned model now takes about 7-10 seconds to generate one SVG with comparable quality to the GPT4 + GPT4-V refinement pipeline (according to GPT4-V's own scoring of the rubric).
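The filtering + fine-tuning-data step is simple: rank by rubric score, keep the top half, and write chat-format JSONL for gpt-3.5-turbo fine-tuning. A minimal sketch (the system prompt and the 50% cutoff are assumptions for illustration):

```python
import json

def build_finetune_file(examples, path="train.jsonl"):
    """examples: list of (prompt, svg, rubric_score) triples.

    Keeps the top half by score and writes them in the chat-format JSONL
    that OpenAI fine-tuning expects. Returns the number of examples kept.
    """
    ranked = sorted(examples, key=lambda e: e[2], reverse=True)
    kept = ranked[: len(ranked) // 2]  # drop the bottom 50% by rubric score
    with open(path, "w") as f:
        for prompt, svg, _score in kept:
            record = {"messages": [
                # Hypothetical system prompt -- not the one actually used.
                {"role": "system", "content": "You draw educational diagrams as SVG."},
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": svg},
            ]}
            f.write(json.dumps(record) + "\n")
    return len(kept)
```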
Thoughts on this approach? I'm considering doing 10-100x the compute / dataset size, wondering if this interests anyone else. Happy to expose a (free) API to the fine-tuned model if people are interested in playing around with it.