from Hacker News

Show HN: Live conversations with ChatGPT using WebRTC

by russ on 4/12/23, 3:00 PM with 0 comments

HN, meet KITT! https://livekit.io/kitt

Like many folks here, the LiveKit team is enamored with ChatGPT. Given that we spend most of our time working with real-time media, we thought we'd try connecting GPT to a WebRTC video call.

KITT can do some neat things:

- Answer questions like Siri, Alexa, or Google Assistant
- Summarize what was discussed in a meeting
- Speak multiple languages and even act like a third-party translator
- Act as a DM in a D&D campaign

At first, we weren’t sure if we could get the latency low enough to have a human-like conversation, but after making a handful of tweaks, things feel pretty close to speaking with a person.

The key optimization we made was to stream all the things:

- We convert streaming audio from participants to text in 20ms frames
- We pre-prompt GPT to be concise in its responses and generate short sentences
- Each sentence is converted to speech in real-time and streamed out to all participants
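To illustrate the sentence-by-sentence step, here is a minimal sketch in Go, not KITT's actual code. It assumes the LLM reply arrives as a channel of text chunks and emits each sentence as soon as its terminator shows up, so text-to-speech can start before the full reply is done. `SplitSentences` is a hypothetical helper, and the punctuation rule is deliberately naive (it would also split on the period in "3.14").

```go
package main

import (
	"fmt"
	"strings"
)

// SplitSentences is a hypothetical helper: it reads streamed text chunks
// and emits each complete sentence as soon as '.', '!' or '?' arrives.
func SplitSentences(tokens <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var buf strings.Builder
		for tok := range tokens {
			for _, r := range tok {
				buf.WriteRune(r)
				if r == '.' || r == '!' || r == '?' {
					if s := strings.TrimSpace(buf.String()); s != "" {
						out <- s
					}
					buf.Reset()
				}
			}
		}
		// Flush trailing text that never received a terminator.
		if s := strings.TrimSpace(buf.String()); s != "" {
			out <- s
		}
	}()
	return out
}

func main() {
	// Simulated LLM stream; chunk boundaries do not align with sentences.
	tokens := make(chan string)
	go func() {
		defer close(tokens)
		for _, t := range []string{"Hel", "lo there", ". How", " can I help", "? Bye"} {
			tokens <- t
		}
	}()
	for s := range SplitSentences(tokens) {
		// In a real pipeline each sentence would be handed to text-to-speech here.
		fmt.Println(s)
	}
}
```

The point of the design is that the first sentence can be synthesized and played while the model is still generating the rest, which is where most of the perceived latency goes away.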

We also use GPT-3.5 Turbo instead of GPT-4, which shaves off some response time as well.

To make it easy for anyone to plug in their own AI, we built KITT as a server-side Go program that uses [Pion](https://github.com/pion/webrtc) to publish audio and video streams like any other WebRTC participant. That means it’s fairly straightforward to plug in your own STT, LLM, custom voice or avatar.
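One way to picture the "plug in your own" idea is as a set of small interfaces, one per stage. The sketch below is an assumption for illustration only: the names `Transcriber`, `Responder`, `Synthesizer`, and `Assistant` are hypothetical and do not come from KITT or Pion, and the WebRTC transport is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stage interfaces; not part of KITT or Pion.
type Transcriber interface {
	Transcribe(audio []byte) (string, error)
}

type Responder interface {
	Respond(prompt string) (string, error)
}

type Synthesizer interface {
	Synthesize(text string) ([]byte, error)
}

// Assistant wires the three stages together: audio in, audio out.
type Assistant struct {
	STT Transcriber
	LLM Responder
	TTS Synthesizer
}

// Handle runs one utterance through STT, then the LLM, then TTS.
func (a Assistant) Handle(audio []byte) ([]byte, error) {
	text, err := a.STT.Transcribe(audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := a.LLM.Respond(text)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	speech, err := a.TTS.Synthesize(reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return speech, nil
}

// Stub implementations so the sketch runs without any external service.
type fakeSTT struct{}

func (fakeSTT) Transcribe(audio []byte) (string, error) { return string(audio), nil }

type upperLLM struct{}

func (upperLLM) Respond(prompt string) (string, error) { return strings.ToUpper(prompt), nil }

type fakeTTS struct{}

func (fakeTTS) Synthesize(text string) ([]byte, error) { return []byte(text), nil }

func main() {
	a := Assistant{STT: fakeSTT{}, LLM: upperLLM{}, TTS: fakeTTS{}}
	out, err := a.Handle([]byte("hello kitt"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Swapping in a different speech-to-text service, model, or voice would then mean providing another type that satisfies the matching interface, with the rest of the pipeline unchanged.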

For more details on how we built this: https://blog.livekit.io/meet-kitt

Would love to hear your thoughts and feedback in the comments!