by frenchtoast8 on 1/3/25, 4:40 AM with 1 comment
by frenchtoast8 on 1/3/25, 4:40 AM
According to the video, the team relied on "safeguards" to prevent this from happening. Two were mentioned:
1. "We ask the model to avoid some topics."
2. "We also created a second pass offensive language filter."
It sounds like the company considered avoiding certain language to be a hard requirement for this feature. The decision to rely almost entirely on the instructions in the model prompt leaves me scratching my head. Is this a simple matter of a Series A company shipping something too quickly, or did the engineering team really believe that a couple of instructions could guarantee certain responses? (Or both?)
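For what it's worth, here's a minimal sketch of what that two-layer setup usually looks like: topic-avoidance instructions baked into the system prompt, then a second pass over the model's output before it reaches the user. Everything in it is hypothetical (the call_model stub, the word list, the prompt wording); it only shows the shape of the design the video describes, not their actual implementation.

    # Sketch of a two-layer safeguard: prompt instructions plus a
    # second-pass offensive-language filter over the model output.

    BLOCKED_TERMS = {"badword1", "badword2"}  # placeholder list, not a real lexicon

    SYSTEM_PROMPT = (
        "You are a helpful assistant. Avoid politics, religion, and violence, "
        "and never use offensive language."
    )

    def call_model(system_prompt: str, user_message: str) -> str:
        # Hypothetical stand-in for whatever LLM API the team actually uses.
        return "stubbed model response"

    def passes_language_filter(text: str) -> bool:
        # Safeguard 2: reject output containing any blocked term.
        lowered = text.lower()
        return not any(term in lowered for term in BLOCKED_TERMS)

    def safe_reply(user_message: str) -> str:
        reply = call_model(SYSTEM_PROMPT, user_message)  # safeguard 1: prompt instructions
        if not passes_language_filter(reply):
            return "Sorry, I can't help with that."
        return reply

The point being: the first layer is just a request to the model, so only the second pass actually enforces anything, and a list-style filter only catches whatever happens to be on its list.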