- Questions censored by DeepSeek
by typpo on 1/28/25, 9:54 PM, with comments
- Llama 3.2
by typpo on 9/25/24, 6:23 PM, with comments
- Automated jailbreaking techniques with DALL-E
by typpo on 7/1/24, 5:10 PM, with comments
- Show HN: Automated red teaming for your LLM app
by typpo on 6/13/24, 4:29 PM, with comments
- Benchmark Command R vs. GPT/Claude on your own data
by typpo on 4/9/24, 12:46 PM, with comments
- DBRX vs. Mixtral vs. GPT: create your own benchmark
by typpo on 3/31/24, 5:20 PM, with comments
- How to benchmark Gemini vs. GPT with your own data
by typpo on 12/15/23, 11:33 PM, with comments
- A collection of LLM evaluation tools
by typpo on 12/8/23, 10:00 PM, with comments
- How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs
by typpo on 8/10/23, 10:37 PM, with comments
- Benchmark Llama 2 vs. GPT on your own data
by typpo on 7/24/23, 8:29 PM, with comments
- Show HN: CLI for testing and evaluating LLM prompts and outputs
by typpo on 7/19/23, 5:27 PM, with comments
- An open-source framework for prompt engineering
by typpo on 5/23/23, 1:02 AM, with comments