- EvalGen: Helping Developers Create LLM Evals Aligned to Their Preferences
by fatso784 on 5/14/25, 11:28 PM, with comments
- Semantic Commit: Helping Users Update Intent Specifications for AI Memory
by fatso784 on 4/15/25, 1:04 PM, with comments
- What AI Engineers Can Learn from Qualitative Research Methods
by fatso784 on 1/9/25, 8:53 PM, with comments
- DocETL: A tool for creating LLM-powered data processing pipelines
by fatso784 on 9/26/24, 10:30 PM, with comments
- Aligning LLM-as-a-Judge with Human Preferences
by fatso784 on 6/26/24, 8:51 PM, with comments
- LLM Wrapper Papers Are Hurting HCI Research
by fatso784 on 6/6/24, 2:37 PM, with comments
- Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs
by fatso784 on 4/22/24, 4:21 PM, with comments
- If in a Crowdsourced Data Annotation Pipeline, a GPT-4
by fatso784 on 3/5/24, 7:25 PM, with comments
- Antagonistic AI
by fatso784 on 3/1/24, 12:32 AM, with comments
- How to Compare Prompts with ChainForge [video]
by fatso784 on 1/2/24, 5:33 PM, with comments
- AI for ChainForge Beta
by fatso784 on 12/13/23, 8:10 PM, with comments
- ChatGPT does not have seasonal affective disorder
by fatso784 on 12/12/23, 5:46 PM, with comments
- There is no "seasonal affective disorder" of ChatGPT
by fatso784 on 12/12/23, 5:06 PM, with comments
- There will never be fully automated prompt engineering
by fatso784 on 9/28/23, 1:10 PM, with comments
- ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
by fatso784 on 9/19/23, 12:45 PM, with comments
- Ask HN: Have LLM API Updates or Deprecations Impacted You?
by fatso784 on 8/17/23, 2:44 PM, with comments
- Apple's ML model and dataset introspection API
by fatso784 on 8/9/23, 7:08 PM, with comments
- Show HN: ChainForge, a visual tool for prompt engineering and LLM evaluation
by fatso784 on 8/7/23, 5:54 PM, with comments
- Continue multiple conversations simultaneously across multiple LLMs
by fatso784 on 7/28/23, 4:25 PM, with comments
- ChainForge now supports chat evaluation
by fatso784 on 7/26/23, 4:39 PM, with comments
- ChainForge: A visual tool for prompt engineering in the browser
by fatso784 on 7/5/23, 4:16 PM, with comments
- Show HN: Evaluate LLMs, right in the browser. Share your experiments as links
by fatso784 on 7/5/23, 1:23 PM, with comments
- You can now run OpenAI evals in ChainForge
by fatso784 on 6/16/23, 6:51 PM, with comments
- Pen-Based Computing: Still Looking for the Write App?
by fatso784 on 6/6/23, 4:36 PM, with comments
- ChainForge: A visual programming environment for prompt engineering
by fatso784 on 5/24/23, 1:41 AM, with comments
- Show HN: ChainForge, a visual tool for evaluating LLM responses
by fatso784 on 5/23/23, 8:04 PM, with comments
- Early demo of ChainForge, a data flow environment for prompt engineering
by fatso784 on 4/30/23, 4:17 PM, with comments