from Hacker News

Show HN: Calculate confidence score for OpenAI JSON output

by QueensGambit on 4/10/25, 3:16 PM with 7 comments

  • by QueensGambit on 4/10/25, 3:34 PM

    Hi everyone, I built @promptrepo/score because we’re no longer using generative AI just for suggestions — we’re making decisions with it. But generative AI is probabilistic, unlike the deterministic systems we’re used to. So when AI makes decisions, we need to know how confident it is, and how much we can trust each field in the output.

    This tool looks simple — it just converts OpenAI’s logprobs into field-level confidence scores — but that changes how you use AI in production. It lets you flag low-confidence fields, send them for human review, or retry with better grounding. In high-volume systems, you can also track low-confidence patterns to improve prompts or fine-tune with better data. It’s a lightweight npm package with no dependencies, so it’s easy to integrate into your AI workflows. Would love to hear your thoughts!
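    The core idea can be sketched in a few lines. This is not the library's actual API — just an illustration, assuming you have OpenAI's `logprobs.content` array ({token, logprob} objects) and you already know which token indices produced a given JSON field:

    ```javascript
    // Hypothetical sketch: derive a per-field confidence from token logprobs.
    // `tokens` mimics OpenAI's `logprobs.content` array; `indices` are the
    // positions of the tokens that make up one JSON field's value (an
    // assumption for illustration — the real package handles this mapping).
    function fieldConfidence(tokens, indices) {
      // Joint probability of the field's tokens = exp(sum of their logprobs).
      const sumLogprob = indices.reduce((acc, i) => acc + tokens[i].logprob, 0);
      return Math.exp(sumLogprob);
    }

    // Example: a field value generated as two tokens with logprobs -0.1 and -0.05.
    const tokens = [
      { token: '"Acme', logprob: -0.1 },
      { token: ' Corp"', logprob: -0.05 },
    ];
    const score = fieldConfidence(tokens, [0, 1]); // exp(-0.15), roughly 0.86
    ```

    A score near 1.0 means the model was near-certain of every token in the field; anything below your threshold can be routed to human review or retried.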

  • by siva7 on 4/10/25, 9:59 PM

    Wait, this doesn't work with a current-gen model like 4o? Is this a technical limitation?

  • by rboobesh on 4/10/25, 3:48 PM

    Can this work with nested JSON objects or arrays?

  • by manidoraisamy on 4/10/25, 3:22 PM

    Does it support Claude?