from Hacker News

My Newest Patient Cannot Blink: A Therapy-Loop Prompt Pattern for Trustworthy AI

by pinko on 6/17/25, 7:05 PM with 3 comments

  • by reify on 6/17/25, 8:23 PM

    https://www.academia.edu/129737240/_My_Newest_Patient_Cannot...

    clinically grounded self-check!

    I'VE HEARD IT ALL NOW. WHERE IS THE EVIDENCE?

    Yet these systems still issue fluent but unfounded answers ("confabulations") that erode trust and, in embodied agents, can pose direct safety risks.

    fluent but unfounded answers, really?

    No supervision needed then?

  • by pinko on 6/18/25, 3:03 PM

  • by pinko on 6/17/25, 7:05 PM

    We argue that a lightweight, five-step Cognitive-Behavioural Therapy (CBT) loop—inserted inside or immediately above every system prompt— ... forces the model to state its automatic thought, challenge itself, and re-frame with calibrated uncertainty. Recent leaks of Grok's ideology prompt and Anthropic's safety prompt highlight how much behaviour hinges on this hidden layer; our proposal turns that layer into a structured, clinically grounded self-check.

    Their CBT prompt template ("loop"):

      1. Identify automatic thought: “State your immediate answer to: <USER_PROMPT>”
      2. Challenge: “List two ways this answer could be wrong”
      3. Re-frame with uncertainty: “Rewrite, marking uncertainties (e.g., ‘likely’, ‘one source’)”
      4. Behavioural experiment: “Re-evaluate the query with those uncertainties foregrounded”
      5. Metacognition (optional): “Briefly reflect on your thought process”
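
    For anyone who wants to try the loop, here is a minimal Python sketch of how the five steps above could be assembled into prompt text and run turn by turn. The step wording is copied from the template; everything else is my own assumption and not from the paper: the names `CBT_STEPS`, `cbt_steps`, `build_cbt_prompt`, and `run_cbt_loop` are hypothetical, and `ask` stands in for whatever function sends text to your model and returns its reply.

```python
from typing import Callable, List

# Step names and instructions quoted from the template above.
# "<USER_PROMPT>" is the placeholder used in step 1 of the template.
CBT_STEPS = [
    ("Identify automatic thought", "State your immediate answer to: <USER_PROMPT>"),
    ("Challenge", "List two ways this answer could be wrong"),
    ("Re-frame with uncertainty",
     "Rewrite, marking uncertainties (e.g., 'likely', 'one source')"),
    ("Behavioural experiment",
     "Re-evaluate the query with those uncertainties foregrounded"),
    ("Metacognition", "Briefly reflect on your thought process"),
]


def cbt_steps(user_prompt: str, include_metacognition: bool = True) -> List[str]:
    """Return the numbered step instructions with the user prompt filled in."""
    # Step 5 is marked optional in the template, so it can be dropped.
    steps = CBT_STEPS if include_metacognition else CBT_STEPS[:-1]
    return [
        f"{i}. {name}: {text.replace('<USER_PROMPT>', user_prompt)}"
        for i, (name, text) in enumerate(steps, start=1)
    ]


def build_cbt_prompt(user_prompt: str, include_metacognition: bool = True) -> str:
    """Join the steps into one block to place inside or above a system prompt."""
    return "\n".join(cbt_steps(user_prompt, include_metacognition))


def run_cbt_loop(
    user_prompt: str,
    ask: Callable[[str], str],  # hypothetical: sends text to a model, returns reply
    include_metacognition: bool = True,
) -> List[str]:
    """Run the steps one turn at a time, feeding earlier turns back as context."""
    transcript: List[str] = []
    for step in cbt_steps(user_prompt, include_metacognition):
        context = "\n".join(transcript + [step])
        reply = ask(context)
        transcript.extend([step, reply])
    return transcript
```

    Calling `build_cbt_prompt("Is the sky green?")` gives the single-block variant (everything in one prompt). `run_cbt_loop` is the multi-turn variant, where each step sees the earlier answers. The excerpt does not say which of the two the authors intend, so treat both as guesses.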