by irs on 2/3/25, 12:04 AM with 6 comments
by dchuk on 2/3/25, 1:40 AM
“Limitations Deep research unlocks significant new capabilities, but it’s still early and has limitations. It can sometimes hallucinate facts in responses or make incorrect inferences, though at a notably lower rate than existing ChatGPT models, according to internal evaluations. It may struggle with distinguishing authoritative information from rumors, and currently shows weakness in confidence calibration, often failing to convey uncertainty accurately. At launch, there may be minor formatting errors in reports and citations, and tasks may take longer to kick off. We expect all these issues to quickly improve with more usage and time.”
Given that the point of the agent is to replace hours of manual research and factual distillation/summarizing, the risk of hallucinations and confident falsehoods in the output sort of undermines the whole shebang. If you want rock-solid data but can't afford the time to gather it yourself, it seems like a terrible trade-off to save time while increasing the risk of bad data.
by falcor84 on 2/3/25, 12:20 AM
Am I right that, at least from a marketing standpoint, OpenAI now appears to be on the defensive?