Claude Opus 4.8 is learning to say AI’s three hardest words: “I don’t know”

Summary created by Smart Answers AI

In summary:

PCWorld reports that Anthropic’s Claude Opus 4.8 focuses on improving AI honesty by teaching the model to admit when it lacks information.
The model achieved near-perfect scores in honesty benchmarks for coding questions and exhibited evaluation awareness during testing.
Opus 4.8 represents a significant step forward in making AI systems more transparent about their knowledge limitations and uncertainties.

Honesty is a key sticking point with even the most powerful LLMs. It’s not so much that they’re intentionally lying to you; instead, they’ll confidently tell you things they’re not 100 percent (or even 50 percent) sure about.

Anthropic says that Opus 4.8, its latest Claude model, is more honest about telling you what it doesn’t know, or when it has a low level of confidence in what it’s telling you.

Released Thursday, Claude Opus 4.8 is not Claude Mythos Preview, Anthropic’s new “frontier” model that’s so powerful, only a handful of “trusted partners” have been allowed to test it for security reasons. There’s still no solid release date for Claude Mythos.

Arriving about six weeks after Claude Opus 4.7, Opus 4.8 takes over as Anthropic’s most powerful model in general availability. For the most part, it marks a “modest” improvement over its predecessor, and Mythos Preview still handily bests it in cybersecurity tasks, Anthropic says.

But according to the company’s benchmarks, Opus 4.8 is tops in a key category: honesty, with the model snaring “near-perfect” scores when it comes to admitting it doesn’t know the answer to a coding question.

Even the crazy-powerful Mythos Preview couldn’t best Opus 4.8 in this particular honesty test, coming in a close second, while Opus 4.7 finished a distant fourth.

Of course, these are Anthropic’s benchmarks we’re seeing; we’ll have to wait for third-party testing to get more objective results, not to mention reports from the wild. I plan on taking Opus 4.8 for a spin in the coming days.

Anthropic also shared some “concerning hints related to evaluation awareness” (meaning that Opus 4.8 showed signs that it knew it was being tested) while noting a “tendency for the model to reason about how its outputs will be graded.” Those concerns aren’t unique to Opus 4.8; indeed, the latest “frontier” models often seem to know when they’re being poked and prodded.

Still, it’s good to see that models like Opus 4.8 are dialing down the BS, at least on paper. Hopefully it’ll maintain that level of honesty in practice.
