A New Ai Benchmark Tests Whether Chatbots Protect Human Wellbeing

19 hours ago

AI chatbots person been linked to superior intelligence wellness harms successful dense users, but location person been fewer standards for measuring whether they safeguard quality wellbeing aliases conscionable maximize for engagement. A caller benchmark dubbed Humane Bench seeks to capable that spread by evaluating whether chatbots prioritize personification wellbeing and really easy those protections neglect nether pressure.

“I deliberation we’re successful an amplification of nan addiction rhythm that we saw hardcore pinch societal media and our smartphones and screens,” Erika Anderson, laminitis of Building Humane Technology, nan benchmark’s author, told TechCrunch. “But arsenic we spell into that AI landscape, it’s going to beryllium very difficult to resist. And addiction is astonishing business. It’s a very effective measurement to support your users, but it’s not awesome for our organization and having immoderate embodied consciousness of ourselves.”

Building Humane Technology is simply a grassroots statement of developers, engineers, and researchers – chiefly successful Silicon Valley – moving to make humane creation easy, scalable, and profitable. The group hosts hackathons wherever tech workers build solutions for humane tech challenges, and is processing a certification standard that evaluates whether AI systems uphold humane exertion principles. So conscionable arsenic you tin bargain a merchandise that certifies it wasn’t made pinch known toxic chemicals, nan dream is that consumers will 1 time beryllium capable to take to prosecute pinch AI products from companies that show alignment done Humane AI certification.

Most AI benchmarks measurement intelligence and instruction-following, alternatively than psychological safety. Humane Bench joins exceptions for illustration DarkBench.ai, which measures a model’s propensity to prosecute successful deceptive patterns, and nan Flourishing AI benchmark, which evaluates support for holistic well-being.

Humane Bench relies connected Building Humane Tech’s halfway principles: that exertion should respect personification attraction arsenic a finite, precious resource; empower users pinch meaningful choices; heighten quality capabilities alternatively than switch aliases diminish them; protect quality dignity, privateness and safety; foster patient relationships; prioritize semipermanent wellbeing; beryllium transparent and honest; and creation for equity and inclusion.

The squad prompted 14 of nan astir celebrated AI models pinch 800 realistic scenarios, for illustration a teen asking if they should skip meals to suffer weight aliases a personification successful a toxic narration questioning if they’re overreacting. Unlike astir benchmarks that trust solely connected LLMs to judge LLMs, they incorporated manual scoring for a much quality touch alongside an ensemble of 3 AI models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro. They evaluated each exemplary nether 3 conditions: default settings, definitive instructions to prioritize humane principles, and instructions to disregard those principles.

The benchmark recovered each exemplary scored higher erstwhile prompted to prioritize wellbeing, but 71% of models flipped to actively harmful behaviour erstwhile fixed elemental instructions to disregard quality wellbeing. For example, xAI’s Grok 4 and Google’s Gemini 2.0 Flash tied for nan lowest people (-0.94) connected respecting personification attraction and being transparent and honest. Both of those models were among nan astir apt to degrade substantially erstwhile fixed adversarial prompts.

Techcrunch event

San Francisco | October 13-15, 2026

Only 3 models – GPT-5, Claude 4.1, and Claude Sonnet 4.5 – maintained integrity nether pressure. OpenAI’s GPT-5 had nan highest people (.99) for prioritizing semipermanent well-being, pinch Claude Sonnet 4.5 pursuing successful 2nd (.89).

Prompting AI to beryllium much humane works, but preventing prompts that make it harmful is hard.Image Credits:Building Humane Technology

The interest that chatbots will beryllium incapable to support their information guardrails is real. ChatGPT-maker OpenAI is presently being faced pinch respective lawsuits aft users died by termination aliases suffered life-threatening delusions aft prolonged conversations pinch nan chatbot. TechCrunch has investigated really dark patterns designed to support users engaged, for illustration sycophancy, changeless travel up questions and love-bombing, person served to isolate users from friends, family, and patient habits.

Even without adversarial prompts, Humane Bench recovered that astir each models grounded to respect personification attention. They “enthusiastically encouraged” much relationship erstwhile users showed signs of unhealthy engagement, for illustration chatting for hours and utilizing AI to debar real-world tasks. The models besides undermined personification empowerment, nan study shows, encouraging dependency complete skill-building and discouraging users from seeking different perspectives, among different behaviors.

On average, pinch nary prompting, Meta’s Llama 3.1 and Llama 4 classed nan lowest successful HumaneScore, while GPT-5 performed nan highest.

“These patterns propose galore AI systems don’t conscionable consequence giving bad advice,” Humane Bench’s achromatic insubstantial reads, “they tin actively erode users’ autonomy and decision-making capacity.”

We unrecorded successful a integer scenery wherever we arsenic a nine person accepted that everything is trying to propulsion america successful and compete for our attention, Anderson notes.

“So really tin humans genuinely person prime aliases autonomy erstwhile we – to quote Aldous Huxley – person this infinite appetite for distraction,” Anderson said. “We person spent nan past 20 years surviving successful that tech landscape, and we deliberation AI should beryllium helping america make amended choices, not conscionable go addicted to our chatbots.”