
ZDNET's key takeaways
- AI is getting better at small tasks, but still lags on long-form analysis.
- The consequences of prolonged interactions with AI can be disastrous.
- Use AI like a tool for well-defined tasks, and avoid falling down a rabbit hole.
Better to do a little well than a great deal badly. So said the great philosopher Socrates, and his advice can apply to your use of artificial intelligence, including chatbots such as OpenAI's ChatGPT or Perplexity, as well as the agentic AI programs increasingly being tested in the enterprise.
AI research increasingly shows that the safest and most productive course with AI is to use it for small, limited tasks, where outcomes can be well defined and results can be verified, rather than pursuing extended interactions with the technology over hours, days, and weeks.
Also: Asking AI for medical advice? There's a right and wrong way, one expert explains
Extended interactions with chatbots such as ChatGPT and Perplexity can lead to misinformation at the very least, and in some cases, delusion and death. The technology is not yet ready to take on the most sophisticated demands of reasoning, logic, common sense, and deep analysis -- areas where the human mind reigns supreme.
(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
We are not yet at AGI (Artificial General Intelligence), the expected human-level capabilities of AI, so you'd do well to keep the technology's limitations in mind when using it.
Put simply, use AI as a tool rather than letting yourself be sucked down a rabbit hole and getting lost in endless rounds of AI conversation.
What AI does well - and not so well
AI tends to do well at simple tasks, but poorly at complex and deep types of analysis.
The latest examples of that are the main takeaways from this week's release of the Annual AI Index 2026 from Stanford University's Human-Centered AI group of scholars.
On the one hand, editor-in-chief Sha Sajadieh and her collaborators make clear that agentic AI is increasingly successful at tasks such as looking up information on the Web. In fact, agents are close to human-level on routine online processes.
Also: 10 ways AI can inflict unprecedented damage
Across three benchmark tests -- GAIA, OSWorld, and WebArena -- Sajadieh and team found that agents are approaching human-level performance on multi-step tasks such as opening a database, applying a policy rule, and then updating a customer record. On the GAIA test, agents have an accuracy rate of 74.5%, still below the 92% of human performance but way up from the 20% of a year ago.
On the OSWorld test, "Computer science students solve about 72% of these tasks with a median time of about 2 minutes," while Anthropic's Claude Opus 4.5, up until recently its most powerful model, reaches 66.3%. That means "the best model [is] within 6 percentage points of human performance."
WebArena shows AI models are "now within 4 percentage points of the human baseline of 78.2%" accuracy.
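To put those figures side by side, here is a minimal Python sketch that tabulates the benchmark numbers cited above and computes the human-agent gap in percentage points. Note the WebArena agent score is not stated exactly and is inferred here from the "within 4 points" phrasing, so treat it as approximate:

```python
# Benchmark figures as cited in the AI Index discussion above.
# "agent" = best reported agent score, "human" = human baseline, in % accuracy.
benchmarks = {
    "GAIA":     {"agent": 74.5, "human": 92.0},
    "OSWorld":  {"agent": 66.3, "human": 72.0},
    # WebArena agent score is not given exactly; "within 4 points"
    # of the 78.2% human baseline implies roughly 74.2% or better.
    "WebArena": {"agent": 74.2, "human": 78.2},
}

for name, score in benchmarks.items():
    gap = score["human"] - score["agent"]
    print(f"{name:9s} agent {score['agent']:5.1f}%  "
          f"human {score['human']:5.1f}%  gap {gap:4.1f} pts")
```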
Agentic AI is getting better at online tasks such as Web browsing but still falls short of human-level accuracy.
While Claude Opus and other LLMs are not perfect, they show rapid progress in at least reaching benchmark levels that come closer to human-level performance.
That makes sense, as manipulating a web browser or looking something up in a database should be among the easier scenarios in which a natural-language prompt can plug into APIs and external resources. In other words, AI should have most of the tooling it requires to interface with applications in limited ways and carry out tasks.
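As a rough illustration of that kind of limited, verifiable task, here is a minimal sketch of the open-a-database, apply-a-rule, update-a-record pattern the benchmarks test. The function names and the discount rule are hypothetical, invented for this example; a real agent framework would wire a language model into tools like these rather than hard-coding the plan:

```python
# Hypothetical tools an agent might call; each step is small,
# well defined, and easy to verify by hand.

def lookup_customer(db: dict, customer_id: str) -> dict:
    """Open the database: fetch a copy of a single customer record."""
    return dict(db[customer_id])

def apply_policy(record: dict, threshold: float) -> dict:
    """Apply one policy rule: grant a discount above a spend threshold."""
    record["discount"] = 0.10 if record["total_spend"] >= threshold else 0.0
    return record

def run_task(db: dict, customer_id: str) -> dict:
    """The multi-step task: look up, apply the rule, write back."""
    updated = apply_policy(lookup_customer(db, customer_id), threshold=1000.0)
    db[customer_id] = updated
    return updated

db = {"c42": {"name": "Acme", "total_spend": 1500.0}}
print(run_task(db, "c42"))
# {'name': 'Acme', 'total_spend': 1500.0, 'discount': 0.1}
```

Because each step has a defined input and output, a human can check the result at every stage, which is exactly what makes this class of task a good fit for today's agents.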
Also: 40 million people globally are using ChatGPT for healthcare - but is it safe?
Note that even with well-defined, limited tasks, it helps to check what you're getting from a bot, as the average score on these benchmarks still falls short of human performance -- and that's in benchmark tests, a kind of simulated performance. In real-world settings, your results may vary, and not to the upside.
AI can't handle the hard stuff
When they dug into deeper kinds of work, the Stanford scholars found much less encouraging results.
Research has found, they noted, that "models handle simple lookups well but struggle when asked to find multiple pieces of matching information or to apply conditions across a very long document -- tasks that would be straightforward for a human scanning the same text."
That finding aligns with my own anecdotal experience using ChatGPT to draft a business plan. Answers were good in the first few rounds of prompting, but then degraded as the model snuck in facts and figures I had not specified, or that might have been relevant earlier in the process but had no business being included in the present context.
The lesson, I concluded, was that the longer your ChatGPT sessions run, the more errors sneak in. It makes the experience infuriating.
Also: I built a business plan with ChatGPT and it turned into a cautionary tale
The results of unchecked bot elaboration can get more serious. An article last week in Nature magazine describes how Almira Osmanovic Thunström, a medical researcher at the University of Gothenburg, and her team invented a disease, "bixonimania," which they described as an eye condition resulting from excessive exposure to blue light from computer screens.
They wrote formal research papers on the made-up condition, then published them online. The papers got picked up in bot-based searches. Most of the large language models, including Google's Gemini, began to faithfully describe the condition, bixonimania, in chats, pointing to the faked research papers of Thunström and team.
The fact that bots will confidently assert the existence of the fake bixonimania speaks to a lack of oversight of the technology's access to data. Without due checking, you can't know if a model will verify what it's spitting out. As one expert who wasn't involved in the research noted, "We should evaluate [the AI model] and have a pipeline for continuous evaluation."
Consequences can be serious
A more serious variant, where a person seems to have gone down a rabbit hole of confiding in a bot, is described in a recent New York Times article by Teddy Rosenbluth about the case of an older man grappling with white blood cell cancer.
Rather than following his oncologist's advice, the patient, Joe Riley, relied on extended interaction with chatbots, especially Perplexity, to refute the doctor's diagnosis. He insisted his AI research revealed he had what's called Richter's Transformation, a complication of cancer that would be made worse by the recommended treatment.
Also: Use Google AI Overview for health advice? It's 'really dangerous,' research finds
Despite emails from experts on Richter's questioning the material in the Perplexity summaries of the condition, Riley stuck with his belief in his AI-generated reports and resisted his doctor's and his family's pleas. He missed the window for proper treatment, and by the time he relented and agreed to try treatment, it was too late.
Rosenbluth makes the connection between the story of Joe Riley and the case of Adam Raine last year, who committed suicide after extended chats with ChatGPT about his inclination to end his life.
Riley's son, Ben Riley, wrote his own account of his father's journey with AI. While the younger Riley doesn't blame the technology per se, he points out that getting immersed in chats and losing perspective can have consequences.
"The fact remains that AI does exist in our world," writes Riley, "and just as it can serve as fuel to those suffering manic psychosis, so too may it affirm or amplify our mistaken understanding of what's happening to us physically and medically."
Staying sane with unreliable AI
The inclination to engage in long-form discussions about depression, suicide, and serious health conditions is understandable. People have become habituated to long-form engagements of hours at a time on social media. Some people are lonely, and a natural-language conversation with a bot is better than no conversation at all.
Also: Your chatbot is playing a character - why Anthropic says that's dangerous
Bots have a tendency toward sycophancy, research has shown, which can make hours of engagement with a bot more fulfilling than the average give and take with a person.
And the companies that make the technology, while warning users to verify bot output, have tended to place less emphasis on negative reports from individuals such as Riley and Raine.
4 rules for avoiding the rabbit hole
A few rules can help mitigate the worst effects of placing too much reliance on the technology.
- Define what you are going to a chatbot for. Is there a well-defined task that has a limited scope and for which the predictions of the bot can be fact-checked with other sources?
- Have a healthy skepticism. It's well known that chatbots are prone to confabulation, confidently asserting falsehoods. It doesn't matter how many chatbots you use to try to balance the good and the bad; all of them should be treated with a healthy skepticism as having only part of the truth, if any.
- Regard chatbots not as friends or confidants. They are digital tools, like Word or Excel. You're not trying to have a relationship with a bot but rather to complete a task.
- Use proven digital overload skills. Take long breaks. Step away from the computer for a non-digital human interaction, such as playing card games with a friend or going for a walk.
Also: Stop saying AI hallucinates - it doesn't. And the mischaracterization is dangerous
Falling down the rabbit hole happens partly as a result of simply being parked in front of a screen with no downtime.