I Tested Local AI On My M1 Mac, Expecting Magic - And Got A Reality Check Instead

MacBook Pro M1

The M1 MacBook Pro is an aged but still capable machine in 2026.

Kyle Kucharski/ZDNET

ZDNET's key takeaways

  • Ollama makes it reasonably easy to download open-source LLMs.
  • Even small models can run painfully slowly.
  • Don't try this without a newer machine with 32GB of RAM.

As a reporter covering artificial intelligence for over a decade now, I have always known that running artificial intelligence brings all kinds of computer engineering challenges. For one thing, the large language models keep getting bigger, and they keep demanding more and more DRAM memory to run their model "parameters," or "neural weights."

Also: How to install an LLM on MacOS (and why you should)

I have known all that, but I wanted to get a feel for it firsthand. I wanted to run a large language model on my home computer.

Now, downloading and running an AI model can involve a lot of work to set up the "environment." So, inspired by my colleague Jack Wallen's coverage of the open-source tool Ollama, I downloaded the MacOS binary of Ollama as my gateway to local AI.

Ollama is comparatively easy to use, and it has done good work integrating with LangChain, Codex, and more. That means it is becoming a tool for bringing together many aspects of AI, which is exciting.

Reasons to keep it local

Running LLMs locally, rather than just typing into ChatGPT or Perplexity online, has a lot of appeal not just for programmers, but for any information worker.

First, as an information worker, you will be more desirable in the job market if you can do something like download a model and run it, rather than typing into the online prompt just like every free user of ChatGPT. We're talking basic professional development here.

Second, with a local instance of an LLM, you can keep your sensitive data from leaving your machine. That should be of obvious value to any information worker, not just coders. In my case, my project goal was to use local models as a way to mine my own trove of articles over the years, as a kind of study of what I've written, including things I might have forgotten about. I liked the idea of keeping all the files local rather than uploading them to a cloud service.

Also: I tried vibe coding an app as a beginner - here's what Cursor and Replit taught me

Third, you can avoid fees charged by OpenAI, Google, Anthropic, and the rest. As I wrote recently, prices are set to rise for using LLMs online, so now is a good time to think about ways to do the bulk of your work offline, on your own machine, where the meter is not constantly running.

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Fourth, you have a lot more control. For example, if you do want to do programming, you can tweak LLMs, a process known as fine-tuning, to get more focused results. And you can use various locally installed tools such as LangChain, Anthropic's Claude Code tool, OpenAI's Codex coding tool, and more.

Also: Why you'll pay more for AI in 2026, and 3 money-saving tips to try

Even if you just want to do information-worker tasks such as generating reports, working from a local cache of documents or a local database gives you greater control than uploading material to the bot.

Bare-minimum bare-metal

I set out on this experiment with a bare-minimum machine, as far as what it takes to run an LLM. I wanted to find out what would happen if someone who doesn't constantly buy new machines tried to do this at home on the same machine they use for everyday tasks.

My MacBook Pro is three years old and has 16 gigabytes of RAM and a terabyte hard drive that's three-quarters full, running not the latest MacOS, but MacOS Sonoma. It's the 2021 model, model number MK193LL/A, and so, while it was top of the line when I bought it at Best Buy in January of 2023 in a close-out sale, it was already becoming yesterday's best model back then.

Also: 5 reasons I use local AI on my desktop - instead of ChatGPT, Gemini, or Claude

I know, I know: This is beyond the typical useful life of machines and beyond anyone's depreciation schedule. Nevertheless, the MacBook was a great upgrade at the time, and it has continued to perform superbly on a daily basis for the typical information-worker tasks: calendar, lots of email, lots of websites, video post-production, podcast audio recording, and more. I never have any complaints. Hey, if it ain't broke, right?

So the question was, how would this venerable but still mighty machine handle a very different new kind of workload?

Starting Ollama

The start-up screen for Ollama looks like ChatGPT, with a friendly prompt to type into, a "plus" sign to upload a document, and a drop-down menu of models you can install locally, including popular ones such as Qwen.

If you just start typing at the prompt, Ollama will automatically try to download whatever model is showing in the drop-down menu. So, don't do any typing unless you want to play model roulette.
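If you would rather pick deliberately, the command line gets around the roulette. As a quick sketch (assuming the ollama command-line tool that installs alongside the Mac app), you can fetch a named model without starting a chat:

ollama pull gpt-oss:20b

Nothing downloads until you explicitly ask for it.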

ollama-start-up-screen-jan-2026.png
Screenshot by Tiernan Ray for ZDNET

Instead, I looked through the models in the drop-down list, and I realized that some of these models weren't local -- they were in the cloud. Ollama runs a cloud service if you want its infrastructure instead of your own. That can be useful if you want to use much larger models that would overly tax your own hardware.

Per the pricing page, Ollama offers some access to the cloud in the free account, with the ability to run multiple cloud models covered by the "Pro" plan at $20 per month, and even more usage in the "Max" plan at $100 per month.

Also: This app makes using Ollama local AI on MacOS devices so easy

Sticking with locally running options, I decided to check out the broader list of models in the model directory maintained by Ollama.

At random, I chose glm-4.7-flash, from the Chinese AI startup Z.ai. Weighing in at 30 billion "parameters," or neural weights, GLM-4.7-flash would be a "small" large language model by today's standards, but not tiny, as there are open-source models with fewer than a billion parameters. (A billion parameters was big, not so long ago!)

The directory gives you the commands to download the chosen model from the Mac terminal, just by copying and pasting at the prompt, such as:

ollama run glm-4.7-flash

Be mindful of disk space. Glm-4.7-flash weighs in at 19 gigabytes of disk usage, and remember, that's small!
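If you want to keep tabs on what is eating your drive, Ollama's own commands will tell you (this assumes the default installation):

ollama list

That prints each installed model along with its size on disk, and "ollama ps" shows which models are currently loaded into memory.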

In my experience, downloading models seems reasonably swift, though not lightning fast. On a gigabit-speed cable modem connection to my home office provided by Spectrum in New York City, the model was downloading at a rate of 45 megabytes per second at one point, though it later dropped to a slower rate of throughput.

Getting to know the model

My first prompt was fairly straightforward: "What kind of large language model are you?"

I sat watching for a while as the first few characters materialized in response: "[Light bulb icon] Thinking -- Let me analyze what makes me a" and that was it.

Also: My go-to LLM tool just dropped a super simple Mac and PC app for local AI - why you should try it

Ten minutes later, it hadn't gotten much farther.

Let me analyze what makes me a large language model and how to explain this to the user.

First, I need to consider my basic nature as an AI system. I should explain that I'm designed to understand and generate human language through patterns in large datasets. The key is to be clear

And everything on the Mac had become noticeably sluggish.

Forty-five minutes later, glm-4.7-flash was still producing thoughts about thinking: "Let me structure this explanation to first state clearly…," and so on.

Trapped in prompt creep

An hour and 16 minutes later -- the model "thought" for 5,197.3 seconds -- I finally had an answer to my query about what kind of language model glm-4.7-flash was. The answer turned out not to be all that interesting for all the time spent. It didn't tell me much about glm that I couldn't have divined on my own, nor anything significant about the difference between glm and other large language models.

I figured I was done with glm at this point. Unfortunately, Ollama provides no instructions for removing a model once it's installed locally. The models are kept in a hidden folder, ".ollama," in the current user directory on MacOS, inside another folder called "models." Inside the models folder are two folders, "blobs" and "manifests." The bulk of a model is in the blobs folder. Inside the manifests is a folder, "library," containing a folder named for each model you've downloaded, and inside that, a "latest" folder.
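You can see that layout for yourself from the terminal (assuming Ollama's default storage location):

ls -lhR ~/.ollama/models

The -lhR flags print file sizes in human-readable form and recurse through the blobs and manifests folders just described.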

gpt-oss-thinking-about-itself-in-ollama-jan-2026.png
Screenshot by Tiernan Ray for ZDNET

Using the terminal, I deleted the contents of blobs and deleted the contents of each model folder, and that solved the matter. (Jack later informed me that the terminal command to get rid of any model is "ollama rm <model name>".)
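In other words, the tidy route would have been a single command, for example:

ollama rm glm-4.7-flash

which clears the model out of both blobs and manifests in one step.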

Jack had also recommended OpenAI's new open-source model, gpt-oss, in the 20-billion-parameter flavor, "20b," which he said was markedly faster running locally than others he'd tried. So, I went next to that in the directory.

Also: This is the fastest local AI I've tried, and it's not even close - how to get it

This time, after about six minutes, gpt-oss:20b produced -- at a pace not snail-like, but not swift either -- the response that it is "ChatGPT, powered by OpenAI's GPT-4 family," and so on.

That response was followed by a nice table of details. (Oddly, gpt-oss:20b told me it had "roughly 175 billion parameters," which suggests gpt-oss:20b doesn't fully grasp its own 20b identity.)

gpt-oss-reflects-on-itself-january-2026.png
Screenshot by Tiernan Ray for ZDNET

At any rate, this was fine for a simple prompt. But it was already clear that I was going to have problems with anything more ambitious. The wait for the answer was slow enough -- a kind of prompt creep, you might say -- that I didn't dare venture to add any more complexity, such as uploading an entire trove of writings.
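For what it's worth, that document-mining project wouldn't even need the chat window. Ollama exposes a local REST API on port 11434, so a script could feed articles to a model one at a time. Here is a minimal sketch, assuming the Ollama server is running, gpt-oss:20b is installed, and the prompt text stands in for a real document:

curl http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Summarize this article: ...", "stream": false}'

Setting "stream" to false returns the whole answer as one JSON blob, which makes it easy to loop over a folder of text files -- provided, of course, the machine can produce those answers in reasonable time.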

We're going to need a newer machine

OpenAI's current ChatGPT online service (running GPT5.2) tells me that a minimum configuration for a machine running gpt-oss:20b is really 32 gigabytes of DRAM. The M1 Pro silicon of the MacBook has an integrated GPU, and ChatGPT approvingly pointed out that Ollama has provided the gpt-oss:20b version with support for the Mac GPU, via a library known as the "llama.cpp backend."
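The back-of-the-envelope math bears that out. Twenty billion parameters at 16 bits apiece would be roughly 40 gigabytes for the weights alone; even at the roughly 4-bit quantization of the downloadable gpt-oss:20b, the weights come to something like 12 to 13 gigabytes, before MacOS, the apps, and the model's context window take their share of the unified memory that the CPU and GPU split between them. On a 16-gigabyte machine, that leaves almost no headroom, which is where the sluggishness comes from.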

Also: I tried the only agentic browser that runs local AI - and found only one downside

So, everything should be OK, but I really do need more DRAM than just 16 gigs. And I need to trade up from the now five-year-old M1 to an M4 or M5. It's rather fascinating to me, with three decades of writing about computers, that for an information worker, we are talking about 32 gigabytes as the minimum reasonable configuration.

As I mentioned recently, DRAM is skyrocketing in price because all those cloud data centers are consuming more and more DRAM to run large language models. So, it's me against the cloud vendors, you could say, and I'll probably be dipping into the credit card to trade up to a new computer. (Apple will give me about $599 for my M1 MacBook as a trade-in.)

While my fledgling local Ollama effort didn't yield success, it has given me a newfound appreciation for just how memory-intensive AI is. I always knew that from years of reporting on AI, but now I feel it in my bones -- that feeling when the response to the prompt takes forever scrolling across the screen.
