Meta Llama: Everything You Need To Know About The Open Generative AI Model

Like every Big Tech company these days, Meta has its own flagship generative AI model, called Llama. Llama is somewhat unique among major models in that it's "open," meaning developers can download and use it however they please (with certain limitations). That's in contrast to models like Anthropic's Claude, Google's Gemini, xAI's Grok, and most of OpenAI's ChatGPT models, which can only be accessed via APIs.

In the interest of giving developers choice, however, Meta has also collaborated with vendors, including AWS, Google Cloud, and Microsoft Azure, to make cloud-hosted versions of Llama available. In addition, the company publishes tools, libraries, and recipes in its Llama cookbook to help developers fine-tune, evaluate, and adapt the models to their domain. With newer generations like Llama 3 and Llama 4, these capabilities have expanded to include native multimodal support and broader cloud rollouts.

Here's everything you need to know about Meta's Llama, from its capabilities and editions to where you can use it. We'll keep this post updated as Meta releases upgrades and introduces new dev tools to support the model's use.

What is Llama?

Llama is a family of models, not just one. The latest version is Llama 4; it was released in April 2025 and includes three models:

  • Scout: 17 billion active parameters, 109 billion total parameters, and a context window of 10 million tokens.
  • Maverick: 17 billion active parameters, 400 billion total parameters, and a context window of 1 million tokens.
  • Behemoth: Not yet released, but it will have 288 billion active parameters and 2 trillion total parameters.

(In data science, tokens are subdivided bits of raw data, like the syllables "fan," "tas," and "tic" in the word "fantastic.")
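The syllable example above can be made concrete with a toy tokenizer. Real Llama tokenizers use a byte-pair-encoding vocabulary learned from data; the tiny vocabulary and greedy longest-match rule below are purely illustrative, not Meta's actual algorithm.

```python
# Toy illustration of subword tokenization. Real Llama tokenizers use a
# byte-pair-encoding vocabulary learned from data; this tiny vocabulary
# and greedy longest-match rule are purely illustrative.
def toy_tokenize(word, vocab):
    """Greedily split a word into the longest subword pieces found in vocab,
    falling back to single characters."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

print(toy_tokenize("fantastic", {"fan", "tas", "tic"}))
# ['fan', 'tas', 'tic'] -- one word becomes three tokens
```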

A model's context, or context window, refers to the input data (e.g., text) that the model considers before generating output (e.g., additional text). Long context can prevent models from "forgetting" the content of recent docs and data, and from veering off topic and extrapolating wrongly. However, longer context windows can also result in the model "forgetting" certain safety guardrails and being more prone to produce content that is in line with the conversation, which has led some users toward delusional thinking.

For reference, the 10-million-token context window that Llama 4 Scout promises roughly equals the text of about 80 average novels. Llama 4 Maverick's 1-million-token context window equals about eight novels.
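The novel comparison implies a figure of roughly 125,000 tokens per average novel (derived from the 80-novel estimate, not a number Meta states). The arithmetic works out as:

```python
# ~125,000 tokens per novel, implied by the 80-novel estimate for a
# 10-million-token window (this per-novel figure is an assumption).
TOKENS_PER_NOVEL = 10_000_000 // 80

def novels_in_context(context_tokens, tokens_per_novel=TOKENS_PER_NOVEL):
    """How many average-length novels fit in a given context window."""
    return context_tokens / tokens_per_novel

print(novels_in_context(10_000_000))  # Scout: 80.0
print(novels_in_context(1_000_000))   # Maverick: 8.0
```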

All of the Llama 4 models were trained on "large amounts of unlabeled text, image, and video data" to give them "broad visual understanding," as well as on 200 languages, according to Meta.

Llama 4 Scout and Maverick are Meta's first open-weight natively multimodal models. They're built using a "mixture-of-experts" (MoE) architecture, which reduces computational load and improves efficiency in training and inference. Scout, for example, has 16 experts, and Maverick has 128 experts.
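The MoE idea can be sketched in a few lines: a router scores every expert for each token, and only the top-scoring experts actually run, so most of the model's parameters sit idle on any given token. The routing function and scores below are an illustrative sketch, not Meta's implementation.

```python
# Minimal sketch of top-k expert routing in a mixture-of-experts layer.
# Scores are hypothetical router outputs, not real model values.
def route_tokens(token_scores, k=1):
    """Return the indices of the top-k experts for one token,
    given a per-expert score list from the router."""
    ranked = sorted(range(len(token_scores)),
                    key=lambda e: token_scores[e], reverse=True)
    return ranked[:k]

# A 16-expert layer (like Scout's) with top-1 routing: only one expert
# does any work for this token; the other 15 are skipped entirely.
scores = [0.1, 2.3, -0.5, 1.7] + [0.0] * 12
print(route_tokens(scores, k=1))  # [1]
```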

Llama 4 Behemoth includes 16 experts, and Meta refers to it as a teacher for the smaller models.

Llama 4 builds on the Llama 3 series, which included the 3.1 and 3.2 models widely used for instruction-tuned applications and cloud deployment.

What can Llama do?

Like other generative AI models, Llama can perform a range of different assistive tasks, like coding and answering basic math questions, as well as summarizing documents in at least 12 languages (Arabic, English, German, French, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese). Most text-based workloads, such as analyzing large files like PDFs and spreadsheets, are within its purview, and all Llama 4 models support text, image, and video input.

Llama 4 Scout is designed for longer workflows and massive data analysis. Maverick is a generalist model that is better at balancing reasoning power and response speed, and is suitable for coding, chatbots, and technical assistants. And Behemoth is designed for advanced research, model distillation, and STEM tasks.

Llama models, including Llama 3.1, can be configured to leverage third-party applications, tools, and APIs to perform tasks. They are trained to use Brave Search for answering questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. However, these tools require proper configuration and are not automatically enabled out of the box.
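Conceptually, tool use means the model emits a structured tool call that the developer's own code must actually execute. The dispatcher below is a hypothetical sketch of that wiring; the tool names mirror the ones mentioned above, but the functions are stubs and none of this is Meta's API.

```python
# Hypothetical sketch of wiring model-emitted tool calls to real tools.
# The tool names mirror those Meta trains on; the functions are stubs.
def brave_search(query):
    return f"search results for: {query}"

def wolfram_alpha(query):
    return f"computed answer for: {query}"

def python_interpreter(code):
    return f"executed: {code}"

TOOLS = {
    "brave_search": brave_search,
    "wolfram_alpha": wolfram_alpha,
    "python_interpreter": python_interpreter,
}

def dispatch(tool_call):
    """Run a tool call of the form {'name': ..., 'arguments': ...}."""
    tool = TOOLS.get(tool_call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {tool_call['name']}")
    return tool(tool_call["arguments"])

print(dispatch({"name": "brave_search", "arguments": "latest Llama release"}))
```

In a real deployment, the stubs would call the actual Brave Search and Wolfram Alpha APIs, and the model's output would be parsed into the `tool_call` dict before dispatching.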

Where can I use Llama?

If you're looking to simply chat with Llama, it's powering the Meta AI chatbot experience on Facebook Messenger, WhatsApp, Instagram, Oculus, and Meta.ai in 40 countries. Fine-tuned versions of Llama are used in Meta AI experiences in over 200 countries and territories.

The Llama 4 models Scout and Maverick are available on Llama.com and from Meta's partners, including the AI developer platform Hugging Face. Behemoth is still in training. Developers building with Llama can download, use, or fine-tune the model across most of the popular cloud platforms. Meta claims it has more than 25 partners hosting Llama, including Nvidia, Databricks, Groq, Dell, and Snowflake. And while "selling access" to Meta's openly available models isn't Meta's business model, the company makes some money through revenue-sharing agreements with model hosts.

Some of these partners have built additional tools and services on top of Llama, including tools that let the models reference proprietary data and enable them to run at lower latencies.

Importantly, the Llama license constrains how developers can deploy the model: App developers with more than 700 million monthly users must request a special license from Meta, which the company will grant at its discretion.

In May 2025, Meta launched a new program to incentivize startups to adopt its Llama models. Llama for Startups gives companies support from Meta's Llama team and access to potential funding.

Alongside Llama, Meta provides tools intended to make the model "safer" to use:

  • Llama Guard, a moderation framework.
  • Prompt Guard, a tool to protect against prompt injection attacks.
  • CyberSecEval, a cybersecurity risk assessment suite.
  • Llama Firewall, a security guardrail designed to enable building secure AI systems.
  • Code Shield, which provides support for inference-time filtering of insecure code produced by LLMs.

Llama Guard tries to detect potentially problematic content either fed into, or generated by, a Llama model, including content relating to criminal activity, child exploitation, copyright violations, hate, self-harm, and sexual abuse. That said, it's clearly not a silver bullet, since Meta's own previous guidelines allowed the chatbot to engage in sensual and romantic chats with minors, and some reports show those turned into sexual conversations. Developers can customize the categories of blocked content and apply the blocks to all the languages Llama supports.
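At the configuration level, customizing a moderation layer like Llama Guard amounts to choosing which hazard categories to enforce on classifier output. The category names and filter below are a hypothetical sketch of that pattern, not Llama Guard's real taxonomy or API.

```python
# Hypothetical sketch of per-category moderation configuration.
# Category names echo the article, not Llama Guard's actual taxonomy.
DEFAULT_CATEGORIES = {"criminal_activity", "child_exploitation",
                      "copyright_violation", "hate", "self_harm",
                      "sexual_abuse"}

def moderate(classified_labels, enabled_categories=DEFAULT_CATEGORIES):
    """Given the labels a safety classifier assigned to a message,
    decide whether to block it under the enabled categories."""
    violations = set(classified_labels) & set(enabled_categories)
    return {"blocked": bool(violations), "violations": sorted(violations)}

# A developer who chooses not to enforce the copyright category:
custom = DEFAULT_CATEGORIES - {"copyright_violation"}
print(moderate(["copyright_violation"], custom))
# {'blocked': False, 'violations': []}
```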

Like Llama Guard, Prompt Guard can block text intended for Llama, but only text meant to "attack" the model and get it to behave in undesirable ways. Meta claims that Prompt Guard can defend against explicitly malicious prompts (i.e., jailbreaks that attempt to get around Llama's built-in safety filters) in addition to prompts that contain "injected inputs." The Llama Firewall works to detect and prevent risks like prompt injection, insecure code, and risky tool interactions. And Code Shield helps mitigate insecure code suggestions and offers secure command execution for seven programming languages.

As for CyberSecEval, it's less a tool than a collection of benchmarks to measure model security. CyberSecEval can assess the risk a Llama model poses (at least according to Meta's criteria) to app developers and end users in areas like "automated social engineering" and "scaling offensive cyber operations."

Llama’s limitations

Llama comes with certain risks and limitations, like all generative AI models. For example, while its most recent model has multimodal features, those are mainly limited to the English language for now.

Zooming out, Meta used a dataset of pirated e-books and articles to train its Llama models. A federal judge recently sided with Meta in a copyright lawsuit brought against the company by 13 book authors, ruling that the use of copyrighted works for training fell under "fair use." However, if Llama regurgitates a copyrighted snippet and someone uses it in a product, they could potentially be infringing on copyright and be liable.

Meta also controversially trains its AI on Instagram and Facebook posts, photos, and captions, and makes it difficult for users to opt out.

Programming is another area where it's wise to tread lightly when using Llama. That's because Llama might, perhaps more so than its generative AI counterparts, produce buggy or insecure code. On LiveCodeBench, a benchmark that tests AI models on competitive coding problems, Meta's Llama 4 Maverick model achieved a score of 40%. That's compared to 85% for OpenAI's GPT-5 high and 83% for xAI's Grok 4 Fast.

As always, it's best to have a human expert review any AI-generated code before incorporating it into a service or software.

Finally, as with other AI models, Llama models are still guilty of generating plausible-sounding but false or misleading information, whether that's in coding, legal guidance, or emotional conversations with AI personas.

This was originally published on September 8, 2024, and is updated regularly with new information.
