Meta's VP of generative AI, Ahmad Al-Dahle, took to the social network X today to announce the release of Llama 3.3, the latest open-source multilingual large language model (LLM) from the parent company of Facebook, Instagram, WhatsApp and Quest VR.
He wrote: “Llama 3.3 improves core performance at a significantly lower cost, making it even more accessible to the entire open-source community.”
With 70 billion parameters – the settings that control the model's behavior – Llama 3.3 delivers results on par with Meta's 405B-parameter Llama 3.1 model from this summer, but at a fraction of the cost and computational effort, such as the GPU capacity required to run the model at inference time.
It's designed to deliver high performance and accessibility in a smaller package than previous foundation models.
Meta's Llama 3.3 is available under the Llama 3.3 Community License Agreement, which grants a non-exclusive, royalty-free license to use, reproduce, distribute and modify the model and its outputs. Developers incorporating Llama 3.3 into products or services must provide appropriate attribution, such as “Built with Llama,” and adhere to an acceptable use policy that prohibits activities such as generating harmful content, violating laws, or enabling cyberattacks. While the license is generally free, organizations with more than 700 million monthly active users must obtain a commercial license directly from Meta.
A statement from the AI at Meta team underscores this vision: “Llama 3.3 delivers leading performance and quality for text-based use cases at a fraction of the inference cost.”
How much in savings are we actually talking about? Here's some rough back-of-the-envelope math:
According to the Substratus blog (from the makers of the open-source cross-cloud substrate), Llama 3.1-405B requires between 243 GB and 1,944 GB of GPU memory. Meanwhile, per the same blog, the older Llama 2-70B requires between 42 and 168 GB of GPU memory, though some claim it runs on as little as 4 GB or, as Exo Labs has shown, on a few Mac computers with M4 chips and no discrete GPUs at all.
So if the GPU savings seen with lower-parameter models hold here, those looking to deploy Meta's most powerful open-source Llama models can expect savings of up to 1,940 GB of GPU memory, or potentially a 24-fold reduction in GPU load for a standard 80 GB Nvidia H100 GPU.
At an estimated $25,000 per H100 GPU, that's potentially up to $600,000 in upfront GPU cost savings, not to mention ongoing power costs.
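The back-of-the-envelope arithmetic behind those figures can be sketched as follows. All numbers are the article's own estimates; the calculation itself is purely illustrative.

```python
# Rough estimate of the GPU savings described above, using the article's figures.
MEM_405B_GB = 1944        # upper-end GPU memory cited for Llama 3.1-405B
MEM_70B_MIN_GB = 4        # lowest claimed footprint for a 70B-class model
H100_MEMORY_GB = 80       # a standard Nvidia H100 card
H100_PRICE_USD = 25_000   # estimated price per H100

saved_gb = MEM_405B_GB - MEM_70B_MIN_GB          # up to 1,940 GB of memory saved
h100s_for_405b = MEM_405B_GB / H100_MEMORY_GB    # about 24 cards' worth of memory
upfront_savings = round(h100s_for_405b) * H100_PRICE_USD

print(saved_gb)                # 1940
print(round(h100s_for_405b))   # 24
print(upfront_savings)         # 600000
```

The "24-fold" and "$600,000" figures both fall out of dividing the 405B model's worst-case memory footprint by a single H100's 80 GB.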
A high-performance model in a small form factor
According to Meta AI on X, the Llama 3.3 model significantly outperforms the same-size Llama 3.1-70B as well as Amazon's new Nova Pro model on several benchmarks, such as multilingual dialogue, reasoning and other advanced natural language processing (NLP) tasks (Nova outperforms it on HumanEval coding tasks).
According to the information Meta provides in the “model card” published on its website, Llama 3.3 was pretrained on 15 trillion tokens of “publicly available” data and fine-tuned on over 25 million synthetically generated examples.
The model's development consumed 39.3 million GPU hours on H100-80GB hardware, which Meta says highlights its commitment to energy efficiency and sustainability.
Llama 3.3 leads on the MGSM multilingual reasoning benchmark with an accuracy of 91.1%, demonstrating its effectiveness in supporting languages such as German, French, Italian, Hindi, Portuguese, Spanish and Thai, in addition to English.
Cost-effective and environmentally conscious
Llama 3.3 is specifically optimized for low-cost inference, with token generation costs as low as $0.01 per million tokens.
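To put that quoted price in perspective, here is a small illustrative calculation (the $0.01-per-million-tokens figure is the article's; actual provider pricing varies):

```python
# Illustrative cost math at the quoted $0.01 per million generated tokens.
PRICE_PER_M_TOKENS = 0.01

def generation_cost(n_tokens: int) -> float:
    """Cost in USD to generate n_tokens at the quoted rate."""
    return n_tokens / 1_000_000 * PRICE_PER_M_TOKENS

# Generating enough text to fill the model's full 128K-token context:
print(f"${generation_cost(128_000):.6f}")        # roughly a tenth of a cent

# A workload of one billion generated tokens per month:
print(f"${generation_cost(1_000_000_000):.2f}")  # $10.00
```

At that rate, even very large generation workloads cost on the order of dollars, which is the basis for the affordability comparison below.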
This makes the model highly competitive with industry peers such as GPT-4 and Claude 3.5, offering greater affordability for developers looking to deploy sophisticated AI solutions.
Meta has also highlighted the environmental responsibility of this release. Despite the intensive training process, the company used renewable energy to offset greenhouse gas emissions, resulting in net-zero emissions for the training phase. Location-based emissions totaled 11,390 tons of CO2 equivalent, but Meta's renewable energy initiatives ensured sustainability.
Advanced features and deployment options
The model introduces several improvements, including an extended context window of 128,000 tokens (comparable to GPT-4o, or roughly 400 pages of book text), making it suitable for long-form content generation and other advanced use cases.
Its architecture includes Grouped Query Attention (GQA), improving scalability and performance during inference.
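The core idea of GQA, in a minimal sketch: several query heads share a single key/value head, so the KV cache that must be held in GPU memory during inference shrinks by the grouping factor. The shapes and values below are toy assumptions for illustration, not Llama 3.3's actual configuration.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Toy GQA: q has shape (n_q_heads, seq, d); k and v have shape
    (n_kv_heads, seq, d), with n_kv_heads dividing n_q_heads."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads          # query heads per shared KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # the KV head this query head shares
        scores = q[h] @ k[kv].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        out[h] = weights @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))   # 8 query heads
k = rng.normal(size=(2, 4, 16))   # only 2 KV heads: a 4x smaller KV cache
v = rng.normal(size=(2, 4, 16))
print(grouped_query_attention(q, k, v).shape)  # (8, 4, 16)
```

With 8 query heads but only 2 KV heads, the cached keys and values are a quarter the size of standard multi-head attention, which is where the inference-time scalability gain comes from.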
Llama 3.3 is aligned with user preferences for safety and helpfulness using reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). This alignment ensures robust refusal of inappropriate prompts and assistant-like behavior optimized for real-world applications.
Llama 3.3 is already available for download from Meta, Hugging Face, GitHub, and other platforms, with integration options for researchers and developers. Meta also offers resources such as Llama Guard 3 and Prompt Guard to help users deploy the model safely and responsibly.