# News

Meta AI Releases 2.4X Faster And 56% Sleeker Llama 3.2 Model

Date: October 25, 2024

Meta has taken a significant step towards providing faster and smaller AI models to broader audiences with better mobile device compatibility.

Meta AI is making significant upgrades to its Llama models, offering 2.4X faster task processing while reducing model sizes by up to 56%. The effort responds to rising hardware processing requirements and constrained power supplies, which demand sleeker AI models with equivalent performance. High energy costs, lengthy training times, and the expensive semiconductors needed for computational power are all pressures the latest Llama 3.2 release aims to relieve.

The new AI model is built on two distinct techniques: Quantization-Aware Training (QAT) with LoRA adapters, which prioritizes accuracy, and SpinQuant, a state-of-the-art post-training quantization method that enables portability. Both versions are available in downloadable formats.
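The core idea behind QAT can be illustrated with a toy "fake quantization" step: during training, the forward pass rounds weights to the grid they will occupy at inference time, so the model learns to tolerate the rounding error. This is only a conceptual sketch with hypothetical names, not Meta's actual training code.

```python
def fake_quant(w, scale):
    """Quantize then immediately dequantize a weight value.

    During QAT the network trains against this rounded value,
    so accuracy loss from quantization is baked into training
    rather than discovered afterwards.
    """
    return round(w / scale) * scale


# A weight of 0.537 on a 0.05-wide grid trains against roughly 0.55.
w = 0.537
print(fake_quant(w, 0.05))
```

Post-training quantization (as in SpinQuant) applies a similar rounding after training instead, trading some accuracy for a much cheaper conversion process.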

Smaller models that retain strong output capabilities will dramatically enhance research and business optimization efforts, bringing cutting-edge AI technologies within reach without specialized, costlier infrastructure.

Llama 3.2 has surpassed industry benchmarks for quality and safety while achieving 2-4X faster processing speeds. The new AI model also achieved an average 56% reduction in size and 41% less memory usage compared to its previous BF16 format.

This advancement also improves compatibility with mobile devices, offering better features to mobile users within their hardware specifications. The reduction is powered by quantization, a technique that maps the model's weights and activations from 32-bit floating-point numbers to lower-bit representations.
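A minimal sketch of this mapping, using symmetric 8-bit quantization in plain Python, shows how float weights are compressed into a small integer range and recovered approximately. This is illustrative only; Meta's pipeline uses SpinQuant and QAT, not this toy routine.

```python
def quantize_int8(weights):
    """Map float weights onto the signed 8-bit range [-127, 127]."""
    # One scale per tensor; guard against an all-zero tensor.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)      # 8-bit integers plus one scale
recovered = dequantize(q, scale)       # close to the original floats
```

Storing one byte per weight instead of four (or two, for BF16) is where the memory savings come from; the rounding error introduced here is what QAT and SpinQuant work to minimize.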

Meta AI also utilizes 8-bit and 4-bit quantization strategies, which reduce memory consumption and computational power demands while retaining critical features of Llama 3, like advanced natural language processing, real-time application integrations, and visual inference tasks.
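Back-of-the-envelope arithmetic shows why these bit widths matter. The sketch below uses a hypothetical 1-billion-parameter model, not the exact Llama 3.2 figures; the article's reported ~56% average reduction is smaller than the raw 4-bit saving because not every layer is quantized to the lowest precision.

```python
def footprint_gb(n_params, bits):
    """Approximate weight storage in gigabytes for a given bit width."""
    return n_params * bits / 8 / 1e9


n = 1_000_000_000  # 1B parameters, chosen purely for illustration
bf16 = footprint_gb(n, 16)   # 2.0 GB baseline
int4 = footprint_gb(n, 4)    # 0.5 GB when fully 4-bit quantized
print(f"BF16: {bf16:.1f} GB, INT4: {int4:.1f} GB, "
      f"saving {(1 - int4 / bf16):.0%}")
```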

Meta AI has also teamed up with industry-leading partners to make the sleeker Llama 3.2 available on Qualcomm and MediaTek Systems on Chips (SoCs) with Arm CPUs. The partnership aims to deliver advanced performance on consumer-grade hardware, reaching a broader audience across popular platforms.

Llama 3.2 underscores the importance of addressing the scalability issues common to businesses and research organizations while maintaining a high level of performance. Early benchmarks indicate that quantized Llama 3.2 reaches approximately 95% of the full Llama 3 model's performance while using 60% less memory. This achievement will help establish greater credibility for AI chatbots by reducing the environmental impact of training and deploying LLMs.

By Arpit Dubey
