Date: October 25, 2024
Meta has taken a significant step towards providing faster and smaller AI models to broader audiences with better mobile device compatibility.
Meta AI is making significant upgrades to its Llama models, offering 2-4X faster processing while cutting model sizes by up to 56%. The effort responds to rising hardware requirements and power constraints, which demand leaner AI models with equivalent performance. High energy costs, lengthy training times, and the expensive semiconductors needed for computation are exactly the pressures the latest quantized Llama 3.2 models are designed to relieve.
The new models are built on two distinct techniques: Quantization-Aware Training (QAT) with LoRA adapters, which prioritizes accuracy, and SpinQuant, a state-of-the-art post-training quantization method that prioritizes portability. Both versions are available for download.
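To make the QAT idea concrete, here is a minimal, generic sketch of the "fake quantization" step that quantization-aware training inserts into the forward pass: weights are rounded to the quantized grid during training so the model learns to tolerate the precision loss. This is an illustrative sketch of the general technique, not Meta's implementation, and the function name and bit width are assumptions for the example.

```python
# Generic "fake quantization" sketch (not Meta's implementation):
# quantize weights to an integer grid, then immediately dequantize,
# so training sees the rounding error that inference will incur.
def fake_quantize(w, num_bits=4):
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 7 for 4-bit
    scale = max(abs(v) for v in w) / qmax or 1.0  # guard all-zero input
    return [round(v / scale) * scale for v in w]  # round to grid, map back

weights = [0.5, -1.0, 0.25, 0.0]
approx = fake_quantize(weights, num_bits=4)  # close to the originals
```

In real QAT the rounding is applied inside the training graph with a straight-through gradient estimator, so the optimizer can still update the underlying full-precision weights.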
Shrinking model sizes while preserving output quality will dramatically enhance research and business optimization efforts, putting cutting-edge AI within reach without specialized, costlier infrastructure.
Llama 3.2 has surpassed industry benchmarks for quality and safety while achieving 2-4X faster processing speeds. The new models also achieve an average 56% reduction in size and 41% lower memory usage compared to the original BF16 format.
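Some back-of-envelope arithmetic shows what the reported 56% average size reduction means in practice. The 1B parameter count below is an assumption chosen for the example; the 56% figure is the average reduction reported above.

```python
# Illustrative size arithmetic: BF16 stores 2 bytes per parameter,
# so a hypothetical 1B-parameter model occupies about 2 GB on disk.
params = 1_000_000_000               # assumed 1B-parameter model
bf16_gb = params * 2 / 1e9           # 2 bytes/param -> 2.0 GB
quantized_gb = bf16_gb * (1 - 0.56)  # 56% smaller -> 0.88 GB
print(f"BF16: {bf16_gb:.2f} GB, quantized: {quantized_gb:.2f} GB")
```

A drop from roughly 2 GB to under 1 GB is the difference between a model that strains a phone's memory and one that fits comfortably alongside other apps.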
This advancement also improves compatibility with mobile devices, offering better features for mobile users within their hardware specifications. The reduction is powered by quantization, a technique that maps the model's weights and activations from the 16-bit floating-point (BF16) format to lower-bit representations.
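The core mapping can be sketched in a few lines. This is a minimal, generic example of symmetric per-tensor quantization to 8-bit integers; real schemes use per-channel or per-group scales and calibrated activation ranges, and the function names here are assumptions for illustration.

```python
# Minimal symmetric per-tensor quantization sketch: map floats to
# 8-bit integers with a single scale, and map them back.
def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127  # one scale per tensor
    q = [round(v / scale) for v in values]     # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]              # approximate originals

weights = [0.42, -1.27, 0.03, 0.9]
q, s = quantize_int8(weights)
approx = dequantize(q, s)  # recovers the inputs up to rounding error
```

Each value now needs one byte instead of two (BF16) or four (FP32), at the cost of a small, bounded rounding error.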
Meta AI also uses 8-bit and 4-bit quantization strategies, which reduce memory consumption and compute demands while retaining Llama 3.2's critical capabilities, such as advanced natural language processing, real-time application integration, and visual inference tasks.
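At 4 bits, a single scale per tensor would be too coarse, so low-bit schemes typically quantize weights in small groups, each with its own scale. The sketch below illustrates that group-wise idea generically; the group size of 4 is chosen for readability (real kernels use larger groups), and the function name is an assumption.

```python
# Generic group-wise 4-bit quantization sketch: each group of weights
# gets its own scale, limiting the damage a single outlier can do.
def quantize_4bit_groupwise(weights, group_size=4):
    out = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # int4 range [-7, 7]
        out.append(([round(w / scale) for w in group], scale))
    return out

packed = quantize_4bit_groupwise([0.1, -0.7, 0.3, 0.05, 2.0, -2.0, 0.5, 0.25])
```

Storing a per-group scale adds a small overhead, but it keeps quantization error local: one large weight only coarsens its own group, not the whole tensor.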
Meta AI has also partnered with industry leaders to make the leaner Llama 3.2 models available on Qualcomm and MediaTek Systems on Chips (SoCs) with Arm CPUs. The partnership aims to bring advanced performance to consumer-grade hardware, tapping a broader audience and popular platforms. Llama 3.2 underscores the importance of addressing the scalability issues common to businesses and research organizations while maintaining a high level of performance. Early benchmarks indicate that quantized Llama 3.2 delivers approximately 95% of the full model's performance while using 60% less memory. This achievement will help establish higher credibility for AI chatbots by reducing the environmental impact of training and deploying LLMs.
By Arpit Dubey
Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. Armed with a Bachelor's in Business Administration, a knack for crafting compelling narratives, and a sharp specialization in everything from predictive analytics to FinTech (not to mention SaaS, healthcare, and more), Arpit crafts content that's as strategic as it is compelling. With a Logician mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.