Date: July 18, 2024
TTT models could be a more efficient alternative to transformers, both in energy consumption and in processing vast amounts of data.
Ever since the boom of generative AI, transformers have been an integral part of AI chatbot performance. Whether it is text-to-action or text-to-media generation, transformers power nearly everything happening in the artificial intelligence space. However, these same transformers are also the primary reason most AI giants are facing power supply challenges.
Transformers are currently at the heart of every major text-generating AI model, such as Anthropic’s Claude, Google’s Gemini, and OpenAI’s GPT-4o. However, transformers are not efficient at processing vast amounts of data, largely because of one particular operation known as the lookup. Off-the-shelf hardware struggles to keep up when transformers process and analyze data at that scale.
Another drawback of transformers is the hidden state, which amounts to a sea of long-form data. This data is accessed through the lookup every time the AI chatbot performs a task, which is like rereading an entire book just to understand one line of context.
“If you think of a transformer as an intelligent entity, then the lookup table — its hidden state — is the transformer’s brain. This specialized brain enables the well-known capabilities of transformers such as in-context learning,” said Yu Sun, a post-doc at Stanford and a contributor to the TTT research.
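To picture the difference, here is a minimal, illustrative Python sketch. It is not taken from the TTT paper; the function names, shapes, and learning rate are assumptions for demonstration only. A transformer-style lookup rereads every stored token for each new query, while a TTT-style step compresses the history into a fixed-size state, here a small weight matrix updated with one gradient step per incoming token.

import numpy as np

def attention_read(query, keys, values):
    # Transformer-style lookup: the new token is scored against every
    # stored token, so per-token cost grows with the length of the context.
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def ttt_like_step(W, x, lr=0.1):
    # Loose sketch of the test-time-training idea (an assumption, not the
    # paper's exact method): the hidden state is itself a small model (the
    # matrix W) that takes one gradient step on a self-supervised
    # reconstruction loss for each token, so the history is compressed into
    # a fixed-size state instead of an ever-growing lookup table.
    pred = W @ x
    grad = np.outer(pred - x, x)   # gradient of 0.5 * ||W @ x - x||^2 w.r.t. W
    return W - lr * grad

In the first function, the work per token scales with how many tokens have already been stored; in the second, the state W stays the same size no matter how long the context gets, which is the intuition behind the efficiency claims described below.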
Researchers at Stanford, UC San Diego, UC Berkeley, and Meta are developing a new architecture called test-time training (TTT). The TTT team claims that its models can not only process more data than transformers but also do so without consuming as much power. While this could emerge as a solution to the limitations of transformers, TTT systems are still in their initial development phase and need more substantial evidence to back their efficiency claims against legacy transformers.
Another alternative to transformers, one with more data behind it, is the state space model (SSM). Mistral, an AI startup, has released Codestral Mamba, a model based on SSMs that can perform generative AI functions more efficiently and scale to larger amounts of data. AI21 Labs is also exploring SSMs, making it evident that a breakthrough is much needed in generative AI computation and energy consumption.
By Arpit Dubey
Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. Armed with a Bachelor's in Business Administration, a knack for crafting compelling narratives, and a specialization in everything from predictive analytics to FinTech, SaaS, and healthcare, Arpit crafts content that's as strategic as it is compelling. With a Logician mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.