# News

TTT Models May Be The Next Big Thing In Generative AI

Date: July 18, 2024

TTT models could prove a more efficient alternative to transformer models, both in the energy they consume and in the amount of data they can process.

Ever since the generative AI boom, transformers have been integral to how AI chatbots perform. Whether the task is text-to-action or text-to-media generation, transformers power most of what happens in the artificial intelligence space. However, these same models are also a primary reason most AI giants are facing power supply challenges.

Transformers currently sit at the heart of every major text-generating AI model, including Anthropic’s Claude, Google’s Gemini, and OpenAI’s GPT-4o. However, they are inefficient at processing vast amounts of data because of one particular mechanism, the lookup, and running that processing and analysis over very long inputs is beyond what off-the-shelf hardware can handle.

Another drawback of transformers is the hidden state, which grows into a sea of long-form data. That data is accessed through the lookup mechanism every time the chatbot generates something, which is like rereading an entire book just to understand a single line of context.

“If you think of a transformer as an intelligent entity, then the lookup table — its hidden state — is the transformer’s brain. This specialized brain enables the well-known capabilities of transformers such as in-context learning,” said Yu Sun, a post-doc at Stanford and a co-contributor on the TTT research.
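
To make the cost of that constant rereading concrete, here is a minimal NumPy sketch of a single attention decoding step. It is purely illustrative and not the implementation of any model named above: the point is that every new token is scored against every token already cached, so the work per step keeps growing with the length of the context.

```python
import numpy as np

def attention_step(query, keys, values):
    # One decoding step of self-attention: the new token's query is scored
    # against every cached key, i.e. the model rereads the whole "book" so far.
    scores = keys @ query                      # one score per past token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values                    # weighted sum over all past tokens

rng = np.random.default_rng(0)
d = 16
keys, values = [], []

for t in range(1000):                          # generate 1,000 tokens
    token = rng.normal(size=d)
    keys.append(token)
    values.append(token)
    out = attention_step(token, np.stack(keys), np.stack(values))
    # Work per step grows with the number of cached tokens, so generating
    # n tokens costs on the order of n^2 comparisons in total.
```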

Researchers at Stanford, UC San Diego, UC Berkeley, and Meta are developing a new processing approach called test-time training (TTT). The TTT team claims that its models can not only process more data than transformers but also do so without consuming as much power. While this could emerge as a solution to the limitations of transformers, TTT systems are still in their initial development phase and need more substantial evidence to back their efficiency claims against legacy transformers.
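
The core TTT idea is to make the hidden state a small machine learning model of its own, one whose weights are updated as the sequence streams in, rather than an ever-growing lookup table. The sketch below is a hypothetical simplification, assuming a single linear layer trained with a plain reconstruction loss (an illustrative choice, not the researchers’ actual formulation); what it shows is that the hidden state stays a fixed size no matter how long the input gets.

```python
import numpy as np

def ttt_layer(tokens, lr=0.01):
    # Toy test-time-training layer: the "hidden state" is the weight matrix W
    # of a tiny linear model, trained on the fly with one gradient step per
    # token on the self-supervised loss 0.5 * ||W x - x||^2.
    d = tokens.shape[1]
    W = np.zeros((d, d))                       # fixed-size hidden state
    outputs = []
    for x in tokens:
        pred = W @ x                           # use the current state
        outputs.append(pred)
        grad = np.outer(pred - x, x)           # gradient of the loss w.r.t. W
        W -= lr * grad                         # "train" the state on this token
    return np.stack(outputs)

seq = np.random.default_rng(0).normal(size=(1000, 16))
out = ttt_layer(seq)                           # per-token cost stays constant
```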

Another alternative to transformers, and one with more evidence behind it, is the state space model (SSM). Mistral, an AI startup, has released Codestral Mamba, built on SSMs, which can perform generative AI functions more efficiently and scale to larger amounts of data. AI21 Labs is also exploring the efficiency of SSMs, making it clear that a breakthrough is badly needed in generative AI computation and energy consumption.
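
For comparison, here is an equally minimal sketch of the basic linear state-space recurrence that SSMs build on (Mamba layers add an input-dependent "selective" variant on top of this, which is omitted here). As with the earlier snippets, the matrices and sizes are made-up illustrations; the relevant property is that the state has a fixed size, so the cost per token does not grow with the context.

```python
import numpy as np

def ssm_scan(tokens, A, B, C):
    # Basic linear state-space recurrence: a fixed-size state h is updated
    # once per token, so long sequences never enlarge the state.
    h = np.zeros(A.shape[0])
    outputs = []
    for x in tokens:
        h = A @ h + B @ x                      # fold the new token into the state
        outputs.append(C @ h)                  # read the output from the state
    return np.stack(outputs)

rng = np.random.default_rng(0)
d_state, d_model = 32, 16
A = 0.9 * np.eye(d_state)                      # simple, stable dynamics
B = 0.1 * rng.normal(size=(d_state, d_model))
C = 0.1 * rng.normal(size=(d_model, d_state))
out = ssm_scan(rng.normal(size=(1000, d_model)), A, B, C)
```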

By Arpit Dubey
