May 9, 2024
Transformer architectures have been the dominant paradigm in LLMs, driving exceptional advances in research and development. Whether transformers will be the final architecture on the road to AGI, or whether a new architectural paradigm will displace them, has been a passionate topic of debate in the AI community. Recently, researchers from Princeton University and Carnegie Mellon proposed the Mamba architecture, based on state space models (SSMs), which has become the most viable alternative to transformers. But instead of thinking about SSMs vs. transformers, could we try to combine the two? This is the thesis behind a new model called Jamba, released by the ambitious team at AI21 Labs. Jamba combines transformers and SSMs in a single architecture that could open new avenues for the future of LLMs.

Until this point, the creation of LLMs has largely hinged on traditional Transformer structures, known for their robust capabilities. However, these structures have two well-known drawbacks: a memory footprint that grows with context length (the key-value cache) and attention compute that becomes increasingly expensive on long sequences.
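To make the hybrid idea concrete, here is a minimal PyTorch sketch of what interleaving attention and SSM-style layers in a single stack might look like. It is an illustration under simplifying assumptions: a toy per-channel linear recurrence stands in for a real SSM layer, and the layer ratio, dimensions, and class names are arbitrary. This is not Jamba's actual implementation, which additionally uses mixture-of-experts layers.

```python
# Illustrative sketch of a hybrid Transformer/SSM stack (not Jamba's real code).
import torch
import torch.nn as nn


class SimpleSSMBlock(nn.Module):
    """Toy state-space-style layer: a learned per-channel linear recurrence."""

    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Per-channel decay, kept in (0, 1) via sigmoid at runtime.
        self.log_decay = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        u = self.in_proj(x)
        a = torch.sigmoid(self.log_decay)        # (d_model,)
        h = torch.zeros_like(u[:, 0])            # recurrent state, (batch, d_model)
        outputs = []
        for t in range(u.size(1)):               # sequential scan over time
            h = a * h + (1 - a) * u[:, t]
            outputs.append(h)
        return x + self.out_proj(torch.stack(outputs, dim=1))  # residual


class AttentionBlock(nn.Module):
    """Standard pre-norm self-attention block with a residual connection."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class HybridStack(nn.Module):
    """Mostly SSM-style layers, with an attention layer every few blocks."""

    def __init__(self, d_model: int = 256, n_heads: int = 4,
                 n_layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model, n_heads) if (i + 1) % attn_every == 0
            else SimpleSSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack()
    tokens = torch.randn(2, 16, 256)    # (batch, seq_len, d_model)
    print(model(tokens).shape)          # torch.Size([2, 16, 256])
```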