The non-Transformer architecture has stood up! The first purely attention-free large model outperforms the open-source giant Llama 3.1
A large model built on the Mamba architecture is once again challenging the Transformer.
Will a Mamba-architecture model finally “stand up” this time? Since its debut in December 2023, Mamba has become a strong competitor to the Transformer.
Since then, Mamba-based models have continued to emerge, such as Codestral Mamba 7B, Mistral’s first open-source large model built on the Mamba architecture.
Today, the Technology Innovation Institute (TII) in Abu Dhabi released a new open-source Mamba model, Falcon Mamba 7B.
First, a summary of Falcon Mamba 7B’s highlights: it can process sequences of arbitrary length with no increase in memory usage, and it runs on a single 24GB A10 GPU.
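The constant-memory property comes from how a state space model reads a sequence. The sketch below is a toy state space recurrence in plain NumPy (an illustration only, with made-up dimensions, not Falcon Mamba’s actual selective-scan implementation): each incoming token updates a fixed-size hidden state, so memory stays flat no matter how long the sequence grows, whereas a Transformer’s key-value cache grows with every token.

```python
# Toy state space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# Dimensions and matrix values are made up for illustration.
import numpy as np

state_dim, input_dim = 16, 8
rng = np.random.default_rng(0)
A = np.eye(state_dim) * 0.9                        # state transition
B = rng.normal(size=(state_dim, input_dim)) * 0.1  # input projection
C = rng.normal(size=(input_dim, state_dim)) * 0.1  # output projection

h = np.zeros(state_dim)                        # fixed-size state: O(1) memory
for x in rng.normal(size=(100_000, input_dim)):  # arbitrarily long sequence
    h = A @ h + B @ x                          # recurrent update, no cache growth
    y = C @ h                                  # per-token output
print(h.shape)                                 # state size is still (16,)
```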
Falcon Mamba 7B is now available on Hugging Face. It adopts a novel causal decoder-only Mamba State Space Language Model (SSLM) architecture to handle a variety of text generation tasks.
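For readers who want to try it, here is a minimal sketch of loading the model with the Hugging Face transformers library. It assumes the repository id tiiuae/falcon-mamba-7b and a recent transformers release with Falcon Mamba support; the prompt is just an example.

```python
# Minimal sketch: load Falcon Mamba 7B from Hugging Face and generate text.
# Assumes the repo id "tiiuae/falcon-mamba-7b" and a transformers version
# that includes Falcon Mamba support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 7B weights within a 24GB A10
    device_map="auto",
)

prompt = "State space models differ from Transformers in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```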
Judging from the results, Falcon Mamba 7B outperforms leading models in its size class on several benchmarks, including Meta’s Llama 3 8B and Llama 3.1 8B, as well as Mistral 7B.