Home > EulerStack > Tutorials > Mixers > 2. Mamba in detail

2. Mamba in detail

One-Line Summary

"Carries a small dynamic state, updated token by token in one left-to-right sweep, at O(N) cost."

How Does It Work?

Where attention "looks at all past tokens simultaneously," Mamba "carries a small memory (state) and sweeps left to right, updating it as it goes," like an RNN. But unlike a classical RNN or LSTM, Mamba's state-update rule changes dynamically with each input (the selective SSM). This is what makes it powerful.
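As a toy illustration of that sweep (plain NumPy with made-up shapes and projections, not EulerStack or the official Mamba code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_state = 6, 4                       # toy sequence length and state size

A = -np.abs(rng.normal(size=d_state))   # stable, negative diagonal A (continuous-time)
x = rng.normal(size=N)                  # toy single-channel input sequence

def softplus(z):
    return np.log1p(np.exp(z))

h = np.zeros(d_state)                   # the small carried memory
for k in range(N):
    # "Selective": the step size Delta, and through it the update rule,
    # depends on the current input x[k] (here via a made-up projection).
    delta = softplus(0.5 * x[k])
    A_bar = np.exp(delta * A)           # zero-order-hold discretization of A
    B_bar = delta                       # simplified B_bar = Delta * B with B = 1
    h = A_bar * h + B_bar * x[k]        # h[k] = A_bar * h[k-1] + B_bar * x[k]
```

After the loop, `h` holds everything the layer remembers about the sequence, in a fixed `d_state`-sized vector.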

Core idea:
1. State Space Model (SSM): discretize h'(t) = A·h(t) + B·x(t) into the recurrence h[k] = Ā·h[k-1] + B̄·x[k].
2. Selective: the discretized Ā and B̄ depend on the input x (via an input-dependent step size Δ and B) instead of being fixed.
3. Parallel scan: the left-to-right dependency is computed in parallel on the GPU with an associative scan.
4. Hardware-aware: custom CUDA kernels keep the state in fast on-chip SRAM.
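Point 3 works because the recurrence h[k] = Ā[k]·h[k-1] + B̄[k]·x[k] composes associatively, so prefixes can be combined in any grouping. A scalar sketch with toy coefficients (the real kernel is a CUDA scan):

```python
import numpy as np
from functools import reduce

def combine(p, q):
    # Compose two affine updates h -> a*h + b, with q applied after p.
    # Associativity of this operator is what enables a parallel scan.
    a_p, b_p = p
    a_q, b_q = q
    return (a_q * a_p, a_q * b_p + b_q)

rng = np.random.default_rng(1)
N = 8
a = rng.uniform(0.5, 0.9, size=N)       # toy decay coefficients (A_bar[k])
b = rng.normal(size=N)                  # toy drive terms (B_bar[k] * x[k])

# Sequential left-to-right sweep (what an RNN would do).
h = 0.0
for k in range(N):
    h = a[k] * h + b[k]

pairs = list(zip(a, b))

def tree_fold(ps):
    # Balanced grouping: the shape a parallel scan exploits for O(log N) depth.
    if len(ps) == 1:
        return ps[0]
    m = len(ps) // 2
    return combine(tree_fold(ps[:m]), tree_fold(ps[m:]))

left = reduce(combine, pairs)           # strictly left-to-right grouping
bal = tree_fold(pairs)                  # balanced-tree grouping
assert np.isclose(left[1], h) and np.isclose(bal[1], h)
```

Because both groupings give the same result (starting from h = 0, the final state is the b-component of the composed pair), the GPU can evaluate the balanced tree with logarithmic depth instead of sweeping token by token.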

Result: O(N) linear scaling with a fixed-size state (d_state); at inference, a small per-layer state replaces the growing KV cache.
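Back-of-envelope arithmetic for that last point, using the d_state = 128, d_conv = 4, expand = 2 values from the template below plus an assumed d_model and context length (element counts per layer; exact state layouts vary by implementation):

```python
d_model, n_ctx = 2048, 32768             # assumed model width and context length
d_state, d_conv, expand = 128, 4, 2      # from the EulerStack template below
d_inner = expand * d_model

# Mamba-style per-layer inference state: SSM state plus a short conv buffer.
ssm_state_elems = d_inner * d_state + d_conv * d_inner

# Attention per-layer KV cache at full context: keys and values for every token.
kv_cache_elems = 2 * n_ctx * d_model

print(ssm_state_elems)                   # constant, independent of sequence length
print(kv_cache_elems)                    # grows linearly with n_ctx
print(round(kv_cache_elems / ssm_state_elems))
```

Under these assumptions the KV cache is a few hundred times larger than the Mamba state, and the gap widens linearly as the context grows.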

Strengths

- O(N) compute and constant memory in sequence length: no attention matrix, no growing KV cache.
- Constant-cost streaming inference: each new token only updates the fixed-size state.
- Strong on long, continuous sequences (audio, time series, DNA).

Weaknesses

- Weaker than attention at precise in-context recall and copying (e.g., retrieving an exact earlier token), which is why hybrid Mamba + attention stacks are common.
- The fixed-size state must compress the past, so distant details can be lost.
- Younger ecosystem and tooling than attention-based Transformers.

Mamba1 vs Mamba2

- Mamba1: diagonal input-dependent SSM computed with a selective-scan kernel; typically a small state (d_state around 16).
- Mamba2: reformulates the SSM via state space duality (SSD), restricting A to a scalar per head; this enables matmul-friendly kernels, much larger states (e.g., d_state = 128), and a multi-head structure analogous to attention heads.

Real-World Use

- Production models built on Mamba include AI21's Jamba (hybrid Mamba + attention + MoE) and Mistral's Codestral Mamba (pure Mamba2, for code).
- Long-sequence domains such as genomics, audio, and time-series modeling.

When Is Mamba Good?

Scenario | Mamba quality
Long doc summarization (≥ 32K) | ★★★★★ (linear cost)
Real-time streaming | ★★★★★ (tiny state)
Time series / DNA | ★★★★★ (long sequential input)
Coding (exact symbol matching) | ★★★ (hybrid with attention recommended)
Short chat (≤ 4K) | ★★★ (little benefit over attention)

EulerStack YAML

layer_templates:
  mamba_layer:
    mixer:
      type: mamba
      mamba:
        variant: mamba2
        d_state: 128
        d_conv: 4
        expand: 2
    ffn:
      type: gated_mlp
      activation: swiglu
    state:
      ssm_state: true

Papers

- Gu & Dao (2023), "Mamba: Linear-Time Sequence Modeling with Selective State Spaces."
- Dao & Gu (2024), "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality." (Mamba2)