0. Where EulerStack Fits — an ADL for LLMs
Read this first. Understanding what kind of tool EulerStack is makes every later tutorial fall into place.
1. One-line summary
EulerStack is an Architecture Description Language (ADL) for LLMs. It lifts architecture out of the PyTorch files where structure, training, and serving usually live tangled together, and expresses it in a declarative language built for that purpose. This is the same abstraction step the semiconductor industry took when Verilog and VHDL replaced schematics and hand-written simulation C in chip design.
2. Why a design language?
Suppose you want to try DeepSeek-V3's MLA attention. Today's workflow:
- Fork `modeling_llama.py` from HuggingFace `transformers`.
- Rewrite `LlamaAttention`: split `W_q / W_k / W_v` into `W_q`, `W_kv_latent`, `W_k_up`, `W_v_up`.
- Update `forward` and the KV-cache path because the cache shape changed.
- Add state-dict key mapping for `save_pretrained` / `from_pretrained`.
- Around 200-300 lines of diff. The intent — "try MLA" — is essentially one line inside that.
Your intent is one line; the mechanics are hundreds. That gap imposes three costs:
- Review burden — reviewers must sort "what is the architectural change?" from "what is mechanical plumbing?".
- Lost intent — two months later, even you may struggle to recover what the essential change was.
- Hidden coupling — a structural change is interleaved with the training loop, the tokenizer adapter, and the serving serialiser, so a mistake in one silently breaks another.
The root cause is a tool gap. LLMs have always been described in general-purpose programming languages. Python handles the training loop, data preprocessing, serving adapters, eval scripts, and model structure all at once. That universality is exactly why it is not specialised for describing structure.
Why does a specialised language change the outcome? A dedicated vocabulary delivers three things:
- Compactness — a one-line structural change reads as one line.
- Verifiability — domain constraints like "MLA's `latent_dim` must be smaller than `d_model`" can be rejected at the language level, before anything runs.
- Separation of concerns — structure is structure, training is training, serving is serving. Each axis evolves independently.
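To make the verifiability point concrete, here is a minimal sketch of the kind of pre-run check a design language can perform. The function name, spec layout, and field names are illustrative assumptions, not EulerStack's actual validator or schema:

```python
def validate_mla(spec):
    """Reject a structurally invalid MLA spec before anything runs.

    Encodes the constraint "latent_dim must be smaller than d_model".
    The spec layout and field names here are hypothetical.
    """
    errors = []
    d_model = spec["model"]["d_model"]
    attn = spec["layer_templates"]["decoder"]["mixer"]["attention"]
    latent = attn.get("latent_dim")
    if latent is not None and latent >= d_model:
        errors.append(
            f"MLA latent_dim ({latent}) must be smaller than d_model ({d_model})"
        )
    return errors


spec = {
    "model": {"d_model": 4096},
    "layer_templates": {"decoder": {"mixer": {"attention": {"latent_dim": 384}}}},
}
assert validate_mla(spec) == []  # a valid spec passes with no errors
```

The point is not this particular check but where it runs: at spec level, before any tensor is allocated, rather than as a shape error deep inside a forward pass.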
EulerStack is a declarative language dedicated to LLM architecture description, aimed squarely at these three.
3. The HDL precedent
"General-purpose code → domain-specific declarative language" is a pattern that has recurred throughout software history. The closest precedent EulerStack follows is HDL (Hardware Description Language) from the semiconductor world.
3.1 Pre-HDL (~1984)
Chip designers worked with:

- Schematics — logic gates drawn by hand
- Simulation C code — behaviour expressed imperatively
- Manual optimisation — layout and timing by hand
All of this was code or drawings. But there was no dedicated language for describing a chip. A single file tangled "how it behaves" with "how it is implemented".
3.2 HDL arrives (Verilog 1984, VHDL 1987)
Verilog and VHDL emerged as hardware description languages, in effect architecture description languages (ADLs) for chips. Their distinguishing properties:
- Declarative: "this module has these inputs/outputs and this behaviour"
- Hierarchical: submodules compose into top-level modules
- Synthesis vs. simulation split: the same HDL specification feeds both simulation (verification) and synthesis (the real chip)
This was the essential leap. Chip-design know-how shifted out of general-purpose code and into a dedicated design language.
3.3 The parallel with LLMs
| | Pre-HDL chip design | Pre-EulerStack LLM design |
|---|---|---|
| Medium | Schematics + simulation C | modeling_xxx.py (PyTorch) |
| Problem | Behaviour/implementation tangled | Structure/training/serving tangled |
| Reuse | copy-paste | copy-paste |
| Change tracking | no schematic diff | intent buried in 200-line diff |

| | Post-HDL chip design | Post-EulerStack LLM design |
|---|---|---|
| Medium | Verilog/VHDL (ADL) | EulerStack YAML (ADL) |
| Separation | behaviour/synthesis/verification | structure/training/serving |
| Reuse | modules, IP cores | templates, presets |
| Change tracking | HDL diff = design change | YAML diff = architecture change |
The essential move is a lift in abstraction level — not "write better code," but "put a dedicated design language above the layer where code used to do everything." That is the change HDL brought to the semiconductor industry, and the change EulerStack aims to bring to LLM design.
4. What EulerStack actually does
┌──────────────────────────────────────────────────────┐
│ EulerStack YAML (ADL) │ ← design language
│ schema_version: 1 │
│ model: { d_model: 4096, n_heads: 32, ... } │
│ layer_templates: { ... } │
│ layer_schedule: [ ... ] │
└────────────────┬────────────────────────────────────┘
│ compile (= HDL "synthesis")
▼
┌──────────────────────────────────────────────────────┐
│ HuggingFace PreTrainedModel │ ← executable form
│ config.json + model.safetensors │
└────────────────┬────────────────────────────────────┘
│
├──► Training (HF Trainer / Megatron / Axolotl)
├──► Fine-tune (LLaMA-Factory / torchtune / PEFT)
├──► Serving (vLLM / SGLang / TensorRT-LLM)
└──► Eval (lm-eval-harness)
Just as HDL separated synthesis from simulation, EulerStack separates architecture description from training / serving execution. The same YAML spec is the single source of truth for structure, whether the consumer is a trainer, a server, or an evaluator.
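As a concrete anchor for the diagram, here is a minimal spec in the shape it names. The top-level keys (`schema_version`, `model`, `layer_templates`, `layer_schedule`) come straight from the diagram; the schedule-entry fields (`template`, `repeat`) are assumptions about the schema, not confirmed syntax:

```yaml
# Illustrative only: keys beyond the diagram's top level are assumed,
# not EulerStack's confirmed schema.
schema_version: 1
model: { d_model: 4096, n_heads: 32 }
layer_templates:
  decoder:
    mixer:
      type: attention
      attention: { qkv_bias: false }
layer_schedule:
  - { template: decoder, repeat: 32 }
```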
5. What a dedicated design language gives you
HDL's benefits to chip designers map one-to-one to EulerStack's benefits for LLM designers.
5.1 Separation of concerns
Pre-HDL engineers tangled "gate behaviour" with "gate layout" in one
document. HDL split them. Pre-EulerStack engineers tangled model
structure, training-specific logic, and serving adapters in one
modeling_xxx.py. EulerStack enforces a triangular split: YAML for
structure, separate scripts for training, HF ecosystem for serving.
5.2 Reviewability
In HDL, "this change adds a multiplier" shows up cleanly in a diff.
In schematic days you compared a thousand drawings. EulerStack is the
same: a PR saying "swap attention for MLA and push rope_theta to 500K"
shows up as 5 YAML lines. No more spelunking through 200 lines of
modeling_custom.py to find the actual intent.
5.3 Reproducibility
An HDL spec synthesised with the same tool chain yields the same chip.
An EulerStack YAML compiled on a different user's machine yields the
same HF model. compile is a pure function — deterministic output
from YAML input.
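The "pure function" claim can be pictured with a stdlib sketch: canonicalise the spec, and identical specs fingerprint identically regardless of key order or machine. This is an illustration of the determinism property, not EulerStack's actual implementation:

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Deterministic fingerprint of a spec: canonical JSON, then SHA-256.

    If compile is a pure function of the spec, two specs with the same
    fingerprint must compile to the same model skeleton.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"model": {"d_model": 4096, "n_heads": 32}}
b = {"model": {"n_heads": 32, "d_model": 4096}}  # same spec, different key order
assert spec_fingerprint(a) == spec_fingerprint(b)
```

Sorting keys before hashing is what makes the fingerprint independent of dict insertion order, file formatting, or the machine that produced the spec.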
5.4 Composability
HDL modules instantiate inside other modules, forming complex chips.
EulerStack layer_templates compose inside a layer_schedule, and
separate primitives (MLA + MoE + Titans + ODE + execution_modes) layer
on one another orthogonally. The arch_expert_kitchen_sink preset
covered in Tutorial 10 — Paper → YAML is a
concrete demonstration.
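The templates-plus-schedule composition can be sketched mechanically in a few lines. The entry fields (`template`, `repeat`) and template contents are hypothetical, chosen only to show the expansion step:

```python
def expand_schedule(templates, schedule):
    """Expand a layer_schedule of template references into concrete
    per-layer configs. Each schedule entry names a template and an
    optional repeat count (hypothetical schema)."""
    layers = []
    for entry in schedule:
        base = templates[entry["template"]]
        for _ in range(entry.get("repeat", 1)):
            layers.append(dict(base))  # independent copy per layer
    return layers


templates = {
    "dense": {"mixer": "attention"},
    "moe": {"mixer": "attention", "ffn": "moe"},
}
schedule = [{"template": "dense", "repeat": 2}, {"template": "moe", "repeat": 1}]
layers = expand_schedule(templates, schedule)
assert len(layers) == 3 and layers[2]["ffn"] == "moe"
```

The orthogonality claim falls out of this shape: a new primitive is a new key inside a template, and the schedule that composes templates never has to know about it.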
6. The lego-block interface — how EulerStack meets other tools
EulerStack's surface against neighbouring tools is small and clean.
| Boundary | What flows | Who owns it |
|---|---|---|
| Top (input) | Hand-authored YAML spec | Researcher / engineer |
| Bottom (output) | HF PreTrainedModel | The Transformers ecosystem |
| Right (meta) | config.v1_extensions | Plugins consume it |
| Left (reverse) | from_pretrained → reconstruct YAML (v1.2 roadmap) | — |
Coming in: one YAML-diff line = one architecture change.
Going out: the same AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)
contract that Llama / Mistral / Jamba already ride on. "Keep your training
and serving stack; swap only the model definition."
7. Relationship with other "design language" tools — not competition, stratification
7.1 Other tools that describe structure (direct comparison)
| Tool | Definition mode | Relationship to EulerStack |
|---|---|---|
| HF transformers | Pre-registered architectures + trust_remote_code | EulerStack sits above it. YAML → HF model |
| Modular Transformers (HF experiment) | modeling_xxx.py diffs | Same problem at code level; EulerStack at design-language level. Analogous to HDL vs. a circuit simulator. |
| nanoGPT / litgpt | Single-file reference impl | Educational. EulerStack is the assembly layer above them. |
| Ludwig | Declarative ML framework | Similar concept; too thin on LLM-specific structure. EulerStack is far more granular. |
7.2 Training & serving stacks — complementary
Just as HDL coexists with synthesis and simulation tools, EulerStack coexists with training and serving stacks.
| Tool | Role | How to combine with EulerStack |
|---|---|---|
| Megatron-LM / TorchTitan / GPT-NeoX | Distributed pretraining | EulerStack defines, these execute |
| HF Trainer / Composer / Levanter | Single- to few-node training | Via AutoModelForCausalLM |
| Axolotl / LLaMA-Factory / torchtune | Fine-tune recipes | EulerStack brings structure, they bring recipe |
| PEFT / bitsandbytes / Unsloth | Efficient fine-tune / quantisation | Orthogonal — HF-compatible, so automatic |
| vLLM / SGLang / TensorRT-LLM | Serving | Orthogonal — standard mixers serve as-is |
7.3 Solving a different problem
| Tool | What it does | Why it isn't EulerStack |
|---|---|---|
| mergekit | Combine pretrained weights | Weight-level. EulerStack is architecture-level |
| Once-for-All / AutoGluon / archai | NAS (automated search) | Automation. EulerStack is deliberate hand-picked comparison |
| Keras / PyTorch Lightning | Generic DL runners | General purpose. EulerStack is an LLM-specific ADL |
| fairseq | Research seq2seq | Code-first. EulerStack is declarative |
8. When to reach for EulerStack — and when not to
✅ Reach for EulerStack when
Q1. You want to reproduce a new architecture from a recent paper
(DeepSeek-V3, Jamba, R1, Titans, ...).
→ A preset or a 5-20 line YAML suffices. modeling_custom.py effort
collapses to zero. See Tutorial 10: Paper → YAML.
Q2. You want to ablate combinations that don't exist in the literature (MLA + MoD + branched + MoE, ...). → EulerStack is the only public tool that fits. One training script, N YAML diffs.
Q3. You want architecture changes tracked in Git. → YAML = architecture. PR messages carry intent without prose.
Q4. You want to serve a custom LLM via standard HF tooling.
→ compile → save_pretrained → vLLM / TGI without special cases.
Q5. You want to onboard a new hire on "why this architecture?" → The 53-preset 3-tier catalogue (Validated → Hybrid → Experimental) is the learning path.
❌ Don't reach for EulerStack when
Q1. "I want to fine-tune Llama 3 on my data." → LLaMA-Factory / Axolotl. Pretrained weight loading is not a first-class feature in EulerStack (planned for v1.2).
Q2. "I want to iterate on the training recipe." → HF Trainer / Axolotl. EulerStack owns structure only — it has no training loop (just as HDL has no fabrication).
Q3. "I want to sweep 1000 architectures automatically." → NAS tools (Once-for-All, archai). EulerStack is for hand-picked comparison.
Q4. "I need Megatron-scale distributed pretraining right now." → Define with EulerStack, execute with TorchTitan / Megatron. EulerStack does not ship a launcher.
Q5. "I want to merge two pretrained models." → mergekit. EulerStack does not operate on weights.
9. What the "design language" concretely unlocks
Compare what a "5-line YAML diff" corresponds to in other tools.
Case A. "Swap attention for MLA"
EulerStack YAML diff:
layer_templates:
decoder:
mixer:
type: attention
- attention: { qkv_bias: false }
+ attention: { qkv_bias: false, latent_dim: 384 }
One line changed.
Equivalent modeling_llama.py-based diff (estimate):
- Rewrite LlamaAttention.__init__: rebuild W_q, W_k, W_v into W_q,
W_kv_latent, W_k_up, W_v_up, W_out
- Change QKV computation in forward
- Update KV-cache path (shape changed)
- Add state-dict key mapping for save_pretrained / from_pretrained
- Roughly 200-300 lines
Case B. "Separate a reasoning phase from the answer phase (R1)"
EulerStack YAML additions:
execution_modes:
- { name: think, max_tokens: 8192, kv_share: true, loss_weight: 0.0 }
- { name: answer, max_tokens: 2048, loss_weight: 1.0 }
transition:
type: special_token
token: "<think_end>"
Six lines added.
Code-based implementation:
- Register the special token in the tokenizer
- Rewrite GenerationMixin.generate() to be phase-aware
- Add loss weighting to the training loop
- Define config serialisation
- Roughly 500-700 lines across training + generate + serialization
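The core of what those execution_modes lines declare can be shown with a toy phase-splitting function: tokens before the transition token get the think-phase loss weight, tokens after get the answer-phase weight. This is purely illustrative of the declared semantics, not EulerStack's runtime:

```python
def assign_loss_weights(tokens, transition_token="<think_end>",
                        weights=(0.0, 1.0)):
    """Toy R1-style phase split over an already-generated token stream.

    Tokens up to and including the transition token carry the 'think'
    loss weight; everything after carries the 'answer' weight.
    """
    think_w, answer_w = weights
    out, in_think = [], True
    for tok in tokens:
        if tok == transition_token:
            out.append((tok, think_w))
            in_think = False
            continue
        out.append((tok, think_w if in_think else answer_w))
    return out


tokens = ["step1", "step2", "<think_end>", "42"]
weighted = assign_loss_weights(tokens)
assert weighted[0][1] == 0.0 and weighted[-1] == ("42", 1.0)
```

The hundreds of lines in the code-based estimate come from threading exactly this phase state through a real generate loop, the trainer's loss computation, and config serialisation, rather than from the logic itself.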
Case C. "Attach Titans memory to every attention layer with inference update"
EulerStack YAML additions:
memory:
type: neural_memory
update_at_inference: true
params: { hidden: 1024 }
inner_lr: 0.001
persistence: session
Six lines added.
Code-based implementation:
- Author a new nn.Module class (forward + surprise)
- Wire memory into every layer
- Expose a step_memory_at_inference() hook
- Implement gradient isolation
- Persist state keys on save/load
- Roughly 300-500 lines plus debugging skill
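The flavour of a test-time memory update can be caught in a toy sketch: a memory vector nudged by gradient descent on a squared "surprise" at inference time. This is a one-dimensional illustration of the idea only, not the Titans algorithm or EulerStack's implementation:

```python
class ToyNeuralMemory:
    """Toy test-time memory: a vector updated by gradient descent on a
    squared surprise (prediction error) at inference. Illustrative only."""

    def __init__(self, dim, inner_lr=0.001):
        self.state = [0.0] * dim
        self.inner_lr = inner_lr

    def step(self, observation):
        # surprise = squared error between memory and observation;
        # the gradient step moves memory toward what it failed to predict
        for i, x in enumerate(observation):
            err = self.state[i] - x
            self.state[i] -= self.inner_lr * 2.0 * err


mem = ToyNeuralMemory(dim=2, inner_lr=0.5)
for _ in range(20):
    mem.step([1.0, -1.0])
assert abs(mem.state[0] - 1.0) < 1e-6 and abs(mem.state[1] + 1.0) < 1e-6
```

Even in this toy form, the "plus debugging skill" line item is visible: a real version must isolate these inner updates from the outer training gradients and persist the state across save/load.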
In all three, the intent of the architecture change fits in a few YAML lines. The implementation is EulerStack's problem. You think at the design-language level.
10. Summary
EulerStack is a dedicated design language for LLM architecture.
Structure used to live scattered across general-purpose Python files
like modeling_xxx.py; EulerStack lifts it one layer up into a
declarative language whose sole job is describing structure. This is
the same abstraction step the semiconductor world took with Verilog
and VHDL.
Training, serving, and fine-tuning are the domain of specialised tools (HF Trainer, vLLM, Axolotl, and others) that already do those jobs very well. EulerStack therefore does not cover them. It focuses on design alone, and shapes that design so it plugs into the HF ecosystem without friction.
Next
- Tutorial 1: Validate a Spec — write your first YAML and pass it.
- Tutorial 2: Use Presets — pick one of 53 starting points.
- Tutorial 9: New Primitives — the full ADL vocabulary (MLA / Jamba hybrid / R1 / Titans / MoD / Neural-ODE / ...).
- Tutorial 10: Paper → YAML — dialogue case studies porting four recent papers into YAML.