
0. Where EulerStack Fits — an ADL for LLMs

Read this first. Understanding what kind of tool EulerStack is makes every later tutorial fall into place.

1. One-line summary

EulerStack is an Architecture Description Language (ADL) for LLMs. It lifts architecture out of the PyTorch files where structure, training, and serving usually live tangled together, and expresses it in a declarative language built for that purpose. This is the same kind of abstraction step the semiconductor industry took when Verilog and VHDL replaced schematics plus hand-written simulation C in chip design.

2. Why a design language?

Suppose you want to try DeepSeek-V3's MLA attention. Today's workflow:

  1. Fork modeling_llama.py from HuggingFace transformers.
  2. Rewrite LlamaAttention: split W_q / W_k / W_v into W_q, W_kv_latent, W_k_up, W_v_up.
  3. Update forward and the KV-cache path because the cache shape changed.
  4. Add state-dict key mapping for save_pretrained / from_pretrained.

The result: around 200-300 lines of diff, in which the intent — "try MLA" — is essentially one line.

That gap — one line of intent, hundreds of lines of mechanics — recurs with every architectural experiment.

The root cause is a tool gap. LLM models have always been described in general-purpose programming languages. Python handles the training loop, data preprocessing, serving adapters, eval scripts, and model structure all at once. That universality is exactly why it is not specialised for describing structure.

Why does a specialised language change the outcome? A dedicated vocabulary delivers three things:

  1. Compactness — a one-line structural change reads as one line.
  2. Verifiability — domain constraints like "MLA's latent_dim must be smaller than d_model" can be rejected at the language level, before anything runs.
  3. Separation of concerns — structure is structure, training is training, serving is serving. Each axis evolves independently.

EulerStack is a declarative language dedicated to LLM architecture description, aimed squarely at these three.
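As a taste of what that looks like, here is a minimal spec sketch. The keys (schema_version, model, layer_templates, layer_schedule) are the ones used throughout these tutorials, but the nesting, the repeat-based schedule syntax, and the exact values are illustrative rather than normative:

schema_version: 1
model: { d_model: 4096, n_heads: 32, n_layers: 32 }
layer_templates:
  decoder:
    mixer:
      type: attention
      attention: { qkv_bias: false, latent_dim: 384 }   # this one line is "try MLA"
layer_schedule:
  - { template: decoder, repeat: 32 }

Compactness and verifiability both live at this level: the MLA switch is the single latent_dim line, and a spec that set latent_dim to something not smaller than d_model could be rejected before anything runs.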

3. The HDL precedent

"General-purpose code → domain-specific declarative language" is a pattern that has recurred throughout software history. The closest precedent EulerStack follows is HDL (Hardware Description Language) from the semiconductor world.

3.1 Pre-HDL (before 1984)

Chip designers worked with:

  - Schematics — logic gates drawn by hand
  - Simulation C code — behaviour expressed imperatively
  - Manual optimisation — layout and timing by hand

All of this was code or drawings. But there was no dedicated language for describing a chip. A single file tangled "how it behaves" with "how it is implemented".

3.2 HDL arrives (Verilog 1984, VHDL 1987)

Verilog and VHDL emerged as hardware description languages (HDLs) — the chip world's version of an ADL. Their distinguishing properties:

  1. Declarative — the chip is described as structure, not as imperative simulation code.
  2. Separated — behaviour, synthesis, and verification each became their own concern.
  3. Reusable — modules and IP cores compose into larger designs.
  4. Diffable — a text diff of the HDL is a diff of the design itself.

This was the essential leap. Chip-design know-how shifted out of general-purpose code and into a dedicated design language.

3.3 The parallel with LLMs

                  Pre-HDL chip design                Pre-EulerStack LLM design
Medium            Schematics + simulation C          modeling_xxx.py (PyTorch)
Problem           Behaviour/implementation tangled   Structure/training/serving tangled
Reuse             Copy-paste                         Copy-paste
Change tracking   No schematic diff                  Intent buried in a 200-line diff

                  Post-HDL chip design               Post-EulerStack LLM design
Medium            Verilog/VHDL (HDL)                 EulerStack YAML (ADL)
Separation        Behaviour/synthesis/verification   Structure/training/serving
Reuse             Modules, IP cores                  Templates, presets
Change tracking   HDL diff = design change           YAML diff = architecture change

The essential move is a lift in abstraction level — not "write better code," but "put a dedicated design language above the layer where code used to do everything." That is the change HDL brought to the semiconductor industry, and the change EulerStack aims to bring to LLM design.

4. What EulerStack actually does

┌──────────────────────────────────────────────────────┐
│  EulerStack YAML (ADL)                               │  ← design language
│  schema_version: 1                                   │
│  model: { d_model: 4096, n_heads: 32, ... }          │
│  layer_templates: { ... }                            │
│  layer_schedule: [ ... ]                             │
└────────────────┬─────────────────────────────────────┘
                 │  compile   (= HDL "synthesis")
                 ▼
┌──────────────────────────────────────────────────────┐
│  HuggingFace PreTrainedModel                         │  ← executable form
│  config.json + model.safetensors                     │
└────────────────┬─────────────────────────────────────┘
                 │
                 ├──► Training (HF Trainer / Megatron / Axolotl)
                 ├──► Fine-tune (LLaMA-Factory / torchtune / PEFT)
                 ├──► Serving (vLLM / SGLang / TensorRT-LLM)
                 └──► Eval (lm-eval-harness)
Just as HDL separated synthesis from simulation, EulerStack separates architecture description from training / serving execution. The same YAML spec is the single source of truth for structure, whether the consumer is a trainer, a server, or an evaluator.

5. What a dedicated design language gives you

HDL's benefits to chip designers map one-to-one to EulerStack's benefits for LLM designers.

5.1 Separation of concerns

Pre-HDL engineers tangled "gate behaviour" with "gate layout" in one document. HDL split them. Pre-EulerStack engineers tangled model structure, training-specific logic, and serving adapters in one modeling_xxx.py. EulerStack enforces the same three-way split: YAML for structure, separate scripts for training, the HF ecosystem for serving.

5.2 Review-ability

In HDL, "this change adds a multiplier" shows up cleanly in a diff. In schematic days you compared a thousand drawings. EulerStack is the same: a PR saying "swap attention for MLA and push rope_theta to 500K" shows up as 5 YAML lines. No more spelunking through 200 lines of modeling_custom.py to find the actual intent.

5.3 Reproducibility

An HDL spec synthesised with the same toolchain yields the same chip. An EulerStack YAML compiled on a different user's machine yields the same HF model. compile is a pure function — deterministic output from YAML input.

5.4 Composability

HDL modules instantiate inside other modules, forming complex chips. EulerStack layer_templates compose inside a layer_schedule, and separate primitives (MLA + MoE + Titans + ODE + execution_modes) layer on one another orthogonally. The arch_expert_kitchen_sink preset covered in Tutorial 10 — Paper → YAML is a concrete demonstration.
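A sketch of what that composition can look like. The template names, the MoE fields (n_experts, top_k), and the repeat-based schedule syntax below are illustrative assumptions, not the normative schema; latent_dim: 384 follows the MLA example in section 9:

layer_templates:
  mla_dense:
    mixer:
      type: attention
      attention: { latent_dim: 384 }
  mla_moe:
    mixer:
      type: attention
      attention: { latent_dim: 384 }
    ffn: { type: moe, n_experts: 8, top_k: 2 }
layer_schedule:
  - { template: mla_dense, repeat: 4 }
  - { template: mla_moe, repeat: 28 }

Each template composes independently reusable pieces, and the schedule composes the templates. Memory (Case C in section 9) and execution_modes (Case B) attach as further top-level blocks without touching the mixer definitions.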

6. The lego-block interface — how EulerStack meets other tools

EulerStack's surface against neighbouring tools is small and clean.

Boundary          What flows                               Who owns it
Top (input)       Hand-authored YAML spec                  Researcher / engineer
Bottom (output)   HF PreTrainedModel                       The Transformers ecosystem
Right (meta)      config.v1_extensions                     Plugins consume it
Left (reverse)    from_pretrained → reconstruct YAML       (v1.2 roadmap)

Coming in: one YAML-diff line = one architecture change.

Going out: the same AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True) contract that Llama / Mistral / Jamba already ride on. "Keep your training and serving stack; swap only the model definition."

7. Relationship with other "design language" tools — not competition, stratification

7.1 Other tools that describe structure (direct comparison)

Tool                                  Definition mode                                    Relationship to EulerStack
HF transformers                       Pre-registered architectures + trust_remote_code   EulerStack sits above it: YAML → HF model
Modular Transformers (HF experiment)  modeling_xxx.py diffs                              Same problem at the code level; EulerStack works at the design-language level (HDL vs. a circuit simulator)
nanoGPT / litgpt                      Single-file reference impl                         Educational; EulerStack is the assembly layer above them
Ludwig                                Declarative ML framework                           Similar concept, but thin on LLM-specific structure; EulerStack is far more granular

7.2 Training & serving stacks — complementary

Just as HDL coexists with synthesis and simulation tools, EulerStack coexists with training and serving stacks.

Tool                                  Role                                 How to combine with EulerStack
Megatron-LM / TorchTitan / GPT-NeoX   Distributed pretraining              EulerStack defines, these execute
HF Trainer / Composer / Levanter      Single- to few-node training         Via AutoModelForCausalLM
Axolotl / LLaMA-Factory / torchtune   Fine-tune recipes                    EulerStack brings structure, they bring the recipe
PEFT / bitsandbytes / Unsloth         Efficient fine-tune / quantisation   Orthogonal — HF-compatible, so automatic
vLLM / SGLang / TensorRT-LLM          Serving                              Orthogonal — standard mixers serve as-is

7.3 Solving a different problem

Tool                                What it does                 Why it isn't EulerStack
mergekit                            Combine pretrained weights   Weight-level; EulerStack is architecture-level
Once-for-All / AutoGluon / archai   NAS (automated search)       Automation; EulerStack is for deliberate, hand-picked comparison
Keras / PyTorch Lightning           Generic DL runners           General-purpose; EulerStack is an LLM-specific ADL
fairseq                             Research seq2seq             Code-first; EulerStack is declarative

8. When to reach for EulerStack — and when not to

✅ Reach for EulerStack when

Q1. You want to reproduce a new architecture from a recent paper (DeepSeek-V3, Jamba, R1, Titans, ...). → A preset or a 5-20 line YAML suffices. modeling_custom.py effort collapses to zero. See Tutorial 10: Paper → YAML.

Q2. You want to ablate combinations that don't exist in the literature (MLA + MoD + branched + MoE, ...). → EulerStack is the only public tool that fits. One training script, N YAML diffs.
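One such ablation variant might differ from the baseline by a diff like the following sketch (the MoE field names are assumptions; latent_dim: 384 follows Case A in section 9):

 layer_templates:
   decoder:
     mixer:
       type: attention
-      attention: { qkv_bias: false }
+      attention: { qkv_bias: false, latent_dim: 384 }
+    ffn: { type: moe, n_experts: 8, top_k: 2 }

N variants are N such diffs against one base spec; the training script stays untouched.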

Q3. You want architecture changes tracked in Git. → YAML = architecture. PR messages carry intent without prose.

Q4. You want to serve a custom LLM via standard HF tooling. → compile → save_pretrained → vLLM / TGI without special cases.

Q5. You want to onboard a new hire on "why this architecture?" → The 53-preset 3-tier catalogue (Validated → Hybrid → Experimental) is the learning path.

❌ Don't reach for EulerStack when

Q1. "I want to fine-tune Llama 3 on my data." → LLaMA-Factory / Axolotl. Pretrained weight loading is not a first-class feature in EulerStack (planned for v1.2).

Q2. "I want to iterate on the training recipe." → HF Trainer / Axolotl. EulerStack owns structure only — it has no training loop (just as HDL has no fabrication).

Q3. "I want to sweep 1000 architectures automatically." → NAS tools (Once-for-All, archai). EulerStack is for hand-picked comparison.

Q4. "I need Megatron-scale distributed pretraining right now." → Define with EulerStack, execute with TorchTitan / Megatron. EulerStack does not ship a launcher.

Q5. "I want to merge two pretrained models." → mergekit. EulerStack does not operate on weights.

9. What the "design language" concretely unlocks

Compare what a "5-line YAML diff" corresponds to in other tools.

Case A. "Swap attention for MLA"

EulerStack YAML diff:

 layer_templates:
   decoder:
     mixer:
       type: attention
-      attention: { qkv_bias: false }
+      attention: { qkv_bias: false, latent_dim: 384 }

One line changed.

Equivalent modeling_llama.py-based diff (estimate):

  - Rewrite LlamaAttention.__init__: rebuild W_q, W_k, W_v into W_q, W_kv_latent, W_k_up, W_v_up, W_out
  - Change the QKV computation in forward
  - Update the KV-cache path (shape changed)
  - Add state-dict key mapping for save_pretrained / from_pretrained
  - Roughly 200-300 lines

Case B. "Separate a reasoning phase from the answer phase (R1)"

EulerStack YAML additions:

execution_modes:
  - { name: think, max_tokens: 8192, kv_share: true, loss_weight: 0.0 }
  - { name: answer, max_tokens: 2048, loss_weight: 1.0 }
transition:
  type: special_token
  token: "<think_end>"

Six lines added.

Code-based implementation:

  - Register the special token in the tokenizer
  - Rewrite GenerationMixin.generate() to be phase-aware
  - Add loss weighting to the training loop
  - Define config serialisation
  - Roughly 500-700 lines across training + generate + serialisation

Case C. "Attach Titans memory to every attention layer with inference update"

EulerStack YAML additions:

memory:
  type: neural_memory
  update_at_inference: true
  params: { hidden: 1024 }
  inner_lr: 0.001
  persistence: session

Five lines added.

Code-based implementation:

  - Author a new nn.Module class (forward + surprise metric)
  - Wire the memory into every layer
  - Expose a step_memory_at_inference() hook
  - Implement gradient isolation
  - Persist state keys on save/load
  - Roughly 300-500 lines, plus non-trivial debugging

In all three, the intent of the architecture change fits in a few YAML lines. The implementation is EulerStack's problem. You think at the design-language level.

10. Summary

EulerStack is a dedicated design language for LLM architecture. Structure used to live scattered across general-purpose Python files like modeling_xxx.py; EulerStack lifts it one layer up into a declarative language whose sole job is describing structure. This is the same abstraction step the semiconductor world took with Verilog and VHDL.

Training, serving, and fine-tuning are the domain of specialised tools (HF Trainer, vLLM, Axolotl, and others) that already do those jobs very well. EulerStack therefore does not cover them. It focuses on design alone, and shapes that design so it plugs into the HF ecosystem without friction.
