0. Where EulerStack Fits — an ADL for LLMs
Read this first. Understanding what kind of tool EulerStack is makes every later tutorial fall into place.
1. One-line summary
EulerStack is an Architecture Description Language (ADL) for LLMs. It lifts architecture out of the PyTorch files where structure, training, and serving usually live tangled together, and expresses it in a declarative language built for that purpose. This is the same abstraction step the semiconductor industry took when Verilog and VHDL replaced schematics and hand-written simulation C in chip design.
2. Why a design language?
Suppose you want to try DeepSeek-V3's MLA attention. Today's workflow:
- Fork `modeling_llama.py` from HuggingFace `transformers`.
- Rewrite `LlamaAttention`: split `W_q / W_k / W_v` into `W_q`, `W_kv_latent`, `W_k_up`, `W_v_up`.
- Update `forward` and the KV-cache path because the cache shape changed.
- Add state-dict key mapping for `save_pretrained` / `from_pretrained`.
- Around 200-300 lines of diff. The intent — "try MLA" — is essentially one line inside that.
Your intent is one line; the mechanics are hundreds. That gap imposes three costs:
- Review burden — reviewers must sort "what is the architectural change?" from "what is mechanical plumbing?".
- Lost intent — two months later, even you may struggle to recover what the essential change was.
- Hidden coupling — a structural change is interleaved with the training loop, the tokenizer adapter, and the serving serialiser, so a mistake in one silently breaks another.
The root cause is a tool gap. LLMs have always been described in general-purpose programming languages. Python handles the training loop, data preprocessing, serving adapters, eval scripts, and model structure all at once. That universality is exactly why it is not specialised for describing structure.
Why does a specialised language change the outcome? A dedicated vocabulary delivers three things:
- Compactness — a one-line structural change reads as one line.
- Verifiability — domain constraints like "MLA's `latent_dim` must be smaller than `d_model`" can be rejected at the language level, before anything runs.
- Separation of concerns — structure is structure, training is training, serving is serving. Each axis evolves independently.
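To make the verifiability point concrete, here is a minimal sketch of the kind of pre-run check a design language can perform. The function name, spec layout, and field names are illustrative assumptions, not EulerStack's actual validator or schema:

```python
def validate_mla(spec):
    """Reject a structurally invalid MLA spec before anything runs.

    Encodes the constraint "latent_dim must be smaller than d_model".
    The spec layout and field names here are hypothetical.
    """
    errors = []
    d_model = spec["model"]["d_model"]
    attn = spec["layer_templates"]["decoder"]["mixer"]["attention"]
    latent = attn.get("latent_dim")
    if latent is not None and latent >= d_model:
        errors.append(
            f"MLA latent_dim ({latent}) must be smaller than d_model ({d_model})"
        )
    return errors


spec = {
    "model": {"d_model": 4096},
    "layer_templates": {"decoder": {"mixer": {"attention": {"latent_dim": 384}}}},
}
assert validate_mla(spec) == []  # a valid spec passes with no errors
```

The point is not this particular check but where it runs: at spec level, before any tensor is allocated, rather than as a shape error deep inside a forward pass.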
EulerStack is a declarative language dedicated to LLM architecture description, aimed squarely at these three.
3. The HDL precedent
"General-purpose code → domain-specific declarative language" is a pattern that has recurred throughout software history. The closest precedent EulerStack follows is HDL (Hardware Description Language) from the semiconductor world.
3.1 Pre-HDL (~1984)
Chip designers worked with:

- Schematics — logic gates drawn by hand
- Simulation C code — behaviour expressed imperatively
- Manual optimisation — layout and timing by hand
All of this was code or drawings. But there was no dedicated language for describing a chip. A single file tangled "how it behaves" with "how it is implemented".
3.2 HDL arrives (Verilog 1984, VHDL 1987)
Verilog and VHDL emerged as hardware description languages, in effect architecture description languages (ADLs) for chips. Their distinguishing properties:
- Declarative: "this module has these inputs/outputs and this behaviour"
- Hierarchical: submodules compose into top-level modules
- Synthesis vs. simulation split: the same HDL specification feeds both simulation (verification) and synthesis (the real chip)
This was the essential leap. Chip-design know-how shifted out of general-purpose code and into a dedicated design language.
3.3 The parallel with LLMs
| | Pre-HDL chip design | Pre-EulerStack LLM design |
|---|---|---|
| Medium | Schematics + simulation C | modeling_xxx.py (PyTorch) |
| Problem | Behaviour/implementation tangled | Structure/training/serving tangled |
| Reuse | copy-paste | copy-paste |
| Change tracking | no schematic diff | intent buried in 200-line diff |

| | Post-HDL chip design | Post-EulerStack LLM design |
|---|---|---|
| Medium | Verilog/VHDL (ADL) | EulerStack YAML (ADL) |
| Separation | behaviour/synthesis/verification | structure/training/serving |
| Reuse | modules, IP cores | templates, presets |
| Change tracking | HDL diff = design change | YAML diff = architecture change |
The essential move is a lift in abstraction level — not "write better code," but "put a dedicated design language above the layer where code used to do everything." That is the change HDL brought to the semiconductor industry, and the change EulerStack aims to bring to LLM design.
4. What EulerStack actually does
┌──────────────────────────────────────────────────────┐
│ EulerStack YAML (ADL) │ ← design language
│ schema_version: 1 │
│ model: { d_model: 4096, n_heads: 32, ... } │
│ layer_templates: { ... } │
│ layer_schedule: [ ... ] │
└────────────────┬────────────────────────────────────┘
│ compile (= HDL "synthesis")
▼
┌──────────────────────────────────────────────────────┐
│ HuggingFace PreTrainedModel │ ← executable form
│ config.json + model.safetensors │
└────────────────┬────────────────────────────────────┘
│
├──► Training (HF Trainer / Megatron / Axolotl)
├──► Fine-tune (LLaMA-Factory / torchtune / PEFT)
├──► Serving (vLLM / SGLang / TensorRT-LLM)
└──► Eval (lm-eval-harness)
Just as HDL separated synthesis from simulation, EulerStack separates architecture description from training / serving execution. The same YAML spec is the single source of truth for structure, whether the consumer is a trainer, a server, or an evaluator.
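As a concrete anchor for the diagram, here is a minimal spec in the shape it names. The top-level keys (`schema_version`, `model`, `layer_templates`, `layer_schedule`) come straight from the diagram; the schedule-entry fields (`template`, `repeat`) are assumptions about the schema, not confirmed syntax:

```yaml
# Illustrative only: keys beyond the diagram's top level are assumed,
# not EulerStack's confirmed schema.
schema_version: 1
model: { d_model: 4096, n_heads: 32 }
layer_templates:
  decoder:
    mixer:
      type: attention
      attention: { qkv_bias: false }
layer_schedule:
  - { template: decoder, repeat: 32 }
```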
5. What a dedicated design language gives you
HDL's benefits to chip designers map one-to-one to EulerStack's benefits for LLM designers.
5.1 Separation of concerns
Pre-HDL engineers tangled "gate behaviour" with "gate layout" in one
document. HDL split them. Pre-EulerStack engineers tangled model
structure, training-specific logic, and serving adapters in one
modeling_xxx.py. EulerStack enforces a triangular split: YAML for
structure, separate scripts for training, HF ecosystem for serving.
5.2 Reviewability
In HDL, "this change adds a multiplier" shows up cleanly in a diff.
In schematic days you compared a thousand drawings. EulerStack is the
same: a PR saying "swap attention for MLA and push rope_theta to 500K"
shows up as 5 YAML lines. No more spelunking through 200 lines of
modeling_custom.py to find the actual intent.
5.3 Reproducibility
An HDL spec synthesised with the same tool chain yields the same chip.
An EulerStack YAML compiled on a different user's machine yields the
same HF model. compile is a pure function — deterministic output
from YAML input.
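The "pure function" claim can be pictured with a stdlib sketch: canonicalise the spec, and identical specs fingerprint identically regardless of key order or machine. This is an illustration of the determinism property, not EulerStack's actual implementation:

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Deterministic fingerprint of a spec: canonical JSON, then SHA-256.

    If compile is a pure function of the spec, two specs with the same
    fingerprint must compile to the same model skeleton.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"model": {"d_model": 4096, "n_heads": 32}}
b = {"model": {"n_heads": 32, "d_model": 4096}}  # same spec, different key order
assert spec_fingerprint(a) == spec_fingerprint(b)
```

Sorting keys before hashing is what makes the fingerprint independent of dict insertion order, file formatting, or the machine that produced the spec.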
5.4 Composability
HDL modules instantiate inside other modules, forming complex chips.
EulerStack layer_templates compose inside a layer_schedule, and
separate primitives (MLA + MoE + Titans + ODE + execution_modes) layer
on one another orthogonally. The arch_expert_kitchen_sink preset
covered in Tutorial 10 — Paper → YAML is a
concrete demonstration.
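The templates-plus-schedule composition can be sketched mechanically in a few lines. The entry fields (`template`, `repeat`) and template contents are hypothetical, chosen only to show the expansion step:

```python
def expand_schedule(templates, schedule):
    """Expand a layer_schedule of template references into concrete
    per-layer configs. Each schedule entry names a template and an
    optional repeat count (hypothetical schema)."""
    layers = []
    for entry in schedule:
        base = templates[entry["template"]]
        for _ in range(entry.get("repeat", 1)):
            layers.append(dict(base))  # independent copy per layer
    return layers


templates = {
    "dense": {"mixer": "attention"},
    "moe": {"mixer": "attention", "ffn": "moe"},
}
schedule = [{"template": "dense", "repeat": 2}, {"template": "moe", "repeat": 1}]
layers = expand_schedule(templates, schedule)
assert len(layers) == 3 and layers[2]["ffn"] == "moe"
```

The orthogonality claim falls out of this shape: a new primitive is a new key inside a template, and the schedule that composes templates never has to know about it.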
6. The lego-block interface — how EulerStack meets other tools
EulerStack's surface against neighbouring tools is small and clean.
| Boundary | What flows | Who owns it |
|---|---|---|
| Top (input) | Hand-authored YAML spec | Researcher / engineer |
| Bottom (output) | HF PreTrainedModel | The Transformers ecosystem |
| Right (meta) | config.v1_extensions | Plugins consume it |
| Left (reverse) | from_pretrained → reconstruct YAML (v1.2 roadmap) | — |
Coming in: one YAML-diff line = one architecture change.
Going out: the same AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)
contract that Llama / Mistral / Jamba already ride on. "Keep your training
and serving stack; swap only the model definition."
7. Relationship with other "design language" tools — not competition, stratification
7.1 Other tools that describe structure (direct comparison)
| Tool | Definition mode | Relationship to EulerStack |
|---|---|---|
| HF transformers | Pre-registered architectures + trust_remote_code | EulerStack sits above it. YAML → HF model |
| Modular Transformers (HF experiment) | modeling_xxx.py diffs | Same problem at code level; EulerStack at design-language level. Analogous to HDL vs. a circuit simulator. |
| nanoGPT / litgpt | Single-file reference impl | Educational. EulerStack is the assembly layer above them. |
| Ludwig | Declarative ML framework | Similar concept; too thin on LLM-specific structure. EulerStack is far more granular. |
7.2 Training & serving stacks — complementary
Just as HDL coexists with synthesis and simulation tools, EulerStack coexists with training and serving stacks.
| Tool | Role | How to combine with EulerStack |
|---|---|---|
| Megatron-LM / TorchTitan / GPT-NeoX | Distributed pretraining | EulerStack defines, these execute |
| HF Trainer / Composer / Levanter | Single- to few-node training | Via AutoModelForCausalLM |
| Axolotl / LLaMA-Factory / torchtune | Fine-tune recipes | EulerStack brings structure, they bring recipe |
| PEFT / bitsandbytes / Unsloth | Efficient fine-tune / quantisation | Orthogonal — HF-compatible, so automatic |
| vLLM / SGLang / TensorRT-LLM | Serving | Orthogonal — standard mixers serve as-is |
7.3 Solving a different problem
| Tool | What it does | Why it isn't EulerStack |
|---|---|---|
| mergekit | Combine pretrained weights | Weight-level. EulerStack is architecture-level |
| Once-for-All / AutoGluon / archai | NAS (automated search) | Automation. EulerStack is deliberate hand-picked comparison |
| Keras / PyTorch Lightning | Generic DL runners | General purpose. EulerStack is an LLM-specific ADL |
| fairseq | Research seq2seq | Code-first. EulerStack is declarative |
8. When to reach for EulerStack — and when not to
✅ Reach for EulerStack when
Q1. You want to reproduce a new architecture from a recent paper
(DeepSeek-V3, Jamba, R1, Titans, ...).
→ A preset or a 5-20 line YAML suffices. modeling_custom.py effort
collapses to zero. See Tutorial 10: Paper → YAML.
Q2. You want to ablate combinations that don't exist in the literature (MLA + MoD + branched + MoE, ...). → EulerStack is the only public tool that fits. One training script, N YAML diffs.
Q3. You want architecture changes tracked in Git. → YAML = architecture. PR messages carry intent without prose.
Q4. You want to serve a custom LLM via standard HF tooling.
→ compile → save_pretrained → vLLM / TGI without special cases.
Q5. You want to onboard a new hire on "why this architecture?" → The 53-preset 3-tier catalogue (Validated → Hybrid → Experimental) is the learning path.
❌ Don't reach for EulerStack when
Q1. "I want to fine-tune Llama 3 on my data." → LLaMA-Factory / Axolotl. Pretrained weight loading is not a first-class feature in EulerStack (planned for v1.2).
Q2. "I want to iterate on the training recipe." → HF Trainer / Axolotl. EulerStack owns structure only — it has no training loop (just as HDL has no fabrication).
Q3. "I want to sweep 1000 architectures automatically." → NAS tools (Once-for-All, archai). EulerStack is for hand-picked comparison.
Q4. "I need Megatron-scale distributed pretraining right now." → Define with EulerStack, execute with TorchTitan / Megatron. EulerStack does not ship a launcher.
Q5. "I want to merge two pretrained models." → mergekit. EulerStack does not operate on weights.
9. What the "design language" concretely unlocks
Compare what a "5-line YAML diff" corresponds to in other tools.
Case A. "Swap attention for MLA"
EulerStack YAML diff:
layer_templates:
decoder:
mixer:
type: attention
- attention: { qkv_bias: false }
+ attention: { qkv_bias: false, latent_dim: 384 }
One line changed.
Equivalent modeling_llama.py-based diff (estimate):
- Rewrite LlamaAttention.__init__: rebuild W_q, W_k, W_v into W_q,
W_kv_latent, W_k_up, W_v_up, W_out
- Change QKV computation in forward
- Update KV-cache path (shape changed)
- Add state-dict key mapping for save_pretrained / from_pretrained
- Roughly 200-300 lines
Case B. "Separate a reasoning phase from the answer phase (R1)"
EulerStack YAML additions:
execution_modes:
- { name: think, max_tokens: 8192, kv_share: true, loss_weight: 0.0 }
- { name: answer, max_tokens: 2048, loss_weight: 1.0 }
transition:
type: special_token
token: "<think_end>"
Six lines added.
Code-based implementation:
- Register the special token in the tokenizer
- Rewrite GenerationMixin.generate() to be phase-aware
- Add loss weighting to the training loop
- Define config serialisation
- Roughly 500-700 lines across training + generate + serialization
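The core of what those execution_modes lines declare can be shown with a toy phase-splitting function: tokens before the transition token get the think-phase loss weight, tokens after get the answer-phase weight. This is purely illustrative of the declared semantics, not EulerStack's runtime:

```python
def assign_loss_weights(tokens, transition_token="<think_end>",
                        weights=(0.0, 1.0)):
    """Toy R1-style phase split over an already-generated token stream.

    Tokens up to and including the transition token carry the 'think'
    loss weight; everything after carries the 'answer' weight.
    """
    think_w, answer_w = weights
    out, in_think = [], True
    for tok in tokens:
        if tok == transition_token:
            out.append((tok, think_w))
            in_think = False
            continue
        out.append((tok, think_w if in_think else answer_w))
    return out


tokens = ["step1", "step2", "<think_end>", "42"]
weighted = assign_loss_weights(tokens)
assert weighted[0][1] == 0.0 and weighted[-1] == ("42", 1.0)
```

The hundreds of lines in the code-based estimate come from threading exactly this phase state through a real generate loop, the trainer's loss computation, and config serialisation, rather than from the logic itself.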
Case C. "Attach Titans memory to every attention layer with inference update"
EulerStack YAML additions:
memory:
type: neural_memory
update_at_inference: true
params: { hidden: 1024 }
inner_lr: 0.001
persistence: session
Six lines added.
Code-based implementation:
- Author a new nn.Module class (forward + surprise)
- Wire memory into every layer
- Expose a step_memory_at_inference() hook
- Implement gradient isolation
- Persist state keys on save/load
- Roughly 300-500 lines plus debugging skill
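The flavour of a test-time memory update can be caught in a toy sketch: a memory vector nudged by gradient descent on a squared "surprise" at inference time. This is a one-dimensional illustration of the idea only, not the Titans algorithm or EulerStack's implementation:

```python
class ToyNeuralMemory:
    """Toy test-time memory: a vector updated by gradient descent on a
    squared surprise (prediction error) at inference. Illustrative only."""

    def __init__(self, dim, inner_lr=0.001):
        self.state = [0.0] * dim
        self.inner_lr = inner_lr

    def step(self, observation):
        # surprise = squared error between memory and observation;
        # the gradient step moves memory toward what it failed to predict
        for i, x in enumerate(observation):
            err = self.state[i] - x
            self.state[i] -= self.inner_lr * 2.0 * err


mem = ToyNeuralMemory(dim=2, inner_lr=0.5)
for _ in range(20):
    mem.step([1.0, -1.0])
assert abs(mem.state[0] - 1.0) < 1e-6 and abs(mem.state[1] + 1.0) < 1e-6
```

Even in this toy form, the "plus debugging skill" line item is visible: a real version must isolate these inner updates from the outer training gradients and persist the state across save/load.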
In all three, the intent of the architecture change fits in a few YAML lines. The implementation is EulerStack's problem. You think at the design-language level.
10. Summary
EulerStack is a dedicated design language for LLM architecture.
Structure used to live scattered across general-purpose Python files
like modeling_xxx.py; EulerStack lifts it one layer up into a
declarative language whose sole job is describing structure. This is
the same abstraction step the semiconductor world took with Verilog
and VHDL.
Training, serving, and fine-tuning are the domain of specialised tools (HF Trainer, vLLM, Axolotl, and others) that already do those jobs very well. EulerStack therefore does not cover them. It focuses on design alone, and shapes that design so it plugs into the HF ecosystem without friction.
Next
- Tutorial 1: Validate a Spec — write your first YAML and pass it.
- Tutorial 2: Use Presets — pick one of 53 starting points.
- Tutorial 9: New Primitives — the full ADL vocabulary (MLA / Jamba hybrid / R1 / Titans / MoD / Neural-ODE / ...).
- Tutorial 10: Paper → YAML — dialogue case studies porting four recent papers into YAML.