EulerStack

A modular YAML-driven LLM architecture assembler

Describe an LLM with a single declarative YAML spec, then let a 5-layer pipeline (DSL → Schema → IR → Compiler → CLI) handle validation, normalization, and compilation. 31 presets span two axes: llm_ (size × variant, 16) and arch_ (skill-level walkthrough, 15). compile --output-dir emits a HuggingFace model directory (config.json + model.safetensors) that hands off directly to EulerForge for training. Presets are only starting points: edit d_model, n_heads, or layer counts to assemble models of any size. All CLI help, logs, warnings, and errors are translated into 5 languages (ko / en / zh / ja / es).

Core Features

Layer Templates & Schedule

Define named layer templates (mixer + FFN + norm + residual) and use schedules to specify arrangement order and repetition counts.

Mixer Types Attention, Mamba, RetNet, Hyena
FFN Types MLP, Gated MLP (SwiGLU), MoE (top-k routing)
Norm RMSNorm, LayerNorm (pre/post position)
Residual Sequential, Parallel
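
For example, a Mamba-heavy hybrid could define two templates and interleave them via the schedule. A hypothetical sketch only: the exact option keys for the mamba mixer are assumptions, not the verified schema.

```yaml
layer_templates:
  attn_block:
    mixer: { type: attention, attention: {} }
    ffn: { type: gated_mlp, activation: swiglu }
  mamba_block:
    mixer: { type: mamba }          # option keys assumed, not verified
    ffn: { type: gated_mlp, activation: swiglu }
layer_schedule:                     # 3 Mamba layers per attention layer
  - { template: mamba_block, repeat: 3 }
  - { template: attn_block, repeat: 1 }
  - { template: mamba_block, repeat: 3 }
  - { template: attn_block, repeat: 1 }
```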

Validation & Realism

A 3-stage process — schema structure → cross-field compatibility → heuristic realism checks — catches design errors before compilation. Every error is printed in the 3-line format (Category: what / Fix: / See:).

Structure Unknown keys, type/enum, required fields, positive constraints
Compatibility Mixer↔state mismatches (e.g., mamba + kv_cache forbidden)
Realism head_dim range (32–256), target_params mismatch (>30%), MoE expert ratio, seq_len/d_model ratio, family_hint consistency, vocab/tokenizer consistency, tie_weight consistency, rope_scaling bounds
Error Categories ValidationError, CompatibilityError, CompileError, NormalizationError
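
For illustration, an error in the 3-line format might look like the following. The message text and reference are invented for this example; only the Category / Fix: / See: skeleton is defined above.

```
CompatibilityError: mamba mixer cannot be combined with kv_cache
Fix: remove kv_cache from the template, or use an attention mixer
See: mixer compatibility reference
```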

Start with a single YAML

A declarative spec of ~10 lines fully describes the shape of a model.

schema_version: 2
model: { name: "my-llm", d_model: 2048, vocab_size: 32000, max_seq_len: 4096, n_heads: 16 }
tokenizer_contract: { type: hf, pretrained: gpt2 }
embedding: { type: learned, positional: rope }
layer_templates:
  decoder:
    mixer: { type: attention, attention: {} }
    ffn: { type: gated_mlp, activation: swiglu }
layer_schedule:
  - { template: decoder, repeat: 24 }
head: { type: causal_lm }
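
As a sanity check on a spec like this, the parameter count can be estimated by hand. The sketch below is a back-of-the-envelope estimate, not EulerStack's own estimator (which explain provides): it assumes an untied embedding and LM head, a SwiGLU hidden size of roughly (8/3) · d_model, and ignores norm and bias terms.

```python
# Back-of-the-envelope parameter estimate for the ~10-line spec above.
# Assumptions (not EulerStack's estimator): untied embedding + LM head,
# SwiGLU hidden size ~ (8/3) * d_model, norm/bias params ignored.
d_model, n_layers, vocab = 2048, 24, 32000

embed = vocab * d_model                   # token embedding table
head = vocab * d_model                    # untied causal LM head
attn_per_layer = 4 * d_model * d_model    # Q, K, V, O projections
ffn_hidden = int(8 * d_model / 3)         # common SwiGLU sizing heuristic
ffn_per_layer = 3 * d_model * ffn_hidden  # gate, up, down projections

total = embed + head + n_layers * (attn_per_layer + ffn_per_layer)
print(f"~{total / 1e9:.2f}B parameters")  # prints "~1.34B parameters"
```

Actual preset sizes differ because the real compiler accounts for exact hidden sizes, norms, and weight tying.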

Presets: 31 on two axes

Two orthogonal axes: llm_ (size × variant, 16) and arch_ (skill-level walkthrough, 15). Presets are starting points — edit d_model, n_heads, and layer count to assemble a model at any scale.

llm_ — Size × Architectural Variant (16)

4 sizes (0.8B / 2B / 4B / 16B) × 4 variants (simple / mistral / jamba / moe).

| Scale | simple | mistral | jamba | moe |
|---|---|---|---|---|
| 0.8B | llm_0p8b_simple (~810M) | llm_0p8b_mistral (~810M) | llm_0p8b_jamba (~1.04B) | llm_0p8b_moe (~770M) |
| 2B | llm_2b_simple (~2.01B) | llm_2b_mistral (~2.01B) | llm_2b_jamba (~2.39B) | llm_2b_moe (~2.05B) |
| 4B | llm_4b_simple (~3.97B) | llm_4b_mistral (~3.97B) | llm_4b_jamba (~4.67B) | llm_4b_moe (~4.03B) |
| 16B | llm_16b_simple (~15.26B) | llm_16b_mistral (~15.26B) | llm_16b_jamba (~18.18B) | llm_16b_moe (~15.71B) |

Variant semantics: simple = pure attention (Llama); mistral = attention + sliding window (1 global : 3 sliding per 4 layers); jamba = Mamba + Attention hybrid (3:1); moe = attention + MoE FFN (1-in-4 layers, 8 experts, top-2).
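
The mistral variant's 1 global : 3 sliding pattern could be written as a two-template schedule. A hypothetical sketch only: the sliding_window option name is an assumption, not the verified schema.

```yaml
layer_templates:
  global_attn:
    mixer: { type: attention, attention: {} }
  sliding_attn:
    mixer: { type: attention, attention: { sliding_window: 4096 } }  # key assumed
layer_schedule:            # one 4-layer period; repeat the pair to reach depth
  - { template: global_attn, repeat: 1 }
  - { template: sliding_attn, repeat: 3 }
```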

arch_ — Skill-Level Walkthrough (15, all ~2B)

All at a common ~2B budget so comparisons isolate architectural choices from scale. The expert level crosses MoE × mixer as a 2D design space and includes 2 speculative compositions not yet published in the literature.

| Level | Preset | ~Params | One-liner | Research basis |
|---|---|---|---|---|
| beginner | arch_beginner_gpt2 | 2.07B | Classic Transformer (MHA + LayerNorm post + GeLU) | Vaswani 2017, GPT-2 |
| beginner | arch_beginner_llama | 2.01B | Modern baseline (GQA + RMSNorm pre + SwiGLU) | Llama 2/3 |
| intermediate | arch_intermediate_mistral | 2.01B | 1 global : 3 sliding attention | Mistral 7B |
| intermediate | arch_intermediate_gemma2 | 2.08B | 1:1 alternating global/local | Gemma 2 |
| intermediate | arch_intermediate_qwen_longctx | 2.01B | RoPE scaling factor 4, 32K ctx | Qwen 2/3 |
| advanced | arch_advanced_jamba | 2.39B | Mamba + Attention 3:1 hybrid | Jamba-1.5 (AI21) |
| advanced | arch_advanced_samba | 1.99B | Mamba + Sliding attention 1:1 | Samba (Microsoft) |
| advanced | arch_advanced_retnet | 2.21B | Pure RetNet (attention-free) | Sun 2023 |
| expert | arch_expert_research | 2.22B | 4 mixers + MoE 3-phase | Research-grade |
| expert | arch_expert_mixtral_moe | 1.89B | Pure attn + every-layer MoE (8 × top-2) | Mixtral 8x7B |
| expert | arch_expert_striped_hyena | 1.96B | Hyena + Attention 4:1, 128K | StripedHyena |
| expert | arch_expert_blackmamba_moe | 2.10B | Mamba + MoE (MoE on non-attn mixer) | BlackMamba, MoE-Mamba |
| expert | arch_expert_deepseek_moe | 1.84B | Fine-grained MoE (32 × top-3) | DeepSeek-V2/V3 |
| expert | arch_expert_retnet_moe | 1.98B | RetNet + MoE (speculative, no paper) | Sun 2023 + MoE-Mamba extrapolation |
| expert | arch_expert_frontier_full_moe | 1.93B | Attention-free, multi-mixer + all-MoE (most speculative) | Composition prediction |

No upper limit — presets are starting points. EulerStack can assemble a model of any size by editing d_model, n_heads, and layer count.

CLI Reference

Follows the eulerwa CLI family convention. All errors are printed in the 3-line format (Category: what / Fix: / See:).

Top-Level Commands

validate Validate a YAML spec (--report includes realism checks)
explain Human-readable model summary (layers, parameter estimate)
compile IR → JSON runtime config (--output) or HF model directory (--output-dir)
schema Print YAML schema structure
presets list / show Enumerate presets or show details for one

Common Options

--lang Output language (ko/en/zh/ja/es). Root option; default ko
--preset YAML spec file path
--validate-only Validate and exit without further work
--output / -o JSON runtime config output path
--output-dir HF model directory output (config.json + model.safetensors)
--print-config / --dry-run Print resolved config to stdout

5-language i18n CLI

Every CLI help page, log message, warning, and error is translated into ko / en / zh / ja / es. Default language is Korean (ko); switch via the --lang root option or the EULERSTACK_LANG environment variable. Command names, option names, and the Fix: / See: labels in the 3-line error format stay untranslated for script compatibility.

eulerstack validate --preset my_model.yml
# Korean (default)

eulerstack --lang en validate --preset my_model.yml
# English

EULERSTACK_LANG=ja eulerstack validate --preset my_model.yml
# env var also works

HF model directory → EulerForge training

compile --output-dir writes a HuggingFace-compatible directory (config.json + model.safetensors) — the primary handoff path into the EulerForge training pipeline.

eulerstack compile --preset my_model.yml --output-dir ./my_model

# Load it from Python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("./my_model", trust_remote_code=True)

5-Layer Architecture

From YAML spec to a trainable model — 5 layers with strict separation of concerns.

Layer 1: DSL User-authored YAML v2 spec (declarative model definition)
Layer 2: Schema Structural validation — unknown keys, type/enum, required fields, cross-field compatibility
Layer 3: IR Normalized canonical structure (default fills, template expansion)
Layer 4: Compiler IR → JSON runtime config or HF model directory (config.json + model.safetensors) — loadable via AutoModelForCausalLM.from_pretrained() for EulerForge training
Layer 5: CLI validate / explain / compile / schema / presets — all messages i18n-translated across 5 languages
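
The separation of concerns can be illustrated with a toy version of the flow. Everything here is invented for illustration: the function names, checks, and data shapes are not the EulerStack API.

```python
# Toy illustration of the DSL -> Schema -> IR -> Compiler flow.
# Names and checks are invented, not the EulerStack API.

# Layer 1 (DSL): the parsed YAML spec as a plain dict
spec = {
    "schema_version": 2,
    "model": {"name": "my-llm", "d_model": 2048, "n_heads": 16},
    "layer_templates": {"decoder": {"mixer": {"type": "attention"}}},
    "layer_schedule": [{"template": "decoder", "repeat": 24}],
}

def validate(spec):  # Layer 2 (Schema): structure + cross-field checks
    for key in ("schema_version", "model", "layer_templates", "layer_schedule"):
        if key not in spec:
            raise ValueError(f"ValidationError: missing required key '{key}'")
    if spec["model"]["d_model"] % spec["model"]["n_heads"]:
        raise ValueError("CompatibilityError: d_model not divisible by n_heads")

def normalize(spec):  # Layer 3 (IR): expand the schedule into concrete layers
    layers = []
    for entry in spec["layer_schedule"]:
        layers += [spec["layer_templates"][entry["template"]]] * entry["repeat"]
    return {"model": spec["model"], "layers": layers}

def compile_ir(ir):  # Layer 4 (Compiler): IR -> runtime config dict
    return {"n_layers": len(ir["layers"]), **ir["model"]}

validate(spec)
config = compile_ir(normalize(spec))
print(config["n_layers"])  # prints 24
```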

Tutorials

Tutorials are maintained in Korean (ko) and English (en) under the upstream repo at docs/tutorials/{ko,en}/. This page lists the table of contents; full tutorials are available in the repository.

Core Tutorials

quickstart.md Language-neutral CLI landing (uses --lang)
01_validate_a_spec.md Validating a YAML spec
02_use_presets.md Using presets
03_compile_and_explain.md Compile & explain
04_prepare_data.md Prepare training data
05_sanity_train.md Sanity training loop
06_arch_walkthrough.md NEW — step-by-step tour of the 15 arch_ presets (skill-level walkthrough)

Mixer deep dives (mixers/)

00_overview.md Mixers concept — why mix attention / mamba / retnet / hyena
01_attention.md Attention in depth
02_mamba.md Mamba in depth
03_retnet.md RetNet in depth
04_hyena.md Hyena in depth


Install & Quickstart

Install

pip install -e .

# or include dev dependencies
pip install -e ".[dev]"

Quickstart

# List presets (Korean default)
eulerstack presets list

# Validate with realism report
eulerstack validate --preset my_model.yml --report

# Build an HF model directory → hand off to EulerForge training
eulerstack compile --preset my_model.yml --output-dir ./my_model

# Switch CLI messages to English
eulerstack --lang en validate --preset my_model.yml

Design LLM Architectures with EulerStack

Combine Attention, Mamba, RetNet, Hyena, and MoE into hybrid models with a single YAML spec — then hand off the HuggingFace model directory to EulerForge for training.

Get Started on GitHub