A modular YAML-driven LLM architecture assembler
Describe an LLM with a single declarative YAML spec, then let a 5-layer pipeline (DSL → Schema → IR → Compiler → CLI) handle validation, normalization, and compilation. 31 presets across two axes, llm_ (size × variant, 16) and arch_ (skill-level walkthrough, 15), provide ready starting points, while compile --output-dir emits a HuggingFace model directory (config.json + model.safetensors) that hands off directly to EulerForge for training. Presets are only starting points: edit d_model, n_heads, or layer counts to assemble models of any size. All CLI help, logs, warnings, and errors are translated into 5 languages (ko / en / zh / ja / es).
Define named layer templates (mixer + FFN + norm + residual) and use schedules to specify arrangement order and repetition counts.
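As a sketch, a template-plus-schedule spec might look like the following. Every key name here is an assumption for illustration, not the verbatim EulerStack DSL:

```yaml
# Illustrative sketch only — the exact key names of the EulerStack DSL
# may differ; this shows the template + schedule idea described above.
templates:
  attn_block:
    mixer: attention
    ffn: gated_mlp        # SwiGLU
    norm: rmsnorm_pre
    residual: sequential
  mamba_block:
    mixer: mamba
    ffn: gated_mlp
    norm: rmsnorm_pre
    residual: sequential

schedule:                  # arrangement order and repetition counts
  - { template: mamba_block, repeat: 3 }
  - { template: attn_block,  repeat: 1 }
```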
| Component | Options |
|---|---|
| Mixer Types | Attention, Mamba, RetNet, Hyena |
| FFN Types | MLP, Gated MLP (SwiGLU), MoE (top-k routing) |
| Norm | RMSNorm, LayerNorm (pre/post position) |
| Residual | Sequential, Parallel |
A 3-stage process — schema structure → cross-field compatibility → heuristic realism checks — catches design errors before compilation. Every error is printed in the 3-line format (Category: what / Fix: / See:).
| Stage | Checks |
|---|---|
| Structure | Unknown keys, type/enum, required fields, positive constraints |
| Compatibility | Mixer↔state mismatches (e.g., mamba + kv_cache forbidden) |
| Realism | head_dim range (32–256), target_params mismatch (>30%), MoE expert ratio, seq_len/d_model ratio, family_hint consistency, vocab/tokenizer consistency, tie_weight consistency, rope_scaling bounds |
| Error Categories | ValidationError, CompatibilityError, CompileError, NormalizationError |
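Two of the realism heuristics above can be sketched in a few lines of Python. The function names, thresholds, and message wording here are illustrative assumptions, not EulerStack's actual code:

```python
# Sketch of two realism heuristics (head_dim range, target_params
# mismatch) — an assumption-based illustration, not EulerStack's code.

def check_head_dim(d_model: int, n_heads: int) -> list[str]:
    """Flag a head_dim outside the 32-256 range cited in the docs."""
    head_dim = d_model // n_heads
    if not 32 <= head_dim <= 256:
        return [f"RealismError: head_dim {head_dim} outside 32-256"]
    return []

def check_target_params(estimated: int, target: int) -> list[str]:
    """Flag an estimate that misses target_params by more than 30%."""
    if abs(estimated - target) / target > 0.30:
        return [f"RealismError: estimate {estimated:,} deviates >30% "
                f"from target {target:,}"]
    return []

warnings = (check_head_dim(d_model=2048, n_heads=64)
            + check_target_params(estimated=1_200_000_000,
                                  target=2_000_000_000))
# head_dim = 2048 // 64 = 32 → passes; 1.2B vs a 2.0B target is a
# 40% miss → flagged
```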
A declarative spec of ~10 lines fully describes the shape of a model.
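A hypothetical spec at roughly that length might read as follows; the key names are illustrative assumptions, not the exact schema:

```yaml
# Hypothetical ~10-line spec — key names are illustrative,
# not the verbatim EulerStack schema.
model:
  name: my_2b_baseline
  d_model: 2048
  n_heads: 16
  n_layers: 24
  vocab_size: 32000
  mixer: attention
  ffn: gated_mlp
  norm: rmsnorm_pre
```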
Two orthogonal axes: llm_ (size × variant, 16) and arch_ (skill-level walkthrough, 15). Presets are starting points — edit d_model, n_heads, and layer count to assemble a model at any scale.
llm_ — Size × Architectural Variant (16)

4 sizes (0.8B / 2B / 4B / 16B) × 4 variants (simple / mistral / jamba / moe).
| Scale | simple | mistral | jamba | moe |
|---|---|---|---|---|
| 0.8B | llm_0p8b_simple (~810M) | llm_0p8b_mistral (~810M) | llm_0p8b_jamba (~1.04B) | llm_0p8b_moe (~770M) |
| 2B | llm_2b_simple (~2.01B) | llm_2b_mistral (~2.01B) | llm_2b_jamba (~2.39B) | llm_2b_moe (~2.05B) |
| 4B | llm_4b_simple (~3.97B) | llm_4b_mistral (~3.97B) | llm_4b_jamba (~4.67B) | llm_4b_moe (~4.03B) |
| 16B | llm_16b_simple (~15.26B) | llm_16b_mistral (~15.26B) | llm_16b_jamba (~18.18B) | llm_16b_moe (~15.71B) |
Variant semantics: simple = pure attention (Llama); mistral = attention + sliding window (1 global : 3 sliding per 4 layers); jamba = Mamba + Attention hybrid (3:1); moe = attention + MoE FFN (1-in-4 layers, 8 experts, top-2).
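The mistral pattern (1 global : 3 sliding per block of 4 layers) amounts to a simple schedule expansion. The function name and labels below are illustrative, not EulerStack's API:

```python
def mistral_attention_schedule(n_layers: int) -> list[str]:
    """Assign 'global' to the first layer of every block of 4 and
    'sliding' to the other three (1 global : 3 sliding).
    Illustrative sketch — not EulerStack's actual scheduler."""
    return ["global" if i % 4 == 0 else "sliding" for i in range(n_layers)]

print(mistral_attention_schedule(8))
# → ['global', 'sliding', 'sliding', 'sliding',
#    'global', 'sliding', 'sliding', 'sliding']
```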
arch_ — Skill-Level Walkthrough (15, all ~2B)

All at a common ~2B budget so comparisons isolate architectural choices from scale. The expert level crosses MoE × mixer as a 2D design space and includes 2 speculative compositions not yet published in the literature.
| Level | Preset | ~Params | One-liner | Research basis |
|---|---|---|---|---|
| beginner | arch_beginner_gpt2 | 2.07B | Classic Transformer (MHA + LayerNorm post + GeLU) | Vaswani 2017, GPT-2 |
| beginner | arch_beginner_llama | 2.01B | Modern baseline (GQA + RMSNorm pre + SwiGLU) | Llama 2/3 |
| intermediate | arch_intermediate_mistral | 2.01B | 1 global : 3 sliding attention | Mistral 7B |
| intermediate | arch_intermediate_gemma2 | 2.08B | 1:1 alternating global/local | Gemma 2 |
| intermediate | arch_intermediate_qwen_longctx | 2.01B | RoPE scaling factor 4, 32K ctx | Qwen 2/3 |
| advanced | arch_advanced_jamba | 2.39B | Mamba + Attention 3:1 hybrid | Jamba-1.5 (AI21) |
| advanced | arch_advanced_samba | 1.99B | Mamba + Sliding attention 1:1 | Samba (Microsoft) |
| advanced | arch_advanced_retnet | 2.21B | Pure RetNet (attention-free) | Sun 2023 |
| expert | arch_expert_research | 2.22B | 4 mixers + MoE 3-phase | Research-grade |
| expert | arch_expert_mixtral_moe | 1.89B | Pure attn + every-layer MoE (8 × top-2) | Mixtral 8x7B |
| expert | arch_expert_striped_hyena | 1.96B | Hyena + Attention 4:1, 128K | StripedHyena |
| expert | arch_expert_blackmamba_moe | 2.10B | Mamba + MoE (MoE on non-attn mixer) | BlackMamba, MoE-Mamba |
| expert | arch_expert_deepseek_moe | 1.84B | Fine-grained MoE (32 × top-3) | DeepSeek-V2/V3 |
| expert | arch_expert_retnet_moe | 1.98B | RetNet + MoE (speculative, no paper) | Sun 2023 + MoE-Mamba extrapolation |
| expert | arch_expert_frontier_full_moe | 1.93B | Attention-free, multi-mixer + all-MoE (most speculative) | Composition prediction |
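Budgets like the ~2B figures above can be sanity-checked with a rough dense-transformer formula. This sketch assumes Llama-style shapes (SwiGLU FFN, tied embeddings) and ignores norms and biases; it is an approximation, not the project's estimator:

```python
def estimate_params(d_model: int, n_layers: int, vocab_size: int,
                    ffn_mult: float = 8 / 3,
                    tie_embeddings: bool = True) -> int:
    """Rough dense-transformer count: attention ≈ 4*d^2 per layer,
    SwiGLU FFN ≈ 3*d*d_ff per layer, plus token embeddings.
    Assumption-based sketch, not EulerStack's estimator."""
    d_ff = int(ffn_mult * d_model)
    attn = 4 * d_model * d_model    # q, k, v, o projections (full MHA)
    ffn = 3 * d_model * d_ff        # gate, up, down projections
    emb = vocab_size * d_model * (1 if tie_embeddings else 2)
    return n_layers * (attn + ffn) + emb

# Llama-2-7B-like shapes land near its reported 6.74B with this formula:
print(estimate_params(4096, 32, 32000, ffn_mult=11008 / 4096))  # ≈ 6.6e9
```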
No upper limit — presets are starting points. EulerStack can assemble a model of any size by editing d_model, n_heads, and layer count.
Follows the eulerwa CLI family convention. All errors are printed in the 3-line format (Category: what / Fix: / See:).
| Command | Description |
|---|---|
| validate | Validate a YAML spec (--report includes realism checks) |
| explain | Human-readable model summary (layers, parameter estimate) |
| compile | IR → JSON runtime config (--output) or HF model directory (--output-dir) |
| schema | Print YAML schema structure |
| presets list / show | Enumerate presets or show details for one |
| Option | Description |
|---|---|
| --lang | Output language (ko/en/zh/ja/es). Root option; default ko |
| --preset | YAML spec file path |
| --validate-only | Validate and exit without further work |
| --output / -o | JSON runtime config output path |
| --output-dir | HF model directory output (config.json + model.safetensors) |
| --print-config / --dry-run | Print resolved config to stdout |
Every CLI help page, log message, warning, and error is translated into ko / en / zh / ja / es. Default language is Korean (ko); switch via the --lang root option or the EULERSTACK_LANG environment variable. Command names, option names, and the Fix: / See: labels in the 3-line error format stay untranslated for script compatibility.
compile --output-dir writes a HuggingFace-compatible directory (config.json + model.safetensors) — the primary handoff path into the EulerForge training pipeline.
From YAML spec to a trainable model — 5 layers with strict separation of concerns.
| Layer | Role |
|---|---|
| Layer 1: DSL | User-authored YAML v2 spec (declarative model definition) |
| Layer 2: Schema | Structural validation — unknown keys, type/enum, required fields, cross-field compatibility |
| Layer 3: IR | Normalized canonical structure (default fills, template expansion) |
| Layer 4: Compiler | IR → JSON runtime config or HF model directory (config.json + model.safetensors) — loadable via AutoModelForCausalLM.from_pretrained() for EulerForge training |
| Layer 5: CLI | validate / explain / compile / schema / presets — all messages i18n-translated across 5 languages |
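Layer 4's handoff hinges on the emitted directory carrying a config.json that downstream HuggingFace tooling can read. This sketch writes and re-reads a minimal HF-style config; the field values are placeholders, not a real compiled model:

```python
import json
import os
import tempfile

# Minimal illustrative HF-style config.json — placeholder values only;
# the field set a real compiled model needs depends on its architecture.
config = {
    "model_type": "llama",
    "hidden_size": 2048,
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "vocab_size": 32000,
}

out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "config.json"), "w") as f:
    json.dump(config, f, indent=2)

# A downstream consumer (e.g. a training pipeline) reads it back verbatim:
with open(os.path.join(out_dir, "config.json")) as f:
    loaded = json.load(f)
```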
Tutorials are maintained in Korean (ko) and English (en) under the upstream repo at docs/tutorials/{ko,en}/. This page lists the table of contents; full tutorials are available in the repository.
| File | Description |
|---|---|
| quickstart.md | Language-neutral CLI landing (uses --lang) |
| 01_validate_a_spec.md | Validating a YAML spec |
| 02_use_presets.md | Using presets |
| 03_compile_and_explain.md | Compile & explain |
| 04_prepare_data.md | Prepare training data |
| 05_sanity_train.md | Sanity training loop |
| 06_arch_walkthrough.md | NEW — step-by-step tour of the 15 arch_ presets (skill-level walkthrough) |
Mixer deep-dives (mixers/)

| File | Description |
|---|---|
| 00_overview.md | Mixers concept — why mix attention / mamba / retnet / hyena |
| 01_attention.md | Attention in depth |
| 02_mamba.md | Mamba in depth |
| 03_retnet.md | RetNet in depth |
| 04_hyena.md | Hyena in depth |
Combine Attention, Mamba, RetNet, Hyena, and MoE into hybrid models with a single YAML spec — then hand off the HuggingFace model directory to EulerForge for training.
Get Started on GitHub