EulerStack

An Architecture Description Language (ADL) for LLMs

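EulerStack is an Architecture Description Language (ADL) for LLMs. PyTorch model files tangle structure, training, and serving together; EulerStack pulls out the "architecture" dimension and hands it to a dedicated declarative language. This is the same kind of abstraction-level jump the semiconductor industry made when Verilog/VHDL replaced schematics + C. A YAML spec passes through a 5-layer pipeline (DSL → Schema → IR → Compiler → CLI) that validates, normalizes, and compiles it. `compile --output-dir` emits a HuggingFace model directory (config.json + model.safetensors) that can be handed straight to EulerForge for training. The 57 presets (24 llm_ + 33 arch_, including 9 arch_expert_*_mini) are organized along a 3-tier learning path (proven industry standards → recent hybrid/MoE designs → experimental v1 primitives), and every CLI message is translated into 5 languages (ko / en / zh / ja / es). v0.1.5 adds three backward-compatible spec extensions: μP scaling (training_hints.scaling), differentiation auxiliary objectives (training_hints.differentiation_objectives), and tissue declarations (tissue). All are OFF by default, so existing v0.1.4 YAML works without modification.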

Tutorials (16 parts, EN) · CLI Reference (EN)

The tutorials and the CLI reference are available in Korean and English only.

Core Features

Layer Templates & Schedule

Define named layer templates (mixer + FFN + norm + residual), then use the schedule to specify their order and repeat counts.

Mixer types	Attention, Mamba, RetNet, Hyena
FFN types	MLP, Gated MLP (SwiGLU), MoE (top-k routing)
Norm	RMSNorm, LayerNorm (pre/post)
Residual	Sequential, Parallel, Hyper-Connection (mHC)
Head	causal_lm, causal_lm_mtp (Multi-Token Prediction)
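As a rough illustration of how templates and a schedule combine, here is a minimal sketch that assumes the YAML has already been parsed into plain Python dicts. The `expand_schedule` helper and the second `ssm` template are hypothetical, not part of EulerStack; the sketch only shows the flattening idea (each schedule entry contributes `repeat` copies of its template, in order).

```python
from typing import Any, Dict, List


def expand_schedule(templates: Dict[str, Dict[str, Any]],
                    schedule: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten a layer_schedule into one resolved template per layer (hypothetical helper)."""
    layers: List[Dict[str, Any]] = []
    for entry in schedule:
        name = entry["template"]
        if name not in templates:
            raise KeyError(f"unknown template: {name}")
        repeat = entry.get("repeat", 1)
        if repeat <= 0:
            raise ValueError(f"repeat must be positive, got {repeat}")
        # A fresh dict per layer, so per-layer edits never alias the shared template.
        layers.extend({"template": name, **templates[name]} for _ in range(repeat))
    return layers


templates = {
    "decoder": {"mixer": "attention", "ffn": "gated_mlp"},
    "ssm": {"mixer": "mamba", "ffn": "gated_mlp"},  # hypothetical second template
}
schedule = [
    {"template": "ssm", "repeat": 3},
    {"template": "decoder", "repeat": 1},
]
layers = expand_schedule(templates, schedule)
print([layer["mixer"] for layer in layers])  # → ['mamba', 'mamba', 'mamba', 'attention']
```

Keeping templates and the schedule separate is what lets a 3:1 hybrid be expressed as two short entries instead of a hand-written list of every layer.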

Validation & Realism

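A three-stage process (schema structure validation → cross-field compatibility → heuristic realism checks) catches design errors before compilation. Every error is reported in a three-line format (Category: what / Fix: / See:).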

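Structure	Unknown keys, types/enums, required fields, positivity constraints
Compatibility	mixer↔state mismatches (e.g., mamba + kv_cache is rejected)
Realism	head_dim range (32–256), target_params deviation (>30%), MoE expert ratio, seq_len/d_model ratio, family_hint consistency, vocab/tokenizer consistency, tie_weight consistency, rope_scaling range
Error classes	`ValidationError`, `CompatibilityError`, `CompileError`, `NormalizationError`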
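To make two of the realism heuristics and the three-line error format concrete, here is a hypothetical re-implementation, not EulerStack's validator. The 32–256 head_dim range and the 30% target_params threshold come from the table; the `Realism` category label, the message wording, and the `See:` pointer are placeholders chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    category: str
    what: str
    fix: str
    see: str

    def render(self) -> str:
        # Three lines: "Category: what", "Fix: ...", "See: ..."
        return f"{self.category}: {self.what}\nFix: {self.fix}\nSee: {self.see}"


SEE = "03_spec_reference tutorial"  # placeholder pointer, not a real error target


def check_realism(d_model: int, n_heads: int, estimated_params: int,
                  target_params: Optional[int] = None) -> List[Finding]:
    """Hypothetical head_dim-range and target_params-deviation checks."""
    findings: List[Finding] = []
    if d_model % n_heads != 0:
        findings.append(Finding("ValidationError",
                                f"d_model={d_model} is not divisible by n_heads={n_heads}",
                                "pick an n_heads that divides d_model", SEE))
    else:
        head_dim = d_model // n_heads
        if not 32 <= head_dim <= 256:
            findings.append(Finding("Realism",
                                    f"head_dim={head_dim} is outside the 32-256 range",
                                    "change d_model or n_heads", SEE))
    if target_params:
        deviation = abs(estimated_params - target_params) / target_params
        if deviation > 0.30:
            findings.append(Finding("Realism",
                                    f"estimated params deviate {deviation:.0%} from target_params",
                                    "adjust d_model, the layer count, or target_params", SEE))
    return findings


for finding in check_realism(d_model=2048, n_heads=128,
                             estimated_params=1_400_000_000, target_params=1_000_000_000):
    print(finding.render(), end="\n\n")
```

With d_model 2048 and 128 heads, head_dim drops to 16, which is flagged; an estimate 40% above target_params is flagged as well.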

Start from a single YAML

About 10 lines of declarative spec fully describe the shape of a model.

schema_version: 1
model: { name: "my-llm", d_model: 2048, vocab_size: 32000, max_seq_len: 4096, n_heads: 16 }
tokenizer_contract: { type: hf, pretrained: gpt2 }
embedding: { type: learned, positional: rope }
layer_templates:
  decoder:
    mixer: { type: attention, attention: {} }
    ffn:   { type: gated_mlp, activation: swiglu }
layer_schedule:
  - { template: decoder, repeat: 24 }
head: { type: causal_lm }
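A back-of-the-envelope parameter count for this spec helps when reading `explain` output or choosing a `target_params`. This sketch is not EulerStack's estimator. It assumes plain multi-head attention (the spec does not ask for GQA, so K/V use all 16 heads), a SwiGLU hidden size of ceil(8/3 × d_model) rounded up to a multiple of 256 (a Llama-style convention assumed here), no biases, and it ignores norm weights.

```python
from typing import Dict


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def ffn_hidden_size(d_model: int, multiple_of: int = 256) -> int:
    # Assumed Llama-style SwiGLU width: 8/3 * d_model, rounded up to `multiple_of`.
    return ceil_div(8 * d_model, 3 * multiple_of) * multiple_of


def estimate_params(d_model: int, n_heads: int, n_layers: int, vocab_size: int,
                    tie_embeddings: bool = False) -> Dict[str, int]:
    hidden = ffn_hidden_size(d_model)
    attn = 4 * d_model * d_model        # Q, K, V, O projections (full MHA)
    ffn = 3 * d_model * hidden          # gate, up, down projections of SwiGLU
    per_layer = attn + ffn              # norms and biases ignored
    embed = vocab_size * d_model
    head = 0 if tie_embeddings else vocab_size * d_model
    return {
        "head_dim": d_model // n_heads,
        "ffn_hidden": hidden,
        "per_layer": per_layer,
        "total": n_layers * per_layer + embed + head,
    }


# Values from the example spec: 24 = repeat count of the single schedule entry.
est = estimate_params(d_model=2048, n_heads=16, n_layers=24, vocab_size=32000)
print(f"head_dim={est['head_dim']}, total≈{est['total'] / 1e9:.2f}B")  # → head_dim=128, total≈1.36B
```

Changing any of the assumptions (GQA, a different FFN width, tied embeddings) shifts the total noticeably, which is why a realism check compares against `target_params` with a tolerance rather than an exact match.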
            

v0.1.5 spec extensions (optional, OFF by default): μP scaling, differentiation auxiliary objectives, and tissue declarations:

# Append the following to the spec above (existing YAML stays compatible without changes)
training_hints:
  scaling: { parametrization: mup, base_width: 256 }   # μP (W-AS-1)
  differentiation_objectives: { usage_probe_coef: 0.01 } # differentiation auxiliary objective (W-AS-2)
tissue:                                                  # tissue / column declaration (W-AS-3)
  columns:
    - { name: global_integration, templates: [decoder], role: global_binding }
  connectivity: ring
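For intuition about `scaling: { parametrization: mup, base_width: 256 }`, the sketch below applies the textbook μP rules for Adam (Yang et al., Tensor Programs V): with width multiplier m = d_model / base_width, matrix-like hidden weights use learning rate η/m, vector-like parameters such as embeddings keep η, and output logits are scaled by 1/m. How EulerStack or EulerForge actually consume this hint is not described on this page, so the `mup_multipliers` function and its output keys are hypothetical.

```python
from typing import Dict


def mup_multipliers(d_model: int, base_width: int, base_lr: float) -> Dict[str, float]:
    """Textbook muP (Adam) multipliers relative to a tuned base-width model (hypothetical helper)."""
    if d_model <= 0 or base_width <= 0:
        raise ValueError("d_model and base_width must be positive")
    m = d_model / base_width  # width multiplier
    return {
        "width_mult": m,
        "hidden_lr": base_lr / m,       # matrix-like hidden weights: LR shrinks with width
        "embedding_lr": base_lr,        # vector-like params (embeddings, norms): LR unchanged
        "output_logit_mult": 1.0 / m,   # readout logits scaled down by width
    }


print(mup_multipliers(d_model=2048, base_width=256, base_lr=3e-3))
```

At d_model == base_width every multiplier is 1, so μP coincides with the standard parametrization at the base width. That property is what allows hyperparameters tuned on a 256-wide proxy to transfer to a 2048-wide model.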
            

Presets: 3-tier learning path × 57 presets

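Organized by the v1 "industry-ordering principle": proven industry standards → recent hybrid/MoE → experimental v1 primitives. In total, 24 llm_ + 33 arch_ = 57 presets (the 33 arch_ presets are beginner 2 · intermediate 3 · advanced 5 · expert 23, of which 9 are *_mini). Presets are only starting points: edit d_model, n_heads, and the number of layers to assemble a model of any size.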

`arch_`: skill-level walkthrough (17 presets, 1B–2B)

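17 presets at 1B–2B scale, sized so experiments fit on a 24 GB GPU (d_model is reduced relative to the original papers). To reproduce paper-scale models, scale d_model back up. The expert tier has been expanded into a three-dimensional design space (MoE × mixer × depth / receptive field) and includes 4 speculative combinations not yet published in the literature (retnet_moe, frontier_full_moe, progressive_stack, dilated_longnet).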

Tier	Preset	~Params	One-line description	Research source
beginner	`arch_beginner_gpt2`	~1.1B	Classic Transformer (MHA + LayerNorm post + GeLU)	Vaswani 2017, GPT-2
beginner	`arch_beginner_llama`	~1.1B	Modern baseline (GQA + RMSNorm pre + SwiGLU)	Llama 2/3
intermediate	`arch_intermediate_mistral`	~1.3B	1 global : 3 sliding-window attention	Mistral 7B
intermediate	`arch_intermediate_gemma2`	~1.3B	1:1 alternating global / local	Gemma 2
intermediate	`arch_intermediate_qwen_longctx`	~1.3B	RoPE scaling factor 4, 32K ctx	Qwen 2/3
advanced	`arch_advanced_jamba`	~1.2B	Mamba + Attention 3:1 hybrid	Jamba-1.5 (AI21)
advanced	`arch_advanced_samba`	~1.0B	Mamba + Sliding attention 1:1	Samba (Microsoft)
advanced	`arch_advanced_retnet`	~1.3B	Pure RetNet (attention-free)	Sun 2023
expert	`arch_expert_research`	~1.5B	4 mixers + MoE 3-phase	Research-grade
expert	`arch_expert_mixtral_moe`	~1.9B	Pure attn + MoE in every layer (8 × top-2)	Mixtral 8x7B
expert	`arch_expert_striped_hyena`	~1.0B	Hyena + Attention 4:1, 128K	StripedHyena
expert	`arch_expert_blackmamba_moe`	~1.5B	Mamba + MoE (MoE on a non-attention mixer)	BlackMamba, MoE-Mamba
expert	`arch_expert_deepseek_moe`	~2.0B	Fine-grained MoE (32 × top-3)	DeepSeek-V2/V3
expert NEW	`arch_expert_dsv4_v3fallback`	~2.0B	DeepSeek-V4 spec (V3 fallback path)	DeepSeek-V3/V4
expert	`arch_expert_retnet_moe`	~1.5B	RetNet + MoE (speculative, no paper)	Sun 2023 + MoE extrapolation
expert	`arch_expert_frontier_full_moe`	~2.0B	Attention-free, multi-mixer + full MoE (most speculative)	Combinatorial prediction
expert	`arch_expert_progressive_stack`	~1.5B	Depth-wise hyena→mamba→retnet→attn+MoE (no paper)	Hierarchical prediction
expert (speculative)	`arch_expert_dilated_longnet`	~2.0B	Temporal pyramid: mamba+sw(1K→4K→16K)+global+MoE (no paper)	LongNet + Jamba extrapolation
expert (v1 B5)	`arch_expert_reasoning_r1`	~1.3B	2-stage reasoning (think / answer)	DeepSeek-R1 (2025), Quiet-STaR
expert (v1 B4.1)	`arch_expert_titans_memory`	~1.2B	Parametric memory + test-time updates	Titans (Google 2024–2025)
expert (v1 B3.2)	`arch_expert_dual_stream`	~1.4B	Monoidal parallel (Mamba ∥ Attention)	Jamba × PaLM generalization
expert (capstone)	`arch_expert_kitchen_sink`	—	Every primitive combined in one spec for maximum-surface testing	Integrated validation
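Several of these presets (Mistral's 1:3, Gemma 2's 1:1) differ from a plain Transformer mainly in which attention layers are global and which use a sliding window, and that difference lives entirely in the attention mask. A plain-Python sketch of both masks follows; `window=3` is an arbitrary illustration value, not a preset setting, and real implementations build the same pattern as a tensor.

```python
from typing import List, Optional


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    window=None gives a global causal mask; an integer gives a sliding window in
    which each query sees itself and the previous window - 1 positions.
    """
    if window is not None and window <= 0:
        raise ValueError("window must be positive")
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask: List[List[bool]]) -> None:
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))


show(attention_mask(6))            # global causal: lower triangle
print()
show(attention_mask(6, window=3))  # sliding window: a band of width 3
```

A sliding-window layer keeps per-token attention cost and KV cache bounded by the window, while the interleaved global layers preserve access to the full context.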

`arch_expert_*_mini`: small-scale speculative experiments (9 presets, ~80M–360M)

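Small-scale variants of the speculative expert architectures. They keep the same design ideas but shrink d_model to 384–512 with about 12 layers, so full training ablations fit on a single consumer GPU. Use them to iterate quickly on architectural hypotheses before committing to a full 2B training run. The recommended first experiment is arch_expert_progressive_stack_mini.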

Preset	~Total	~Active	Mirror of	Teaching role
`arch_expert_progressive_stack_mini`	~86M	~86M	`arch_expert_progressive_stack`	Recommended first experiment
`arch_expert_blackmamba_moe_mini`	~156M	~90M	`arch_expert_blackmamba_moe`	Partial-sparse MoE on an SSM
`arch_expert_mixtral_moe_mini`	~175M	~90M	`arch_expert_mixtral_moe`	Classic every-layer MoE baseline
`arch_expert_dilated_longnet_mini`	~83M	~75M	`arch_expert_dilated_longnet`	Long-context temporal pyramid
`arch_expert_deepseek_moe_mini`	~357M	~60M	`arch_expert_deepseek_moe`	⚠ Observe fine-grained MoE failing
`arch_expert_frontier_full_moe_mini`	~106M	~60M	`arch_expert_frontier_full_moe`	⚠ Most experimental; expected to fail
`arch_expert_dsv4_flash_mini` NEW	~180M	~70M	DeepSeek-V4	DSv4 + Flash/NSA compressed attention
`arch_expert_dsv4_subset_mini` NEW	~180M	~70M	DeepSeek-V4	DSv4 feature subset
`arch_expert_mhc_moe_mini` NEW	~150M	~70M	mHC + MoE	multi-Hyper-Connection residual + MoE
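The gap between ~Total and ~Active comes from routing: in a top-k MoE layer each token runs only k of the E routed experts. A minimal sketch of that bookkeeping follows. The dense/expert split in the example is made-up illustration data, not the real breakdown of any preset; shared experts and router weights are ignored (treat them as part of `dense_params`).

```python
from typing import Tuple


def moe_param_counts(dense_params: int, params_per_expert: int, n_experts: int,
                     top_k: int, n_moe_layers: int) -> Tuple[int, int]:
    """Return (total, active-per-token) parameter counts for a top-k MoE model."""
    if not 0 < top_k <= n_experts:
        raise ValueError("top_k must be in 1..n_experts")
    total = dense_params + n_moe_layers * n_experts * params_per_expert
    active = dense_params + n_moe_layers * top_k * params_per_expert
    return total, active


# Illustrative numbers only: 8 experts, top-2, MoE FFN in 12 layers.
total, active = moe_param_counts(dense_params=40_000_000, params_per_expert=1_500_000,
                                 n_experts=8, top_k=2, n_moe_layers=12)
print(f"total≈{total / 1e6:.0f}M, active≈{active / 1e6:.0f}M")

# Active share implied by the approximate figures in the table above.
for preset, (total_m, active_m) in {"arch_expert_deepseek_moe_mini": (357, 60),
                                    "arch_expert_mixtral_moe_mini": (175, 90)}.items():
    print(f"{preset}: {active_m / total_m:.0%} of weights active per token")
```

Adding experts grows total capacity while leaving per-token compute unchanged, which is why the fine-grained DeepSeek-style mini has by far the lowest active share in the table.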

`llm_`: scale × architecture variants (24 presets)

5 scales (0.1B / 0.8B / 2B / 4B / 16B) × 5 variants (simple / mistral / jamba / moe / mla). The 0.1B scale has no moe variant.

Scale	simple	mistral	jamba	moe	mla
0.1B	`llm_0p1b_simple`	`llm_0p1b_mistral`	`llm_0p1b_jamba`	—	`llm_0p1b_mla`
0.8B	`llm_0p8b_simple`	`llm_0p8b_mistral`	`llm_0p8b_jamba`	`llm_0p8b_moe`	`llm_0p8b_mla`
2B	`llm_2b_simple`	`llm_2b_mistral`	`llm_2b_jamba`	`llm_2b_moe`	`llm_2b_mla`
4B	`llm_4b_simple`	`llm_4b_mistral`	`llm_4b_jamba`	`llm_4b_moe`	`llm_4b_mla`
16B	`llm_16b_simple`	`llm_16b_mistral`	`llm_16b_jamba`	`llm_16b_moe`	`llm_16b_mla`
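The grid follows a regular `llm_<scale>_<variant>` naming pattern, so the 24 names can be rebuilt programmatically, for example when scripting a sweep. This sketch is derived from the table above, not from EulerStack's code; `eulerstack presets list` remains the authoritative listing.

```python
from typing import List

SCALES = ["0p1b", "0p8b", "2b", "4b", "16b"]
VARIANTS = ["simple", "mistral", "jamba", "moe", "mla"]


def llm_preset_names() -> List[str]:
    """Rebuild the llm_ preset grid: 5 scales x 5 variants, minus the missing 0.1B moe."""
    names: List[str] = []
    for scale in SCALES:
        for variant in VARIANTS:
            if scale == "0p1b" and variant == "moe":
                continue  # no MoE variant at 0.1B
            names.append(f"llm_{scale}_{variant}")
    return names


names = llm_preset_names()
print(len(names))  # → 24
```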

Variant semantics: simple = pure attention (Llama); mistral = attention + sliding window (in every 4 layers, 1 global : 3 sliding); jamba = Mamba + attention hybrid (3:1); moe = attention + MoE FFN (1 in every 4 layers, 8 experts, top-2); mla = Multi-head Latent Attention (DeepSeek-V3-style KV compression).
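The ratios above translate into per-layer patterns. In this illustrative sketch, the position of the "1" layer inside each group of 4 (placed last here) is an assumption, as is treating every non-MoE layer as a plain FFN layer; the actual presets may order layers differently, so use `explain` on a preset to see its real layout.

```python
from typing import List


def layer_pattern(variant: str, n_layers: int) -> List[str]:
    """Per-layer labels for an llm_ variant (illustrative only).

    Assumption: in the 1-of-4 patterns, the special layer closes each group of 4.
    """
    pattern: List[str] = []
    for i in range(n_layers):
        special = i % 4 == 3
        if variant == "simple":
            pattern.append("attn")
        elif variant == "mistral":
            pattern.append("global_attn" if special else "sliding_attn")
        elif variant == "jamba":
            pattern.append("attn" if special else "mamba")
        elif variant == "moe":
            pattern.append("attn+moe" if special else "attn")
        elif variant == "mla":
            pattern.append("mla")
        else:
            raise ValueError(f"unknown variant: {variant}")
    return pattern


for variant in ("simple", "mistral", "jamba", "moe", "mla"):
    print(f"{variant:8}", layer_pattern(variant, 8))
```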

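No upper limit: presets are only starting points. By editing d_model, n_heads, and the number of layers, you can use EulerStack to assemble a model of any size.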

CLI Reference

Follows the common CLI conventions of the eulerwa product family. Every error is reported in the three-line format (Category: what / Fix: / See:).

Top-level commands

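`validate`	Validate a YAML spec (`--report` adds the realism report)
`explain`	Model structure summary (layers, parameter estimate)
`compile`	IR → JSON runtime config (`--output`) or HF model directory (`--output-dir`)
`schema`	Print the YAML schema structure
`presets list` / `show`	List presets, or show the details of one preset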

Common options

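`--lang`	Output language (ko/en/zh/ja/es). Root option; default ko
`--preset`	Path to the YAML spec file
`--validate-only`	Validate only, then exit
`--output / -o`	Output path for the JSON runtime config
`--output-dir`	Output an HF model directory (config.json + model.safetensors)
`--print-config` / `--dry-run`	Print the resolved config to stdout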

5-language i18n CLI

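All CLI help text, logs, warnings, and error messages are translated into ko / en / zh / ja / es. The default language is Korean (ko); switch it with the root option --lang or the EULERSTACK_LANG environment variable. Command names, option names, and the Fix: / See: labels of the three-line error format stay untranslated to preserve script compatibility.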

eulerstack validate --preset my_model.yml
# Korean (default)

eulerstack --lang zh validate --preset my_model.yml
# Chinese

EULERSTACK_LANG=en eulerstack validate --preset my_model.yml
# the environment variable works too
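The language lookup described above can be sketched as a small resolver. The page does not say which wins when both --lang and EULERSTACK_LANG are set; this sketch assumes the explicit flag takes precedence, and raising on an unsupported code is likewise the sketch's own choice, not documented CLI behavior. `resolve_lang` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

SUPPORTED_LANGS = ("ko", "en", "zh", "ja", "es")
DEFAULT_LANG = "ko"


def resolve_lang(cli_lang: Optional[str] = None,
                 environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the output language: --lang first (assumed), then EULERSTACK_LANG, then ko."""
    env = os.environ if environ is None else environ
    for candidate in (cli_lang, env.get("EULERSTACK_LANG")):
        if candidate:
            code = candidate.strip().lower()
            if code not in SUPPORTED_LANGS:
                raise ValueError(f"unsupported language {candidate!r}; "
                                 f"expected one of {SUPPORTED_LANGS}")
            return code
    return DEFAULT_LANG


print(resolve_lang())  # depends on the current environment; "ko" when nothing is set
```

Only the message text would be looked up per language; keeping option names and the Fix: / See: labels fixed means scripts can parse output the same way regardless of the chosen language.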

HF model directory → EulerForge training

compile --output-dir produces a HuggingFace-compatible model directory (config.json + model.safetensors). This is the primary path for handing a model to the EulerForge training pipeline.

eulerstack compile --preset my_model.yml --output-dir ./my_model

# Load in Python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("./my_model", trust_remote_code=True)

5-Layer Architecture

From YAML spec to trainable model: five layers, each with a strictly separated responsibility.

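Layer 1: DSL	User-authored YAML spec (schema_version 1, declarative model definition)
Layer 2: Schema	Structure validation: unknown keys, types/enums, required fields, cross-field compatibility
Layer 3: IR	Normalized canonical structural representation (default filling, template expansion)
Layer 4: Compiler	IR → JSON runtime config or HF model directory (config.json + model.safetensors), loadable with `AutoModelForCausalLM.from_pretrained()` for hand-off to EulerForge training
Layer 5: CLI	`validate` / `explain` / `compile` / `schema` / `presets`, with 5-language i18n applied to every message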
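To illustrate how layers 2 through 4 hand off to each other, here is a toy pipeline over the example spec from earlier on this page. Everything in it is hypothetical: the function names, the set of known top-level keys (taken only from the examples shown here; the real schema may allow more), and the `norm` / `residual` defaults. The toy IR also keeps only a few fields, and only the JSON runtime-config path is modeled; no weights are produced.

```python
import json
from typing import Any, Dict

# Top-level keys seen in the examples on this page; the real schema may allow more.
KNOWN_KEYS = {"schema_version", "model", "tokenizer_contract", "embedding",
              "layer_templates", "layer_schedule", "head", "training_hints", "tissue"}
# Hypothetical defaults filled in during normalization (not EulerStack's actual defaults).
TEMPLATE_DEFAULTS = {"norm": "rmsnorm", "residual": "sequential"}


def validate(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Layer 2 (Schema): reject unknown keys, bad versions, and dangling template names."""
    unknown = sorted(set(spec) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"unknown top-level keys: {unknown}")
    if spec.get("schema_version") != 1:
        raise ValueError("schema_version must be 1")
    for entry in spec["layer_schedule"]:
        if entry["template"] not in spec["layer_templates"]:
            raise ValueError(f"schedule references unknown template {entry['template']!r}")
    return spec


def normalize(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Layer 3 (IR): fill defaults and expand the schedule into one entry per layer."""
    templates = {name: {**TEMPLATE_DEFAULTS, **tpl}
                 for name, tpl in spec["layer_templates"].items()}
    layers = [entry["template"]
              for entry in spec["layer_schedule"]
              for _ in range(entry.get("repeat", 1))]
    return {"model": dict(spec["model"]), "templates": templates,
            "layers": layers, "head": dict(spec["head"])}


def compile_to_json(ir: Dict[str, Any]) -> str:
    """Layer 4 (Compiler), JSON runtime-config path only."""
    return json.dumps(ir, indent=2, sort_keys=True)


spec = {
    "schema_version": 1,
    "model": {"name": "my-llm", "d_model": 2048, "vocab_size": 32000,
              "max_seq_len": 4096, "n_heads": 16},
    "tokenizer_contract": {"type": "hf", "pretrained": "gpt2"},
    "embedding": {"type": "learned", "positional": "rope"},
    "layer_templates": {"decoder": {"mixer": {"type": "attention", "attention": {}},
                                    "ffn": {"type": "gated_mlp", "activation": "swiglu"}}},
    "layer_schedule": [{"template": "decoder", "repeat": 24}],
    "head": {"type": "causal_lm"},
}
config_json = compile_to_json(normalize(validate(spec)))
print(config_json[:200])
```

Each stage only consumes the previous stage's output, which is what keeps the layers independently testable.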

Tutorials

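Tutorials are maintained in parallel in Korean (ko) and English (en) (upstream repository path: docs/tutorials/{ko,en}/). The links below point to the English tutorials on this site; a Chinese version is not yet available.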

Core tutorials (11 parts)

`00_positioning`	Read this first: EulerStack's positioning as an Architecture Description Language (ADL) for LLMs
`01_validate_a_spec`	Validate a YAML spec
`02_use_presets`	Use presets
`03_spec_reference`	NEW: Spec reference
`04_compile_and_explain`	Compile & explain
`05_prepare_data`	Prepare training data
`06_sanity_train`	Sanity training loop
`07_arch_walkthrough`	Skill-level architecture walkthrough (17 `arch_` presets)
`08_expert_mini_walkthrough`	Expert Mini preset walkthrough (single-GPU ablation)
`09_new_primitives_walkthrough`	NEW: v1 Phase B primitives (MLA / Titans / MoD / Dual-Stream / Neural-ODE / TTT)
`10_paper_to_yaml`	NEW: Paper → YAML porting case studies (DeepSeek-V3 / Jamba / DeepSeek-R1 / Titans)

Mixer deep dives (`mixers/`, 5 parts)

`00_overview`	Mixer concepts overview: why mix attention / mamba / retnet / hyena
`01_attention`	Attention in depth
`02_mamba`	Mamba in depth
`03_retnet`	RetNet in depth
`04_hyena`	Hyena in depth

Tutorials are available in Korean and English only.

Installation & Quick Start

Install

pip install -e .

# or include the development dependencies
pip install -e ".[dev]"

Quick start

# Browse presets (Korean by default)
eulerstack presets list

# Validate + realism report
eulerstack validate --preset my_model.yml --report

# Generate an HF model directory → hand off to EulerForge training
eulerstack compile --preset my_model.yml --output-dir ./my_model

# Switch the output to Chinese
eulerstack --lang zh validate --preset my_model.yml

Design LLM architectures with EulerStack

Combine Attention, Mamba, RetNet, Hyena, and MoE into a hybrid model with a single YAML, then hand the resulting HuggingFace model directory straight to EulerForge for training.

Get started on GitHub