Getting Started
A quick-start guide for fine-tuning LLMs with EulerForge. For detailed explanations of each injection strategy, refer to the strategy-specific tutorials.
1. Installation and Environment
Prerequisites
- Python >= 3.9
- PyTorch >= 2.1
- CUDA-capable GPU (recommended; CPU also works for testing)
Installation
```bash
# Clone the repository
git clone <repo_url>
cd eulerforge

# Install in editable mode
pip install -e .

# For development (includes tests)
pip install -e ".[dev]" pytest
```
Verify Installation
```bash
python -c "import eulerforge; print('OK')"
eulerforge train --help
```
2. Data Preparation
First, follow the Data Preprocessing Guide. It converts the raw data in
`data/` into EulerForge's standard raw JSONL format.
Provided Data and Purposes
| File | Format | Purpose |
|---|---|---|
| `data/sft_10k_raw.jsonl` | `{prompt, response}` | SFT training (01-04), PPO (08) |
| `data/dpo_10k_raw.jsonl` | `{prompt, chosen, rejected}` | DPO training (05) |
| `data/dpo_10k_raw.jsonl` | `{prompt, chosen, rejected}` | ORPO (06), RM (07) |
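Each raw file holds one JSON object per line. A minimal sketch of the two record shapes from the table above (the example contents and the `append_jsonl` helper are illustrative, not part of EulerForge):

```python
import json

# {prompt, response} record, matching the SFT format above.
sft_record = {
    "prompt": "What is LoRA?",
    "response": "LoRA is a low-rank adaptation method for fine-tuning.",
}

# {prompt, chosen, rejected} preference record used by DPO/ORPO/RM.
dpo_record = {
    "prompt": "Explain MoE routing.",
    "chosen": "A router scores each expert and dispatches tokens to the top-k.",
    "rejected": "MoE has no router.",
}

def append_jsonl(path, record):
    """Append one record to a JSONL file: one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```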
Using Raw Data
Specify `data.format=raw` and the data will be tokenized automatically during training:
```bash
eulerforge train --preset configs/presets/<preset>.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
Details: Data Preprocessing Guide
3. Core Concepts: Where / What / When
EulerForge organizes fine-tuning from three perspectives:
| Perspective | Question | Responsible Component |
|---|---|---|
| Where | Where in the model to inject? | BackboneAdapter -- Explores transformer blocks, FFN, attention |
| What | What to inject? | InjectionStrategy -- Module transformations like LoRA, MoE, experts |
| When | When to train which parameters? | PhaseScheduler -- Controls trainable groups per phase |
Each strategy tutorial explains these three perspectives step by step.
4. Backbone Adapters
The correct adapter is selected automatically based on the `backbone` configuration key:

| `backbone` Value | Adapter | Compatible Models |
|---|---|---|
| `qwen3` | `Qwen3Adapter` | Qwen2, Qwen2.5, Qwen3 series |
| `llama` | `LlamaAdapter` | LLaMA 2/3, TinyLlama, Mistral (dense) |
| `mixtral` | `MixtralAdapter` | Mixtral (native MoE) |
5. Choosing an Injection Strategy
Strategy Comparison Table
| Strategy | Transform Target | Phases | MoE Section | Compatible Models | Suitable For |
|---|---|---|---|---|---|
| `dense_lora` | Linear -> LoRALinear | 1 | Not needed | All | Simple fine-tuning, quick experiments |
| `mixture_lora` | Linear -> MixtureLoRALinear | 2 | Required | All | Multi-task, adaptive LoRA |
| `moe_expert_lora` | FFN -> MoEFFN + LoRA | 3 | Required | Dense only | Dense-to-MoE conversion |
| `native_moe_expert_lora` | LoRA on existing experts | 1 | Not needed | Mixtral only | Native MoE fine-tuning |
Strategy Tutorials
- Plain LoRA Tutorial -- The most basic LoRA fine-tuning
- LoRA MoE Tutorial -- Router + multiple LoRA experts
- FFN MoE Expert LoRA Tutorial -- Convert dense models to MoE
- Native MoE Expert LoRA Tutorial -- Mixtral-specific fine-tuning
Training Method Tutorials
- DPO Training Guide -- Preference-based alignment training (reference model)
- ORPO Training Guide -- Preference-based alignment (no reference model needed)
- RM Training Guide -- Reward model training (Bradley-Terry)
- PPO/RLHF Guide -- PPO-based RLHF pipeline
Model-Specific Tutorials
- LLaMA Fine-Tuning Guide -- Llama-3.2-1B SFT + TinyLlama DPO
Training Method Selection Table
| Purpose | Recommended Method | Data | Reference Model |
|---|---|---|---|
| Basic fine-tuning | SFT | instruction/response pairs | Not needed |
| Preference alignment (precise) | DPO | chosen/rejected pairs | Needed (adapter disable) |
| Preference alignment (efficient) | ORPO | chosen/rejected pairs | Not needed |
| Reward function learning | RM | chosen/rejected pairs | Not needed |
| RLHF (generate->reward->update) | PPO | Prompts + RM checkpoint | Needed (adapter disable) |
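For intuition on the preference-based rows above: DPO and Bradley-Terry reward modeling both minimize a sigmoid preference loss. A minimal numeric sketch (not EulerForge's implementation; the log-probabilities and rewards are stand-in floats):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: preference loss on policy log-probs, measured relative to a
    frozen reference model (hence the 'Needed' reference-model column)."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(sigmoid(margin))

def bradley_terry_rm_loss(reward_chosen, reward_rejected):
    """RM: Bradley-Terry loss pushing the chosen reward above the rejected one."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))
```

The loss shrinks as the chosen completion becomes more preferred relative to the rejected one, which is the shared mechanic behind DPO, ORPO, and RM training.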
Which Strategy Should You Choose?
```
Is the model already an MoE architecture? (e.g., Mixtral)
├── Yes → native_moe_expert_lora
└── No (dense model)
    ├── Simple fine-tuning?        → dense_lora
    ├── Multi-task / adaptive?     → mixture_lora
    └── Convert to MoE structure?  → moe_expert_lora
```
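The same decision tree can be expressed as a small helper (illustrative only, not a EulerForge API):

```python
def choose_strategy(is_native_moe, multi_task=False, convert_to_moe=False):
    """Pick an injection strategy following the decision tree above."""
    if is_native_moe:            # e.g. Mixtral: LoRA on existing experts
        return "native_moe_expert_lora"
    if convert_to_moe:           # dense model converted to an MoE structure
        return "moe_expert_lora"
    if multi_task:               # multi-task / adaptive LoRA with a router
        return "mixture_lora"
    return "dense_lora"          # simple fine-tuning
```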
6. Phase Schedule Overview
The phase schedule controls when to train which parameter groups. Define it declaratively in `training.phases`.
Parameter Groups
| Group | Training Target |
|---|---|
| `lora` | LoRA parameters within FFN (`lora_A`, `lora_B`) |
| `router` | MoE router weights |
| `base_ffn` | Original FFN weights (`gate_proj`, `up_proj`, `down_proj`) |
| `attn_lora` | LoRA parameters within attention projections |
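Parameter groups like these are typically resolved by substring matching on parameter names. A minimal sketch of that idea (the keyword map is an assumption for illustration, not EulerForge's exact matching rules; verify real names with `--debug-trainable-names`):

```python
# Hypothetical keyword map; the real group resolution may differ.
GROUP_KEYWORDS = {
    "lora": ["mlp.lora_A", "mlp.lora_B"],          # FFN LoRA
    "router": ["router"],                           # MoE router
    "base_ffn": ["gate_proj", "up_proj", "down_proj"],
    "attn_lora": ["self_attn.lora_A", "self_attn.lora_B"],
}

def params_in_group(param_names, group):
    """Return the parameter names whose dotted path contains any group keyword."""
    keywords = GROUP_KEYWORDS[group]
    return [n for n in param_names if any(k in n for k in keywords)]
```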
Phase Patterns by Strategy
| Strategy | Phase Configuration |
|---|---|
| `dense_lora` | 1 phase: [lora, attn_lora] |
| `mixture_lora` | 2 phases: [router] -> [lora, attn_lora] |
| `moe_expert_lora` | 3 phases: [router] -> [lora, attn_lora] -> [lora, attn_lora, router, base_ffn] |
| `native_moe_expert_lora` | 1 phase: [lora, attn_lora] |
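As an illustration, a three-phase moe_expert_lora schedule might be declared in `training.phases` roughly like this (key names and step counts are assumptions for this sketch; consult your preset for the exact schema):

```yaml
training:
  phases:
    - steps: 500          # phase 1: warm up the router alone
      trainable: [router]
    - steps: 2000         # phase 2: train LoRA adapters
      trainable: [lora, attn_lora]
    - steps: 1000         # phase 3: unfreeze everything
      trainable: [lora, attn_lora, router, base_ffn]
```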
For detailed phase configuration and timelines, refer to the "When" section of each strategy tutorial.
7. CLI Quick Start
Basic Training (raw data)
```bash
eulerforge train --preset configs/presets/<preset>.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
Available Presets
| Preset File | Strategy | Training Type |
|---|---|---|
| `qwen3.5_0.8b_dense_lora_sft.yml` | dense_lora | SFT |
| `qwen3.5_0.8b_mixture_lora_sft.yml` | mixture_lora | SFT |
| `qwen3.5_0.8b_moe_expert_lora_sft.yml` | moe_expert_lora | SFT |
| `qwen3.5_0.8b_moe_expert_lora_dpo.yml` | moe_expert_lora | DPO |
| `qwen3.5_0.8b_dense_lora_orpo.yml` | dense_lora | ORPO |
| `qwen3.5_0.8b_dense_lora_rm.yml` | dense_lora | RM |
| `qwen3.5_0.8b_dense_lora_ppo.yml` | dense_lora | PPO |
| `llama3_1b_dense_lora_sft.yml` | dense_lora | SFT |
| `tinyllama_1.1b_dense_lora_dpo.yml` | dense_lora | DPO |
| `mixtral_native_expert_lora_sft.yml` | native_moe_expert_lora | SFT |
Configuration Overrides
Override any configuration value using dot-path notation:
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512 \
  --set training.lr=2e-5 \
  --set injection.lora_r=32
```
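Each `--set` override sets one nested key via its dot path, e.g. `training.lr=2e-5` updates `config["training"]["lr"]`. A minimal sketch of the idea (not EulerForge's actual parser):

```python
def apply_override(config, dotted_key, value):
    """Set a nested config value from a dot-path key,
    creating intermediate dicts as needed."""
    keys = dotted_key.split(".")
    node = config
    for k in keys[:-1]:
        node = node.setdefault(k, {})
    node[keys[-1]] = value
    return config
```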
Useful CLI Options
| Option | Description |
|---|---|
| `--validate-only` | Validate the config file only; no model loading |
| `--preflight` | Load model + apply injection + verify parameters; no training |
| `--debug` | Enable debug mode |
| `--debug-trainable-names` | Print trainable parameter names |
| `--debug-every N` | Print debug info every N steps |
For a full CLI reference, see the CLI Documentation.
8. Running Benchmarks
```bash
eulerforge bench --target-dir /path/to/checkpoint \
  --ref-model Qwen/Qwen3.5-0.8B-Base \
  --test-data /path/to/test.jsonl \
  --output-file results.jsonl
```
9. Common Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| `Missing required top-level section` | `backbone`/`injection`/`training` missing in the YAML | Add the missing section |
| `Unknown strategy 'xxx'` | Typo in the strategy name | Check the supported strategy names |
| `No trainable parameters` | Target keywords don't match | Inspect parameter names with `--debug-trainable-names` |
| OOM (out of memory) | Insufficient VRAM | Set `model.load_precision.mode: int4` (4-bit QLoRA), reduce `batch_size`, reduce `lora_r` |
| Phase transition not occurring | Phase steps fall outside the `max_train_steps` range | Check the phase step values |
For strategy-specific troubleshooting, refer to the "Debugging and Troubleshooting" section of each tutorial.
10. Scratch Pretraining (pretrain)
To train a newly assembled model from scratch (e.g., built with EulerStack) rather than an existing HF model, use the eulerforge pretrain command.
```bash
eulerforge pretrain --preset configs/presets/pretrain/eulerstack_hybrid_moe.yml
```
`pretrain` is a completely separate pipeline from `train`: it performs full-parameter causal LM training and uses neither LoRA injection nor phase scheduling.
Details: 17_pretrain.md
11. Training Pipeline -- From SFT to PPO
EulerForge supports five training types; always start with SFT:
```
SFT → DPO (or ORPO) → Deploy      # Most common (2 stages)
SFT → RM → PPO → Deploy           # Full RLHF (3 stages)
SFT → ORPO → RM → PPO → Deploy    # ORPO-based full RLHF
```
Warning: Applying DPO/ORPO/RM/PPO directly to a base model will degrade performance. Always provide instruction-following ability via SFT first, then proceed with preference learning.
The checkpoint from each stage becomes the model input for the next stage.
Details: 18_training_pipeline.md