Getting Started
A quick-start guide for fine-tuning LLMs with EulerForge. For detailed explanations of each injection strategy, refer to the strategy-specific tutorials.
1. Installation and Environment
Prerequisites
- Python >= 3.9
- PyTorch >= 2.1
- CUDA-capable GPU (recommended; CPU also works for testing)
Installation
```bash
# Clone the repository
git clone <repo_url>
cd eulerforge

# Install in editable mode
pip install -e .

# For development (includes tests)
pip install -e ".[dev]" pytest
```
Verify Installation
```bash
python -c "import eulerforge; print('OK')"
eulerforge train --help
```
2. Data Preparation
First, follow the Data Preprocessing Guide. It converts the raw data in
`data/` into EulerForge's standard raw JSONL format.
Provided Data and Purposes
| File | Format | Purpose |
|---|---|---|
| `data/sft_10k_raw.jsonl` | `{prompt, response}` | SFT training (01-04), PPO (08) |
| `data/dpo_10k_raw.jsonl` | `{prompt, chosen, rejected}` | DPO training (05) |
| `data/dpo_10k_raw.jsonl` | `{prompt, chosen, rejected}` | ORPO (06), RM (07) |
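Each raw file holds one JSON object per line. A minimal sketch of the two record shapes from the table above (the example contents and the `append_jsonl` helper are illustrative, not part of EulerForge):

```python
import json

# {prompt, response} record, matching the SFT format above.
sft_record = {
    "prompt": "What is LoRA?",
    "response": "LoRA is a low-rank adaptation method for fine-tuning.",
}

# {prompt, chosen, rejected} preference record used by DPO/ORPO/RM.
dpo_record = {
    "prompt": "Explain MoE routing.",
    "chosen": "A router scores each expert and dispatches tokens to the top-k.",
    "rejected": "MoE has no router.",
}

def append_jsonl(path, record):
    """Append one record to a JSONL file: one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```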
Using Raw Data
Specify `data.format=raw` and the data will be tokenized automatically during training:
```bash
eulerforge train --preset configs/presets/<preset>.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
Details: Data Preprocessing Guide
3. Core Concepts: Where / What / When
EulerForge organizes fine-tuning from three perspectives:
| Perspective | Question | Responsible Component |
|---|---|---|
| Where | Where in the model to inject? | BackboneAdapter -- Explores transformer blocks, FFN, attention |
| What | What to inject? | InjectionStrategy -- Module transformations like LoRA, MoE, experts |
| When | When to train which parameters? | PhaseScheduler -- Controls trainable groups per phase |
Each strategy tutorial explains these three perspectives step by step.
4. Backbone Adapters
The correct adapter is selected automatically based on the `backbone` configuration key:

| `backbone` Value | Adapter | Compatible Models |
|---|---|---|
| `qwen3` | `Qwen3Adapter` | Qwen2, Qwen2.5, Qwen3 series |
| `llama` | `LlamaAdapter` | LLaMA 2/3, TinyLlama, Mistral (dense) |
| `mixtral` | `MixtralAdapter` | Mixtral (native MoE) |
5. Choosing an Injection Strategy
Strategy Comparison Table
| Strategy | Transform Target | Phases | MoE Section | Compatible Models | Suitable For |
|---|---|---|---|---|---|
| `dense_lora` | Linear -> LoRALinear | 1 | Not needed | All | Simple fine-tuning, quick experiments |
| `mixture_lora` | Linear -> MixtureLoRALinear | 2 | Required | All | Multi-task, adaptive LoRA |
| `moe_expert_lora` | FFN -> MoEFFN + LoRA | 3 | Required | Dense only | Dense-to-MoE conversion |
| `native_moe_expert_lora` | LoRA on existing experts | 1 | Not needed | Mixtral only | Native MoE fine-tuning |
Strategy Tutorials
- Plain LoRA Tutorial -- The most basic LoRA fine-tuning
- LoRA MoE Tutorial -- Router + multiple LoRA experts
- FFN MoE Expert LoRA Tutorial -- Convert dense models to MoE
- Native MoE Expert LoRA Tutorial -- Mixtral-specific fine-tuning
Training Method Tutorials
- DPO Training Guide -- Preference-based alignment training (reference model)
- ORPO Training Guide -- Preference-based alignment (no reference model needed)
- RM Training Guide -- Reward model training (Bradley-Terry)
- PPO/RLHF Guide -- PPO-based RLHF pipeline
Model-Specific Tutorials
- LLaMA Fine-Tuning Guide -- Llama-3.2-1B SFT + TinyLlama DPO
Training Method Selection Table
| Purpose | Recommended Method | Data | Reference Model |
|---|---|---|---|
| Basic fine-tuning | SFT | instruction/response pairs | Not needed |
| Preference alignment (precise) | DPO | chosen/rejected pairs | Needed (adapter disable) |
| Preference alignment (efficient) | ORPO | chosen/rejected pairs | Not needed |
| Reward function learning | RM | chosen/rejected pairs | Not needed |
| RLHF (generate->reward->update) | PPO | Prompts + RM checkpoint | Needed (adapter disable) |
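For intuition on the preference-based rows above: DPO and Bradley-Terry reward modeling both minimize a sigmoid preference loss. A minimal numeric sketch (not EulerForge's implementation; the log-probabilities and rewards are stand-in floats):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: preference loss on policy log-probs, measured relative to a
    frozen reference model (hence the 'Needed' reference-model column)."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(sigmoid(margin))

def bradley_terry_rm_loss(reward_chosen, reward_rejected):
    """RM: Bradley-Terry loss pushing the chosen reward above the rejected one."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))
```

The loss shrinks as the chosen completion becomes more preferred relative to the rejected one, which is the shared mechanic behind DPO, ORPO, and RM training.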
Which Strategy Should You Choose?
```
Is the model already an MoE architecture? (e.g., Mixtral)
├── Yes → native_moe_expert_lora
└── No (dense model)
    ├── Simple fine-tuning?        → dense_lora
    ├── Multi-task / adaptive?     → mixture_lora
    └── Convert to MoE structure?  → moe_expert_lora
```
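The same decision tree can be expressed as a small helper (illustrative only, not a EulerForge API):

```python
def choose_strategy(is_native_moe, multi_task=False, convert_to_moe=False):
    """Pick an injection strategy following the decision tree above."""
    if is_native_moe:            # e.g. Mixtral: LoRA on existing experts
        return "native_moe_expert_lora"
    if convert_to_moe:           # dense model converted to an MoE structure
        return "moe_expert_lora"
    if multi_task:               # multi-task / adaptive LoRA with a router
        return "mixture_lora"
    return "dense_lora"          # simple fine-tuning
```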
6. Phase Schedule Overview
The phase schedule controls when to train which parameter groups. Define it declaratively in `training.phases`.
Parameter Groups
| Group | Training Target |
|---|---|
| `lora` | LoRA parameters within FFN (`lora_A`, `lora_B`) |
| `router` | MoE router weights |
| `base_ffn` | Original FFN weights (`gate_proj`, `up_proj`, `down_proj`) |
| `attn_lora` | LoRA parameters within attention projections |
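Parameter groups like these are typically resolved by substring matching on parameter names. A minimal sketch of that idea (the keyword map is an assumption for illustration, not EulerForge's exact matching rules; verify real names with `--debug-trainable-names`):

```python
# Hypothetical keyword map; the real group resolution may differ.
GROUP_KEYWORDS = {
    "lora": ["mlp.lora_A", "mlp.lora_B"],          # FFN LoRA
    "router": ["router"],                           # MoE router
    "base_ffn": ["gate_proj", "up_proj", "down_proj"],
    "attn_lora": ["self_attn.lora_A", "self_attn.lora_B"],
}

def params_in_group(param_names, group):
    """Return the parameter names whose dotted path contains any group keyword."""
    keywords = GROUP_KEYWORDS[group]
    return [n for n in param_names if any(k in n for k in keywords)]
```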
Phase Patterns by Strategy
| Strategy | Phase Configuration |
|---|---|
| `dense_lora` | 1 phase: [lora, attn_lora] |
| `mixture_lora` | 2 phases: [router] -> [lora, attn_lora] |
| `moe_expert_lora` | 3 phases: [router] -> [lora, attn_lora] -> [lora, attn_lora, router, base_ffn] |
| `native_moe_expert_lora` | 1 phase: [lora, attn_lora] |
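As an illustration, a three-phase moe_expert_lora schedule might be declared in `training.phases` roughly like this (key names and step counts are assumptions for this sketch; consult your preset for the exact schema):

```yaml
training:
  phases:
    - steps: 500          # phase 1: warm up the router alone
      trainable: [router]
    - steps: 2000         # phase 2: train LoRA adapters
      trainable: [lora, attn_lora]
    - steps: 1000         # phase 3: unfreeze everything
      trainable: [lora, attn_lora, router, base_ffn]
```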
For detailed phase configuration and timelines, refer to the "When" section of each strategy tutorial.
7. CLI Quick Start
Basic Training (raw data)
```bash
eulerforge train --preset configs/presets/<preset>.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
Available Presets
| Preset File | Strategy | Training Type |
|---|---|---|
| `qwen3.5_0.8b_dense_lora_sft.yml` | dense_lora | SFT |
| `qwen3.5_0.8b_mixture_lora_sft.yml` | mixture_lora | SFT |
| `qwen3.5_0.8b_moe_expert_lora_sft.yml` | moe_expert_lora | SFT |
| `qwen3.5_0.8b_moe_expert_lora_dpo.yml` | moe_expert_lora | DPO |
| `qwen3.5_0.8b_dense_lora_orpo.yml` | dense_lora | ORPO |
| `qwen3.5_0.8b_dense_lora_rm.yml` | dense_lora | RM |
| `qwen3.5_0.8b_dense_lora_ppo.yml` | dense_lora | PPO |
| `llama3_1b_dense_lora_sft.yml` | dense_lora | SFT |
| `tinyllama_1.1b_dense_lora_dpo.yml` | dense_lora | DPO |
| `mixtral_native_expert_lora_sft.yml` | native_moe_expert_lora | SFT |
Configuration Overrides
Override any configuration value using dot-path notation:
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512 \
  --set training.lr=2e-5 \
  --set injection.lora_r=32
```
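Each `--set` override sets one nested key via its dot path, e.g. `training.lr=2e-5` updates `config["training"]["lr"]`. A minimal sketch of the idea (not EulerForge's actual parser):

```python
def apply_override(config, dotted_key, value):
    """Set a nested config value from a dot-path key,
    creating intermediate dicts as needed."""
    keys = dotted_key.split(".")
    node = config
    for k in keys[:-1]:
        node = node.setdefault(k, {})
    node[keys[-1]] = value
    return config
```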
Useful CLI Options
| Option | Description |
|---|---|
| `--validate-only` | Validate the config file only; no model loading |
| `--preflight` | Load model + apply injection + verify parameters; no training |
| `--debug` | Enable debug mode |
| `--debug-trainable-names` | Print trainable parameter names |
| `--debug-every N` | Print debug info every N steps |
For a full CLI reference, see the CLI Documentation.
8. Running Benchmarks
```bash
eulerforge bench --target-dir /path/to/checkpoint \
  --ref-model Qwen/Qwen3.5-0.8B-Base \
  --test-data /path/to/test.jsonl \
  --output-file results.jsonl
```
9. Common Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| `Missing required top-level section` | `backbone`/`injection`/`training` missing in the YAML | Add the missing section |
| `Unknown strategy 'xxx'` | Typo in the strategy name | Check the supported strategy names |
| `No trainable parameters` | Target keywords don't match | Inspect parameter names with `--debug-trainable-names` |
| OOM (out of memory) | Insufficient VRAM | Set `model.load_precision.mode: int4` (4-bit QLoRA), reduce `batch_size`, reduce `lora_r` |
| Phase transition not occurring | Phase steps fall outside the `max_train_steps` range | Check the phase step values |
For strategy-specific troubleshooting, refer to the "Debugging and Troubleshooting" section of each tutorial.
10. Scratch Pretraining (pretrain)
To train a newly assembled model from scratch (e.g., built with EulerStack) rather than an existing HF model, use the eulerforge pretrain command.
```bash
eulerforge pretrain --preset configs/presets/pretrain/eulerstack_hybrid_moe.yml
```
`pretrain` is a completely separate pipeline from `train`: it performs full-parameter causal LM training and uses neither LoRA injection nor phase scheduling.
Details: 17_pretrain.md
11. Training Pipeline -- From SFT to PPO
EulerForge supports five training types; always start with SFT:
```
SFT → DPO (or ORPO) → Deploy      # Most common (2 stages)
SFT → RM → PPO → Deploy           # Full RLHF (3 stages)
SFT → ORPO → RM → PPO → Deploy    # ORPO-based full RLHF
```
Warning: Applying DPO/ORPO/RM/PPO directly to a base model will degrade performance. Always provide instruction-following ability via SFT first, then proceed with preference learning.
The checkpoint from each stage becomes the model input for the next stage.
Details: 18_training_pipeline.md