13. LLaMA Fine-Tuning
Overview
EulerForge natively supports LLaMA family models. With the backbone: llama setting, you can apply the same injection strategies and training methods to models such as LLaMA 2/3, TinyLlama, and Mistral (dense).
This tutorial covers two representative combinations:
| Model | Training Type | Preset |
|---|---|---|
| Meta Llama-3.2-1B | SFT | configs/presets/llama3_1b_dense_lora_sft.yml |
| TinyLlama-1.1B-Chat | DPO | configs/presets/tinyllama_1.1b_dense_lora_dpo.yml |
- Suitable for: Fine-tuning LLaMA family models, experimenting with non-Qwen backbones, SFT-to-DPO pipelines
- Prerequisites: Completed the Getting Started Guide and Data Preprocessing
1. LLaMA Backbone Adapter
Setting backbone: llama automatically selects the LlamaAdapter.
Compatible Models
| Model | HuggingFace ID | Parameters |
|---|---|---|
| Llama 3.2 1B | meta-llama/Llama-3.2-1B | 1.24B |
| Llama 3.2 3B | meta-llama/Llama-3.2-3B | 3.21B |
| Llama 3.1 8B | meta-llama/Llama-3.1-8B | 8.03B |
| TinyLlama 1.1B Chat | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 1.10B |
| Mistral 7B (dense) | mistralai/Mistral-7B-v0.3 | 7.24B |
Note: Mixtral (MoE) uses `backbone: mixtral`; `backbone: llama` is for dense architectures only.
Structural Comparison with Qwen3
| Item | Qwen3 | LLaMA |
|---|---|---|
| FFN projections | gate_proj, up_proj, down_proj | gate_proj, up_proj, down_proj |
| Attention projections | q_proj, k_proj, v_proj, o_proj | q_proj, k_proj, v_proj, o_proj |
| backbone value | qwen3 | llama |
| target_keywords | identical | identical |
Because the FFN and attention projection names are identical, the injection section can be reused unchanged from the Qwen3 presets; only backbone and model_name differ.
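To make the keyword-based targeting concrete, here is a minimal sketch of how `target_keywords` select modules by name. The module names mirror HuggingFace's LLaMA naming; the helper itself is illustrative and is not EulerForge's actual implementation.

```python
# Illustrative sketch: match target_keywords against the last path
# component of each module name, as keyword-based injection would.

def select_targets(module_names, keywords):
    """Return module names whose final component is one of the keywords."""
    return [n for n in module_names if n.split(".")[-1] in keywords]

names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]

print(select_targets(names, ["gate_proj", "up_proj", "down_proj"]))
# -> the three mlp projection names of layer 0
```

The same keywords match layer 1, layer 2, and so on, which is why one `target_keywords` list covers every decoder layer.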
2. Preset A: Llama-3.2-1B SFT
Configuration File
configs/presets/llama3_1b_dense_lora_sft.yml:
```yaml
device: cuda:0
backbone: llama
model_name: meta-llama/Llama-3.2-1B

injection:
  strategy: dense_lora
  lora_r: 48
  lora_alpha: 96
  lora_dropout: 0.05
  target_keywords: [gate_proj, up_proj, down_proj]
  start_layer: 0
  num_layers: 0        # 0 = all layers
  attn_lora:
    enabled: true
    keywords: [q_proj, v_proj]

training:
  type: sft
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
      lr: 1.0e-5
      weight_decay: 0.01
  warmup_steps: 200
  max_train_steps: 5000
  batch_size: 4
  grad_accum_steps: 4
  max_grad_norm: 1.0
  log_steps: 50
  save_steps: 1000
  val_steps: 500
```
Key Points
- backbone: llama: uses the LlamaAdapter (compatible with both Llama 2 and Llama 3)
- model_name: meta-llama/Llama-3.2-1B: the Llama 3.2 1B base model
- dense_lora + single phase: the simplest LoRA configuration
- Trainable parameters: ~28.7M (LoRA parameters only)
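The ~28.7M figure can be checked with a back-of-the-envelope calculation from Llama-3.2-1B's published dimensions (hidden size 2048, FFN size 8192, 16 layers, GQA with 8 KV heads of dim 64, so the v_proj output width is 512). A LoRA pair of rank r adds r × (in_features + out_features) parameters per linear layer; this sketch is only a sanity check, not EulerForge code.

```python
# Estimate trainable LoRA parameters for the Llama-3.2-1B SFT preset:
# FFN LoRA on gate/up/down_proj plus attn_lora on q_proj and v_proj.

def lora_params(r, in_f, out_f):
    """Parameters in one LoRA pair: A is (r x in_f), B is (out_f x r)."""
    return r * (in_f + out_f)

hidden, ffn, kv_width, layers, r = 2048, 8192, 512, 16, 48

per_layer = (
    2 * lora_params(r, hidden, ffn)   # gate_proj, up_proj (2048 -> 8192)
    + lora_params(r, ffn, hidden)     # down_proj (8192 -> 2048)
    + lora_params(r, hidden, hidden)  # q_proj (2048 -> 2048)
    + lora_params(r, hidden, kv_width)  # v_proj (GQA: 2048 -> 512)
)
total = per_layer * layers
print(f"{total:,}")  # 28,704,768 ≈ 28.7M
```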
Execution
```bash
# Preflight check
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml --preflight

# SFT training
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
3. Preset B: TinyLlama-1.1B-Chat DPO
Configuration File
configs/presets/tinyllama_1.1b_dense_lora_dpo.yml:
```yaml
device: cuda:0
backbone: llama
model_name: TinyLlama/TinyLlama-1.1B-Chat-v1.0

injection:
  strategy: dense_lora
  lora_r: 48
  lora_alpha: 96
  lora_dropout: 0.05
  target_keywords: [gate_proj, up_proj, down_proj]
  start_layer: 0
  num_layers: 0        # 0 = all layers
  attn_lora:
    enabled: true
    keywords: [q_proj, v_proj]

training:
  type: dpo
  dpo_beta: 0.1
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
      lr: 5.0e-6
      weight_decay: 0.01
  warmup_steps: 100
  max_train_steps: 5000
  batch_size: 2
  grad_accum_steps: 8
  max_grad_norm: 1.0
  log_steps: 50
  save_steps: 1000
  val_steps: 500
```
Differences from the SFT Preset
```diff
-model_name: meta-llama/Llama-3.2-1B
+model_name: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 training:
-  type: sft
+  type: dpo
+  dpo_beta: 0.1
-      lr: 1.0e-5
+      lr: 5.0e-6
-  warmup_steps: 200
+  warmup_steps: 100
-  batch_size: 4
+  batch_size: 2
-  grad_accum_steps: 4
+  grad_accum_steps: 8
```
| Change | Reason |
|---|---|
| type: dpo | Switches to the DPO loss function and reference-model logic |
| dpo_beta: 0.1 | Preference strength parameter |
| lr halved | DPO fine-tunes an already trained model |
| batch_size halved | DPO processes 2x the tokens (chosen + rejected) |
| grad_accum_steps doubled | Maintains the effective batch size (2x8 ~ 4x4) |
Key Points
- TinyLlama-1.1B-Chat: based on the Llama 2 architecture, so it uses the same backbone: llama
- Chat model: SFT has already been applied, making it suitable for additional alignment via DPO
- Trainable parameters: ~31.1M
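The ~31.1M figure checks out the same way as for Llama-3.2-1B, using TinyLlama-1.1B's dimensions (hidden size 2048, FFN size 5632, 22 layers, GQA with 4 KV heads of dim 64, so the v_proj output width is 256). Again a sanity-check sketch, not EulerForge code.

```python
# Estimate trainable LoRA parameters for the TinyLlama DPO preset.

def lora_params(r, in_f, out_f):
    """Parameters in one rank-r LoRA pair on an (in_f -> out_f) linear."""
    return r * (in_f + out_f)

hidden, ffn, kv_width, layers, r = 2048, 5632, 256, 22, 48

per_layer = (
    2 * lora_params(r, hidden, ffn)     # gate_proj, up_proj
    + lora_params(r, ffn, hidden)       # down_proj
    + lora_params(r, hidden, hidden)    # q_proj
    + lora_params(r, hidden, kv_width)  # v_proj (GQA: 2048 -> 256)
)
print(f"{per_layer * layers:,}")  # 31,088,640 ≈ 31.1M
```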
Execution
```bash
# Preflight check
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml --preflight

# DPO training
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
  --set data.format=raw \
  --set data.task=prompted_preference \
  --set data.path=data/dpo_10k_raw.jsonl \
  --set data.max_length=512
```
4. SFT to DPO Pipeline
This is a typical pipeline where SFT is performed first on a LLaMA model, followed by DPO.
Step 1: SFT Training
```bash
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
# → Checkpoint saved to outputs/run_YYYYMMDD_HHMMSS/
```
Step 2: DPO Training (Starting from SFT Checkpoint)
```bash
# Override model_name in the TinyLlama DPO preset with the SFT checkpoint
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
  --set model_name=outputs/run_YYYYMMDD_HHMMSS/checkpoint_final \
  --set data.format=raw \
  --set data.task=prompted_preference \
  --set data.path=data/dpo_10k_raw.jsonl \
  --set data.max_length=512
```
Note: You can use the same model for both SFT and DPO, or use different LLaMA models. Just change model_name.
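When scripting this pipeline, you need the concrete run directory in place of the run_YYYYMMDD_HHMMSS placeholder. Since the timestamped names sort chronologically, a small helper can resolve the newest one. This is an illustrative utility, not part of the EulerForge CLI, and it assumes the outputs/run_*/checkpoint_final layout shown above.

```python
from pathlib import Path

def latest_final_checkpoint(outputs="outputs"):
    """Return the checkpoint_final path of the most recent run directory.

    run_YYYYMMDD_HHMMSS names sort lexicographically in time order,
    so the last entry of the sorted glob is the newest run.
    """
    runs = sorted(Path(outputs).glob("run_*"))
    if not runs:
        raise FileNotFoundError(f"no run_* directories under {outputs}")
    return runs[-1] / "checkpoint_final"

# Example: pass the result to --set model_name=<path> in the DPO step.
```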
5. Hyperparameter Tuning
LoRA Parameters
```bash
# Reduce lora_r (saves memory, reduces parameter count)
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set injection.lora_r=16 \
  --set injection.lora_alpha=32 \
  --set data.format=raw --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl --set data.max_length=512
```
DPO Beta Adjustment
```bash
# Conservative alignment (stays closer to the reference model)
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
  --set training.dpo_beta=0.05 \
  --set data.format=raw --set data.task=prompted_preference \
  --set data.path=data/dpo_10k_raw.jsonl --set data.max_length=512
```
Restricting Layer Range
```bash
# Apply LoRA only to the last 8 layers (Llama-3.2-1B = 16 layers)
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set injection.start_layer=8 \
  --set injection.num_layers=8 \
  --set data.format=raw --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl --set data.max_length=512
```
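The interaction of start_layer and num_layers can be sketched as a simple range computation, assuming num_layers: 0 means "through the last layer" as the preset comments indicate. The helper is illustrative, not EulerForge's implementation.

```python
# Sketch: which decoder layer indices receive LoRA injection.

def injected_layers(total, start_layer=0, num_layers=0):
    """num_layers == 0 means 'all remaining layers from start_layer'."""
    end = total if num_layers == 0 else start_layer + num_layers
    return list(range(start_layer, min(end, total)))

# Llama-3.2-1B has 16 decoder layers:
print(injected_layers(16, start_layer=8, num_layers=8))
# [8, 9, 10, 11, 12, 13, 14, 15] -> the last 8 layers
```

Restricting injection to later layers is a common way to cut trainable parameters roughly in half while keeping most of the adaptation capacity near the output.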
6. Benchmarking
After training, you can evaluate the results with eulerforge bench.
Target-only (Checking SFT Results)
```bash
eulerforge bench --preset configs/bench/sft_target_only.yml \
  --set bench.models.target.provider=local_hf \
  --set bench.models.target.model=null \
  --target-output-dir outputs/run_YYYYMMDD_HHMMSS \
  --checkpoint final
```
Pairwise Comparison (Before/After DPO)
```bash
eulerforge bench --preset configs/bench/preference_pairwise.yml \
  --set bench.models.target.provider=local_hf \
  --set bench.models.target.model=null \
  --target-output-dir outputs/run_YYYYMMDD_HHMMSS \
  --checkpoint final
```
7. Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| backbone 'llama' not found | LlamaAdapter not registered | Verify you have the latest version of EulerForge |
| OOM (out of memory) | Even LLaMA 1B requires substantial VRAM for LoRA + DPO | Reduce batch_size, reduce lora_r, add model.load_precision.mode: int4 |
| SFT loss does not converge toward 0 | max_length too short, so data is truncated | Increase data.max_length (512 -> 1024) |
| DPO accuracy stuck at 0.5 | dpo_beta too small or data quality issues | Increase dpo_beta or check data quality |
| model_name download fails | HuggingFace access restrictions | Run huggingface-cli login and confirm the model license agreement |
Next Steps
- SFT details: Plain LoRA Tutorial
- DPO details: DPO Training Guide
- Combining other strategies: LoRA MoE, FFN MoE Expert LoRA
- Benchmarking details: Bench Guide
- Hyperparameter optimization with Grid Search: Grid Search Guide