13. LLaMA Fine-Tuning
Overview
EulerForge natively supports LLaMA family models. With the backbone: llama setting, you can apply the same injection strategies and training methods to models such as LLaMA 2/3, TinyLlama, and Mistral (dense).
This tutorial covers two representative combinations:
| Model | Training Type | Preset |
|---|---|---|
| Meta Llama-3.2-1B | SFT | configs/presets/llama3_1b_dense_lora_sft.yml |
| TinyLlama-1.1B-Chat | DPO | configs/presets/tinyllama_1.1b_dense_lora_dpo.yml |
- Suitable for: Fine-tuning LLaMA family models, experimenting with non-Qwen backbones, SFT-to-DPO pipelines
- Prerequisites: Completed the Getting Started Guide and Data Preprocessing
1. LLaMA Backbone Adapter
Setting backbone: llama automatically selects the LlamaAdapter.
Compatible Models
| Model | HuggingFace ID | Parameters |
|---|---|---|
| Llama 3.2 1B | meta-llama/Llama-3.2-1B | 1.24B |
| Llama 3.2 3B | meta-llama/Llama-3.2-3B | 3.21B |
| Llama 3.1 8B | meta-llama/Llama-3.1-8B | 8.03B |
| TinyLlama 1.1B Chat | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 1.10B |
| Mistral 7B (dense) | mistralai/Mistral-7B-v0.3 | 7.24B |
Note: Mixtral (MoE) uses `backbone: mixtral`; `backbone: llama` is for dense architectures only.
Structural Comparison with Qwen3
| Item | Qwen3 | LLaMA |
|---|---|---|
| FFN projections | gate_proj, up_proj, down_proj | gate_proj, up_proj, down_proj |
| Attention projections | q_proj, k_proj, v_proj, o_proj | q_proj, k_proj, v_proj, o_proj |
| backbone value | qwen3 | llama |
| target_keywords | identical | identical |
Because the FFN and attention projection names are identical, the injection section can be reused unchanged from the Qwen3 presets; only backbone and model_name differ.
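To make the keyword-based targeting concrete, here is a minimal sketch of how `target_keywords` select modules by name. The module names mirror HuggingFace's LLaMA naming; the helper itself is illustrative and is not EulerForge's actual implementation.

```python
# Illustrative sketch: match target_keywords against the last path
# component of each module name, as keyword-based injection would.

def select_targets(module_names, keywords):
    """Return module names whose final component is one of the keywords."""
    return [n for n in module_names if n.split(".")[-1] in keywords]

names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]

print(select_targets(names, ["gate_proj", "up_proj", "down_proj"]))
# -> the three mlp projection names of layer 0
```

The same keywords match layer 1, layer 2, and so on, which is why one `target_keywords` list covers every decoder layer.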
2. Preset A: Llama-3.2-1B SFT
Configuration File
configs/presets/llama3_1b_dense_lora_sft.yml:
```yaml
device: cuda:0
backbone: llama
model_name: meta-llama/Llama-3.2-1B

injection:
  strategy: dense_lora
  lora_r: 48
  lora_alpha: 96
  lora_dropout: 0.05
  target_keywords: [gate_proj, up_proj, down_proj]
  start_layer: 0
  num_layers: 0        # 0 = all layers
  attn_lora:
    enabled: true
    keywords: [q_proj, v_proj]

training:
  type: sft
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
      lr: 1.0e-5
      weight_decay: 0.01
  warmup_steps: 200
  max_train_steps: 5000
  batch_size: 4
  grad_accum_steps: 4
  max_grad_norm: 1.0
  log_steps: 50
  save_steps: 1000
  val_steps: 500
```
Key Points
- backbone: llama: uses the LlamaAdapter (compatible with both Llama 2 and Llama 3)
- model_name: meta-llama/Llama-3.2-1B: the Llama 3.2 1B base model
- dense_lora + single phase: the simplest LoRA configuration
- Trainable parameters: ~28.7M (LoRA parameters only)
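The ~28.7M figure can be checked with a back-of-the-envelope calculation from Llama-3.2-1B's published dimensions (hidden size 2048, FFN size 8192, 16 layers, GQA with 8 KV heads of dim 64, so the v_proj output width is 512). A LoRA pair of rank r adds r × (in_features + out_features) parameters per linear layer; this sketch is only a sanity check, not EulerForge code.

```python
# Estimate trainable LoRA parameters for the Llama-3.2-1B SFT preset:
# FFN LoRA on gate/up/down_proj plus attn_lora on q_proj and v_proj.

def lora_params(r, in_f, out_f):
    """Parameters in one LoRA pair: A is (r x in_f), B is (out_f x r)."""
    return r * (in_f + out_f)

hidden, ffn, kv_width, layers, r = 2048, 8192, 512, 16, 48

per_layer = (
    2 * lora_params(r, hidden, ffn)   # gate_proj, up_proj (2048 -> 8192)
    + lora_params(r, ffn, hidden)     # down_proj (8192 -> 2048)
    + lora_params(r, hidden, hidden)  # q_proj (2048 -> 2048)
    + lora_params(r, hidden, kv_width)  # v_proj (GQA: 2048 -> 512)
)
total = per_layer * layers
print(f"{total:,}")  # 28,704,768 ≈ 28.7M
```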
Execution
```bash
# Preflight check
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml --preflight

# SFT training
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
3. Preset B: TinyLlama-1.1B-Chat DPO
Configuration File
configs/presets/tinyllama_1.1b_dense_lora_dpo.yml:
```yaml
device: cuda:0
backbone: llama
model_name: TinyLlama/TinyLlama-1.1B-Chat-v1.0

injection:
  strategy: dense_lora
  lora_r: 48
  lora_alpha: 96
  lora_dropout: 0.05
  target_keywords: [gate_proj, up_proj, down_proj]
  start_layer: 0
  num_layers: 0        # 0 = all layers
  attn_lora:
    enabled: true
    keywords: [q_proj, v_proj]

training:
  type: dpo
  dpo_beta: 0.1
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
      lr: 5.0e-6
      weight_decay: 0.01
  warmup_steps: 100
  max_train_steps: 5000
  batch_size: 2
  grad_accum_steps: 8
  max_grad_norm: 1.0
  log_steps: 50
  save_steps: 1000
  val_steps: 500
```
Differences from the SFT Preset
```diff
-model_name: meta-llama/Llama-3.2-1B
+model_name: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 training:
-  type: sft
+  type: dpo
+  dpo_beta: 0.1
-      lr: 1.0e-5
+      lr: 5.0e-6
-  warmup_steps: 200
+  warmup_steps: 100
-  batch_size: 4
+  batch_size: 2
-  grad_accum_steps: 4
+  grad_accum_steps: 8
```
| Change | Reason |
|---|---|
| type: dpo | Switches to the DPO loss function and reference-model logic |
| dpo_beta: 0.1 | Preference strength parameter |
| lr halved | DPO fine-tunes an already trained model |
| batch_size halved | DPO processes 2x the tokens (chosen + rejected) |
| grad_accum_steps doubled | Maintains the effective batch size (2x8 ~ 4x4) |
Key Points
- TinyLlama-1.1B-Chat: based on the Llama 2 architecture, so it uses the same backbone: llama
- Chat model: SFT has already been applied, making it suitable for additional alignment via DPO
- Trainable parameters: ~31.1M
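The ~31.1M figure checks out the same way as for Llama-3.2-1B, using TinyLlama-1.1B's dimensions (hidden size 2048, FFN size 5632, 22 layers, GQA with 4 KV heads of dim 64, so the v_proj output width is 256). Again a sanity-check sketch, not EulerForge code.

```python
# Estimate trainable LoRA parameters for the TinyLlama DPO preset.

def lora_params(r, in_f, out_f):
    """Parameters in one rank-r LoRA pair on an (in_f -> out_f) linear."""
    return r * (in_f + out_f)

hidden, ffn, kv_width, layers, r = 2048, 5632, 256, 22, 48

per_layer = (
    2 * lora_params(r, hidden, ffn)     # gate_proj, up_proj
    + lora_params(r, ffn, hidden)       # down_proj
    + lora_params(r, hidden, hidden)    # q_proj
    + lora_params(r, hidden, kv_width)  # v_proj (GQA: 2048 -> 256)
)
print(f"{per_layer * layers:,}")  # 31,088,640 ≈ 31.1M
```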
Execution
```bash
# Preflight check
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml --preflight

# DPO training
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
  --set data.format=raw \
  --set data.task=prompted_preference \
  --set data.path=data/dpo_10k_raw.jsonl \
  --set data.max_length=512
```
4. SFT to DPO Pipeline
This is a typical pipeline where SFT is performed first on a LLaMA model, followed by DPO.
Step 1: SFT Training
```bash
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
# → Checkpoint saved to outputs/run_YYYYMMDD_HHMMSS/
```
Step 2: DPO Training (Starting from SFT Checkpoint)
```bash
# Override model_name in the TinyLlama DPO preset with the SFT checkpoint
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
  --set model_name=outputs/run_YYYYMMDD_HHMMSS/checkpoint_final \
  --set data.format=raw \
  --set data.task=prompted_preference \
  --set data.path=data/dpo_10k_raw.jsonl \
  --set data.max_length=512
```
Note: You can use the same model for both SFT and DPO, or use different LLaMA models. Just change model_name.
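When scripting this pipeline, you need the concrete run directory in place of the run_YYYYMMDD_HHMMSS placeholder. Since the timestamped names sort chronologically, a small helper can resolve the newest one. This is an illustrative utility, not part of the EulerForge CLI, and it assumes the outputs/run_*/checkpoint_final layout shown above.

```python
from pathlib import Path

def latest_final_checkpoint(outputs="outputs"):
    """Return the checkpoint_final path of the most recent run directory.

    run_YYYYMMDD_HHMMSS names sort lexicographically in time order,
    so the last entry of the sorted glob is the newest run.
    """
    runs = sorted(Path(outputs).glob("run_*"))
    if not runs:
        raise FileNotFoundError(f"no run_* directories under {outputs}")
    return runs[-1] / "checkpoint_final"

# Example: pass the result to --set model_name=<path> in the DPO step.
```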
5. Hyperparameter Tuning
LoRA Parameters
```bash
# Reduce lora_r (saves memory, reduces parameter count)
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set injection.lora_r=16 \
  --set injection.lora_alpha=32 \
  --set data.format=raw --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl --set data.max_length=512
```
DPO Beta Adjustment
```bash
# Conservative alignment (stays closer to the reference model)
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
  --set training.dpo_beta=0.05 \
  --set data.format=raw --set data.task=prompted_preference \
  --set data.path=data/dpo_10k_raw.jsonl --set data.max_length=512
```
Restricting Layer Range
```bash
# Apply LoRA only to the last 8 layers (Llama-3.2-1B = 16 layers)
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
  --set injection.start_layer=8 \
  --set injection.num_layers=8 \
  --set data.format=raw --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl --set data.max_length=512
```
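The interaction of start_layer and num_layers can be sketched as a simple range computation, assuming num_layers: 0 means "through the last layer" as the preset comments indicate. The helper is illustrative, not EulerForge's implementation.

```python
# Sketch: which decoder layer indices receive LoRA injection.

def injected_layers(total, start_layer=0, num_layers=0):
    """num_layers == 0 means 'all remaining layers from start_layer'."""
    end = total if num_layers == 0 else start_layer + num_layers
    return list(range(start_layer, min(end, total)))

# Llama-3.2-1B has 16 decoder layers:
print(injected_layers(16, start_layer=8, num_layers=8))
# [8, 9, 10, 11, 12, 13, 14, 15] -> the last 8 layers
```

Restricting injection to later layers is a common way to cut trainable parameters roughly in half while keeping most of the adaptation capacity near the output.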
6. Benchmarking
After training, you can evaluate the results with eulerforge bench.
Target-only (Checking SFT Results)
```bash
eulerforge bench --preset configs/bench/sft_target_only.yml \
  --set bench.models.target.provider=local_hf \
  --set bench.models.target.model=null \
  --target-output-dir outputs/run_YYYYMMDD_HHMMSS \
  --checkpoint final
```
Pairwise Comparison (Before/After DPO)
```bash
eulerforge bench --preset configs/bench/preference_pairwise.yml \
  --set bench.models.target.provider=local_hf \
  --set bench.models.target.model=null \
  --target-output-dir outputs/run_YYYYMMDD_HHMMSS \
  --checkpoint final
```
7. Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| backbone 'llama' not found | LlamaAdapter not registered | Verify you have the latest version of EulerForge |
| OOM (out of memory) | Even LLaMA 1B requires substantial VRAM for LoRA + DPO | Reduce batch_size, reduce lora_r, add model.load_precision.mode: int4 |
| SFT loss does not converge toward 0 | max_length too short, so data is truncated | Increase data.max_length (512 -> 1024) |
| DPO accuracy stuck at 0.5 | dpo_beta too small or data quality issues | Increase dpo_beta or check data quality |
| model_name download fails | HuggingFace access restrictions | Run huggingface-cli login and confirm the model license agreement |
Next Steps
- SFT details: Plain LoRA Tutorial
- DPO details: DPO Training Guide
- Combining other strategies: LoRA MoE, FFN MoE Expert LoRA
- Benchmarking details: Bench Guide
- Hyperparameter optimization with Grid Search: Grid Search Guide