13. LLaMA Fine-Tuning

Overview

EulerForge natively supports LLaMA family models. With the backbone: llama setting, you can apply the same injection strategies and training methods to models such as LLaMA 2/3, TinyLlama, and Mistral (dense).

This tutorial covers two representative combinations:

| Model | Training Type | Preset |
| --- | --- | --- |
| Meta Llama-3.2-1B | SFT | configs/presets/llama3_1b_dense_lora_sft.yml |
| TinyLlama-1.1B-Chat | DPO | configs/presets/tinyllama_1.1b_dense_lora_dpo.yml |

1. LLaMA Backbone Adapter

Setting backbone: llama automatically selects the LlamaAdapter.

Compatible Models

| Model | HuggingFace ID | Parameters |
| --- | --- | --- |
| Llama 3.2 1B | meta-llama/Llama-3.2-1B | 1.24B |
| Llama 3.2 3B | meta-llama/Llama-3.2-3B | 3.21B |
| Llama 3.1 8B | meta-llama/Llama-3.1-8B | 8.03B |
| TinyLlama 1.1B Chat | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 1.10B |
| Mistral 7B (dense) | mistralai/Mistral-7B-v0.3 | 7.24B |

Note: Mixtral (MoE) uses backbone: mixtral. backbone: llama is for dense architectures only.

Structural Comparison with Qwen3

| Item | Qwen3 | LLaMA |
| --- | --- | --- |
| FFN projections | gate_proj, up_proj, down_proj | gate_proj, up_proj, down_proj |
| Attention projections | q_proj, k_proj, v_proj, o_proj | q_proj, k_proj, v_proj, o_proj |
| backbone value | qwen3 | llama |
| target_keywords | identical | identical |

Because the FFN and attention projection names match, the injection section is identical to the Qwen3 presets; only backbone and model_name change.
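
The shared naming can be pictured as a simple substring match over module names. A minimal sketch of keyword-based target selection (the matching rule and module names here are illustrative, not EulerForge's actual implementation):

```python
# Sketch of keyword-based LoRA target selection: match the last path
# component of each module name against the configured keywords.
# Module names below are illustrative, not read from a real checkpoint.
def select_targets(module_names, target_keywords):
    """Return module names whose final component matches a target keyword."""
    return [n for n in module_names
            if any(n.split(".")[-1].startswith(k) for k in target_keywords)]

llama_modules = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]
# The same keywords match both Qwen3 and LLaMA, since projection names are identical.
print(select_targets(llama_modules, ["gate_proj", "up_proj", "down_proj"]))
```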


2. Preset A: Llama-3.2-1B SFT

Configuration File

configs/presets/llama3_1b_dense_lora_sft.yml:

device: cuda:0
backbone: llama
model_name: meta-llama/Llama-3.2-1B

injection:
  strategy: dense_lora
  lora_r: 48
  lora_alpha: 96
  lora_dropout: 0.05
  target_keywords: [gate_proj, up_proj, down_proj]
  start_layer: 0
  num_layers: 0            # 0 = all layers
  attn_lora:
    enabled: true
    keywords: [q_proj, v_proj]

training:
  type: sft
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
  lr: 1.0e-5
  weight_decay: 0.01
  warmup_steps: 200
  max_train_steps: 5000
  batch_size: 4
  grad_accum_steps: 4
  max_grad_norm: 1.0
  log_steps: 50
  save_steps: 1000
  val_steps: 500

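To get a feel for what dense_lora with lora_r: 48 injects here, a back-of-the-envelope trainable-parameter count follows. The layer dimensions are assumptions taken from Llama-3.2-1B's published config (hidden size 2048, FFN size 8192, 16 layers, GQA with 512-dim k/v projections), not values read from EulerForge:

```python
# Rough count of trainable LoRA parameters for this preset, assuming
# Llama-3.2-1B's published dimensions: hidden 2048, FFN 8192, 16 layers,
# GQA with 8 KV heads of dim 64 (so v_proj maps 2048 -> 512).
def lora_params(shapes, r):
    # Each targeted (in_dim, out_dim) linear gains A (r x in) + B (out x r).
    return sum(r * (i + o) for i, o in shapes)

per_layer = [
    (2048, 8192),  # gate_proj
    (2048, 8192),  # up_proj
    (8192, 2048),  # down_proj
    (2048, 2048),  # q_proj
    (2048, 512),   # v_proj (GQA)
]
total = 16 * lora_params(per_layer, r=48)
print(f"{total:,} trainable params (~{total / 1.24e9:.1%} of the base model)")
```

At r=48 this injects roughly 29M trainable parameters, a small fraction of the 1.24B base model.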
Execution

# Preflight check
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml --preflight

# SFT training
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
    --set data.format=raw \
    --set data.task=sft \
    --set data.path=data/sft_10k_raw.jsonl \
    --set data.max_length=512
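
Each optimizer step covers batch_size × grad_accum_steps sequences. A quick check of the effective batch implied by the preset (the padding/packing behavior noted in the comment is an assumption about the data loader):

```python
# Effective batch size for the SFT preset: micro-batch x gradient accumulation.
batch_size, grad_accum_steps = 4, 4
effective_batch = batch_size * grad_accum_steps
# Upper bound on tokens per optimizer step with data.max_length=512
# (actual counts are lower if sequences are shorter and not packed).
max_tokens_per_step = effective_batch * 512
print(effective_batch, max_tokens_per_step)
```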

3. Preset B: TinyLlama-1.1B-Chat DPO

Configuration File

configs/presets/tinyllama_1.1b_dense_lora_dpo.yml:

device: cuda:0
backbone: llama
model_name: TinyLlama/TinyLlama-1.1B-Chat-v1.0

injection:
  strategy: dense_lora
  lora_r: 48
  lora_alpha: 96
  lora_dropout: 0.05
  target_keywords: [gate_proj, up_proj, down_proj]
  start_layer: 0
  num_layers: 0            # 0 = all layers
  attn_lora:
    enabled: true
    keywords: [q_proj, v_proj]

training:
  type: dpo
  dpo_beta: 0.1
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
  lr: 5.0e-6
  weight_decay: 0.01
  warmup_steps: 100
  max_train_steps: 5000
  batch_size: 2
  grad_accum_steps: 8
  max_grad_norm: 1.0
  log_steps: 50
  save_steps: 1000
  val_steps: 500

Differences from the SFT Preset

-model_name: meta-llama/Llama-3.2-1B
+model_name: TinyLlama/TinyLlama-1.1B-Chat-v1.0

 training:
-  type: sft
+  type: dpo
+  dpo_beta: 0.1
-  lr: 1.0e-5
+  lr: 5.0e-6
-  warmup_steps: 200
+  warmup_steps: 100
-  batch_size: 4
+  batch_size: 2
-  grad_accum_steps: 4
+  grad_accum_steps: 8

| Change | Reason |
| --- | --- |
| type: dpo | Switches to the DPO loss and reference-model logic |
| dpo_beta: 0.1 | Preference-strength parameter |
| lr halved | DPO fine-tunes an already-trained model |
| batch_size halved | Each DPO example carries twice the tokens (chosen + rejected) |
| grad_accum_steps doubled | Keeps the effective batch size constant (2 × 8 = 4 × 4 = 16) |
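
The beta and learning-rate choices follow from the shape of the DPO objective. A function-level sketch of the standard DPO loss, to show what dpo_beta controls (this is not EulerForge's internal code, and the log-probabilities are made-up numbers):

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss over summed response log-probs under the trained
    policy and the frozen reference model."""
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# Before training, the policy equals the reference: margin 0, loss = log 2.
print(dpo_loss(-40.0, -50.0, -40.0, -50.0))  # ≈ 0.693
# As the policy prefers "chosen" more than the reference does, the loss falls;
# a larger beta amplifies the same margin.
print(dpo_loss(-35.0, -55.0, -40.0, -50.0, beta=0.1))
```

A smaller beta flattens the loss in the margin, so the policy is penalized less for drifting from the reference, which is why lowering it gives more conservative alignment.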

Execution

# Preflight check
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml --preflight

# DPO training
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
    --set data.format=raw \
    --set data.task=prompted_preference \
    --set data.path=data/dpo_10k_raw.jsonl \
    --set data.max_length=512

4. SFT to DPO Pipeline

A typical pipeline runs SFT on a LLaMA model first, then applies DPO on top of the resulting checkpoint.

Step 1: SFT Training

eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
    --set data.format=raw \
    --set data.task=sft \
    --set data.path=data/sft_10k_raw.jsonl \
    --set data.max_length=512
# → Checkpoint saved to outputs/run_YYYYMMDD_HHMMSS/

Step 2: DPO Training (Starting from SFT Checkpoint)

# Override model_name in the TinyLlama DPO preset with the SFT checkpoint
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
    --set model_name=outputs/run_YYYYMMDD_HHMMSS/checkpoint_final \
    --set data.format=raw \
    --set data.task=prompted_preference \
    --set data.path=data/dpo_10k_raw.jsonl \
    --set data.max_length=512

Note: You can use the same model for both SFT and DPO, or use different LLaMA models. Just change model_name.


5. Hyperparameter Tuning

LoRA Parameters

# Reduce lora_r (saves memory, reduces parameter count)
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
    --set injection.lora_r=16 \
    --set injection.lora_alpha=32 \
    --set data.format=raw --set data.task=sft \
    --set data.path=data/sft_10k_raw.jsonl --set data.max_length=512
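
Halving lora_r usually goes together with halving lora_alpha because the LoRA update is conventionally scaled by alpha / r, so keeping the ratio fixed preserves the update magnitude while shrinking the adapter. That EulerForge follows this standard LoRA convention is an assumption:

```python
# Assuming the standard LoRA convention, the low-rank update is applied as
# (alpha / r) * B @ A, so the alpha/r ratio sets the update's scale.
def lora_scale(alpha, r):
    return alpha / r

print(lora_scale(96, 48))  # preset: scale 2.0
print(lora_scale(32, 16))  # reduced-r override from above: same scale 2.0
```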

DPO Beta Adjustment

# Conservative alignment (stays closer to the reference model)
eulerforge train --preset configs/presets/tinyllama_1.1b_dense_lora_dpo.yml \
    --set training.dpo_beta=0.05 \
    --set data.format=raw --set data.task=prompted_preference \
    --set data.path=data/dpo_10k_raw.jsonl --set data.max_length=512

Restricting Layer Range

# Apply LoRA only to the last 8 layers (Llama-3.2-1B = 16 layers)
eulerforge train --preset configs/presets/llama3_1b_dense_lora_sft.yml \
    --set injection.start_layer=8 \
    --set injection.num_layers=8 \
    --set data.format=raw --set data.task=sft \
    --set data.path=data/sft_10k_raw.jsonl --set data.max_length=512
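
One way to read the start_layer / num_layers pair; the semantics are inferred from the preset's "0 = all layers" comment, so treat this sketch as an assumption rather than EulerForge's documented behavior:

```python
# Inferred layer-selection semantics: start at start_layer, take num_layers
# layers, where num_layers=0 means "all remaining layers".
def injected_layers(total_layers, start_layer=0, num_layers=0):
    end = total_layers if num_layers == 0 else min(start_layer + num_layers, total_layers)
    return list(range(start_layer, end))

print(injected_layers(16))                               # defaults: all 16 layers
print(injected_layers(16, start_layer=8, num_layers=8))  # last 8 layers of Llama-3.2-1B
```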

6. Benchmarking

After training, you can evaluate the results with eulerforge bench.

Target-only (Checking SFT Results)

eulerforge bench --preset configs/bench/sft_target_only.yml \
    --set bench.models.target.provider=local_hf \
    --set bench.models.target.model=null \
    --target-output-dir outputs/run_YYYYMMDD_HHMMSS \
    --checkpoint final

Pairwise Comparison (Before/After DPO)

eulerforge bench --preset configs/bench/preference_pairwise.yml \
    --set bench.models.target.provider=local_hf \
    --set bench.models.target.model=null \
    --target-output-dir outputs/run_YYYYMMDD_HHMMSS \
    --checkpoint final

7. Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| backbone 'llama' not found | LlamaAdapter not registered | Verify you are on the latest version of EulerForge |
| OOM (out of memory) | LoRA + DPO keeps both policy and reference models in memory, so even 1B needs substantial VRAM | Reduce batch_size, reduce lora_r, or add model.load_precision.mode: int4 |
| SFT loss does not converge to 0 | max_length too short; data is truncated | Increase data.max_length (e.g., 512 → 1024) |
| DPO accuracy stuck at 0.5 | dpo_beta too small, or data quality issues | Increase dpo_beta or inspect the data |
| model_name download fails | HuggingFace access restrictions | Run huggingface-cli login and accept the model's license agreement |

Next Steps