1. Dense LoRA
Overview
dense_lora is the most basic fine-tuning strategy in EulerForge. It wraps nn.Linear layers in the model's FFN (Feed-Forward Network) and attention layers with LoRALinear, freezing the original weights and training only small low-rank parameters.
- Suitable for: Simple fine-tuning, domain adaptation, rapid experimentation
- Compatible models: Qwen, LLaMA, Gemma 3, Mixtral (all backbones)
- Reference presets:
  - `configs/presets/qwen3.5_0.8b_dense_lora_sft.yml`
  - `configs/presets/gemma3_1b_dense_lora_sft.yml`
Prerequisites
- EulerForge installation complete (see Getting Started)
- Data preprocessing complete (`data/sft_10k_raw.jsonl` generated)
1. Where: Where Is It Injected?
BackboneAdapter traverses the model structure to find injection targets.
Discovery Process
- `BackboneAdapter.find_transformer_layers(model)` finds all transformer blocks.
- Within each block, it searches for `nn.Linear` modules matching `target_keywords`.
- (Optional) If `attn_lora.enabled: true`, attention projections are also searched.
Target Modules
| Area | Target Keywords | Target Modules |
|---|---|---|
| FFN | gate_proj, up_proj, down_proj | Linear layers inside the FFN |
| Attention | q_proj, v_proj | Attention projection Linear layers |
Related Configuration
```yaml
backbone: qwen3  # Backbone adapter selection (qwen3/qwen3.5/llama3/gemma3/mixtral)
injection:
  target_keywords: [gate_proj, up_proj, down_proj]  # FFN targets
  start_layer: 0                 # Starting layer for injection (0 = from the beginning)
  num_layers: 0                  # Number of layers to apply (0 = all)
  attn_lora:
    enabled: true                # Enable attention LoRA
    keywords: [q_proj, v_proj]   # Attention targets
```
You can control the injection scope with start_layer and num_layers. For example, in a 28-layer model, setting start_layer: 14, num_layers: 14 applies LoRA only to the last 14 layers.
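The scoped discovery described above can be sketched as a traversal over `named_modules()`. The helper below is a hypothetical illustration of the keyword and layer-range matching, not EulerForge's actual `BackboneAdapter` code; the name `find_lora_targets` is invented for this sketch.

```python
import torch.nn as nn

def find_lora_targets(model, target_keywords, start_layer=0, num_layers=0):
    """Collect names of nn.Linear modules whose name contains a target keyword
    and whose transformer-layer index falls inside the injection window."""
    targets = []
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        if not any(kw in name for kw in target_keywords):
            continue
        # Extract the layer index from names like "layers.12.mlp.gate_proj"
        idx = next((int(p) for p in name.split(".") if p.isdigit()), None)
        if idx is None or idx < start_layer:
            continue
        if num_layers and idx >= start_layer + num_layers:
            continue  # num_layers == 0 means "no upper bound" (all layers)
        targets.append(name)
    return targets
```

With `start_layer: 14, num_layers: 14` on a 28-layer model, only names with indices 14-27 survive the window check, matching the example above.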
2. What: What Is Injected?
The DenseLoRAInjection class calls inject_dense_lora_inplace() to modify the model in-place.
Transformation Process
Each target `nn.Linear` is wrapped with `LoRALinear`:

```
Before: nn.Linear(in_features, out_features)

After:  LoRALinear
        +-- base_layer: nn.Linear (frozen, original weights)
        +-- lora_A: Parameter(r, in_features)   <- trainable
        +-- lora_B: Parameter(out_features, r)  <- trainable
        +-- scaling: alpha / r
```
Forward Operation
```
[Input x]
 +-- base_layer(x) -> base_out  (original output, frozen)
 |
 +-- dropout(x)
     -> x @ lora_A.T      -> (batch, r)    low-rank projection
     -> result @ lora_B.T -> (batch, out)  restoration
     -> * scaling         -> lora_out      scaling

Final output = base_out + lora_out
```
- `scaling = alpha / r` (e.g., 96/48 = 2.0) controls the magnitude of the LoRA output.
- `dropout` is applied only to the LoRA branch input.
- If `r=0`, LoRA acts as an identity (no change in output).
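The forward pass above can be written out directly. The class below is a minimal sketch of the described behavior, not EulerForge's actual `LoRALinear` implementation; the zero-initialization of `lora_B` (so the wrapper starts as a no-op) is a common LoRA convention assumed here.

```python
import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Frozen base layer plus a scaled low-rank correction, as described above."""
    def __init__(self, base: nn.Linear, r: int, alpha: float, dropout: float = 0.0):
        super().__init__()
        self.base_layer = base
        for p in self.base_layer.parameters():
            p.requires_grad = False            # freeze original weights
        self.r = r
        self.scaling = alpha / r if r > 0 else 0.0
        self.dropout = nn.Dropout(dropout)     # applied only to the LoRA branch
        if r > 0:
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        base_out = self.base_layer(x)
        if self.r == 0:
            return base_out                    # r=0: identity on the output
        lora_out = self.dropout(x) @ self.lora_A.T @ self.lora_B.T
        return base_out + lora_out * self.scaling
```

Because `lora_B` starts at zero, a freshly wrapped layer produces exactly the base layer's output; training then moves only `lora_A` and `lora_B`.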
Related Configuration
```yaml
injection:
  strategy: dense_lora  # Strategy selection
  lora_r: 48            # LoRA rank (trainable parameter size)
  lora_alpha: 96        # Scaling factor (scaling = alpha/r)
  lora_dropout: 0.05    # LoRA branch dropout
```
Parameter Guide:
- lora_r: Larger values increase expressiveness but also memory/computation cost. Typically 16-64.
- lora_alpha: Usually set to 2x lora_r (scaling = 2.0).
- lora_dropout: Prevents overfitting. Recommended range: 0.0-0.1.
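For sizing `lora_r`, note that each wrapped layer adds `r * (in_features + out_features)` trainable parameters (`lora_A` is `r x in`, `lora_B` is `out x r`). A quick back-of-the-envelope helper; the layer dimensions in the example are illustrative, not taken from a specific model:

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters added per wrapped layer: lora_A (r x in) + lora_B (out x r)."""
    return r * in_features + out_features * r

# e.g. a hypothetical 2048 -> 5632 FFN projection at r=48:
# 48 * (2048 + 5632) = 368_640 extra trainable parameters per layer
```

Multiply by the number of injected layers (and keywords) to estimate the total trainable footprint before raising `lora_r`.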
3. When: When Are Which Parameters Trained?
dense_lora uses the simplest single-phase schedule.
Phase Configuration
```yaml
training:
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
```
From step 0 to the end of training, only the lora (FFN LoRA) and attn_lora (attention LoRA) groups are trained.
Timeline
```
Step 0 -----------------------------------------> Step 5000
  |
  +-- [lora + attn_lora training]
      base_layer: frozen
      lora_A, lora_B: trainable
```
- Single phase, so there are no phase transitions.
- No optimizer reconstruction occurs.
- The `router` group is not used (dense_lora has no router).
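Applying a phase entry like the one above amounts to freezing everything and then re-enabling the selected groups. Below is a minimal sketch of that toggle, assuming LoRA parameters are identifiable by name; `apply_phase` is a hypothetical helper, not the actual trainer API.

```python
import torch.nn as nn

def apply_phase(model: nn.Module, trainable_keywords) -> int:
    """Freeze all parameters, then re-enable those whose name contains a keyword.
    Returns the trainable parameter count for a quick sanity check."""
    count = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(kw in name for kw in trainable_keywords)
        if param.requires_grad:
            count += param.numel()
    return count
```

With keywords like `["lora_A", "lora_B"]`, every `base_layer` weight stays frozen for the whole run, matching the single-phase schedule.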
4. Full Configuration File Walkthrough
Full contents of configs/presets/qwen3.5_0.8b_dense_lora_sft.yml:
```yaml
# -- Model Info --
device: cuda:0                        # GPU device
backbone: qwen3                       # [Where] Backbone adapter: Qwen3Adapter
model_name: Qwen/Qwen3.5-0.8B-Base    # HuggingFace model ID

# -- Injection Settings --
injection:
  strategy: dense_lora                # [What] Dense LoRA strategy
  lora_r: 48                          # [What] LoRA rank
  lora_alpha: 96                      # [What] Scaling factor (96/48 = 2.0)
  lora_dropout: 0.05                  # [What] LoRA dropout
  target_keywords: [gate_proj, up_proj, down_proj]  # [Where] FFN target keywords
  start_layer: 0                      # [Where] Starting layer
  num_layers: 0                       # [Where] 0 = all layers
  attn_lora:                          # [Where] Attention LoRA
    enabled: true
    keywords: [q_proj, v_proj]

# -- Training Settings --
training:
  type: sft                           # SFT (Supervised Fine-Tuning)
  phases:                             # [When] Phase schedule
    - step: 0
      trainable: ["lora", "attn_lora"]  # Train only LoRA from step 0
  lr: 1.0e-5                          # Learning rate
  weight_decay: 0.01                  # Weight decay
  warmup_steps: 200                   # Learning rate warmup steps
  max_train_steps: 5000               # Maximum training steps
  batch_size: 4                       # Batch size
  grad_accum_steps: 4                 # Gradient accumulation steps (effective batch = 4*4 = 16)
  max_grad_norm: 1.0                  # Gradient clipping
  log_steps: 50                       # Logging interval
  save_steps: 1000                    # Checkpoint save interval
  val_steps: 500                      # Validation interval
```
5. Running
Basic Execution
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
Configuration Overrides
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512 \
  --set training.lr=2e-5 \
  --set injection.lora_r=32 \
  --set training.max_train_steps=10000
```
Validate Configuration Only
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --validate-only
```
Preflight Check
Loads the model and applies injection, then displays parameter counts by phase group. Training is not performed.
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --preflight
```
Debug Mode
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --debug \
  --debug-trainable-names \
  --debug-every 10 \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
6. Checkpoint Structure
When training completes, checkpoints contain base weights + LoRA parameters.
Checkpoint structure:

```
+-- layer.N.mlp.gate_proj.base_layer.weight  <- original weight (frozen)
+-- layer.N.mlp.gate_proj.lora_A             <- LoRA trained parameter
+-- layer.N.mlp.gate_proj.lora_B             <- LoRA trained parameter
+-- layer.N.mlp.up_proj.base_layer.weight
+-- layer.N.mlp.up_proj.lora_A
+-- layer.N.mlp.up_proj.lora_B
+-- ...
+-- (if attn_lora enabled, q_proj and v_proj follow the same pattern)
```
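A quick way to confirm this layout is to split the state-dict keys by name. `split_checkpoint_keys` is a hypothetical helper for illustration; how the checkpoint file itself is loaded depends on your setup.

```python
def split_checkpoint_keys(state_dict):
    """Separate trained LoRA tensors from frozen base weights by key name."""
    lora_keys = sorted(k for k in state_dict if "lora_A" in k or "lora_B" in k)
    base_keys = sorted(k for k in state_dict if "base_layer" in k)
    return lora_keys, base_keys

# e.g. state = torch.load(<your checkpoint path>, map_location="cpu")
# lora, base = split_checkpoint_keys(state)
```

In a healthy dense_lora checkpoint, every `base_layer` key should have matching `lora_A`/`lora_B` siblings for the injected layers.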
Bench Loading
When loading a dense_lora checkpoint with `eulerforge bench`:

- LoRA is merged into the base weights: `merged = base_w + (lora_B @ lora_A) * (alpha / r)`
- The result is a dense model (a standard model without LoRA layers)
- Inference runs as a regular HF model, without any MoE structure
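The merge formula can be checked numerically: folding the low-rank product into the base weight yields a plain dense weight whose forward pass matches the unmerged base-plus-LoRA computation (with dropout off). A minimal sketch with made-up dimensions:

```python
import torch

def merge_lora(base_w, lora_A, lora_B, alpha, r):
    """Fold the LoRA update into the dense weight:
    merged = base_w + (lora_B @ lora_A) * (alpha / r)."""
    return base_w + (lora_B @ lora_A) * (alpha / r)

# Sanity check: merged forward == base forward + scaled LoRA branch
base_w = torch.randn(16, 8)                   # (out_features, in_features)
A, B = torch.randn(4, 8), torch.randn(16, 4)  # rank r = 4
x = torch.randn(2, 8)
merged = merge_lora(base_w, A, B, alpha=8.0, r=4)
unmerged = x @ base_w.T + (x @ A.T @ B.T) * (8.0 / 4)
assert torch.allclose(x @ merged.T, unmerged, atol=1e-5)
```

After merging, the LoRA parameters can be discarded, which is why the bench path serves a standard dense model.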
7. Debugging and Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| "No trainable parameters" | target_keywords don't match the model's actual layer names | Check parameter names with --debug-trainable-names |
| "LoRA layers will act as Identity" | lora_r: 0 is set | Set lora_r to 1 or higher |
| OOM (out of memory) | Insufficient VRAM for the model size | Add model.load_precision.mode: int4 (4-bit QLoRA), reduce batch_size, or reduce lora_r |
| "dense_lora typically has no router params" warning | router group included in a phase | Remove router from trainable |
Next Steps
- If you need multi-task adaptive LoRA -> Mixture LoRA Tutorial
- To convert a dense model to MoE -> MoE Expert LoRA Tutorial
- For DPO training -> DPO Training Guide