1. Dense LoRA
Overview
dense_lora is the most basic fine-tuning strategy in EulerForge. It wraps nn.Linear layers in the model's FFN (Feed-Forward Network) and attention layers with LoRALinear, freezing the original weights and training only small low-rank parameters.
- Suitable for: Simple fine-tuning, domain adaptation, rapid experimentation
- Compatible models: Qwen, LLaMA, Gemma 3, Mixtral (all backbones)
- Reference presets:
  - `configs/presets/qwen3.5_0.8b_dense_lora_sft.yml`
  - `configs/presets/gemma3_1b_dense_lora_sft.yml`
Prerequisites
- EulerForge installation complete (see Getting Started)
- Data preprocessing complete (`data/sft_10k_raw.jsonl` generated)
1. Where: Where Is It Injected?
BackboneAdapter traverses the model structure to find injection targets.
Discovery Process
- `BackboneAdapter.find_transformer_layers(model)` finds all transformer blocks.
- Within each block, it searches for `nn.Linear` modules matching `target_keywords`.
- (Optional) If `attn_lora.enabled: true`, attention projections are also searched.
Target Modules
| Area | Target Keywords | Target Modules |
|---|---|---|
| FFN | gate_proj, up_proj, down_proj | Linear layers inside the FFN |
| Attention | q_proj, v_proj | Attention projection Linear layers |
Related Configuration
```yaml
backbone: qwen3  # Backbone adapter selection (qwen3/qwen3.5/llama3/gemma3/mixtral)
injection:
  target_keywords: [gate_proj, up_proj, down_proj]  # FFN targets
  start_layer: 0                 # Starting layer for injection (0 = from the beginning)
  num_layers: 0                  # Number of layers to apply (0 = all)
  attn_lora:
    enabled: true                # Enable attention LoRA
    keywords: [q_proj, v_proj]   # Attention targets
```
You can control the injection scope with start_layer and num_layers. For example, in a 28-layer model, setting start_layer: 14, num_layers: 14 applies LoRA only to the last 14 layers.
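The scoped discovery described above can be sketched as a traversal over `named_modules()`. The helper below is a hypothetical illustration of the keyword and layer-range matching, not EulerForge's actual `BackboneAdapter` code; the name `find_lora_targets` is invented for this sketch.

```python
import torch.nn as nn

def find_lora_targets(model, target_keywords, start_layer=0, num_layers=0):
    """Collect names of nn.Linear modules whose name contains a target keyword
    and whose transformer-layer index falls inside the injection window."""
    targets = []
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        if not any(kw in name for kw in target_keywords):
            continue
        # Extract the layer index from names like "layers.12.mlp.gate_proj"
        idx = next((int(p) for p in name.split(".") if p.isdigit()), None)
        if idx is None or idx < start_layer:
            continue
        if num_layers and idx >= start_layer + num_layers:
            continue  # num_layers == 0 means "no upper bound" (all layers)
        targets.append(name)
    return targets
```

With `start_layer: 14, num_layers: 14` on a 28-layer model, only names with indices 14-27 survive the window check, matching the example above.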
2. What: What Is Injected?
The DenseLoRAInjection class calls inject_dense_lora_inplace() to modify the model in-place.
Transformation Process
Each target `nn.Linear` is wrapped with `LoRALinear`:

```
Before: nn.Linear(in_features, out_features)

After:  LoRALinear
        +-- base_layer: nn.Linear (frozen, original weights)
        +-- lora_A: Parameter(r, in_features)   <- trainable
        +-- lora_B: Parameter(out_features, r)  <- trainable
        +-- scaling: alpha / r
```
Forward Operation
```
[Input x]
 +-- base_layer(x) -> base_out  (original output, frozen)
 |
 +-- dropout(x)
     -> x @ lora_A.T      -> (batch, r)    low-rank projection
     -> result @ lora_B.T -> (batch, out)  restoration
     -> * scaling         -> lora_out      scaling

Final output = base_out + lora_out
```
- `scaling = alpha / r` (e.g., 96/48 = 2.0) controls the magnitude of the LoRA output.
- `dropout` is applied only to the LoRA branch input.
- If `r=0`, LoRA acts as an identity (no change in output).
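The forward pass above can be written out directly. The class below is a minimal sketch of the described behavior, not EulerForge's actual `LoRALinear` implementation; the zero-initialization of `lora_B` (so the wrapper starts as a no-op) is a common LoRA convention assumed here.

```python
import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Frozen base layer plus a scaled low-rank correction, as described above."""
    def __init__(self, base: nn.Linear, r: int, alpha: float, dropout: float = 0.0):
        super().__init__()
        self.base_layer = base
        for p in self.base_layer.parameters():
            p.requires_grad = False            # freeze original weights
        self.r = r
        self.scaling = alpha / r if r > 0 else 0.0
        self.dropout = nn.Dropout(dropout)     # applied only to the LoRA branch
        if r > 0:
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        base_out = self.base_layer(x)
        if self.r == 0:
            return base_out                    # r=0: identity on the output
        lora_out = self.dropout(x) @ self.lora_A.T @ self.lora_B.T
        return base_out + lora_out * self.scaling
```

Because `lora_B` starts at zero, a freshly wrapped layer produces exactly the base layer's output; training then moves only `lora_A` and `lora_B`.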
Related Configuration
```yaml
injection:
  strategy: dense_lora  # Strategy selection
  lora_r: 48            # LoRA rank (trainable parameter size)
  lora_alpha: 96        # Scaling factor (scaling = alpha/r)
  lora_dropout: 0.05    # LoRA branch dropout
```
Parameter Guide:
- lora_r: Larger values increase expressiveness but also memory/computation cost. Typically 16-64.
- lora_alpha: Usually set to 2x lora_r (scaling = 2.0).
- lora_dropout: Prevents overfitting. Recommended range: 0.0-0.1.
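For sizing `lora_r`, note that each wrapped layer adds `r * (in_features + out_features)` trainable parameters (`lora_A` is `r x in`, `lora_B` is `out x r`). A quick back-of-the-envelope helper; the layer dimensions in the example are illustrative, not taken from a specific model:

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters added per wrapped layer: lora_A (r x in) + lora_B (out x r)."""
    return r * in_features + out_features * r

# e.g. a hypothetical 2048 -> 5632 FFN projection at r=48:
# 48 * (2048 + 5632) = 368_640 extra trainable parameters per layer
```

Multiply by the number of injected layers (and keywords) to estimate the total trainable footprint before raising `lora_r`.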
3. When: When Are Which Parameters Trained?
dense_lora uses the simplest single-phase schedule.
Phase Configuration
```yaml
training:
  phases:
    - step: 0
      trainable: ["lora", "attn_lora"]
```
From step 0 to the end of training, only the lora (FFN LoRA) and attn_lora (attention LoRA) groups are trained.
Timeline
```
Step 0 -----------------------------------------> Step 5000
  |
  +-- [lora + attn_lora training]
      base_layer: frozen
      lora_A, lora_B: trainable
```
- Single phase, so there are no phase transitions.
- No optimizer reconstruction occurs.
- The `router` group is not used (dense_lora has no router).
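Applying a phase entry like the one above amounts to freezing everything and then re-enabling the selected groups. Below is a minimal sketch of that toggle, assuming LoRA parameters are identifiable by name; `apply_phase` is a hypothetical helper, not the actual trainer API.

```python
import torch.nn as nn

def apply_phase(model: nn.Module, trainable_keywords) -> int:
    """Freeze all parameters, then re-enable those whose name contains a keyword.
    Returns the trainable parameter count for a quick sanity check."""
    count = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(kw in name for kw in trainable_keywords)
        if param.requires_grad:
            count += param.numel()
    return count
```

With keywords like `["lora_A", "lora_B"]`, every `base_layer` weight stays frozen for the whole run, matching the single-phase schedule.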
4. Full Configuration File Walkthrough
Full contents of configs/presets/qwen3.5_0.8b_dense_lora_sft.yml:
```yaml
# -- Model Info --
device: cuda:0                        # GPU device
backbone: qwen3                       # [Where] Backbone adapter: Qwen3Adapter
model_name: Qwen/Qwen3.5-0.8B-Base    # HuggingFace model ID

# -- Injection Settings --
injection:
  strategy: dense_lora                # [What] Dense LoRA strategy
  lora_r: 48                          # [What] LoRA rank
  lora_alpha: 96                      # [What] Scaling factor (96/48 = 2.0)
  lora_dropout: 0.05                  # [What] LoRA dropout
  target_keywords: [gate_proj, up_proj, down_proj]  # [Where] FFN target keywords
  start_layer: 0                      # [Where] Starting layer
  num_layers: 0                       # [Where] 0 = all layers
  attn_lora:                          # [Where] Attention LoRA
    enabled: true
    keywords: [q_proj, v_proj]

# -- Training Settings --
training:
  type: sft                           # SFT (Supervised Fine-Tuning)
  phases:                             # [When] Phase schedule
    - step: 0
      trainable: ["lora", "attn_lora"]  # Train only LoRA from step 0
  lr: 1.0e-5                          # Learning rate
  weight_decay: 0.01                  # Weight decay
  warmup_steps: 200                   # Learning rate warmup steps
  max_train_steps: 5000               # Maximum training steps
  batch_size: 4                       # Batch size
  grad_accum_steps: 4                 # Gradient accumulation steps (effective batch = 4*4 = 16)
  max_grad_norm: 1.0                  # Gradient clipping
  log_steps: 50                       # Logging interval
  save_steps: 1000                    # Checkpoint save interval
  val_steps: 500                      # Validation interval
```
5. Running
Basic Execution
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
Configuration Overrides
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512 \
  --set training.lr=2e-5 \
  --set injection.lora_r=32 \
  --set training.max_train_steps=10000
```
Validate Configuration Only
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --validate-only
```
Preflight Check
Loads the model and applies injection, then displays parameter counts by phase group. Training is not performed.
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --preflight
```
Debug Mode
```bash
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --debug \
  --debug-trainable-names \
  --debug-every 10 \
  --set data.format=raw \
  --set data.task=sft \
  --set data.path=data/sft_10k_raw.jsonl \
  --set data.max_length=512
```
6. Checkpoint Structure
When training completes, checkpoints contain base weights + LoRA parameters.
Checkpoint structure:

```
+-- layer.N.mlp.gate_proj.base_layer.weight  <- original weight (frozen)
+-- layer.N.mlp.gate_proj.lora_A             <- LoRA trained parameter
+-- layer.N.mlp.gate_proj.lora_B             <- LoRA trained parameter
+-- layer.N.mlp.up_proj.base_layer.weight
+-- layer.N.mlp.up_proj.lora_A
+-- layer.N.mlp.up_proj.lora_B
+-- ...
+-- (if attn_lora enabled, q_proj and v_proj follow the same pattern)
```
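A quick way to confirm this layout is to split the state-dict keys by name. `split_checkpoint_keys` is a hypothetical helper for illustration; how the checkpoint file itself is loaded depends on your setup.

```python
def split_checkpoint_keys(state_dict):
    """Separate trained LoRA tensors from frozen base weights by key name."""
    lora_keys = sorted(k for k in state_dict if "lora_A" in k or "lora_B" in k)
    base_keys = sorted(k for k in state_dict if "base_layer" in k)
    return lora_keys, base_keys

# e.g. state = torch.load(<your checkpoint path>, map_location="cpu")
# lora, base = split_checkpoint_keys(state)
```

In a healthy dense_lora checkpoint, every `base_layer` key should have matching `lora_A`/`lora_B` siblings for the injected layers.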
Bench Loading
When loading a dense_lora checkpoint with `eulerforge bench`:

- LoRA is merged into the base weights: `merged = base_w + (lora_B @ lora_A) * (alpha / r)`
- The result is a dense model (a standard model without LoRA layers)
- Inference runs as a regular HF model, without any MoE structure
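The merge formula can be checked numerically: folding the low-rank product into the base weight yields a plain dense weight whose forward pass matches the unmerged base-plus-LoRA computation (with dropout off). A minimal sketch with made-up dimensions:

```python
import torch

def merge_lora(base_w, lora_A, lora_B, alpha, r):
    """Fold the LoRA update into the dense weight:
    merged = base_w + (lora_B @ lora_A) * (alpha / r)."""
    return base_w + (lora_B @ lora_A) * (alpha / r)

# Sanity check: merged forward == base forward + scaled LoRA branch
base_w = torch.randn(16, 8)                   # (out_features, in_features)
A, B = torch.randn(4, 8), torch.randn(16, 4)  # rank r = 4
x = torch.randn(2, 8)
merged = merge_lora(base_w, A, B, alpha=8.0, r=4)
unmerged = x @ base_w.T + (x @ A.T @ B.T) * (8.0 / 4)
assert torch.allclose(x @ merged.T, unmerged, atol=1e-5)
```

After merging, the LoRA parameters can be discarded, which is why the bench path serves a standard dense model.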
7. Debugging and Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| "No trainable parameters" | target_keywords don't match the model's actual layer names | Check parameter names with --debug-trainable-names |
| "LoRA layers will act as Identity" | lora_r: 0 is set | Set lora_r to 1 or higher |
| OOM (out of memory) | Insufficient VRAM for the model size | Add model.load_precision.mode: int4 (4-bit QLoRA), reduce batch_size, or reduce lora_r |
| "dense_lora typically has no router params" warning | router group included in a phase | Remove router from trainable |
Next Steps
- If you need multi-task adaptive LoRA -> Mixture LoRA Tutorial
- To convert a dense model to MoE -> MoE Expert LoRA Tutorial
- For DPO training -> DPO Training Guide