12. Hyperparameter Search (Grid / Random / Bayes)
Use the `eulerforge grid` command to systematically search hyperparameters such as learning rate, LoRA rank, target layers, and attention targets.
Preparation
Install Optuna:
```bash
pip install eulerforge[hpo]
```
Quick Start
```bash
# 1. Dry-run with example spec (validation only)
eulerforge grid configs/grid/sft_random_search.yml --dry-run

# 2. Run
eulerforge grid configs/grid/sft_random_search.yml
```
After completion, results are saved to `outputs/grid/sft_random/summary.json`.
YAML Spec Structure
```yaml
version: 1
base_preset: "configs/presets/qwen3.5_0.8b_dense_lora_sft.yml"  # Base configuration

run:
  output_root: "outputs/grid"     # Result storage directory
  max_trials: 10                  # Maximum number of trials
  max_train_steps: 500            # Training steps per trial

data:                             # Optional (uses base_preset data if absent)
  format: "raw"
  task: "sft"
  path: "data/sft_10k_raw.jsonl"

objective:
  direction: "minimize"           # "minimize" | "maximize"
  metric: "train/total_loss"      # Key from metrics.jsonl
  step_agg: "last"                # "last" | "min" | "mean"

search:
  method: "random"                # "grid" | "random" | "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 3e-4
      log: true
    - name: "injection.lora_r"
      type: "int"
      low: 8
      high: 64
      step: 8
    - name: "injection.lora_dropout"
      type: "categorical"
      choices: [0.0, 0.05, 0.1]
```
Searchable Parameters (Space Reference)
Any setting from the `base_preset` can be specified in `space` using dot-path notation. Below is a categorized list of the parameters that are most effective to search in practice.
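To make dot-path notation concrete, here is a minimal sketch of how a dot-path override can be applied to a nested config dict. This is illustrative only; `apply_override` is not part of eulerforge's API.

```python
def apply_override(cfg: dict, dotted: str, value) -> dict:
    """Set a nested config key addressed by a dot-path like 'training.lr'."""
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})  # create missing levels as needed
    node[leaf] = value
    return cfg

cfg = {"training": {"lr": 1e-5}, "injection": {"lora_r": 8}}
apply_override(cfg, "training.lr", 5e-5)                  # overwrite an existing leaf
apply_override(cfg, "injection.attn_lora.enabled", True)  # creates the attn_lora level
```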
Training Parameters (training.*)
| Parameter | Description | Recommended Search Range | Notes |
|---|---|---|---|
| `training.lr` | Learning rate | 1e-6 ~ 3e-4 (log) | Most important; always search |
| `training.weight_decay` | L2 regularization | 0.0 ~ 0.1 | |
| `training.warmup_steps` | Warmup step count | 50 ~ 500 | |
| `training.batch_size` | Batch size | [2, 4, 8] | Depends on GPU memory |
| `training.grad_accum_steps` | Gradient accumulation | [1, 2, 4, 8] | effective batch = batch x accum |
| `training.max_grad_norm` | Gradient clipping | [0.5, 1.0, 2.0] | |
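Because the optimizer effectively sees `batch_size * grad_accum_steps` sequences per update, several (batch, accum) grid combinations are near-equivalent in effective batch size and differ mainly in memory use. A generic sketch (not eulerforge code) of which combinations collide:

```python
from itertools import product

batch_sizes = [2, 4, 8]
accum_steps = [1, 2, 4, 8]

# Group grid combinations by the effective batch size they produce.
by_effective = {}
for bs, acc in product(batch_sizes, accum_steps):
    by_effective.setdefault(bs * acc, []).append((bs, acc))

print(by_effective[8])  # [(2, 4), (4, 2), (8, 1)]
```

Pruning such duplicates (or searching effective batch directly) keeps a grid search smaller.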
Training Type-Specific Parameters
| Parameter | Training Type | Recommended Search Range | Notes |
|---|---|---|---|
| `training.orpo_lambda` | ORPO | 0.1 ~ 2.0 | SFT vs. preference loss ratio |
| `training.dpo_beta` | DPO | [0.05, 0.1, 0.2, 0.5] | Preference strength |
| `training.ppo.clip_range` | PPO | [0.1, 0.2, 0.3] | PPO clipping ε |
| `training.ppo.kl_coef` | PPO | [0.05, 0.1, 0.2] | KL penalty coefficient |
LoRA Structure Parameters (injection.*)
| Parameter | Description | Recommended Search Range | Notes |
|---|---|---|---|
| `injection.lora_r` | LoRA rank | 8 ~ 64 (step 8) | Larger = more expressive, more memory |
| `injection.lora_alpha` | LoRA scaling | 16 ~ 128 (step 16) | Typically lora_r x 2 |
| `injection.lora_dropout` | LoRA dropout | [0.0, 0.05, 0.1] | |
LoRA Application Scope (injection.*)
| Parameter | Description | Recommended Search Range | Notes |
|---|---|---|---|
| `injection.start_layer` | Starting layer for application | 0 ~ 20 (step 4) | Later = more task-specific |
| `injection.num_layers` | Number of layers to apply | [0, 4, 8, 12, 16] | 0 = all |
| `injection.target_keywords` | FFN LoRA targets | (see table below) | List value |
| `injection.attn_lora.enabled` | Attention LoRA activation | [true, false] | |
| `injection.attn_lora.keywords` | Attention LoRA targets | (see table below) | List value |
target_keywords Combination Examples
| Combination | Description |
|---|---|
| `[gate_proj, up_proj, down_proj]` | Full FFN (default) |
| `[gate_proj, down_proj]` | gate + down only (excluding up) |
| `[up_proj, down_proj]` | up + down only |
attn_lora.keywords Combination Examples
| Combination | Description |
|---|---|
| `[q_proj, v_proj]` | Q+V only (default, most common) |
| `[q_proj, k_proj, v_proj, o_proj]` | Full attention (maximum expressiveness) |
| `[q_proj]` | Q only (minimal configuration) |
List value support: list parameters such as `target_keywords` and `attn_lora.keywords` can also be searched. Use the `categorical` type and specify multiple list combinations as `choices`.
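For example, a `space` entry sketch where each categorical choice is itself a complete keyword list:

```yaml
space:
  - name: "injection.attn_lora.keywords"
    type: "categorical"
    choices:
      - [q_proj, v_proj]
      - [q_proj, k_proj, v_proj, o_proj]
```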
Search Method Selection
method: "random" — Random Search
Supports both continuous ranges and discrete choices. Suitable for quickly exploring the full space.
```yaml
search:
  method: "random"
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true
```
method: "grid" — Grid Search
Note: grid search enumerates every combination, so only `choices` (or the `categorical` type) can be used. Continuous ranges (`low`/`high`) will cause an error.
```yaml
search:
  method: "grid"
  space:
    - name: "injection.lora_r"
      type: "categorical"
      choices: [8, 16, 32]           # ✅ Allowed for grid
    - name: "training.lr"
      type: "float"
      choices: [1e-5, 5e-5, 1e-4]    # ✅ choices format is allowed
    # - name: "training.lr"
    #   type: "float"
    #   low: 1e-6
    #   high: 1e-4                   # ❌ Not allowed for grid: error
```
method: "bayes" — Bayesian Optimization (TPE)
Learns from previous trial results to focus on promising areas. Efficient when the number of trials is limited.
```yaml
search:
  method: "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true
    - name: "training.orpo_lambda"
      type: "float"
      low: 0.1
      high: 2.0
```
Parameter Types
| Type | Description | Required Fields |
|---|---|---|
| `float` | Continuous real number | `low` + `high`, or `choices` |
| `int` | Integer | `low` + `high`, or `choices` |
| `categorical` | Discrete selection | `choices` |
Common optional fields:
- `log: true` — Log scale (`float`/`int`)
- `step: N` — Step interval (`int`)
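To make the three types concrete, here is a standalone sketch (not eulerforge's implementation) of how one random-search draw could interpret a space entry, including the `log` and `step` fields:

```python
import math
import random

def sample_param(spec: dict, rng: random.Random):
    """Draw one value for a space entry; illustrative only."""
    if "choices" in spec:            # categorical, or float/int given as choices
        return rng.choice(spec["choices"])
    low, high = spec["low"], spec["high"]
    if spec["type"] == "int":        # integer range, optionally on a step grid
        step = spec.get("step", 1)
        return low + step * rng.randint(0, (high - low) // step)
    if spec.get("log"):              # log-uniform float
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    return rng.uniform(low, high)    # uniform float

rng = random.Random(42)
lr = sample_param({"type": "float", "low": 1e-6, "high": 3e-4, "log": True}, rng)
r = sample_param({"type": "int", "low": 8, "high": 64, "step": 8}, rng)
```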
Practical Search Strategies
Stage 1: Core HPs First
Start by searching only 3 parameters: lr + lora_r + lora_dropout (same as the basic example).
Stage 2: Add Structural Search
Fix the optimal values from Stage 1 and then search layer ranges:
```yaml
space:
  - name: "training.lr"
    type: "float"
    choices: [5e-5]            # Fix optimal lr from Stage 1
  - name: "injection.start_layer"
    type: "categorical"
    choices: [0, 8, 12, 16]
  - name: "injection.num_layers"
    type: "categorical"
    choices: [0, 4, 8, 12]
```
Stage 3: Target Module Search
Search which modules benefit most from LoRA application:
```yaml
space:
  - name: "injection.target_keywords"
    type: "categorical"
    choices:
      - [gate_proj, up_proj, down_proj]   # Full
      - [gate_proj, down_proj]            # Reduced
  - name: "injection.attn_lora.keywords"
    type: "categorical"
    choices:
      - [q_proj, v_proj]                  # Q+V (default)
      - [q_proj, k_proj, v_proj, o_proj]  # Full
  - name: "injection.attn_lora.enabled"
    type: "categorical"
    choices: [true, false]                # Compare with/without attn LoRA
```
Tip: When the number of search dimensions is large, increase `max_trials` accordingly. For random/bayes, at least 5-10x the number of dimensions is recommended.
Bench Evaluation (bench_eval)
After each trial's training, you can evaluate quality using a bench judge. Loss minimization and judge scores are tracked simultaneously, and the best trial for each criterion is reported separately.
```yaml
run:
  objective:
    direction: "minimize"
    metric: "train/total_loss"
    step_agg: "last"
  bench_eval:
    enabled: true
    bench_preset: "configs/bench/sft_judge.yml"  # Bench YAML path
    metric: "avg_score"                          # Judge score criterion
    checkpoint: "final"                          # Checkpoint to evaluate
```
How It Works
- After each trial's training completes, a bench evaluation is run using the bench config specified in `bench_preset`
- The trial's checkpoint (`final` / `latest` / `best`) is automatically set as the target model
- The bench judge evaluates inference results and assigns scores
- The summary reports the best trial by loss and the best trial by bench score separately
bench_eval Settings
| Field | Description | Default |
|---|---|---|
| `enabled` | Whether to activate bench evaluation | `false` |
| `bench_preset` | Bench YAML path (with judge) | (required) |
| `metric` | Score key to extract from the bench summary | `avg_score` |
| `checkpoint` | Trial checkpoint to evaluate | `final` |

Valid `metric` values: `avg_score`, `target_avg_score`, `baseline_avg_score`
bench_preset Example
The `bench_preset` is identical to a regular bench YAML. The grid engine automatically overrides the `target` section with the trial checkpoint:
```yaml
# configs/bench/sft_judge.yml (for bench_eval)
bench:
  task: sft
  data_path: data/sft_1k_raw.jsonl
  sample:
    k: 10
    seed: 42

models:
  target:
    device: "cuda:0"
    dtype: "bfloat16"
  # baseline:                # Optional: baseline model comparison
  #   enabled: true
  #   model_dir: "Qwen/Qwen3.5-0.8B-Base"
  #   device: "cuda:0"

judge:
  enabled: true
  provider: ollama
  model: "gpt-oss:20b"
  mode: pointwise
```
Output Structure
```text
outputs/grid/
├── trial_0000/
│   ├── metrics.jsonl           # Per-step metrics
│   ├── resolved_config.json    # Applied configuration
│   ├── checkpoint-latest/
│   └── bench_eval/             # Bench eval results (when enabled)
│       ├── results.jsonl
│       └── summary.json
├── trial_0001/
│   └── ...
├── summary.json                # Overall result summary
└── summary.csv                 # CSV version
```
`summary.json` example (with bench_eval enabled):

```json
{
  "best_trial": {
    "number": 3,
    "value": 1.2345,
    "params": {"training.lr": 5e-05, "injection.lora_r": 16}
  },
  "best_by_bench": {
    "number": 1,
    "bench_score": 7.5,
    "value": 1.8901,
    "params": {"training.lr": 1e-04, "injection.lora_r": 32}
  },
  "all_trials": [
    {"number": 0, "value": 2.456, "bench_score": 5.2, "params": {...}, "state": "COMPLETE"},
    {"number": 1, "value": 1.890, "bench_score": 7.5, "params": {...}, "state": "COMPLETE"},
    ...
  ],
  "bench_eval": {
    "metric": "avg_score",
    "checkpoint": "final",
    "bench_preset": "configs/bench/sft_judge.yml"
  }
}
```
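The two "best" entries in the summary are just different orderings of `all_trials`. A minimal sketch with a hypothetical trial list (the `bench_score` for trial 3 is a made-up placeholder; the rest mirrors the example numbers above):

```python
# Hypothetical all_trials records; trial 3's bench_score is invented for illustration.
trials = [
    {"number": 0, "value": 2.456, "bench_score": 5.2},
    {"number": 1, "value": 1.890, "bench_score": 7.5},
    {"number": 3, "value": 1.2345, "bench_score": 6.1},
]

best_by_loss = min(trials, key=lambda t: t["value"])         # objective: minimize loss
best_by_bench = max(trials, key=lambda t: t["bench_score"])  # judge score: higher is better
```

Tracking both lets you notice when the lowest-loss trial is not the one the judge prefers.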
metrics.jsonl
Each trial's `metrics.jsonl` records metrics for every training step:

```json
{"step": 10, "train/total_loss": 2.345, "train/main_loss": 2.301, "train/learning_rate": 1e-05}
{"step": 20, "train/total_loss": 2.201, ...}
```
Specifying one of these keys in `objective.metric` uses it as the trial's objective function value.
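A sketch of how an objective value could be extracted from such a file under the three `step_agg` modes. This is illustrative only, not eulerforge's internal code:

```python
import json

def objective_from_metrics(lines, metric, step_agg="last"):
    """Collect `metric` from metrics.jsonl lines and aggregate per step_agg."""
    values = [rec[metric] for rec in map(json.loads, lines) if metric in rec]
    if step_agg == "last":
        return values[-1]
    if step_agg == "min":
        return min(values)
    return sum(values) / len(values)  # "mean"

lines = [
    '{"step": 10, "train/total_loss": 2.345}',
    '{"step": 20, "train/total_loss": 2.201}',
]
```

With a noisy loss curve, `min` rewards a lucky dip while `last` reflects where training settled; choose accordingly.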
Provided Examples
| File | Method | Training Type | Default Space |
|---|---|---|---|
| `configs/grid/sft_random_search.yml` | random | SFT | lr, lora_r, dropout + commented-out extended space |
| `configs/grid/dpo_grid_search.yml` | grid | DPO | lora_r, dropout + commented-out beta, layers, keywords |
| `configs/grid/orpo_bayes_search.yml` | bayes | ORPO | lr, orpo_lambda, lora_r + commented-out extended space |
Usage tip: Uncomment the commented-out `space` items in each example to expand the search range.
References
- Spec rules in detail: docs/fixtures/specs/grid_search_spec.md
- CLI reference: docs/cli.md
- Validation rules: docs/fixtures/validation_rules.md