12. Hyperparameter Search (Grid / Random / Bayes)
Use the `eulerforge grid` command to systematically search hyperparameters such as learning rate, LoRA rank, target layers, and attention targets.
Preparation
Install Optuna:
```bash
pip install eulerforge[hpo]
```
Quick Start
```bash
# 1. Dry-run with example spec (validation only)
eulerforge grid configs/grid/sft_random_search.yml --dry-run

# 2. Run
eulerforge grid configs/grid/sft_random_search.yml
```
After completion, results are saved to `outputs/grid/sft_random/summary.json`.
YAML Spec Structure
```yaml
version: 1
base_preset: "configs/presets/qwen3.5_0.8b_dense_lora_sft.yml"  # Base configuration

run:
  output_root: "outputs/grid"     # Result storage directory
  max_trials: 10                  # Maximum number of trials
  max_train_steps: 500            # Training steps per trial

data:                             # Optional (uses base_preset data if absent)
  format: "raw"
  task: "sft"
  path: "data/sft_10k_raw.jsonl"

objective:
  direction: "minimize"           # "minimize" | "maximize"
  metric: "train/total_loss"      # Key from metrics.jsonl
  step_agg: "last"                # "last" | "min" | "mean"

search:
  method: "random"                # "grid" | "random" | "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 3e-4
      log: true
    - name: "injection.lora_r"
      type: "int"
      low: 8
      high: 64
      step: 8
    - name: "injection.lora_dropout"
      type: "categorical"
      choices: [0.0, 0.05, 0.1]
```
Searchable Parameters (Space Reference)
Any setting from the `base_preset` can be specified in `space` using dot-path notation. Below is a categorized list of the parameters that are most effective to search in practice.
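To make dot-path notation concrete, here is a minimal sketch of how a dot-path override can be applied to a nested config dict. This is illustrative only; `apply_override` is not part of eulerforge's API.

```python
def apply_override(cfg: dict, dotted: str, value) -> dict:
    """Set a nested config key addressed by a dot-path like 'training.lr'."""
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})  # create missing levels as needed
    node[leaf] = value
    return cfg

cfg = {"training": {"lr": 1e-5}, "injection": {"lora_r": 8}}
apply_override(cfg, "training.lr", 5e-5)                  # overwrite an existing leaf
apply_override(cfg, "injection.attn_lora.enabled", True)  # creates the attn_lora level
```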
Training Parameters (training.*)
| Parameter | Description | Recommended Search Range | Notes |
|---|---|---|---|
| `training.lr` | Learning rate | 1e-6 ~ 3e-4 (log) | Most important; always search |
| `training.weight_decay` | L2 regularization | 0.0 ~ 0.1 | |
| `training.warmup_steps` | Warmup step count | 50 ~ 500 | |
| `training.batch_size` | Batch size | [2, 4, 8] | Depends on GPU memory |
| `training.grad_accum_steps` | Gradient accumulation | [1, 2, 4, 8] | effective batch = batch x accum |
| `training.max_grad_norm` | Gradient clipping | [0.5, 1.0, 2.0] | |
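Because the optimizer effectively sees `batch_size * grad_accum_steps` sequences per update, several (batch, accum) grid combinations are near-equivalent in effective batch size and differ mainly in memory use. A generic sketch (not eulerforge code) of which combinations collide:

```python
from itertools import product

batch_sizes = [2, 4, 8]
accum_steps = [1, 2, 4, 8]

# Group grid combinations by the effective batch size they produce.
by_effective = {}
for bs, acc in product(batch_sizes, accum_steps):
    by_effective.setdefault(bs * acc, []).append((bs, acc))

print(by_effective[8])  # [(2, 4), (4, 2), (8, 1)]
```

Pruning such duplicates (or searching effective batch directly) keeps a grid search smaller.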
Training Type-Specific Parameters
| Parameter | Training Type | Recommended Search Range | Notes |
|---|---|---|---|
| `training.orpo_lambda` | ORPO | 0.1 ~ 2.0 | SFT vs. preference loss ratio |
| `training.dpo_beta` | DPO | [0.05, 0.1, 0.2, 0.5] | Preference strength |
| `training.ppo.clip_range` | PPO | [0.1, 0.2, 0.3] | PPO clipping ε |
| `training.ppo.kl_coef` | PPO | [0.05, 0.1, 0.2] | KL penalty coefficient |
LoRA Structure Parameters (injection.*)
| Parameter | Description | Recommended Search Range | Notes |
|---|---|---|---|
| `injection.lora_r` | LoRA rank | 8 ~ 64 (step 8) | Larger = more expressive, more memory |
| `injection.lora_alpha` | LoRA scaling | 16 ~ 128 (step 16) | Typically lora_r x 2 |
| `injection.lora_dropout` | LoRA dropout | [0.0, 0.05, 0.1] | |
LoRA Application Scope (injection.*)
| Parameter | Description | Recommended Search Range | Notes |
|---|---|---|---|
| `injection.start_layer` | Starting layer for application | 0 ~ 20 (step 4) | Later = more task-specific |
| `injection.num_layers` | Number of layers to apply | [0, 4, 8, 12, 16] | 0 = all |
| `injection.target_keywords` | FFN LoRA targets | (see table below) | List value |
| `injection.attn_lora.enabled` | Attention LoRA activation | [true, false] | |
| `injection.attn_lora.keywords` | Attention LoRA targets | (see table below) | List value |
target_keywords Combination Examples
| Combination | Description |
|---|---|
| `[gate_proj, up_proj, down_proj]` | Full FFN (default) |
| `[gate_proj, down_proj]` | gate + down only (excluding up) |
| `[up_proj, down_proj]` | up + down only |
attn_lora.keywords Combination Examples
| Combination | Description |
|---|---|
| `[q_proj, v_proj]` | Q+V only (default, most common) |
| `[q_proj, k_proj, v_proj, o_proj]` | Full attention (maximum expressiveness) |
| `[q_proj]` | Q only (minimal configuration) |
List value support: list parameters such as `target_keywords` and `attn_lora.keywords` can also be searched. Use the `categorical` type and specify multiple list combinations as `choices`.
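For example, a `space` entry sketch where each categorical choice is itself a complete keyword list:

```yaml
space:
  - name: "injection.attn_lora.keywords"
    type: "categorical"
    choices:
      - [q_proj, v_proj]
      - [q_proj, k_proj, v_proj, o_proj]
```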
Search Method Selection
method: "random" — Random Search
Supports both continuous ranges and discrete choices. Suitable for quickly exploring the full space.
```yaml
search:
  method: "random"
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true
```
method: "grid" — Grid Search
Note: grid search enumerates every combination, so only `choices` (or the `categorical` type) can be used. Continuous ranges (`low`/`high`) will cause an error.
```yaml
search:
  method: "grid"
  space:
    - name: "injection.lora_r"
      type: "categorical"
      choices: [8, 16, 32]           # ✅ Allowed for grid
    - name: "training.lr"
      type: "float"
      choices: [1e-5, 5e-5, 1e-4]    # ✅ choices format is allowed
    # - name: "training.lr"
    #   type: "float"
    #   low: 1e-6
    #   high: 1e-4                   # ❌ Not allowed for grid: error
```
method: "bayes" — Bayesian Optimization (TPE)
Learns from previous trial results to focus on promising areas. Efficient when the number of trials is limited.
```yaml
search:
  method: "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true
    - name: "training.orpo_lambda"
      type: "float"
      low: 0.1
      high: 2.0
```
Parameter Types
| Type | Description | Required Fields |
|---|---|---|
| `float` | Continuous real number | `low` + `high`, or `choices` |
| `int` | Integer | `low` + `high`, or `choices` |
| `categorical` | Discrete selection | `choices` |
Common optional fields:
- `log: true` — Log scale (`float`/`int`)
- `step: N` — Step interval (`int`)
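To make the three types concrete, here is a standalone sketch (not eulerforge's implementation) of how one random-search draw could interpret a space entry, including the `log` and `step` fields:

```python
import math
import random

def sample_param(spec: dict, rng: random.Random):
    """Draw one value for a space entry; illustrative only."""
    if "choices" in spec:            # categorical, or float/int given as choices
        return rng.choice(spec["choices"])
    low, high = spec["low"], spec["high"]
    if spec["type"] == "int":        # integer range, optionally on a step grid
        step = spec.get("step", 1)
        return low + step * rng.randint(0, (high - low) // step)
    if spec.get("log"):              # log-uniform float
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    return rng.uniform(low, high)    # uniform float

rng = random.Random(42)
lr = sample_param({"type": "float", "low": 1e-6, "high": 3e-4, "log": True}, rng)
r = sample_param({"type": "int", "low": 8, "high": 64, "step": 8}, rng)
```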
Practical Search Strategies
Stage 1: Core HPs First
Start by searching only 3 parameters: lr + lora_r + lora_dropout (same as the basic example).
Stage 2: Add Structural Search
Fix the optimal values from Stage 1 and then search layer ranges:
```yaml
space:
  - name: "training.lr"
    type: "float"
    choices: [5e-5]            # Fix optimal lr from Stage 1
  - name: "injection.start_layer"
    type: "categorical"
    choices: [0, 8, 12, 16]
  - name: "injection.num_layers"
    type: "categorical"
    choices: [0, 4, 8, 12]
```
Stage 3: Target Module Search
Search which modules benefit most from LoRA application:
```yaml
space:
  - name: "injection.target_keywords"
    type: "categorical"
    choices:
      - [gate_proj, up_proj, down_proj]   # Full
      - [gate_proj, down_proj]            # Reduced
  - name: "injection.attn_lora.keywords"
    type: "categorical"
    choices:
      - [q_proj, v_proj]                  # Q+V (default)
      - [q_proj, k_proj, v_proj, o_proj]  # Full
  - name: "injection.attn_lora.enabled"
    type: "categorical"
    choices: [true, false]                # Compare with/without attn LoRA
```
Tip: When the number of search dimensions is large, increase `max_trials` accordingly. For random/bayes, at least 5-10x the number of dimensions is recommended.
Bench Evaluation (bench_eval)
After each trial's training, you can evaluate quality using a bench judge. Loss minimization and judge scores are tracked simultaneously, and the best trial for each criterion is reported separately.
```yaml
run:
  objective:
    direction: "minimize"
    metric: "train/total_loss"
    step_agg: "last"
  bench_eval:
    enabled: true
    bench_preset: "configs/bench/sft_judge.yml"  # Bench YAML path
    metric: "avg_score"                          # Judge score criterion
    checkpoint: "final"                          # Checkpoint to evaluate
```
How It Works
- After each trial's training completes, a bench evaluation is run using the bench config specified in `bench_preset`
- The trial's checkpoint (`final` / `latest` / `best`) is automatically set as the target model
- The bench judge evaluates inference results and assigns scores
- The summary reports the best trial by loss and the best trial by bench score separately
bench_eval Settings
| Field | Description | Default |
|---|---|---|
| `enabled` | Whether to activate bench evaluation | `false` |
| `bench_preset` | Bench YAML path (with judge) | (required) |
| `metric` | Score key to extract from the bench summary | `avg_score` |
| `checkpoint` | Trial checkpoint to evaluate | `final` |

Valid `metric` values: `avg_score`, `target_avg_score`, `baseline_avg_score`
bench_preset Example
The `bench_preset` is identical to a regular bench YAML. The grid engine automatically overrides the `target` section with the trial checkpoint:
```yaml
# configs/bench/sft_judge.yml (for bench_eval)
bench:
  task: sft
  data_path: data/sft_1k_raw.jsonl
  sample:
    k: 10
    seed: 42

models:
  target:
    device: "cuda:0"
    dtype: "bfloat16"
  # baseline:                # Optional: baseline model comparison
  #   enabled: true
  #   model_dir: "Qwen/Qwen3.5-0.8B-Base"
  #   device: "cuda:0"

judge:
  enabled: true
  provider: ollama
  model: "gpt-oss:20b"
  mode: pointwise
```
Output Structure
```text
outputs/grid/
├── trial_0000/
│   ├── metrics.jsonl           # Per-step metrics
│   ├── resolved_config.json    # Applied configuration
│   ├── checkpoint-latest/
│   └── bench_eval/             # Bench eval results (when enabled)
│       ├── results.jsonl
│       └── summary.json
├── trial_0001/
│   └── ...
├── summary.json                # Overall result summary
└── summary.csv                 # CSV version
```
`summary.json` example (with bench_eval enabled):

```json
{
  "best_trial": {
    "number": 3,
    "value": 1.2345,
    "params": {"training.lr": 5e-05, "injection.lora_r": 16}
  },
  "best_by_bench": {
    "number": 1,
    "bench_score": 7.5,
    "value": 1.8901,
    "params": {"training.lr": 1e-04, "injection.lora_r": 32}
  },
  "all_trials": [
    {"number": 0, "value": 2.456, "bench_score": 5.2, "params": {...}, "state": "COMPLETE"},
    {"number": 1, "value": 1.890, "bench_score": 7.5, "params": {...}, "state": "COMPLETE"},
    ...
  ],
  "bench_eval": {
    "metric": "avg_score",
    "checkpoint": "final",
    "bench_preset": "configs/bench/sft_judge.yml"
  }
}
```
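The two "best" entries in the summary are just different orderings of `all_trials`. A minimal sketch with a hypothetical trial list (the `bench_score` for trial 3 is a made-up placeholder; the rest mirrors the example numbers above):

```python
# Hypothetical all_trials records; trial 3's bench_score is invented for illustration.
trials = [
    {"number": 0, "value": 2.456, "bench_score": 5.2},
    {"number": 1, "value": 1.890, "bench_score": 7.5},
    {"number": 3, "value": 1.2345, "bench_score": 6.1},
]

best_by_loss = min(trials, key=lambda t: t["value"])         # objective: minimize loss
best_by_bench = max(trials, key=lambda t: t["bench_score"])  # judge score: higher is better
```

Tracking both lets you notice when the lowest-loss trial is not the one the judge prefers.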
metrics.jsonl
Each trial's `metrics.jsonl` records metrics for every training step:

```json
{"step": 10, "train/total_loss": 2.345, "train/main_loss": 2.301, "train/learning_rate": 1e-05}
{"step": 20, "train/total_loss": 2.201, ...}
```
Specifying one of these keys in `objective.metric` uses it as the trial's objective function value.
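A sketch of how an objective value could be extracted from such a file under the three `step_agg` modes. This is illustrative only, not eulerforge's internal code:

```python
import json

def objective_from_metrics(lines, metric, step_agg="last"):
    """Collect `metric` from metrics.jsonl lines and aggregate per step_agg."""
    values = [rec[metric] for rec in map(json.loads, lines) if metric in rec]
    if step_agg == "last":
        return values[-1]
    if step_agg == "min":
        return min(values)
    return sum(values) / len(values)  # "mean"

lines = [
    '{"step": 10, "train/total_loss": 2.345}',
    '{"step": 20, "train/total_loss": 2.201}',
]
```

With a noisy loss curve, `min` rewards a lucky dip while `last` reflects where training settled; choose accordingly.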
Provided Examples
| File | Method | Training Type | Default Space |
|---|---|---|---|
| `configs/grid/sft_random_search.yml` | random | SFT | lr, lora_r, dropout + commented-out extended space |
| `configs/grid/dpo_grid_search.yml` | grid | DPO | lora_r, dropout + commented-out beta, layers, keywords |
| `configs/grid/orpo_bayes_search.yml` | bayes | ORPO | lr, orpo_lambda, lora_r + commented-out extended space |
Usage tip: Uncomment the commented-out `space` items in each example to expand the search range.
References
- Spec rules in detail: docs/fixtures/specs/grid_search_spec.md
- CLI reference: docs/cli.md
- Validation rules: docs/fixtures/validation_rules.md