12. Hyperparameter Search (Grid / Random / Bayes)

Use the eulerforge grid command to systematically search hyperparameters such as learning rate, LoRA rank, target layers, and attention targets.


Preparation

Install the hpo extra, which pulls in Optuna:

pip install eulerforge[hpo]

Quick Start

# 1. Dry-run with example spec (validation only)
eulerforge grid configs/grid/sft_random_search.yml --dry-run

# 2. Run
eulerforge grid configs/grid/sft_random_search.yml

After completion, results are saved to outputs/grid/sft_random/summary.json.


YAML Spec Structure

version: 1
base_preset: "configs/presets/qwen3.5_0.8b_dense_lora_sft.yml"  # Base configuration

run:
  output_root: "outputs/grid"       # Result storage directory
  max_trials: 10                    # Maximum number of trials
  max_train_steps: 500              # Training steps per trial

  data:                             # Optional (uses base_preset data if absent)
    format: "raw"
    task: "sft"
    path: "data/sft_10k_raw.jsonl"

  objective:
    direction: "minimize"           # "minimize" | "maximize"
    metric: "train/total_loss"      # Key from metrics.jsonl
    step_agg: "last"                # "last" | "min" | "mean"

search:
  method: "random"                  # "grid" | "random" | "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 3e-4
      log: true

    - name: "injection.lora_r"
      type: "int"
      low: 8
      high: 64
      step: 8

    - name: "injection.lora_dropout"
      type: "categorical"
      choices: [0.0, 0.05, 0.1]

Searchable Parameters (Space Reference)

Any setting from the base_preset can be specified using dot-path notation in space. Below is a categorized list of parameters that are most effective to search in practice.
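The dot-path resolution above can be pictured as a nested-dict update: each segment of the path selects one level of the config tree. A minimal sketch (apply_dot_path is a hypothetical helper for illustration, not part of the EulerForge API):

```python
from copy import deepcopy

def apply_dot_path(config: dict, path: str, value) -> dict:
    """Set a nested key given a dot-path such as 'training.lr'.

    Illustrative only; the actual resolution logic inside the
    grid engine may differ.
    """
    result = deepcopy(config)  # leave the base preset untouched
    node = result
    keys = path.split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result

base = {"training": {"lr": 1e-5}, "injection": {"lora_r": 8}}
trial = apply_dot_path(base, "training.lr", 5e-5)
trial = apply_dot_path(trial, "injection.lora_r", 16)
```

Each trial thus gets its own resolved copy of the base preset with only the sampled keys overridden.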

Training Parameters (training.*)

Parameter | Description | Recommended Search Range | Notes
training.lr | Learning rate | 1e-6 ~ 3e-4 (log) | Most important; always search
training.weight_decay | L2 regularization | 0.0 ~ 0.1 |
training.warmup_steps | Warmup step count | 50 ~ 500 |
training.batch_size | Batch size | [2, 4, 8] | Depends on GPU memory
training.grad_accum_steps | Gradient accumulation | [1, 2, 4, 8] | effective batch = batch_size x accum
training.max_grad_norm | Gradient clipping | [0.5, 1.0, 2.0] |

Training Type-Specific Parameters

Parameter | Training Type | Recommended Search Range | Notes
training.orpo_lambda | ORPO | 0.1 ~ 2.0 | SFT vs. preference loss ratio
training.dpo_beta | DPO | [0.05, 0.1, 0.2, 0.5] | Preference strength
training.ppo.clip_range | PPO | [0.1, 0.2, 0.3] | PPO clipping ε
training.ppo.kl_coef | PPO | [0.05, 0.1, 0.2] | KL penalty coefficient

LoRA Structure Parameters (injection.*)

Parameter | Description | Recommended Search Range | Notes
injection.lora_r | LoRA rank | 8 ~ 64 (step 8) | Larger = more expressive, more memory
injection.lora_alpha | LoRA scaling | 16 ~ 128 (step 16) | Typically lora_r x 2
injection.lora_dropout | LoRA dropout | [0.0, 0.05, 0.1] |

LoRA Application Scope (injection.*)

Parameter | Description | Recommended Search Range | Notes
injection.start_layer | Starting layer for application | 0 ~ 20 (step 4) | Later = more task-specific
injection.num_layers | Number of layers to apply | [0, 4, 8, 12, 16] | 0 = all layers
injection.target_keywords | FFN LoRA targets | (see table below) | List value
injection.attn_lora.enabled | Attention LoRA on/off | [true, false] |
injection.attn_lora.keywords | Attention LoRA targets | (see table below) | List value

target_keywords Combination Examples

Combination | Description
[gate_proj, up_proj, down_proj] | Full FFN (default)
[gate_proj, down_proj] | gate + down only (excludes up)
[up_proj, down_proj] | up + down only

attn_lora.keywords Combination Examples

Combination | Description
[q_proj, v_proj] | Q + V only (default, most common)
[q_proj, k_proj, v_proj, o_proj] | Full attention (maximum expressiveness)
[q_proj] | Q only (minimal configuration)

List Value Support: list-valued parameters such as target_keywords and attn_lora.keywords can also be searched. Use the categorical type and provide each candidate list combination as a choice.


Search Method Selection

method: "random" — Random Sampling

Supports both continuous ranges and discrete choices. Suitable for quickly exploring the full space.

search:
  method: "random"
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true

method: "grid" — Grid Search

Note: grid search accepts only discrete values; every space entry must use choices or the categorical type. Continuous ranges (low/high) cause an error.

search:
  method: "grid"
  space:
    - name: "injection.lora_r"
      type: "categorical"
      choices: [8, 16, 32]          # ✅ Allowed for grid
    - name: "training.lr"
      type: "float"
      choices: [1e-5, 5e-5, 1e-4]  # ✅ choices format is allowed
    # - name: "training.lr"
    #   type: "float"
    #   low: 1e-6
    #   high: 1e-4                   # ❌ Not allowed for grid — error
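Because the grid method enumerates every combination of choices, the trial count is the product of the choice-list lengths; the spec above yields 3 × 3 = 9 trials. A sketch of that enumeration (space and combos are illustrative names, not engine internals):

```python
import itertools

# Choice lists from the grid example above.
space = {
    "injection.lora_r": [8, 16, 32],
    "training.lr": [1e-5, 5e-5, 1e-4],
}

# Grid search walks the full Cartesian product of all choices.
names = list(space)
combos = [dict(zip(names, values))
          for values in itertools.product(*space.values())]

print(len(combos))  # 9
```

If the product exceeds max_trials, only the first max_trials combinations can run, so keep choice lists small when using grid.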

method: "bayes" — Bayesian Optimization (TPE)

Learns from previous trial results to focus on promising areas. Efficient when the number of trials is limited.

search:
  method: "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true
    - name: "training.orpo_lambda"
      type: "float"
      low: 0.1
      high: 2.0

Parameter Types

Type | Description | Required Fields
float | Continuous real number | low + high, or choices
int | Integer | low + high, or choices
categorical | Discrete selection | choices

Common optional fields:
- log: true — log scale (float/int)
- step: N — step interval (int)
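The type semantics can be illustrated with a toy sampler. This is only a sketch of the sampling rules (uniform vs. log-uniform ranges, step rounding, discrete choices); the real engine delegates to Optuna, and sample is a hypothetical function:

```python
import math
import random

def sample(spec: dict, rng: random.Random):
    """Draw one value for a space entry. Illustrative only,
    not the actual Optuna sampling path."""
    if spec["type"] == "categorical" or "choices" in spec:
        return rng.choice(spec["choices"])
    if spec.get("log"):
        # log: true -> uniform in log space over [low, high]
        value = math.exp(rng.uniform(math.log(spec["low"]),
                                     math.log(spec["high"])))
    else:
        value = rng.uniform(spec["low"], spec["high"])
    if spec["type"] == "int":
        # step: N -> snap to the nearest multiple of N
        step = spec.get("step", 1)
        value = int(round(value / step)) * step
    return value

rng = random.Random(42)  # mirrors sampler.seed
lr = sample({"type": "float", "low": 1e-6, "high": 3e-4, "log": True}, rng)
r = sample({"type": "int", "low": 8, "high": 64, "step": 8}, rng)
```

Log scale matters for parameters like learning rate: without it, most draws land near the high end of a range spanning several orders of magnitude.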


Practical Search Strategies

Stage 1: Core HPs First

Start by searching only 3 parameters: lr + lora_r + lora_dropout (same as the basic example).

Stage 2: Layer Ranges

Fix the optimal values from Stage 1, then search the layer ranges:

space:
  - name: "training.lr"
    type: "float"
    choices: [5e-5]                    # Fix optimal lr from Stage 1
  - name: "injection.start_layer"
    type: "categorical"
    choices: [0, 8, 12, 16]
  - name: "injection.num_layers"
    type: "categorical"
    choices: [0, 4, 8, 12]

Stage 3: Module Targets

Search which modules benefit most from LoRA application:

space:
  - name: "injection.target_keywords"
    type: "categorical"
    choices:
      - [gate_proj, up_proj, down_proj]   # Full
      - [gate_proj, down_proj]            # Reduced
  - name: "injection.attn_lora.keywords"
    type: "categorical"
    choices:
      - [q_proj, v_proj]                  # Q+V (default)
      - [q_proj, k_proj, v_proj, o_proj]  # Full
  - name: "injection.attn_lora.enabled"
    type: "categorical"
    choices: [true, false]                # Compare with/without attn LoRA

Tip: When the number of search dimensions is large, increase max_trials accordingly. For random/bayes, at least 5-10x the number of dimensions is recommended.


Bench Evaluation (bench_eval)

After each trial's training, you can evaluate quality using a bench judge. Loss minimization and judge scores are tracked simultaneously, and the best trial for each criterion is reported separately.

run:
  objective:
    direction: "minimize"
    metric: "train/total_loss"
    step_agg: "last"

  bench_eval:
    enabled: true
    bench_preset: "configs/bench/sft_judge.yml"  # Bench YAML path
    metric: "avg_score"                          # Judge score criterion
    checkpoint: "final"                          # Checkpoint to evaluate

How It Works

  1. After each trial's training completes, a bench evaluation is run using the bench config specified in bench_preset
  2. The trial's checkpoint (final/latest/best) is automatically set as the target model
  3. The bench judge evaluates inference results and assigns scores
  4. The summary reports the best by loss and best by bench score separately
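Step 2 above amounts to overriding the bench config's target with the trial's checkpoint path. A hedged sketch, assuming a model_dir field on the target and a checkpoint-final directory name (both hypothetical here; the engine's actual override logic is internal):

```python
import copy

def bind_trial_checkpoint(bench_cfg: dict, trial_dir: str,
                          which: str = "final") -> dict:
    """Point the bench target at a trial checkpoint.

    Sketch only: the field name (model_dir) and the
    checkpoint-<which> directory layout are assumptions.
    """
    cfg = copy.deepcopy(bench_cfg)  # keep the original preset intact
    target = cfg["bench"]["models"]["target"]
    target["model_dir"] = f"{trial_dir}/checkpoint-{which}"
    return cfg

bench = {"bench": {"models": {"target": {"device": "cuda:0"}}}}
bound = bind_trial_checkpoint(bench, "outputs/grid/trial_0000")
```

This is why the bench_preset omits target.model_dir: the grid engine fills it in per trial.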

bench_eval Settings

Field | Description | Default
enabled | Whether bench evaluation runs | false
bench_preset | Bench YAML path (with judge) | (required)
metric | Score key to extract from the bench summary | avg_score
checkpoint | Trial checkpoint to evaluate | final

Valid metric values: avg_score, target_avg_score, baseline_avg_score

bench_preset Example

The bench_preset is identical to a regular bench YAML. The grid engine automatically overrides the target section with the trial checkpoint:

# configs/bench/sft_judge.yml (for bench_eval)
bench:
  task: sft
  data_path: data/sft_1k_raw.jsonl
  sample:
    k: 10
    seed: 42
  models:
    target:
      device: "cuda:0"
      dtype: "bfloat16"
    # baseline:                          # Optional: baseline model comparison
    #   enabled: true
    #   model_dir: "Qwen/Qwen3.5-0.8B-Base"
    #   device: "cuda:0"
    judge:
      enabled: true
      provider: ollama
      model: "gpt-oss:20b"
      mode: pointwise

Output Structure

outputs/grid/
├── trial_0000/
│   ├── metrics.jsonl          # Per-step metrics
│   ├── resolved_config.json   # Applied configuration
│   ├── checkpoint-latest/
│   └── bench_eval/            # Bench eval results (when enabled)
│       ├── results.jsonl
│       └── summary.json
├── trial_0001/
│   └── ...
├── summary.json               # Overall result summary
└── summary.csv                # CSV version

summary.json example (with bench_eval enabled):

{
  "best_trial": {
    "number": 3,
    "value": 1.2345,
    "params": {"training.lr": 5e-05, "injection.lora_r": 16}
  },
  "best_by_bench": {
    "number": 1,
    "bench_score": 7.5,
    "value": 1.8901,
    "params": {"training.lr": 1e-04, "injection.lora_r": 32}
  },
  "all_trials": [
    {"number": 0, "value": 2.456, "bench_score": 5.2, "params": {...}, "state": "COMPLETE"},
    {"number": 1, "value": 1.890, "bench_score": 7.5, "params": {...}, "state": "COMPLETE"},
    ...
  ],
  "bench_eval": {
    "metric": "avg_score",
    "checkpoint": "final",
    "bench_preset": "configs/bench/sft_judge.yml"
  }
}
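Since summary.json is plain JSON, the two "best" criteria can be recomputed from all_trials with standard tools. A sketch with illustrative numbers matching the example above (in practice you would json.load the file instead of an inline string):

```python
import json

# Shape mirrors the summary.json example; values are illustrative.
summary = json.loads("""
{
  "all_trials": [
    {"number": 0, "value": 2.456, "bench_score": 5.2, "state": "COMPLETE"},
    {"number": 1, "value": 1.890, "bench_score": 7.5, "state": "COMPLETE"},
    {"number": 2, "value": null, "bench_score": null, "state": "FAIL"},
    {"number": 3, "value": 1.2345, "bench_score": 6.0, "state": "COMPLETE"}
  ]
}
""")

# Drop failed trials before ranking.
completed = [t for t in summary["all_trials"] if t["state"] == "COMPLETE"]
best_loss = min(completed, key=lambda t: t["value"])         # objective: minimize loss
best_bench = max(completed, key=lambda t: t["bench_score"])  # judge score: maximize
```

Note that the two criteria can disagree, as in the example: the lowest-loss trial is not necessarily the one the judge scores highest.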

metrics.jsonl

Each trial's metrics.jsonl records metrics for every training step:

{"step": 10, "train/total_loss": 2.345, "train/main_loss": 2.301, "train/learning_rate": 1e-05}
{"step": 20, "train/total_loss": 2.201, ...}

Specifying this key in objective.metric uses it as the trial's objective function value.
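The combination of metric and step_agg reduces that per-step series to a single number. A small sketch of the reduction (objective_value is a hypothetical helper, not the engine's code):

```python
def objective_value(records: list[dict], metric: str,
                    step_agg: str = "last") -> float:
    """Reduce a metrics.jsonl series to one objective value.

    Sketch of objective.metric + objective.step_agg; the engine's
    exact implementation may differ.
    """
    values = [r[metric] for r in records if metric in r]
    if step_agg == "last":
        return values[-1]
    if step_agg == "min":
        return min(values)
    return sum(values) / len(values)  # step_agg == "mean"

# Two records in the metrics.jsonl format shown above.
records = [
    {"step": 10, "train/total_loss": 2.345},
    {"step": 20, "train/total_loss": 2.201},
]
print(objective_value(records, "train/total_loss", "last"))  # 2.201
```

For noisy losses, "min" or "mean" can be less sensitive to a single lucky or unlucky final step than "last".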


Provided Examples

File | Method | Training Type | Default Space
configs/grid/sft_random_search.yml | random | SFT | lr, lora_r, dropout + commented-out extended space
configs/grid/dpo_grid_search.yml | grid | DPO | lora_r, dropout + commented-out beta, layers, keywords
configs/grid/orpo_bayes_search.yml | bayes | ORPO | lr, orpo_lambda, lora_r + commented-out extended space

Usage tip: Uncomment the commented-out space items in each example to expand the search range.


References