12. Hyperparameter Search (Grid / Random / Bayes)

The eulerforge grid command systematically searches hyperparameters such as the learning rate, LoRA rank, target layers, and attention targets.

Prerequisites

Install Optuna:

pip install "eulerforge[hpo]"

Quick Start

# 1. Dry-run with the example spec (validation only)
eulerforge grid configs/grid/sft_random_search.yml --dry-run

# 2. Actual run
eulerforge grid configs/grid/sft_random_search.yml

When the run finishes, the results are saved to outputs/grid/sft_random/summary.json.

YAML Spec Structure

version: 1
base_preset: "configs/presets/qwen3.5_0.8b_dense_lora_sft.yml"  # base config

run:
  output_root: "outputs/grid"       # directory where results are saved
  max_trials: 10                    # maximum number of trials
  max_train_steps: 500              # training steps per trial

  data:                             # optional (falls back to the base_preset data)
    format: "raw"
    task: "sft"
    path: "data/sft_10k_raw.jsonl"

  objective:
    direction: "minimize"           # "minimize" | "maximize"
    metric: "train/total_loss"      # key in metrics.jsonl
    step_agg: "last"                # "last" | "min" | "mean"

search:
  method: "random"                  # "grid" | "random" | "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 3e-4
      log: true

    - name: "injection.lora_r"
      type: "int"
      low: 8
      high: 64
      step: 8

    - name: "injection.lora_dropout"
      type: "categorical"
      choices: [0.0, 0.05, 0.1]

Searchable Parameters (Space Reference)

Any setting in base_preset can be listed in space by its dot-path. The tables below group the parameters that tend to matter most in practice by category.
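
As a rough illustration of what a dot-path refers to, the sketch below applies a set of sampled values to a nested config dict. The helper name apply_overrides and the sample dict are hypothetical and only mirror the key names used in this guide; how eulerforge applies overrides internally is not documented here and may differ.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with each dot-path key set to its value.

    Hypothetical helper for illustration only; not part of eulerforge.
    """
    result = copy.deepcopy(config)
    for path, value in overrides.items():
        node = result
        *parents, leaf = path.split(".")
        for key in parents:
            # Create intermediate sections the preset does not define yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


# Stand-in for a loaded base_preset (assumed layout).
base = {
    "training": {"lr": 1e-5},
    "injection": {"lora_r": 16, "attn_lora": {"enabled": False}},
}
trial_config = apply_overrides(
    base,
    {"training.lr": 5e-5, "injection.attn_lora.enabled": True},
)
print(trial_config["training"]["lr"])
print(trial_config["injection"]["attn_lora"]["enabled"])
```

Working on a deep copy keeps the base preset unchanged, so every trial starts from the same defaults.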

Training Parameters (training.*)

Parameter	Description	Suggested search range	Notes
`training.lr`	Learning rate	1e-6 ~ 3e-4 (log)	Most important; always search it
`training.weight_decay`	L2 regularization	0.0 ~ 0.1
`training.warmup_steps`	Number of warmup steps	50 ~ 500
`training.batch_size`	Batch size	[2, 4, 8]	Depends on GPU memory
`training.grad_accum_steps`	Gradient accumulation steps	[1, 2, 4, 8]	effective batch = batch × accum
`training.max_grad_norm`	Gradient clipping	[0.5, 1.0, 2.0]

Parameters by Training Type

Parameter	Training type	Suggested search range	Notes
`training.orpo_lambda`	ORPO	0.1 ~ 2.0	Balance between SFT loss and preference loss
`training.dpo_beta`	DPO	[0.05, 0.1, 0.2, 0.5]	Preference strength
`training.ppo.clip_range`	PPO	[0.1, 0.2, 0.3]	PPO clipping ε
`training.ppo.kl_coef`	PPO	[0.05, 0.1, 0.2]	KL penalty coefficient

LoRA Structure Parameters (injection.*)

Parameter	Description	Suggested search range	Notes
`injection.lora_r`	LoRA rank	8 ~ 64 (step 8)	Larger = more capacity, more memory
`injection.lora_alpha`	LoRA scaling	16 ~ 128 (step 16)	Typically lora_r × 2
`injection.lora_dropout`	LoRA dropout	[0.0, 0.05, 0.1]

LoRA Application Scope (injection.*)

Parameter	Description	Suggested search range	Notes
`injection.start_layer`	First layer LoRA is applied to	0 ~ 20 (step 4)	Later layers = more task-specific
`injection.num_layers`	Number of layers LoRA is applied to	[0, 4, 8, 12, 16]	0 = all layers
`injection.target_keywords`	FFN LoRA targets	(see table below)	List value
`injection.attn_lora.enabled`	Enable attention LoRA	[true, false]
`injection.attn_lora.keywords`	Attention LoRA targets	(see table below)	List value

Example target_keywords combinations

Combination	Description
`[gate_proj, up_proj, down_proj]`	Full FFN (default)
`[gate_proj, down_proj]`	gate + down only (up excluded)
`[up_proj, down_proj]`	up + down only

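Example attn_lora.keywords combinations

Combination	Description
`[q_proj, v_proj]`	Q + V only (default, most common)
`[q_proj, k_proj, v_proj, o_proj]`	Full attention (maximum capacity)
`[q_proj]`	Q only (minimal setup)

List values are supported: list parameters such as target_keywords and attn_lora.keywords can be searched too. Use the categorical type and give each list combination as one choice.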

Choosing a Search Method

method: "random" (random search)

Supports both continuous ranges and discrete choices. A good fit for a quick sweep over the whole space.

search:
  method: "random"
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true
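
To show what log: true changes for a range like this one, here is a minimal standard-library sketch of log-uniform sampling. It illustrates the concept only; the function name sample_log_uniform is hypothetical, and the sampler eulerforge actually uses (presumably provided through Optuna) is not shown in this guide.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against floating-point round-off at the boundaries.
    return min(max(value, low), high)


rng = random.Random(42)  # plays the role of sampler.seed (assumption)
samples = [sample_log_uniform(rng, 1e-6, 1e-4) for _ in range(1000)]

# On a log scale about half of the draws fall below the geometric midpoint
# (1e-5). On a linear scale only about 9% of the range lies below it.
below_mid = sum(s < 1e-5 for s in samples)
print(min(samples), max(samples))
print(below_mid, "of", len(samples), "samples are below 1e-5")
```

This is why a learning-rate range spanning several orders of magnitude is normally searched with log: true: each decade gets a similar share of the trials.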

method: "grid" (grid search)

Note: every parameter must be given as choices (any type) or use the categorical type. A continuous range (low/high) is rejected with an error.

search:
  method: "grid"
  space:
    - name: "injection.lora_r"
      type: "categorical"
      choices: [8, 16, 32]          # ✅ allowed in grid
    - name: "training.lr"
      type: "float"
      choices: [1e-5, 5e-5, 1e-4]  # ✅ the choices form is allowed
    # - name: "training.lr"
    #   type: "float"
    #   low: 1e-6
    #   high: 1e-4                   # ❌ not allowed in grid: error
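
The reason grid search needs explicit choices is that it enumerates every combination. The sketch below, which assumes nothing beyond the two choice lists in the spec above, uses itertools.product to show the resulting trial set. It is an illustration of the idea, not eulerforge's implementation.

```python
from itertools import product

# Mirrors the grid spec above: every parameter lists explicit choices.
space = {
    "injection.lora_r": [8, 16, 32],
    "training.lr": [1e-5, 5e-5, 1e-4],
}

names = list(space)
trials = [dict(zip(names, combo)) for combo in product(*space.values())]

print(len(trials))  # 9
print(trials[0])    # {'injection.lora_r': 8, 'training.lr': 1e-05}
```

The number of combinations is the product of the list lengths, so adding a third parameter with four choices would already mean 36 trials.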

method: "bayes" (Bayesian optimization, TPE)

Learns from the results of earlier trials and concentrates on promising regions of the space. Efficient when the trial budget is small.

search:
  method: "bayes"
  sampler:
    seed: 42
  space:
    - name: "training.lr"
      type: "float"
      low: 1e-6
      high: 1e-4
      log: true
    - name: "training.orpo_lambda"
      type: "float"
      low: 0.1
      high: 2.0

Parameter Types

Type	Description	Required fields
`float`	Continuous real number	`low` + `high`, or `choices`
`int`	Integer	`low` + `high`, or `choices`
`categorical`	Discrete choice	`choices`

Common optional fields:
- log: true (log scale; float/int)
- step: N (step interval; int)
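
The rules in the table, together with the grid restriction described earlier, can be sketched as a small checker. The function name validate_param and the messages are hypothetical; the authoritative rules are the ones eulerforge applies during --dry-run, which may be stricter than this sketch.

```python
def validate_param(param, method):
    """Return a list of problems for one `space` entry (empty if valid).

    Illustrative only: encodes the rules stated in this guide, nothing more.
    """
    problems = []
    ptype = param.get("type")
    has_choices = "choices" in param
    has_range = "low" in param and "high" in param

    if ptype not in ("float", "int", "categorical"):
        problems.append(f"unknown type: {ptype!r}")
    elif ptype == "categorical" and not has_choices:
        problems.append("categorical requires choices")
    elif not (has_choices or has_range):
        problems.append(f"{ptype} requires low + high or choices")

    # Grid search enumerates values, so a bare low/high range is not enough.
    if method == "grid" and not has_choices:
        problems.append("grid search requires choices")
    return problems


lr_range = {"name": "training.lr", "type": "float",
            "low": 1e-6, "high": 1e-4, "log": True}
lr_choices = {"name": "training.lr", "type": "float",
              "choices": [1e-5, 5e-5, 1e-4]}

print(validate_param(lr_range, "random"))   # []
print(validate_param(lr_range, "grid"))     # ['grid search requires choices']
print(validate_param(lr_choices, "grid"))   # []
```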

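Practical Search Strategy

Stage 1: Core hyperparameters first

Start by searching only three parameters: lr, lora_r, and lora_dropout (the same as the default example).

Stage 2: Add structure search

Fix the best values from stage 1, then search the layer range: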

space:
  - name: "training.lr"
    type: "float"
    choices: [5e-5]                    # pin the best lr found in stage 1
  - name: "injection.start_layer"
    type: "categorical"
    choices: [0, 8, 12, 16]
  - name: "injection.num_layers"
    type: "categorical"
    choices: [0, 4, 8, 12]

Stage 3: Search the target modules

Search which modules benefit most from LoRA:

space:
  - name: "injection.target_keywords"
    type: "categorical"
    choices:
      - [gate_proj, up_proj, down_proj]   # full
      - [gate_proj, down_proj]            # reduced
  - name: "injection.attn_lora.keywords"
    type: "categorical"
    choices:
      - [q_proj, v_proj]                  # Q+V (default)
      - [q_proj, k_proj, v_proj, o_proj]  # full
  - name: "injection.attn_lora.enabled"
    type: "categorical"
    choices: [true, false]                # compare with and without attn LoRA

Tip: the more dimensions you search, the larger max_trials needs to be. For random/bayes, at least 5 to 10 trials per dimension is recommended.

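Bench Evaluation (bench_eval)

After each trial finishes training, its quality can be evaluated with the bench judge. Loss and judge score are tracked side by side, and the best trial under each criterion is reported separately.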

run:
  objective:
    direction: "minimize"
    metric: "train/total_loss"
    step_agg: "last"

  bench_eval:
    enabled: true
    bench_preset: "configs/bench/sft_judge.yml"  # path to the bench YAML
    metric: "avg_score"                          # judge score to rank by
    checkpoint: "final"                          # checkpoint to evaluate

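How It Works

After each trial finishes training, a bench evaluation runs with the bench config given in bench_preset
The trial's checkpoint (final/latest/best) is automatically set as the target model
The bench judge evaluates the inference results and assigns a score
The summary reports the best trial by loss and the best trial by bench score separately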

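bench_eval Settings

Field	Description	Default
`enabled`	Whether bench evaluation is enabled	`false`
`bench_preset`	Path to the bench YAML (including the judge)	(required)
`metric`	Score key to extract from the bench summary	`avg_score`
`checkpoint`	Trial checkpoint to evaluate	`final`

Valid metric values: avg_score, target_avg_score, baseline_avg_score

bench_preset Example

bench_preset uses the same format as a regular bench YAML. The grid engine automatically overrides the target section with the trial checkpoint: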

# configs/bench/sft_judge.yml (for bench_eval)
bench:
  task: sft
  data_path: data/sft_1k_raw.jsonl
  sample:
    k: 10
    seed: 42
  models:
    target:
      device: "cuda:0"
      dtype: "bfloat16"
    # baseline:                          # optional: compare against a baseline model
    #   enabled: true
    #   model_dir: "Qwen/Qwen3.5-0.8B-Base"
    #   device: "cuda:0"
    judge:
      enabled: true
      provider: ollama
      model: "gpt-oss:20b"
      mode: pointwise

Output Structure

outputs/grid/
├── trial_0000/
│   ├── metrics.jsonl          # per-step metrics
│   ├── resolved_config.json   # config as applied
│   ├── checkpoint-latest/
│   └── bench_eval/            # bench_eval results (when enabled)
│       ├── results.jsonl
│       └── summary.json
├── trial_0001/
│   └── ...
├── summary.json               # summary of all results
└── summary.csv                # CSV version

Example summary.json (with bench_eval enabled):

{
  "best_trial": {
    "number": 3,
    "value": 1.2345,
    "params": {"training.lr": 5e-05, "injection.lora_r": 16}
  },
  "best_by_bench": {
    "number": 1,
    "bench_score": 7.5,
    "value": 1.8901,
    "params": {"training.lr": 1e-04, "injection.lora_r": 32}
  },
  "all_trials": [
    {"number": 0, "value": 2.456, "bench_score": 5.2, "params": {...}, "state": "COMPLETE"},
    {"number": 1, "value": 1.890, "bench_score": 7.5, "params": {...}, "state": "COMPLETE"},
    ...
  ],
  "bench_eval": {
    "metric": "avg_score",
    "checkpoint": "final",
    "bench_preset": "configs/bench/sft_judge.yml"
  }
}
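
A short sketch of how this summary might be consumed from Python follows. It assumes only the keys shown in the example above, and it treats best_by_bench as optional on the assumption that the key is absent when bench_eval is disabled. The function name describe_best is hypothetical.

```python
import json


def describe_best(summary):
    """Format the loss-best and bench-best trials from a parsed summary."""
    lines = []
    best = summary["best_trial"]
    lines.append(f"best by loss: trial {best['number']} value={best['value']}")
    bench = summary.get("best_by_bench")  # assumed missing without bench_eval
    if bench is not None:
        lines.append(
            f"best by bench: trial {bench['number']} "
            f"bench_score={bench['bench_score']} value={bench['value']}"
        )
    return lines


# Inline sample with the same shape as the example above. In practice the
# dict would come from json.load() on the summary.json under output_root.
sample = json.loads("""
{
  "best_trial": {"number": 3, "value": 1.2345,
                 "params": {"training.lr": 5e-05, "injection.lora_r": 16}},
  "best_by_bench": {"number": 1, "bench_score": 7.5, "value": 1.8901,
                    "params": {"training.lr": 1e-04, "injection.lora_r": 32}}
}
""")
for line in describe_best(sample):
    print(line)
```

In the example the two criteria disagree (trial 3 has the lowest loss, trial 1 the highest judge score), which is the situation the separate best_by_bench entry is meant to surface.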

metrics.jsonl

Each trial's metrics.jsonl records the training metrics step by step, one JSON object per logged step:

{"step": 10, "train/total_loss": 2.345, "train/main_loss": 2.301, "train/learning_rate": 1e-05}
{"step": 20, "train/total_loss": 2.201, ...}

Set objective.metric to one of these keys to use it as the trial's objective value.
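
How objective.metric and objective.step_agg could combine into a single number per trial is sketched below. The semantics of "last", "min", and "mean" are inferred from their names, the function name aggregate_metric is hypothetical, and the log values are made up for illustration.

```python
import json


def aggregate_metric(lines, metric, step_agg="last"):
    """Reduce one metric from metrics.jsonl lines to a single value."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if metric in record:
            values.append(record[metric])
    if not values:
        raise ValueError(f"metric {metric!r} not found")
    if step_agg == "last":
        return values[-1]
    if step_agg == "min":
        return min(values)
    if step_agg == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown step_agg: {step_agg!r}")


# Made-up log lines in the metrics.jsonl format shown above.
log = [
    '{"step": 10, "train/total_loss": 2.5}',
    '{"step": 20, "train/total_loss": 2.0}',
    '{"step": 30, "train/total_loss": 2.25}',
]
print(aggregate_metric(log, "train/total_loss", "last"))  # 2.25
print(aggregate_metric(log, "train/total_loss", "min"))   # 2.0
print(aggregate_metric(log, "train/total_loss", "mean"))  # 2.25
```

With a noisy loss curve, "last" and "min" can rank trials differently, so it is worth choosing step_agg deliberately rather than leaving it implicit.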

Provided Examples

File	Method	Training type	Default space
`configs/grid/sft_random_search.yml`	random	SFT	lr, lora_r, dropout + commented-out extended space
`configs/grid/dpo_grid_search.yml`	grid	DPO	lora_r, dropout + commented-out beta, layers, keywords
`configs/grid/orpo_bayes_search.yml`	bayes	ORPO	lr, orpo_lambda, lora_r + commented-out extended space

Usage: uncomment the commented-out space entries in each example to widen the search.

References

Spec rules in detail: docs/fixtures/specs/grid_search_spec.md
CLI reference: docs/cli.md
Validation rules: docs/fixtures/validation_rules.md
