Tutorial 6: Manifest Validation - Common Mistakes and Error Messages

The eulerweave validate command catches manifest errors before the pipeline runs. This tutorial walks through common mistakes, the error messages they produce, and how to fix them.

Prerequisites: You should be familiar with the manifest structure from Tutorial 1: Quick Start.

Validation Order

Running eulerweave validate manifest.yaml performs the following checks in order:

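1. YAML parsing: is the file valid YAML?
2. version: is version: 1 present?
3. track: is it one of pretrain, sft, dpo?
4. inputs: are type and uri correct? This step also checks that the extractor is registered and validates the options against its schema.
5. pipeline blocks: are id, type, slot, input_type, and output_type present?
6. type chain: do the types match between consecutive blocks?
7. slot ordering: are the slots in the correct order?
8. constraints: are the block parameters valid?
9. metrics params: are the metric sampling parameters valid?
10. profile policy: does the manifest satisfy the profile policy?
11. exports: are the export settings valid?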
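
The ordered checks above can be pictured as a list of functions applied in sequence. The following is a minimal sketch, not eulerweave's actual implementation: the function names are hypothetical, only three of the checks are modeled, and stopping at the first failure is an assumption made for simplicity.

```python
from typing import Callable, List, Optional

SUPPORTED_TRACKS = ("pretrain", "sft", "dpo")

# Hypothetical check functions: each returns an error string, or None on success.
def check_version(manifest: dict) -> Optional[str]:
    if "version" not in manifest:
        return "Missing required field 'version'"
    return None

def check_track(manifest: dict) -> Optional[str]:
    track = manifest.get("track")
    if track not in SUPPORTED_TRACKS:
        return f"Invalid track {track!r}"
    return None

def check_pipeline_not_empty(manifest: dict) -> Optional[str]:
    if not manifest.get("pipeline"):
        return "Pipeline is empty"
    return None

# The list order mirrors the documented validation order.
CHECKS: List[Callable[[dict], Optional[str]]] = [
    check_version,
    check_track,
    check_pipeline_not_empty,
]

def first_error(manifest: dict) -> Optional[str]:
    """Run the checks in order and return the first error found (assumed fail-fast)."""
    for check in CHECKS:
        error = check(manifest)
        if error is not None:
            return error
    return None
```

Because the checks run in a fixed order, a manifest with several problems reports the earliest one first in this sketch, so fixing errors from the top down is the natural workflow.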

Error 1: Invalid YAML Syntax

pipeline:
  - id: norm1
    params:
      min_length: 50
     max_length: 10000  # ← inconsistent indentation

Validation error: YAML parse error at line 14
  Could not parse YAML: mapping values are not allowed here
  Hint: Check indentation around line 14.

Fix: Indent keys at the same level with the same number of spaces. Do not use tabs.

Error 2: Missing `version` Field

track: sft          # ← version is missing
inputs: ...

Validation error: Missing required field 'version'
Fix: Add 'version: 1' to the top of your manifest.

Error 3: Invalid track

version: 1
track: finetuning   # ← unsupported value

Validation error: Invalid track 'finetuning'
  Supported tracks: pretrain, sft, dpo
  Hint: Did you mean 'sft'?

Error 4: Type Chain Mismatch

pipeline:
  - id: norm1
    type: normalize_text
    output_type: TextDocument

  - id: sft1
    type: build_sft_messages
    input_type: SFTMessages      # ← should be TextDocument

Compilation error: Type chain mismatch
  Block 'norm1' outputs TextDocument
  Block 'sft1' expects SFTMessages
  Fix: Change block 'sft1' input_type to 'TextDocument'
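
The type chain check can be understood as comparing each block's output_type with the next block's input_type. Here is a minimal sketch under the assumption that the pipeline is a simple linear list of blocks; the function name is hypothetical and this is not eulerweave's compiler code.

```python
from typing import Dict, List, Tuple

def find_type_mismatches(blocks: List[Dict[str, str]]) -> List[Tuple[str, str, str, str]]:
    """Return (producer_id, output_type, consumer_id, input_type) for each broken link."""
    mismatches = []
    # Pair every block with its immediate successor.
    for prev, curr in zip(blocks, blocks[1:]):
        if prev["output_type"] != curr["input_type"]:
            mismatches.append(
                (prev["id"], prev["output_type"], curr["id"], curr["input_type"])
            )
    return mismatches

pipeline = [
    {"id": "norm1", "input_type": "TextDocument", "output_type": "TextDocument"},
    {"id": "sft1", "input_type": "SFTMessages", "output_type": "SFTMessages"},
]
print(find_type_mismatches(pipeline))
```

For the manifest in this example, the sketch reports the link between norm1 and sft1, which corresponds to the two "Block ..." lines in the error message.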

Error 5: Slot Ordering Violation

pipeline:
  - id: sft1
    slot: build_task              # ← appears before normalize
  - id: norm1
    slot: normalize

Compilation error: Slot ordering violation
  Block 'norm1' has slot 'normalize' but appears after 'build_task'
  Required order: normalize → filter → dedup → enrich → metrics → build_task → export
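
One way to reason about slot ordering is to give each slot a rank and require that ranks never decrease along the pipeline. The sketch below assumes that several blocks may share a slot and that every block uses one of the seven listed slots; both points are assumptions, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Slot order taken from the error message above.
SLOT_ORDER = ["normalize", "filter", "dedup", "enrich", "metrics", "build_task", "export"]

def find_slot_violations(blocks: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (block_id, slot, earlier_slot) for blocks that appear too late."""
    rank = {slot: index for index, slot in enumerate(SLOT_ORDER)}
    violations = []
    latest = None  # the block with the highest-ranked slot seen so far
    for block in blocks:
        if latest is not None and rank[block["slot"]] < rank[latest["slot"]]:
            violations.append((block["id"], block["slot"], latest["slot"]))
        else:
            latest = block
    return violations

pipeline = [
    {"id": "sft1", "slot": "build_task"},
    {"id": "norm1", "slot": "normalize"},
]
print(find_slot_violations(pipeline))
```

Swapping the two blocks so that norm1 comes first makes the list of violations empty.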

Error 6: Missing Required Block Fields

pipeline:
  - id: norm1
    type: normalize_text
    # ← slot, input_type, output_type are missing

Validation error: Block 'norm1' is missing required fields
  Missing: slot, input_type, output_type
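
The required-field check amounts to comparing a block's keys against a fixed list. A minimal sketch follows; the field list comes from the validation order earlier in this tutorial, while the function name is hypothetical.

```python
from typing import Dict, List

# Required block fields, as listed in the "pipeline blocks" validation step.
REQUIRED_BLOCK_FIELDS = ("id", "type", "slot", "input_type", "output_type")

def missing_block_fields(block: Dict[str, object]) -> List[str]:
    """Return the required fields that the block does not define, in canonical order."""
    return [field for field in REQUIRED_BLOCK_FIELDS if field not in block]

block = {"id": "norm1", "type": "normalize_text"}
print(missing_block_fields(block))
```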

Error 7: Duplicate Block ID

pipeline:
  - id: step1
    type: normalize_text
    ...
  - id: step1              # ← duplicate ID
    type: build_sft_messages
    ...

Validation error: Duplicate block ID 'step1'
  Found at positions 1 and 2.
  Fix: Rename one of the blocks.
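
Duplicate IDs can be found by recording the positions at which each ID appears. The sketch below assumes 1-based positions, which matches the "positions 1 and 2" wording in the message above; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def find_duplicate_ids(blocks: List[Dict[str, str]]) -> Dict[str, List[int]]:
    """Map each repeated block ID to the 1-based positions where it occurs."""
    positions = defaultdict(list)
    for position, block in enumerate(blocks, start=1):
        positions[block["id"]].append(position)
    # Keep only IDs that occur more than once.
    return {block_id: found for block_id, found in positions.items() if len(found) > 1}

pipeline = [
    {"id": "step1", "type": "normalize_text"},
    {"id": "step1", "type": "build_sft_messages"},
]
print(find_duplicate_ids(pipeline))
```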

Error 8: Unregistered Extractor

inputs:
  - type: xlsx               # ← no extractor registered for this type
    uri: data/train.xlsx

Compilation warning: No extractor plugin registered for type 'xlsx'
  Installed extractors: txt, jsonl, csv, parquet, html, pdf
  Hint: Install a plugin or use an installed type.
  See: docs/tutorials/05_plugins_extractors.md

Error 9: PDF Extractor Option Errors

Invalid strategy value

inputs:
  - type: pdf
    uri: data/paper.pdf
    options:
      strategy: magic          # ← must be one of auto, text, ocr

Compilation error: Input option validation failed for 'pdf'
  'magic' is not one of ['auto', 'text', 'ocr']
  Fix: Use strategy 'auto', 'text', or 'ocr'.
  See: docs/tutorials/02_pdf_to_training_data.md

Disallowed option key

inputs:
  - type: pdf
    uri: data/paper.pdf
    options:
      bad_key: true            # ← key is not allowed

Compilation error: Input option validation failed for 'pdf'
  Additional properties not allowed: 'bad_key'
  Allowed options: strategy, page_range

CSV option error

inputs:
  - type: csv
    uri: data/train.csv
    options:
      text_column: 123         # ← must be a string

Compilation error: Input option validation failed for 'csv'
  '123' is not of type 'string' for field 'text_column'
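
The three option errors above share one pattern: each extractor declares which option keys it accepts and what values they may take. The following standard-library sketch illustrates that idea. The spec table is hypothetical: only the option names and the strategy values come from the messages above, and the assumption that csv accepts nothing but text_column is made purely for illustration. The message wording is approximate and is not eulerweave's real output.

```python
from typing import Any, Dict, List

# Hypothetical per-extractor option specs (not eulerweave's real schemas).
OPTION_SPECS: Dict[str, Dict[str, Dict[str, Any]]] = {
    "pdf": {
        "strategy": {"enum": ["auto", "text", "ocr"]},
        "page_range": {},  # value format unknown here, so left unconstrained
    },
    "csv": {
        "text_column": {"type": str},
    },
}

def validate_options(input_type: str, options: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in an input's options."""
    spec = OPTION_SPECS[input_type]
    errors = []
    for key, value in options.items():
        if key not in spec:
            errors.append(f"Additional properties not allowed: {key!r}")
            continue
        rule = spec[key]
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{value!r} is not one of {rule['enum']}")
        if "type" in rule and not isinstance(value, rule["type"]):
            type_name = rule["type"].__name__
            errors.append(f"{value!r} is not of type {type_name!r} for field {key!r}")
    return errors

print(validate_options("pdf", {"strategy": "magic"}))
print(validate_options("pdf", {"bad_key": True}))
print(validate_options("csv", {"text_column": 123}))
```

Rejecting unknown keys, rather than silently ignoring them, is what surfaces typos in option names early.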

Error 10: Profile Policy Violation

pipeline:
  - id: qna1
    type: build_langextract_qna    # ← requires an LLM
    params:
      model: "gpt-oss:20b"

profile:
  allow_external_llm: false        # ← conflicts with the LLM-dependent block

Compilation error: Profile policy violation
  Block 'qna1' (build_langextract_qna) requires external LLM access,
  but profile.allow_external_llm is false.
  Fix: Set allow_external_llm: true, or remove the LLM-dependent block.
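
The policy check can be sketched as a cross-reference between the profile and the block types in the pipeline. In the sketch below, the set of LLM-dependent block types and the default value of allow_external_llm are assumptions made for illustration, and the function name is hypothetical.

```python
from typing import Any, Dict, List

# Assumption: block types that need external LLM access. Only
# build_langextract_qna is confirmed by the error message above.
LLM_BLOCK_TYPES = {"build_langextract_qna"}

def find_policy_violations(manifest: Dict[str, Any]) -> List[str]:
    """Return IDs of blocks that need an LLM while the profile forbids it."""
    profile = manifest.get("profile", {})
    # Assumption: external LLM access is allowed when the profile does not say otherwise.
    if profile.get("allow_external_llm", True):
        return []
    return [
        block["id"]
        for block in manifest.get("pipeline", [])
        if block.get("type") in LLM_BLOCK_TYPES
    ]

manifest = {
    "pipeline": [{"id": "qna1", "type": "build_langextract_qna"}],
    "profile": {"allow_external_llm": False},
}
print(find_policy_violations(manifest))
```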

Error 11: Empty Pipeline

pipeline: []

Validation error: Pipeline is empty
  The 'pipeline' field must contain at least one block.

Error 12: Invalid Metrics Parameter

pipeline:
  - id: m1
    type: metrics_text_basic
    slot: metrics
    params:
      sample_rate: 2.0           # ← must be in the range 0.0 to 1.0

Compilation error: Invalid metrics parameter
  sample_rate must be between 0.0 and 1.0, got 2.0
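
The range rule for sample_rate is easy to express directly. This sketch assumes both endpoints (0.0 and 1.0) are accepted and that an omitted sample_rate is valid; neither detail is stated in the message above, and the function name is hypothetical.

```python
from typing import Any, Dict, Optional

def check_sample_rate(params: Dict[str, Any]) -> Optional[str]:
    """Return an error string if sample_rate is present but invalid, else None."""
    rate = params.get("sample_rate")
    if rate is None:
        return None  # assumption: the parameter is optional
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        return f"sample_rate must be a number, got {rate!r}"
    if not 0.0 <= rate <= 1.0:
        return f"sample_rate must be between 0.0 and 1.0, got {rate}"
    return None

print(check_sample_rate({"sample_rate": 2.0}))
```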

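Error Message Structure

All eulerweave errors follow a consistent format. As the examples above show, the Fix and See lines are not always present, and some messages include a Hint line: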

<Category>: <what went wrong>
Fix: <one-line remediation>
See: <docs path>

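Category	Meaning
`Validation error:`	YAML parsing or schema validation failed
`Compilation error:`	Graph compilation failed (type chain, slot ordering, option validation, policy)
`Compilation warning:`	Warning (for example, an unregistered extractor); the pipeline can still run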
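
To make the message layout concrete, here is a small sketch of a record that renders itself in the category/Fix/See format. The Diagnostic class and its field names are hypothetical and are not part of eulerweave; the optional fields reflect that several messages in this tutorial omit the Fix or See line.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Diagnostic:
    """Hypothetical container for one validation message."""
    category: str               # e.g. "Validation error" or "Compilation warning"
    message: str                # what went wrong
    fix: Optional[str] = None   # one-line remediation, if any
    see: Optional[str] = None   # documentation path, if any

    def render(self) -> str:
        lines = [f"{self.category}: {self.message}"]
        if self.fix:
            lines.append(f"Fix: {self.fix}")
        if self.see:
            lines.append(f"See: {self.see}")
        return "\n".join(lines)

diag = Diagnostic(
    category="Validation error",
    message="Missing required field 'version'",
    fix="Add 'version: 1' to the top of your manifest.",
)
print(diag.render())
```

Rendering this instance reproduces the two-line message shown for Error 2.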

Validation Tips

Always validate before running. eulerweave validate is fast.
Check your plugins. If you see an "unknown input type" error, run:
eulerweave plugins list
eulerweave plugins doctor
Trace the type chain. The more blocks you have, the more it helps to sketch the chain: TextDocument → TextDocument → TextDocument → SFTQnA → ExportedDataset
Remember the slot order: normalize → filter → dedup → enrich → metrics → build_task → export
Use a YAML linter:
pip install yamllint
yamllint manifest.yaml

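Required Manifest Fields Summary

version: 1                        # required: integer
track: sft                        # required: pretrain | sft | dpo

inputs:                           # required: at least one
  - type: jsonl                   # required: name of a registered extractor
    uri: data/train.jsonl         # required: path or URI
    options: {}                   # optional: extractor-specific options

pipeline:                         # required: at least one block
  - id: norm1                     # required: unique identifier
    type: normalize_text          # required: block type
    slot: normalize               # required: pipeline slot
    input_type: TextDocument      # required: input type
    output_type: TextDocument     # required: output type
    params: {}                    # optional: block parameters

exports:                          # required: at least one
  - type: jsonl
    path: out/result.jsonl

profile:                          # optional (defaults are used if omitted)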
  cpu_only: false
  allow_external_llm: true

Next Steps

Tutorial 7: MDS Export - exporting to a streaming format
Tutorial 8: Metrics - pipeline quality statistics
Tutorial 5: Plugin Development - building a custom extractor