Pattern 04. Judge Node and Quality Loop — Iterate Until Satisfied

Learning Objectives

After completing this tutorial, you will be able to:

Declare a Judge node and correctly configure the evaluator_v1 schema
Understand the relationship between route_values and edges, and prevent JUDGE_ROUTE_COVERAGE_ERROR
Design an evaluate → revise → evaluate loop pattern and properly bound cycles
Explain the relationship between max_iterations and the UNBOUNDED_CYCLE error
Read and interpret Judge results from pattern_events.jsonl

Prerequisites

03_simple_linear.md completed (linear pattern writing experience)
my_first_pattern.yaml exists (created in tutorial 03)

ls my_first_pattern.yaml
euleragent pattern validate my_first_pattern.yaml

1. Why Do We Need a Judge Node?

In the linear pattern from 03_simple_linear.md, the write node executes only once. Even if the output quality is low, the pattern simply terminates. In practice, the flow should look like this:

Write draft → Quality evaluation
                │
                ├── Good enough → Complete
                │
                └── Insufficient → Revise → Re-evaluate → ...

The Judge node handles this "evaluate → branch" role. It asks the LLM for an evaluation and routes to different nodes based on the result.

2. Understanding the Judge Node

evaluator_v1 Schema

Setting judge.schema: evaluator_v1 selects a built-in evaluation schema. With this schema, the Judge LLM is asked to respond with the following structured JSON:

{
  "score": 0.87,
  "route": "finalize",
  "reason": "Core concepts clearly explained. Code example quality excellent.",
  "suggestions": [
    "A stronger opening would be beneficial",
    "Adding a call-to-action (CTA) to the conclusion would improve completeness"
  ]
}

score: A number from 0.0 to 1.0. Compared against pass_threshold (a runtime reference value)
route: One of route_values. Used for the actual routing decision
reason: Evaluation rationale (recorded in logs)
suggestions: Automatically passed to the next revise node
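
The field rules above can be checked mechanically. The following is a minimal Python sketch, not euleragent's own validator: validate_judge_result is a hypothetical helper, and the type checks are assumptions inferred from the example response.

```python
from typing import Any


def validate_judge_result(result: dict[str, Any], route_values: list[str]) -> list[str]:
    """Return a list of problems found in an evaluator_v1-shaped result (hypothetical helper)."""
    problems = []

    score = result.get("score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("score must be a number between 0.0 and 1.0")

    if result.get("route") not in route_values:
        problems.append(f"route must be one of {route_values}")

    if not isinstance(result.get("reason"), str):
        problems.append("reason must be a string")

    suggestions = result.get("suggestions")
    if not isinstance(suggestions, list) or not all(isinstance(s, str) for s in suggestions):
        problems.append("suggestions must be a list of strings")

    return problems


sample = {
    "score": 0.87,
    "route": "finalize",
    "reason": "Core concepts clearly explained. Code example quality excellent.",
    "suggestions": ["A stronger opening would be beneficial"],
}
print(validate_judge_result(sample, ["finalize", "revise"]))
```

An empty list means the response has the expected shape; each string in a non-empty list names one field that does not.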

Declaring route_values

A Judge node must declare its possible routing values in route_values. Every value must have a corresponding outgoing edge; a value without one is the situation JUDGE_ROUTE_COVERAGE_ERROR guards against.

nodes:
  - id: evaluate
    kind: judge
    judge:
      schema: evaluator_v1
      route_values: [finalize, revise]    # Edges required for both

edges:
  - from: evaluate
    to: finalize
    when: "judge.route == finalize"       # Covers finalize

  - from: evaluate
    to: revise
    when: "judge.route == revise"         # Covers revise
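
To make the coverage rule concrete, here is a rough Python sketch of how such a check could work. It is an illustration only: uncovered_routes is a hypothetical function, and it assumes that a route counts as covered only by an edge whose when condition has the exact form judge.route == <value>.

```python
import re

# Matches conditions of the form: judge.route == <value>
ROUTE_CONDITION = re.compile(r"^\s*judge\.route\s*==\s*(\w+)\s*$")


def uncovered_routes(node_id, route_values, edges):
    """Return the route values of node_id that no outgoing edge covers (hypothetical helper)."""
    covered = set()
    for edge in edges:
        if edge["from"] != node_id:
            continue
        match = ROUTE_CONDITION.match(edge.get("when", ""))
        if match:
            covered.add(match.group(1))
    return [route for route in route_values if route not in covered]


edges = [
    {"from": "evaluate", "to": "finalize", "when": "judge.route == finalize"},
    {"from": "evaluate", "to": "revise", "when": "judge.route == revise"},
]
print(uncovered_routes("evaluate", ["finalize", "revise"], edges))
```

Removing the second edge from the list would make the function report revise as uncovered.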

when Condition Syntax

The when DSL used for Judge routing:

# Route value comparison
when: "judge.route == finalize"
when: "judge.route == revise"

# Score threshold (optional)
when: "judge.score >= 0.85"
when: "judge.score < 0.7"
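
The semantics of these conditions can be sketched in a few lines of Python. This is not the real DSL parser: evaluate_when is a hypothetical function that handles only the forms shown in this tutorial ("true", judge.route == value, and judge.score with >= or <).

```python
import operator
import re

# Only the operators that appear in this tutorial are supported.
COMPARATORS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}
CONDITION = re.compile(r"^\s*judge\.(route|score)\s*(==|>=|<)\s*(\S+)\s*$")


def evaluate_when(condition: str, judge: dict) -> bool:
    """Evaluate a when condition against a Judge result (hypothetical helper)."""
    if condition.strip() == "true":
        return True  # Unconditional edge.
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field, op, literal = match.groups()
    if field == "score":
        return COMPARATORS[op](judge["score"], float(literal))
    if op != "==":
        raise ValueError("judge.route can only be compared with ==")
    return judge["route"] == literal


judge = {"score": 0.87, "route": "finalize"}
print(evaluate_when("judge.route == finalize", judge))
print(evaluate_when("judge.score < 0.7", judge))
```

With the result above, the first condition holds and the second does not, so only the edge guarded by the first would be followed.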

3. Pattern Design

We add a Judge loop to the blog writing pattern created earlier.

[research] → [draft] → [evaluate] → finalize
                            │
                            └── judge.route == revise
                                    │
                                    ▼
                                 [revise] ──────────────┐
                                                        │
                        ◄──────────────────────────────┘

Complete flow:

┌─────────────────────────────────────────────────────────────────┐
│ blog_with_judge.pattern Flow Diagram                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  [research]                                                     │
│     │ Topic investigation (llm/execute, exclude: web.search)    │
│     │ when: true                                                │
│     ▼                                                           │
│  [draft]                                                        │
│     │ Write draft (llm/execute)                                 │
│     │ when: true                                                │
│     ▼                                                           │
│  [evaluate] ──────── when: judge.route == finalize ─────────────┐
│     │ Quality evaluation (judge/evaluator_v1)                   │
│     │ when: judge.route == revise                               │
│     ▼                                                           │
│  [revise]                                                       │
│     │ Improve draft (llm/execute)                               │
│     │ when: true                                                │
│     └──────────────────────────► [evaluate] (loop max 3 times)  │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ [FINALIZE]  Save blog_post.md                            │◄──┘
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

4. Writing the YAML

Create the blog_with_judge.yaml file.

id: blog.quality_loop
version: 1
category: writing
description: "Blog writing pattern with quality loop using Judge node"

defaults:
  # max_iterations is required since there is a cycle!
  # The evaluate → revise → evaluate loop repeats up to 3 times
  max_iterations: 3

  # Maximum number of tool calls across the entire execution
  max_total_tool_calls: 15

  # Judge prefers finalize routing when score is at or above this threshold
  # (Runtime reference value - influences Judge LLM's decision)
  pass_threshold: 0.85

nodes:
  # ── Node 1: research ──
  - id: research
    kind: llm
    runner:
      mode: execute
      exclude_tools: [web.search, web.fetch]
    prompt:
      system_append: |
        You are a technical researcher. Organize key points about the topic.
        Output: Research notes in markdown structure (500-800 words)
    artifacts:
      primary: research_notes.md

  # ── Node 2: draft ──
  - id: draft
    kind: llm
    runner:
      mode: execute
      exclude_tools: [web.search, web.fetch, shell.exec]
    prompt:
      system_append: |
        You are a technical blog writer.
        Write a complete blog post draft based on the research notes.

        Requirements:
        - Length: 800-1200 words
        - Structure: Introduction → Body (3 sections) → Conclusion
        - Audience: Experienced developers
        - Include code examples
        - Markdown format
    artifacts:
      primary: blog_post.md

  # ── Node 3: evaluate (Judge) ──
  - id: evaluate
    kind: judge             # Declared as judge type
    judge:
      schema: evaluator_v1  # Built-in evaluation schema
      # List of possible routing values.
      # All must be covered by edges below!
      route_values: [finalize, revise]
    prompt:
      system_append: |
        You are a technical blog editor-in-chief. Evaluate the blog post using the following criteria.

        Evaluation Criteria:
        - Technical accuracy (30%): Is the information accurate and up-to-date?
        - Structure and readability (25%): Are the logical flow and section divisions clear?
        - Code quality (25%): Are code examples executable and clear?
        - Reader value (20%): Can readers learn something new?

        Choose 'finalize' if score >= 0.85, otherwise choose 'revise'.
        Write suggestions that are specific and actionable.

  # ── Node 4: revise ──
  - id: revise
    kind: llm
    runner:
      mode: execute
      exclude_tools: [web.search, web.fetch, shell.exec]
    prompt:
      system_append: |
        You are a technical blog writer.
        Improve the blog post by incorporating the editor's feedback.

        Important: Incorporate all of the editor's suggestions,
        but maintain the overall structure and technical content of the post.
        Rewrite the entire improved post.
    artifacts:
      primary: blog_post.md  # Same filename as draft — overwrites

edges:
  # Linear flow
  - from: research
    to: draft
    when: "true"

  - from: draft
    to: evaluate
    when: "true"

  # Judge routing — must cover all values in route_values
  - from: evaluate
    to: finalize
    when: "judge.route == finalize"   # finalize route

  - from: evaluate
    to: revise
    when: "judge.route == revise"     # revise route

  # Re-evaluate after revision (loop)
  - from: revise
    to: evaluate
    when: "true"

finalize:
  artifact: blog_post.md

5. Validation

euleragent pattern validate blog_with_judge.yaml

Expected output:

Validating pattern: blog_with_judge.yaml

  Stage 1 (Schema)      PASS
  Stage 2 (Structural)  PASS
  Stage 3 (IR Analysis) PASS

  Cycle detected: evaluate → revise → evaluate
  Bounded by: max_iterations = 3  ✓

Validation complete: 0 errors, 0 warnings
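
How the validator finds the cycle is not documented here. One standard way to do it is a depth-first search over the edge list, sketched below in Python. find_cycle and check_cycles are hypothetical names, and the rule that a cycle is bounded exactly when defaults contains max_iterations is an assumption based on the output above.

```python
def find_cycle(edges):
    """Return one cycle as a list of node ids (first == last), or None if the graph is acyclic."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    path = []     # Nodes on the current DFS path, in visit order.
    done = set()  # Nodes whose descendants are fully explored.

    def visit(node):
        if node in done:
            return None
        if node in path:
            # Back edge: the cycle is the path from the first visit of node.
            return path[path.index(node):] + [node]
        path.append(node)
        for successor in graph[node]:
            cycle = visit(successor)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for node in graph:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


def check_cycles(edges, defaults):
    """Return an error string for an unbounded cycle, else None (assumed rule)."""
    cycle = find_cycle(edges)
    if cycle and "max_iterations" not in defaults:
        return "UNBOUNDED_CYCLE: " + " → ".join(cycle)
    return None


pattern_edges = [
    ("research", "draft"),
    ("draft", "evaluate"),
    ("evaluate", "finalize"),
    ("evaluate", "revise"),
    ("revise", "evaluate"),
]
print(find_cycle(pattern_edges))
print(check_cycles(pattern_edges, {"max_iterations": 3}))
```

For this pattern's edges the search reports the evaluate → revise → evaluate cycle, and the bound check passes because max_iterations is present.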

6. Compilation

euleragent pattern compile blog_with_judge.yaml

Check the cycle information in the compilation output:

{
  "id": "blog.quality_loop",
  "entry_node": "research",
  "cycles": [
    {
      "path": ["evaluate", "revise", "evaluate"],
      "length": 2,
      "bounded_by": "max_iterations",
      "max_iterations": 3
    }
  ],
  "nodes": {
    "evaluate": {
      "kind": "judge",
      "judge": {
        "schema": "evaluator_v1",
        "route_values": ["finalize", "revise"],
        "route_coverage": {
          "finalize": { "covered": true, "edge": "evaluate→finalize" },
          "revise": { "covered": true, "edge": "evaluate→revise" }
        }
      }
    }
  }
}

Verify from route_coverage that all route_values are covered.
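
If you prefer to check this programmatically, the sketch below walks the compiled structure and lists any uncovered routes. It assumes the compile output has already been captured and parsed as JSON with the shape shown above; uncovered_in_compiled is a hypothetical helper, not part of euleragent.

```python
import json


def uncovered_in_compiled(compiled: dict) -> dict:
    """Map each judge node id to its uncovered routes, based on route_coverage."""
    report = {}
    for node_id, node in compiled.get("nodes", {}).items():
        coverage = node.get("judge", {}).get("route_coverage", {})
        missing = [route for route, info in coverage.items() if not info.get("covered")]
        if missing:
            report[node_id] = missing
    return report


# Abbreviated copy of the compile output shown above.
compiled_text = """
{
  "id": "blog.quality_loop",
  "nodes": {
    "evaluate": {
      "kind": "judge",
      "judge": {
        "schema": "evaluator_v1",
        "route_values": ["finalize", "revise"],
        "route_coverage": {
          "finalize": { "covered": true, "edge": "evaluate→finalize" },
          "revise": { "covered": true, "edge": "evaluate→revise" }
        }
      }
    }
  }
}
"""
print(uncovered_in_compiled(json.loads(compiled_text)))
```

An empty dictionary means every declared route has a matching edge.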

7. Execution and Checking Judge Results

Installation and Execution

cp blog_with_judge.yaml .euleragent/patterns/
euleragent pattern run blog.quality_loop my-agent \
  --task "Understanding Docker Container Networking — Comparing bridge, host, and overlay modes" \
  --project default

Expected output (Judge decides to revise):

[run:g7b3e1d4] Starting pattern: blog.quality_loop

  ✓ research     Completed (11s)
  ✓ draft        Completed (16s) — 1,023 words
  ✓ evaluate     Completed (8s)
                 score: 0.71 → route: revise
                 reason: "Insufficient code examples and shallow overlay network explanation"
                 suggestions:
                   - "Add docker network create command examples"
                   - "Add real-world overlay network use cases (Docker Swarm)"
  ✓ revise       Completed (19s) — 1,187 words (improved)
  ✓ evaluate     Completed (7s)
                 score: 0.89 → route: finalize
                 reason: "Thorough code examples, clear structure, high reader value"
  ✓ finalize     Completed

Run g7b3e1d4 completed. (2 evaluate iterations)
Artifact: .euleragent/runs/g7b3e1d4/artifacts/blog_post.md

Checking Judge Results in the Event Stream

grep '"node":"evaluate"' .euleragent/runs/g7b3e1d4/pattern_events.jsonl

Output:

{"ts":"2026-02-23T14:32:18Z","event":"node.complete","node":"evaluate","kind":"judge","result":{"score":0.71,"route":"revise","reason":"Insufficient code examples and shallow overlay network explanation","suggestions":["Add docker network create command examples","Add real-world overlay network use cases (Docker Swarm)"]},"iteration":1}
{"ts":"2026-02-23T14:32:52Z","event":"node.complete","node":"evaluate","kind":"judge","result":{"score":0.89,"route":"finalize","reason":"Thorough code examples, clear structure, high reader value","suggestions":[]},"iteration":2}
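
For anything beyond a quick grep, the events are easier to work with once parsed. The following Python sketch extracts the iteration, score, and route from each Judge event. judge_results is a hypothetical helper, and the sample lines are abbreviated copies of the two events above, with the reason and suggestions shortened.

```python
import json


def judge_results(lines):
    """Return (iteration, score, route) for every completed judge event in a JSONL stream."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        event = json.loads(line)
        if event.get("kind") == "judge" and event.get("event") == "node.complete":
            result = event["result"]
            results.append((event["iteration"], result["score"], result["route"]))
    return results


sample_lines = [
    '{"ts":"2026-02-23T14:32:18Z","event":"node.complete","node":"evaluate","kind":"judge",'
    '"result":{"score":0.71,"route":"revise","reason":"...","suggestions":[]},"iteration":1}',
    '{"ts":"2026-02-23T14:32:52Z","event":"node.complete","node":"evaluate","kind":"judge",'
    '"result":{"score":0.89,"route":"finalize","reason":"...","suggestions":[]},"iteration":2}',
]
for iteration, score, route in judge_results(sample_lines):
    print(f"iteration {iteration}: score={score} route={route}")
```

To read a real run, pass an open file object instead of the sample list, for example the pattern_events.jsonl file of the run directory.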

8. Demonstrating the Error When max_iterations Is Removed

Let us intentionally trigger the error. Comment out (or remove) max_iterations under defaults in blog_with_judge.yaml, leaving the rest of the defaults section in place.

defaults:
  # max_iterations: 3   ← Comment out this line

Run validation:

euleragent pattern validate blog_with_judge.yaml

Expected output:

Validating pattern: blog_with_judge.yaml

  Stage 1 (Schema)      PASS
  Stage 2 (Structural)  PASS
  Stage 3 (IR Analysis) FAIL

  ERROR [UNBOUNDED_CYCLE]
    Cycle detected: evaluate → revise → evaluate
    This cycle has no bound. Set defaults.max_iterations to limit iterations.
    Hint: Add to defaults section:
      max_iterations: 3

Validation complete: 1 error, 0 warnings

Restoring max_iterations will resolve the error.

9. The Role of pass_threshold

defaults.pass_threshold: 0.85 is a runtime hint passed to the Judge LLM. It is automatically injected into the Judge's system_append.

Evaluation guidelines the Judge LLM receives:
  pass_threshold: 0.85
  → "Choose 'finalize' if score >= 0.85, otherwise choose 'revise'"

pass_threshold does not force the Judge's routing. The Judge LLM makes the final decision. This value serves to communicate the expected quality level to the Judge.
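
Because the threshold is only a hint, a Judge result can disagree with it, for example a score above 0.85 combined with a revise route. The Python sketch below flags such cases when reviewing results. threshold_disagreements is a hypothetical helper, it assumes a two-route pattern (finalize and revise), and the sample results are made up for illustration.

```python
def threshold_disagreements(results, pass_threshold):
    """Return the judge results whose route differs from what the threshold alone would suggest."""
    flagged = []
    for result in results:
        suggested = "finalize" if result["score"] >= pass_threshold else "revise"
        if result["route"] != suggested:
            flagged.append(result)
    return flagged


# Hypothetical results, not taken from a real run.
results = [
    {"score": 0.71, "route": "revise"},
    {"score": 0.88, "route": "revise"},    # Judge chose revise despite a high score.
    {"score": 0.89, "route": "finalize"},
]
print(threshold_disagreements(results, pass_threshold=0.85))
```

A flagged result is not an error. It only shows that the Judge weighed something other than the numeric score, which is worth a look at its reason field.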

10. Key Concept Explanations

Difference Between Cycles and Linear Flows

Linear pattern (no cycles):

[A] → [B] → [C] → finalize

max_iterations not required. Each node executes exactly once.

Quality loop pattern (with cycles):

[A] → [B] → [C] → finalize
              ↑      |
              └─[D]←─┘ (when C selects revise)

max_iterations required. Without a bound, the C → D → C loop could repeat indefinitely.

Can We Use an llm Node Instead of a Judge for Evaluation?

Technically, you can branch from an llm node using when: "true" conditions. However, the judge node has the following advantages:

Structured responses (score, route, suggestions) are guaranteed via the evaluator_v1 schema
Compile-time coverage verification is possible through route_values declaration
suggestions are automatically passed to the next revise node
Evaluation results are recorded in a structured format in the event stream

How max_iterations Works

max_iterations: 3 limits how many times the cycle can run. In the trace below, each execution of evaluate counts as one iteration. If the Judge still selects revise on the 3rd iteration, the runtime forcibly routes to finalize.

Iteration 1: evaluate(score:0.71, revise) → revise
Iteration 2: evaluate(score:0.79, revise) → revise
Iteration 3: evaluate(score:0.83, revise) → ⚠️ max_iterations reached → forced finalize

At this point, a warning is recorded in the event stream.
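
The forced-finalize behavior can be simulated in a few lines of Python. This is a sketch of the rule as described here, not the actual runtime: run_quality_loop is a hypothetical function, the list of routes stands in for the Judge LLM's decisions, and counting one iteration per evaluate call is an assumption taken from the trace above.

```python
def run_quality_loop(judge_routes, max_iterations):
    """Simulate the loop. Returns (iteration, judge_route, action) tuples."""
    trace = []
    for iteration, route in enumerate(judge_routes, start=1):
        if route == "finalize":
            trace.append((iteration, route, "finalize"))
            return trace
        if iteration >= max_iterations:
            # The Judge asked for another revision, but the bound is reached.
            trace.append((iteration, route, "forced finalize"))
            return trace
        trace.append((iteration, route, "revise"))
    return trace


for step in run_quality_loop(["revise", "revise", "revise"], max_iterations=3):
    print(step)
```

With three revise decisions in a row, the first two lead to a revision and the third is overridden, which matches the trace above.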

11. Practice Exercise: More Granular Routing

The current pattern has only two paths: finalize or revise. Extend it to apply different improvement intensities based on the score.

Exercise: Score-Based 3-Level Routing

# Modify the evaluate node
judge:
  schema: evaluator_v1
  route_values: [finalize, light_edit, major_rewrite]

# Modify system_append
system_append: |
  Evaluation criteria:
  - score >= 0.85: 'finalize'
  - 0.65 <= score < 0.85: 'light_edit' (minor corrections)
  - score < 0.65: 'major_rewrite' (complete rewrite)

# Add new nodes
- id: light_edit
  kind: llm
  runner:
    mode: execute
  prompt:
    system_append: |
      Incorporate only the top 2 suggestions from the editor
      and improve the blog post with minimal modifications.

- id: major_rewrite
  kind: llm
  runner:
    mode: execute
  prompt:
    system_append: |
      Completely rewrite the blog post.
      Re-read research_notes.md and write with an entirely new approach.

# Add new edges
- from: evaluate
  to: finalize
  when: "judge.route == finalize"

- from: evaluate
  to: light_edit
  when: "judge.route == light_edit"

- from: evaluate
  to: major_rewrite
  when: "judge.route == major_rewrite"

- from: light_edit
  to: evaluate
  when: "true"

- from: major_rewrite
  to: evaluate
  when: "true"

Save the modified pattern as blog_three_routes.yaml, then validate it:

euleragent pattern validate blog_three_routes.yaml

Verify that JUDGE_ROUTE_COVERAGE_ERROR does not occur.

Next Steps

You now understand Judge loops. Next, learn how to safely integrate real web search into patterns.

Next tutorial: 05_web_research.md — Use web search under HITL approval with force_tool: web.search
Human review: 06_human_gate.md — Create a gate where humans evaluate directly instead of a Judge
3-way routing: 07_multi_route.md — Explore the above practice exercise in greater depth