09. 3-Tier LLM Strategy: Cut Costs by 90% with OpenAI x Ollama x Gemini
Learning Objectives
After completing this tutorial, you will be able to:
- Configure OpenAI, Ollama, and Gemini -- three external/local LLM profiles -- in `workspace.yaml`
- Generate high-quality plans with `--llm-plan` (OpenAI) and final deliverables with `--llm-final` (Gemini)
- Delegate intermediate execution (bulk calls) to local Ollama to reduce costs by over 90%
- Understand the `is_external` flag and the HITL approval workflow
- Experience the fallback-never-abort policy (an unapproved external LLM falls back to local, and the run continues)
- Track per-profile call history and costs in the audit log
Prerequisites
- Workspace initialization completed (`euleragent init`)
- Understanding of Plan/Execute mode from 02_plan_execute_mode.md
- Understanding of the HITL approval workflow from 03_hitl_approval.md
- Agent created: `euleragent new market-analyst --template personal-assistant`
- Ollama running (`ollama serve`)
- OpenAI API key -- you can still practice fallback behavior without one
- Gemini API key -- you can still practice fallback behavior without one
Step 1: Why Do You Need a 3-Tier LLM Strategy?
Cost Structure of Agent Tasks
Suppose an agent is writing an "AI market analysis report." How many LLM calls occur across the entire process?
```
┌─────────────────────────────────────────────────────┐
│       Agent execution flow and LLM call counts      │
├───────────────────┬──────────────┬──────────────────┤
│ Planning (Plan)   │ Intermediate │ Final polish     │
│ 1 call            │ 5-10 calls   │ 1-2 calls        │
│ ■                 │ ■■■■■■■■■■   │ ■■               │
│ "Design the       │ "Tool calls, │ "Clean up the    │
│  overall          │  drafting,   │  final report"   │
│  structure"       │  iterative   │                  │
│                   │  revision"   │                  │
└───────────────────┴──────────────┴──────────────────┘
                          ↑ Most of the calls are here!
```
| Phase | LLM Calls | Role | Required Quality |
|---|---|---|---|
| Planning | 1 call | Overall structure design, step decomposition | High -- good plans lead to good results |
| Intermediate Execution | 5-10 calls | Tool calls, draft generation, iterative revisions | Medium -- sufficient as long as the structure is followed |
| Final Deliverable | 1-2 calls | Final report formatting, polishing | High -- determines the final quality |
Key insight: Intermediate execution accounts for 70-80% of all calls, but there is no need to use expensive external LLMs for this part. Local Ollama is sufficient.
Cost Comparison of the 3-Tier Strategy
| Strategy | Planning (1 call) | Intermediate (8 calls) | Final (2 calls) | Total Cost |
|---|---|---|---|---|
| All external LLM | $0.03 | $0.24 | $0.06 | $0.33 |
| All local Ollama | $0 | $0 | $0 | $0 (lower quality) |
| 3-Tier strategy | $0.03 (OpenAI) | $0 (Ollama) | $0.02 (Gemini) | $0.05 |
Conclusion: Simply switching the intermediate execution -- which accounts for the bulk of the cost -- to local yields over 85% cost savings. The quality of planning and final deliverables remains intact.
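The arithmetic above can be sanity-checked with a short script. The per-call prices are the illustrative figures from the table, not live API rates:

```python
# Illustrative per-call costs from the comparison table (not live API pricing).
PLAN_COST_OPENAI = 0.03   # 1 planning call on GPT-4o
EXEC_COST_OPENAI = 0.03   # per intermediate call on GPT-4o
FINAL_COST_GEMINI = 0.01  # per final-polish call on Gemini Flash

def total_cost(plan_calls, exec_calls, final_calls, strategy):
    """Rough cost model for the three strategies in the table."""
    if strategy == "all_external":
        return (plan_calls + exec_calls + final_calls) * EXEC_COST_OPENAI
    if strategy == "all_local":
        return 0.0
    if strategy == "three_tier":
        # Plan on OpenAI, intermediate on free local Ollama, final on Gemini.
        return plan_calls * PLAN_COST_OPENAI + final_calls * FINAL_COST_GEMINI
    raise ValueError(strategy)

external = total_cost(1, 8, 2, "all_external")  # 0.33
tiered = total_cost(1, 8, 2, "three_tier")      # 0.05
savings = (external - tiered) / external
print(f"${external:.2f} -> ${tiered:.2f} ({savings:.0%} saved)")
```

With the table's call counts (1 + 8 + 2) this reproduces the $0.33 vs $0.05 comparison and the ~85% figure.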
euleragent's Scope Model
euleragent routes LLM profiles using two scopes:
```
┌──────────────────────────────────────────┐
│              Scope routing               │
├──────────────────┬───────────────────────┤
│ plan scope       │ final scope           │
│ --llm-plan       │ --llm-final           │
│ mode=plan runs   │ mode=execute runs     │
│                  │                       │
│ "Design what     │ "Actually execute     │
│  to do"          │  and produce the      │
│                  │  deliverable"         │
└──────────────────┴───────────────────────┘
         │                      │
  if unspecified:        if unspecified:
  default_llm_profile    default_llm_profile
  (local Ollama)         (local Ollama)
```
The 3-Tier strategy is implemented as 3-phase execution on top of this model:
```
Phase 1: plan mode + --llm-plan openai_planner   → OpenAI (high-quality plan)
Phase 2: execute mode (no profile flag)          → Ollama (low-cost bulk execution)
Phase 3: execute mode + --llm-final gemini_final → Gemini (high-quality final output)
```
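A minimal sketch of how this routing could work. The function and argument names here are illustrative assumptions, not euleragent's actual internals:

```python
# Hypothetical sketch of scope-based profile routing; euleragent's real
# implementation may differ -- names here are illustrative only.
DEFAULT_PROFILE = "ollama_local"

def resolve_profile(scope, llm_plan=None, llm_final=None):
    """Pick the profile for an LLM call based on its scope.

    scope: "plan" for mode=plan calls, "final" for mode=execute calls.
    Unspecified scopes fall back to the workspace default profile.
    """
    if scope == "plan" and llm_plan:
        return llm_plan
    if scope == "final" and llm_final:
        return llm_final
    return DEFAULT_PROFILE

# Phase 1: plan mode with --llm-plan openai_planner
print(resolve_profile("plan", llm_plan="openai_planner"))  # openai_planner
# Phase 2: execute mode, no profile flags
print(resolve_profile("final"))                            # ollama_local
# Phase 3: execute mode with --llm-final gemini_final
print(resolve_profile("final", llm_final="gemini_final"))  # gemini_final
```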
Step 2: Configure 3 Profiles in workspace.yaml
Profile Configuration
```yaml
# .euleragent/config/workspace.yaml
llm_profiles:
  # 1. Local Ollama -- default profile (free, no approval needed)
  ollama_local:
    provider: ollama
    base_url: http://localhost:11434
    model: qwen3:32b
    timeout_seconds: 120
    is_external: false

  # 2. OpenAI GPT-4o -- planning only (paid, approval required)
  openai_planner:
    provider: openai
    base_url: https://api.openai.com/v1
    api_key: ${OPENAI_API_KEY}
    model: gpt-4o
    is_external: true

  # 3. Gemini Flash -- final deliverables only (paid, approval required)
  gemini_final:
    provider: openai  # uses the OpenAI-compatible SDK
    base_url: https://generativelanguage.googleapis.com/v1beta/openai/
    api_key: ${GEMINI_API_KEY}
    model: gemini-2.0-flash
    is_external: true

# The default is always local Ollama
default_llm_profile: ollama_local
```
Configuration Field Descriptions
| Field | Description |
|---|---|
| (dict key) | Unique profile name -- referenced in the CLI as `--llm-plan openai_planner` |
| `provider` | `ollama` or `openai` (supports all OpenAI-compatible endpoints via the OpenAI SDK) |
| `base_url` | LLM API endpoint (Gemini, Perplexity, etc. differ only in `base_url`) |
| `api_key` | `${environment_variable}` interpolation recommended -- never put keys directly in `workspace.yaml` |
| `model` | Model name for the provider |
| `is_external` | `true`: external service, HITL approval required / `false`: local, no approval needed |

What `provider: openai` means: it does not mean "only OpenAI works." It means the OpenAI Python SDK is used to call OpenAI-compatible APIs. Simply change the `base_url` to connect to Gemini, Perplexity, Azure OpenAI, vLLM, and more.
Environment Variable Setup
```bash
# OpenAI API key
export OPENAI_API_KEY="sk-proj-YOUR_KEY_HERE"

# Gemini API key
export GEMINI_API_KEY="AIza-YOUR_KEY_HERE"
```
Security: Do not put API keys directly in `workspace.yaml`. The `${OPENAI_API_KEY}` syntax reads values from environment variables at runtime.
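The `${VAR}` interpolation can be mimicked in a few lines of Python. This is a sketch of the general config-loader pattern, not euleragent's actual loader:

```python
import os
import re

def interpolate_env(value):
    """Replace ${VAR} placeholders with environment variable values.

    Sketch of the common pattern used by config loaders; euleragent's
    actual implementation may differ (e.g. error handling for unset vars).
    """
    return re.sub(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda m: os.environ.get(m.group(1), ""),
        value,
    )

os.environ["OPENAI_API_KEY"] = "sk-proj-demo"
print(interpolate_env("${OPENAI_API_KEY}"))  # sk-proj-demo
```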
Step 3: External Profile Approval -- Preparation
Profiles with `is_external: true` always require HITL approval. First, run without approval to observe the fallback behavior, then proceed with the approval process.
3.1 -- Running Without Approval (Verify Fallback)
```bash
euleragent run market-analyst \
  --task "Write an AI market analysis report" \
  --mode plan \
  --llm-plan openai_planner
```
```
[run] Starting plan mode for market-analyst
[run] LLM plan scope: profile 'openai_planner' (is_external=true)
[run] ⚠ External profile 'openai_planner' requires approval. Falling back to local default.
[approval] Created: kind=llm_profile_enable, tool=llm.external_call, profile=openai_planner
[llm] Generating plan with default provider (ollama_local)...
[run] Plan generated: plan.md (523 tokens)
[run] Run finalized: a1b2c3d4
```
Key behavior:
1. `openai_planner` has `is_external: true` -- approval is required
2. No approval yet -- the run falls back to local Ollama (and continues!)
3. A `kind: llm_profile_enable` approval record is created automatically
Fallback-never-abort principle: euleragent will never abort a run because an external LLM was not approved. It falls back to the local default profile to generate results and waits for approval.
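In code form the policy could look like the sketch below. The data shapes and function name are illustrative assumptions, not euleragent's real API:

```python
# Illustrative sketch of the fallback-never-abort policy.
# Profile/approval structures are hypothetical simplifications.
def select_llm(profile, approved_profiles, default="ollama_local"):
    """Return (profile_to_use, approval_request_or_None).

    External profiles that lack approval never abort the run:
    an approval record is created and the local default is used.
    """
    if not profile["is_external"]:
        return profile["name"], None
    if profile["name"] in approved_profiles:
        return profile["name"], None
    # Unapproved external profile: create a pending approval, fall back.
    request = {"kind": "llm_profile_enable", "profile": profile["name"]}
    return default, request

openai = {"name": "openai_planner", "is_external": True}
used, pending = select_llm(openai, approved_profiles=set())
print(used, pending["kind"])  # ollama_local llm_profile_enable
used, pending = select_llm(openai, approved_profiles={"openai_planner"})
print(used, pending)          # openai_planner None
```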
3.2 -- Check Pending Approvals
```bash
euleragent approve show
```
```
Pending Approvals:
ID      KIND                 TOOL                PROFILE         RISK  STATUS
ap_001  llm_profile_enable   llm.external_call   openai_planner  high  pending
```
3.3 -- View Approval Record Details
```bash
euleragent approve show ap_001
```
```json
{
  "id": "ap_001",
  "kind": "llm_profile_enable",
  "tool_name": "llm.external_call",
  "tool_params": {
    "scope": "plan",
    "profile": "openai_planner",
    "provider": "openai",
    "model": "gpt-4o"
  },
  "risk_level": "high",
  "side_effects": ["External LLM call"],
  "status": "pending"
}
```
Why this approval is needed:
- Data leakage: the task description and agent context are transmitted to an external service (api.openai.com)
- Cost: external API calls are billed
- Audit tracking: recording what data went external is mandatory
3.4 -- Batch-Approve Both External Profiles
Run the Gemini profile once as well to create an approval record:
```bash
euleragent run market-analyst \
  --task "Test" \
  --mode execute \
  --llm-final gemini_final
```
```
[run] ⚠ External profile 'gemini_final' requires approval. Falling back to local default.
[approval] Created: kind=llm_profile_enable, tool=llm.external_call, profile=gemini_final
```
Now process approvals for both external profiles at once:
```bash
euleragent approve accept-all --tool llm.external_call --actor "user:you"
```
```
Batch accepted: 2 approval(s)
  ap_001: llm_profile_enable (openai_planner)
  ap_002: llm_profile_enable (gemini_final)
```
`accept-all --tool`: batch-accepts only approvals for the `llm.external_call` tool. Approvals for other tools (`file.write`, `web.search`, etc.) are not affected.
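A sketch of what that filter does, over hypothetical approval records (the field names are assumptions for illustration):

```python
# Hypothetical sketch of filtered batch acceptance over an approval queue.
def accept_all(queue, tool=None, actor=None):
    """Mark every pending approval (optionally filtered by tool) as accepted."""
    accepted = []
    for ap in queue:
        if ap["status"] != "pending":
            continue
        if tool is not None and ap["tool"] != tool:
            continue  # approvals for other tools stay untouched
        ap["status"] = "accepted"
        ap["actor"] = actor
        accepted.append(ap["id"])
    return accepted

queue = [
    {"id": "ap_001", "tool": "llm.external_call", "status": "pending"},
    {"id": "ap_002", "tool": "llm.external_call", "status": "pending"},
    {"id": "ap_003", "tool": "file.write", "status": "pending"},
]
print(accept_all(queue, tool="llm.external_call", actor="user:you"))
# ['ap_001', 'ap_002'] -- ap_003 (file.write) stays pending
```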
Step 4: Phase 1 -- High-Quality Planning with OpenAI
Now that approvals are complete, let's execute the 3-Tier strategy.
Execution
```bash
euleragent run market-analyst \
  --task "Write a 2026 AI agent market analysis report. Include market size, key players, technology trends, and risk factors, and conclude with investment implications." \
  --mode plan \
  --llm-plan openai_planner
```
Expected Output
```
[run] Starting plan mode for market-analyst
[run] LLM plan scope: profile 'openai_planner' (is_external=true)
[run] ✓ Using approved profile 'openai_planner' for plan scope
[llm] Generating plan with openai_planner (gpt-4o)...
[run] Plan generated: 1 file(s)
  └─ plan.md (1847 tokens)
[run] Run finalized: p1_abc123
──────────────────────────────────────
Phase 1 complete
  Profile: openai_planner (gpt-4o)
  LLM calls: 1
  Cost: ~$0.03 (1.2K input + 1.8K output tokens)
──────────────────────────────────────
```
Verify the Generated Plan
```bash
cat .euleragent/runs/p1_abc123/artifacts/plan.md
```
```markdown
# AI Agent Market Analysis Report -- Execution Plan

## 1. Market Size Research
- Global AI agent market size (2024-2030 outlook)
- Segment analysis (enterprise, consumer, developer tools)

## 2. Key Player Analysis
- Big tech (OpenAI, Google, Anthropic, Microsoft)
- Startups (Cognition, Adept, Imbue, ...)
- Open-source ecosystem (LangChain, CrewAI, AutoGen)

## 3. Technology Trends
- Agent framework evolution (ReAct → Plan-and-Execute → Multi-Agent)
- Tool-use standardization
- Long-term memory and state management

## 4. Risk Factors
- Regulatory uncertainty (EU AI Act, US executive orders)
- Hallucination and reliability
- Cost structure and revenue models

## 5. Investment Implications
- Promising investment areas
- Risk mitigation strategies
```
Point: The plan created by GPT-4o is well structured, with specific sub-items. It serves as the roadmap for the subsequent Ollama execution.
Step 5: Phase 2 -- Bulk Execution with Ollama (Cost $0)
With the plan established, the actual writing is handled by local Ollama. When no profile flag is specified, default_llm_profile: ollama_local is used.
Execution
```bash
euleragent run market-analyst \
  --task "Following the plan above, write a draft of the AI agent market analysis report. Cover each section in at least three paragraphs." \
  --mode execute \
  --max-loops 5
```
Expected Output
```
[run] Starting execute mode for market-analyst
[run] No scoped profile → using default provider (ollama_local)
[llm] Loop 1: Generating initial draft with ollama_local (qwen3:32b)...
[tool] file.write → draft_report.md (waiting approval)
[approval] Created: tool_call, file.write
...
[llm] Loop 2: Revising section 1 with ollama_local...
[llm] Loop 3: Adding section 2 data with ollama_local...
[llm] Loop 4: Completing sections 3-4 with ollama_local...
[llm] Loop 5: Finalizing section 5 with ollama_local...
[run] Artifacts: draft_report.md (4200 tokens)
[run] Run finalized: p2_def456
──────────────────────────────────────
Phase 2 complete
  Profile: ollama_local (qwen3:32b)
  LLM calls: 5 (8 interactions total, including tool calls)
  Cost: $0.00 (local LLM -- completely free)
──────────────────────────────────────
```
Key Observations
```
LLM calls made in Phase 2:
Loop 1: Generate draft structure  ← Ollama ($0)
Loop 2: Revise/expand section 1   ← Ollama ($0)
Loop 3: Add section 2 data        ← Ollama ($0)
Loop 4: Complete sections 3-4     ← Ollama ($0)
Loop 5: Finish section 5          ← Ollama ($0)
─────────────────────────────
Total: 5 calls × $0 = $0.00

If every call had been GPT-4o:
Total: 5 calls × ~$0.03 = $0.15
```
This phase accounts for 70-80% of the total cost. Using local Ollama makes this cost zero.
Advantages of local Ollama:
- $0 cost -- free no matter how many calls
- Fast iteration -- no network latency (3-8 seconds per response locally)
- Data security -- sensitive content never leaves your machine
- Offline capable -- runs without network access
Step 6: Phase 3 -- Final Polish with Gemini
With the draft complete, the final polishing is handled by Gemini Flash.
Execution
```bash
euleragent run market-analyst \
  --task "Polish the existing draft (draft_report.md) to the level of a professional investment research report. Tighten the prose, strengthen the data-driven evidence, and reinforce the investment implications section." \
  --mode execute \
  --llm-final gemini_final
```
Expected Output
```
[run] Starting execute mode for market-analyst
[run] LLM final scope: profile 'gemini_final' (is_external=true)
[run] ✓ Using approved profile 'gemini_final' for final scope
[llm] Generating final deliverable with gemini_final (gemini-2.0-flash)...
[tool] file.write → final_report.md (waiting approval)
...
[run] Artifacts: final_report.md (5100 tokens)
[run] Run finalized: p3_ghi789
──────────────────────────────────────
Phase 3 complete
  Profile: gemini_final (gemini-2.0-flash)
  LLM calls: 2
  Cost: ~$0.02 (4.2K input + 5.1K output tokens)
──────────────────────────────────────
```
Verify the Final Result
```bash
# Compare the Phase 2 draft with the Phase 3 final version
wc -w .euleragent/runs/p2_def456/artifacts/draft_report.md
# → ~2,500 words
wc -w .euleragent/runs/p3_ghi789/artifacts/final_report.md
# → ~3,200 words (including the added material)
```
Why Gemini Flash was chosen:
- 50-70% cheaper than GPT-4o while maintaining comparable quality
- Strong at processing long documents (1-million-token context)
- Simple to configure via the OpenAI-compatible API (just change the `base_url`)
Step 7: Cost Analysis -- Verifying Actual Cost Savings
Tracking Costs via Audit Logs
Check the profile information in each Phase's input.json:
```bash
# Phase 1 -- confirm OpenAI was used
python -m json.tool .euleragent/runs/p1_abc123/input.json
```
```json
{
  "agent": "market-analyst",
  "task": "Write a 2026 AI agent market analysis report...",
  "mode": "plan",
  "llm_plan_profile": "openai_planner"
}
```
```bash
# Phase 2 -- confirm Ollama was used (no profile flag = default)
python -m json.tool .euleragent/runs/p2_def456/input.json
```
```json
{
  "agent": "market-analyst",
  "task": "Following the plan above...",
  "mode": "execute"
}
```
If the llm_plan_profile/llm_final_profile field is absent, the default profile (Ollama) was used.
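That check can be scripted. The helper below is hypothetical; the only assumption is the input.json fields shown above:

```python
import json

def profile_used(input_json_text, scope, default="ollama_local"):
    """Report which profile a run used for a scope, based on input.json.

    If the llm_plan_profile / llm_final_profile field is absent,
    the workspace default profile was used.
    """
    data = json.loads(input_json_text)
    return data.get(f"llm_{scope}_profile", default)

phase2 = '{"agent": "market-analyst", "mode": "execute"}'
print(profile_used(phase2, "final"))  # ollama_local
phase3 = '{"agent": "market-analyst", "llm_final_profile": "gemini_final"}'
print(profile_used(phase3, "final"))  # gemini_final
```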
```bash
# Phase 3 -- confirm Gemini was used
python -m json.tool .euleragent/runs/p3_ghi789/input.json
```
```json
{
  "agent": "market-analyst",
  "task": "Polish the existing draft...",
  "mode": "execute",
  "llm_final_profile": "gemini_final"
}
```
Overall Cost Summary
```
┌──────────────────────────────────────────────────────────────┐
│                 3-Tier strategy cost analysis                │
├──────────┬───────────────┬───────────┬─────────┬────────────┤
│ Phase    │ Profile       │ LLM calls │ Cost    │ Notes      │
├──────────┼───────────────┼───────────┼─────────┼────────────┤
│ 1. Plan  │ openai_planner│ 1         │ $0.03   │ HQ plan    │
│ 2. Exec  │ ollama_local  │ 5         │ $0.00   │ key saving!│
│ 3. Final │ gemini_final  │ 2         │ $0.02   │ HQ final   │
├──────────┼───────────────┼───────────┼─────────┼────────────┤
│ Total    │               │ 8         │ $0.05   │            │
└──────────┴───────────────┴───────────┴─────────┴────────────┘
Compare: all GPT-4o → 8 calls × $0.03 = $0.24
Compare: all Ollama → 8 calls × $0.00 = $0.00 (lower quality)
3-Tier savings: ($0.24 - $0.05) / $0.24 = 79% saved
```
The difference amplifies at scale. Running 100 tasks:
- All external LLM: 800 calls × $0.03 = $24.00
- 3-Tier strategy: planning 100 calls ($3) + execution 500 calls ($0) + final 200 calls ($4) = $7.00 (71% savings)
- For complex tasks with a higher share of intermediate execution, savings can exceed 90%.
Step 8: Approval Workflow Details
8.1 -- Changing the Profile During Approval with --edit-params
An administrator can change the requested profile to a cheaper one when approving, considering costs.
```bash
# A teammate requested GPT-4o, but...
euleragent run market-analyst \
  --task "Competitor analysis" --mode plan --llm-plan openai_planner
# → ap_010 pending approval (openai_planner requested)

# The administrator approves but switches to Gemini Flash (cheaper)
euleragent approve accept ap_010 \
  --actor "user:you" \
  --edit-params '{"profile": "gemini_final"}'
```
```
Accepted (with edits): ap_010
  Original Profile: openai_planner (gpt-4o)
  Applied Profile:  gemini_final (gemini-2.0-flash)
```
The changed profile is applied on re-run:
```bash
euleragent run market-analyst \
  --task "Competitor analysis" --mode plan --llm-plan openai_planner
```
```
[run] ✓ Using approved profile 'gemini_final' for plan scope (edited from 'openai_planner')
```
Audit tracking: Both the original request (`tool_params.profile`) and the modified value (`final_params.profile`) are recorded in the approval record.
8.2 -- Behavior When Approval is Denied
```bash
euleragent approve deny ap_011
```
A denied profile is not used, and a new approval record is created on the next run. The run is never aborted -- it always falls back to local.
Step 9: Automation with Dynamic Mode (2-Tier Variant)
Instead of three separate runs, Dynamic mode allows OpenAI to design the plan and Ollama to execute automatically.
Execution Flow of Dynamic Mode
```
--dynamic --llm-plan openai_planner
        │
        ▼
1. system.plan_workflow call      ← openai_planner (GPT-4o)
   "Design the workflow in 3 phases"
        │
        ▼
2. Phase 0: Market research       ← ollama_local (default profile)
        │
        ▼
3. Phase 1: Draft writing         ← ollama_local (default profile)
        │
        ▼
4. Phase 2: Review/revision       ← ollama_local (default profile)
```
Execution
```bash
euleragent run market-analyst \
  --task "Write a 2026 AI agent market analysis report" \
  --mode plan \
  --dynamic \
  --llm-plan openai_planner \
  --max-loops 10
```
Expected Output
```
[run] Starting dynamic workflow for market-analyst
[run] LLM plan scope: openai_planner → applied only to system.plan_workflow

# Workflow design -- OpenAI
[llm:plan] Generating workflow with openai_planner (gpt-4o)...
[workflow] 3 phases planned:
  Phase 0: Market Research (plan mode)
  Phase 1: Draft Report (execute mode)
  Phase 2: Quality Review (plan mode)

# Each phase -- default Ollama
[phase:0] Running 'Market Research' with default provider (ollama_local)
[phase:1] Running 'Draft Report' with default provider (ollama_local)
[phase:2] Running 'Quality Review' with default provider (ollama_local)
[run] Dynamic workflow completed: dyn_jkl012
```
Dynamic Mode Cost
| Phase | Profile | Calls | Cost |
|---|---|---|---|
| system.plan_workflow | openai_planner (GPT-4o) | 1 | $0.03 |
| Phase 0-2 | ollama_local | 6-8 | $0.00 |
| Total | | 7-9 | $0.03 |
2-Tier vs 3-Tier: Dynamic mode lacks the Gemini final polishing step, so costs are lower, but the final quality may be lower than the 3-Tier approach. Choose based on your use case.
Step 10: agent.yaml Profile Override and Priority
You can set a different default LLM profile per agent.
agent.yaml Configuration
```yaml
# .euleragent/agents/market-analyst/agent.yaml
name: market-analyst
template: personal-assistant
llm:
  profile: ollama_local  # this agent's default profile
```
```yaml
# .euleragent/agents/premium-writer/agent.yaml
name: premium-writer
template: marketing-expert
llm:
  profile: openai_planner  # this agent always uses OpenAI
```
Priority Order
```
1. CLI option: --llm-plan / --llm-final       (highest priority)
        ▼
2. agent.yaml: llm.profile                    (per-agent default)
        ▼
3. workspace.yaml: default_llm_profile        (workspace default)
```
```bash
# agent.yaml sets openai_planner, but the CLI specifies ollama_local
euleragent run premium-writer --mode plan --llm-plan ollama_local --task "..."
# → uses ollama_local (CLI wins)

# Run without any CLI option
euleragent run premium-writer --mode plan --task "..."
# → uses openai_planner (agent.yaml setting)

# An agent whose agent.yaml has no llm.profile
euleragent run market-analyst --mode plan --task "..."
# → uses ollama_local (workspace default)
```
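The three-level chain reduces to a one-liner. This sketch is hypothetical (euleragent's actual resolution code is not shown in this tutorial):

```python
# Hypothetical sketch of the three-level priority chain; the helper name
# and argument shapes are illustrative, not euleragent's actual code.
def effective_profile(cli_profile=None, agent_profile=None,
                      workspace_default="ollama_local"):
    """CLI flag > agent.yaml llm.profile > workspace default_llm_profile."""
    return cli_profile or agent_profile or workspace_default

# premium-writer (agent.yaml: openai_planner), CLI overrides:
print(effective_profile("ollama_local", "openai_planner"))  # ollama_local
# premium-writer, no CLI flag:
print(effective_profile(None, "openai_planner"))            # openai_planner
# market-analyst, no agent-level profile:
print(effective_profile())                                  # ollama_local
```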
Step 11: Tracking LLM Routing in Audit Logs
Checking with euleragent logs
```bash
euleragent logs p1_abc123
```
```
Run: p1_abc123
Agent: market-analyst
Mode: plan
Task: Write a 2026 AI agent market analysis report...
LLM Plan: openai_planner (gpt-4o)
LLM Final: (default)

--- Approvals ---
ap_001: llm_profile_enable (llm.external_call) → accepted

--- Artifacts ---
plan.md (1847 tokens)
```
external_transmission.jsonl Audit Log
External LLM calls are recorded in `external_transmission.jsonl`:
```bash
cat .euleragent/runs/p1_abc123/artifacts/external_transmission.jsonl
```
```json
{
  "type": "llm_call",
  "provider": "openai",
  "model": "gpt-4o",
  "profile": "openai_planner",
  "scope": "plan",
  "endpoint": "https://api.openai.com/v1/chat/completions",
  "tokens_sent": 1200,
  "tokens_received": 1847,
  "timestamp": "2026-02-24T15:32:05Z"
}
```
```bash
cat .euleragent/runs/p3_ghi789/artifacts/external_transmission.jsonl
```
```json
{
  "type": "llm_call",
  "provider": "openai",
  "model": "gemini-2.0-flash",
  "profile": "gemini_final",
  "scope": "final",
  "endpoint": "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions",
  "tokens_sent": 4200,
  "tokens_received": 5100,
  "timestamp": "2026-02-24T15:45:20Z"
}
```
Phase 2 (Ollama) runs have no `external_transmission.jsonl`. Local LLM calls are not external transmissions, so they are not recorded in this log. This is the security and cost advantage of local LLMs.
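Those per-run records can be aggregated into a cost report. The sketch below relies only on the JSONL fields shown above; the per-1K-token prices are illustrative assumptions, not official rates:

```python
import json

# Illustrative (input, output) prices per 1K tokens -- not official rates.
PRICES = {"gpt-4o": (0.0025, 0.01), "gemini-2.0-flash": (0.0001, 0.0004)}

def summarize(jsonl_lines):
    """Sum calls and estimated cost per profile from external_transmission.jsonl."""
    totals = {}
    for line in jsonl_lines:
        rec = json.loads(line)
        inp, out = PRICES.get(rec["model"], (0.0, 0.0))
        cost = (rec["tokens_sent"] / 1000 * inp
                + rec["tokens_received"] / 1000 * out)
        entry = totals.setdefault(rec["profile"], {"calls": 0, "cost": 0.0})
        entry["calls"] += 1
        entry["cost"] += cost
    return totals

sample = ['{"profile": "openai_planner", "model": "gpt-4o", '
          '"tokens_sent": 1200, "tokens_received": 1847}']
print(summarize(sample))
```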
Event Stream
The Runner emits events during LLM profile routing:
| Event | When Emitted | Phase 1 | Phase 2 | Phase 3 |
|---|---|---|---|---|
| `llm.profile.applied` | Approved external profile applied | Emitted | - | Emitted |
| `llm.profile.fallback` | Unapproved external profile falls back to local | (On first run) | - | - |
3-Tier Strategy Summary -- Full Flow
```
┌─────────────────────────────────────────────────────────────┐
│                     3-Tier LLM strategy                     │
│                                                             │
│ Step 1: Configure 3 profiles in workspace.yaml              │
│   ollama_local    (local, free)                             │
│   openai_planner  (external, high-quality plans)            │
│   gemini_final    (external, high-quality final output)     │
│        │                                                    │
│        ▼                                                    │
│ Step 2: Approve the external profiles                       │
│   approve accept-all --tool llm.external_call               │
│        │                                                    │
│        ▼                                                    │
│ Step 3: 3-phase execution                                   │
│                                                             │
│ Phase 1 ─── plan mode + --llm-plan openai_planner ───┐      │
│   [OpenAI] 1 call, $0.03                             │      │
│   "High-quality planning"                            │      │
│        │                                             │      │
│        ▼                                             │      │
│ Phase 2 ─── execute mode (no profile flag) ──────────┤      │
│   [Ollama] 5-10 calls, $0.00                         │      │
│   "Bulk execution -- most of the savings are here!"  │      │
│        │                                             │      │
│        ▼                                             │      │
│ Phase 3 ─── execute mode + --llm-final gemini_final ─┘      │
│   [Gemini] 1-2 calls, $0.02                                 │
│   "Final polish"                                            │
│        │                                                    │
│        ▼                                                    │
│ Total cost: $0.05 (vs $0.24 all-external → 79-90% saved)    │
└─────────────────────────────────────────────────────────────┘
```
Frequently Asked Questions
Q: Why use provider: openai for Gemini?
euleragent uses the OpenAI Python SDK to call all OpenAI-compatible endpoints. Google Gemini provides an OpenAI-compatible API, so you just set the base_url to the Gemini endpoint and the same SDK handles the call. The same applies to Perplexity, Azure OpenAI, vLLM, and more.
Q: Does a local profile (is_external: false) also require approval?
No. Profiles with is_external: false are used immediately without approval. HITL approval is only required when data is transmitted to an external service.
Q: Is the approval permanent? Do I need to re-approve every time?
Once approved (accept), you can use the same profile+scope combination in subsequent runs without re-approval. The Runner finds and reuses matching existing approvals from the ApprovalQueue.
Q: What if I want to use a different profile for Phase 2 instead of Ollama?
Specifying the `--llm-final` option applies that profile to all LLM calls in execute mode. If you want to run Phase 2 with Gemini:
```bash
euleragent run market-analyst --mode execute --llm-final gemini_final --task "..."
```
In this case, Phase 2 and Phase 3 merge into a 2-Phase strategy.
Q: What happens if I don't configure llm_profiles?
If llm_profiles is not configured, the default llm_profiles + default_llm_profile settings from the generated workspace.yaml are used as-is. The legacy format (default_provider + ollama: + openai: blocks) is still supported for backward compatibility, but is no longer used in newly created workspaces.
Q: Can I add Perplexity too?
Yes. Use `provider: openai` with Perplexity's `base_url`:
```yaml
llm_profiles:
  perplexity_sonar:
    provider: openai
    base_url: https://api.perplexity.ai
    api_key: ${PPLX_API_KEY}
    model: sonar
    is_external: true
```
See llm_providers.md for the full provider support matrix.
Next step: This concludes the euleragent Basic tutorial series. Next, refer to the Advanced Pattern Tutorials or the Custom Pattern Authoring Guide.