EulerPress

Enterprise Document Translation & AI Training Data Tool

A Local-First CLI translation tool that precisely translates only prose text while perfectly preserving non-translatable elements such as code blocks, math expressions, and URLs. From enterprise document localization to AI training JSONL data translation, all handled in a single pipeline.

Open Source

Core Features

Intelligent document translation that never touches code or math

Precision-Preserving Translation

A markdown-it-py based, fully CommonMark-compliant parser automatically identifies and preserves code fences, inline code, math expressions, front matter, and raw HTML. Only prose text is translated.

Supported Formats Markdown, HTML, Plain Text, JSONL
Preserved Elements Code blocks, inline code, math ($...$, $$...$$), URLs, front matter
Parser markdown-it-py (CommonMark 100%), BeautifulSoup4 (HTML)

Local-First Architecture

Uses Ollama local LLM as the default translation engine, ensuring confidential enterprise documents never leave your network. OpenAI API is also available as an option.

Translation Engine Ollama (local), OpenAI (optional)
Data Security All processing completed locally, no external transmission
Configuration YAML declarative config, CLI override support

AI Training Data Translation

A dedicated pipeline for large-scale translation of JSONL training data

Traindata Pipeline

The eulerpress traindata command translates JSONL training data at high speed using Ollama-based concurrent HTTP requests.

  • Math Preservation: Protects LaTeX math expressions ($...$, $$...$$, \(...\), \[...\]) with placeholders, then restores them.
  • Concurrent Processing: ThreadPoolExecutor-based multi-worker, per-record parallel translation.
  • Incremental Output: Writes to file immediately upon record completion; results are preserved even on interruption.
  • Resume Support: Skips existing output records and translates only new ones.

Quality Assurance

Automatically validates translation quality and flags problematic results.

  • Translation Validation: Length ratio checks, number-only detection, placeholder count verification.
  • Format Preservation Scoring: Detects damage to code fences, math expressions, and URLs.
  • Auto Chunking: Splits long text at sentence boundaries to maintain translation quality.
  • Glossary Search: Tavily-based domain glossary ensures consistent translations.

CLI Reference

Five core commands cover the entire document translation workflow

translate

Translates documents according to a YAML config file. Source directory, target language, model, and more can be overridden via CLI.

traindata

Concurrently translates JSONL training data with Ollama. Supports math preservation, incremental output, and resume.

validate

Validates YAML config files without executing. Outputs 3-line format errors if issues are found.

plan

Dry run: previews the number of target files, segments, and estimated token count before translation.

doctor

Checks system dependencies (Ollama binary, server connection, available models).

Architecture

Modular design for independently extensible formats, engines, and quality assurance

config.yaml → Loader → Validator → EulerPressConfig ↓ Translator ├── discover_files() ├── for each file: │ ├── get_parser(ext) → Parser │ ├── parser.parse() → [Segment] │ ├── provider.translate(chunks) │ └── parser.render(segments) → output └── write output (mirror structure)

Package Structure

config YAML schema, loading, validation
parsers Markdown, HTML, Plain Text parsers
engine Translation providers (Ollama, OpenAI, Fake)
core Orchestrator, planner, doctor, errors
scoring Quality scoring, model selection, API evaluation
traindata JSONL training data translation (Ollama concurrent)
glossary Domain glossary search (Tavily)

Technical Specifications

Language Python 3.12+
Markdown Parser markdown-it-py + mdformat (CommonMark 100%)
HTML Parser BeautifulSoup4 + lxml
Translation Engine Ollama (local), OpenAI (cloud)
Chunking Strategy sentence, whitespace, hard
Error Format 3-line format (Category / Fix / See)
License MIT (including all dependencies)

Tutorials

Get up to speed with EulerPress through step-by-step guides

Tutorials coming soon.

Installation & Getting Started

Install EulerPress and start your first translation

Installation

pip install eulerpress

# Install Ollama local LLM
ollama pull gemma3:27b

Requirements

Python 3.12+

Ollama (for local translation)

Automate Your Document Translation with EulerPress

Local-first, code-preserving, enterprise-grade translation tool.

Get Started on GitHub Contact Us