EulerPress – Enterprise Document Translation Tool

Core Features

Intelligent document translation that never touches code or math

Precision-Preserving Translation

A markdown-it-py based, fully CommonMark-compliant parser automatically identifies and preserves code fences, inline code, math expressions, front matter, and raw HTML. Only prose text is translated.

Supported Formats	Markdown, HTML, Plain Text, JSONL
Preserved Elements	Code blocks, inline code, math ($...$, $$...$$), URLs, front matter
Parser	markdown-it-py (CommonMark 100%), BeautifulSoup4 (HTML)
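
To make the idea of "only prose is translated" concrete, here is a deliberately simplified, standard-library-only sketch. It is not the markdown-it-py parser that EulerPress uses and it is far from CommonMark-compliant; it only separates front matter and fenced code from prose, line by line. The function name `classify_lines` is hypothetical.

```python
from typing import Iterator


def classify_lines(markdown: str) -> Iterator[tuple[str, str]]:
    """Yield (kind, line) pairs; kind is 'front_matter', 'code', or 'prose'.

    Simplified illustration only: handles a leading '---' front matter block
    and backtick/tilde fences, nothing else.
    """
    lines = markdown.splitlines()
    in_front_matter = bool(lines) and lines[0].strip() == "---"
    fence = None  # marker that opened the current code fence, if any
    for index, line in enumerate(lines):
        stripped = line.strip()
        if in_front_matter:
            yield "front_matter", line
            # A second '---' line closes the front matter block.
            if index > 0 and stripped == "---":
                in_front_matter = False
        elif fence is not None:
            yield "code", line
            # A line made only of the fence character closes the fence.
            if stripped.startswith(fence) and set(stripped) == {fence[0]}:
                fence = None
        elif stripped.startswith(("```", "~~~")):
            fence = stripped[:3]
            yield "code", line
        else:
            yield "prose", line
```

Only the lines tagged `prose` would be handed to a translation engine; everything else is copied through unchanged.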

Local-First Architecture

Uses a local LLM served by Ollama as the default translation engine, so confidential enterprise documents never leave your network. The OpenAI API is available as an optional engine.

Translation Engine	Ollama (local), OpenAI (optional)
Data Security	With the default Ollama engine, all processing runs locally with no external transmission
Configuration	YAML declarative config, CLI override support

AI Training Data Translation

A dedicated pipeline for large-scale translation of JSONL training data

Traindata Pipeline

The `eulerpress traindata` command translates JSONL training data at high speed using Ollama-based concurrent HTTP requests.

Math Preservation: Protects LaTeX math expressions ($...$, $$...$$, \(...\), \[...\]) with placeholders, then restores them.
Concurrent Processing: ThreadPoolExecutor-based multi-worker, per-record parallel translation.
Incremental Output: Writes to file immediately upon record completion; results are preserved even on interruption.
Resume Support: Skips existing output records and translates only new ones.
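
The placeholder technique behind math preservation can be sketched as follows. This is a minimal illustration under stated assumptions, not EulerPress's actual code: the regular expression, the `__MATH_n__` placeholder format, and the function names `protect_math` and `restore_math` are all hypothetical.

```python
import re

# Display forms come first so "$$" is not read as two inline delimiters.
# The pattern is an assumption; real-world LaTeX needs more care (escaped
# dollar signs, currency amounts, nested environments).
MATH_PATTERN = re.compile(
    r"\$\$.+?\$\$|\\\[.+?\\\]|\\\(.+?\\\)|\$[^$\n]+?\$",
    re.DOTALL,
)
PLACEHOLDER = re.compile(r"__MATH_(\d+)__")  # hypothetical placeholder format


def protect_math(text: str) -> tuple[str, list[str]]:
    """Replace each math expression with a numbered placeholder."""
    saved: list[str] = []

    def stash(match: re.Match[str]) -> str:
        saved.append(match.group(0))
        return f"__MATH_{len(saved) - 1}__"

    return MATH_PATTERN.sub(stash, text), saved


def restore_math(text: str, saved: list[str]) -> str:
    """Put the original expressions back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda match: saved[int(match.group(1))], text)
```

The masked text is what the model sees; once the translation comes back, `restore_math` reinserts the untouched LaTeX.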

Quality Assurance

Automatically validates translation quality and flags problematic results.

Translation Validation: Length ratio checks, number-only detection, placeholder count verification.
Format Preservation Scoring: Detects damage to code fences, math expressions, and URLs.
Auto Chunking: Splits long text at sentence boundaries to maintain translation quality.
Glossary Search: Tavily-based domain glossary ensures consistent translations.
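
The three validation checks named above could look roughly like this. It is a sketch only: the thresholds, the `__MATH_n__` placeholder format, and the names `ValidationResult` and `validate_translation` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

PLACEHOLDER = re.compile(r"__MATH_\d+__")  # hypothetical placeholder format


@dataclass
class ValidationResult:
    ok: bool
    problems: list[str] = field(default_factory=list)


def validate_translation(
    source: str,
    translated: str,
    min_ratio: float = 0.3,  # assumed threshold
    max_ratio: float = 3.0,  # assumed threshold
) -> ValidationResult:
    problems: list[str] = []

    # Length ratio check: a very short or very long result is suspicious.
    ratio = len(translated) / max(len(source), 1)
    if not min_ratio <= ratio <= max_ratio:
        problems.append(f"length ratio {ratio:.2f} out of range")

    # Number-only detection: the source has letters but the translation,
    # ignoring placeholders, has none.
    prose = PLACEHOLDER.sub("", translated)
    if any(c.isalpha() for c in source) and not any(c.isalpha() for c in prose):
        problems.append("translation contains no letters")

    # Placeholder count verification: every protected span must survive.
    if len(PLACEHOLDER.findall(source)) != len(PLACEHOLDER.findall(translated)):
        problems.append("placeholder count mismatch")

    return ValidationResult(ok=not problems, problems=problems)
```

A result with `ok=False` would be flagged for review rather than silently written to the output.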

CLI Reference

Five core commands cover the entire document translation workflow

`translate`

Translates documents according to a YAML config file. Source directory, target language, model, and more can be overridden via CLI.
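
One common way to layer CLI overrides on top of a loaded config is shown below. The flag names (`--source-dir`, `--target-lang`, `--model`) and the config keys are hypothetical, chosen only to mirror the options mentioned above; check the tool's own `--help` output for the real ones.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All flag names here are illustrative assumptions, not EulerPress's CLI.
    parser = argparse.ArgumentParser(prog="translate-sketch")
    parser.add_argument("--config", default="config.yaml")
    parser.add_argument("--source-dir")
    parser.add_argument("--target-lang")
    parser.add_argument("--model")
    return parser


def apply_overrides(config: dict, args: argparse.Namespace) -> dict:
    """Return a copy of config where explicitly passed CLI values win."""
    merged = dict(config)
    for key in ("source_dir", "target_lang", "model"):
        value = getattr(args, key)
        if value is not None:  # only override what the user actually passed
            merged[key] = value
    return merged
```

Leaving the argparse defaults at `None` is what lets the sketch tell "not passed" apart from "passed", so the YAML value is kept unless the user overrides it.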

`traindata`

Concurrently translates JSONL training data with Ollama. Supports math preservation, incremental output, and resume.
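
The combination of concurrent workers, incremental output, and resume can be sketched with the standard library alone. This is an illustration, not the actual `traindata` implementation: the function name `translate_jsonl`, the `id` and `text` field names, and the injected `translate` callable (standing in for the Ollama HTTP request) are assumptions.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable


def translate_jsonl(
    src: Path,
    dst: Path,
    translate: Callable[[str], str],  # stand-in for the real engine call
    workers: int = 4,
    id_key: str = "id",      # assumed record field
    text_key: str = "text",  # assumed record field
) -> int:
    """Translate records missing from dst; return how many were written."""
    # Resume: collect the ids that already exist in the output file.
    done = set()
    if dst.exists():
        with dst.open(encoding="utf-8") as handle:
            done = {json.loads(line)[id_key] for line in handle if line.strip()}

    with src.open(encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    pending = [record for record in records if record[id_key] not in done]

    def work(record: dict) -> dict:
        return {**record, text_key: translate(record[text_key])}

    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool, \
            dst.open("a", encoding="utf-8") as out:
        futures = [pool.submit(work, record) for record in pending]
        # Incremental output: write and flush each record as it completes.
        for future in as_completed(futures):
            out.write(json.dumps(future.result(), ensure_ascii=False) + "\n")
            out.flush()
            written += 1
    return written
```

Because only the main thread writes to the file, no lock is needed, and records may land in the output in a different order than in the input.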

`validate`

Validates YAML config files without running a translation. Any issues are reported in the 3-line error format (Category / Fix / See).

`plan`

Dry run: previews the number of target files, segments, and estimated token count before translation.

`doctor`

Checks system dependencies (Ollama binary, server connection, available models).

Architecture

Modular design for independently extensible formats, engines, and quality assurance

config.yaml → Loader → Validator → EulerPressConfig
    ↓
Translator
├── discover_files()
├── for each file:
│   ├── get_parser(ext) → Parser
│   ├── parser.parse() → [Segment]
│   ├── provider.translate(chunks)
│   └── parser.render(segments) → output
└── write output (mirror structure)
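
The flow above can be sketched as a small Python program. The names `discover_files`, `get_parser`, `parse`, `render`, `translate`, and `Segment` are taken from the diagram; everything else (the `Segment` fields, the `PlainTextParser` and `FakeProvider` bodies, and the `translate_tree` function) is an assumption made to keep the example runnable.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class Segment:
    text: str
    translatable: bool  # assumed field: False for code, math, blank lines


class Provider(Protocol):
    def translate(self, chunks: list[str]) -> list[str]: ...


class PlainTextParser:
    """Toy parser: every non-blank line is a translatable segment."""

    def parse(self, source: str) -> list[Segment]:
        return [Segment(line, bool(line.strip())) for line in source.split("\n")]

    def render(self, segments: list[Segment]) -> str:
        return "\n".join(segment.text for segment in segments)


class FakeProvider:
    """Stand-in engine that tags text instead of translating it."""

    def translate(self, chunks: list[str]) -> list[str]:
        return [f"[translated] {chunk}" for chunk in chunks]


PARSERS = {".txt": PlainTextParser}  # extension → parser class


def get_parser(ext: str) -> PlainTextParser:
    return PARSERS[ext]()


def discover_files(root: Path) -> list[Path]:
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in PARSERS)


def translate_tree(source_root: Path, target_root: Path, provider: Provider) -> list[Path]:
    outputs = []
    for path in discover_files(source_root):
        parser = get_parser(path.suffix)
        segments = parser.parse(path.read_text(encoding="utf-8"))
        todo = [segment for segment in segments if segment.translatable]
        translated = provider.translate([segment.text for segment in todo])
        for segment, text in zip(todo, translated):
            segment.text = text
        # Mirror the source directory structure under the target root.
        out = target_root / path.relative_to(source_root)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(parser.render(segments), encoding="utf-8")
        outputs.append(out)
    return outputs
```

Swapping `FakeProvider` for a real engine, or registering another parser in `PARSERS`, leaves the orchestration loop unchanged, which is the point of the modular design.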

Package Structure

`config`	YAML schema, loading, validation
`parsers`	Markdown, HTML, Plain Text parsers
`engine`	Translation providers (Ollama, OpenAI, Fake)
`core`	Orchestrator, planner, doctor, errors
`scoring`	Quality scoring, model selection, API evaluation
`traindata`	JSONL training data translation (Ollama concurrent)
`glossary`	Domain glossary search (Tavily)

Technical Specifications

Language	Python 3.12+
Markdown Parser	markdown-it-py + mdformat (CommonMark 100%)
HTML Parser	BeautifulSoup4 + lxml
Translation Engine	Ollama (local), OpenAI (cloud)
Chunking Strategy	sentence, whitespace, hard
Error Format	3-line format (Category / Fix / See)
License	MIT (including all dependencies)
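
The three chunking strategies listed above (sentence, whitespace, hard) can be read as a fallback chain. The sketch below is one possible interpretation, not the tool's documented behavior: it splits at sentence boundaries first, falls back to whitespace for overlong sentences, and cuts at a fixed width only when a single word exceeds the limit. The function names and the character-based limit are assumptions.

```python
import re


def split_hard(text: str, limit: int) -> list[str]:
    """Last resort: cut at fixed character offsets."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def pack(pieces: list[str], limit: int) -> list[str]:
    """Greedily join pieces with spaces into chunks of at most limit chars."""
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def chunk_text(text: str, limit: int = 1000) -> list[str]:
    # Sentence strategy: split after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: list[str] = []
    for sentence in sentences:
        if len(sentence) <= limit:
            pieces.append(sentence)
            continue
        # Whitespace strategy for overlong sentences, hard cuts for
        # words that are themselves longer than the limit.
        for word in sentence.split():
            pieces.extend(split_hard(word, limit) if len(word) > limit else [word])
    return pack(pieces, limit)
```

Every returned chunk is at most `limit` characters long, and sentences are kept whole whenever they fit.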

Tutorials

Get up to speed with EulerPress through step-by-step guides

Tutorials coming soon.

Installation & Getting Started

Install EulerPress and start your first translation

Installation

pip install eulerpress

# Pull a model for the local Ollama engine (requires Ollama to be installed)
ollama pull gemma3:27b

Requirements

Python 3.12+

Ollama (for local translation)

GitHub

eulerwa/eulerpress

Automate Your Document Translation with EulerPress

Local-first, code-preserving, enterprise-grade translation tool.
