Enterprise Document Translation & AI Training Data Tool
A Local-First CLI translation tool that precisely translates only prose text while perfectly preserving non-translatable elements such as code blocks, math expressions, and URLs. From enterprise document localization to AI training JSONL data translation, all handled in a single pipeline.
Open Source
Intelligent document translation that never touches code or math
A markdown-it-py based, fully CommonMark-compliant parser automatically identifies and preserves code fences, inline code, math expressions, front matter, and raw HTML. Only prose text is translated.
| Supported Formats | Markdown, HTML, Plain Text, JSONL |
|---|---|
| Preserved Elements | Code blocks, inline code, math ($...$, $$...$$), URLs, front matter |
| Parser | markdown-it-py (CommonMark 100%), BeautifulSoup4 (HTML) |
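The preserve-then-translate idea can be illustrated with a minimal sketch. This is not EulerPress's implementation: the tool uses a markdown-it-py parser, while the example below swaps in simple regular expressions to stay dependency-free. The names `PROTECTED`, `mask`, `unmask`, and the `__KEEP_n__` placeholder format are hypothetical.

```python
import re

# Spans that must not be translated. Alternatives are tried left to right,
# so fenced code and display math are listed before their inline forms.
PROTECTED = re.compile(
    r"```.*?```"              # fenced code block
    r"|\$\$.*?\$\$"           # display math
    r"|`[^`\n]+`"             # inline code
    r"|\$[^$\n]+\$"           # inline math
    r"|https?://[^\s)>\]]+",  # URL
    re.DOTALL,
)


def mask(text):
    """Replace protected spans with placeholders; return masked text and the spans."""
    spans = []

    def stash(match):
        spans.append(match.group(0))
        return f"__KEEP_{len(spans) - 1}__"  # hypothetical placeholder format

    return PROTECTED.sub(stash, text), spans


def unmask(text, spans):
    """Put the original spans back in place of their placeholders."""
    return re.sub(r"__KEEP_(\d+)__", lambda m: spans[int(m.group(1))], text)


source = "Run `pip install x` then see https://example.com for $E=mc^2$."
masked, spans = mask(source)
# Only `masked` would be sent to the translation engine.
restored = unmask(masked, spans)
```

Only the masked text reaches the translation engine, so the protected spans come back byte-for-byte as long as the placeholders survive translation.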
Uses a local LLM served by Ollama as the default translation engine, so confidential enterprise documents never leave your network. The OpenAI API is also available as an option.
| Translation Engine | Ollama (local), OpenAI (optional) |
|---|---|
| Data Security | With Ollama, all processing is completed locally, with no external transmission |
| Configuration | YAML declarative config, CLI override support |
A dedicated pipeline for large-scale translation of JSONL training data
The eulerpress traindata command translates JSONL training data by sending concurrent HTTP requests to Ollama.
Automatically validates translation quality and flags problematic results.
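As a rough sketch of how concurrent JSONL translation with a basic quality check might look, assuming a thread pool and a pluggable `translate` callable in place of the real HTTP call to Ollama. The function names, the `fields` parameter, and the two checks shown are illustrative assumptions, not the tool's actual interface or validation rules.

```python
import json
import re
from concurrent.futures import ThreadPoolExecutor

MATH = re.compile(r"\$\$.*?\$\$|\$[^$\n]+\$", re.DOTALL)


def flag_problems(source, target):
    """Return a list of reasons why a translation looks suspicious (hypothetical checks)."""
    problems = []
    if not target.strip():
        problems.append("empty translation")
    if MATH.findall(source) != MATH.findall(target):
        problems.append("math changed")
    return problems


def translate_jsonl(lines, translate, fields=("text",), workers=4):
    """Translate the given string fields of each JSONL record concurrently.

    `translate` is any callable str -> str; in practice it would wrap an
    HTTP request to a local model. Output order matches input order.
    """
    def handle(line):
        record = json.loads(line)
        for key in fields:
            if isinstance(record.get(key), str):
                record[key] = translate(record[key])
        return json.dumps(record, ensure_ascii=False)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, lines))


rows = ['{"text": "hello", "id": 1}', '{"text": "world", "id": 2}']
translated = translate_jsonl(rows, str.upper)  # str.upper stands in for a model
```

Threads suit this workload because each request spends most of its time waiting on the model server rather than computing in Python.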
Five core commands cover the entire document translation workflow
translate: Translates documents according to a YAML config file. Source directory, target language, model, and more can be overridden via CLI.
traindata: Concurrently translates JSONL training data with Ollama. Supports math preservation, incremental output, and resume.
validate: Validates YAML config files without executing. Outputs 3-line format errors if issues are found.
plan: Dry run that previews the number of target files, segments, and estimated token count before translation.
doctor: Checks system dependencies (Ollama binary, server connection, available models).
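The three checks listed for doctor can be approximated with the standard library. The sketch below is an assumption about how such a check could be written, not the command's actual code: `check_ollama` is a hypothetical name, and it relies on Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, which lists locally available models.

```python
import json
import shutil
import urllib.error
import urllib.request


def check_ollama(base_url="http://localhost:11434", timeout=2.0):
    """Report whether the Ollama binary, server, and models are available."""
    report = {
        "binary": shutil.which("ollama") is not None,  # is `ollama` on PATH?
        "server": False,
        "models": [],
    }
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
        report["server"] = True
        report["models"] = [m.get("name", "") for m in payload.get("models", [])]
    except (urllib.error.URLError, OSError, ValueError):
        # Server unreachable or response not valid JSON: leave defaults in place.
        pass
    return report


status = check_ollama()
```

Returning a plain dictionary instead of raising keeps the check usable as a diagnostic: every dependency is reported even when an earlier one is missing.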
Modular design for independently extensible formats, engines, and quality assurance
| config | YAML schema, loading, validation |
|---|---|
| parsers | Markdown, HTML, Plain Text parsers |
| engine | Translation providers (Ollama, OpenAI, Fake) |
| core | Orchestrator, planner, doctor, errors |
| scoring | Quality scoring, model selection, API evaluation |
| traindata | JSONL training data translation (Ollama concurrent) |
| glossary | Domain glossary search (Tavily) |
| Language | Python 3.12+ |
|---|---|
| Markdown Parser | markdown-it-py + mdformat (CommonMark 100%) |
| HTML Parser | BeautifulSoup4 + lxml |
| Translation Engine | Ollama (local), OpenAI (cloud) |
| Chunking Strategy | sentence, whitespace, hard |
| Error Format | 3-line format (Category / Fix / See) |
| License | MIT (including all dependencies) |
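The three chunking strategy names suggest splitting at sentence boundaries, at whitespace, or at a fixed character count. The sketch below shows one plausible reading of those names. The `chunk` function, its `limit` parameter, and the exact splitting rules are assumptions for illustration and may differ from what EulerPress does.

```python
import re


def chunk(text, limit, strategy="sentence"):
    """Split text into chunks of roughly `limit` characters.

    "hard" cuts at exact character offsets. "sentence" and "whitespace"
    pack whole units greedily; a single unit longer than `limit` is kept
    whole rather than cut.
    """
    if strategy == "hard":
        return [text[i:i + limit] for i in range(0, len(text), limit)]
    if strategy == "sentence":
        units = re.split(r"(?<=[.!?])\s+", text.strip())
    elif strategy == "whitespace":
        units = text.split()
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks


parts = chunk("One. Two. Three.", 9, "sentence")
# parts == ["One. Two.", "Three."]
```

Sentence-level chunks give the model complete thoughts to translate, while the whitespace and hard strategies act as fallbacks for text without clear sentence boundaries.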
Get up to speed with EulerPress through step-by-step guides
Tutorials coming soon.
Install EulerPress and start your first translation
Python 3.12+
Ollama (for local translation)
Local-first, code-preserving, enterprise-grade translation tool.