NPU Inference Composition & Simulation Stack
Define inference graphs with 123 operators across 13 groups and 10 data types. Compile spec.yaml to .npuart artifacts and simulate or deploy to Zynq-7020 FPGA hardware — all from a single CLI.
Open Source
123 operators, 13 groups, 10 data types: from spec to deployment artifact
A comprehensive operator set covering all common inference operations, organized into 13 logical groups.
| Category | Details |
|---|---|
| Operators | 123 operators in 13 groups (arithmetic, activation, reduce, normalization, pooling, convolution, recurrent, attention, elementwise, shape, quantization, custom, control) |
| Data Types | 10 dtypes: float32, float16, bfloat16, int8, uint8, int16, int32, int64, bool, complex64 |
| Spec Format | spec.yaml — declarative graph definition with typed edges and operator parameters |
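As an illustration of the dtype constraint, here is a minimal sketch that checks edge dtypes against the ten types listed in the table above. The edge dictionary layout and the function name `unsupported_dtypes` are assumptions made for this example; they are not taken from the EulerNPU spec format.

```python
# The ten dtypes listed in the table above.
SUPPORTED_DTYPES = frozenset({
    "float32", "float16", "bfloat16", "int8", "uint8",
    "int16", "int32", "int64", "bool", "complex64",
})


def unsupported_dtypes(edges):
    """Return the sorted names of edges whose dtype is not supported.

    `edges` maps an edge name to a dtype string. This layout is
    hypothetical and only serves the illustration.
    """
    return sorted(
        name for name, dtype in edges.items()
        if dtype not in SUPPORTED_DTYPES
    )


# float64 is not in the supported set, so "logits" is reported.
example_edges = {"input": "float32", "logits": "float64", "mask": "bool"}
print(unsupported_dtypes(example_edges))  # ['logits']
```

A check like this would run before compilation, which is the role the page assigns to the validate step.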
From YAML specification to deployable hardware artifact in a validated, reproducible pipeline.
| Stage | What it does |
|---|---|
| Validate | Schema + operator compatibility checks |
| Compile | spec.yaml → IR → optimization passes → .npuart |
| Simulate | Cycle-accurate simulation with profiling data |
| Deploy | Zynq-7020 FPGA target with board-smoke verification |
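The stage order above could be scripted roughly as follows. This sketch only builds the command lines and does not run them. The subcommand names come from this page, but the argument layout (a single positional spec path) is an assumption; the real syntax is not documented here and should be checked against the CLI's own help output.

```python
# Stage order from the pipeline table, using the matching subcommand names.
PIPELINE = ("validate", "compile", "sim")


def build_commands(spec_path, stages=PIPELINE):
    """Build one argv list per stage without executing anything.

    ASSUMPTION: every subcommand accepts the spec path as a single
    positional argument. The actual argument syntax may differ.
    """
    return [["eulernpu", stage, spec_path] for stage in stages]


for argv in build_commands("spec.yaml"):
    print(" ".join(argv))
```

Each list could then be passed to `subprocess.run(argv, check=True)` so that a failing stage stops the pipeline before the next one starts.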
Deterministic, reproducible, and auditable inference at every step
All inference graphs are defined in spec.yaml — human-readable, version-controllable, and diffable. No hidden state or implicit configuration.
Simulation results are bit-exact across runs. The same spec.yaml always produces the same .npuart artifact and the same inference outputs.
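One way to spot-check the reproducibility claim is to compile the same spec twice and compare the digests of the two artifacts. The sketch below uses only the Python standard library; the helper names and any file names passed to them are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifacts_match(path_a, path_b):
    """Return True if both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Storing the digest next to the spec in version control would make any unexpected change in the compiled artifact visible in review.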
Board-smoke tests verify hardware compatibility before deployment. Calibration and profiling help confirm that real-world performance matches simulation.
Single entry point eulernpu — 11 subcommands cover the entire workflow
| Subcommand | Description |
|---|---|
| validate | Validate spec.yaml schema, operator compatibility, and dtype constraints. |
| compile | Compile spec.yaml to a .npuart artifact through the optimization pipeline. |
| run | Execute a compiled .npuart artifact with input data and produce outputs. |
| sim | Cycle-accurate simulation of the inference graph with timing data. |
| profile | Generate per-operator latency, memory, and throughput profiling reports. |
| explain | Human-readable summary of the graph structure, operator count, and data flow. |
| board-smoke | Run hardware compatibility smoke tests on the target FPGA board. |
| calibrate | Calibrate quantization parameters using representative input data. |
| benchmark | Run throughput and latency benchmarks on compiled artifacts. |
| replay | Replay a recorded inference session for debugging and validation. |
| compress-cache | Compress and manage the compilation cache for faster rebuilds. |
Step-by-step guides to get started with EulerNPU quickly
Tutorials coming soon.
Install EulerNPU and compile your first inference graph
Python 3.12+
Zynq-7020 FPGA board (for hardware deployment)
From spec.yaml to hardware deployment, in a single CLI.