LeanVoice, Pricing

4× less compute

Distilled, not throttled.

The teacher model is a 460M-parameter flow-matching giant. We distil it down to a 117M-parameter student that ships everything you can hear, including breath, laugh, sigh, prosody, and pitch contour, but at a quarter of the FLOPs. No quality knob, no "lite" tier.

CPU-native

Runs without a GPU.

Most "neural" TTS quietly rents A100s in the basement and passes the bill along. We ship an ONNX Runtime graph that hits 9–12× realtime on an ordinary 8-core cloud CPU. No CUDA, no driver lock-in, no $3/hour accelerator surcharge on your invoice.
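To make "9–12× realtime on CPU" concrete, here is a minimal sketch of a CPU-only ONNX Runtime session and a realtime-factor calculation. The model path, the thread count, and the helper names (`make_cpu_session`, `realtime_factor`, `time_call`) are assumptions for illustration, not the shipped LeanVoice API.

```python
import time


def make_cpu_session(model_path: str, threads: int = 8):
    """Open an ONNX graph on the CPU only (no CUDA provider is requested).

    `model_path` is a hypothetical file name. onnxruntime is imported lazily
    so the timing helpers below work even where it is not installed.
    """
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = threads  # assumption: one thread per core
    return ort.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

At 10× realtime, a 30-second clip takes about 3 seconds to synthesise: `realtime_factor(30.0, 3.0)` returns `10.0`.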

8 steps · not 50

Flow matching, not diffusion.

Classic diffusion TTS takes 30 to 50 denoising passes for studio quality. Our flow-matching decoder needs eight, and four is still usable. Every step you don't run is latency and cost you don't pay for. At eight steps instead of 50, the whole inference loop is roughly six times shorter than the published baseline.
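The step count maps directly onto how many times the decoder network runs. Below is a minimal sketch of fixed-step Euler sampling for a flow-matching model, assuming the decoder exposes a velocity function `velocity(x, t)`. The toy velocity field stands in for the trained network and is purely illustrative; it is not the LeanVoice decoder.

```python
def euler_sample(velocity, x0, steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps.

    Each loop iteration is one forward pass of the (hypothetical) decoder,
    so `steps` is the number of network evaluations per utterance.
    """
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Toy stand-in for a trained network: always points straight at `target`."""
    def velocity(x, t):
        # Straight-line velocity toward target; t stays below 1 inside euler_sample.
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Usage: start from a zero vector and flow to the target in eight steps.
sample = euler_sample(make_toy_velocity([1.0, -2.0, 0.5]), [0.0, 0.0, 0.0], steps=8)
```

The toy field is perfectly straight, so it reaches the target at any step count. A real decoder's trajectories curve, which is why fewer steps trade some quality for speed.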

Streams sentence-by-sentence

Streamed, not batched.

The first audio chunk leaves the server before the rest of the text has been synthesised. Audio streams back sentence by sentence over HTTP, so the client can start playback while the remaining sentences are still being generated. No waiting for the whole paragraph, no buffering pause.
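Sentence-level streaming can be sketched as a generator that yields audio as soon as each sentence is ready. The splitter here is deliberately naive, and `synth` is a hypothetical callable that turns one sentence into audio bytes; neither is taken from the LeanVoice codebase.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_speech(text: str, synth: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence, as soon as that sentence is synthesised.

    `synth` is an assumed stand-in for the real synthesis call.
    """
    for sentence in split_sentences(text):
        yield synth(sentence)
```

Because the generator is lazy, nothing past the first sentence is synthesised until the consumer asks for the next chunk. An HTTP handler could hand this generator to a chunked response body so playback starts after the first sentence.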

23 langs · 1 model

One model, every language.

Most TTS providers ship a different model per language and charge you separately for "multilingual". We trained the phonemiser, the acoustic model, and the vocoder once across twenty-three languages, so there's nothing to swap, nothing to warm up, and nothing extra to bill when a Hindi voice answers a French question.

$0 cold-start

Scales to zero, wakes in two seconds.

The whole serving stack fits in roughly 500 MB and a single Python process. Container cold-starts complete in about two seconds, which means traffic that idles can run on serverless workers that bill by the second. No standing GPU bill, no warm-pool overhead, no "minimum committed throughput".
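The billing difference comes down to simple arithmetic, sketched below. The two-second cold start and the $3/hour standing rate come from the figures above. The per-second worker rate and the traffic numbers are hypothetical placeholders, and the model assumes cold-start time is billed like any other compute second.

```python
def standing_cost(hourly_rate: float, hours: float) -> float:
    """Cost of an always-on instance, billed whether or not it serves traffic."""
    return hourly_rate * hours


def per_second_cost(
    rate_per_second: float,
    busy_seconds: float,
    cold_starts: int = 0,
    cold_start_seconds: float = 2.0,
) -> float:
    """Cost of a scale-to-zero worker billed only while running.

    Assumption: each cold start adds `cold_start_seconds` of billed time.
    """
    billed_seconds = busy_seconds + cold_starts * cold_start_seconds
    return rate_per_second * billed_seconds


# Hypothetical day: 10 minutes of actual synthesis spread over 20 cold starts.
always_on = standing_cost(hourly_rate=3.0, hours=24)  # 72.0 dollars
serverless = per_second_cost(rate_per_second=0.0001, busy_seconds=600, cold_starts=20)
```

With these placeholder numbers the serverless worker is billed for 640 seconds in total, while the standing instance is billed for all 86,400 seconds of the day.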

Every number above is from production benchmarks, not a slide deck. The full methodology, model card, and per-machine latency table live on the research page.

Pay only for what you use. Three tenths of a cent.