How it's this cheap
Engineering, not arithmetic.
This is not a loss leader, an introductory rate, or a thin reseller margin. It's the price you pay when you rebuild lifelike voice synthesis from first principles, ship every layer in-house, and run the whole thing on commodity CPUs. Six engineering choices, each taking a multiple off the bill, until two tenths of a cent a minute became a comfortable price.
01
−4× compute
Distilled, not throttled.
The teacher model is a 460M-parameter flow-matching giant. We distil it down to a 117M-parameter student that keeps everything you can hear, including breath, laugh, sigh, prosody, and pitch contour, at roughly a quarter of the FLOPs. No quality knob, no "lite" tier.
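As a rough sanity check on the "quarter of the FLOPs" figure, the sketch below assumes that compute per forward pass scales about linearly with parameter count, which is a common rule of thumb for dense networks. That scaling rule is an assumption made here for illustration, not a measurement of this model; the two parameter counts are the ones quoted above.

```python
# Rough check: does a 460M -> 117M distillation land near 1/4 of the compute?
# Assumption (not from the source): FLOPs per forward pass scale about
# linearly with parameter count, as they do for dense layers.

TEACHER_PARAMS = 460_000_000
STUDENT_PARAMS = 117_000_000


def compute_ratio(student_params: int, teacher_params: int) -> float:
    """Student compute as a fraction of teacher compute, under linear scaling."""
    return student_params / teacher_params


ratio = compute_ratio(STUDENT_PARAMS, TEACHER_PARAMS)
print(f"student/teacher compute ~ {ratio:.3f} (about 1/{1 / ratio:.1f})")
```

Under that assumption the student needs a little over 25% of the teacher's compute per pass, which is consistent with the headline figure.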
02
CPU-native
Runs without a GPU.
Most "neural" TTS quietly rents A100s in the basement and passes the bill along. We ship an ONNX Runtime graph that hits 9–12× realtime on an ordinary 8-core cloud CPU. No CUDA, no driver lock-in, no $3/hour accelerator surcharge that you ultimately pay for.
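"Realtime factor" here means seconds of audio produced per second of wall-clock time. A minimal way to measure it might look like the sketch below. The `synthesize` callable and the `fake_engine` stand-in are hypothetical names used for illustration; they are not part of ONNX Runtime or of any published API.

```python
import time
from typing import Callable


def realtime_factor(synthesize: Callable[[str], float], text: str) -> float:
    """Return seconds of audio produced per second of wall-clock time.

    `synthesize` is a hypothetical callable that renders `text` and returns
    the duration of the generated audio in seconds.
    """
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    return audio_seconds / elapsed


def fake_engine(text: str) -> float:
    """Stand-in for a real engine: sleeps briefly, then claims 1 s of audio."""
    time.sleep(0.1)
    return 1.0


rtf = realtime_factor(fake_engine, "Hello there.")
print(f"{rtf:.1f}x realtime")
```

At 9 to 12× realtime, one minute of audio takes roughly 5 to 7 seconds of wall-clock time on that machine.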
03
8 steps · not 50
Flow matching, not diffusion.
Classic diffusion TTS takes 30 to 50 denoising passes for studio quality. Our flow-matching decoder needs eight, and four is still usable. Every step you don't run is latency and cost you don't pay for. At eight passes against 30 to 50, the inference loop is roughly four to six times shorter.
04
Streams sentence-by-sentence
Streamed, not batched.
The first audio chunk leaves the server before the rest of the text has been synthesised. Audio streams back sentence by sentence over HTTP, so the client can start playback while later sentences are still being rendered. No waiting for the whole paragraph, no buffering pause.
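The sentence-by-sentence pattern can be sketched as a generator that yields each chunk as soon as it is ready. The splitter below is deliberately naive, and `fake_tts` is a hypothetical stand-in for a real synthesis call; neither is taken from the actual service.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_speech(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence, without waiting for later ones."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def fake_tts(sentence: str) -> bytes:
    """Stand-in engine: returns the sentence as bytes instead of real audio."""
    return sentence.encode("utf-8")


for chunk in stream_speech("Hello there. How are you? Fine!", fake_tts):
    print(len(chunk), chunk)
```

Because the generator is lazy, the first chunk is available after one sentence of work. A web framework that supports streaming responses could send each yielded chunk to the client as it arrives.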
05
23 langs · 1 model
One model, every language.
Most TTS providers ship a different model per language and charge you separately for "multilingual". We trained the phonemiser, the acoustic model, and the vocoder once across twenty-three languages, so there's nothing to swap, nothing to warm up, and nothing extra to bill when a Hindi voice answers a French question.
06
$0 cold-start
Scales to zero, wakes in two seconds.
The whole serving stack fits in roughly 500 MB and runs as a single Python process. Container cold-starts complete in about two seconds, which means traffic with long idle gaps can run on serverless workers that bill by the second. No standing GPU bill, no warm-pool overhead, no "minimum committed throughput".
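To see what per-second billing does for idle-heavy traffic, the sketch below counts billed compute seconds rather than dollars, so no price has to be assumed. The two-second cold start and the 9× realtime factor come from the figures above; the workload of 100 one-minute requests a day is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
COLD_START_S = 2.0      # "about two seconds", as stated in the text
REALTIME_FACTOR = 9.0   # low end of the 9-12x range quoted earlier


def billed_seconds(requests: int, audio_seconds_each: float) -> float:
    """Worst case: every request pays a full cold start before synthesis."""
    synth_seconds = audio_seconds_each / REALTIME_FACTOR
    return requests * (COLD_START_S + synth_seconds)


# Hypothetical workload: 100 requests a day, one minute of audio each.
scale_to_zero = billed_seconds(requests=100, audio_seconds_each=60.0)
always_on = float(SECONDS_PER_DAY)
print(f"scale-to-zero: {scale_to_zero:.0f} s billed, always-on: {always_on:.0f} s")
```

Even with a cold start charged on every request, this workload bills about 867 seconds a day, against 86,400 for a worker that never shuts down.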