Expression tags
Drop tags like <laugh>, <breath>, or <sigh> directly into your text and the model renders them as audible expressions, not literal words. Use them to nudge prosody inline, without setting any extra request parameters.
Tag catalogue
| Tag | Renders as | Notes |
|---|---|---|
| <laugh> | A short, natural laugh | Length and intensity scale with surrounding context |
| <chuckle> | A softer, single-beat laugh | Good for warm reassurance |
| <sigh> | An audible sigh | Used naturally before bad news or apologies |
| <breath> | A breath in | Pacing aid for long-form narration |
| <pause:300> | A silent gap; the number is its length in milliseconds | Accepts 100–2000 ms; useful for dramatic timing |
| <whisper>…</whisper> | Whispered delivery | Drops energy ~12 dB and shifts vocal-tract shape |
| <shout>…</shout> | Raised, projected delivery | Use sparingly; sounds best on Kabir and Orion |
| <em>…</em> | Word emphasis | Boosts pitch and stress on the wrapped span |
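The catalogue above can be wrapped in small helpers so that malformed tags are caught before a request is sent. This is a minimal sketch, not part of any official SDK: the helper names `pause` and `wrap` are hypothetical, and it assumes pause durations are whole milliseconds with the 100–2000 ms bounds inclusive.

```python
# Tag names come from the catalogue; the helpers themselves are hypothetical.
WRAPPING_TAGS = {"whisper", "shout", "em"}
PAUSE_MIN_MS, PAUSE_MAX_MS = 100, 2000  # assumed inclusive


def pause(ms: int) -> str:
    """Return a <pause:N> tag, rejecting durations outside the documented range."""
    if not PAUSE_MIN_MS <= ms <= PAUSE_MAX_MS:
        raise ValueError(
            f"pause must be {PAUSE_MIN_MS}-{PAUSE_MAX_MS} ms, got {ms}"
        )
    return f"<pause:{ms}>"


def wrap(tag: str, text: str) -> str:
    """Wrap text in a paired tag such as <whisper>...</whisper>."""
    if tag not in WRAPPING_TAGS:
        raise ValueError(f"{tag!r} is not a wrapping tag")
    return f"<{tag}>{text}</{tag}>"


line = (
    f"Hello, and {wrap('em', 'welcome')} to the show. "
    f"{pause(400)} <breath> Today's episode is a strange one."
)
print(line)
```

Building tags through helpers like these keeps the range check in one place, which is easier to maintain than validating raw strings after the fact.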
Examples
Mix the tags freely. The phonemiser strips them before alignment, so they don't count toward billed characters.
Hello, and <em>welcome</em> to the show. <pause:400> <breath> Today's episode is a strange one, <chuckle> so buckle in. <sigh> I know that wasn't the result we were hoping for. <whisper>Don't tell anyone I told you this.</whisper>
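Because tags are stripped before billing, you can estimate the billable length of a script locally by removing them first. The sketch below is an approximation under stated assumptions: the function names are hypothetical, the regular expression only covers the tags listed in the catalogue, and it is a guess that whitespace left behind by a removed tag is still counted. The service's own count is authoritative.

```python
import re

# Matches the catalogue's standalone and paired tags, plus <pause:N>.
TAG_PATTERN = re.compile(
    r"</?(?:laugh|chuckle|sigh|breath|whisper|shout|em)>|<pause:\d+>"
)


def strip_tags(text: str) -> str:
    """Remove known expression tags, leaving all other characters intact."""
    return TAG_PATTERN.sub("", text)


def estimate_billed_chars(text: str) -> int:
    """Rough local estimate: length of the text once the tags are removed."""
    return len(strip_tags(text))


sample = "<whisper>Don't tell anyone I told you this.</whisper>"
print(strip_tags(sample))
print(estimate_billed_chars(sample))
```

Anything the pattern does not recognise, such as a mistyped tag, is left in place and counted, which also makes typos easier to spot.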
Prosody hints
If you need lower-level control, set request-time prosody overrides instead of inline tags:
| Field | Type | Effect |
|---|---|---|
| speed | float (0.5 – 2.0) | Global speaking rate. Default 1.0. |
| pitch_shift_semitones | float (-4 – 4) | Raises or lowers the carrier pitch, in semitones. |
| energy | float (0.6 – 1.4) | Loudness/effort. 1.0 = nominal. |
| style_weight | float (0 – 1) | How strongly the voice's reference style is applied; lower values sound more neutral. |
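The ranges in the table can be checked on the client before a request goes out. The following sketch only builds and validates the override values. The surrounding payload shape (a `text` key and a `prosody` key) is an assumption made for illustration and is not taken from the API reference, and the bounds are assumed to be inclusive.

```python
import json

# Field names and ranges are copied from the table above.
PROSODY_RANGES = {
    "speed": (0.5, 2.0),
    "pitch_shift_semitones": (-4.0, 4.0),
    "energy": (0.6, 1.4),
    "style_weight": (0.0, 1.0),
}


def build_prosody(**overrides: float) -> dict:
    """Validate overrides against the documented ranges and return them as a dict."""
    prosody = {}
    for name, value in overrides.items():
        if name not in PROSODY_RANGES:
            raise KeyError(f"unknown prosody field: {name}")
        low, high = PROSODY_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be within {low} to {high}, got {value}")
        prosody[name] = float(value)
    return prosody


# Hypothetical request body; check the API reference for the real field layout.
payload = {
    "text": "I know that wasn't the result we were hoping for.",
    "prosody": build_prosody(speed=0.9, energy=1.1),
}
print(json.dumps(payload, indent=2))
```

Fields that are not passed are simply omitted from the dict, so the service's defaults (for example, a speed of 1.0) would still apply.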
Tip · let the tags do the work
Most teams get better results from a single <sigh> than from manually crafted prosody overrides. The model was trained on real expressive performances, so the tags inherit the same natural timing.