Expression tags

Drop tags like <laugh>, <breath>, or <sigh> directly into your text and the model renders them as audible expressions rather than reading them out as literal words. Use them to nudge prosody from inside the text itself, with no extra request fields.

Tag catalogue

Tag | Renders as | Notes
<laugh> | A short, natural laugh | Length and intensity scale with surrounding context
<chuckle> | A softer, single-beat laugh | Good for warm reassurance
<sigh> | An audible sigh | Used naturally before bad news or apologies
<breath> | A breath in | Pacing aid for long-form narration
<pause:300> | A silent gap, milliseconds | Accepts 100–2000 ms; useful for dramatic timing
<whisper>…</whisper> | Whispered delivery | Drops energy ~12 dB and shifts vocal-tract shape
<shout>…</shout> | Raised, projected delivery | Use sparingly; sounds best on Kabir, Orion
<em>…</em> | Word emphasis | Boosts pitch and stress on the wrapped span
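The catalogue above implies a few rules that can be checked before a request is sent: pause values stay within 100–2000 ms, wrapping tags are closed, and only listed tag names are used. The following is a minimal client-side sketch of such a check. The function name `check_tags` and the rule that wrapping tags must nest properly are assumptions made for illustration; the service may be more lenient or stricter.

```python
import re

# Tag names copied from the catalogue; everything else here is a
# hypothetical client-side helper, not part of the documented API.
STANDALONE = {"laugh", "chuckle", "sigh", "breath"}
PAIRED = {"whisper", "shout", "em"}
TAG_RE = re.compile(r"<(/?)([a-z]+)(?::(\d+))?>")


def check_tags(text: str) -> list[str]:
    """Return a list of problems found in the expression tags of `text`."""
    problems = []
    open_stack = []  # wrapping tags currently open, innermost last
    for match in TAG_RE.finditer(text):
        tag = match.group(0)
        closing = match.group(1) == "/"
        name, value = match.group(2), match.group(3)
        if name == "pause":
            if closing or value is None:
                problems.append(f"malformed pause tag: {tag}")
            elif not 100 <= int(value) <= 2000:
                problems.append(f"pause out of range (100-2000 ms): {value}")
        elif value is not None:
            problems.append(f"unexpected value in tag: {tag}")
        elif name in STANDALONE:
            if closing:
                problems.append(f"standalone tag cannot be closed: {tag}")
        elif name in PAIRED:
            if not closing:
                open_stack.append(name)
            elif open_stack and open_stack[-1] == name:
                open_stack.pop()
            else:
                problems.append(f"unexpected closing tag: {tag}")
        else:
            problems.append(f"unknown tag: {tag}")
    # Anything still on the stack was opened but never closed.
    problems.extend(f"unclosed tag: <{name}>" for name in open_stack)
    return problems
```

An empty list means the text passed every check in this sketch. Text that merely resembles a tag but does not match the pattern (for example `<pause:abc>`) is ignored rather than reported.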

Examples

Mix the tags freely. The phonemiser strips them before alignment, so they don't count toward billed characters.

```text
Hello, and <em>welcome</em> to the show. <pause:400>
<breath> Today's episode is a strange one, <chuckle> so buckle in.

<sigh> I know that wasn't the result we were hoping for.

<whisper>Don't tell anyone I told you this.</whisper>
```
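Because tags are stripped before alignment and do not count toward billed characters, a rough client-side estimate of the billable length can be made by removing them first. This is only a sketch: the helper names `strip_tags` and `billable_length` are hypothetical, and how the service counts whitespace left behind by a removed tag is an assumption here (it is counted as-is).

```python
import re

# Matches the tag shapes listed in the catalogue, e.g. <laugh>,
# <pause:300>, <whisper> and </whisper>. Hypothetical helper, not an API call.
TAG_RE = re.compile(r"</?(?:laugh|chuckle|sigh|breath|whisper|shout|em|pause:\d+)>")


def strip_tags(text: str) -> str:
    """Remove expression tags, leaving the spoken text untouched."""
    return TAG_RE.sub("", text)


def billable_length(text: str) -> int:
    """Estimate billed characters as the length of the tag-free text."""
    return len(strip_tags(text))
```

For example, `strip_tags("<whisper>Hi</whisper>")` leaves just `Hi`, so the estimate for that input is 2 characters.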

Prosody hints

If you need lower-level control, set request-time prosody overrides instead of inline tags:

Field | Type | Effect
speed | float (0.5 – 2.0) | Global speaking rate. Default 1.0.
pitch_shift_semitones | float (-4 – 4) | Lifts or drops the carrier pitch.
energy | float (0.6 – 1.4) | Loudness/effort. 1.0 = nominal.
style_weight | float (0 – 1) | How strongly the voice's reference style is applied; lower values sound more neutral.

Tip · let the tags do the work

Most teams get better results from a single <sigh> than from manually crafted prosody overrides. The model was trained on real expressive performances, so the tags inherit the same natural timing.
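When request-time overrides are used, the ranges listed under Prosody hints can be checked locally before sending. The sketch below copies the field names and ranges from that table; the function name `validate_prosody` and the choice to raise on out-of-range values (rather than clamp them) are assumptions, and how the fields are attached to a request is not specified here.

```python
# Field names and ranges copied from the Prosody hints table.
# The validation behaviour is a hypothetical client-side convenience.
PROSODY_RANGES = {
    "speed": (0.5, 2.0),
    "pitch_shift_semitones": (-4.0, 4.0),
    "energy": (0.6, 1.4),
    "style_weight": (0.0, 1.0),
}


def validate_prosody(overrides: dict[str, float]) -> dict[str, float]:
    """Return a checked copy of `overrides`.

    Raises ValueError for unknown fields or values outside the listed range.
    """
    checked = {}
    for field, value in overrides.items():
        if field not in PROSODY_RANGES:
            raise ValueError(f"unknown prosody field: {field}")
        low, high = PROSODY_RANGES[field]
        if not low <= value <= high:
            raise ValueError(f"{field}={value} is outside {low} to {high}")
        checked[field] = float(value)
    return checked
```

Calling `validate_prosody({"speed": 1.2, "energy": 0.8})` returns the same two fields unchanged, while a value such as `speed=3.0` raises `ValueError` before any request is made.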