
Expression tags

Drop tags like <laugh>, <breath>, or <sigh> directly into your text and the model renders them as audible expressions, not literal words. Use them to shape prosody inline, without setting any extra request parameters.

Tag catalogue

Tag	Renders as	Notes
<laugh>	A short, natural laugh	Length and intensity scale with surrounding context
<chuckle>	A softer, single-beat laugh	Good for warm reassurance
<sigh>	An audible sigh	Used naturally before bad news or apologies
<breath>	A breath in	Pacing aid for long-form narration
<pause:300>	A silent gap of the given length in milliseconds	Accepts 100–2000 ms; useful for dramatic timing
<whisper>…</whisper>	Whispered delivery	Drops energy ~12 dB and shifts vocal-tract shape
<shout>…</shout>	Raised, projected delivery	Use sparingly; sounds best on the Kabir and Orion voices
<em>…</em>	Word emphasis	Boosts pitch and stress on the wrapped span
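
The catalogue above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `check_tags` and the rule that wrapping tags must close in the order they were opened are assumptions made for illustration. Only the tag names and the 100–2000 ms pause range come from the table.

```python
import re

# Tag names come from the catalogue; the validation rules are assumptions.
STANDALONE = {"laugh", "chuckle", "sigh", "breath"}
PAIRED = {"whisper", "shout", "em"}
TAG_RE = re.compile(r"<(/?)([a-z]+)(?::(\d+))?>")


def check_tags(text: str) -> list[str]:
    """Return a list of problems found in the expression tags of `text`."""
    problems = []
    open_stack = []  # names of wrapping tags that are still open
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2)
        arg = match.group(3)
        if name == "pause":
            if closing or arg is None:
                problems.append(f"malformed pause tag: {match.group(0)}")
            elif not 100 <= int(arg) <= 2000:
                problems.append(f"pause out of range (100-2000 ms): {arg}")
        elif arg is not None:
            problems.append(f"unexpected argument: {match.group(0)}")
        elif name in STANDALONE:
            if closing:
                problems.append(f"standalone tag cannot be closed: {match.group(0)}")
        elif name in PAIRED:
            if not closing:
                open_stack.append(name)
            elif open_stack and open_stack[-1] == name:
                open_stack.pop()
            else:
                problems.append(f"unmatched closing tag: </{name}>")
        else:
            problems.append(f"unknown tag: {match.group(0)}")
    problems.extend(f"unclosed tag: <{name}>" for name in open_stack)
    return problems
```

An empty list means the text uses only catalogue tags, every wrapping tag is closed, and every pause is within range.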

Examples

Mix the tags freely. The phonemiser strips them before alignment, so they don't count toward billed characters.


Hello, and <em>welcome</em> to the show. <pause:400>
<breath> Today's episode is a strange one, <chuckle> so buckle in.

<sigh> I know that wasn't the result we were hoping for.

<whisper>Don't tell anyone I told you this.</whisper>
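
Because tags are stripped before alignment and are not billed, a rough client-side estimate of billed characters is the length of the text with the tags removed. This is a sketch under assumptions: the helper names are hypothetical, and the page does not say how the whitespace left behind by a removed tag is counted, so treat the result as an estimate.

```python
import re

# Matches the catalogue tags only; other angle-bracketed text is left alone.
TAG_RE = re.compile(
    r"</?(?:laugh|chuckle|sigh|breath|whisper|shout|em|pause:\d+)>"
)


def strip_tags(text: str) -> str:
    """Remove expression tags, leaving the spoken text untouched."""
    return TAG_RE.sub("", text)


def estimated_billed_chars(text: str) -> int:
    """Estimate billed characters as the length of the tag-free text."""
    return len(strip_tags(text))
```

For example, `estimated_billed_chars("<whisper>Hi</whisper>")` counts only the two characters of "Hi".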

Prosody hints

If you need lower-level control, set request-time prosody overrides instead of inline tags:

Field	Type	Effect
speed	float (0.5 – 2.0)	Global speaking rate. Default 1.0.
pitch_shift_semitones	float (-4 – 4)	Lifts or drops the carrier pitch.
energy	float (0.6 – 1.4)	Loudness/effort. 1.0 = nominal.
style_weight	float (0 – 1)	How strongly the voice's reference style is applied; lower values sound more neutral.
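
The ranges in the table can be enforced before a request is built. The sketch below is illustrative only: the field names and ranges come from the table, but `validate_prosody`, `build_payload`, and the payload shape (overrides nested under a `"prosody"` key) are assumptions. Check the REST API reference for the actual request format.

```python
# Allowed ranges, copied from the table above.
PROSODY_RANGES = {
    "speed": (0.5, 2.0),
    "pitch_shift_semitones": (-4.0, 4.0),
    "energy": (0.6, 1.4),
    "style_weight": (0.0, 1.0),
}


def validate_prosody(overrides: dict[str, float]) -> dict[str, float]:
    """Return a copy of `overrides`, raising ValueError on a bad field or value."""
    for field, value in overrides.items():
        if field not in PROSODY_RANGES:
            raise ValueError(f"unknown prosody field: {field}")
        low, high = PROSODY_RANGES[field]
        if not low <= value <= high:
            raise ValueError(f"{field}={value} is outside [{low}, {high}]")
    return dict(overrides)


def build_payload(text: str, **overrides: float) -> dict:
    """Assemble a request body. The payload shape here is hypothetical."""
    return {"text": text, "prosody": validate_prosody(overrides)}
```

Calling `build_payload("Hello.", speed=1.2, energy=0.8)` returns a dictionary with the two overrides, while `speed=3.0` raises `ValueError` before anything is sent.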

Tip · let the tags do the work

Most teams get better results from a single <sigh> than from manually crafted prosody overrides. The model was trained on real expressive performances, so the tags inherit the same natural timing.
