Voice

Turn on enableVoice and your NPC speaks its replies out loud, positioned in 3D space, and players can talk to it by voice instead of typing.

Turn it on

On the NPC's SSAINpc, set enableVoice = true and give it a Voice provider handle in voiceProviderOverride (register one in the Provider Manager). Then tune:

| Field | What it does |
| --- | --- |
| voiceName | The voice to use (overrides the provider's Default Voice). Depends on the vendor: Grok eve/ara/rex/sal/leo; Orpheus tara…; OpenAI alloy/nova…; ElevenLabs/Cartesia a voice id. Blank = the provider's Default Voice. |
| voiceSpeed | Speaking speed, 1.0 = normal. Grok supports 0.7–1.5; other providers may ignore it. |
| hearingRange | Distance in metres that the NPC's voice carries (3D falloff); also how close a player must be for the NPC to hear them. |
| voiceDelivery | How audio is delivered (see Delivery modes below). |
| voiceAudioSource | Optional: your own AudioSource. If empty, a 3D one is created on the NPC with maxDistance = hearingRange. |
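
The two fallback rules in the table (blank voiceName, empty voiceAudioSource) can be modelled in a few lines. This is an engine-neutral Python sketch, not the asset's code: `VoiceSettings`, `AudioSourceModel`, `effective_voice` and `effective_audio_source` are hypothetical names, the field types are assumed, and the default `hearing_range` is only illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioSourceModel:
    """Stand-in for an engine audio source (hypothetical, not the engine type)."""
    spatial: bool
    max_distance: float


@dataclass
class VoiceSettings:
    """Mirrors some fields from the table above; types are assumed."""
    voice_name: str = ""          # blank means "use the provider's Default Voice"
    voice_speed: float = 1.0      # 1.0 = normal
    hearing_range: float = 10.0   # metres (illustrative default, not documented)
    voice_audio_source: Optional[AudioSourceModel] = None


def effective_voice(settings: VoiceSettings, provider_default_voice: str) -> str:
    """voiceName overrides the provider default unless it is blank."""
    name = settings.voice_name.strip()
    return name if name else provider_default_voice


def effective_audio_source(settings: VoiceSettings) -> AudioSourceModel:
    """Use the supplied source, else create a 3D one sized to hearing_range."""
    if settings.voice_audio_source is not None:
        return settings.voice_audio_source
    return AudioSourceModel(spatial=True, max_distance=settings.hearing_range)
```

With a blank `voice_name`, `effective_voice(VoiceSettings(), "eve")` falls back to the provider default passed in as the second argument.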

Delivery modes

voiceDelivery trades reliability for latency:

| Mode | Behavior | Use when |
| --- | --- | --- |
| Buffered | Synthesize the whole reply, then play it. Most reliable. | Default; any provider. |
| Sentence | Synthesize + play per sentence with look‑ahead. Lower latency. | Snappier conversation, any endpoint. |
| Streamed | Play PCM chunks as they arrive. Lowest latency. | A provider that supports streaming TTS. |

Start with Buffered

Buffered always works. Move to Sentence or Streamed only if you want lower latency and your provider/voice supports it.
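
To make the Sentence mode concrete, here is a minimal Python sketch of per-sentence synthesis with one sentence of look-ahead. It illustrates the idea only and is not the asset's implementation: `split_sentences` and `speak_by_sentence` are hypothetical names, the `synthesize` and `play` callables stand in for a real TTS call and audio playback, and the sentence splitter is deliberately naive.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_sentences(reply: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", reply.strip())
    return [p for p in parts if p]


def speak_by_sentence(
    reply: str,
    synthesize: Callable[[str], bytes],   # hypothetical TTS call
    play: Callable[[bytes], None],        # hypothetical blocking playback
) -> int:
    """Play sentence N while sentence N+1 is synthesized. Returns the count."""
    sentences = split_sentences(reply)
    if not sentences:
        return 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(synthesize, sentences[0])
        for upcoming in sentences[1:]:
            audio = pending.result()                     # current sentence is ready
            pending = pool.submit(synthesize, upcoming)  # look-ahead: start the next
            play(audio)                                  # playback overlaps synthesis
        play(pending.result())                           # last sentence
    return len(sentences)
```

Buffered corresponds to calling `synthesize` once on the whole reply before `play`; the sketch shows why Sentence can start speaking sooner, since only the first sentence has to finish synthesizing before playback begins.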

Speaking, queueing & interruption

When one message makes the NPC say several lines — e.g. a multi‑step action where it talks between steps ("Here goes!" → does the flip → "How was that?") — the lines queue and play in order. You don't have to do anything; the NPC speaks them back‑to‑back.

A new player message interrupts the NPC: the moment a player sends another line, the NPC stops talking immediately (barge‑in) and responds to the new input. This keeps conversation responsive — players never have to wait out a long reply before they can speak again. (Lines from the same message still queue; only a new message interrupts.)
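
The queue-then-interrupt behaviour can be pictured as a small state machine. The Python sketch below is a model of the rules described above, not the asset's code: `SpeechQueue` and its methods are hypothetical, and dropping late lines that belong to a superseded message is an assumption on top of the documented behaviour.

```python
from collections import deque
from typing import Deque, Optional


class SpeechQueue:
    """Models queueing within one message and barge-in across messages."""

    def __init__(self) -> None:
        self._lines: Deque[str] = deque()
        self._active_message: Optional[int] = None
        self.current: Optional[str] = None  # line being spoken, if any

    def on_player_message(self, message_id: int) -> None:
        """New player input: stop talking now and discard queued lines."""
        self._active_message = message_id
        self._lines.clear()
        self.current = None

    def say(self, message_id: int, line: str) -> bool:
        """Queue a reply line. Lines for a superseded message are dropped
        (an assumption of this sketch). Returns True if the line was queued."""
        if message_id != self._active_message:
            return False
        self._lines.append(line)
        return True

    def next_line(self) -> Optional[str]:
        """Call when the previous line finishes; returns the next line to speak."""
        self.current = self._lines.popleft() if self._lines else None
        return self.current
```

For example, after `on_player_message(1)`, queuing "Here goes!" and "How was that?" plays them in order; calling `on_player_message(2)` mid-reply empties the queue so the NPC can answer the new input.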

Players talking to the NPC (voice input)

With voice enabled, players can speak to the NPC instead of typing: holding the voice key captures speech near an NPC, transcribes it (speech‑to‑text), and sends it to the NPC as a normal message. From the NPC's side it's identical to a typed line — your personality, actions, and world events all behave the same.
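
The voice-input path (capture, transcribe, send as a normal message) can be sketched as follows. This is an illustrative Python model under stated assumptions: `deliver_speech` is a hypothetical name, `transcribe` and `send_message` stand in for the speech-to-text call and the NPC's normal message entry point, and skipping empty transcripts is an assumption rather than documented behaviour. The range check reflects the hearingRange rule that a player must be close enough for the NPC to hear them.

```python
import math
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def deliver_speech(
    player_pos: Vec3,
    npc_pos: Vec3,
    hearing_range: float,
    captured_audio: bytes,
    transcribe: Callable[[bytes], str],    # hypothetical speech-to-text call
    send_message: Callable[[str], None],   # same entry point a typed line uses
) -> Optional[str]:
    """Run when the voice key is released. Returns the text sent, or None."""
    # The NPC only hears players within hearing_range metres.
    if math.dist(player_pos, npc_pos) > hearing_range:
        return None
    text = transcribe(captured_audio).strip()
    if not text:
        return None  # nothing intelligible captured (assumed behaviour)
    send_message(text)  # from here on it is identical to a typed message
    return text
```

Because the transcript goes through the same `send_message` path as typed input, nothing downstream needs to know whether the player spoke or typed.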

Spatial audio

NPC speech plays from the NPC's position with 3D falloff out to hearingRange, so it grows louder as players approach and fades as they move away. No setup is needed beyond hearingRange (unless you supply your own voiceAudioSource).
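
As a rough picture of distance falloff, the Python sketch below computes a volume factor that is full at the NPC and silent at hearingRange. The linear curve is an assumption chosen for simplicity, and `linear_falloff` is a hypothetical helper; the curve players actually hear depends on how the AudioSource is configured.

```python
def linear_falloff(distance: float, hearing_range: float) -> float:
    """Volume in [0, 1]: 1.0 at the NPC, 0.0 at hearing_range and beyond.

    Linear rolloff is assumed here purely for illustration.
    """
    if hearing_range <= 0:
        return 0.0
    return max(0.0, min(1.0, 1.0 - distance / hearing_range))
```

Under this assumption, a player halfway to the edge of a 10 m hearingRange hears the NPC at half volume, and a player beyond 10 m hears nothing.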

Notes

  • Voice uses a Voice‑type provider (Providers & keys) — separate from the text/LLM provider that drives the conversation. An NPC can use one of each.
  • Voice key handling is the same as any provider: stored server‑side, never in your world.