xAI (Grok)

xAI Grok text-to-speech with native audio tags and BCP-47 language control.

Prefix: xai
Default model: grok-tts
Env var: XAI_API_KEY
Official docs: docs.x.ai

Models

| Model | Streaming | Audio Tags | Voice Cloning | Notes |
| --- | --- | --- | --- | --- |
| grok-tts | Yes | Yes (passthrough) | No | Native bracket and <whisper> tags |

Supported languages (via language): en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi. Pass auto (the default) for automatic detection.
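
If the language code comes from user input, it may help to check it against the list above before calling the SDK. The following is a minimal sketch, not part of SpeechSDK: `SUPPORTED_LANGUAGES`, `XaiLanguage`, and `resolveLanguage` are hypothetical names introduced here, and falling back to "auto" for unknown codes is an assumption about what you would want, not documented SDK behavior.

```typescript
// Language codes copied from the supported-languages list above.
const SUPPORTED_LANGUAGES = [
  "en", "ar-EG", "ar-SA", "ar-AE", "bn", "zh", "fr", "de", "hi", "id",
  "it", "ja", "ko", "pt-BR", "pt-PT", "ru", "es-MX", "es-ES", "tr", "vi",
] as const

type XaiLanguage = (typeof SUPPORTED_LANGUAGES)[number] | "auto"

// Hypothetical helper (not a SpeechSDK export): returns the canonical code
// when it is in the list, otherwise falls back to "auto".
// BCP-47 tags are case-insensitive, so the comparison ignores case.
function resolveLanguage(code?: string): XaiLanguage {
  if (code === undefined || code.toLowerCase() === "auto") return "auto"
  const match = SUPPORTED_LANGUAGES.find(
    (lang) => lang.toLowerCase() === code.toLowerCase(),
  )
  return match ?? "auto"
}
```

The returned value can then be passed as `providerOptions.language`.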

Usage

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "xai/grok-tts",
  text: "Hello from SpeechSDK!",
  voice: "ava",
})

The voice string is sent to xAI as voice_id.

Audio Tags

grok-tts natively supports both styles of audio tags, so SpeechSDK passes your text through unchanged:

  • Inline bracket tags: [pause], [laugh], [sigh], etc.
  • Wrapping angle-bracket tags: <whisper>quiet part</whisper>

await generateSpeech({
  model: "xai/grok-tts",
  text: "[laugh] Oh that's great. <whisper>Don't tell anyone.</whisper>",
  voice: "ava",
})
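
Because the text is passed through unchanged, tagged strings can be composed with ordinary string helpers. The sketch below is illustrative only: `tag` and `whisper` are hypothetical helpers defined here, not SpeechSDK exports, and they do no validation of tag names.

```typescript
// Hypothetical helpers (not part of SpeechSDK) for composing tagged text.
const tag = (name: string): string => `[${name}]`
const whisper = (content: string): string => `<whisper>${content}</whisper>`

// Builds the same string used in the example above.
const taggedText = [
  tag("laugh"),
  "Oh that's great.",
  whisper("Don't tell anyone."),
].join(" ")
```

`taggedText` can be passed as the `text` field of `generateSpeech`.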

Provider Options

await generateSpeech({
  model: "xai/grok-tts",
  text: "Hello!",
  voice: "ava",
  providerOptions: {
    language: "en", // BCP-47, or "auto" (default)
    output_format: {
      codec: "wav", // mp3 (default) | wav | pcm | mulaw | alaw
    },
  },
})

The xAI API requires language. If you don't pass one, SpeechSDK sets it to "auto".
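
The defaulting described above can be pictured as a simple merge. This is a sketch under assumptions, not SpeechSDK's actual implementation: the `XaiProviderOptions` type and the `withLanguageDefault` function are hypothetical names, and the option shape is inferred only from the example above.

```typescript
type Codec = "mp3" | "wav" | "pcm" | "mulaw" | "alaw"

// Assumed shape, inferred from the providerOptions example above.
interface XaiProviderOptions {
  language?: string
  output_format?: { codec?: Codec }
}

// Hypothetical sketch: fill in language when the caller omits it,
// leaving every other option untouched.
function withLanguageDefault(
  options: XaiProviderOptions = {},
): XaiProviderOptions & { language: string } {
  return { ...options, language: options.language ?? "auto" }
}
```

With this shape, an explicit `language` always wins over the default, and `output_format` is forwarded as given.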

Custom Configuration

import { generateSpeech } from "@speech-sdk/core"
import { createXai } from "@speech-sdk/core/providers"

const xai = createXai({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
})

const result = await generateSpeech({
  model: xai("grok-tts"),
  text: "Hello!",
  voice: "ava",
})
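
When the key is read from the environment, a missing variable is easier to diagnose if it fails before the provider is created. The helper below is a hypothetical sketch (`requireEnv` is not a SpeechSDK export); how `createXai` itself handles an undefined `apiKey` is not covered here.

```typescript
// Hypothetical helper: read a variable from an environment-like object and
// throw a descriptive error when it is missing or empty.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name]
  if (!value) {
    throw new Error(`Missing environment variable ${name}`)
  }
  return value
}
```

In Node.js, `requireEnv("XAI_API_KEY", process.env)` could be used as the `apiKey` value in the configuration above.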
