xAI (Grok)


Prefix	`xai`
Default model	`grok-tts`
Env var	`XAI_API_KEY`
Official docs	docs.x.ai
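Model strings on this page combine the provider prefix and the model ID, as in `"xai/grok-tts"`. Below is a minimal sketch of splitting such a string. `parseModelString` is a hypothetical helper, not part of SpeechSDK's API, and it assumes the prefix is everything before the first `/`.

```typescript
// Hypothetical helper: splits "provider/model" strings such as "xai/grok-tts".
// Assumes the provider prefix is everything before the first "/".
interface ParsedModel {
  provider: string;
  modelId: string;
}

function parseModelString(model: string): ParsedModel {
  const slash = model.indexOf("/");
  // Reject strings with no prefix, no slash, or nothing after the slash.
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`Expected "provider/model", got "${model}"`);
  }
  return {
    provider: model.slice(0, slash),
    modelId: model.slice(slash + 1),
  };
}

const parsed = parseModelString("xai/grok-tts");
console.log(parsed.provider, parsed.modelId); // xai grok-tts
```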

Models

Model	Streaming	Audio Tags	Voice Cloning	Notes
`grok-tts`	Yes	Yes (passthrough)	No	Native bracket and `<whisper>` tags

Supported languages (set with the `language` provider option): `en`, `ar-EG`, `ar-SA`, `ar-AE`, `bn`, `zh`, `fr`, `de`, `hi`, `id`, `it`, `ja`, `ko`, `pt-BR`, `pt-PT`, `ru`, `es-MX`, `es-ES`, `tr`, `vi`. Pass `"auto"` (the default) for automatic language detection.
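If you want to catch a mistyped code before making a request, you can check it against the list above on your side. Here is a minimal sketch. `XAI_LANGUAGES` and `resolveLanguage` are hypothetical names, and throwing on an unsupported code is a local design choice; this page does not say how SpeechSDK itself handles unknown values.

```typescript
// Language codes listed above for grok-tts, plus "auto" for detection.
const XAI_LANGUAGES = [
  "auto", "en", "ar-EG", "ar-SA", "ar-AE", "bn", "zh", "fr", "de", "hi",
  "id", "it", "ja", "ko", "pt-BR", "pt-PT", "ru", "es-MX", "es-ES", "tr", "vi",
] as const;

type XaiLanguage = (typeof XAI_LANGUAGES)[number];

// Type guard: narrows an arbitrary string to one of the listed codes.
function isXaiLanguage(value: string): value is XaiLanguage {
  return (XAI_LANGUAGES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: no value means "auto"; an unlisted value is rejected locally.
function resolveLanguage(value?: string): XaiLanguage {
  if (value === undefined) return "auto";
  if (isXaiLanguage(value)) return value;
  throw new Error(`Unsupported grok-tts language: "${value}"`);
}
```

For example, `resolveLanguage("pt-BR")` returns `"pt-BR"`, while `resolveLanguage("pt")` throws because only the regional Portuguese variants are listed.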

Usage

import { generateSpeech } from "@speech-sdk/core"

const result = await generateSpeech({
  model: "xai/grok-tts",
  text: "Hello from SpeechSDK!",
  voice: "ava",
})

The `voice` string is sent to xAI as `voice_id`.
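You can picture the mapping as a small translation step from the `generateSpeech` arguments to the provider request. The sketch below also applies the `"auto"` language default described under Provider Options. It is hypothetical: `SpeechCall`, `XaiRequestBody`, and `toXaiRequestBody` are illustrative names, and the `text` field name is an assumption. This page only names `voice_id` and `language`.

```typescript
// Hypothetical input shape, loosely modeled on the generateSpeech() arguments.
interface SpeechCall {
  text: string;
  voice: string;
  providerOptions?: { language?: string };
}

// Assumed request body; only voice_id and language are named on this page.
interface XaiRequestBody {
  text: string;
  voice_id: string;
  language: string;
}

function toXaiRequestBody(call: SpeechCall): XaiRequestBody {
  return {
    text: call.text,
    voice_id: call.voice, // `voice` is forwarded unchanged as `voice_id`
    language: call.providerOptions?.language ?? "auto", // required by xAI; defaults to "auto"
  };
}

console.log(JSON.stringify(toXaiRequestBody({ text: "Hello!", voice: "ava" })));
// {"text":"Hello!","voice_id":"ava","language":"auto"}
```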

Audio Tags

`grok-tts` natively supports both styles of audio tags, so SpeechSDK passes your text through unchanged:

Inline bracket tags: `[pause]`, `[laugh]`, `[sigh]`, etc.
Wrapping angle-bracket tags: `<whisper>quiet part</whisper>`

await generateSpeech({
  model: "xai/grok-tts",
  text: "[laugh] Oh that's great. <whisper>Don't tell anyone.</whisper>",
  voice: "ava",
})
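SpeechSDK forwards the text as-is, so any checking of tags happens on your side. Here is a minimal sketch of listing the tags in a string before you send it. `listAudioTags` is a hypothetical helper. It assumes inline tags are a single word in square brackets and wrapping tags are well-formed `<name>...</name>` pairs. It does not check names against xAI's accepted tag set, which this page does not list.

```typescript
// Hypothetical helper: lists the audio tags present in a grok-tts input string.
interface AudioTags {
  inline: string[];   // e.g. "pause" from "[pause]"
  wrapping: string[]; // e.g. "whisper" from "<whisper>...</whisper>"
}

// Collects capture group 1 of every match of a global regular expression.
function collect(re: RegExp, text: string): string[] {
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push(m[1]);
  }
  return found;
}

function listAudioTags(text: string): AudioTags {
  return {
    // Assumes inline tags are one word inside square brackets.
    inline: collect(/\[([a-z_-]+)\]/gi, text),
    // Assumes wrapping tags open and close with the same name.
    wrapping: collect(/<([a-z_-]+)>[\s\S]*?<\/\1>/gi, text),
  };
}
```

For example, `listAudioTags("[pause] Ready? <whisper>Go.</whisper> [sigh]")` returns `inline: ["pause", "sigh"]` and `wrapping: ["whisper"]`.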

Provider Options

await generateSpeech({
  model: "xai/grok-tts",
  text: "Hello!",
  voice: "ava",
  providerOptions: {
    language: "en", // BCP-47, or "auto" (default)
    output_format: {
      codec: "wav", // mp3 (default) | wav | pcm | mulaw | alaw
    },
  },
})

The xAI API requires `language`, so SpeechSDK defaults it to `"auto"` if you don't pass one.
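The codec list and the two defaults mentioned above fit into a small typed helper. Here is a minimal sketch. `XaiCodec`, `XaiProviderOptions`, and `resolveXaiOptions` are hypothetical names. The page does not say whether SpeechSDK or the xAI API applies the `mp3` default, so filling it in client-side is an assumption.

```typescript
// Codec values listed in the providerOptions example above.
type XaiCodec = "mp3" | "wav" | "pcm" | "mulaw" | "alaw";

// Hypothetical option shape mirroring the providerOptions example.
interface XaiProviderOptions {
  language?: string;
  output_format?: { codec?: XaiCodec };
}

interface ResolvedXaiOptions {
  language: string;
  output_format: { codec: XaiCodec };
}

// Fills in the defaults described on this page: language "auto", codec "mp3".
function resolveXaiOptions(opts: XaiProviderOptions = {}): ResolvedXaiOptions {
  return {
    language: opts.language ?? "auto",
    output_format: { codec: opts.output_format?.codec ?? "mp3" },
  };
}
```

The union type means a typo such as `codec: "wave"` fails at compile time rather than at request time.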

Custom Configuration

import { generateSpeech } from "@speech-sdk/core"
import { createXai } from "@speech-sdk/core/providers"

const xai = createXai({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
})

const result = await generateSpeech({
  model: xai("grok-tts"),
  text: "Hello!",
  voice: "ava",
})
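`createXai` returns a function that turns a model ID into a model bound to your settings. The sketch below mimics that factory pattern so the roles of `apiKey` and `baseURL` are easy to see. It is hypothetical and is not SpeechSDK's implementation. `createXaiSketch` and `XaiModelRef` are illustrative names. The fallback to an `XAI_API_KEY` entry in a passed-in environment record is an assumption, and the `https://api.x.ai/v1` default is taken from the example above.

```typescript
// Hypothetical settings, mirroring the createXai() arguments shown above.
interface XaiSettings {
  apiKey?: string;
  baseURL?: string;
}

// Hypothetical model reference produced by the factory.
interface XaiModelRef {
  provider: "xai";
  modelId: string;
  apiKey: string;
  baseURL: string;
}

function createXaiSketch(
  settings: XaiSettings = {},
  env: Record<string, string | undefined> = {},
): (modelId: string) => XaiModelRef {
  // Resolve configuration once, when the provider is created.
  const apiKey = settings.apiKey ?? env.XAI_API_KEY;
  if (!apiKey) {
    throw new Error("Missing xAI API key: pass apiKey or set XAI_API_KEY");
  }
  // Strip trailing slashes so request paths can be appended predictably.
  const baseURL = (settings.baseURL ?? "https://api.x.ai/v1").replace(/\/+$/, "");
  // Each call binds a model ID to the shared settings.
  return (modelId) => ({ provider: "xai", modelId, apiKey, baseURL });
}
```

In Node you might call `createXaiSketch({}, process.env)`. Resolving the key once in the factory means a missing key fails at startup rather than on the first speech request.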