The audio namespace bundles two unrelated capabilities that happen to share a media type: synthesis (audio.speech) and transcription (audio.transcriptions.create).

`audio.speech`

client.audio.speech(
  params: SpeechParams,
  options?: { signal?: AbortSignal },
): Promise<SpeechResponse>

Synthesizes speech via Bastion's NVIDIA Riva proxy. Returns raw audio bytes plus the upstream Content-Type, so you can pick the right file extension without re-parsing the header.

Parameters

interface SpeechParams {
  text: string;
  language_code: string;     // e.g. "en-US", "fr-CA"
  voice_name: string;        // e.g. "English-US.Female-1", "French-Canadian.Male-1"
  encoding: number;          // use SpeechEncoding constants below
  sample_rate_hz: number;    // e.g. 44100 for LINEAR_PCM
}

`SpeechEncoding`

NVIDIA Riva encoding numbers. The SDK forwards encoding verbatim — use the constants to avoid magic numbers:

import { SpeechEncoding } from "@qubittron/bastion-sdk";

SpeechEncoding.LINEAR_PCM // 1
SpeechEncoding.FLAC       // 2
SpeechEncoding.MULAW      // 3
SpeechEncoding.ALAW       // 4
SpeechEncoding.OGGOPUS    // 5

Response

interface SpeechResponse {
  audio: Uint8Array;     // raw bytes from the upstream
  contentType: string;   // e.g. "audio/wav", "audio/ogg"
}

Example

import { writeFile } from "node:fs/promises";
import { SpeechEncoding } from "@qubittron/bastion-sdk";

const { audio, contentType } = await client.audio.speech({
  text: "Bonjour le monde",
  language_code: "fr-CA",
  voice_name: "French-Canadian.Female-1",
  encoding: SpeechEncoding.LINEAR_PCM,
  sample_rate_hz: 44100,
});

console.log(contentType); // "audio/wav"
await writeFile("hello.wav", audio);
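The example above hardcodes the .wav extension. Since the response carries the upstream Content-Type, the extension could instead be derived from it. The following is a minimal sketch: extensionFor and its lookup table are hypothetical helpers, not part of the SDK, and the set of content types the proxy actually returns is an assumption.

```typescript
// Hypothetical helper (not part of the SDK): maps a Content-Type value to a
// file extension. The table is an assumption about likely upstream values.
const EXTENSION_BY_TYPE: Record<string, string> = {
  "audio/wav": "wav",
  "audio/x-wav": "wav",
  "audio/flac": "flac",
  "audio/ogg": "ogg",
};

function extensionFor(contentType: string, fallback = "bin"): string {
  // Drop parameters such as "; codecs=opus" and normalize case before lookup.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return EXTENSION_BY_TYPE[mime] ?? fallback;
}
```

With this in place, the write in the example could become writeFile(`hello.${extensionFor(contentType)}`, audio). Unknown types fall back to "bin" rather than guessing.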

`audio.transcriptions.create`

client.audio.transcriptions.create(
  params: TranscriptionParams,
  options?: { signal?: AbortSignal },
): Promise<TranscriptionResponse>

Transcribes speech to text. Accepts a Blob or File, which the SDK posts as multipart/form-data.

Parameters

interface TranscriptionParams {
  file: Blob | File;
  model: string;                 // e.g. "whisper-large-v3"
  language?: string;             // ISO 639-1, optional hint
  response_format?: "json" | "verbose_json" | "text";
  prompt?: string;               // biasing prompt
  temperature?: number;          // 0–1
}

Only the verbose_json format returns timestamped segments.
With the text format, the upstream returns a plain string body, which the SDK still wraps as { text: "..." }.

Response

interface TranscriptionSegment {
  id?: number;
  start?: number;     // seconds
  end?: number;       // seconds
  text?: string;
  [key: string]: unknown;
}

interface TranscriptionResponse {
  text: string;
  language?: string;
  duration?: number;             // seconds
  segments?: TranscriptionSegment[];
  [key: string]: unknown;
}
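Because every field on a segment is optional, code that renders segments has to cope with missing timestamps and text. The sketch below shows one way to do that. formatTime and formatSegments are hypothetical helpers, not SDK functions, and the local TranscriptionSegment interface simply mirrors the one documented above so the block stands alone.

```typescript
// Local copy of the documented shape so this sketch is self-contained.
interface TranscriptionSegment {
  id?: number;
  start?: number; // seconds
  end?: number;   // seconds
  text?: string;
  [key: string]: unknown;
}

// Hypothetical helper: renders seconds as mm:ss.cc, or a placeholder when the
// value is missing, negative, or not finite.
function formatTime(seconds: number | undefined): string {
  if (seconds === undefined || !Number.isFinite(seconds) || seconds < 0) {
    return "--:--.--";
  }
  // Work in whole centiseconds so rounding never produces "60.00" seconds.
  const total = Math.round(seconds * 100);
  const mins = Math.floor(total / 6000);
  const secs = Math.floor((total % 6000) / 100);
  const cs = total % 100;
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${pad(mins)}:${pad(secs)}.${pad(cs)}`;
}

// Hypothetical helper: one display line per segment; tolerates an absent array.
function formatSegments(segments: TranscriptionSegment[] | undefined): string[] {
  return (segments ?? []).map(
    (seg) => `[${formatTime(seg.start)} - ${formatTime(seg.end)}] ${(seg.text ?? "").trim()}`,
  );
}
```

Passing res.segments directly is safe for the json and text formats too, since an undefined array yields an empty list.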

Example

import { readFile } from "node:fs/promises";

const buf = await readFile("speech.wav");
// File is a global in Node 20+; on Node 18, import it from "node:buffer".
const file = new File([buf], "speech.wav", { type: "audio/wav" });

const res = await client.audio.transcriptions.create({
  file,
  model: "whisper-large-v3",
  response_format: "verbose_json",
});

console.log(res.text);
for (const seg of res.segments ?? []) {
  console.log(`[${seg.start?.toFixed(2)}–${seg.end?.toFixed(2)}] ${seg.text}`);
}
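Both audio methods accept an optional signal, which suggests that a long-running synthesis or transcription can be cancelled by the caller. The sketch below shows one way to put a deadline on a call. withTimeout is a hypothetical helper, not part of the SDK, and how the SDK reports an aborted request (which error it rejects with) is not stated in this reference, so that behavior is an assumption.

```typescript
// Hypothetical helper (not part of the SDK): runs `call` with an AbortSignal
// that fires after `ms` milliseconds, and always clears the timer afterwards.
async function withTimeout<T>(
  ms: number,
  call: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await call(controller.signal);
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after completion
  }
}

// Usage sketch, assuming `client`, `file`, and the model name from the
// example above:
//
// const res = await withTimeout(30_000, (signal) =>
//   client.audio.transcriptions.create(
//     { file, model: "whisper-large-v3" },
//     { signal },
//   ),
// );
```

Wrapping the controller this way keeps the timer cleanup in one place. In runtimes that support it, AbortSignal.timeout(ms) is a shorter alternative when no manual cancellation is needed.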

Browser uploads

In a browser, pass a File straight from an <input type="file">:

const file = inputEl.files?.[0];
if (!file) return;

const res = await client.audio.transcriptions.create({
  file,
  model: "whisper-large-v3",
});

Before shipping this, see Environments: an API key embedded in browser code is visible to every user, so proxy requests through your server instead.

Errors

When	Class
Missing/invalid `file`, unsupported `model`	`BadRequestError`
Invalid key	`AuthenticationError`
Rate-limited	`RateLimitError`
Riva / Whisper down	`UpstreamError`
Network failure	`APIConnectionError`
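Of the classes in the table, rate limiting and upstream outages are the ones most likely to be transient, so a caller may want to retry them with backoff. The sketch below is one possible approach, not SDK behavior. retry is a hypothetical helper that takes a predicate, so it makes no assumption about how the error classes are exported. Importing RateLimitError and UpstreamError from "@qubittron/bastion-sdk" in the usage comment is itself an assumption.

```typescript
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper (not part of the SDK): retries `fn` with exponential
// backoff while `shouldRetry` accepts the thrown error.
async function retry<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors the caller marks as fatal.
      if (attempt >= maxAttempts || !shouldRetry(err)) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 500 ms, 1 s, 2 s, ...
    }
  }
}

// Usage sketch (the import path for the error classes is an assumption):
//
// import { RateLimitError, UpstreamError } from "@qubittron/bastion-sdk";
//
// const res = await retry(
//   () => client.audio.speech(params),
//   (err) => err instanceof RateLimitError || err instanceof UpstreamError,
// );
```

BadRequestError and AuthenticationError are deliberately left out of the predicate, since repeating the same request would fail the same way.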