audio
Text-to-speech (audio.speech) and speech-to-text (audio.transcriptions.create).
The audio namespace bundles two unrelated capabilities that happen to share a media type: synthesis (audio.speech) and transcription (audio.transcriptions.create).
audio.speech
client.audio.speech(
  params: SpeechParams,
  options?: { signal?: AbortSignal },
): Promise<SpeechResponse>
Synthesizes speech via Bastion's NVIDIA Riva proxy. Returns raw audio bytes plus the upstream Content-Type, so you can pick the right file extension without re-parsing the header.
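As a rough sketch of how the returned contentType could drive the file extension, the helper below maps a few common audio MIME types to extensions. The name extensionForContentType and the mapping table are illustrative assumptions, not part of the SDK; which Content-Type values actually appear depends on what the upstream returns.

```typescript
// Hypothetical helper (not part of the SDK): map a Content-Type to a file extension.
// The table is an assumed, partial list of common audio MIME types.
const EXTENSION_BY_MIME = new Map<string, string>([
  ["audio/wav", ".wav"],
  ["audio/x-wav", ".wav"],
  ["audio/flac", ".flac"],
  ["audio/ogg", ".ogg"],
  ["audio/basic", ".au"],
]);

function extensionForContentType(contentType: string, fallback = ".bin"): string {
  // Drop parameters such as "; codecs=opus" and normalize case before the lookup.
  const [mime = ""] = contentType.split(";");
  return EXTENSION_BY_MIME.get(mime.trim().toLowerCase()) ?? fallback;
}
```

With something like this in place, the output path can be built as "hello" + extensionForContentType(contentType) instead of hard-coding ".wav". Unknown types fall back to ".bin" so the bytes are still written somewhere inspectable.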
Parameters
interface SpeechParams {
  text: string;
  language_code: string; // e.g. "en-US", "fr-CA"
  voice_name: string; // e.g. "English-US.Female-1", "French-Canadian.Male-1"
  encoding: number; // use SpeechEncoding constants below
  sample_rate_hz: number; // e.g. 44100 for LINEAR_PCM
}
SpeechEncoding
NVIDIA Riva encoding numbers. The SDK forwards encoding verbatim — use the constants to avoid magic numbers:
import { SpeechEncoding } from "@qubittron/bastion-sdk";
SpeechEncoding.LINEAR_PCM // 1
SpeechEncoding.FLAC // 2
SpeechEncoding.MULAW // 3
SpeechEncoding.ALAW // 4
SpeechEncoding.OGGOPUS // 5
Response
interface SpeechResponse {
  audio: Uint8Array; // raw bytes from the upstream
  contentType: string; // e.g. "audio/wav", "audio/ogg"
}
Example
import { writeFile } from "node:fs/promises";
import { SpeechEncoding } from "@qubittron/bastion-sdk";
const { audio, contentType } = await client.audio.speech({
  text: "Bonjour le monde",
  language_code: "fr-CA",
  voice_name: "French-Canadian.Female-1",
  encoding: SpeechEncoding.LINEAR_PCM,
  sample_rate_hz: 44100,
});
console.log(contentType); // "audio/wav"
await writeFile("hello.wav", audio);
audio.transcriptions.create
client.audio.transcriptions.create(
  params: TranscriptionParams,
  options?: { signal?: AbortSignal },
): Promise<TranscriptionResponse>
Speech-to-text. Accepts a Blob or File; the SDK posts it as multipart/form-data.
Parameters
interface TranscriptionParams {
  file: Blob | File;
  model: string; // e.g. "whisper-large-v3"
  language?: string; // ISO 639-1, optional hint
  response_format?: "json" | "verbose_json" | "text";
  prompt?: string; // biasing prompt
  temperature?: number; // 0–1
}
verbose_json is the only format that returns timestamped segments. text returns a plain string body; the SDK still wraps it as { text: "..." }.
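Because only verbose_json carries segments, code that prints timestamps should tolerate their absence. The sketch below shows one way to do that. formatTranscript, formatTimestamp, and the local Segment and Transcript types are hypothetical; they mirror only the fields this page documents and are not imported from the SDK.

```typescript
// Local stand-ins for the documented response fields (assumed shapes, not SDK imports).
interface Segment {
  start?: number; // seconds
  end?: number; // seconds
  text?: string;
}
interface Transcript {
  text: string;
  segments?: Segment[];
}

// Format seconds as mm:ss.cc, rounding to centiseconds first so 59.999 becomes 01:00.00.
function formatTimestamp(seconds: number): string {
  const totalCs = Math.round(seconds * 100);
  const minutes = Math.floor(totalCs / 6000);
  const rest = ((totalCs % 6000) / 100).toFixed(2).padStart(5, "0");
  return `${String(minutes).padStart(2, "0")}:${rest}`;
}

function formatTranscript(res: Transcript): string[] {
  // "json" and "text" responses carry no segments, so fall back to the full text.
  if (!res.segments || res.segments.length === 0) return [res.text];
  return res.segments.map((seg) => {
    const start = seg.start === undefined ? "?" : formatTimestamp(seg.start);
    const end = seg.end === undefined ? "?" : formatTimestamp(seg.end);
    return `[${start} - ${end}] ${(seg.text ?? "").trim()}`;
  });
}
```

Passing the result of audio.transcriptions.create to formatTranscript then yields one line per segment for verbose_json and a single line of text for the other formats.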
Response
interface TranscriptionSegment {
  id?: number;
  start?: number; // seconds
  end?: number; // seconds
  text?: string;
  [key: string]: unknown;
}
interface TranscriptionResponse {
  text: string;
  language?: string;
  duration?: number; // seconds
  segments?: TranscriptionSegment[];
  [key: string]: unknown;
}
Example
import { readFile } from "node:fs/promises";
// File is a global in Node.js 20+; on Node.js 18.13+ import it from "node:buffer".
const buf = await readFile("speech.wav");
const file = new File([buf], "speech.wav", { type: "audio/wav" });
const res = await client.audio.transcriptions.create({
  file,
  model: "whisper-large-v3",
  response_format: "verbose_json", // needed for timestamped segments
});
console.log(res.text);
// segments is only present for verbose_json, so default to an empty list.
for (const seg of res.segments ?? []) {
  console.log(`[${seg.start?.toFixed(2)}–${seg.end?.toFixed(2)}] ${seg.text}`);
}
Browser uploads
In a browser, pass a File straight from an <input type="file">:
const file = inputEl.files?.[0];
if (!file) return;
const res = await client.audio.transcriptions.create({
  file,
  model: "whisper-large-v3",
});
But see Environments before exposing an API key in browser code; proxy through your server instead.
Errors
| When | Class |
|---|---|
| Missing/invalid file, unsupported model | BadRequestError |
| Invalid key | AuthenticationError |
| Rate-limited | RateLimitError |
| Riva / Whisper down | UpstreamError |
| Network failure | APIConnectionError |
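The table suggests a split between errors worth retrying and errors that need a code or configuration fix first. The sketch below assumes that RateLimitError, UpstreamError, and APIConnectionError are transient, and that each error's name property matches its class name; neither is documented SDK behavior. The helpers isRetryable, backoffDelayMs, and withRetry are hypothetical names.

```typescript
// Assumption: these classes signal transient failures; the others need a fix before retrying.
const RETRYABLE_ERROR_NAMES = new Set(["RateLimitError", "UpstreamError", "APIConnectionError"]);

// Assumption: the SDK sets err.name to the class name shown in the table.
function isRetryable(err: unknown): boolean {
  return err instanceof Error && RETRYABLE_ERROR_NAMES.has(err.name);
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(call: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!isRetryable(err) || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A call would then look like withRetry(() => client.audio.transcriptions.create({ file, model: "whisper-large-v3" })). If the SDK exports the error classes, instanceof checks against them would be more robust than comparing names.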