API reference
Audio transcriptions
POST /v1/audio/transcriptions — speech-to-text via multipart upload.
POST https://api.qubittron.ai/v1/audio/transcriptions
OpenAI-compatible speech-to-text. Multipart upload, 25 MB cap.
Authentication
Authorization: Bearer qbt_<key>
Request
Content-Type: multipart/form-data with the following fields:
| Field | Type | Required | Notes |
|---|---|---|---|
| file | file (Blob) | yes | Audio. Max 25 MB |
| model | string | yes | Speech model id |
| language | string | no | ISO-639-1 (e.g. "en") |
| response_format | string | no | "json" (default), "text", "verbose_json" |
| temperature, prompt | various | no | Passthrough |
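As a client-side illustration of the fields above, the following sketch assembles the multipart body and rejects obviously invalid input before any bytes are sent. `buildTranscriptionForm`, `TranscriptionFields`, and `MAX_UPLOAD_BYTES` are hypothetical names, not part of any SDK, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption; the server's exact threshold may differ.

```typescript
// Hypothetical helper, not part of any Qubittron SDK.
// Assumption: "25 MB" is interpreted as 25 MiB.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

type ResponseFormat = "json" | "text" | "verbose_json";

interface TranscriptionFields {
  model: string;
  language?: string;
  responseFormat?: ResponseFormat;
}

function buildTranscriptionForm(
  audio: Blob,
  filename: string,
  fields: TranscriptionFields,
): FormData {
  // Fail early instead of waiting for a 413 from the server.
  if (audio.size > MAX_UPLOAD_BYTES) {
    throw new RangeError(`audio is ${audio.size} bytes; the cap is ${MAX_UPLOAD_BYTES}`);
  }
  // ISO-639-1 codes are two letters; lowercase is assumed here.
  if (fields.language !== undefined && !/^[a-z]{2}$/.test(fields.language)) {
    throw new TypeError(`language must be a two-letter ISO-639-1 code, got "${fields.language}"`);
  }
  const fd = new FormData();
  fd.append("model", fields.model);
  if (fields.language !== undefined) fd.append("language", fields.language);
  if (fields.responseFormat !== undefined) {
    fd.append("response_format", fields.responseFormat);
  }
  fd.append("file", audio, filename);
  return fd;
}
```

The returned `FormData` can be passed as the `body` of a `fetch` call; the Content-Type header should be left unset so the runtime adds the multipart boundary itself.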
Supported models
| Model | Notes |
|---|---|
| whisper-large-v3 | Highest quality, slower |
| whisper-large-v3-turbo | Faster, near-equivalent quality |
Examples
```bash
curl https://api.qubittron.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $QUBITTRON_API_KEY" \
  -F "model=whisper-large-v3-turbo" \
  -F "language=en" \
  -F "file=@/path/to/audio.wav"
```

```typescript
// OpenAI SDK pointed at the Qubittron base URL
import OpenAI from "openai";
import { createReadStream } from "node:fs";

const client = new OpenAI({
  baseURL: "https://api.qubittron.ai/v1",
  apiKey: process.env.QUBITTRON_API_KEY,
});

const res = await client.audio.transcriptions.create({
  model: "whisper-large-v3-turbo",
  file: createReadStream("/path/to/audio.wav"),
  language: "en",
});
console.log(res.text);
```

```typescript
// Plain fetch with FormData, no SDK
import { readFileSync } from "node:fs";

const fileBytes = readFileSync("/path/to/audio.wav");
const fd = new FormData();
fd.append("model", "whisper-large-v3-turbo");
fd.append("language", "en");
fd.append("file", new Blob([fileBytes], { type: "audio/wav" }), "audio.wav");

const res = await fetch("https://api.qubittron.ai/v1/audio/transcriptions", {
  method: "POST",
  // Content-Type is left unset so fetch adds the multipart boundary
  headers: { Authorization: `Bearer ${process.env.QUBITTRON_API_KEY}` },
  body: fd,
});
const json = (await res.json()) as { text: string; language?: string };
console.log(json.text);
```
Response
Default (response_format omitted or "json"):
```json
{
  "text": "the quick brown fox jumps over the lazy dog",
  "language": "en"
}
```
response_format: "verbose_json" adds segments and timing.
Errors
| Status | Code | When |
|---|---|---|
| 400 | invalid_request | Missing fields, non-multipart Content-Type |
| 400 | model_not_found | Model unknown or doesn't support transcriptions |
| 401 | invalid_api_key | Missing/invalid Bearer token |
| 402 | insufficient_funds | Account credit exhausted |
| 413 | request_entity_too_large | File or body exceeds 25 MB |
| 429 | rate_limit_exceeded | Rate limit hit |
| 502 | upstream_error | Upstream returned 5xx or unparseable JSON |
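One way a client might act on these statuses is sketched below. The status codes come from the table above, but the retry policy is an assumption: treating 429 and 502 as transient and everything else as final is a common convention, not documented behavior of this API. `isRetryable`, `backoffMs`, and `postWithRetry` are hypothetical names, and the delay values are placeholders.

```typescript
// Assumption: only rate limiting (429) and upstream failures (502) are worth retrying.
const RETRYABLE_STATUSES = new Set<number>([429, 502]);

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff in milliseconds; attempt is zero-based and the delay is capped.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Calls send() up to maxAttempts times, sleeping between retryable failures.
// The last response is returned whether or not it succeeded.
async function postWithRetry(
  send: () => Promise<Response>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<Response> {
  let res = await send();
  for (let attempt = 0; attempt < maxAttempts - 1 && isRetryable(res.status); attempt++) {
    await sleep(backoffMs(attempt));
    res = await send();
  }
  return res;
}
```

Statuses such as 400, 401, 402, and 413 are returned to the caller on the first attempt, since repeating the same request would not change the outcome.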
Pricing
Metered per second of audio. Duration is read from the upstream usage.seconds field when present (e.g. with verbose_json); otherwise it is parsed from the file's audio header.
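To estimate the billed duration on the client side, the fallback order described above can be sketched as follows. This is an illustration only: it assumes an uncompressed RIFF/WAVE file whose "fmt " chunk precedes its "data" chunk, and it does not reproduce the server's actual header parser or any rounding it may apply. `wavDurationSeconds` and `billableSeconds` are hypothetical names.

```typescript
// Assumption: RIFF/WAVE input with a "fmt " chunk followed by a "data" chunk.
function wavDurationSeconds(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Reads a four-character chunk tag at the given offset.
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );
  if (bytes.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let byteRate = 0;
  let offset = 12; // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      // Byte rate sits 8 bytes into the fmt chunk body.
      byteRate = view.getUint32(offset + 8 + 8, true);
    } else if (id === "data") {
      if (byteRate === 0) throw new Error("data chunk found before a valid fmt chunk");
      return size / byteRate;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}

// Prefer the upstream-reported duration; fall back to the header when it is absent.
function billableSeconds(usageSeconds: number | undefined, fileBytes: Uint8Array): number {
  return usageSeconds ?? wavDurationSeconds(fileBytes);
}
```

For a 16 kHz, mono, 16-bit file the byte rate is 32000, so a 64000-byte data chunk corresponds to 2 seconds of audio.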