Chat completions
POST /v1/chat/completions — OpenAI-compatible chat with optional streaming.
POST https://api.qubittron.ai/v1/chat/completions
The primary entry point for conversational LLMs. The endpoint is OpenAI-compatible: existing SDK code works once baseURL is set to https://api.qubittron.ai/v1.
Authentication
Authorization: Bearer qbt_<key>
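As a rough illustration of the header format above, a client could build its headers with a small helper. The name `authHeaders` and the local `qbt_` prefix check are assumptions made for this sketch; the only documented fact is the `Bearer qbt_<key>` shape.

```typescript
// Hypothetical helper: builds request headers from a Qubittron API key.
// The "qbt_" prefix check is a local sanity check based on the format
// shown above, not documented server behavior.
function authHeaders(apiKey: string): Record<string, string> {
  if (!apiKey.startsWith("qbt_")) {
    throw new Error('API key is expected to start with "qbt_"');
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}
```

Failing early on a malformed key gives a clearer local error than waiting for a 401 from the server.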
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | Any LLM model id (see below) |
| messages | Message[] | yes | OpenAI message format |
| stream | boolean | no | When true, returns SSE stream |
| max_tokens, temperature, top_p, tools, tool_choice, response_format, etc. | various | no | Passed through to upstream |
Unknown fields pass through unchanged — Bastion does not strip OpenAI fields it doesn't recognize.
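The request shape above can be sketched as a TypeScript type. This is an illustration only: `ChatMessage`, `ChatCompletionRequest`, and `buildChatRequest` are names invented here, the message type is simplified to text-only content, and `seed` merely stands in for any extra OpenAI field that is forwarded unchanged.

```typescript
// Simplified message type (text content only); the full OpenAI format
// allows more roles and content shapes.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Required fields from the table, plus arbitrary pass-through fields.
type ChatCompletionRequest = {
  model: string;
  messages: ChatMessage[];
} & Record<string, unknown>;

// Hypothetical builder: merges optional fields into one request body.
// `model` and `messages` are spread last so options cannot override them.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  options: Record<string, unknown> = {},
): ChatCompletionRequest {
  return { ...options, model, messages };
}

const requestBody = buildChatRequest(
  "gpt-oss-120b",
  [{ role: "user", content: "Reply with exactly: ok" }],
  { max_tokens: 16, seed: 7 }, // `seed` is an example pass-through field
);
console.log(JSON.stringify(requestBody));
```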
Supported models
| Model | Category |
|---|---|
| gpt-oss-120b | LLM |
| gpt-oss-20b | LLM |
| Llama-3.1-8B-Instruct | LLM |
| Meta-Llama-3_3-70B-Instruct | LLM |
| Qwen3-32B | LLM |
| Mistral-7B-Instruct-v0.3 | LLM |
| Mistral-Small-3.2-24B-Instruct-2506 | LLM |
| Mistral-Nemo-Instruct-2407 | LLM |
| Qwen3-Coder-30B-A3B-Instruct | Code |
| Qwen2.5-VL-72B-Instruct | Vision |
| Qwen3Guard-Gen-8B | Safety |
| Qwen3Guard-Gen-0.6B | Safety |
Tool/structured-output support is not enforced by Bastion — the request passes through unchanged. Models that don't support tools or response_format upstream may return a 502.
Use GET /v1/models for the live list on your account.
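A minimal sketch of fetching that live list follows. It assumes the response uses OpenAI's list shape (`{ "object": "list", "data": [{ "id": ... }] }`), which this page does not confirm; `FetchLike`, `extractModelIds`, and `listModels` are hypothetical names.

```typescript
// Minimal fetch-like signature so the sketch does not depend on a runtime.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumption: OpenAI-style list payload with a `data` array of { id } objects.
// Anything that does not match yields an empty list.
function extractModelIds(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) return [];
  const data = (payload as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];
  return data
    .map((entry) => (entry as { id?: unknown } | null)?.id)
    .filter((id): id is string => typeof id === "string");
}

// Hypothetical wrapper around GET /v1/models.
async function listModels(apiKey: string, fetchImpl: FetchLike): Promise<string[]> {
  const res = await fetchImpl("https://api.qubittron.ai/v1/models", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`GET /v1/models failed with status ${res.status}`);
  }
  return extractModelIds(await res.json());
}

// Usage on Node 18+:
//   const ids = await listModels(process.env.QUBITTRON_API_KEY!, fetch);
```

Passing the fetch implementation as a parameter keeps the parsing logic testable without network access.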
Examples
curl https://api.qubittron.ai/v1/chat/completions \
  -H "Authorization: Bearer $QUBITTRON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{ "role": "user", "content": "Reply with exactly: ok" }],
    "max_tokens": 16
  }'

Non-streaming:
import OpenAI from "openai";

// Point the OpenAI SDK at the Qubittron base URL.
const client = new OpenAI({
  baseURL: "https://api.qubittron.ai/v1",
  apiKey: process.env.QUBITTRON_API_KEY,
});

const res = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Reply with exactly: ok" }],
  max_tokens: 16,
});
console.log(res.choices[0]?.message.content);

Streaming:
const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Count from 1 to 5." }],
  stream: true,
});
// Each chunk carries an incremental delta; print the text as it arrives.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

With fetch:
const res = await fetch("https://api.qubittron.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.QUBITTRON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-oss-120b",
    messages: [{ role: "user", content: "Reply with exactly: ok" }],
    max_tokens: 16,
  }),
});
// Surface HTTP errors instead of parsing an error body as a completion.
if (!res.ok) {
  throw new Error(`Request failed with status ${res.status}`);
}
const json = (await res.json()) as {
  choices: { message: { role: string; content: string } }[];
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
};
console.log(json.choices[0]?.message.content);

Response
Non-streaming:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "ok" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}

Streaming (stream: true) returns text/event-stream with OpenAI-format data: {…} chunks ending with data: [DONE].
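For clients that read the event stream without the SDK, the chunk format described above can be parsed by hand. The following is a minimal sketch, assuming each event is a single `data: <json>` line in OpenAI chunk format; `collectStreamText` is a hypothetical helper and the sample payloads are made up for illustration.

```typescript
// Extracts assistant text from a block of SSE lines and reports whether
// the terminating "data: [DONE]" marker was seen.
function collectStreamText(sseText: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const rawLine of sseText.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank separators and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload) as {
      choices?: { delta?: { content?: string | null } }[];
    };
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, done };
}

// Illustrative stream contents, not captured from the real API.
const sampleStream = [
  'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
  "",
  'data: {"choices":[{"index":0,"delta":{"content":"1, 2"}}]}',
  "",
  'data: {"choices":[{"index":0,"delta":{"content":", 3"}}]}',
  "",
  "data: [DONE]",
  "",
].join("\n");

const streamed = collectStreamText(sampleStream);
console.log(streamed.text, streamed.done); // prints: 1, 2, 3 true
```

A real network reader may split a line across two reads, so a production client would buffer incomplete lines before passing them to a parser like this.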
Errors
| Status | Code | When |
|---|---|---|
| 400 | invalid_request | Body failed validation (messages missing, etc.) |
| 400 | model_not_found | Model unknown or doesn't support chat |
| 401 | invalid_api_key | Missing/invalid Bearer token |
| 402 | insufficient_funds | Account credit exhausted |
| 429 | rate_limit_exceeded | Rate limit hit |
| 502 | upstream_error | Upstream model unreachable or 5xx |
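
One way a client might act on these statuses is to retry only the ones that can plausibly succeed later. The sketch below treats 429 and 502 as retryable, which is a client-side design choice and not documented behavior; `isRetryable`, `backoffMs`, and `withRetry` are hypothetical names, and the backoff values are arbitrary.

```typescript
// Statuses from the table that may succeed on a later attempt (assumption).
const RETRYABLE_STATUSES = new Set<number>([429, 502]);

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

// Exponential backoff with a cap: 500 ms, 1 s, 2 s, ... up to 8 s.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// `send` performs one request and resolves with its HTTP status and body.
async function withRetry<T>(
  send: () => Promise<{ status: number; body: T }>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const { status, body } = await send();
    if (status >= 200 && status < 300) return body;
    // 400, 401, and 402 need a changed request, key, or balance: fail fast.
    if (!isRetryable(status) || attempt + 1 >= maxAttempts) {
      throw new Error(`request failed with status ${status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
  }
}
```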
Pricing
Metered per token (input + output). Per-model rates are listed on your dashboard.
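The `usage` object in each response can be turned into a cost estimate. The sketch below assumes separate input and output rates quoted per million tokens; that rate structure and the numbers used are placeholders, and `Rates` and `estimateCost` are hypothetical names. Actual rates come from the dashboard.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Placeholder rate structure: currency units per million tokens.
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Input tokens are billed at the input rate, output tokens at the output rate.
function estimateCost(usage: Usage, rates: Rates): number {
  const inputCost = usage.prompt_tokens * rates.inputPerMillion;
  const outputCost = usage.completion_tokens * rates.outputPerMillion;
  return (inputCost + outputCost) / 1e6;
}

// Made-up rates, applied to the usage block from the response example.
const exampleRates: Rates = { inputPerMillion: 1, outputPerMillion: 2 };
const estimated = estimateCost(
  { prompt_tokens: 12, completion_tokens: 1, total_tokens: 13 },
  exampleRates,
);
console.log(estimated.toFixed(6)); // prints: 0.000014
```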