API reference
Responses
POST /v1/responses — OpenAI Responses API. Stateless mode only (store=false).
POST https://api.qubittron.ai/v1/responses

OpenAI's Responses API in stateless mode. Pass store: false; Bastion's upstream does not retain server-side state.
Authentication
Authorization: Bearer qbt_<key>
Request body
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | One of the 9 supported models below |
| input | string \| InputItem[] | yes | Plain text or structured input array |
| stream | boolean | no | When true, returns SSE event stream |
| store | boolean | no | The Bastion server does not enforce this, but the upstream rejects stateful mode. Send store: false or expect a 502 |
| max_output_tokens, temperature, tools, etc. | various | no | Passthrough |
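
The table above can be mirrored in a small type so that a request body always carries store: false. This is a minimal sketch: the names `InputItem`, `ResponsesRequest`, and `buildRequestBody` are illustrative, not part of any official SDK, and the shape of a structured input item is left open because this page does not specify it.

```typescript
// Hypothetical types for illustration only.
// The exact shape of a structured input item is not specified on this page.
type InputItem = Record<string, unknown>;

interface ResponsesRequest {
  model: string;                  // required
  input: string | InputItem[];    // required: plain text or structured array
  stream?: boolean;               // optional: true returns an SSE event stream
  store?: boolean;                // optional in the schema, but should be false here
  [passthrough: string]: unknown; // max_output_tokens, temperature, tools, etc.
}

// Builds a body that always ends up with store: false, even if the caller
// passes store: true in `extra`, because the upstream rejects stateful mode.
function buildRequestBody(
  model: string,
  input: string | InputItem[],
  extra: Record<string, unknown> = {},
): ResponsesRequest {
  return { ...extra, model, input, store: false };
}
```

Putting `store: false` after the spread is deliberate: the last key wins, so a stray `store: true` in the passthrough options cannot reach the upstream.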
Supported models
Nine models support /v1/responses:
- gpt-oss-120b
- gpt-oss-20b
- Llama-3.1-8B-Instruct
- Meta-Llama-3_3-70B-Instruct
- Qwen3-32B
- Mistral-Small-3.2-24B-Instruct-2506
- Mistral-Nemo-Instruct-2407
- Qwen3-Coder-30B-A3B-Instruct
- Qwen2.5-VL-72B-Instruct
Other LLMs (e.g. Mistral-7B-Instruct-v0.3) return model_not_found on this endpoint — use /v1/chat/completions instead.
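
A client that talks to several models can pick the endpoint up front instead of waiting for model_not_found. The sketch below hardcodes the nine names listed above; `RESPONSES_MODELS` and `endpointFor` are hypothetical helper names. It assumes any model outside the list is served by /v1/chat/completions, and it cannot tell whether such a model exists at all.

```typescript
// The nine models this page lists as supporting /v1/responses.
const RESPONSES_MODELS: readonly string[] = [
  "gpt-oss-120b",
  "gpt-oss-20b",
  "Llama-3.1-8B-Instruct",
  "Meta-Llama-3_3-70B-Instruct",
  "Qwen3-32B",
  "Mistral-Small-3.2-24B-Instruct-2506",
  "Mistral-Nemo-Instruct-2407",
  "Qwen3-Coder-30B-A3B-Instruct",
  "Qwen2.5-VL-72B-Instruct",
];

type Endpoint = "/v1/responses" | "/v1/chat/completions";

// Hypothetical helper: routes listed models to /v1/responses and
// everything else to /v1/chat/completions.
function endpointFor(model: string): Endpoint {
  return RESPONSES_MODELS.indexOf(model) !== -1
    ? "/v1/responses"
    : "/v1/chat/completions";
}
```

The comparison is exact and case-sensitive, so the model name must be spelled as it appears in the list.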
Examples
curl https://api.qubittron.ai/v1/responses \
  -H "Authorization: Bearer $QUBITTRON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "input": "Reply with: ok",
    "max_output_tokens": 32,
    "store": false
  }'

// OpenAI Node SDK pointed at the Qubittron base URL
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.qubittron.ai/v1",
  apiKey: process.env.QUBITTRON_API_KEY,
});
const res = await client.responses.create({
  model: "gpt-oss-120b",
  input: "Reply with: ok",
  max_output_tokens: 32,
  store: false, // stateless mode; the upstream rejects store: true
});
console.log(res.output_text);

// Same request with plain fetch, no SDK
const res = await fetch("https://api.qubittron.ai/v1/responses", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.QUBITTRON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-oss-120b",
    input: "Reply with: ok",
    max_output_tokens: 32,
    store: false,
  }),
});
const json = await res.json();
console.log(json);

Response
OpenAI Responses API format — see OpenAI's Responses reference for the full shape. The response contains output[] items, usage (input_tokens, output_tokens), and model.
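
When the raw JSON is read without the SDK, the generated text has to be collected from output[] by hand. The sketch below assumes the item layout from OpenAI's Responses reference, where a "message" item holds content parts of type "output_text"; the interface names and `summarize` are illustrative, and items of any other type are skipped.

```typescript
// Assumed shapes, following OpenAI's Responses reference; only the fields
// used here are declared.
interface OutputContent { type: string; text?: string }
interface OutputItem { type: string; content?: OutputContent[] }
interface ResponseBody {
  model: string;
  output: OutputItem[];
  usage?: { input_tokens: number; output_tokens: number };
}

// Hypothetical helper: concatenates all output_text parts from message
// items and adds up the token counts reported in usage.
function summarize(res: ResponseBody): { text: string; totalTokens: number } {
  let text = "";
  for (const item of res.output) {
    if (item.type !== "message") continue; // skip non-message items
    for (const part of item.content ?? []) {
      if (part.type === "output_text") text += part.text ?? "";
    }
  }
  const usage = res.usage;
  const totalTokens = usage ? usage.input_tokens + usage.output_tokens : 0;
  return { text, totalTokens };
}
```

With the fetch example above, this would be called as `summarize(json)` after checking that the HTTP status is 200.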
Streaming (stream: true) returns the standard Responses event stream (response.created, response.output_item.added, response.completed, etc.).
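
A streamed body arrives as text chunks that may end in the middle of an event, so a client usually buffers and parses only complete frames. The following is a minimal sketch of that step, not a full SSE implementation: it assumes frames are separated by a blank line ("\n\n"), that each frame carries `event:` and `data:` lines, and that every data payload is JSON. `StreamEvent` and `parseSse` are hypothetical names.

```typescript
interface StreamEvent { event: string; data: unknown }

// Parses every complete frame in `buffer` and returns the leftover partial
// frame in `rest`, to be prepended to the next chunk.
function parseSse(buffer: string): { events: StreamEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // the last piece is incomplete or empty
  const events: StreamEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // SSE default when no event: line is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.indexOf("event:") === 0) event = line.slice(6).trim();
      else if (line.indexOf("data:") === 0) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length === 0) continue; // comment or keep-alive frame
    events.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return { events, rest };
}
```

A read loop would append each decoded chunk to `rest`, call `parseSse` on the result, and stop once an event named response.completed appears.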
Errors
| Status | Code | When |
|---|---|---|
| 400 | invalid_request | Body failed validation |
| 400 | model_not_found | Model unknown or doesn't support responses |
| 401 | invalid_api_key | Missing/invalid Bearer token |
| 402 | insufficient_funds | Account credit exhausted |
| 429 | rate_limit_exceeded | Rate limit hit |
| 502 | upstream_error | Upstream returned 5xx, often when store: true is sent |
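
One way a client might act on these status codes is sketched below. The policy itself is an assumption rather than documented behavior: it treats 429 as retryable, and treats 502 as retryable only when the request already sent store: false, since a 502 is otherwise most likely caused by stateful mode. `NextStep` and `nextStep` are hypothetical names.

```typescript
type NextStep =
  | "retry_with_backoff"
  | "set_store_false"
  | "fix_request"
  | "fix_api_key"
  | "add_credit"
  | "give_up";

// Maps an HTTP status from /v1/responses to a suggested client action.
// `sentStore` is the value of `store` in the request body (undefined if omitted).
function nextStep(status: number, sentStore: boolean | undefined): NextStep {
  switch (status) {
    case 400:
      return "fix_request"; // invalid_request or model_not_found
    case 401:
      return "fix_api_key";
    case 402:
      return "add_credit";
    case 429:
      return "retry_with_backoff";
    case 502:
      // Retrying cannot help if the request asked for stateful mode.
      return sentStore === false ? "retry_with_backoff" : "set_store_false";
    default:
      return "give_up";
  }
}
```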
Pricing
Metered per token (input + output).