Streaming
How to consume Bastion SSE streams correctly — parsing, proxy buffering, reconnection.
The chat/completions and completions endpoints both support server-sent events (SSE) via "stream": true. Streaming cuts perceived latency because the first tokens can render while the rest are still being generated, but it adds failure modes that buffered requests do not have.
Wire format
Each event is a data: line followed by a blank line:
data: {"id":"...","choices":[{"delta":{"content":"Hel"}}]}

data: {"id":"...","choices":[{"delta":{"content":"lo"}}]}

data: [DONE]

The terminating data: [DONE] is not JSON; it is a sentinel. Parse the chunks before it as JSON, and treat the [DONE] line as end-of-stream.
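The rule above can be sketched as a small classifier for one event, meaning the text between two blank lines. The `StreamEvent` type and the `parseEvent` name are illustrative and not part of any SDK; the sketch assumes each event carries a single data: line, as in the wire format shown.

```typescript
// Hypothetical helper: classify one SSE event (the text between blank lines).
type StreamEvent =
  | { kind: "chunk"; content: string }
  | { kind: "done" }
  | { kind: "ignored" };

function parseEvent(event: string): StreamEvent {
  // Anything that is not a data: line (for example a comment line) is skipped.
  if (!event.startsWith("data: ")) return { kind: "ignored" };
  const payload = event.slice("data: ".length);
  // The sentinel is checked before JSON.parse, because it is not valid JSON.
  if (payload === "[DONE]") return { kind: "done" };
  const chunk = JSON.parse(payload);
  return { kind: "chunk", content: chunk.choices?.[0]?.delta?.content ?? "" };
}
```

Checking for the sentinel before calling JSON.parse avoids a SyntaxError on the last event of every stream.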
Use the SDK if you can
The OpenAI SDK handles parsing, async iteration, and cleanup correctly:
const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Raw fetch parsing is only worth it when you need to embed in a constrained runtime (workers, edge) that can't bring the SDK along. See Chat completions for the canonical SDK and fetch examples.
Raw fetch parser sketch
const res = await fetch("https://api.qubittron.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.QUBITTRON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-oss-120b",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  }),
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
read: while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  let nl: number;
  // Process every complete event (terminated by a blank line) currently in the buffer.
  while ((nl = buf.indexOf("\n\n")) !== -1) {
    const event = buf.slice(0, nl);
    buf = buf.slice(nl + 2);
    if (!event.startsWith("data: ")) continue;
    const payload = event.slice(6);
    // Sentinel: leave both loops. A labeled break also works at module top level, where return is not allowed.
    if (payload === "[DONE]") break read;
    const chunk = JSON.parse(payload);
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}
Two things to get right:
- Buffer until you see \n\n. TCP can deliver any byte range; a chunk may split a JSON payload mid-string.
- Reuse one TextDecoder and pass { stream: true } to each decode() call so multi-byte UTF-8 characters that straddle chunks decode correctly.
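Both points can be checked without a network connection. The sketch below wraps the buffering logic in a hypothetical `SseSplitter` class (the name is illustrative) and feeds it a byte stream cut in the middle of a two-byte UTF-8 character, which is also the middle of a JSON payload.

```typescript
// Hypothetical helper: accumulates raw bytes and returns only complete events.
class SseSplitter {
  private decoder = new TextDecoder();
  private buf = "";

  push(bytes: Uint8Array): string[] {
    // { stream: true } makes the decoder hold back an incomplete multi-byte sequence.
    this.buf += this.decoder.decode(bytes, { stream: true });
    const events: string[] = [];
    let nl: number;
    while ((nl = this.buf.indexOf("\n\n")) !== -1) {
      events.push(this.buf.slice(0, nl));
      this.buf = this.buf.slice(nl + 2);
    }
    return events;
  }
}

// Simulate a network chunk boundary between the two bytes of "é" (0xC3 0xA9).
const bytes = new TextEncoder().encode('data: {"t":"é"}\n\ndata: [DONE]\n\n');
const cut = bytes.indexOf(0xc3) + 1;
const splitter = new SseSplitter();
const first = splitter.push(bytes.slice(0, cut)); // no complete event yet
const second = splitter.push(bytes.slice(cut)); // both events, with "é" intact
```

Calling decode() without { stream: true } on the same two chunks would emit a replacement character for each half of the split sequence instead of "é".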
Don't buffer streams behind a reverse proxy
If you're proxying Bastion responses through your own server (Express, Hono, Next.js route handler, etc.), buffering kills the latency win. Make sure your proxy:
- Passes through the upstream Content-Type: text/event-stream unchanged.
- Sets Cache-Control: no-cache on the response.
- Flushes after each chunk (most frameworks do this automatically when you pipe a ReadableStream; some need an explicit res.flush()).
- Disables gzip / brotli compression on the streaming response (compression buffers).
Behind nginx, set proxy_buffering off; for the route. Behind Cloudflare, streaming works but caching must be off.
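For runtimes built on the Fetch API Request/Response types, the checklist above might reduce to something like the following. `passThrough` is a hypothetical helper, not a Bastion or framework API, and it assumes the upstream response has already been fetched with "stream": true.

```typescript
// Hypothetical helper: wrap an upstream streaming response for the client
// without reading (and therefore without buffering) its body.
function passThrough(upstream: Response): Response {
  const headers = new Headers();
  // Keep the upstream content type so clients treat the body as an event stream.
  headers.set("Content-Type", upstream.headers.get("Content-Type") ?? "text/event-stream");
  headers.set("Cache-Control", "no-cache");
  // No Content-Encoding is set here; also make sure no compression
  // middleware wraps this route.
  // Handing over upstream.body as a ReadableStream lets the runtime forward
  // each chunk as it arrives instead of collecting the whole response first.
  return new Response(upstream.body, { status: upstream.status, headers });
}
```

The design choice is to never call text() or json() on the upstream response in the proxy, since either call waits for the full body.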
Reconnection
Bastion streams have no Last-Event-ID semantics: a dropped stream means you discard the partial output and retry the entire request. Treat your UI accordingly:
- For chat: commit the assistant message only after the stream ends cleanly. On reconnect, replace the partial message rather than appending to it.
- For agent loops: persist the request inputs, not the partial output. Idempotency belongs to your own state machine.
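A minimal sketch of the discard-and-retry rule follows. `completeWithRetry` and `startStream` are hypothetical names; `startStream` stands in for whatever issues the request and yields text deltas. The sketch retries immediately, so a real client would probably add a backoff delay and a rule for which errors are worth retrying.

```typescript
// Hypothetical helper: run a streaming request, retrying from scratch on failure.
async function completeWithRetry(
  startStream: () => AsyncIterable<string>,
  maxAttempts = 3,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Reset on every attempt: never append to a failed attempt's output.
    let partial = "";
    try {
      for await (const delta of startStream()) partial += delta;
      return partial; // commit only after the stream ends cleanly
    } catch (err) {
      lastError = err; // drop `partial` and reissue the whole request
    }
  }
  throw lastError;
}
```

Because the caller only ever receives the output of a stream that ended cleanly, a UI built on this can show the partial text as provisional and replace it on retry.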
Cancellation
The OpenAI SDK exposes a controller you can abort(). With raw fetch, pass an AbortSignal:
const ctrl = new AbortController();
fetch(url, { signal: ctrl.signal, /* ... */ });
// later:
ctrl.abort();
Aborting closes the connection. The model stops generating on the upstream side once the connection drops.
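With raw fetch, calling abort() with no argument makes the pending request reject with an error named "AbortError". It is worth telling that apart from a dropped stream so a deliberate cancel is not retried. The helpers below are a hypothetical sketch; the SDK may surface cancellation as a different error type, so this check is an assumption that applies to the raw fetch path only.

```typescript
// Hypothetical helper: true for the rejection produced by AbortController.abort().
function isAbortError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "AbortError"
  );
}

// Hypothetical wrapper: `work` is whatever fetches and reads the stream.
async function runCancellable(
  work: () => Promise<void>,
): Promise<"finished" | "cancelled"> {
  try {
    await work();
    return "finished";
  } catch (err) {
    if (isAbortError(err)) return "cancelled"; // user-initiated: do not retry
    throw err; // a real failure: let the caller's retry logic decide
  }
}
```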