Streaming

How to consume Bastion SSE streams correctly — parsing, proxy buffering, reconnection.

The chat/completions and completions endpoints both support server-sent events (SSE) via "stream": true. Streaming lets a client render tokens as they are generated instead of waiting for the full response, but it adds failure modes that buffered requests do not have: partial reads, proxy buffering, and dropped connections.

Wire format

Each event is a data: line followed by a blank line:

data: {"id":"...","choices":[{"delta":{"content":"Hel"}}]}

data: {"id":"...","choices":[{"delta":{"content":"lo"}}]}

data: [DONE]

The terminating data: [DONE] is not JSON — it's a sentinel. Parse the chunks before it as JSON; treat the [DONE] line as end-of-stream.
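As a rough illustration of that rule, the sketch below classifies a single event. The parseEvent name and the StreamEvent type are hypothetical and not part of any SDK; the chunk shape is assumed to match the wire-format example above.

```typescript
// Hypothetical result type for this sketch only.
type StreamEvent =
  | { kind: "done" }
  | { kind: "delta"; content: string };

// Interprets one SSE event (the text between two blank lines).
// Returns null for anything that is not a "data: " line.
function parseEvent(event: string): StreamEvent | null {
  const prefix = "data: ";
  if (!event.startsWith(prefix)) return null;
  const payload = event.slice(prefix.length);
  // The sentinel is not JSON, so check for it before calling JSON.parse.
  if (payload === "[DONE]") return { kind: "done" };
  const chunk = JSON.parse(payload);
  return { kind: "delta", content: chunk.choices?.[0]?.delta?.content ?? "" };
}
```

Checking for the sentinel first means JSON.parse only ever sees real chunks, so a parse error always signals a genuinely malformed payload.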

Use the SDK if you can

The OpenAI SDK handles parsing, async iteration, and cleanup correctly:

const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Parsing the stream yourself with raw fetch is only worth it in a constrained runtime (workers, edge) that can't bundle the SDK. See Chat completions for the canonical SDK and fetch examples.

Raw fetch parser sketch

const res = await fetch("https://api.qubittron.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.QUBITTRON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-oss-120b",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  }),
});

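if (!res.ok || !res.body) {
  throw new Error(`Stream request failed with status ${res.status}`);
}

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = "";

// Labeled so the [DONE] sentinel can exit both loops without a top-level return.
read: while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // { stream: true } holds back an incomplete multi-byte character until the next chunk.
  buf += decoder.decode(value, { stream: true });

  let nl: number;
  // Events end with a blank line; anything after the last "\n\n" stays buffered.
  while ((nl = buf.indexOf("\n\n")) !== -1) {
    const event = buf.slice(0, nl);
    buf = buf.slice(nl + 2);
    if (!event.startsWith("data: ")) continue;
    const payload = event.slice(6);
    if (payload === "[DONE]") break read;
    const chunk = JSON.parse(payload);
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}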

Two things to get right:

  1. Buffer until you see \n\n. TCP can deliver any byte range; a chunk may split a JSON payload mid-string.
  2. Use a streaming TextDecoder ({ stream: true }) so multi-byte UTF-8 characters that straddle chunks decode correctly.
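Both points can be seen in isolation with a small experiment. The following is a minimal sketch, assuming the same "\n\n" delimiter and "data: " prefix as above; the SseBuffer class is a hypothetical helper, and the payload is made up for the demonstration.

```typescript
// Hypothetical helper: accumulates raw bytes and returns complete SSE payloads.
class SseBuffer {
  private decoder = new TextDecoder();
  private buf = "";

  // Feed one network chunk; returns the payload of every event it completed.
  push(bytes: Uint8Array): string[] {
    // stream: true keeps a trailing partial UTF-8 sequence inside the decoder.
    this.buf += this.decoder.decode(bytes, { stream: true });
    const payloads: string[] = [];
    let end: number;
    while ((end = this.buf.indexOf("\n\n")) !== -1) {
      const event = this.buf.slice(0, end);
      this.buf = this.buf.slice(end + 2);
      if (event.startsWith("data: ")) payloads.push(event.slice(6));
    }
    return payloads;
  }
}

// Simulate a network read that ends in the middle of the two-byte character "é".
const bytes = new TextEncoder().encode('data: {"t":"é"}\n\n');
const cut = bytes.indexOf(0xc3) + 1; // 0xC3 is the first byte of "é"
const sse = new SseBuffer();
const first = sse.push(bytes.slice(0, cut)); // no complete event yet
const second = sse.push(bytes.slice(cut)); // the event completes here
```

The first push returns nothing because neither the character nor the event is complete; the second returns the intact payload. Without { stream: true } the split byte would decode to a replacement character instead.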

Don't buffer streams behind a reverse proxy

If you're proxying Bastion responses through your own server (Express, Hono, Next.js route handler, etc.), buffering kills the latency win. Make sure your proxy:

  • Passes through the upstream Content-Type: text/event-stream unchanged.
  • Sets Cache-Control: no-cache on the response.
  • Flushes after each chunk (most frameworks do this automatically when you pipe a ReadableStream; some need an explicit res.flush()).
  • Disables gzip / brotli compression on the streaming response (compression buffers).

Behind nginx, set proxy_buffering off; in the location block that serves the route. Behind Cloudflare, streaming works as long as caching is off for the route.
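For runtimes that use web-standard Request and Response objects, the header part of that checklist might look like the sketch below. The passThroughStream name is hypothetical, and the X-Accel-Buffering header is an nginx convention that this sketch adds as an assumption; it is not something Bastion requires.

```typescript
// Hypothetical helper: wraps an upstream streaming Response for pass-through.
function passThroughStream(upstream: Response): Response {
  const headers = new Headers();
  // Keep the upstream content type so clients still see an event stream.
  headers.set(
    "Content-Type",
    upstream.headers.get("Content-Type") ?? "text/event-stream",
  );
  headers.set("Cache-Control", "no-cache");
  // Assumption: asks an nginx instance in front of this server not to buffer.
  headers.set("X-Accel-Buffering", "no");
  // Handing over the body stream forwards chunks as they arrive rather than
  // collecting the whole response first.
  return new Response(upstream.body, { status: upstream.status, headers });
}
```

The sketch deliberately copies no Content-Encoding header. Whether your framework compresses the response is configured elsewhere, so check that separately.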

Reconnection

Bastion streams have no Last-Event-ID semantics: a dropped stream cannot be resumed, so discard the partial output and retry the entire request. Design your UI accordingly:

  • For chat: commit the assistant message only after the stream ends cleanly. On reconnect, replace the partial message rather than appending to it.
  • For agent loops: persist the request inputs, not the partial output, so a retry replays the same request. Making the retried step idempotent is the job of your own state machine.
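The chat rule can be sketched as a small retry wrapper. Everything here is illustrative: streamWithRetry, the start callback (assumed to issue a fresh request on every call), and the onPartial callback are hypothetical names, and the attempt limit of three is arbitrary.

```typescript
// Hypothetical helper: runs a streaming request and retries it from scratch
// when the stream fails part-way through.
async function streamWithRetry(
  start: () => AsyncIterable<string>, // assumed to send a new request per call
  onPartial: (text: string) => void, // UI hook: replace the in-progress message
  maxAttempts = 3,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let text = ""; // reset per attempt: earlier partial output is discarded
    try {
      for await (const delta of start()) {
        text += delta;
        onPartial(text); // always the full text so far, never an append
      }
      return text; // commit only after the stream ends cleanly
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Because onPartial receives the full text of the current attempt, a UI that renders it directly replaces the stale partial message on retry instead of appending to it. A production version would likely add backoff between attempts and skip retries for user-initiated cancellation.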

Cancellation

The stream object returned by the OpenAI SDK exposes a controller; call stream.controller.abort() to cancel. With raw fetch, pass an AbortSignal:

const ctrl = new AbortController();
fetch(url, { signal: ctrl.signal, /* ... */ });
// later:
ctrl.abort();

Aborting closes the connection. The model stops generating on the upstream side once the connection drops.
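A deliberate cancellation surfaces as a rejected promise, so it helps to tell it apart from a real failure before deciding whether to retry. The sketch below is one way to do that; isAbortError and fetchUnlessCancelled are hypothetical helpers, and the check relies on the standard behavior that an aborted fetch rejects with an error named "AbortError".

```typescript
// Hypothetical helper: true when an error was caused by AbortController.abort().
function isAbortError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "AbortError"
  );
}

// Usage sketch: resolves to null when the caller cancelled, rethrows otherwise.
async function fetchUnlessCancelled(
  url: string,
  init: RequestInit,
  signal: AbortSignal,
): Promise<Response | null> {
  try {
    return await fetch(url, { ...init, signal });
  } catch (err) {
    if (isAbortError(err)) return null; // user cancelled: not a failure
    throw err; // network or server problem: let the caller decide on a retry
  }
}
```

An abort that happens after the response headers arrive is reported by the body reader rather than by fetch itself, so the same check is worth applying around the read loop as well.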
