Streaming
How to consume token streams, accumulate deltas, handle reasoning tokens, and cancel cleanly.
Set stream: true and the SDK returns a Stream<ChatCompletionChunk> — an async iterable over typed chunks. The discriminated union on params.stream narrows the return type at compile time, so you don't need a runtime if (stream instanceof Stream) check.
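To illustrate how a literal stream: true can select the return type at compile time, here is a minimal sketch using TypeScript overloads. The names create, Completion, and ChunkStream are toy stand-ins invented for this example, not the SDK's declarations, and the SDK may achieve the same narrowing by other means.

```typescript
// Toy stand-ins for the SDK's types (assumed shapes, for illustration only).
type Completion = { kind: "completion"; text: string };
type ChunkStream = { kind: "stream"; chunks: string[] };

type BaseParams = { prompt: string };

// Overloads: the literal type of `stream` picks the return type.
function create(params: BaseParams & { stream: true }): ChunkStream;
function create(params: BaseParams & { stream?: false }): Completion;
function create(params: BaseParams & { stream?: boolean }): Completion | ChunkStream {
  if (params.stream) {
    return { kind: "stream", chunks: params.prompt.split(" ") };
  }
  return { kind: "completion", text: params.prompt };
}

// No runtime check needed: the compiler already knows which shape comes back.
const streamed = create({ prompt: "hello streaming world", stream: true }); // ChunkStream
const whole = create({ prompt: "hello streaming world" }); // Completion
```

Accessing streamed.chunks or whole.text type-checks without a cast, while streamed.text would be rejected by the compiler.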
Print-as-it-arrives (Node)
const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Write a haiku about Canada." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta.content ?? "");
}
process.stdout.write("\n");
Accumulate into a final message
For UIs that show partial output but also need the full string:
let buffer = "";
for await (const chunk of stream) {
  const piece = chunk.choices[0]?.delta.content ?? "";
  buffer += piece;
  render(buffer);
}
finish_reason on the last chunk's choices[0] tells you why the stream ended: "stop", "length", or "content_filter". Earlier chunks have finish_reason: null.
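Accumulating text and capturing finish_reason can be folded into one small reducer. The sketch below is self-contained: the Chunk type is a local assumption that models only the fields used on this page, and applyChunk and collect are hypothetical helper names, not SDK exports.

```typescript
// Minimal local chunk shape (an assumption modeled on the fields used above).
type Chunk = {
  choices: Array<{
    delta: { content?: string | null };
    finish_reason: string | null;
  }>;
};

type StreamState = { text: string; finishReason: string | null };

// Pure reducer: fold one chunk into the running state.
function applyChunk(state: StreamState, chunk: Chunk): StreamState {
  const choice = chunk.choices[0];
  if (!choice) return state; // tolerate a chunk with no choices
  return {
    text: state.text + (choice.delta.content ?? ""),
    // finish_reason is null until the last chunk, so keep the latest non-null value.
    finishReason: choice.finish_reason ?? state.finishReason,
  };
}

// Drive the reducer from any async iterable of chunks.
async function collect(
  stream: AsyncIterable<Chunk>,
  onUpdate: (text: string) => void = () => {},
): Promise<StreamState> {
  let state: StreamState = { text: "", finishReason: null };
  for await (const chunk of stream) {
    state = applyChunk(state, chunk);
    onUpdate(state.text); // e.g. render the partial output
  }
  return state;
}
```

A caller could pass render as onUpdate and then inspect finishReason on the returned state, for example to flag a response that ended with "length" rather than "stop".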
Reasoning tokens
Some upstreams emit chain-of-thought alongside content. The field name differs:
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;
  // reasoning channel (OVH GPT-OSS uses `reasoning`; vLLM uses `reasoning_content`)
  const reasoning = delta.reasoning ?? delta.reasoning_content;
  if (reasoning) onReasoning(reasoning);
  // visible content
  if (delta.content) onContent(delta.content);
}
Qwen3 on stock vLLM embeds reasoning inside <think>…</think> in delta.content. The opening and closing tags usually arrive in different chunks, so a replace applied to a single delta rarely sees a complete pair. If you care, strip the tags from the accumulated text before showing it to users:
// `buffer` is the content accumulated so far; the `|$` branch also hides a block that is still open
const visible = buffer.replace(/<think>[\s\S]*?(?:<\/think>|$)/g, "");
Cancellation
Break out of the loop
The cleanest way — break releases the reader and closes the upstream connection:
for await (const chunk of stream) {
  if (userClickedStop()) break;
  render(chunk);
}
Abort via AbortSignal
Wire one through a custom fetch:
import { Bastion } from "@qubittron/bastion-sdk";
const ac = new AbortController();
const client = new Bastion({
  apiKey: process.env.BASTION_API_KEY,
  fetch: (url, init) => fetch(url, { ...init, signal: ac.signal }),
});
// later, from anywhere:
ac.abort();
When the signal fires mid-stream, the iterator throws, so wrap the loop in try/catch:
try {
  for await (const chunk of stream) render(chunk);
} catch (err) {
  if (isAbortError(err)) return;
  throw err;
}
function isAbortError(err: unknown): boolean {
  return err instanceof Error && (err.name === "AbortError" || /aborted/i.test(err.message));
}
Error semantics
API-level errors (4xx, 5xx) throw at the initial create() call, before iteration begins. Mid-stream failures are network failures and throw APIConnectionError (or a runtime AbortError) from inside the loop. Bastion never injects error events into the SSE body — once you're iterating, the only outcomes are: chunks, then a clean end, or a thrown exception.
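Because the two failure phases are distinct, they can be handled by two separate try blocks. The following is a sketch under stated assumptions: startStream is a hypothetical callback standing in for the create() call with stream: true, Outcome and runStream are names invented here, and no SDK error classes are imported, so aborts are detected by name only.

```typescript
type Outcome =
  | { status: "completed" }
  | { status: "aborted" }
  | { status: "request_failed"; error: unknown } // thrown by the initial call
  | { status: "stream_failed"; error: unknown }; // thrown during iteration

function isAbortError(err: unknown): boolean {
  return err instanceof Error && (err.name === "AbortError" || /aborted/i.test(err.message));
}

async function runStream<T>(
  startStream: () => Promise<AsyncIterable<T>>, // hypothetical: wraps create({ ..., stream: true })
  onChunk: (chunk: T) => void,
): Promise<Outcome> {
  let stream: AsyncIterable<T>;
  // Phase 1: the request itself. API-level errors surface here.
  try {
    stream = await startStream();
  } catch (error) {
    return isAbortError(error) ? { status: "aborted" } : { status: "request_failed", error };
  }
  // Phase 2: iteration. Only connection failures or aborts are expected here.
  // Note that an exception thrown by onChunk is also reported as stream_failed.
  try {
    for await (const chunk of stream) onChunk(chunk);
    return { status: "completed" };
  } catch (error) {
    return isAbortError(error) ? { status: "aborted" } : { status: "stream_failed", error };
  }
}
```

Returning a tagged outcome instead of rethrowing lets UI code decide separately whether to show a retry prompt (request_failed), keep the partial output (stream_failed), or stay silent (aborted).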
Server-to-browser passthrough
If you stream from your server to a browser client, the cleanest pattern is to re-emit chunks as you receive them. With the Fetch API:
// server (Hono / Express / etc.)
const encoder = new TextEncoder(); // one shared encoder instead of a new one per chunk
app.get("/stream", async (c) => {
  const stream = await client.chat.completions.create({
    model: "gpt-oss-120b",
    messages,
    stream: true,
  });
  return new Response(
    new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of stream) {
            controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
          }
          controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        } finally {
          controller.close();
        }
      },
    }),
    { headers: { "Content-Type": "text/event-stream" } },
  );
});
This keeps your API key on the server and gives the browser a familiar SSE shape.
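On the browser side, the frames written by the handler can be decoded with a small incremental parser. This is a minimal sketch that assumes exactly the framing shown above (one data: line per event, a blank line between events, and a final [DONE] sentinel); createSseParser is a hypothetical helper, not part of the SDK, and it is not a general SSE implementation.

```typescript
// Incremental parser for `data: <json>\n\n` frames.
// A network read can end in the middle of a frame, so incomplete input is carried over.
function createSseParser<T>() {
  let carry = "";
  let done = false;

  // Feed decoded text; returns the payloads of every frame completed by this call.
  function feed(text: string): T[] {
    const out: T[] = [];
    if (done) return out; // ignore anything after the sentinel
    carry += text;
    let end: number;
    while ((end = carry.indexOf("\n\n")) !== -1) {
      const frame = carry.slice(0, end);
      carry = carry.slice(end + 2);
      if (!frame.startsWith("data: ")) continue; // skip frames this sketch does not understand
      const payload = frame.slice("data: ".length);
      if (payload === "[DONE]") {
        done = true;
        break;
      }
      // JSON.stringify output contains no raw newlines, so one frame holds one JSON value.
      out.push(JSON.parse(payload) as T);
    }
    return out;
  }

  return { feed, isDone: () => done };
}
```

In a page, the text passed to feed would typically come from reading response.body and decoding each read with a TextDecoder in streaming mode, stopping once isDone() returns true or the body ends.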