
How to consume token streams, accumulate deltas, handle reasoning tokens, and cancel cleanly.

Set stream: true and the SDK returns a Stream<ChatCompletionChunk> — an async iterable over typed chunks. The discriminated union on params.stream narrows the return type at compile time, so you don't need a runtime if (stream instanceof Stream) check.

Print-as-it-arrives (Node)

const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Write a haiku about Canada." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta.content ?? "");
}
process.stdout.write("\n");

Accumulate into a final message

For UIs that show partial output but also need the full string:

let buffer = "";
for await (const chunk of stream) {
  const piece = chunk.choices[0]?.delta.content ?? "";
  buffer += piece;
  render(buffer);
}

finish_reason on the last chunk's choices[0] tells you why the stream ended: "stop", "length", or "content_filter". Earlier chunks have finish_reason: null.
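
The accumulation loop and the finish_reason check can be folded into one small reducer. This is a minimal sketch, not part of the SDK: ChunkLike, Accumulated, applyChunk, and collect are illustrative names, and ChunkLike only declares the fields the text above relies on (choices[0].delta.content and choices[0].finish_reason).

```typescript
// Structural stand-in for the chunk fields used here (an assumption, not the SDK's own type).
interface ChunkLike {
  choices: Array<{
    delta: { content?: string | null };
    finish_reason: string | null;
  }>;
}

interface Accumulated {
  text: string;
  finishReason: string | null;
}

// Fold one chunk into the running state. Pure, so it is easy to test.
function applyChunk(acc: Accumulated, chunk: ChunkLike): Accumulated {
  const choice = chunk.choices[0];
  if (!choice) return acc; // chunks without choices leave the state unchanged
  return {
    text: acc.text + (choice.delta.content ?? ""),
    // finish_reason is null until the last chunk, so keep the latest non-null value
    finishReason: choice.finish_reason ?? acc.finishReason,
  };
}

// Drain a stream, reporting the partial text after every chunk.
async function collect(
  stream: AsyncIterable<ChunkLike>,
  onUpdate: (text: string) => void = () => {},
): Promise<Accumulated> {
  let acc: Accumulated = { text: "", finishReason: null };
  for await (const chunk of stream) {
    acc = applyChunk(acc, chunk);
    onUpdate(acc.text);
  }
  return acc;
}
```

A caller could pass render as onUpdate and, once collect resolves, check whether finishReason is "length" to warn that the output was truncated.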

Reasoning tokens

Some upstreams emit chain-of-thought alongside content. The field name differs:

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;

  // reasoning channel (OVH GPT-OSS uses `reasoning`; vLLM uses `reasoning_content`)
  const reasoning = delta.reasoning ?? delta.reasoning_content;
  if (reasoning) onReasoning(reasoning);

  // visible content
  if (delta.content) onContent(delta.content);
}
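
If the SDK's delta type does not declare the reasoning fields, the loop above may need a cast to compile. One way to keep that in a single place is a small normalizer over a structural type. This is a sketch under that assumption: DeltaLike and splitDelta are illustrative names, and the two field names are the ones mentioned in the comment above.

```typescript
// Structural view of a delta that may carry either reasoning field (assumed shape).
interface DeltaLike {
  content?: string | null;
  reasoning?: string | null;         // OVH GPT-OSS, per the text above
  reasoning_content?: string | null; // vLLM, per the text above
}

// Normalize a delta into two plain strings, empty when a channel is absent.
function splitDelta(delta: DeltaLike | undefined): { reasoning: string; content: string } {
  return {
    reasoning: delta?.reasoning ?? delta?.reasoning_content ?? "",
    content: delta?.content ?? "",
  };
}
```

Inside the loop this would be called as splitDelta(chunk.choices[0]?.delta as DeltaLike | undefined), with the two results routed to onReasoning and onContent when non-empty.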

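Qwen3 on stock vLLM embeds reasoning inside <think>…</think> in delta.content. To hide it from users, strip the tags from the accumulated text rather than from each delta, because the opening and closing tags usually arrive in different chunks:

buffer += delta.content ?? "";
// drop closed <think> blocks, plus one that is still open at the end of the buffer
const visible = buffer.replace(/<think>[\s\S]*?(?:<\/think>|$)/g, "");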
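
While a stream is in flight, the tags are rarely complete within one chunk: the opening tag, the reasoning text, and the closing tag tend to arrive separately, and a tag can even be split mid-word. A hedged sketch that filters the accumulated text and covers those cases follows; visibleText is an illustrative helper, not an SDK function.

```typescript
// Return the user-visible part of the text accumulated so far.
function visibleText(buffer: string): string {
  return buffer
    // 1. remove reasoning blocks that have already been closed
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    // 2. remove a block that is still open at the end of the buffer
    .replace(/<think>[\s\S]*$/, "")
    // 3. hide a partially received opening tag such as "<thi" at the very end
    .replace(/<(?:t(?:h(?:i(?:n(?:k)?)?)?)?)?$/, "");
}
```

Step 3 also hides a genuine trailing "<" for one chunk; it reappears as soon as the following text shows it is not the start of a tag.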

Cancellation

Break out of the loop

The cleanest way — break releases the reader and closes the upstream connection:

for await (const chunk of stream) {
  if (userClickedStop()) break;
  render(chunk);
}
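
The mechanism behind this is part of the language: leaving a for await loop early calls the iterator's return() method, which gives the producer a chance to clean up. The sketch below demonstrates that with a stand-in async generator; fakeStream and readUntil are illustrative, and the claim that the SDK's Stream closes the connection at that point comes from the text above, not from this example.

```typescript
// Stand-in producer: the finally block plays the role of "close the connection".
async function* fakeStream(log: string[]): AsyncGenerator<string> {
  try {
    yield "a";
    yield "b";
    yield "c";
  } finally {
    log.push("closed"); // runs when the consumer breaks, returns, or throws
  }
}

// Read pieces until stopAt is seen, then break out of the loop.
async function readUntil(stream: AsyncIterable<string>, stopAt: string): Promise<string[]> {
  const seen: string[] = [];
  for await (const piece of stream) {
    if (piece === stopAt) break; // triggers iterator.return() on the stream
    seen.push(piece);
  }
  return seen;
}
```

The same cleanup runs if the loop body throws or the enclosing function returns, so an early return on a validation error releases the stream as well.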

Abort via `AbortSignal`

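Wire one through a custom fetch. The signal is attached to every request this client makes, and an aborted signal stays aborted, so use one controller and client per cancellable operation: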

import { Bastion } from "@qubittron/bastion-sdk";

const ac = new AbortController();

const client = new Bastion({
  apiKey: process.env.BASTION_API_KEY,
  fetch: (url, init) => fetch(url, { ...init, signal: ac.signal }),
});

// later, from anywhere:
ac.abort();

When the signal fires mid-stream, the iterator throws — wrap the loop in try/catch:

try {
  for await (const chunk of stream) render(chunk);
} catch (err) {
  if (isAbortError(err)) return;
  throw err;
}

function isAbortError(err: unknown): boolean {
  return err instanceof Error && (err.name === "AbortError" || /aborted/i.test(err.message));
}

Error semantics

API-level errors (4xx, 5xx) throw at the initial create() call, before iteration begins. Mid-stream failures are network failures and throw APIConnectionError (or a runtime AbortError) from inside the loop. Bastion never injects error events into the SSE body — once you're iterating, the only outcomes are: chunks, then a clean end, or a thrown exception.
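
Because the two failure phases are distinct, it can help to handle them in separate try blocks and report which one occurred. The following is a minimal sketch under that reading of the text above: Outcome and runStream are illustrative names, open stands for a function that wraps the create() call, and abort detection is reduced to the name check.

```typescript
type Outcome =
  | { kind: "completed" }
  | { kind: "aborted" }
  | { kind: "request_failed"; error: unknown } // thrown before iteration (e.g. 4xx, 5xx)
  | { kind: "stream_failed"; error: unknown }; // thrown from inside the loop

async function runStream<T>(
  open: () => Promise<AsyncIterable<T>>,
  onChunk: (chunk: T) => void,
): Promise<Outcome> {
  let stream: AsyncIterable<T>;
  try {
    stream = await open(); // phase 1: the initial request
  } catch (error) {
    return { kind: "request_failed", error };
  }

  try {
    for await (const chunk of stream) onChunk(chunk); // phase 2: iteration
    return { kind: "completed" };
  } catch (error) {
    if (error instanceof Error && error.name === "AbortError") {
      return { kind: "aborted" };
    }
    return { kind: "stream_failed", error };
  }
}
```

A caller might retry on request_failed, show a "connection lost" notice with the partial text on stream_failed, and stay silent on aborted. An abort that fires while the initial request is still pending would surface as request_failed in this sketch.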

Server-to-browser passthrough

If you stream from your server to a browser client, the cleanest pattern is to re-emit chunks as you receive them. With a Fetch-style handler that returns a Response:

// server (Hono shown; any framework whose handlers return a Fetch Response works the same way)
app.get("/stream", async (c) => {
  const stream = await client.chat.completions.create({
    model: "gpt-oss-120b",
    messages, // assumed to be built from the incoming request
    stream: true,
  });

  const encoder = new TextEncoder(); // one encoder for the whole response

  return new Response(
    new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of stream) {
            controller.enqueue(encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`));
          }
          controller.enqueue(encoder.encode("data: [DONE]\n\n"));
          controller.close();
        } catch (err) {
          // surface upstream failures instead of ending the response as if it had completed
          controller.error(err);
        }
      },
    }),
    { headers: { "Content-Type": "text/event-stream" } },
  );
});

This keeps your API key on the server and gives the browser a familiar SSE shape.
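
On the browser side, the response body can be read with fetch and split back into frames. The sketch below assumes the framing used by the server route above (one "data: " line per event, a blank line between events, and a final "[DONE]" marker) and the "/stream" path; parseSSE and consume are illustrative names.

```typescript
// Split buffered text into complete SSE data payloads plus the unfinished remainder.
function parseSSE(buffer: string): { events: string[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // the last piece is incomplete (or empty)
  const events = frames
    .filter((frame) => frame.startsWith("data: "))
    .map((frame) => frame.slice("data: ".length));
  return { events, rest };
}

// Read the stream and hand each parsed chunk to onChunk until [DONE] arrives.
async function consume(url: string, onChunk: (chunk: unknown) => void): Promise<void> {
  const res = await fetch(url);
  if (!res.ok || !res.body) throw new Error(`stream request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let pending = "";

  for (;;) {
    const { done, value } = await reader.read();
    if (done) return;
    // stream: true keeps multi-byte characters intact across network reads
    pending += decoder.decode(value, { stream: true });
    const { events, rest } = parseSSE(pending);
    pending = rest;
    for (const data of events) {
      if (data === "[DONE]") return;
      onChunk(JSON.parse(data));
    }
  }
}
```

Buffering the remainder matters because a network read can end in the middle of a frame; parsing each read on its own would drop or corrupt such events.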

Streaming