// Discriminated overloads:
client.chat.completions.create(
  params: ChatCompletionCreateParamsNonStreaming,
  options?: { signal?: AbortSignal },
): Promise<ChatCompletion>;

client.chat.completions.create(
  params: ChatCompletionCreateParamsStreaming,
  options?: { signal?: AbortSignal },
): Promise<Stream<ChatCompletionChunk>>;

The primary LLM entry point. It mirrors the shape of OpenAI's chat.completions.create, so most code written against the OpenAI SDK ports by changing the import and the baseURL.

The return type is discriminated by params.stream: when stream: true, you get a Stream<ChatCompletionChunk>; otherwise a ChatCompletion. Pass an AbortSignal via the second argument to cancel — see Streaming → cancellation.
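
The discrimination described above can be sketched with plain TypeScript overloads. This is a minimal illustration, not the SDK's actual source: `FakeCompletion`, `FakeChunk`, the `prompt` field, and `wordsAsChunks` are hypothetical stand-ins invented so the block runs on its own.

```typescript
// Stand-in types for illustration; the real ones are exported by the SDK.
interface FakeCompletion { object: "chat.completion"; text: string }
interface FakeChunk { object: "chat.completion.chunk"; text: string }

interface BaseParams { model: string; prompt: string }
interface NonStreamingParams extends BaseParams { stream?: false }
interface StreamingParams extends BaseParams { stream: true }

// Hypothetical chunk source: emits one chunk per word of the prompt.
async function* wordsAsChunks(prompt: string): AsyncGenerator<FakeChunk> {
  for (const word of prompt.split(" ")) {
    yield { object: "chat.completion.chunk", text: word };
  }
}

// Overload signatures: the literal type of `stream` selects the return type.
function create(params: NonStreamingParams): Promise<FakeCompletion>;
function create(params: StreamingParams): Promise<AsyncIterable<FakeChunk>>;
async function create(
  params: NonStreamingParams | StreamingParams,
): Promise<FakeCompletion | AsyncIterable<FakeChunk>> {
  if (params.stream === true) {
    return wordsAsChunks(params.prompt);
  }
  return { object: "chat.completion", text: params.prompt };
}
```

With declarations of this shape, a call that passes `stream: true` type-checks as a promise of an async iterable, while a call that omits `stream` resolves to a single completion object, so no cast is needed at the call site.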

Parameters

ChatCompletionCreateParams is the union of the streaming and non-streaming variants:

Field	Type	Required	Notes
`model`	`string`	yes	Any LLM model id — see `models.list`.
`messages`	`ChatMessage[]`	yes	OpenAI message format. Defined in `@qubi_bastion/shared`; import it from the SDK, which re-exports it.
`temperature`	`number`	no
`max_tokens`	`number`	no
`top_p`	`number`	no
`stream`	`boolean`	no	When `true`, returns a `Stream<ChatCompletionChunk>`.

Other OpenAI-compatible fields (tools, tool_choice, response_format, seed, …) pass through unchanged to the upstream — the gateway does not strip unknown fields. The SDK's static types only cover the listed fields; the rest are accepted at the HTTP layer.
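
One way to send pass-through fields without fighting the static types is to merge them into the typed params through a small generic helper. The sketch below is an assumption-laden illustration: `ListedParams` is a local stand-in for the SDK's parameter type, `withPassThrough` is a hypothetical helper (not part of the SDK), and it assumes the SDK serializes the params object as-is into the request body.

```typescript
// Stand-in for the statically typed fields listed in the table above.
interface ListedParams {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
  stream?: boolean;
}

// Hypothetical helper: merge untyped pass-through fields into typed params.
function withPassThrough<E extends Record<string, unknown>>(
  params: ListedParams,
  extras: E,
): ListedParams & E {
  return { ...params, ...extras };
}

const params = withPassThrough(
  { model: "gpt-oss-120b", messages: [{ role: "user", content: "Pick a number." }] },
  { seed: 42, response_format: { type: "json_object" } },
);

// Assumption: the JSON body sent upstream would look like this, extras included.
const body = JSON.stringify(params);
```

Because the merged object is a value rather than a fresh object literal, TypeScript's excess-property check does not reject the extra keys when it is passed to a function that expects only the listed fields.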

Non-streaming

const res = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Reply with exactly: ok" }],
  max_tokens: 16,
});

console.log(res.choices[0]?.message.content);
console.log(res.usage.total_tokens);

`ChatCompletion`

interface ChatCompletion {
  id: string;
  object: "chat.completion";
  created: number;          // unix seconds
  model: string;
  choices: ChatChoice[];
  usage: TokenUsage;        // { prompt_tokens, completion_tokens, total_tokens }
}

ChatChoice, ChatMessage, and TokenUsage are re-exported from @qubittron/bastion-sdk (originally defined in @qubi_bastion/shared); import them from the SDK.

Streaming

const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Write a haiku about Canada." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta.content ?? "");
}

The async iterator yields one chunk per SSE data: event. The terminating data: [DONE] is consumed internally — your loop ends naturally on the final next(). See Stream for cancellation.
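
When the full text is needed rather than incremental output, the same loop can accumulate the deltas. A minimal sketch follows; `ChunkLike` is a structural stand-in for the few `ChatCompletionChunk` fields it reads, and `fakeStream` is a hypothetical in-memory stream used only to exercise the helper.

```typescript
// Structural stand-in for the chunk fields this sketch reads.
interface ChunkLike {
  choices: { delta: { content?: string } }[];
}

// Concatenate the content deltas of the first choice into one string.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    // Some chunks carry no choices or no content (for example a usage-only chunk).
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return text;
}

// Hypothetical stream: includes an empty delta and a chunk with no choices.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: "Hel" } }] };
  yield { choices: [{ delta: {} }] };
  yield { choices: [] };
  yield { choices: [{ delta: { content: "lo" } }] };
}
```

Since the helper only requires an `AsyncIterable`, the stream returned by a `stream: true` call should be usable in place of `fakeStream()`, assuming its chunks have the shape documented below.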

`ChatCompletionChunk`

interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: ChatChoiceDelta[];
  usage?: TokenUsage;       // provider-dependent; some emit it only on the final chunk
}

interface ChatChoiceDelta {
  index: number;
  delta: {
    role?: ChatMessage["role"];
    content?: string;
    reasoning?: string;            // OVH GPT-OSS, OpenAI Responses convention
    reasoning_content?: string;    // vLLM --enable-reasoning, DeepSeek-R1
  };
  finish_reason: "stop" | "length" | "content_filter" | null;
}

Reasoning tokens

Bastion forwards reasoning / chain-of-thought tokens from upstreams that emit them. The field name varies by provider:

reasoning — OVH GPT-OSS (matches OpenAI Responses API).
reasoning_content — vLLM with --enable-reasoning, DeepSeek-R1.
Inline <think>…</think> inside content — Qwen3 on stock vLLM.

Consumers should treat delta.reasoning ?? delta.reasoning_content as the reasoning channel, and strip <think>…</think> from content when present.
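
The recommendation above could be implemented roughly as follows. Both helpers are hypothetical and not part of the SDK, and `DeltaLike` is a local stand-in for the delta shape. Note that `splitThink` is meant for the fully accumulated content string, since a `<think>` tag may be split across chunks.

```typescript
// Stand-in for the delta fields that may carry reasoning.
interface DeltaLike {
  content?: string;
  reasoning?: string;
  reasoning_content?: string;
}

// Pick whichever dedicated reasoning field the upstream populated.
function reasoningOf(delta: DeltaLike): string {
  return delta.reasoning ?? delta.reasoning_content ?? "";
}

// Split accumulated content into visible text and inline <think> reasoning.
function splitThink(content: string): { text: string; reasoning: string } {
  const parts: string[] = [];
  const text = content.replace(
    /<think>([\s\S]*?)<\/think>/g,
    (_match, inner: string) => {
      parts.push(inner);
      return "";
    },
  );
  return { text: text.trim(), reasoning: parts.join("\n").trim() };
}
```

Running `splitThink` once per response, after the stream has ended, keeps the logic simple; stripping tags chunk by chunk would need a small state machine to handle tags that straddle chunk boundaries.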

Cancellation

Pass an AbortSignal as the second argument:

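const ac = new AbortController();
// Abort the request if it has not completed within 30 seconds.
const timer = setTimeout(() => ac.abort(), 30_000);

const res = await client.chat.completions.create(
  {
    model: "gpt-oss-120b",
    messages: [{ role: "user", content: "Reply with exactly: ok" }],
  },
  { signal: ac.signal },
);
clearTimeout(timer); // the call finished, so the timeout is no longer needed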

On a streamed call, breaking out of the for await loop releases the reader and the upstream connection closes shortly after. See Streaming → cancellation for the abort-vs-break tradeoff.
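
The break path relies on a JavaScript language guarantee: leaving a `for await` loop early calls the iterator's `return()` method, which gives the stream a chance to clean up. The sketch below demonstrates that mechanism with a stand-in generator. How the SDK's `Stream` implements its cleanup is not shown on this page, so `fakeChunks` and the `released` flag are hypothetical.

```typescript
let released = false;

// Hypothetical stand-in for a stream; the finally block models releasing the reader.
async function* fakeChunks(): AsyncGenerator<string> {
  try {
    for (let i = 0; ; i++) {
      yield `chunk-${i}`;
    }
  } finally {
    released = true;
  }
}

// Read at most `limit` chunks, then break out of the loop.
async function takeChunks(limit: number): Promise<string[]> {
  const seen: string[] = [];
  if (limit <= 0) return seen;
  for await (const chunk of fakeChunks()) {
    seen.push(chunk);
    if (seen.length >= limit) break; // break calls the iterator's return()
  }
  return seen;
}
```

By the time `takeChunks` resolves, the generator's `finally` block has already run, which is the behavior the loop-exit cleanup depends on.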

Errors

This call throws subclasses of BastionError. Common ones:

When	Class
Missing `model` or malformed `messages`	`BadRequestError`
Invalid key	`AuthenticationError`
Quota exhausted	`PermissionDeniedError`
Rate-limited	`RateLimitError`
Upstream provider down	`UpstreamError`
Network failure	`APIConnectionError`

See Errors for the full mapping.
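
Because the failures are distinct classes, callers can branch with `instanceof`. The sketch below retries only the failures that look transient. It is an illustration under assumptions: the classes are declared locally as stand-ins so the block runs on its own (real code would import them from the SDK), treating rate limits, upstream outages, and network failures as retryable is a judgment call rather than documented SDK behavior, and `withRetry` is a hypothetical helper.

```typescript
// Local stand-ins; in real code these classes come from the SDK.
class BastionError extends Error {
  constructor(message?: string) {
    super(message);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class BadRequestError extends BastionError {}
class RateLimitError extends BastionError {}
class UpstreamError extends BastionError {}
class APIConnectionError extends BastionError {}

// Assumption: these three are transient and worth retrying; the rest are not.
function isRetryable(err: unknown): boolean {
  return (
    err instanceof RateLimitError ||
    err instanceof UpstreamError ||
    err instanceof APIConnectionError
  );
}

// Retry `call` with exponential backoff, up to `attempts` tries in total.
async function withRetry<T>(
  call: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRetryable(err) || attempt >= attempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A request would then be wrapped as `withRetry(() => client.chat.completions.create(...))`. Errors such as `BadRequestError` surface immediately, since repeating an invalid request cannot succeed.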
