chat.completions
Create chat completions, with optional server-sent-events streaming.
// Discriminated overloads:
client.chat.completions.create(
  params: ChatCompletionCreateParamsNonStreaming,
  options?: { signal?: AbortSignal },
): Promise<ChatCompletion>;
client.chat.completions.create(
  params: ChatCompletionCreateParamsStreaming,
  options?: { signal?: AbortSignal },
): Promise<Stream<ChatCompletionChunk>>;
The primary LLM entry point. It mirrors the shape of OpenAI's chat.completions.create, so most code written against the OpenAI SDK ports by changing the import and the baseURL.
The return type is discriminated by params.stream: when stream: true, you get a Stream<ChatCompletionChunk>; otherwise a ChatCompletion. Pass an AbortSignal via the second argument to cancel — see Streaming → cancellation.
Parameters
ChatCompletionCreateParams is the union of the streaming and non-streaming variants:
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | Any LLM model id; see models.list. |
| messages | ChatMessage[] | yes | OpenAI message format. Re-exported from @qubi_bastion/shared. |
| temperature | number | no | |
| max_tokens | number | no | |
| top_p | number | no | |
| stream | boolean | no | When true, returns a Stream<ChatCompletionChunk>. |
Other OpenAI-compatible fields (tools, tool_choice, response_format, seed, …) pass through unchanged to the upstream — the gateway does not strip unknown fields. The SDK's static types only cover the listed fields; the rest are accepted at the HTTP layer.
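Because the static types cover only the listed fields, an object literal that carries tools or seed may trip TypeScript's excess-property check. One possible workaround is to widen the params type with an intersection. This is a minimal sketch using local stand-in types: BaseParams, ExtraFields, and withExtras are hypothetical names, not SDK exports, and whether the real create overloads accept the widened type is an assumption.

```typescript
// Stand-in for the typed fields listed in the table above (not the SDK's own type).
interface BaseParams {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
  stream?: boolean;
}

// Hypothetical: any additional OpenAI-compatible fields the static types do not list.
type ExtraFields = Record<string, unknown>;

// Merge typed params with untyped extras. Typed params are spread last,
// so an extra can never silently override model, messages, or stream.
function withExtras<P extends BaseParams>(params: P, extras: ExtraFields): P & ExtraFields {
  return { ...extras, ...params };
}

const params = withExtras(
  { model: "gpt-oss-120b", messages: [{ role: "user", content: "hi" }] },
  { seed: 42, response_format: { type: "json_object" } },
);
```

The resulting object keeps full type checking on the listed fields while the extras ride along untyped.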
Non-streaming
const res = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Reply with exactly: ok" }],
  max_tokens: 16,
});
console.log(res.choices[0]?.message.content);
console.log(res.usage.total_tokens);
ChatCompletion
interface ChatCompletion {
  id: string;
  object: "chat.completion";
  created: number; // unix seconds
  model: string;
  choices: ChatChoice[];
  usage: TokenUsage; // { prompt_tokens, completion_tokens, total_tokens }
}
ChatChoice, ChatMessage, and TokenUsage are re-exported from @qubittron/bastion-sdk (originally defined in @qubi_bastion/shared); import them from the SDK.
Streaming
const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Write a haiku about Canada." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta.content ?? "");
}
The async iterator yields one chunk per SSE data: event. The terminating data: [DONE] is consumed internally, so your loop ends naturally on the final next(). See Stream for cancellation.
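When the full text is needed rather than incremental output, the same loop can accumulate deltas. The sketch below is self-contained: ChunkLike is a reduced local stand-in for ChatCompletionChunk, and collectText and fakeStream are hypothetical names used only for illustration.

```typescript
// Reduced local shape of a chunk; the SDK's ChatCompletionChunk carries more fields.
interface ChunkLike {
  choices: { index: number; delta: { content?: string }; finish_reason: string | null }[];
}

// Collect the streamed deltas of choice 0 into one string and record the finish reason.
async function collectText(
  stream: AsyncIterable<ChunkLike>,
): Promise<{ text: string; finishReason: string | null }> {
  let text = "";
  let finishReason: string | null = null;
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    if (!choice) continue; // defensive: tolerate a chunk that carries no choices
    text += choice.delta.content ?? "";
    if (choice.finish_reason !== null) finishReason = choice.finish_reason;
  }
  return { text, finishReason };
}

// Hypothetical stand-in for a real stream, used only to exercise the helper.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ index: 0, delta: { content: "Hel" }, finish_reason: null }] };
  yield { choices: [{ index: 0, delta: { content: "lo" }, finish_reason: null }] };
  yield { choices: [{ index: 0, delta: {}, finish_reason: "stop" }] };
}
```

Since a Stream is consumed with for await, passing the value returned by a stream: true call in place of fakeStream() should work the same way, assuming Stream satisfies AsyncIterable.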
ChatCompletionChunk
interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: ChatChoiceDelta[];
  usage?: TokenUsage; // provider-dependent; some emit it only on the final chunk
}
interface ChatChoiceDelta {
  index: number;
  delta: {
    role?: ChatMessage["role"];
    content?: string;
    reasoning?: string; // OVH GPT-OSS, OpenAI Responses convention
    reasoning_content?: string; // vLLM --enable-reasoning, DeepSeek-R1
  };
  finish_reason: "stop" | "length" | "content_filter" | null;
}
Reasoning tokens
Bastion forwards reasoning / chain-of-thought tokens from upstreams that emit them. The field name varies by provider:
- reasoning: OVH GPT-OSS (matches the OpenAI Responses API).
- reasoning_content: vLLM with --enable-reasoning, DeepSeek-R1.
- Inline <think>…</think> inside content: Qwen3 on stock vLLM.
Consumers should treat delta.reasoning ?? delta.reasoning_content as the reasoning channel, and strip <think>…</think> from content when present.
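The normalization described above can be sketched as two small helpers. DeltaLike is a local stand-in for the delta shape, and reasoningOf and splitThink are hypothetical names. The tag-stripping helper is meant for fully accumulated content; a tag split across two chunks is not handled here.

```typescript
// Local stand-in for the delta object inside ChatChoiceDelta.
interface DeltaLike {
  content?: string;
  reasoning?: string;
  reasoning_content?: string;
}

// Return whichever reasoning field the upstream populated, or "" if neither is set.
function reasoningOf(delta: DeltaLike): string {
  return delta.reasoning ?? delta.reasoning_content ?? "";
}

// Separate inline <think>…</think> blocks from the visible answer.
// Intended for complete text, not for individual streamed fragments.
function splitThink(content: string): { answer: string; thinking: string } {
  const thinking: string[] = [];
  const answer = content.replace(/<think>([\s\S]*?)<\/think>/g, (_match: string, inner: string) => {
    thinking.push(inner.trim());
    return "";
  });
  return { answer: answer.trim(), thinking: thinking.join("\n") };
}
```

For example, splitThink applied to the accumulated content of a Qwen3 response would return the visible answer and the reasoning text separately, while reasoningOf covers the two field-based conventions per chunk.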
Cancellation
Pass an AbortSignal as the second argument:
const ac = new AbortController();
setTimeout(() => ac.abort(), 30_000);
const res = await client.chat.completions.create(
  { model: "gpt-oss-120b", messages },
  { signal: ac.signal },
);
On a streamed call, breaking out of the for await loop releases the reader and the upstream connection closes shortly after. See Streaming → cancellation for the abort-vs-break tradeoff.
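A timeout built this way leaves its timer pending after the request settles. One way to tidy that up is a small wrapper that always clears the timer. This is a sketch, not part of the SDK: withTimeout and fakeRequest are hypothetical names, and the error type the SDK raises on abort is not assumed here.

```typescript
// Run fn with an AbortSignal that fires after ms; the timer is cleared once fn settles.
async function withTimeout<T>(ms: number, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), ms);
  try {
    return await fn(ac.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical stand-in for a request that honours the signal.
function fakeRequest(signal: AbortSignal, durationMs: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve("done"), durationMs);
    signal.addEventListener("abort", () => {
      clearTimeout(t);
      reject(new Error("aborted"));
    });
  });
}
```

With the SDK, the callback would presumably be (signal) => client.chat.completions.create({ model, messages }, { signal }).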
Errors
This call throws subclasses of BastionError. Common ones:
| When | Class |
|---|---|
| Missing model or malformed messages | BadRequestError |
| Invalid key | AuthenticationError |
| Quota exhausted | PermissionDeniedError |
| Rate-limited | RateLimitError |
| Upstream provider down | UpstreamError |
| Network failure | APIConnectionError |
See Errors for the full mapping.
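Callers often treat rate limiting as retryable and the other classes as fatal. The sketch below shows one such policy with exponential backoff. The error classes are local stand-ins so the example runs on its own (in real code they would be imported from the SDK), and withRetry is a hypothetical helper; the choice to retry only RateLimitError is an assumption, not documented SDK behaviour.

```typescript
// Local stand-ins for the SDK's error classes, defined here only to keep the sketch runnable.
class BastionError extends Error {
  constructor(message?: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}
class RateLimitError extends BastionError {}
class BadRequestError extends BastionError {}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry only rate-limit failures, doubling the delay each attempt; rethrow everything else.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 250): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof RateLimitError;
      if (!retryable || i >= attempts - 1) throw err;
      await sleep(baseDelayMs * 2 ** i);
    }
  }
}
```

A call would be wrapped as withRetry(() => client.chat.completions.create(params)); errors such as BadRequestError surface immediately because retrying them cannot succeed.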