Qubittron Bastion
TypeScript SDK / API reference

chat.completions

Create chat completions, with optional server-sent-events streaming.

// Discriminated overloads:
client.chat.completions.create(
  params: ChatCompletionCreateParamsNonStreaming,
  options?: { signal?: AbortSignal },
): Promise<ChatCompletion>;

client.chat.completions.create(
  params: ChatCompletionCreateParamsStreaming,
  options?: { signal?: AbortSignal },
): Promise<Stream<ChatCompletionChunk>>;

The primary LLM entry point. Mirrors OpenAI's chat.completions.create shape — most code written against the OpenAI SDK ports by changing the import and the baseURL.

The return type is discriminated by params.stream: when stream: true, you get a Stream<ChatCompletionChunk>; otherwise a ChatCompletion. Pass an AbortSignal via the second argument to cancel — see Streaming → cancellation.
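The overloads rely on ordinary TypeScript overload resolution over the literal type of stream. The sketch below reproduces that pattern in isolation so it compiles and runs on its own. createCompletion, its canned "ok" reply, and the trimmed-down Msg/Completion/Chunk types are hypothetical stand-ins, not the SDK's implementation.

```typescript
// Trimmed-down local stand-ins for the SDK's param and result types (assumption).
interface Msg { role: "system" | "user" | "assistant"; content: string }
interface BaseParams { model: string; messages: Msg[]; max_tokens?: number }
interface NonStreamingParams extends BaseParams { stream?: false }
interface StreamingParams extends BaseParams { stream: true }

interface Completion { object: "chat.completion"; text: string }
interface Chunk { object: "chat.completion.chunk"; text: string }

// Overload signatures: the literal type of `stream` selects the return type.
function createCompletion(p: NonStreamingParams): Promise<Completion>;
function createCompletion(p: StreamingParams): Promise<AsyncIterable<Chunk>>;
async function createCompletion(
  p: NonStreamingParams | StreamingParams,
): Promise<Completion | AsyncIterable<Chunk>> {
  const reply = "ok"; // canned reply; a real client would call the gateway here
  if (p.stream === true) {
    // Hypothetical streaming behaviour: one character per chunk.
    return (async function* (): AsyncGenerator<Chunk> {
      for (const ch of reply.split("")) yield { object: "chat.completion.chunk", text: ch };
    })();
  }
  return { object: "chat.completion", text: reply };
}
```

Omitting stream (or passing stream: false) resolves to the first overload and a Promise<Completion>; a literal stream: true resolves to the second. In this sketch a stream value typed as plain boolean matches neither overload, so pass a literal.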

Parameters

ChatCompletionCreateParams is the union of the streaming and non-streaming variants:

Field | Type | Required | Notes
--- | --- | --- | ---
model | string | yes | Any LLM model id; see models.list.
messages | ChatMessage[] | yes | OpenAI message format. Re-exported from @qubi_bastion/shared.
temperature | number | no |
max_tokens | number | no |
top_p | number | no |
stream | boolean | no | When true, returns a Stream<ChatCompletionChunk>.

Other OpenAI-compatible fields (tools, tool_choice, response_format, seed, …) pass through unchanged to the upstream — the gateway does not strip unknown fields. The SDK's static types only cover the listed fields; the rest are accepted at the HTTP layer.
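TypeScript applies excess-property checks to fresh object literals, so writing tools inline in the create() call will most likely be rejected by the compiler even though the gateway would accept the field. One workaround is to annotate a variable with an intersection type and pass the variable. The sketch below uses local stand-in types so it runs on its own; FunctionTool follows OpenAI's function-tool shape, and get_weather and ToolCapableParams are hypothetical. The annotation also keeps role: "user" as a literal instead of widening it to string.

```typescript
// Local stand-in for the SDK's non-streaming params (documented fields only).
interface ChatMessageLite { role: "system" | "user" | "assistant"; content: string }
interface NonStreamingParams {
  model: string;
  messages: ChatMessageLite[];
  temperature?: number;
  max_tokens?: number;
  top_p?: number;
  stream?: false;
}

// Extra pass-through field, shaped like an OpenAI function-tool definition.
interface FunctionTool {
  type: "function";
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

// Intersection: everything the static types know about, plus `tools`.
type ToolCapableParams = NonStreamingParams & { tools: FunctionTool[] };

const params: ToolCapableParams = {
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "What's the weather in Ottawa?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool
        parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] },
      },
    },
  ],
};

// A non-fresh value of a subtype is assignable to the narrower param type,
// so no excess-property error is raised here.
const narrowed: NonStreamingParams = params;

// JSON serialization keeps every own property of the object, tools included.
const body = JSON.stringify(narrowed);
```

With the real SDK you would pass params directly to client.chat.completions.create(params). Assuming the SDK serializes the params object as-is, as the pass-through note above suggests, tools then reaches the gateway unchanged.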

Non-streaming

const res = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Reply with exactly: ok" }],
  max_tokens: 16,
});

console.log(res.choices[0]?.message.content);
console.log(res.usage.total_tokens);

ChatCompletion

interface ChatCompletion {
  id: string;
  object: "chat.completion";
  created: number;          // unix seconds
  model: string;
  choices: ChatChoice[];
  usage: TokenUsage;        // { prompt_tokens, completion_tokens, total_tokens }
}

ChatChoice, ChatMessage, and TokenUsage are re-exported from @qubittron/bastion-sdk (originally defined in @qubi_bastion/shared); import them from the SDK.
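created is a Unix timestamp in seconds, while JavaScript's Date works in milliseconds, so it has to be multiplied by 1000. The sketch below uses a trimmed-down local copy of the interface (assumption: each ChatChoice exposes message.content, as in the non-streaming example, and that content may be null); summarizeCompletion is a hypothetical helper.

```typescript
// Trimmed-down local stand-ins for the documented shapes.
interface TokenUsageLite { prompt_tokens: number; completion_tokens: number; total_tokens: number }
interface ChatCompletionLite {
  id: string;
  object: "chat.completion";
  created: number; // unix seconds
  model: string;
  choices: { message: { content: string | null } }[];
  usage: TokenUsageLite;
}

// Hypothetical helper: extract the fields most callers log.
function summarizeCompletion(res: ChatCompletionLite): { createdAt: string; text: string; tokens: number } {
  return {
    createdAt: new Date(res.created * 1000).toISOString(), // seconds -> milliseconds
    text: res.choices[0]?.message.content ?? "",
    tokens: res.usage.total_tokens,
  };
}
```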

Streaming

const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Write a haiku about Canada." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta.content ?? "");
}

The async iterator yields one chunk per SSE data: event. The terminating data: [DONE] is consumed internally — your loop ends naturally on the final next(). See Stream for cancellation.
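The behavior described above (one chunk per data: event, with [DONE] swallowed) can be sketched as an async generator over raw response text. This is an illustration under simplifying assumptions, not the SDK's parser: parseSseChunks is a hypothetical name, events are assumed to be separated by \n\n (no \r\n), and each data: line is assumed to carry one complete JSON object, so multi-line data fields and the event:, id:, and retry: fields are ignored.

```typescript
// Hypothetical SSE-to-chunk parser: buffers incoming text, splits on blank-line
// event boundaries, JSON-parses each `data:` payload, and stops at `data: [DONE]`.
async function* parseSseChunks<T>(source: AsyncIterable<string>): AsyncGenerator<T> {
  let buffer = "";
  for await (const text of source) {
    buffer += text;
    let boundary: number;
    // An SSE event ends with an empty line; one network read may hold several.
    while ((boundary = buffer.indexOf("\n\n")) !== -1) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      for (const line of rawEvent.split("\n")) {
        if (!line.startsWith("data:")) continue; // skip comments and other fields
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") return; // terminator is consumed, never yielded
        yield JSON.parse(payload) as T;
      }
    }
  }
}
```

Buffering matters because network reads do not align with event boundaries: a single read can end mid-event, and the parser must wait for the closing blank line before parsing.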

ChatCompletionChunk

interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: ChatChoiceDelta[];
  usage?: TokenUsage;       // provider-dependent; some emit it only on the final chunk
}

interface ChatChoiceDelta {
  index: number;
  delta: {
    role?: ChatMessage["role"];
    content?: string;
    reasoning?: string;            // OVH GPT-OSS, OpenAI Responses convention
    reasoning_content?: string;    // vLLM --enable-reasoning, DeepSeek-R1
  };
  finish_reason: "stop" | "length" | "content_filter" | null;
}
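Per the usage comment on ChatCompletionChunk, some providers attach usage only to the final chunk, so a consumer that wants both the text and the token counts has to keep whichever usage block it last saw. The sketch below uses local copies of the documented fields; collectStream is a hypothetical helper, and it collects only the choice with index 0.

```typescript
// Local copies of the documented chunk fields this helper needs.
interface TokenUsageLite { prompt_tokens: number; completion_tokens: number; total_tokens: number }
interface ChunkLite {
  choices: {
    index: number;
    delta: { content?: string };
    finish_reason: "stop" | "length" | "content_filter" | null;
  }[];
  usage?: TokenUsageLite;
}

interface Collected {
  content: string;
  finishReason: ChunkLite["choices"][number]["finish_reason"];
  usage?: TokenUsageLite;
}

// Hypothetical helper: concatenate choice-0 content and keep the last usage seen.
async function collectStream(stream: AsyncIterable<ChunkLite>): Promise<Collected> {
  const out: Collected = { content: "", finishReason: null };
  for await (const chunk of stream) {
    const choice = chunk.choices.find((c) => c.index === 0); // may be absent on a usage-only chunk
    if (choice) {
      out.content += choice.delta.content ?? "";
      if (choice.finish_reason !== null) out.finishReason = choice.finish_reason;
    }
    if (chunk.usage) out.usage = chunk.usage; // may only appear on the last chunk
  }
  return out;
}
```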

Reasoning tokens

Bastion forwards reasoning / chain-of-thought tokens from upstreams that emit them. The field name varies by provider:

  • reasoning — OVH GPT-OSS (matches OpenAI Responses API).
  • reasoning_content — vLLM with --enable-reasoning, DeepSeek-R1.
  • Inline <think>…</think> inside content — Qwen3 on stock vLLM.

Consumers should treat delta.reasoning ?? delta.reasoning_content as the reasoning channel, and strip <think>…</think> from content when present.
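The rule above can be sketched as two small helpers: one reads the dedicated reasoning field from a delta, the other separates inline <think>…</think> spans from a fully accumulated content string. Both names are hypothetical. The regex approach assumes the complete text is available; during streaming a tag can be split across chunks, so apply it to the accumulated buffer rather than to individual deltas. An unclosed <think> tag is left in place.

```typescript
// Local copy of the documented delta fields.
interface ReasoningDelta {
  content?: string;
  reasoning?: string;         // OVH GPT-OSS
  reasoning_content?: string; // vLLM --enable-reasoning, DeepSeek-R1
}

// Reasoning channel for providers that emit a dedicated field.
function reasoningOf(delta: ReasoningDelta): string {
  return delta.reasoning ?? delta.reasoning_content ?? "";
}

// Split inline <think>...</think> blocks (e.g. Qwen3 on stock vLLM) out of content.
function splitThink(text: string): { reasoning: string; content: string } {
  const thoughts: string[] = [];
  const content = text.replace(/<think>([\s\S]*?)<\/think>/g, (_match, inner: string) => {
    thoughts.push(inner.trim());
    return ""; // remove the block from the visible answer
  });
  return { reasoning: thoughts.join("\n"), content: content.trim() };
}
```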

Cancellation

Pass an AbortSignal as the second argument:

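import type { ChatMessage } from "@qubittron/bastion-sdk";

const messages: ChatMessage[] = [{ role: "user", content: "Reply with exactly: ok" }];

const ac = new AbortController();
const timer = setTimeout(() => ac.abort(), 30_000); // abort after 30 s

const res = await client.chat.completions.create(
  { model: "gpt-oss-120b", messages },
  { signal: ac.signal },
).finally(() => clearTimeout(timer)); // don't leave the timer pending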

On a streamed call, breaking out of the for await loop releases the reader and the upstream connection closes shortly after. See Streaming → cancellation for the abort-vs-break tradeoff.
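Breaking out is enough because for await calls the iterator's return() method when the loop exits early, which runs the generator's finally block; a stream implementation can use that hook to cancel its reader. The sketch below shows the mechanism with a hypothetical fakeStream generator and a released flag standing in for a real network reader.

```typescript
// Stand-in for a streaming response: the `finally` block plays the role of
// releasing the underlying reader / connection.
const streamState = { released: false, produced: 0 };

async function* fakeStream(): AsyncGenerator<string> {
  try {
    for (let i = 0; ; i++) {
      streamState.produced++;
      yield `token${i} `;
    }
  } finally {
    streamState.released = true; // runs on break, throw, or return
  }
}

// Read only the first `limit` tokens, then stop consuming.
async function firstTokens(limit: number): Promise<string> {
  let text = "";
  let count = 0;
  for await (const tok of fakeStream()) {
    text += tok;
    if (++count >= limit) break; // early exit calls the generator's return()
  }
  return text;
}
```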

Errors

This call throws instances of BastionError subclasses. Common ones:

When | Class
--- | ---
Missing model or malformed messages | BadRequestError
Invalid key | AuthenticationError
Quota exhausted | PermissionDeniedError
Rate-limited | RateLimitError
Upstream provider down | UpstreamError
Network failure | APIConnectionError

See Errors for the full mapping.
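A typical handling pattern branches on the error class with instanceof and retries only transient failures. The sketch below declares local stand-in classes with the documented names so it runs on its own; in real code you would import the SDK's classes instead, assuming they are exported under these names. Which errors count as retryable, the attempt count, and the backoff delays are assumptions, not Bastion guidance; withRetry and isRetryable are hypothetical helpers.

```typescript
// Local stand-ins mirroring the documented hierarchy (assumption: every class
// extends BastionError). instanceof on Error subclasses needs an ES2015+ target.
class BastionError extends Error {}
class BadRequestError extends BastionError {}
class RateLimitError extends BastionError {}
class UpstreamError extends BastionError {}
class APIConnectionError extends BastionError {}

// Transient failures that are plausibly worth retrying (assumption).
function isRetryable(err: unknown): boolean {
  return err instanceof RateLimitError || err instanceof UpstreamError || err instanceof APIConnectionError;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `call` with exponential backoff (base, 2x base, 4x base, ...) on retryable errors.
async function withRetry<T>(call: () => Promise<T>, attempts = 3, baseDelayMs = 250): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Rethrow non-retryable errors immediately, and any error on the last attempt.
      if (!isRetryable(err) || attempt >= attempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Usage would look like withRetry(() => client.chat.completions.create({ model, messages })). Errors such as BadRequestError or AuthenticationError indicate a problem with the request or key, so retrying them would only repeat the failure.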
