Qubittron Bastion
TypeScript SDK

Best practices

Production-readiness checklist — secrets, retries, timeouts, observability, cost control.

A short, opinionated list. Treat it as a checklist, not a tutorial.

Secrets

  • Never embed BASTION_API_KEY in client-side bundles. Bundlers will inline it. Proxy through your server. See Environments → Browsers.
  • Rotate keys quarterly, and immediately on any suspected leak. The SDK reads the env var on construction, so restart the process to pick up a new key.
  • Use a key per environment (dev, staging, prod). Mixing them masks misuse and complicates revocation.
  • Don't log err.body raw when handling errors — upstreams sometimes echo request fragments that include user content. Log status, code, type, and a redacted view of body.
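The last bullet can be sketched roughly as follows. This is a minimal illustration, not the SDK's own helper: the ApiErrorLike interface and the redactBody / toLogRecord names are hypothetical, and the sketch only assumes that errors carry the status, code, type, and body fields mentioned above.

```typescript
// Hypothetical shape: this page mentions status, code, type, and body on
// errors; the SDK's exact error types are not shown here.
interface ApiErrorLike {
  status?: number;
  code?: string;
  type?: string;
  body?: unknown;
}

// Describe the body by size only, so user content never reaches the log.
function redactBody(body: unknown): string {
  if (body === undefined || body === null) return "<empty>";
  const text = typeof body === "string" ? body : JSON.stringify(body) ?? "";
  return `<redacted ${text.length} chars>`;
}

// Build the record that is safe to hand to a logger.
function toLogRecord(err: ApiErrorLike) {
  return {
    status: err.status,
    code: err.code,
    type: err.type,
    body: redactBody(err.body),
  };
}
```

Logging the size rather than a truncated prefix is a deliberate choice here: even a short prefix can contain user content.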

Timeouts

fetch has no default timeout. A hung connection on a hot path will pin a worker until the runtime kills it (minutes, sometimes never). Always wire one:

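const client = new Bastion({
  apiKey: process.env.BASTION_API_KEY,
  fetch: (url, init) => {
    const ac = new AbortController();
    // Forward any caller-supplied signal instead of silently overwriting it.
    const caller = init?.signal;
    if (caller?.aborted) ac.abort(caller.reason);
    caller?.addEventListener("abort", () => ac.abort(caller.reason), { once: true });
    // Covers connect + response headers only: the timer is cleared once fetch resolves.
    const timer = setTimeout(() => ac.abort(), 60_000);
    return globalThis.fetch(url, { ...init, signal: ac.signal })
      .finally(() => clearTimeout(timer));
  },
});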

For streaming, the "timeout" you want is usually time-to-first-byte plus an inactivity timer between chunks — not a wall-clock budget — so consumers can complete long generations. Implement that in your iteration loop, not in fetch.
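One way to sketch the inactivity timer described above is a wrapper around the stream's async iterator. It assumes only that the stream you consume is an AsyncIterable; the withIdleTimeout name and the idleMs parameter are hypothetical, not part of the SDK.

```typescript
// Wrap any async iterable so that waiting longer than `idleMs` for the next
// chunk rejects. The first wait doubles as a time-to-first-chunk limit.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`no chunk received for ${idleMs} ms`)),
          idleMs,
        );
      });
      let result: IteratorResult<T>;
      try {
        result = await Promise.race([it.next(), timeout]);
      } finally {
        // Stop the clock before yielding, so slow consumers are not penalized.
        clearTimeout(timer);
      }
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Best effort: let the underlying stream release its resources.
    void it.return?.().catch(() => undefined);
  }
}
```

Because the timer restarts for every chunk, a long generation completes as long as chunks keep arriving. Note that the wrapper stops waiting on a stalled stream but cannot cancel the underlying request; pair it with an abort signal if you need that.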

Retries

  • Retry: RateLimitError, UpstreamError, APIConnectionError.
  • Do not retry: BadRequestError, AuthenticationError, PermissionDeniedError, NotFoundError. These are deterministic.
  • Cap retries (3–5 typical). Exponential backoff with jitter.
  • For batch / offline jobs, pace requests to stay under the rate limit instead of relying on retries to absorb RateLimitError.

A reference implementation lives in Error handling → Retries.
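For orientation only, the bullets above translate into something like the sketch below; the linked reference implementation remains authoritative. The sketch assumes SDK errors expose their class name through the standard name property, which this page does not confirm, and withRetries, isRetryable, baseMs, and capMs are hypothetical names.

```typescript
// Assumption: SDK errors expose their class name via `name`. If they do not,
// replace this predicate with `instanceof` checks against the SDK's classes.
const RETRYABLE = new Set(["RateLimitError", "UpstreamError", "APIConnectionError"]);

function isRetryable(err: unknown): boolean {
  return err instanceof Error && RETRYABLE.has(err.name);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `fn` once, then retries up to `maxRetries` more times on retryable errors.
async function withRetries<T>(
  fn: () => Promise<T>,
  { maxRetries = 3, baseMs = 250, capMs = 8_000 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      // Full jitter: wait a random time in [0, min(cap, base * 2^attempt)).
      const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Deterministic errors such as BadRequestError fall through the predicate and are rethrown on the first attempt.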

Concurrency

  • Bastion enforces a per-key concurrent-request ceiling. Going over it surfaces as RateLimitError.
  • Use a semaphore on the client side if your traffic is bursty. A simple p-limit of 20–50 concurrent calls per worker is a reasonable starting point.
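The client-side semaphore can be as small as the sketch below. It is a dependency-free illustration of the same idea p-limit packages up; createLimiter is a hypothetical name, and the limit you pass should come from your own key's ceiling.

```typescript
// Minimal counting semaphore: at most `maxConcurrent` tasks run at once.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Wait in FIFO order; the finishing task hands its slot over directly.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // slot transferred, `active` stays unchanged
      else active--; // nobody waiting, free the slot
    }
  };
}
```

Handing the slot straight to the next waiter, rather than decrementing and re-incrementing, keeps a newly arriving call from slipping in ahead of the queue and briefly exceeding the limit.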

Observability

  • Tag each request with a stable id and propagate it via defaultHeaders (e.g. x-trace-id) so server-side logs join your traces.
  • Measure latency in your fetch wrapper (see Custom fetch → Tracing).
  • On error, capture { status, code, type, model, requestId } at minimum. Add a redacted sample of err.body at debug level (see Secrets); never the raw body.
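A latency-measuring fetch wrapper might look like the sketch below. It is an illustration under assumptions, not the code from Custom fetch → Tracing: timedFetch, FetchLike, and RequestTiming are hypothetical names, and it assumes the x-trace-id header set through defaultHeaders is visible in the init passed to the custom fetch.

```typescript
type FetchLike = (url: string | URL, init?: RequestInit) => Promise<Response>;

interface RequestTiming {
  traceId: string | null;
  status: number | null; // null when the request failed before a response
  ms: number;
}

// Wrap a fetch implementation and report one timing record per request.
function timedFetch(
  inner: FetchLike,
  report: (timing: RequestTiming) => void,
): FetchLike {
  return async (url, init) => {
    const traceId = new Headers(init?.headers).get("x-trace-id");
    const started = performance.now();
    let status: number | null = null;
    try {
      const res = await inner(url, init);
      status = res.status;
      return res;
    } finally {
      // Measures time until response headers arrive, not until the body ends.
      report({ traceId, status, ms: performance.now() - started });
    }
  };
}
```

The wrapper would be passed as the fetch option when constructing the client, for example timedFetch(globalThis.fetch, sendToMetrics), where sendToMetrics stands in for your own reporter.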

Cost / token control

  • Set max_tokens for every chat completion. Without it, you're paying for whatever the model decides to say. A budget of 512–2048 covers most chat UIs.
  • Choose the cheapest model that meets quality: Llama-3.1-8B-Instruct or Mistral-7B-Instruct-v0.3 for simple tasks; reserve gpt-oss-120b and Meta-Llama-3_3-70B-Instruct for heavyweight reasoning.
  • Cache aggressively for deterministic prompts (temperature: 0). A simple keyed cache (prompt-hash → response) can cut steady-state costs by 30–60%, depending on how often your app repeats prompts.
  • Streaming improves perceived latency in UIs but costs the same, so don't stream for batch jobs.
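The keyed cache from the list above could start as small as this sketch. The CompletionRequest shape, the call parameter, and the function names are hypothetical stand-ins for however your app invokes the SDK; the cache is in-memory and unbounded, so a real deployment would likely add a TTL or size limit.

```typescript
import { createHash } from "node:crypto";

// Hypothetical request shape, used only for this illustration.
interface CompletionRequest {
  model: string;
  prompt: string;
  max_tokens: number;
  temperature: number;
}

// Hash every field that can change the output into one stable key.
function cacheKey(req: CompletionRequest): string {
  return createHash("sha256")
    .update(JSON.stringify([req.model, req.prompt, req.max_tokens]))
    .digest("hex");
}

const cache = new Map<string, Promise<string>>();

function cachedCompletion(
  req: CompletionRequest,
  call: (req: CompletionRequest) => Promise<string>,
): Promise<string> {
  // Only deterministic requests are safe to serve from cache.
  if (req.temperature !== 0) return call(req);
  const key = cacheKey(req);
  let hit = cache.get(key);
  if (!hit) {
    hit = call(req);
    cache.set(key, hit);
    // Do not keep failures: drop the entry so the next call retries.
    hit.catch(() => cache.delete(key));
  }
  return hit;
}
```

Storing the promise rather than the resolved value also collapses concurrent identical requests into a single upstream call.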

Model selection

  • Call models.list() at startup to verify the models you depend on exist for this key.
  • For multi-tenant apps, let your tenants choose from a curated allowlist — surfacing the raw list invites support load.
  • Pin specific model versions where available (Mistral-Small-3.2-24B-Instruct-2506); behavior of unversioned model ids may shift.
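A startup check along the lines of the first bullet might look like this. The response shape of models.list() is not shown on this page, so the sketch takes plain model id strings and leaves the mapping to the caller; findMissingModels and assertModelsAvailable are hypothetical names.

```typescript
// Assumption: the caller has already mapped whatever models.list() returns
// to plain id strings.
function findMissingModels(
  available: Iterable<string>,
  required: readonly string[],
): string[] {
  const have = new Set(available);
  return required.filter((id) => !have.has(id));
}

// Fail fast at startup instead of on the first user request.
function assertModelsAvailable(
  available: Iterable<string>,
  required: readonly string[],
): void {
  const missing = findMissingModels(available, required);
  if (missing.length > 0) {
    throw new Error(`Models unavailable for this key: ${missing.join(", ")}`);
  }
}
```

The same findMissingModels call can vet a tenant-facing allowlist, so tenants are only offered models the key can actually reach.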

Testing

  • Inject a fake fetch (see Custom fetch → Mocking) rather than hitting the real API in unit tests.
  • For one or two end-to-end smoke tests, use a separate sandbox key with a low spend cap.
  • Pin streaming-test fixtures (raw SSE bytes) — Bastion's parser handles \r\n\r\n and \n\n event separators, multi-line data:, and [DONE]. Cover all three in fixtures.
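As a sketch of the fixture idea, the block below pins three raw SSE strings and replays them through a fake fetch. The {"delta":"hi"} payload is a made-up placeholder rather than Bastion's actual event schema, and fakeSseFetch is a hypothetical helper, not the one described in Custom fetch → Mocking.

```typescript
// Raw SSE fixtures, one per case listed above. Payloads are placeholders.
const fixtures = {
  lfSeparators: 'data: {"delta":"hi"}\n\ndata: [DONE]\n\n',
  crlfSeparators: 'data: {"delta":"hi"}\r\n\r\ndata: [DONE]\r\n\r\n',
  // Consecutive data: lines belong to one event and are joined with "\n".
  multiLineData: 'data: {"delta":\ndata: "hi"}\n\ndata: [DONE]\n\n',
};

const encoder = new TextEncoder();

// Fake fetch that replays a fixture as a streamed body. Small chunks make
// event boundaries straddle reads, which is where parsers tend to break.
function fakeSseFetch(raw: string, chunkSize = 7) {
  return async (_url: string | URL, _init?: RequestInit): Promise<Response> => {
    const bytes = encoder.encode(raw);
    const body = new ReadableStream<Uint8Array>({
      start(controller) {
        for (let i = 0; i < bytes.length; i += chunkSize) {
          controller.enqueue(bytes.subarray(i, i + chunkSize));
        }
        controller.close();
      },
    });
    return new Response(body, {
      status: 200,
      headers: { "content-type": "text/event-stream" },
    });
  };
}
```

Passing fakeSseFetch(fixtures.crlfSeparators) as the client's fetch option lets a unit test exercise the streaming path without a network.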

Versioning

  • Pre-1.0 (v0.x), pin to a minor in package.json (e.g. "~0.1.0"). The patch range is safe; minors may rename types.
  • Read the changelog before bumping a minor. Migrations are usually one rename.

Multi-tenancy

  • One client per process is fine. Don't construct a new Bastion per request — there's no connection pool to warm, but you do allocate closures and headers needlessly.
  • For per-tenant keys, hold a Map<tenantId, Bastion> and evict on logout / key rotation.
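The per-tenant map can be wrapped in a small registry, sketched below. The class is generic so the example runs without the SDK; in practice T would be Bastion and the factory would construct a client with that tenant's key. TenantClients and its method names are hypothetical.

```typescript
// Lazily creates one client per tenant and reuses it until evicted.
class TenantClients<T> {
  private readonly clients = new Map<string, T>();
  private readonly create: (tenantId: string) => T;

  constructor(create: (tenantId: string) => T) {
    this.create = create;
  }

  get(tenantId: string): T {
    let client = this.clients.get(tenantId);
    if (client === undefined) {
      client = this.create(tenantId);
      this.clients.set(tenantId, client);
    }
    return client;
  }

  // Call on logout or key rotation so the next get() builds a fresh client.
  evict(tenantId: string): boolean {
    return this.clients.delete(tenantId);
  }
}
```

Evicting on key rotation matters because, as noted under Secrets, a client keeps the key it was constructed with.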
