Production checklist
Walk through this list before routing real traffic to the Bastion API.
A short, opinionated pre-launch checklist. Most teams already do most of this — the value is in catching the one item you forgot.
Credentials
- API key lives in a secrets manager, not in source control or in client code.
- Separate keys per environment (dev, staging, prod) and per service.
- Key rotation runbook documented and tested at least once.
- Logs, error reports, and metrics tags are redacted to never include the raw key.
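One way to cover the first and last items above is to read the key from the environment at startup and attach a logging filter that masks it. This is a minimal sketch, not an official client: the variable name BASTION_API_KEY and both helper names are assumptions chosen for illustration.

```python
import logging
import os


class RedactSecretFilter(logging.Filter):
    """Mask every occurrence of a known secret in log messages."""

    def __init__(self, secret: str, placeholder: str = "[REDACTED]") -> None:
        super().__init__()
        self._secret = secret
        self._placeholder = placeholder

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its args first, so a key passed as a
        # format argument is caught as well as one embedded in the text.
        message = record.getMessage()
        if self._secret and self._secret in message:
            record.msg = message.replace(self._secret, self._placeholder)
            record.args = ()
        return True  # never drop the record, only rewrite it


def load_api_key(env_var: str = "BASTION_API_KEY") -> str:
    """Read the key from an environment variable (hypothetical name),
    assumed to be populated by your secrets manager at deploy time."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key
```

Attach the filter to each handler with `handler.addFilter(RedactSecretFilter(key))`. It only rewrites the log message itself; exception tracebacks, error-report payloads, and metrics tags need their own redaction step.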
Reliability
- Retry wrapper in place for 429 and 5xx with exponential backoff and jitter.
- 429 responses honor the Retry-After header.
- No retry on 4xx (other than 429).
- Streaming requests retry the entire request on drop; there is no mid-stream resume.
- Downstream side effects (billing, emails, DB writes) are guarded against duplicate execution if the same Bastion call is retried.
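The retry rules above can be sketched independently of any HTTP library. In this sketch, `send` is a hypothetical callable that performs one request and returns (status, headers, body); the base delay, cap, and attempt count are illustrative defaults, not values recommended by Bastion. The jitter scheme is "full jitter", a uniform draw between zero and the exponential ceiling.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def is_retryable(status: int) -> bool:
    """Retry 429 and 5xx only; any other 4xx is a caller error."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Only the delta-seconds form of Retry-After is handled here;
            # the HTTP-date form falls through to computed backoff.
            return max(0.0, float(retry_after))
        except ValueError:
            pass
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it succeeds, fails non-retryably, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        last_attempt = attempt == max_attempts - 1
        if not is_retryable(status) or last_attempt:
            return status, headers, body
        # Assumes `headers` uses the canonical "Retry-After" spelling.
        retry_after = headers.get("Retry-After") if status == 429 else None
        sleep(backoff_delay(attempt, retry_after))
    raise AssertionError("unreachable")
```

For streaming calls, `send` should reissue the whole request, which matches the no-resume rule above. The wrapper does nothing about duplicate side effects; those still need their own guard downstream.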
Observability
- Latency (p50, p95, p99), error rate, and the 4xx-vs-5xx breakdown are on dashboards you can open in under 30 seconds.
- Alerts for: sustained 5xx rate above your SLO, sustained 429 rate, 402 insufficient_funds, key revocation.
- Request/response logging is sampled, redacted, and retained per your policy. Never log full PII-bearing prompts.
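If it helps to pin down what the dashboard numbers mean, the sketch below computes nearest-rank percentiles and a status-class breakdown from raw samples. In production these usually come from your metrics system; the function names here are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def status_breakdown(statuses: Iterable[int]) -> Dict[str, int]:
    """Count responses per status class, e.g. {"2xx": 90, "4xx": 7, "5xx": 3}."""
    return dict(Counter(f"{status // 100}xx" for status in statuses))
```

With ten latency samples where one request took 2000 ms, p50 stays near the typical value while p95 and p99 both land on the outlier, which is why the checklist asks for all three.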
Cost and capacity
- Per-environment monthly spend cap configured in the dashboard (so a runaway test doesn't drain prod budget).
- Model selection documented per use-case (don't default everything to the largest model).
- Auto-recharge or low-balance alerts wired up to billing; a 402 should never be the way you learn you ran out of credit.
- Token budgets per call (max_tokens) set to the smallest reasonable value for each surface.
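Model choice and token budgets are easier to review when they live in one table rather than at each call site. A minimal sketch follows; the surface names, model names, and numbers are placeholders, not Bastion model identifiers or recommended limits.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SurfaceBudget:
    model: str
    max_tokens: int


# Placeholder entries: substitute your own surfaces, models, and limits.
BUDGETS: Dict[str, SurfaceBudget] = {
    "autocomplete": SurfaceBudget(model="example-small", max_tokens=64),
    "chat": SurfaceBudget(model="example-medium", max_tokens=1024),
    "report": SurfaceBudget(model="example-large", max_tokens=4096),
}


def request_params(surface: str) -> Dict[str, object]:
    """Return the model and max_tokens for a surface.

    Unknown surfaces raise instead of silently falling back to the
    largest model, so every new surface needs an explicit entry.
    """
    try:
        budget = BUDGETS[surface]
    except KeyError:
        raise ValueError(f"no budget defined for surface {surface!r}") from None
    return {"model": budget.model, "max_tokens": budget.max_tokens}
```

Failing on an unknown surface is a deliberate choice here: a default entry would reintroduce the "everything uses the largest model" problem the checklist warns about.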
Streaming-specific
- Reverse proxies and CDNs do not buffer SSE responses (proxy_buffering off, no caching, no compression on stream routes).
- Client UI commits partial assistant output only after the stream ends cleanly.
- AbortController / cancellation tested end-to-end (user clicks "stop" → upstream connection closes promptly).
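The commit-after-clean-end rule can be sketched as a draft buffer that is shown as a live preview but only committed once the stream finishes without error. The sketch assumes your streaming client yields text chunks and raises an exception when the connection drops; `DraftMessage` and `consume_stream` are hypothetical names.

```python
from typing import Callable, Iterable, List, Optional


class DraftMessage:
    """Holds partial assistant output until the stream ends cleanly."""

    def __init__(self) -> None:
        self._parts: List[str] = []
        self.committed: Optional[str] = None

    @property
    def preview(self) -> str:
        return "".join(self._parts)

    def append(self, chunk: str) -> None:
        self._parts.append(chunk)

    def commit(self) -> str:
        self.committed = self.preview
        return self.committed

    def discard(self) -> None:
        self._parts.clear()


def consume_stream(chunks: Iterable[str], draft: DraftMessage,
                   on_preview: Optional[Callable[[str], None]] = None) -> str:
    """Feed chunks into the draft; commit only if iteration completes."""
    try:
        for chunk in chunks:
            draft.append(chunk)
            if on_preview is not None:
                on_preview(draft.preview)  # live, uncommitted display
    except Exception:
        # Dropped or failed stream: throw away the partial text so that a
        # full-request retry starts from an empty draft.
        draft.discard()
        raise
    return draft.commit()
```

A user-initiated stop is a separate path: there you may want to keep the partial text, so treat cancellation differently from a dropped connection.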
Data and privacy
- responses endpoint usage always sets store: false (required by the API).
- Prompts containing personally identifiable information are redacted or scoped per your data agreement before send.
- Audit logging for which user prompted which call (your own user ID → Bastion call), separate from request bodies.
- Data-residency requirements (PIPEDA, provincial, sector-specific) are documented and the Canadian-inference guarantee is referenced in your DPA where relevant.
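Two of the items above lend themselves to a short sketch: forcing store: false on every responses call, and writing an audit line that links your user ID to a call without carrying the request body. The field names "model" and "input" and both function names are assumptions for illustration; only the store: false requirement comes from this checklist.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_responses_body(model: str, prompt: str) -> Dict[str, Any]:
    """Build a request body with store disabled.

    "model" and "input" are assumed field names; check the API reference.
    """
    return {"model": model, "input": prompt, "store": False}


def audit_record(user_id: str, endpoint: str,
                 call_id: Optional[str] = None,
                 now: Optional[datetime] = None) -> str:
    """Return one JSON line mapping your user ID to a call.

    The prompt and response are deliberately not parameters, so they
    cannot leak into the audit trail.
    """
    record = {
        "call_id": call_id or str(uuid.uuid4()),
        "user_id": user_id,
        "endpoint": endpoint,
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
    }
    return json.dumps(record, sort_keys=True)
```

The call ID here is generated locally. If the API returns its own request identifier, recording that alongside would make support tickets easier to trace.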
Failure drills
- You've tested the path where Bastion returns 5xx for sustained periods (chaos test or a fault-injection switch).
- You've tested the path where the API key is revoked mid-traffic (401 cascade).
- You've tested the path where the account hits a hard credit limit (402 cascade — your UI should degrade gracefully, not 500).
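A fault-injection switch for these drills can be as small as an environment variable that forces a status code before the real call is made, plus a mapping from upstream status to what the user sees. The variable name BASTION_FAULT_STATUS, the function names, and the messages are all hypothetical.

```python
import os
from typing import Callable, Mapping, Optional, Tuple

Result = Tuple[int, bytes]  # (status, body)


def injected_fault(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the forced status code, or None when the switch is off."""
    raw = environ.get("BASTION_FAULT_STATUS", "")
    return int(raw) if raw.isdigit() else None


def send_with_faults(send: Callable[[], Result],
                     environ: Mapping[str, str] = os.environ) -> Result:
    """Short-circuit with the injected status instead of calling upstream."""
    fault = injected_fault(environ)
    if fault is not None:
        return fault, b""
    return send()


def user_facing(status: int) -> Tuple[str, str]:
    """Map an upstream status to a (state, message) pair for the UI."""
    if 200 <= status < 300:
        return ("ok", "")
    if status in (401, 402):
        # Key revoked or credit exhausted: an operator must act, so show
        # a calm message and page someone rather than returning a 500.
        return ("unavailable", "The assistant is temporarily unavailable.")
    if status == 429 or status >= 500:
        return ("busy", "The assistant is busy. Please try again shortly.")
    return ("error", "The request could not be completed.")
```

Setting the variable to 503, 401, or 402 in a staging environment exercises the three drills above without touching the real account.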
When every box is checked, you're ready to ship. If any box is unchecked, write down why before launching anyway.