Public Docs
OpenAPI Source of Truth
MCP Streamable HTTP
CLI for Consumers

TSFM.ai developer documentation.

Multiple pages, one contract. API, MCP, and CLI are aligned on the same schema so teams can move from manual calls to production automation with zero drift.

Production

Launch readiness checklist

Production means control plane, data plane, and billing are all equally hardened. Use this page as a final gate before opening self-serve traffic.

Minimum launch gates

  1. 1All core endpoints pass `pnpm api:smoke` in CI and staging.
  2. 2OpenAPI schema validates and is published with release artifacts.
  3. 3Postgres backup and restore drills have been executed successfully.
  4. 4Stripe usage meter events reconcile against internal usage ledger.
  5. 5Runbook exists for inference gateway outage and key revocation incidents.

Data Plane

  • Configure `INFERENCE_GATEWAY_BASE_URL` to your OpenAI-compatible GPU gateway.
  • Set `INFERENCE_GATEWAY_API_KEY` from secret manager; never hardcode credentials.
  • Tune `INFERENCE_TIMEOUT_MS` per provider latency profile.
  • Define model fallback policy (`timesfm -> chronos -> ttm`) for partial outages.

Control Plane

  • Use PostgreSQL (`STORE_BACKEND=postgres`) with managed backups and PITR.
  • Enable SSL and strict role separation for app user vs migration/admin users.
  • Enforce API key rotation cadence and immediate revoke workflow.
  • Back account, usage, and billing APIs with authenticated session middleware only.

Billing + Stripe

  • Map usage events (`input_tokens`, `output_tokens`, `cost_usd`) to Stripe meters.
  • Verify product + price ids by environment (`test` vs `live`).
  • Handle subscription state drift with nightly reconciliation jobs.
  • Expose billing profile status in `/api/account/billing` and account UI.

Observability

  • Track request counts, p50/p95 latency, and non-2xx rates by endpoint + model.
  • Log request ids and propagate `x-request-id` from upstream providers.
  • Alert on error spikes, timeout growth, and budget burn anomalies.
  • Publish status heartbeat from `/api/system/status` per region.

MCP + CLI Distribution

  • Run MCP separately from API HTTP server to decouple scaling and isolation.
  • Keep MCP transport stateless (`sessionIdGenerator: undefined`) for horizontal scaling.
  • Version CLI releases alongside schema changes and publish migration notes.
  • Gate releases with `pnpm api:schema:check`, `pnpm api:smoke`, and CLI integration checks.

Day-2 operations

SLO monitoring

Track p95 latency and error rates per model/provider, not just aggregate totals.

Capacity planning

Forecast token throughput and horizon distribution so GPU routing policies stay ahead of demand.

Contract management

Treat schema changes as versioned releases and announce operation-level change notes.

Cost governance

Review cost per 1M tokens by model weekly and adjust default routing for margin protection.

Incident response priorities

  • Credential compromise: revoke affected keys, rotate gateway secrets, and force re-issue.
  • Provider outage: switch routing policy to fallback model family and reduce max horizon if needed.
  • Billing drift: pause metered invoicing and reconcile usage ledger before finalizing invoices.
  • Schema breakage: roll back schema publish and pin CLI/MCP consumers to previous contract snapshot.