Models and discovery

Discover models, inspect capabilities, and decide when to route by latency, cost, context length, or task support.

Catalog endpoints

The fastest discovery loop is usually: list models, shortlist by capability or family, inspect one or two candidates in detail, then test the top picks on real data. The hosted catalog UI at `/models` is a friendly wrapper around the same catalog endpoints.

Discovery endpoints

| Endpoint | Operation | Required | Description |
| --- | --- | --- | --- |
| `GET /api/models` | `listModels` | No | Browse the hosted catalog with filters for provider, capability, status, or free-text search. |
| `GET /api/models/{modelId}` | `getModelById` | No | Inspect one model in detail, including capabilities, latency signals, hosting metadata, and fit profile guidance. |
| `GET /v1/models` | `listOpenAiModels` | No | OpenAI-compatible model list for clients that expect the `/v1/models` surface. |
| `GET /v1/models/{modelId}` | `getOpenAiModelById` | No | OpenAI-compatible detail endpoint for one hosted model id. |

Query patterns

List first, then inspect detail

The list endpoint is for narrowing the field. The detail endpoint is where you confirm operational metadata such as capabilities, latency, pricing, and fit profile before you promote a model into routing policy.

List endpoint filters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `q` | string | No | Free-text search over model ids and descriptive metadata. Best starting point when you know the family name, such as `chronos` or `moirai`. |
| `capability` | string | No | Filter for hosted capabilities such as `covariates`, `quantile-forecasting`, or `long-context`. |
| `status` | string | No | Operational filter for states such as `online` so your selection flow ignores inactive entries. |
| `provider` | string | No | Narrow to one provider namespace when you are comparing variants within a family or region. |

list-models.sh

```sh
curl -s "https://api.tsfm.ai/api/models?q=chronos" | jq '.models | map({
  id,
  supported_tasks,
  avg_latency_ms
})'
```
list-models-filtered.sh

```sh
curl -s "https://api.tsfm.ai/api/models?capability=covariates&status=online&q=chronos" \
  | jq '.models[] | {
      id,
      supported_tasks,
      capabilities,
      avg_latency_ms,
      context_length
    }'
```
models.response.json

```json
{
  "models": [
    {
      "id": "amazon/chronos-bolt-base",
      "provider": "tsfm (us)",
      "supported_tasks": ["forecast"],
      "capabilities": ["forecasting", "quantile-forecasting", "multivariate", "covariates"],
      "avg_latency_ms": 240,
      "context_length": 8192,
      "availability_pct": 99.95
    },
    {
      "id": "amazon/chronos-bolt-small",
      "provider": "tsfm (us)",
      "supported_tasks": ["forecast"],
      "capabilities": ["forecasting", "quantile-forecasting", "high-throughput"],
      "avg_latency_ms": 120,
      "context_length": 4096,
      "availability_pct": 99.97
    }
  ]
}
```
inspect-model.sh

```sh
curl -s https://api.tsfm.ai/api/models/amazon%2Fchronos-bolt-base \
  | jq '.model | {
      id,
      provider,
      supported_tasks,
      capabilities,
      avg_latency_ms,
      availability_pct,
      context_length,
      input_cost_per_1m,
      output_cost_per_1m
    }'
```
model.response.json

```json
{
  "model": {
    "id": "amazon/chronos-bolt-base",
    "provider": "tsfm (us)",
    "status": "online",
    "supported_tasks": ["forecast"],
    "capabilities": ["forecasting", "quantile-forecasting", "multivariate", "covariates"],
    "context_length": 8192,
    "avg_latency_ms": 240,
    "availability_pct": 99.95,
    "input_cost_per_1m": 0.5,
    "output_cost_per_1m": 1.5,
    "fit_profile": {
      "recommended_use_cases": ["long-context demand planning", "covariate-aware retail forecasting"],
      "limitations": ["higher latency than bolt-class models"]
    }
  }
}
```

Selection signals

What to read on a model record

You do not need every metadata field for every decision. Focus on task support, context length, latency, cost, and whether the model family has the capabilities your data needs, such as covariates, quantiles, or long-context forecasting.

High-signal fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `supported_tasks` | string[] | No | The first filter to apply. It tells you whether the model supports forecast, classify, impute, or anomaly flows. |
| `context_length` | integer | No | How much history the model can consider. This matters for long seasonal cycles, dense telemetry streams, and covariate-heavy payloads. |
| `avg_latency_ms` | number | No | A routing signal, not a hard SLA. Compare it with your p95 target and expected concurrency before you pick a default. |
| `input_cost_per_1m` / `output_cost_per_1m` | number | No | Public catalog pricing is unified across models, so this is mainly useful for displaying the current published rate and verifying that every surface agrees on it. |
| `capabilities` | string[] | No | Highlights strengths such as covariates, long-context forecasting, quantiles, or multi-task support. |
| `status` / `availability_pct` | string / number | No | Operational hints for production routing and incident response planning. |
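These fields compose into a shortlist pass. A minimal sketch, assuming the list response shape shown earlier; the context and latency thresholds are illustrative, and in practice you would pipe the `curl` output from list-models.sh into this filter instead of the inline sample:

```sh
# Keep models that fit a context and latency budget, fastest first.
# The inline JSON mirrors the sample catalog response above.
jq '
  .models
  | map(select(.context_length >= 8192 and .avg_latency_ms <= 300))
  | sort_by(.avg_latency_ms)
  | map({id, avg_latency_ms, context_length})
' <<'JSON'
{"models":[
  {"id":"amazon/chronos-bolt-base","avg_latency_ms":240,"context_length":8192},
  {"id":"amazon/chronos-bolt-small","avg_latency_ms":120,"context_length":4096}
]}
JSON
```

With these thresholds only `amazon/chronos-bolt-base` survives, since the small variant's 4096-token context falls short of the budget.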

Routing guidance

Turn discovery into a routing policy

The catalog should not just answer “what exists?” It should help you define a default model, a fallback model, and the thresholds that trigger an upgrade for specific workloads.

Filter by task first

Do not compare a forecast-only checkpoint against a multi-task model until you know your task needs. Supported task coverage narrows the field quickly and avoids false comparisons.
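Because `supported_tasks` is an array, filter by membership rather than equality. A sketch assuming the list response shape above; `acme/ts-classify` is a made-up id included only for contrast, and in practice you would pipe the live `curl` output in place of the heredoc:

```sh
# Keep only models whose supported_tasks include the task you need.
jq --arg task "forecast" '
  .models
  | map(select(.supported_tasks | index($task)))
  | map(.id)
' <<'JSON'
{"models":[
  {"id":"amazon/chronos-bolt-small","supported_tasks":["forecast"]},
  {"id":"acme/ts-classify","supported_tasks":["classify"]}
]}
JSON
```

`index($task)` returns `null` when the task is absent, so `select` drops those entries before any latency or cost comparison happens.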

Pick one default, keep one fallback

Use catalog latency and cost signals to choose a default model, then keep a cheaper or more robust fallback ready for outages, quota pressure, or burst traffic.
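The default-plus-fallback pattern can be sketched as a small wrapper. This is a hypothetical helper, not part of the API: `forecast_with_fallback` takes any command that accepts a model id and exits non-zero on failure, such as a `curl` wrapper around your forecast endpoint:

```sh
# Try the default model once, then retry the same request on the fallback.
DEFAULT_MODEL="amazon/chronos-bolt-small"
FALLBACK_MODEL="ibm/ttm-r2"

forecast_with_fallback() {
  call="$1"  # a command that takes a model id and exits non-zero on failure
  if "$call" "$DEFAULT_MODEL"; then
    return 0
  fi
  echo "default model failed, retrying with $FALLBACK_MODEL" >&2
  "$call" "$FALLBACK_MODEL"
}
```

Invoke it as `forecast_with_fallback my_forecast_call`; the wrapper only absorbs single failures, so sustained outages still need alerting on the fallback path.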

Benchmark on your own distribution

Leaderboard results are directional. Real routing decisions should be validated on your historical series, horizon lengths, and business metrics before rollout.

routing-policy.example.json

```json
{
  "default_model": "amazon/chronos-bolt-small",
  "fallback_model": "ibm/ttm-r2",
  "promotion_rules": [
    "upgrade to chronos-bolt-base when covariates are required",
    "upgrade to google/timesfm-2.0-500m when context_length exceeds 4096"
  ],
  "review_inputs": [
    "p95 latency",
    "forecast error on holdout set",
    "cost per 10k requests"
  ]
}
```
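
The promotion rules above can be sketched as a small selection helper. `pick_model` is a hypothetical function, and the rule ordering (covariates checked before context length) is one reading of the policy rather than something the catalog mandates:

```sh
# Map workload traits to a model id, mirroring routing-policy.example.json.
pick_model() {
  needs_covariates="$1"  # "yes" or "no"
  context_len="$2"       # history length in timesteps
  if [ "$needs_covariates" = "yes" ]; then
    echo "amazon/chronos-bolt-base"
  elif [ "$context_len" -gt 4096 ]; then
    echo "google/timesfm-2.0-500m"
  else
    echo "amazon/chronos-bolt-small"
  fi
}

pick_model no 2048   # -> amazon/chronos-bolt-small
pick_model yes 2048  # -> amazon/chronos-bolt-base
pick_model no 8192   # -> google/timesfm-2.0-500m
```

Keeping this logic in one helper makes the thresholds easy to revisit after each benchmark cycle on your own data.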