Model selection
Choose the right model family based on latency, cost, context length, and task requirements.
Start with amazon/chronos-bolt-base for most workloads. It balances accuracy, speed (130ms), and cost ($0.07 per 1M tokens), and it supports probabilistic output. From there, move to google/timesfm-2.0-500m-pytorch if you need longer served context, to the Salesforce/moirai-1.1-R models for multivariate forecasting, or to ibm-research/granite-timeseries-ttm-r2 if you need to minimize cost.
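The starting-point guidance above can be sketched as a small helper. This is only an illustration, not part of any SDK: the function name `pick_starting_model`, its boolean flags, and the order in which the flags are checked are all assumptions. The model identifiers are the ones named on this page, and the Moirai size is an arbitrary middle choice, since the size should follow your latency and cost budget.

```python
def pick_starting_model(
    multivariate: bool = False,
    needs_long_context: bool = False,
    minimize_cost: bool = False,
) -> str:
    """Return a starting model ID based on the guidance on this page.

    Hypothetical helper: the priority order of the flags is an assumption.
    """
    if multivariate:
        # Moirai 1.1-R ships in small/base/large; "base" is a placeholder choice.
        return "Salesforce/moirai-1.1-R-base"
    if needs_long_context:
        return "google/timesfm-2.0-500m-pytorch"
    if minimize_cost:
        return "ibm-research/granite-timeseries-ttm-r2"
    # Default recommendation for most workloads.
    return "amazon/chronos-bolt-base"
```

Calling `pick_starting_model()` with no arguments returns the default, `amazon/chronos-bolt-base`.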
Consider these key dimensions when selecting a model:
| Factor | Low | Mid | High |
|---|---|---|---|
| Latency requirement | < 150ms | 150-400ms | > 400ms |
| Budget per 1M tokens | < $0.10 | $0.10-0.25 | > $0.25 |
| History length | < 512 pts | 512-2,048 pts | > 2,048 pts |
| Task complexity | Point forecast | Probabilistic | Multi-task |
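The numeric tiers in the table above can be expressed as a simple classifier, which may help when tagging workloads programmatically. The function names here are hypothetical, and treating the boundary values (for example exactly 150ms or exactly 512 points) as "Mid" is an assumption read off the table's ranges.

```python
def tier(value: float, low_upper: float, mid_upper: float) -> str:
    """Classify a value as Low (< low_upper), Mid (up to mid_upper), or High."""
    if value < low_upper:
        return "Low"
    if value <= mid_upper:
        return "Mid"
    return "High"


# Thresholds copied from the table above.
def latency_tier(ms: float) -> str:
    return tier(ms, 150, 400)


def budget_tier(usd_per_1m_tokens: float) -> str:
    return tier(usd_per_1m_tokens, 0.10, 0.25)


def history_tier(points: int) -> str:
    return tier(points, 512, 2048)
```

For instance, a workload with a 130ms latency target and 4,096 points of history falls in the Low latency tier and the High history tier.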
You need sub-150ms responses for real-time dashboards, alerting, or streaming applications.
Recommended
These models use direct prediction or lightweight architectures that minimize inference time.
You are processing millions of series in batch and need to minimize per-request cost.
Recommended
Smaller parameter counts mean lower GPU utilization per request. Combined with high rate limits, these are ideal for batch workloads.
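For batch planning, a rough cost estimate can be derived from the per-1M-token input prices listed on this page. The sketch below assumes cost scales linearly with input tokens and that you already know how many tokens each series consumes; how points map to tokens is not specified here, so `tokens_per_series` is a placeholder you would need to measure. The function name is hypothetical.

```python
def estimate_batch_input_cost(
    num_series: int,
    tokens_per_series: int,
    price_per_1m_tokens: float,
) -> float:
    """Estimate input cost in USD, assuming linear per-token pricing."""
    total_tokens = num_series * tokens_per_series
    return total_tokens / 1_000_000 * price_per_1m_tokens


# Example: 2 million series at an assumed 512 tokens each, priced at $0.03 per 1M tokens.
cost = estimate_batch_input_cost(2_000_000, 512, 0.03)
print(f"Estimated input cost: ${cost:.2f}")
```

This covers input tokens only; any output-side or minimum-request charges would need to be added separately.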
Accuracy is the primary concern and you can tolerate higher latency and cost.
Recommended
When you have latency budget to spend, larger models with stronger zero-shot behavior across real workloads are the safest default. Moirai 1.1-R Large excels at multivariate forecasting, TimesFM 2.0 remains strong on longer served histories, and Chronos-Bolt Base is a pragmatic all-around baseline.
You have multiple correlated variables that should be modeled jointly.
Recommended
Moirai's Any-Variate Attention captures cross-variate dependencies natively. Choose the size that fits your latency and cost budget.
You have fewer than 50 historical observations (new products, new sensors).
Recommended
These models have strong zero-shot transfer from pre-training and produce reasonable forecasts even with minimal context.
The table below lists all hosted models, sorted by latency. Exact served context limits come from the live catalog and can change as deployment configurations change.
| Model | Params | Latency | Input cost (per 1M tokens) | Best for |
|---|---|---|---|---|
| amazon/chronos-bolt-mini | 21M | 60ms | $0.02 | Ultra-low latency and edge deployment |
| amazon/chronos-bolt-small | 48M | 88ms | $0.04 | Real-time and batch applications at low cost |
| ibm-research/granite-timeseries-ttm-r2 | ~1M | 95ms | $0.03 | Ultra-low-cost batch forecasting and edge deployment |
| amazon/chronos-bolt-base | 205M | 130ms | $0.07 | Fast inference with strong accuracy |
| Salesforce/moirai-1.1-R-small | 14M | 210ms | $0.09 | Low-cost multivariate forecasting |
| thuml/timer-base-84m | 84M | 260ms | $0.15 | Patch-based zero-shot point forecasting once you have enough history |
| Salesforce/moirai-1.1-R-base | 91M | 330ms | $0.17 | Balanced multivariate quality and cost |
| google/timesfm-2.0-500m-pytorch | 500M | 480ms | $0.38 | Maximum forecast quality for longer served histories |
| Salesforce/moirai-1.1-R-large | 311M | 520ms | $0.29 | Maximum multivariate forecast quality |
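The table above can also be filtered programmatically. The following sketch hard-codes the latency and input-cost figures from the table as a snapshot; the `Model` class and `candidates` function are hypothetical names, and in practice you would read current values from the live catalog rather than from this static list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_id: str
    latency_ms: int
    cost_per_1m: float  # input cost in USD per 1M tokens


# Snapshot of the table above; live values may differ.
CATALOG = [
    Model("amazon/chronos-bolt-mini", 60, 0.02),
    Model("amazon/chronos-bolt-small", 88, 0.04),
    Model("ibm-research/granite-timeseries-ttm-r2", 95, 0.03),
    Model("amazon/chronos-bolt-base", 130, 0.07),
    Model("Salesforce/moirai-1.1-R-small", 210, 0.09),
    Model("thuml/timer-base-84m", 260, 0.15),
    Model("Salesforce/moirai-1.1-R-base", 330, 0.17),
    Model("google/timesfm-2.0-500m-pytorch", 480, 0.38),
    Model("Salesforce/moirai-1.1-R-large", 520, 0.29),
]


def candidates(max_latency_ms: float, max_cost_per_1m: float) -> list[Model]:
    """Return models within both limits (inclusive), fastest first."""
    matches = [
        m for m in CATALOG
        if m.latency_ms <= max_latency_ms and m.cost_per_1m <= max_cost_per_1m
    ]
    return sorted(matches, key=lambda m: m.latency_ms)


for m in candidates(max_latency_ms=150, max_cost_per_1m=0.05):
    print(m.model_id, m.latency_ms, m.cost_per_1m)
```

With a 150ms latency cap and a $0.05 budget, this keeps the three models that satisfy both limits and drops amazon/chronos-bolt-base on cost.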
Browse models — See all models with live status, pricing, and detailed specifications.
Quickstart — Make your first API call and see forecasts in under 5 minutes.
Playground — Test models interactively with your own data before committing to an integration.