Model selection
Choose the right model family based on latency, cost, context length, and task requirements.
Start with chronos-bolt-base for most workloads. It offers a strong balance of accuracy, speed (130ms), and cost ($0.07 per 1M tokens), and it supports probabilistic output. From there, move to timesfm-2.0-500m if you need longer context, a moirai-1.1-R model for multivariate forecasting, or ttm-r2 if you need to minimize cost.
Use these dimensions to place your workload in a low, mid, or high tier for each factor:
| Factor | Low | Mid | High |
|---|---|---|---|
| Latency requirement | < 150ms | 150-400ms | > 400ms |
| Budget per 1M tokens | < $0.10 | $0.10-$0.25 | > $0.25 |
| Context needed | < 5K tokens | 5K-12K tokens | > 12K tokens |
| Task complexity | Point forecast | Probabilistic | Multi-task |
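The tiers in this table can be applied mechanically. The following is a minimal sketch, not part of any SDK: the function names `tier` and `classify_requirements` are hypothetical, and it assumes that values exactly on a boundary (for example 150ms) fall in the mid tier, as the ranges above suggest.

```python
def tier(value, low_below, high_above):
    """Classify a value as 'low', 'mid', or 'high'.

    Values below `low_below` are low, values above `high_above` are high,
    and everything in between (boundaries included) is mid.
    """
    if value < low_below:
        return "low"
    if value > high_above:
        return "high"
    return "mid"


def classify_requirements(latency_ms, budget_per_1m_usd, context_tokens):
    """Map workload requirements onto the tiers from the table above."""
    return {
        "latency": tier(latency_ms, 150, 400),
        "budget": tier(budget_per_1m_usd, 0.10, 0.25),
        "context": tier(context_tokens, 5_000, 12_000),
    }


# Example workload (illustrative numbers only).
profile = classify_requirements(
    latency_ms=120, budget_per_1m_usd=0.20, context_tokens=16_000
)
print(profile)  # {'latency': 'low', 'budget': 'mid', 'context': 'high'}
```

A profile like this one (low latency requirement, high context need) signals a trade-off: the catalog's longest-context models are also its slowest.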
You need sub-150ms responses for real-time dashboards, alerting, or streaming applications.
Recommended
These models use direct prediction or lightweight architectures that minimize inference time.
You are processing millions of series in batch and need to minimize per-request cost.
Recommended
Smaller parameter counts mean lower GPU utilization per request. Combined with high rate limits, these are ideal for batch workloads.
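For batch planning, a rough cost estimate is total tokens divided by one million, multiplied by the per-1M-token price. The sketch below assumes billing is linear in input tokens; the helper name `batch_cost_usd` and the figure of 512 tokens per series are hypothetical, and how tokens are counted per series is not specified here.

```python
def batch_cost_usd(num_series, tokens_per_series, price_per_1m_tokens):
    """Estimate batch cost, assuming linear per-token input pricing."""
    total_tokens = num_series * tokens_per_series
    return total_tokens / 1_000_000 * price_per_1m_tokens


# Hypothetical workload: 2 million series at 512 tokens each.
NUM_SERIES = 2_000_000
TOKENS_PER_SERIES = 512  # assumption, not a documented value

# Prices taken from the model table in this guide.
for name, price in [("ttm-r2", 0.03), ("chronos-bolt-base", 0.07)]:
    cost = batch_cost_usd(NUM_SERIES, TOKENS_PER_SERIES, price)
    print(f"{name}: ${cost:.2f}")
```

Under these assumptions the same batch costs a little over twice as much on chronos-bolt-base as on ttm-r2, which is the kind of gap that matters at millions of series.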
Accuracy is the primary concern and you can tolerate higher latency and cost.
Recommended
Larger models with longer context windows capture more complex patterns. Moirai 1.1-R Large excels at multivariate, TimesFM 2.0 at long context, and Timer provides a strong zero-shot univariate baseline.
You have multiple correlated variables that should be modeled jointly.
Recommended
Moirai's Any-Variate Attention captures cross-variate dependencies natively. Choose the size that fits your latency and cost budget.
You have fewer than 50 historical observations (new products, new sensors).
Recommended
These models have strong zero-shot transfer from pre-training and produce reasonable forecasts even with minimal context.
All hosted models, sorted by latency from fastest to slowest.
| Model | Params | Latency | Input cost (per 1M tokens) | Context (tokens) | Best for |
|---|---|---|---|---|---|
| chronos-bolt-mini | 9M | 60ms | $0.02 | 4K | Ultra-low latency and edge deployment |
| chronos-bolt-small | 48M | 88ms | $0.04 | 6K | Real-time and batch applications at low cost |
| ttm-r2 | ~1M | 95ms | $0.03 | 4K | Ultra-low-cost batch forecasting and edge deployment |
| chronos-bolt-base | 205M | 130ms | $0.07 | 8K | Fast inference with strong accuracy |
| moirai-1.1-R-small | 14M | 210ms | $0.09 | 8K | Low-cost multivariate forecasting |
| timer-base-84m | 84M | 260ms | $0.15 | 2.9K | Strong zero-shot point forecasting from THUML |
| moirai-1.1-R-base | 91M | 330ms | $0.17 | 12K | Balanced multivariate quality and cost |
| timesfm-2.0-500m | 500M | 480ms | $0.38 | 16K | Maximum forecast quality for long-horizon tasks |
| moirai-1.1-R-large | 311M | 520ms | $0.29 | 16K | Maximum multivariate forecast quality |
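To shortlist models against hard constraints, the table can be encoded as data and filtered. This is an illustrative sketch only: the `Model` class and `shortlist` function are hypothetical names, the values are copied from the table above, and "K" is assumed to mean 1,000 tokens (so 2.9K becomes 2,900).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    latency_ms: int
    cost_per_1m: float   # USD per 1M input tokens
    context_tokens: int  # assumes 1K = 1,000 tokens


# Values transcribed from the table above.
CATALOG = [
    Model("chronos-bolt-mini", 60, 0.02, 4_000),
    Model("chronos-bolt-small", 88, 0.04, 6_000),
    Model("ttm-r2", 95, 0.03, 4_000),
    Model("chronos-bolt-base", 130, 0.07, 8_000),
    Model("moirai-1.1-R-small", 210, 0.09, 8_000),
    Model("timer-base-84m", 260, 0.15, 2_900),
    Model("moirai-1.1-R-base", 330, 0.17, 12_000),
    Model("timesfm-2.0-500m", 480, 0.38, 16_000),
    Model("moirai-1.1-R-large", 520, 0.29, 16_000),
]


def shortlist(max_latency_ms=None, max_cost_per_1m=None, min_context_tokens=0):
    """Return models meeting every constraint, fastest first."""
    matches = [
        m for m in CATALOG
        if (max_latency_ms is None or m.latency_ms <= max_latency_ms)
        and (max_cost_per_1m is None or m.cost_per_1m <= max_cost_per_1m)
        and m.context_tokens >= min_context_tokens
    ]
    return sorted(matches, key=lambda m: m.latency_ms)


# Models at or under 150ms that accept at least 6K tokens of context.
fast_enough = shortlist(max_latency_ms=150, min_context_tokens=6_000)
print([m.name for m in fast_enough])
# ['chronos-bolt-small', 'chronos-bolt-base']
```

Filtering covers only the numeric columns. Task fit, such as multivariate support or probabilistic output, still needs to be checked against the "Best for" column.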
Browse models — See all models with live status, pricing, and detailed specifications.
Quickstart — Make your first API call and see forecasts in under 5 minutes.
Playground — Test models interactively with your own data before committing to an integration.