Choosing a Model

Model selection

Choose the right model family based on latency, cost, context length, and task requirements.

If you are not sure where to start

Start with amazon/chronos-bolt-base for most workloads. It balances accuracy, latency (130ms), and cost ($0.07 per 1M tokens), and it supports probabilistic output. From there, move to google/timesfm-2.0-500m-pytorch if you need longer served context, to the Salesforce/moirai-1.1-R models for multivariate series, or to ibm-research/granite-timeseries-ttm-r2 if you need to minimize cost.
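The starting-point advice above can be read as a small decision rule. The sketch below is one way to encode it. The function name, its flags, and the order in which the flags are checked are assumptions made for illustration; they are not part of the service.

```python
def starting_models(multivariate: bool = False,
                    long_context: bool = False,
                    minimize_cost: bool = False) -> list[str]:
    """Return candidate model IDs for a first experiment.

    The precedence of the checks (multivariate, then long context,
    then cost) is an assumption; the guide does not rank these needs.
    """
    if multivariate:
        # The guide points to the Moirai 1.1-R family as a whole.
        return [
            "Salesforce/moirai-1.1-R-small",
            "Salesforce/moirai-1.1-R-base",
            "Salesforce/moirai-1.1-R-large",
        ]
    if long_context:
        return ["google/timesfm-2.0-500m-pytorch"]
    if minimize_cost:
        return ["ibm-research/granite-timeseries-ttm-r2"]
    # Default recommendation for most workloads.
    return ["amazon/chronos-bolt-base"]


print(starting_models())
print(starting_models(multivariate=True))
```

If several needs apply at once, treat the result as a first candidate to benchmark rather than a final choice.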

Decision factors

Key dimensions to consider when selecting a model.

Factor	Low	Mid	High
Latency requirement	< 150ms	150-400ms	> 400ms
Budget per 1M tokens	< $0.10	$0.10-0.25	> $0.25
History length	< 512 pts	512-2,048 pts	> 2,048 pts
Task complexity	Point forecast	Probabilistic	Multi-task
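The numeric rows of the table can be turned into a simple classifier. The following is a minimal sketch, assuming that values exactly on a boundary (for example 150ms or $0.25) belong to the Mid tier, as the inclusive mid ranges suggest. The helper names are hypothetical.

```python
def tier(value: float, low_below: float, high_above: float) -> str:
    """Classify a value as "Low", "Mid", or "High".

    Values equal to either boundary fall in "Mid", matching the
    inclusive mid ranges in the table (for example 150-400ms).
    """
    if value < low_below:
        return "Low"
    if value > high_above:
        return "High"
    return "Mid"


def latency_tier(latency_ms: float) -> str:
    return tier(latency_ms, 150, 400)


def budget_tier(usd_per_million_tokens: float) -> str:
    return tier(usd_per_million_tokens, 0.10, 0.25)


def history_tier(num_points: int) -> str:
    return tier(num_points, 512, 2048)


# A 130ms requirement, a $0.07 budget, and 1,024 points of history.
print(latency_tier(130), budget_tier(0.07), history_tier(1024))
# prints: Low Low Mid
```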

Recommendations by scenario

Lowest latency

You need sub-150ms responses for real-time dashboards, alerting, or streaming applications.

Recommended

amazon/chronos-bolt-mini (60ms)
amazon/chronos-bolt-small (88ms)
ibm-research/granite-timeseries-ttm-r2 (95ms)

These models use direct prediction or lightweight architectures that minimize inference time.

Lowest cost

You are processing millions of series in batch and need to minimize per-request cost.

Recommended

amazon/chronos-bolt-mini ($0.02)
ibm-research/granite-timeseries-ttm-r2 ($0.03)
amazon/chronos-bolt-small ($0.04)

Smaller parameter counts mean lower GPU utilization per request. Combined with high rate limits, these are ideal for batch workloads.
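To compare these options for a batch job, a rough estimate is total tokens divided by one million, multiplied by the listed price. The sketch below assumes the prices above are per 1M input tokens, as in the comparison table, and uses a hypothetical workload of 1,000,000 series at 512 tokens each. How a series maps to tokens is not specified here, so `tokens_per_series` is an assumption you would replace with measured usage.

```python
def batch_cost_usd(num_series: int, tokens_per_series: int,
                   price_per_million_usd: float) -> float:
    """Estimate input cost for a batch: tokens / 1M * price per 1M."""
    total_tokens = num_series * tokens_per_series
    return total_tokens / 1_000_000 * price_per_million_usd


# Prices copied from the recommendations above (USD per 1M tokens).
prices = [
    ("amazon/chronos-bolt-mini", 0.02),
    ("ibm-research/granite-timeseries-ttm-r2", 0.03),
    ("amazon/chronos-bolt-small", 0.04),
]

# Hypothetical workload: 1,000,000 series, 512 tokens each.
for model_id, price in prices:
    cost = batch_cost_usd(1_000_000, 512, price)
    print(f"{model_id}: ${cost:.2f}")
```

Under these assumptions the three models come out at about $10.24, $15.36, and $20.48 for the whole batch.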

Best forecast quality

Accuracy is the primary concern and you can tolerate higher latency and cost.

Recommended

Salesforce/moirai-1.1-R-large
google/timesfm-2.0-500m-pytorch
amazon/chronos-bolt-base

Larger models with stronger zero-shot behavior across real workloads are the safest default when you have latency budget to spend. Moirai 1.1-R Large excels at multivariate forecasting, TimesFM 2.0 remains strong on longer served histories, and Chronos-Bolt Base is a pragmatic all-around baseline.

Multivariate series

You have multiple correlated variables that should be modeled jointly.

Recommended

Salesforce/moirai-1.1-R-small
Salesforce/moirai-1.1-R-base
Salesforce/moirai-1.1-R-large

Moirai's Any-Variate Attention captures cross-variate dependencies natively. Choose the size that fits your latency and cost budget.

Limited history

You have fewer than 50 historical observations (new products, new sensors).

Recommended

amazon/chronos-bolt-base
Salesforce/moirai-1.1-R-small
google/timesfm-2.0-500m-pytorch

These models have strong zero-shot transfer from pre-training and produce reasonable forecasts even with minimal context.

Full model comparison

All hosted models sorted by latency. Exact served context limits come from the live catalog and can change as deployment configs change.

Model	Params	Latency	Input cost	Best for
amazon/chronos-bolt-mini	9M	60ms	$0.02	Ultra-low latency and edge deployment
amazon/chronos-bolt-small	48M	88ms	$0.04	Real-time and batch applications at low cost
ibm-research/granite-timeseries-ttm-r2	~1M	95ms	$0.03	Ultra-low-cost batch forecasting and edge deployment
amazon/chronos-bolt-base	205M	130ms	$0.07	Fast inference with strong accuracy
Salesforce/moirai-1.1-R-small	14M	210ms	$0.09	Low-cost multivariate forecasting
thuml/timer-base-84m	84M	260ms	$0.15	Patch-based zero-shot point forecasting once you have enough history
Salesforce/moirai-1.1-R-base	91M	330ms	$0.17	Balanced multivariate quality and cost
google/timesfm-2.0-500m-pytorch	500M	480ms	$0.38	Maximum forecast quality for longer served histories
Salesforce/moirai-1.1-R-large	311M	520ms	$0.29	Maximum multivariate forecast quality
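The table can also be filtered programmatically when you have hard latency or cost ceilings. The following is a minimal sketch that hard-codes a snapshot of the rows above. The `HostedModel` type and the `shortlist` function are hypothetical names, and the values should be refreshed from the live catalog because they can change.

```python
from typing import NamedTuple, Optional


class HostedModel(NamedTuple):
    model_id: str
    latency_ms: int
    input_cost_usd: float  # USD per 1M tokens, copied from the table


# Snapshot of the comparison table, in the same latency order.
CATALOG = [
    HostedModel("amazon/chronos-bolt-mini", 60, 0.02),
    HostedModel("amazon/chronos-bolt-small", 88, 0.04),
    HostedModel("ibm-research/granite-timeseries-ttm-r2", 95, 0.03),
    HostedModel("amazon/chronos-bolt-base", 130, 0.07),
    HostedModel("Salesforce/moirai-1.1-R-small", 210, 0.09),
    HostedModel("thuml/timer-base-84m", 260, 0.15),
    HostedModel("Salesforce/moirai-1.1-R-base", 330, 0.17),
    HostedModel("google/timesfm-2.0-500m-pytorch", 480, 0.38),
    HostedModel("Salesforce/moirai-1.1-R-large", 520, 0.29),
]


def shortlist(max_latency_ms: Optional[int] = None,
              max_cost_usd: Optional[float] = None) -> list[str]:
    """Return model IDs within the given ceilings, fastest first.

    A ceiling of None means that dimension is not constrained.
    """
    return [
        m.model_id
        for m in CATALOG
        if (max_latency_ms is None or m.latency_ms <= max_latency_ms)
        and (max_cost_usd is None or m.input_cost_usd <= max_cost_usd)
    ]


# Models that respond within 150ms and cost at most $0.05 per 1M tokens.
print(shortlist(max_latency_ms=150, max_cost_usd=0.05))
```

This filter only covers latency and cost. Task fit, such as multivariate support or served context length, still needs to be checked against the "Best for" column and the live catalog.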

Next steps

Browse models — See all models with live status, pricing, and detailed specifications.

Quickstart — Make your first API call and see forecasts in under 5 minutes.

Playground — Test models interactively with your own data before committing to an integration.