Chronos v2: What's New and Why It Matters
Amazon's Chronos-Bolt brings major improvements: direct multi-step forecasting, much lower latency than the original Chronos-T5 stack, and stronger benchmark results.
When Amazon released the original Chronos in early 2024, it demonstrated that language model architectures could be adapted for time series forecasting through a clever tokenization scheme. The model worked well, but its T5-based encoder-decoder architecture carried a cost: autoregressive decoding made inference slow, particularly for long prediction horizons. Chronos v2, released under the name Chronos-Bolt, fundamentally rethinks the architecture to solve this bottleneck while simultaneously improving accuracy.
#Architecture: From Autoregressive T5 to Direct Multi-Step Forecasting
The original Chronos mapped time series values into a discrete vocabulary using mean scaling and quantization, then used a T5 encoder-decoder to generate future tokens autoregressively. Each predicted time step required a separate forward pass through the decoder, and producing probabilistic forecasts meant running multiple sample paths. For a 64-step horizon with 20 sample paths, that meant 1,280 decoder forward passes per series.
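The scheme above, mean scaling plus uniform quantization, and the autoregressive pass-count arithmetic can be sketched in a few lines. This is a hedged illustration only: the vocabulary size, clipping range, and scaling rule here are invented for clarity, not the released model's exact values.

```python
import numpy as np

# Illustrative sketch of Chronos-style tokenization. VOCAB_SIZE and the
# [-15, 15] clipping range are assumptions, not the released model's values.
VOCAB_SIZE = 4096

def tokenize(context: np.ndarray, vocab_size: int = VOCAB_SIZE) -> np.ndarray:
    """Mean-scale a series, then uniformly quantize it into discrete tokens."""
    scale = np.mean(np.abs(context)) or 1.0   # mean absolute scaling
    scaled = np.clip(context / scale, -15.0, 15.0)
    bins = np.linspace(-15.0, 15.0, vocab_size - 1)
    return np.digitize(scaled, bins)          # integer token ids

series = np.sin(np.linspace(0, 8 * np.pi, 256)) * 10 + 50
tokens = tokenize(series)
print(tokens.min(), tokens.max())  # all token ids fall in [0, VOCAB_SIZE - 1]

# Autoregressive cost: one decoder pass per step, per sample path.
horizon, sample_paths = 64, 20
print(horizon * sample_paths)  # 1280 decoder forward passes per series
```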
Chronos-Bolt keeps a T5-style encoder-decoder structure, but it no longer uses the decoder autoregressively. The encoder processes the patched input context, and the decoder emits the full forecast horizon in a small number of direct steps rather than token-by-token rollout. Instead of sampling discrete tokens, the model directly predicts quantile outputs for the target horizon, giving you probabilistic forecasts without multiple sample paths.
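The change in output contract is the key point: one call returns every horizon step at every quantile level, with no rollout and no Monte Carlo paths. The sketch below uses a trivial stand-in "model" (empirical quantiles of the context), so only the shapes mirror Chronos-Bolt's design; the quantile levels are assumed for illustration.

```python
import numpy as np

# Stand-in for a direct multi-step quantile forecaster. The model logic here
# is deliberately trivial; only the single-call, (horizon, quantiles)-shaped
# output contract reflects the Chronos-Bolt design described above.
QUANTILES = (0.1, 0.5, 0.9)  # assumed quantile levels for illustration

def direct_quantile_forecast(context: np.ndarray, horizon: int,
                             quantiles=QUANTILES) -> np.ndarray:
    """Return a (horizon, len(quantiles)) forecast in a single call --
    no token-by-token rollout, no sampled paths."""
    levels = np.quantile(context, quantiles)   # one pass over the context
    return np.tile(levels, (horizon, 1))       # broadcast across the horizon

context = np.random.default_rng(0).normal(100.0, 5.0, size=512)
forecast = direct_quantile_forecast(context, horizon=64)
print(forecast.shape)  # (64, 3): every step and quantile from one call
```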
This architectural change yields roughly a 250x inference speedup over the original Chronos-Large family: with no token-by-token rollout and no sampled paths, forecast cost no longer scales with horizon length times the number of sample paths. On a single A10G GPU, the public Chronos-Bolt checkpoints can forecast large batches in a small fraction of the original Chronos-Large runtime, which exceeds 40 seconds on the same batch.
#New Model Sizes
The Chronos-Bolt family ships in four sizes:
- Chronos-Bolt-Tiny (9M parameters): Suitable for edge deployment and latency-critical applications. Runs efficiently on CPU.
- Chronos-Bolt-Mini (21M parameters): Good balance of accuracy and speed for moderate workloads.
- Chronos-Bolt-Small (48M parameters): Strong general-purpose option for most production use cases.
- Chronos-Bolt-Base (205M parameters): Highest-capacity public Bolt checkpoint, still dramatically faster than the original Chronos-Large due to the direct multi-step design.
Even the largest public Bolt checkpoint is substantially smaller than the original 710M-parameter Chronos-Large.
#Benchmark Results
Amazon evaluated Chronos-Bolt on 27 datasets from the Monash Time Series Forecasting Repository and additional held-out datasets not seen during training. The results show consistent improvement over the original Chronos.
On the zero-shot Monash benchmark, the published Bolt family improves on the original Chronos checkpoints despite being much faster at inference. More notably, Chronos-Bolt-Small matches the accuracy of the original Chronos-Large at a fraction of the computational cost: roughly the same forecast quality from a 48M-parameter Bolt model that previously required a 710M-parameter autoregressive one.
Against other foundation models, Chronos-Bolt Base is competitive with larger TimesFM and Moirai checkpoints on most benchmark categories, with relative rankings varying by dataset characteristics. On short-context retail and demand forecasting series, Chronos-Bolt tends to perform particularly well. On very long-context datasets (thousands of time steps), larger long-context families can still hold an edge.
#Training Data Improvements
The Chronos-Bolt training data pipeline was expanded and curated compared to the original. The team increased the volume of synthetic data generated through Gaussian Process kernels, and applied more aggressive filtering to remove low-quality or near-duplicate series from the public data corpus. They also introduced a curriculum strategy where the model sees shorter, simpler series early in training before progressing to longer, more complex patterns. This curriculum approach appears to improve learning efficiency and final accuracy, particularly on short-horizon forecasting tasks.
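The curriculum idea can be sketched as a length-gated sampler: the fraction of the corpus eligible for a batch grows with training progress. The schedule, thresholds, and function below are invented for illustration and are not Amazon's actual training recipe.

```python
import random

# Hypothetical length-based curriculum sampler, inspired by the description
# above. The gating rule is an illustrative assumption, not Amazon's recipe.
def curriculum_batch(corpus, progress: float, batch_size: int = 4, seed: int = 0):
    """Sample series whose length is allowed at this training progress.

    progress in [0, 1]: early on, only the shortest series are eligible;
    by the end, the full corpus is.
    """
    lengths = sorted(len(s) for s in corpus)
    cutoff = lengths[min(int(progress * len(lengths)), len(lengths) - 1)]
    eligible = [s for s in corpus if len(s) <= cutoff]
    rng = random.Random(seed)
    return rng.sample(eligible, min(batch_size, len(eligible)))

corpus = [list(range(n)) for n in (32, 64, 128, 512, 2048)]
early = curriculum_batch(corpus, progress=0.1)
late = curriculum_batch(corpus, progress=1.0)
print(max(len(s) for s in early))  # 32: only the shortest series early on
```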
#What This Means for Production
The original Chronos was accurate but often impractical for high-throughput production workloads. If you needed to forecast tens of thousands of series at hourly cadence, the inference cost was prohibitive. Chronos-Bolt removes that constraint entirely.
At much lower inference cost than the original Chronos-T5 models, Chronos-Bolt Base can handle workloads that previously required either downgrading to smaller models or using non-foundation-model approaches. For more on GPU optimization for TSFM serving, see our dedicated post. A batch of 10,000 series with a 48-step horizon is practical on a single GPU, making Bolt viable for near-real-time applications like dynamic pricing, inventory replenishment triggers, and operational dashboards.
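Serving a workload like that typically means chunking the series into GPU-sized batches. A minimal sketch of that bookkeeping, with a batch size chosen purely for illustration:

```python
# Illustrative batching helper for a large forecasting workload. The batch
# size of 256 is an assumption, not a recommendation or measured optimum.
def batches(n_series: int, batch_size: int):
    """Yield (start, end) index pairs covering all series."""
    for start in range(0, n_series, batch_size):
        yield start, min(start + batch_size, n_series)

chunks = list(batches(10_000, 256))
print(len(chunks))  # 40 batches of up to 256 series each
```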
The small model sizes also open up new deployment targets. Chronos-Bolt-Tiny runs comfortably on CPU with sub-second latency for individual series, which means you can deploy it at the edge or in serverless functions without GPU infrastructure.
#Availability on TSFM.ai
All four Chronos-Bolt variants are available in the TSFM.ai model catalog today. Our automatic routing has been updated to prefer Chronos-Bolt over the original Chronos-T5 variants in most scenarios, since Bolt matches or exceeds accuracy at dramatically lower latency. If you are currently using Chronos-Large through our API, we recommend testing Chronos-Bolt Base as a drop-in replacement. The request format is identical and you should see the same or better accuracy with significantly faster response times.
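A routing rule of this kind can be sketched as a simple preference map. The function and the policy below are hypothetical illustrations, not TSFM.ai's actual implementation; the model names echo the catalog entries discussed above.

```python
# Hypothetical routing sketch: prefer a Bolt variant when one covers the
# requested Chronos-T5 model. Not TSFM.ai's actual routing code.
BOLT_EQUIVALENT = {
    "chronos-t5-tiny": "chronos-bolt-tiny",
    "chronos-t5-mini": "chronos-bolt-mini",
    "chronos-t5-small": "chronos-bolt-small",
    "chronos-t5-large": "chronos-bolt-base",  # drop-in replacement per above
}

def route(requested: str) -> str:
    """Return the preferred checkpoint for a requested model name."""
    return BOLT_EQUIVALENT.get(requested, requested)

print(route("chronos-t5-large"))  # chronos-bolt-base
```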
#The Broader Trend
Chronos-Bolt reflects a pattern we are seeing across the TSFM space: the shift from adapting general-purpose architectures to designing inference-efficient architectures specifically for time series. The T5 encoder-decoder was a convenient starting point for research, but production deployment demands architectures that produce forecasts in a single forward pass. We expect future TSFM releases from other labs to follow a similar trajectory, prioritizing throughput and latency alongside accuracy.