
Chronos v2: What's New and Why It Matters

Amazon's Chronos v2 (Chronos-Bolt) brings major improvements: 250x faster inference, encoder-only architecture, and stronger benchmark results.

TSFM.ai Team
February 20, 2025 · 4 min read


When Amazon released the original Chronos in early 2024, it demonstrated that language model architectures could be adapted for time series forecasting through a clever tokenization scheme. The model worked well, but its T5-based encoder-decoder architecture carried a cost: autoregressive decoding made inference slow, particularly for long prediction horizons. Chronos v2, released under the name Chronos-Bolt, fundamentally rethinks the architecture to solve this bottleneck while simultaneously improving accuracy.

Architecture: From Encoder-Decoder to Encoder-Only

The original Chronos mapped time series values into a discrete vocabulary using mean scaling and quantization, then used a T5 encoder-decoder to generate future tokens autoregressively. Each predicted time step required a separate forward pass through the decoder, and producing probabilistic forecasts meant running multiple sample paths. For a 64-step horizon with 20 sample paths, that meant 1,280 decoder forward passes per series.
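The tokenization step and the autoregressive cost can be sketched in a few lines. This is a simplified illustration, not the model's exact implementation: the bin count, value range, and scaling rule here are illustrative stand-ins for the scheme described above.

```python
import numpy as np

def tokenize(context, n_bins=4094, low=-15.0, high=15.0):
    """Chronos-style tokenization sketch: mean-scale the context,
    then snap scaled values onto a uniform grid of discrete bins."""
    scale = np.mean(np.abs(context)) or 1.0     # mean absolute scaling
    scaled = context / scale
    centers = np.linspace(low, high, n_bins)    # uniform bin centers
    tokens = np.clip(np.digitize(scaled, centers), 0, n_bins - 1)
    return tokens, scale

series = np.array([10.0, 12.0, 11.0, 13.0, 12.5])
tokens, scale = tokenize(series)

# Autoregressive decoding cost: one decoder forward pass per
# predicted step, per sample path.
horizon, sample_paths = 64, 20
decoder_passes = horizon * sample_paths   # 1,280 per series
```

The `decoder_passes` arithmetic is exactly the 1,280-pass figure quoted above, and it is this multiplicative cost that the encoder-only redesign eliminates.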

Chronos-Bolt drops the decoder entirely. The encoder processes the input context and produces a fixed-length representation, which is then passed through a lightweight MLP decoding head that outputs all forecast steps in a single forward pass. Instead of sampling discrete tokens, the MLP head directly predicts parameters of a mixture distribution at each horizon step, giving you probabilistic forecasts without multiple sample paths.
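To make the shape of that change concrete, here is a toy version of the decoding head. For simplicity this sketch emits quantiles directly rather than mixture parameters, and the dimensions and random weights are placeholders; the point is that one matrix multiply produces the entire horizon at once.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, horizon, n_quantiles = 64, 16, 9   # illustrative sizes
quantile_levels = np.linspace(0.1, 0.9, n_quantiles)

# Stand-in for the encoder's fixed-length representation of one series.
h = rng.normal(size=d_model)

# Lightweight linear head: every (step, quantile) output in one shot.
W = rng.normal(size=(d_model, horizon * n_quantiles)) * 0.1
b = np.zeros(horizon * n_quantiles)

forecast = (h @ W + b).reshape(horizon, n_quantiles)
forecast = np.sort(forecast, axis=1)  # enforce non-crossing quantiles

# One forward pass replaced horizon * sample_paths decoder calls.
```

The single `(horizon, n_quantiles)` output is why probabilistic forecasts no longer require running multiple sample paths.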

This architectural change yields roughly a 250x speedup in inference compared to the original Chronos-Large. On a single A10G GPU, Chronos-Bolt-Large produces forecasts for a batch of 256 series with a 64-step horizon in under 200 milliseconds. The original Chronos-Large takes over 40 seconds for the same workload.

New Model Sizes

The Chronos-Bolt family ships in four sizes, with parameter counts reflecting the encoder-only design:

  • Chronos-Bolt-Tiny (9M parameters): Suitable for edge deployment and latency-critical applications. Runs efficiently on CPU.
  • Chronos-Bolt-Mini (21M parameters): Good balance of accuracy and speed for moderate workloads.
  • Chronos-Bolt-Small (48M parameters): Strong general-purpose option for most production use cases.
  • Chronos-Bolt-Large (710M parameters): Highest accuracy, still dramatically faster than the original Chronos-Large (710M) due to the architectural change.

The parameter counts for Tiny, Mini, and Small are lower than the original Chronos counterparts because the decoder parameters are replaced by the much smaller MLP head.

Benchmark Results

Amazon evaluated Chronos-Bolt on 27 datasets from the Monash Time Series Forecasting Repository and additional held-out datasets not seen during training. The results show consistent improvement over the original Chronos across the board.

On the zero-shot Monash benchmark, Chronos-Bolt-Large achieves a mean WQL (Weighted Quantile Loss) that improves on the original Chronos-Large by approximately 7%. More notably, Chronos-Bolt-Small matches the accuracy of the original Chronos-Large at a fraction of the computational cost: on the Bolt architecture, a 48M-parameter model delivers the forecast quality that previously required 710M parameters.
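For readers unfamiliar with the metric, WQL can be computed as a normalized pinball loss averaged over quantile levels. The normalization convention below (scaling by total absolute target value, as in GluonTS-style evaluation) is one common choice; exact benchmark conventions may differ slightly.

```python
import numpy as np

def weighted_quantile_loss(y_true, q_forecasts, q_levels):
    """Mean weighted quantile loss: pinball loss at each quantile level,
    summed over time, normalized by the total absolute target value,
    and averaged over quantile levels."""
    y = np.asarray(y_true)[:, None]      # shape (T, 1)
    f = np.asarray(q_forecasts)          # shape (T, Q)
    q = np.asarray(q_levels)[None, :]    # shape (1, Q)
    pinball = np.maximum(q * (y - f), (q - 1) * (y - f))
    return 2 * pinball.sum() / (np.abs(np.asarray(y_true)).sum() * len(q_levels))
```

Lower is better: a perfect quantile forecast scores exactly zero, so a 7% reduction in mean WQL is a direct accuracy gain.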

Against other foundation models, Chronos-Bolt-Large is competitive with TimesFM 2.0 and Moirai-Large on most benchmark categories, with relative rankings varying by dataset characteristics. On short-context retail and demand forecasting series, Chronos-Bolt tends to perform particularly well. On very long-context datasets (thousands of time steps), TimesFM's 2048-token context length gives it an edge.

Training Data Improvements

The Chronos-Bolt training data pipeline was expanded and curated compared to the original. The team increased the volume of synthetic data generated through Gaussian Process kernels, and applied more aggressive filtering to remove low-quality or near-duplicate series from the public data corpus. They also introduced a curriculum strategy where the model sees shorter, simpler series early in training before progressing to longer, more complex patterns. This curriculum approach appears to improve learning efficiency and final accuracy, particularly on short-horizon forecasting tasks.

What This Means for Production

The original Chronos was accurate but often impractical for high-throughput production workloads. If you needed to forecast tens of thousands of series at hourly cadence, the inference cost was prohibitive. Chronos-Bolt removes that constraint entirely.

At 250x faster inference, Chronos-Bolt-Large can handle workloads that previously required either downgrading to smaller models or using non-foundation-model approaches. For more on GPU optimization for TSFM serving, see our dedicated post. A batch of 10,000 series with a 48-step horizon runs in roughly 8 seconds on a single A10G, making it viable for near-real-time applications like dynamic pricing, inventory replenishment triggers, and operational dashboards.
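The capacity implication of those numbers is worth making explicit. A back-of-envelope calculation, assuming the ~8-second figure quoted above holds and batches run back to back on a single GPU:

```python
# Figures quoted above: ~8 s for a 10,000-series batch, 48-step
# horizon, on one A10G running Chronos-Bolt-Large.
series_per_batch = 10_000
seconds_per_batch = 8.0

throughput = series_per_batch / seconds_per_batch   # 1,250 series/s

# Hourly-cadence workload: series one GPU can refresh per hour,
# ignoring data loading and scheduling overhead.
series_per_hour = throughput * 3600                 # 4,500,000
```

Even with generous headroom for preprocessing and I/O, a single GPU comfortably covers the "tens of thousands of series at hourly cadence" workload that was previously prohibitive.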

The small model sizes also open up new deployment targets. Chronos-Bolt-Tiny runs comfortably on CPU with sub-second latency for individual series, which means you can deploy it at the edge or in serverless functions without GPU infrastructure.

Availability on TSFM.ai

All four Chronos-Bolt variants are available in the TSFM.ai model catalog today. Our automatic routing has been updated to prefer Chronos-Bolt over the original Chronos variants in most scenarios, since Bolt matches or exceeds accuracy at dramatically lower latency. If you are currently using Chronos-Large through our API, we recommend testing Chronos-Bolt-Large as a drop-in replacement. The request format is identical and you should see the same or better accuracy with significantly faster response times.

The Broader Trend

Chronos-Bolt reflects a pattern we are seeing across the TSFM space: the shift from adapting general-purpose architectures to designing inference-efficient architectures specifically for time series. The T5 encoder-decoder was a convenient starting point for research, but production deployment demands architectures that produce forecasts in a single forward pass. We expect future TSFM releases from other labs to follow a similar trajectory, prioritizing throughput and latency alongside accuracy.
