TimesFM: Google's Approach to Time Series Foundation Models
Google's TimesFM is a decoder-only foundation model for time series forecasting, trained on roughly 100B real-world time points drawn largely from Google Trends and Wikipedia pageviews.
When Google Research published the TimesFM paper (Das et al., 2024), the approach stood out for two reasons: the sheer scale of the pretraining corpus and an architectural design that diverges meaningfully from other TSFMs. Where Amazon's Chronos adapts an encoder-decoder language model, TimesFM follows the decoder-only lineage — closer to GPT than to T5 — and introduces a patching mechanism that gives it unusual flexibility at inference time.
Architecture: Decoder-Only with Patching
TimesFM uses a decoder-only transformer, meaning it processes the input sequence causally (left-to-right) and generates outputs autoregressively. This is the same high-level architecture as GPT-2, GPT-3, and LLaMA, adapted for continuous-valued temporal data rather than discrete text tokens.
The key architectural innovation is input and output patching. Rather than consuming one time step per transformer position, TimesFM groups consecutive time steps into patches. Each input patch is a contiguous subsequence of the time series (e.g., 32 time steps), which is projected into the model's hidden dimension through a linear layer. The transformer then operates over a sequence of these patch embeddings.
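The patching step itself is straightforward to sketch. Below is an illustrative NumPy version; the patch size of 32 matches the description above, but the hidden dimension and projection weights here are placeholders (in the real model the projection is a learned layer):

```python
import numpy as np

def patchify(series: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a 1-D series into contiguous, non-overlapping patches."""
    n_patches = len(series) // patch_len          # drop any ragged tail
    return series[: n_patches * patch_len].reshape(n_patches, patch_len)

rng = np.random.default_rng(0)
series = rng.standard_normal(512)                 # 512 time steps

patches = patchify(series, patch_len=32)          # shape: (16, 32)

# Linear projection into the model's hidden dimension. The weights here are
# random placeholders; in TimesFM they are learned parameters.
hidden_dim = 1280
W = rng.standard_normal((32, hidden_dim)) * 0.02
embeddings = patches @ W                          # shape: (16, 1280)

print(patches.shape, embeddings.shape)            # (16, 32) (16, 1280)
```

The transformer then attends over 16 patch embeddings instead of 512 individual time steps, which is where the context-length savings come from.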
On the output side, TimesFM uses output patches as well. At each decoding step, the model produces a patch of multiple future values simultaneously, rather than a single next-step prediction. This has a practical consequence: the model can cover a long forecast horizon in relatively few autoregressive steps, reducing inference latency and error accumulation.
Critically, TimesFM supports variable input and output patch lengths at inference time. The model was trained with multiple patch sizes, so it can adapt to different forecasting granularities without retraining. This makes it straightforward to handle different frequencies (hourly, daily, weekly) and different horizon lengths from a single model checkpoint.
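The effect of output patching on decoding cost is easy to quantify. A minimal sketch, assuming the model fills the horizon patch-by-patch (the output patch length of 128 matches the released checkpoint's configuration, but is shown here purely for illustration):

```python
import math

def decoding_steps(horizon: int, output_patch_len: int) -> int:
    """Number of autoregressive steps needed to cover a forecast horizon
    when each step emits output_patch_len future values."""
    return math.ceil(horizon / output_patch_len)

# A 512-step horizon: one value per step vs. a 128-value output patch.
print(decoding_steps(512, 1))    # 512 steps for a step-by-step model
print(decoding_steps(512, 128))  # 4 steps with output patching
```

Fewer autoregressive steps means both lower latency and fewer opportunities for forecast errors to compound, which is the mechanism behind the long-horizon results discussed later.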
Pretraining Data: Scale Through Google's Data Assets
The pretraining corpus is where TimesFM distinguishes itself most clearly. The model was trained on approximately 100 billion real-world time points, sourced from:
- Google Trends: Search interest time series across millions of queries and geographies. This provides dense coverage of varied seasonal patterns, trend behaviors, and event-driven spikes.
- Wikipedia pageviews: Daily and hourly page view counts for millions of articles. This corpus contributes a rich set of bursty, event-driven, and seasonally varying series.
- Synthetic data: Generated time series augmenting the real-world data with controlled properties — specific trend/seasonality combinations, noise levels, and structural breaks.
This training corpus is an order of magnitude larger than what most competing TSFMs use. Chronos, by comparison, was trained on approximately 30 public datasets plus synthetic GP data. Google also published a detailed overview on the Google Research blog. The scale of TimesFM's pretraining data is a direct consequence of Google's access to proprietary internal data assets — an advantage that is difficult for academic or smaller-scale efforts to replicate.
Model Configuration
The released version of TimesFM uses approximately 200M parameters. While this is comparable to Chronos-T5-Base in size, the architectural differences (decoder-only vs. encoder-decoder, patching vs. tokenization) mean direct parameter-count comparisons are not especially meaningful.
The model uses a context length of up to 512 patches. With a patch size of 32, this corresponds to a maximum lookback of 16,384 time steps — substantially longer effective context than most competing models.
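The effective lookback follows directly from the patch arithmetic. The patch sizes other than 32 below are hypothetical, included only to show how the same 512-patch budget stretches or shrinks:

```python
# Effective lookback = context length in patches x input patch length.
max_patches = 512

for patch_len in (8, 16, 32):
    print(f"patch_len={patch_len}: lookback={max_patches * patch_len} steps")
# With patch_len=32, the lookback is 512 * 32 = 16,384 time steps.
```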
Point Forecasts, Not Probabilistic
One notable design choice: TimesFM produces point forecasts by default. The model outputs a single predicted value (or patch of values) per decoding step, optimized with mean squared error during training.
This contrasts with Chronos, which generates full probabilistic forecasts through categorical token sampling. For applications that need prediction intervals or quantile estimates, TimesFM's point forecast output requires additional machinery — either conformal prediction wrappers, quantile regression heads added during fine-tuning, or ensemble-based uncertainty estimation.
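Of those options, a split-conformal wrapper is the lightest to add. Here is a minimal sketch (the function name and the calibration numbers are illustrative, not part of any TimesFM release): hold out a calibration set, collect residuals between actuals and the model's point forecasts, and use a quantile of their absolute values as the interval half-width:

```python
import numpy as np

def split_conformal_interval(residuals: np.ndarray, point_forecast: np.ndarray,
                             alpha: float = 0.1):
    """Wrap point forecasts with a (1 - alpha) prediction interval via split
    conformal prediction: the half-width is the (1 - alpha) quantile of
    absolute residuals observed on a held-out calibration set."""
    q = np.quantile(np.abs(residuals), 1 - alpha, method="higher")
    return point_forecast - q, point_forecast + q

# Illustrative numbers: residuals (actual - predicted) from past forecasts
# on a calibration split, and a new point forecast to wrap.
rng = np.random.default_rng(1)
calib_residuals = rng.normal(0.0, 2.0, size=500)
forecast = np.array([100.0, 102.5, 105.0])

lo, hi = split_conformal_interval(calib_residuals, forecast, alpha=0.1)
print(np.round(lo, 1), np.round(hi, 1))
```

Note that this simple version produces intervals of constant width across the horizon; for horizons where error grows with lead time, per-step calibration sets give better-calibrated bands.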
The Google team's rationale is pragmatic: point forecasts are sufficient for many production use cases, and the simpler output head contributes to faster inference. But for risk-sensitive applications (supply chain planning, financial risk management) where calibrated uncertainty matters, this is a meaningful limitation.
Benchmark Results
TimesFM was evaluated across multiple benchmark suites:
- Monash: On the Monash Time Series Forecasting Repository, TimesFM achieves strong zero-shot performance, matching or exceeding supervised baselines on the majority of datasets. Its aggregate weighted quantile loss is competitive with Chronos-Large despite using fewer parameters.
- Darts: On the Darts benchmark suite, TimesFM performs particularly well on datasets with clear seasonal patterns — consistent with the seasonal richness of its Google Trends pretraining data.
- Long-horizon benchmarks: On ETT, Weather, and Electricity datasets commonly used in long-range forecasting research, TimesFM's patching architecture helps it maintain accuracy over extended horizons where step-by-step autoregressive models accumulate error.
The model's strongest results tend to appear on datasets with characteristics well represented in its pretraining corpus: clear seasonality, moderate trend, and event-driven variation. On datasets with more exotic dynamics (e.g., high-frequency financial data), performance is less differentiated. For more on evaluation methodology, see TSFM Benchmarking Challenges.
Comparison with Chronos
The TimesFM-Chronos comparison illustrates a genuine architectural divide in the TSFM space:
| Dimension | TimesFM | Chronos |
|---|---|---|
| Architecture | Decoder-only | Encoder-decoder (T5) |
| Input handling | Continuous patching | Discrete tokenization (binning) |
| Output type | Point forecast | Probabilistic (sampled trajectories) |
| Training data | ~100B time points (Google-internal + synthetic) | ~30 public datasets + GP synthetic |
| Multivariate | No (univariate) | No (univariate) |
Neither approach dominates the other across all benchmarks. Chronos tends to excel when calibrated uncertainty is important. TimesFM tends to win on raw point accuracy and inference speed, especially for longer horizons.
Using TimesFM
Google released TimesFM as open weights, available through Hugging Face and GitHub. The model can be loaded and run with standard PyTorch tooling, and the 200M-parameter checkpoint fits comfortably in the memory of a single consumer GPU.
On TSFM.ai, TimesFM is available through our unified API alongside Chronos, Moirai, and other foundation models — letting you compare zero-shot results across architectures on your own data without managing separate model deployments. Browse all available models in the model catalog.