TimesFM: Google's Approach to Time Series Foundation Models
Google's TimesFM is a decoder-only foundation model for time series forecasting, trained on roughly 100B real-world time points drawn largely from Google Trends and Wikipedia pageviews.
When Google Research published the TimesFM paper (Das et al., 2024), the approach stood out for two reasons: the sheer scale of the pretraining corpus and an architectural design that diverges meaningfully from other TSFMs. Where Amazon's Chronos adapts an encoder-decoder language model, TimesFM follows the decoder-only lineage — closer to GPT than to T5 — and introduces a patching mechanism that gives it unusual flexibility at inference time.
Architecture: Decoder-Only with Patching
TimesFM uses a decoder-only transformer, meaning it processes the input sequence causally (left-to-right) and generates outputs autoregressively. This is the same high-level architecture as GPT-2, GPT-3, and LLaMA, adapted for continuous-valued temporal data rather than discrete text tokens.
The key architectural innovation is input and output patching. Rather than consuming one time step per transformer position, TimesFM groups consecutive time steps into patches. Each input patch is a contiguous subsequence of the time series (e.g., 32 time steps), which is projected into the model's hidden dimension through a linear layer. The transformer then operates over a sequence of these patch embeddings.
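The patching step itself is straightforward to sketch. Below is an illustrative NumPy version; the patch size of 32 matches the description above, but the hidden dimension and projection weights here are placeholders (in the real model the projection is a learned layer):

```python
import numpy as np

def patchify(series: np.ndarray, patch_len: int) -> np.ndarray:
    """Split a 1-D series into contiguous, non-overlapping patches."""
    n_patches = len(series) // patch_len          # drop any ragged tail
    return series[: n_patches * patch_len].reshape(n_patches, patch_len)

rng = np.random.default_rng(0)
series = rng.standard_normal(512)                 # 512 time steps

patches = patchify(series, patch_len=32)          # shape: (16, 32)

# Linear projection into the model's hidden dimension. The weights here are
# random placeholders; in TimesFM they are learned parameters.
hidden_dim = 1280
W = rng.standard_normal((32, hidden_dim)) * 0.02
embeddings = patches @ W                          # shape: (16, 1280)

print(patches.shape, embeddings.shape)            # (16, 32) (16, 1280)
```

The transformer then attends over 16 patch embeddings instead of 512 individual time steps, which is where the context-length savings come from.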
On the output side, TimesFM uses output patches as well. At each decoding step, the model produces a patch of multiple future values simultaneously, rather than a single next-step prediction. This has a practical consequence: the model can cover a long forecast horizon in relatively few autoregressive steps, reducing inference latency and error accumulation.
Critically, TimesFM supports variable input and output patch lengths at inference time. The model was trained with multiple patch sizes, so it can adapt to different forecasting granularities without retraining. This makes it straightforward to handle different frequencies (hourly, daily, weekly) and different horizon lengths from a single model checkpoint.
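The effect of output patching on decoding cost is easy to quantify. A minimal sketch, assuming the model fills the horizon patch-by-patch (the output patch length of 128 matches the released checkpoint's configuration, but is shown here purely for illustration):

```python
import math

def decoding_steps(horizon: int, output_patch_len: int) -> int:
    """Number of autoregressive steps needed to cover a forecast horizon
    when each step emits output_patch_len future values."""
    return math.ceil(horizon / output_patch_len)

# A 512-step horizon: one value per step vs. a 128-value output patch.
print(decoding_steps(512, 1))    # 512 steps for a step-by-step model
print(decoding_steps(512, 128))  # 4 steps with output patching
```

Fewer autoregressive steps means both lower latency and fewer opportunities for forecast errors to compound, which is the mechanism behind the long-horizon results discussed later.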
Pretraining Data: Scale Through Google's Data Assets
The pretraining corpus is where TimesFM distinguishes itself most clearly. The model was trained on approximately 100 billion real-world time points, sourced from:
- Google Trends: Search interest time series across millions of queries and geographies. This provides dense coverage of varied seasonal patterns, trend behaviors, and event-driven spikes.
- Wikipedia pageviews: Daily and hourly page view counts for millions of articles. This corpus contributes a rich set of bursty, event-driven, and seasonally varying series.
- Synthetic data: Generated time series augmenting the real-world data with controlled properties — specific trend/seasonality combinations, noise levels, and structural breaks.
This training corpus is an order of magnitude larger than what most competing TSFMs use. Chronos, by comparison, was trained on approximately 30 public datasets plus synthetic GP data. Google also published a detailed overview on the Google Research blog. The scale of TimesFM's pretraining data is a direct consequence of Google's access to proprietary internal data assets — an advantage that is difficult for academic or smaller-scale efforts to replicate.
Model Configuration
The released version of TimesFM uses approximately 200M parameters. While this is comparable to Chronos-T5-Base in size, the architectural differences (decoder-only vs. encoder-decoder, patching vs. tokenization) mean direct parameter-count comparisons are not especially meaningful.
The model uses a context length of up to 512 patches. With a patch size of 32, this corresponds to a maximum lookback of 16,384 time steps — substantially longer effective context than most competing models.
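The effective lookback follows directly from the patch arithmetic. The patch sizes other than 32 below are hypothetical, included only to show how the same 512-patch budget stretches or shrinks:

```python
# Effective lookback = context length in patches x input patch length.
max_patches = 512

for patch_len in (8, 16, 32):
    print(f"patch_len={patch_len}: lookback={max_patches * patch_len} steps")
# With patch_len=32, the lookback is 512 * 32 = 16,384 time steps.
```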
Point Forecasts, Not Probabilistic
One notable design choice: TimesFM produces point forecasts by default. The model outputs a single predicted value (or patch of values) per decoding step, optimized with mean squared error during training.
This contrasts with Chronos, which generates full probabilistic forecasts through categorical token sampling. For applications that need prediction intervals or quantile estimates, TimesFM's point forecast output requires additional machinery — either conformal prediction wrappers, quantile regression heads added during fine-tuning, or ensemble-based uncertainty estimation.
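Of those options, a split-conformal wrapper is the lightest to add. Here is a minimal sketch (the function name and the calibration numbers are illustrative, not part of any TimesFM release): hold out a calibration set, collect residuals between actuals and the model's point forecasts, and use a quantile of their absolute values as the interval half-width:

```python
import numpy as np

def split_conformal_interval(residuals: np.ndarray, point_forecast: np.ndarray,
                             alpha: float = 0.1):
    """Wrap point forecasts with a (1 - alpha) prediction interval via split
    conformal prediction: the half-width is the (1 - alpha) quantile of
    absolute residuals observed on a held-out calibration set."""
    q = np.quantile(np.abs(residuals), 1 - alpha, method="higher")
    return point_forecast - q, point_forecast + q

# Illustrative numbers: residuals (actual - predicted) from past forecasts
# on a calibration split, and a new point forecast to wrap.
rng = np.random.default_rng(1)
calib_residuals = rng.normal(0.0, 2.0, size=500)
forecast = np.array([100.0, 102.5, 105.0])

lo, hi = split_conformal_interval(calib_residuals, forecast, alpha=0.1)
print(np.round(lo, 1), np.round(hi, 1))
```

Note that this simple version produces intervals of constant width across the horizon; for horizons where error grows with lead time, per-step calibration sets give better-calibrated bands.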
The Google team's rationale is pragmatic: point forecasts are sufficient for many production use cases, and the simpler output head contributes to faster inference. But for risk-sensitive applications (supply chain planning, financial risk management) where calibrated uncertainty matters, this is a meaningful limitation.
Benchmark Results
TimesFM was evaluated across multiple benchmark suites:
- Monash: On the Monash Time Series Forecasting Repository, TimesFM achieves strong zero-shot performance, matching or exceeding supervised baselines on the majority of datasets. Its aggregate weighted quantile loss is competitive with Chronos-Large despite using fewer parameters.
- Darts: On the Darts benchmark suite, TimesFM performs particularly well on datasets with clear seasonal patterns — consistent with the seasonal richness of its Google Trends pretraining data.
- Long-horizon benchmarks: On ETT, Weather, and Electricity datasets commonly used in long-range forecasting research, TimesFM's patching architecture helps it maintain accuracy over extended horizons where step-by-step autoregressive models accumulate error.
The model's strongest results tend to appear on datasets with characteristics well represented in its pretraining corpus: clear seasonality, moderate trend, and event-driven variation. On datasets with more exotic dynamics (e.g., high-frequency financial data), performance is less differentiated. For more on evaluation methodology, see TSFM Benchmarking Challenges.
Comparison with Chronos
The TimesFM-Chronos comparison illustrates a genuine architectural divide in the TSFM space:
| Dimension | TimesFM | Chronos |
|---|---|---|
| Architecture | Decoder-only | Encoder-decoder (T5) |
| Input handling | Continuous patching | Discrete tokenization (binning) |
| Output type | Point forecast | Probabilistic (sampled trajectories) |
| Training data | ~100B time points (Google-internal + synthetic) | ~30 public datasets + GP synthetic |
| Multivariate | No (univariate) | No (univariate) |
Neither approach dominates the other across all benchmarks. Chronos tends to excel when calibrated uncertainty is important. TimesFM tends to win on raw point accuracy and inference speed, especially for longer horizons.
Using TimesFM
Google released TimesFM as open weights, available through Hugging Face and GitHub. The model can be loaded and run with standard PyTorch tooling, and the 200M-parameter checkpoint fits comfortably in the memory of a single consumer GPU.
On TSFM.ai, TimesFM is available through our unified API alongside Chronos, Moirai, and other foundation models — letting you compare zero-shot results across architectures on your own data without managing separate model deployments. Browse all available models in the model catalog.