Diffusion Models for Time Series: Inside Tsinghua's Sundial
Sundial applies flow-matching diffusion to time series forecasting, producing full predictive distributions from noise in fewer steps than standard diffusion models.
Most time series foundation models generate forecasts by predicting the next value in a sequence. Chronos bins continuous values into tokens and decodes them autoregressively. TimesFM and Moirai use patching and attention to map context windows to future horizons. These approaches share a common thread: they construct the forecast left-to-right or through masked reconstruction. Sundial, introduced by researchers at Tsinghua University in February 2025, breaks from this paradigm entirely. Instead of predicting tokens or patches sequentially, Sundial generates entire forecast trajectories at once by reversing a noise process, applying diffusion-based generative modeling to the time series domain.
What Diffusion Means for Forecasting
Diffusion models gained prominence in image generation through denoising diffusion probabilistic models (DDPM), where an image is gradually corrupted with Gaussian noise and a neural network learns to reverse the corruption step by step. The same principle applies to time series. Given a historical context window, the model starts with pure noise in the shape of the forecast horizon and iteratively denoises it into a plausible future trajectory, conditioned on the observed history.
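The forward (noising) half of this process is simple to sketch. The snippet below is a toy illustration of a DDPM-style corruption schedule applied to a forecast-horizon window; the schedule values and function names are illustrative, not Sundial's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DDPM-style forward (noising) process on a series window.
# All names and schedule values here are illustrative.
T = 1000                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)       # cumulative signal retention

def noise_series(x0, t):
    """Corrupt a clean forecast-horizon window x0 to diffusion step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

horizon = np.sin(np.linspace(0, 4 * np.pi, 96))   # a clean future trajectory
x_early, _ = noise_series(horizon, 10)            # still mostly signal
x_late, _ = noise_series(horizon, 999)            # essentially pure noise
```

A denoising network is trained to invert this corruption; at inference time it starts from `x_late`-style pure noise and, conditioned on the observed history, walks back toward a clean trajectory.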
This is a fundamentally different generative process from autoregressive decoding. An autoregressive model like Chronos commits to each forecast step sequentially, with errors compounding as the horizon extends. A diffusion model refines the entire horizon simultaneously across multiple denoising steps, allowing later steps to correct earlier approximations. The forecast emerges holistically rather than incrementally.
The practical consequence is that diffusion models produce full joint distributions over the forecast horizon by construction. Each run of the denoising process yields one complete sample trajectory. Drawing multiple samples gives you an empirical predictive distribution that captures correlations across time steps, not just marginal uncertainty at each point. This is a meaningful advantage over models that output independent quantiles or parametric distributions at each step. For a broader discussion of why distributional forecasts matter, see our post on prediction intervals vs. point forecasts.
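Concretely, once you have a batch of sampled trajectories, both marginal quantile bands and joint-trajectory statistics fall out directly. This sketch uses synthetic samples in place of diffusion draws; the threshold and shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch: turning sample trajectories into an empirical predictive
# distribution. `samples` stands in for draws from a diffusion model;
# here they are synthetic (a trend plus cumulative correlated noise).
n_samples, horizon = 200, 24
trend = np.linspace(100.0, 110.0, horizon)
samples = trend + np.cumsum(rng.standard_normal((n_samples, horizon)), axis=1)

# Marginal uncertainty: per-step quantile bands.
lo, median, hi = np.quantile(samples, [0.1, 0.5, 0.9], axis=0)

# Joint structure the marginals miss: e.g. the probability that the
# whole trajectory stays below a threshold, estimated over full samples.
p_below = np.mean(np.all(samples < 130.0, axis=1))
```

The joint statistic `p_below` is the kind of quantity that per-step quantiles cannot give you, because it depends on correlations across the horizon.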
Sundial's Architecture: Flow Matching for Speed
Standard DDPM-style diffusion requires hundreds or thousands of denoising steps at inference time, which is prohibitively slow for time series applications that may need to generate millions of forecasts. Sundial addresses this by adopting a flow-matching framework instead of the traditional DDPM noise schedule.
Flow matching learns a continuous-time velocity field that transports samples from a simple noise distribution to the data distribution along straight-line paths. Compared to the curved trajectories of standard diffusion, these straighter paths can be traversed accurately with far fewer integration steps. In practice, Sundial produces high-quality forecasts with roughly 10-20 denoising steps rather than the hundreds typically required by DDPM, bringing inference latency into a practical range.
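The straight-path property is what makes few-step sampling work. In the sketch below, a toy oracle stands in for the trained velocity network: along the linear path x_t = (1 − t)·x_noise + t·x_data the velocity is constant, so a handful of Euler steps traverses it exactly. This is an illustration of the flow-matching sampling loop under that assumption, not Sundial's code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Flow-matching sketch: samples move from noise (t=0) to data (t=1)
# along straight paths x_t = (1 - t) * x_noise + t * x_data, whose
# velocity is constant: v = x_data - x_noise. A trained network would
# predict v from (x_t, t, context); here a toy oracle stands in.
target = np.sin(np.linspace(0, 2 * np.pi, 48))   # stand-in "data" trajectory

def velocity(x_t, t):
    # Oracle velocity toward the single target; a real model predicts
    # this from the noisy state, the time t, and the encoded history.
    return (target - x_t) / (1.0 - t)

def sample(n_steps):
    x = rng.standard_normal(48)            # start from pure noise
    for i in range(n_steps):
        t = i / n_steps
        x = x + velocity(x, t) / n_steps   # Euler step along the flow
    return x

forecast = sample(10)   # few steps suffice because the paths are straight
```

With a learned (imperfect) velocity field the paths are only approximately straight, which is why a small but nonzero number of steps, rather than one, is needed in practice.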
The model itself is a 128-million-parameter transformer that takes the context window as conditioning input and predicts the velocity field for the forecast horizon at each denoising step. The context is encoded through a standard transformer encoder, and cross-attention injects the historical information into the denoising process. This architecture is conceptually clean: the encoder understands the past, and the diffusion process generates the future.
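The conditioning path can be sketched at the level of shapes: the denoiser's queries come from the noisy forecast horizon, while keys and values come from the encoded context. The dimensions below are illustrative, not Sundial's actual configuration, and the projections a real transformer layer applies are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)

# Shape-level sketch of cross-attention conditioning. Dimensions are
# illustrative; a real layer would also apply learned Q/K/V projections.
d_model, ctx_len, horizon = 64, 512, 96
context_enc = rng.standard_normal((ctx_len, d_model))    # encoder output
noisy_horizon = rng.standard_normal((horizon, d_model))  # denoiser state

def cross_attention(q_in, kv_in):
    # Single-head scaled dot-product attention.
    scores = q_in @ kv_in.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv_in

conditioned = cross_attention(noisy_horizon, context_enc)  # (horizon, d_model)
```

Each horizon position attends over the full encoded history, which is how the denoising step "sees" the past while refining the future.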
Calibrated Uncertainty in Fewer Steps
One of Sundial's most notable properties is that its probabilistic forecasts remain well-calibrated even with an aggressively reduced number of denoising steps. The flow-matching formulation produces distributions whose coverage (the fraction of true values falling within predicted intervals) closely matches nominal levels across a range of step counts.
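Calibration in this sense is directly measurable. The sketch below checks empirical coverage of a central 80% interval against its nominal level; the samples are synthetic stand-ins drawn from the same distribution as the truth, so coverage lands near 0.80 by construction.

```python
import numpy as np

rng = np.random.default_rng(4)

# Coverage check sketch: the fraction of held-out true values falling
# inside the central 80% interval should be close to 0.80. `samples`
# stands in for diffusion draws; here samples and truth come from the
# same synthetic distribution, so the forecasts are calibrated.
n_series, n_samples, horizon = 500, 100, 24
samples = rng.standard_normal((n_series, n_samples, horizon))
truth = rng.standard_normal((n_series, horizon))

lo = np.quantile(samples, 0.1, axis=1)
hi = np.quantile(samples, 0.9, axis=1)
coverage = np.mean((truth >= lo) & (truth <= hi))   # nominal level: 0.80
```

A miscalibrated model shows up as coverage drifting away from the nominal level, e.g. dropping well below 0.80 when the predicted intervals are too narrow.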
This is a practical advantage over both autoregressive sampling and parametric distribution heads. Autoregressive models like Lag-Llama produce probabilistic forecasts through repeated sampling, but each sample requires a full sequential pass through the forecast horizon. Parametric heads, as used in Moirai's mixture distributions, are limited to the distributional families they are designed to represent. Sundial's diffusion process is nonparametric: it can represent arbitrary distribution shapes, including fat tails, skewness, and multimodality, without committing to a fixed functional form.
Benchmark Results: Probabilistic Metrics
The Sundial authors evaluate on standard forecasting benchmarks using CRPS (Continuous Ranked Probability Score), the primary metric for assessing probabilistic forecast quality. CRPS measures how well the predicted distribution matches the observed outcome, rewarding both accuracy and calibration simultaneously.
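For sample-based forecasts, CRPS can be estimated directly from an ensemble via the energy form CRPS = E|X − y| − ½·E|X − X′|, where X and X′ are independent draws from the predicted distribution and y is the observation. This is a generic estimator, not the paper's evaluation code.

```python
import numpy as np

# Empirical CRPS for an ensemble forecast at a single time step,
# via the energy form CRPS = E|X - y| - 0.5 * E|X - X'|.
def crps_ensemble(samples, y):
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

rng = np.random.default_rng(5)
obs = 0.0
sharp = crps_ensemble(rng.normal(0.0, 0.5, 1000), obs)   # tight, centered
biased = crps_ensemble(rng.normal(2.0, 0.5, 1000), obs)  # shifted ensemble
```

Lower is better: the sharp, centered ensemble scores well, while the shifted one is penalized, which is how CRPS rewards accuracy and calibration at once.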
On the benchmark suite reported in the paper, Sundial achieves competitive or superior CRPS compared to Chronos, TimesFM, and Moirai across multiple domains and horizons. The gains are most pronounced on datasets where the underlying distribution is non-Gaussian: series with regime changes, heavy tails, or multimodal behavior. On well-behaved, approximately Gaussian series, the advantage over simpler probabilistic approaches is smaller, as expected.
The 128M parameter count places Sundial in the mid-range of TSFM model sizes, smaller than Chronos-Large (710M) or Moirai-Large (311M) but larger than Lag-Llama (10M). The flow-matching inference procedure adds per-sample overhead compared to single-pass models, but the reduced step count keeps total inference time competitive.
When to Choose a Diffusion-Based Model
Diffusion-based forecasting is not universally superior. If you need fast point forecasts and do not require distributional output, a single-pass model like TimesFM will be more efficient. If your uncertainty needs are served by standard prediction intervals on roughly Gaussian data, parametric approaches work well and are cheaper to run.
Sundial and diffusion-based models earn their place when the shape of the predictive distribution matters. Applications in energy markets, financial risk, and supply chain planning often exhibit fat-tailed or multimodal demand patterns where the tails of the distribution drive decisions. In these settings, the ability to generate nonparametric, well-calibrated distributional forecasts is worth the additional inference cost.
Getting Started
Sundial's pretrained weights are available on Hugging Face, and the model is listed in our model catalog. You can experiment with diffusion-based forecasting directly in the playground to compare Sundial's distributional output against autoregressive and patch-based alternatives on your own data.