The State of Multivariate Forecasting in 2025
Multivariate time series forecasting remains one of the hardest problems in ML. Here's where foundation models stand in 2025.
Univariate time series forecasting has made remarkable progress with foundation models. Chronos, TimesFM, and their peers forecast individual series with zero-shot accuracy that matches or exceeds task-specific models trained on each series individually. But most real-world forecasting problems are not univariate. You are not forecasting a single metric in isolation; you are forecasting dozens or hundreds of interrelated variables where the relationships between series carry information.
Multivariate time series forecasting, predicting multiple correlated variables jointly, remains one of the hardest open problems in the field. Here is where foundation models stand in mid-2025 and what practitioners should know.
Why Multivariate Is Harder
The difficulty of multivariate forecasting stems from several compounding factors.
Cross-series dependencies. In a supply chain, demand for complementary products is correlated: when laptop sales spike, laptop bag sales follow. In energy grids, consumption in neighboring zones is spatially correlated. These dependencies contain predictive information that a univariate model cannot exploit. But modeling them requires the model to learn which dependencies are informative and which are spurious, a problem that scales quadratically with the number of variables.
Curse of dimensionality. A univariate model learns a mapping from past values of one series to its future. A multivariate model with N variables must learn mappings across an N-dimensional input space. With limited data, the additional parameters needed to model cross-variable interactions often hurt more than they help: the extra capacity tends to overfit to spurious patterns rather than capture real structure.
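To make the scaling concrete, compare a classical vector autoregression (VAR), the textbook channel-mixing model, against N independent AR models. This is a back-of-the-envelope sketch, not a description of any specific TSFM architecture:

```python
def var_params(n_vars: int, n_lags: int) -> int:
    """VAR(p): each of N equations regresses on p lags of all N variables,
    so the coefficient count grows quadratically in N."""
    return n_lags * n_vars ** 2

def independent_ar_params(n_vars: int, n_lags: int) -> int:
    """N separate AR(p) models: each uses only its own p lags."""
    return n_lags * n_vars

# With 100 variables and 24 lags, the joint model needs 240,000
# coefficients versus 2,400 for the channel-independent alternative.
print(var_params(100, 24))             # 240000
print(independent_ar_params(100, 24))  # 2400
```

The same quadratic-versus-linear gap shows up, in softer form, in attention-based architectures that mix channels.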
Non-stationary correlations. The relationships between variables are not fixed. Product correlations shift with seasons and trends. Financial asset correlations spike during market stress (the well-documented "correlation breakdown" problem). A model trained on historical cross-variable patterns may learn relationships that no longer hold at inference time.
The TSFM Landscape: Multivariate Support
Current foundation models fall along a spectrum of multivariate support.
Chronos and Chronos-Bolt are strictly univariate. Each series is tokenized and processed independently. There is no mechanism for cross-series information flow. To forecast multiple related variables, you run separate inference calls for each one.
TimesFM is architecturally univariate but uses a channel-independent strategy for multivariate data. You can pass multiple series in a single request, and the model processes them independently through shared parameters. The shared weights provide some implicit transfer (patterns learned from one series inform predictions on others), but there is no explicit attention or interaction between variables within a single forward pass.
Moirai was designed with multivariate data in mind. Its Any-Variate Attention mechanism allows the model to attend across variables within the same time window. This means Moirai can, in principle, learn that a spike in variable A at time t predicts a spike in variable B at time t+1. The attention is learned during pretraining on multi-series datasets, and the architecture supports variable numbers of input series at inference time without retraining.
MOMENT supports multivariate input through its patched embedding scheme. Multiple variables are embedded into the same patch representation, allowing the transformer layers to model cross-variable interactions. However, MOMENT's pretraining was primarily oriented toward representation learning and classification rather than forecasting, so its multivariate forecasting performance is less validated than Moirai's. For more on MOMENT, see our dedicated overview.
The Channel-Independent Paradox
One of the most provocative results in recent time series research is that channel-independent approaches, which ignore cross-variable correlations entirely, often outperform channel-mixing approaches on standard benchmarks.
This finding, prominently demonstrated in the DLinear paper by Zeng et al. (2023) and corroborated in subsequent work, challenged the intuition that modeling cross-variable relationships should always help. On benchmark datasets like ETTh, ETTm, Weather, and Electricity, simple models that treat each variable independently matched or beat sophisticated architectures with cross-variate attention.
Why does this happen? Several factors contribute. First, many benchmark datasets have weak cross-variable correlations that do not provide useful predictive signal. The Weather dataset, for example, contains meteorological variables measured at a single station; the cross-variable patterns (temperature correlates with humidity) are largely redundant with the temporal patterns in each individual series. Second, channel-mixing models have more parameters and are more prone to overfitting on the limited training data in standard benchmarks. Third, many cross-variable correlations in benchmark data are contemporaneous (happening at the same time) rather than lagged (one variable predicting another's future), and contemporaneous correlations do not help with forecasting.
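The contemporaneous-versus-lagged distinction is easy to see in a small synthetic experiment (the data-generating process below is made up purely for illustration): a partner series that merely co-moves with the target is strongly correlated at lag 0 but carries no lead-lag signal, while a partner that leads the target by one step does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
driver = rng.standard_normal(n)

# Contemporaneous partner: co-moves with the driver at the same step.
target_a = driver + 0.3 * rng.standard_normal(n)

# Led partner: today's driver shows up in tomorrow's value.
target_b = np.empty(n)
target_b[1:] = driver[:-1] + 0.3 * rng.standard_normal(n - 1)
target_b[0] = 0.0

def corr_at_lag(x, y, lag):
    """Correlation between x[t] and y[t + lag]."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

print(corr_at_lag(driver, target_a, 0))  # strong (~0.96), but useless at forecast time
print(corr_at_lag(driver, target_a, 1))  # near zero: no lead-lag structure
print(corr_at_lag(driver, target_b, 1))  # strong: driver[t] actually predicts target_b[t+1]
```

Only the third correlation represents information a multivariate forecaster can exploit.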
This does not mean cross-variable modeling is useless. It means the benefit is data-dependent, and the benchmarks that dominate academic evaluation may not reflect the use cases where multivariate modeling matters most.
When Cross-Variable Correlations Actually Help
Multivariate modeling provides genuine value when three conditions are met simultaneously: the variables have strong lagged correlations (not just contemporaneous), the correlations are reasonably stable over the forecast horizon, and you have enough data to learn the interaction patterns without overfitting.
Supply chain and retail. Correlated product demand is a textbook case. Promotional events on one product affect substitutes and complements with predictable lags. New product launches cannibalize existing products with patterns that transfer across categories. Moirai's multivariate capability has shown measurable improvements over channel-independent approaches on retail datasets with these characteristics.
IoT sensor arrays. Industrial sensor networks often have strong physical relationships: upstream temperature affects downstream pressure with a transport delay. These lagged correlations are stable (governed by physics) and provide genuine predictive signal. Multivariate models can exploit the spatial structure that a univariate model must ignore.
Financial markets. Cross-asset correlations carry information, particularly in factor models where common factors drive multiple asset returns. However, the non-stationarity of financial correlations means this is also the domain where multivariate models are most prone to learning patterns that break down out of sample.
Macroeconomic indicators. Leading indicators (building permits predicting construction employment, PMI predicting GDP) represent lagged cross-variable relationships that have been exploited by economists for decades. TSFMs that can model these relationships have an advantage over univariate approaches for macro forecasting.
The Practical Recommendation
For practitioners in mid-2025, our recommendation is a staged approach.
Start channel-independent. Use any strong univariate TSFM (e.g., Chronos-Bolt or TimesFM) and forecast each variable separately. See the model catalog for available options. This gives you a strong baseline with minimal complexity. In many cases, this baseline will be hard to beat.
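The channel-independent loop is simple enough to sketch. The `forecast_one` stand-in below is hypothetical (here a seasonal-naive rule so the sketch runs); in practice you would swap in your chosen model's actual inference call:

```python
import numpy as np

def forecast_one(history: np.ndarray, horizon: int) -> np.ndarray:
    """Stand-in for a univariate TSFM inference call (e.g. Chronos-Bolt
    or TimesFM). Here: seasonal-naive with a weekly period, so the
    sketch is runnable without any model weights."""
    period = 7
    tiled = np.tile(history[-period:], horizon // period + 1)
    return tiled[:horizon]

def forecast_channel_independent(panel: np.ndarray, horizon: int) -> np.ndarray:
    """panel: shape (n_vars, n_timesteps). Each variable is forecast
    separately; no cross-series information flows."""
    return np.stack([forecast_one(series, horizon) for series in panel])

panel = np.arange(5 * 28, dtype=float).reshape(5, 28)  # 5 toy series
preds = forecast_channel_independent(panel, horizon=14)
print(preds.shape)  # (5, 14)
```

Because each series is handled independently, the loop parallelizes trivially across variables.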
Measure cross-variable signal. Before investing in multivariate modeling, test whether your data actually contains exploitable cross-variable information. Compute lagged cross-correlations between your variables. If the strongest lagged correlations are below 0.3, multivariate modeling is unlikely to help.
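A minimal version of this diagnostic can be written in a few lines of numpy. The helper below is a sketch, not a library function; it deliberately skips lag 0, since contemporaneous correlation does not help forecasting:

```python
import numpy as np

def max_lagged_corr(x: np.ndarray, y: np.ndarray, max_lag: int = 14):
    """Strongest correlation (by absolute value) between x[t] and
    y[t + lag] over lag = 1..max_lag. Lag 0 is excluded on purpose."""
    best_lag, best_corr = 1, 0.0
    for lag in range(1, max_lag + 1):
        c = np.corrcoef(x[:-lag], y[lag:])[0, 1]
        if abs(c) > abs(best_corr):
            best_lag, best_corr = lag, c
    return best_lag, best_corr

# Toy panel: variable 1 is a 3-step-delayed, noisy copy of variable 0.
rng = np.random.default_rng(1)
v0 = rng.standard_normal(1000)
v1 = np.roll(v0, 3) + 0.5 * rng.standard_normal(1000)

lag, corr = max_lagged_corr(v0, v1)
print(lag, round(corr, 2))  # expect lag 3, well above the 0.3 rule of thumb
```

Running this over all variable pairs gives a quick map of where (if anywhere) exploitable lead-lag structure lives in your data.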
Add multivariate selectively. If you identify variable pairs or groups with strong lagged correlations, try Moirai with those variables as joint inputs. Compare against the channel-independent baseline on a held-out test set. Only adopt the multivariate model if it shows a statistically significant improvement.
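One simple way to formalize "statistically significant improvement" is a paired test on per-window backtest errors. The sketch below uses a plain paired t-statistic as a crude stand-in for a proper Diebold-Mariano test (which additionally corrects for autocorrelated forecast errors); the error arrays are synthetic placeholders for your own backtest results:

```python
import numpy as np

def paired_t_stat(errors_a: np.ndarray, errors_b: np.ndarray) -> float:
    """t-statistic on per-window error differences (A minus B).
    Negative values favor model A. Crude stand-in for a
    Diebold-Mariano test."""
    d = errors_a - errors_b
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

rng = np.random.default_rng(2)
# Hypothetical per-window absolute errors from backtesting both models.
err_multivariate = np.abs(rng.normal(1.0, 0.3, 200))
err_univariate = np.abs(rng.normal(1.2, 0.3, 200))

t = paired_t_stat(err_multivariate, err_univariate)
adopt = t < -2.0  # conventional ~5% cutoff for large samples
print(round(t, 1), adopt)
```

The decision rule matters more than the exact test: adopt the heavier model only when the error reduction clears the noise floor of your backtest.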
Monitor correlation stability. If you deploy a multivariate model, track the cross-variable correlations over time. When the correlation structure shifts (which it will, eventually), the multivariate model may degrade faster than a channel-independent alternative. Build monitoring to detect this.
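A monitoring loop for this can be sketched as a rolling lagged correlation compared against the value observed at training time. Everything below (window sizes, tolerance, the synthetic break at t = 400) is an illustrative assumption, not a recommended default:

```python
import numpy as np

def rolling_lagged_corr(x, y, lag, window):
    """Correlation between x[t] and y[t + lag] over a trailing window,
    evaluated at every step where a full window is available."""
    out = []
    for end in range(window + lag, len(x) + 1):
        xs = x[end - window - lag : end - lag]
        ys = y[end - window : end]
        out.append(np.corrcoef(xs, ys)[0, 1])
    return np.array(out)

def correlation_drift_alert(live_corr, train_corr, tolerance=0.2):
    """Flag windows where the live lagged correlation has drifted more
    than `tolerance` from the value the model was trained on."""
    return np.abs(live_corr - train_corr) > tolerance

rng = np.random.default_rng(3)
x = rng.standard_normal(600)
y = np.empty(600)
y[1:] = x[:-1] + 0.3 * rng.standard_normal(599)  # stable lag-1 link...
y[0] = 0.0
y[400:] = rng.standard_normal(200)               # ...that breaks at t = 400

corr = rolling_lagged_corr(x, y, lag=1, window=100)
alerts = correlation_drift_alert(corr, train_corr=0.95)
print(alerts[:50].any(), alerts[-50:].any())  # quiet early, alerting after the break
```

When the alert fires, falling back to the channel-independent baseline until the model is revalidated is a reasonable default.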
Looking Ahead
The next generation of TSFMs will likely improve multivariate handling through several avenues. Sparse attention mechanisms that learn which cross-variable connections matter, rather than attending over all pairs, will reduce overfitting on irrelevant correlations. Recent work like iTransformer already demonstrates that inverting the attention dimension to treat each variable as a token can capture cross-variate dependencies more efficiently. Pretraining on larger and more diverse multivariate datasets will give models stronger priors about cross-variable dynamics. And hybrid approaches that combine channel-independent temporal modeling with lightweight cross-variable modules may offer the best of both worlds.
At TSFM.ai, we are tracking these developments closely and will make new multivariate-capable models available in our catalog as they mature. For now, the pragmatic path is to start simple, measure carefully, and add complexity only when the data justifies it.