
Moirai: Salesforce's Universal Forecasting Transformer

Moirai from Salesforce introduces a universal forecasting transformer that handles variable frequencies, prediction lengths, and multivariate inputs.

TSFM.ai Team
July 22, 2024 · 4 min read


When Salesforce AI Research released Moirai in early 2024, it marked a significant step toward truly universal time series forecasting. Unlike models that specialize in univariate point forecasting, Moirai was designed from the ground up to handle the full diversity of real-world forecasting problems: arbitrary numbers of variates, mixed frequencies, variable context and prediction lengths, and probabilistic outputs. The result is a model family that performs competitively across a remarkably wide range of tasks without any task-specific fine-tuning.

Architecture: Masked Encoder with Any-Variate Attention

Moirai is built on a masked encoder architecture, diverging from the decoder-only trend popularized by large language models. The encoder-based design allows the model to attend bidirectionally over the input context, which is particularly beneficial for forecasting where the entire historical window is available at inference time.

The core architectural innovation is the Any-Variate Attention mechanism. Traditional multivariate transformers either flatten all variates into a single sequence (which scales poorly) or treat each variate independently (which misses cross-variate correlations). Moirai's approach applies attention across both time and variate dimensions in a unified framework. Each variate is embedded independently with shared patch embeddings, and then attention is applied across all variate-time patch tokens simultaneously. This means the same model can ingest a univariate retail sales series and a 300-variate sensor array without architectural changes.
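To make the flattening concrete, here is a minimal numpy sketch of the idea: every variate is patched with the same shared embedding, each token is tagged with its variate identity, and a single attention pass mixes information across both time and variates. This is an illustration of the concept, not Moirai's actual implementation (Moirai uses a binary variate-id attention bias; the additive id embedding below is a simplification).

```python
import numpy as np

rng = np.random.default_rng(0)

def any_variate_tokens(series, patch_len, d_model, W_embed, id_embed):
    """Flatten a (num_variates, T) array into one token sequence.

    Each variate is patched independently with the SAME embedding
    weights (shared patch embedding), then all (variate, time-patch)
    tokens are concatenated so one attention pass can attend across
    both time and variates.
    """
    num_variates, T = series.shape
    num_patches = T // patch_len
    patches = series[:, : num_patches * patch_len].reshape(
        num_variates, num_patches, patch_len
    )
    tokens = patches @ W_embed                 # shared projection
    # Tag each token with its variate index so attention can tell
    # variates apart (simplified stand-in for Moirai's variate-id bias).
    tokens = tokens + id_embed[:num_variates, None, :]
    return tokens.reshape(num_variates * num_patches, d_model)

def self_attention(tokens):
    """Single-head scaled dot-product attention over ALL tokens."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

# A 3-variate series and a 300-variate series flow through unchanged code.
patch_len, d_model = 8, 16
W = rng.standard_normal((patch_len, d_model)) / np.sqrt(patch_len)
ids = rng.standard_normal((300, d_model)) * 0.01
for v in (3, 300):
    x = rng.standard_normal((v, 64))
    out = self_attention(any_variate_tokens(x, patch_len, d_model, W, ids))
    print(v, out.shape)   # v * 8 patches, each a d_model-dim token
```

Note that the token count is the product of variates and time patches, which is where the computational trade-off discussed later comes from.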

Input patches are created using a multi-scale patching strategy. Rather than fixing a single patch size, Moirai selects patch sizes based on the input frequency, which helps the model handle data ranging from minutely to yearly observations. Time index normalization maps different sampling rates into a common representational space, so the model does not need separate frequency-specific heads or embeddings.
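A hypothetical frequency-to-patch-size table illustrates the mechanism: high-frequency data gets larger patches so a fixed token budget still covers a long window, while low-frequency data keeps small patches to preserve resolution. The exact mapping Moirai uses differs; the values below are an illustrative stand-in.

```python
# Illustrative mapping from pandas-style frequency strings to patch
# sizes (NOT Moirai's actual table).
PATCH_SIZES = {
    "T": 64,   # minutely
    "H": 32,   # hourly
    "D": 16,   # daily
    "W": 16,   # weekly
    "M": 8,    # monthly
    "Q": 8,    # quarterly
    "Y": 8,    # yearly
}

def select_patch_size(freq: str) -> int:
    """Pick a patch size from a frequency string."""
    return PATCH_SIZES.get(freq.upper(), 16)   # default for unknown freqs

def num_tokens(context_length: int, freq: str) -> int:
    """Tokens produced for one variate's context window."""
    p = select_patch_size(freq)
    return -(-context_length // p)             # ceiling division

print(select_patch_size("H"), num_tokens(512, "H"))  # 32 16
```

The payoff is that a 512-step hourly window and a 64-step monthly window both land in a similar token budget, so one model serves both without frequency-specific heads.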

LOTSA: Pretraining at Scale

A foundation model is only as good as its pretraining data. For Moirai, Salesforce curated the Large-scale Open Time Series Archive (LOTSA), a dataset comprising approximately 27 billion observations drawn from 9 distinct domains: energy, transport, nature, economic indicators, healthcare, sales, weather, web traffic, and banking. LOTSA aggregates data from publicly available sources including the Monash Forecasting Archive, GluonTS datasets, and various domain-specific repositories.

The scale and diversity of LOTSA are intentional. Prior TSFMs often trained on narrower corpora, which limited their generalization. By spanning frequencies from minutely to yearly and covering both univariate and multivariate series, LOTSA ensures that Moirai encounters the statistical patterns it will face at inference time. The curation process also addressed quality: series with insufficient length, degenerate distributions, or known data issues were filtered.

Model Sizes and the Mixture Distribution Head

Moirai ships in three sizes: Small (14M parameters), Base (91M), and Large (311M). This range makes the model practical across deployment scenarios, from edge inference with the Small variant to maximum accuracy with Large.

The output layer uses a mixture distribution head that combines multiple parametric distributions to produce flexible probabilistic forecasts. Specifically, Moirai outputs a mixture of location-scale distributions, which can approximate a wide range of forecast densities including skewed and heavy-tailed shapes. This is more expressive than a single Gaussian assumption and avoids the discretization artifacts of bin-based approaches like those used in Chronos.
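A toy two-component mixture shows why this is more expressive than a single Gaussian. This sketch is not Moirai's actual head (which mixes several parametric families); it just demonstrates how skew and heavy tails emerge from weighted location-scale components.

```python
import numpy as np

def mixture_pdf(x, weights, locs, scales):
    """Density of a weighted mixture of location-scale normals."""
    x = np.asarray(x)[..., None]
    comp = np.exp(-0.5 * ((x - locs) / scales) ** 2) / (scales * np.sqrt(2 * np.pi))
    return comp @ weights

def mixture_sample(n, weights, locs, scales, rng):
    """Draw n samples: pick a component, then sample from it."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(locs[comp], scales[comp])

weights = np.array([0.7, 0.3])
locs = np.array([0.0, 3.0])     # offset component skews the density
scales = np.array([1.0, 2.5])   # wide component fattens the right tail

rng = np.random.default_rng(0)
samples = mixture_sample(100_000, weights, locs, scales, rng)
print(samples.mean())  # close to the analytic mean 0.7*0 + 0.3*3 = 0.9
```

Because the density is a closed-form sum of component densities, likelihood-based training stays tractable while the output remains continuous, with none of the bin edges a discretized head introduces.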

At inference time, you can extract point forecasts (mean or median of the mixture), quantiles at arbitrary levels, or full predictive samples. This flexibility is critical for downstream applications where decision-making depends on tail risk rather than central tendency.
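Given a `(num_samples, horizon)` array of predictive samples, as a probabilistic model of this kind can produce, all three views are simple reductions over the sample axis. The lognormal draws below are synthetic stand-ins for model output.

```python
import numpy as np

rng = np.random.default_rng(1)
horizon = 12
# Synthetic right-skewed predictive samples, shape (num_samples, horizon).
samples = rng.lognormal(mean=0.0, sigma=0.5, size=(1000, horizon))

point_mean   = samples.mean(axis=0)                        # mean forecast
point_median = np.median(samples, axis=0)                  # robust point forecast
q_lo, q_hi   = np.quantile(samples, [0.05, 0.95], axis=0)  # 90% interval

# Tail-risk decisions read the upper quantile, not the center:
worst_case = q_hi.max()
print(point_mean.shape, q_lo.shape)  # one value per horizon step
```

On skewed densities the mean and median diverge (here the mean sits above the median), which is exactly why a single point forecast can mislead when the loss of over- and under-prediction is asymmetric.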

Benchmark Performance

The Moirai paper evaluates on a broad suite of forecasting benchmarks spanning short-horizon and long-horizon tasks across multiple domains. On the Monash Forecasting Archive, Moirai-Large achieves competitive or superior performance compared to both classical statistical baselines (ETS, ARIMA) and recent deep learning models (PatchTST, iTransformer). Notably, it performs well on multivariate benchmarks like ETTh and Weather, where cross-variate modeling provides an advantage.

Compared to Chronos, which uses a token-classification approach over binned values, Moirai's continuous distributional output tends to produce better-calibrated prediction intervals. Against TimesFM, which focuses on point forecast accuracy, Moirai trades some point accuracy for richer probabilistic output. The choice between them depends on whether your application needs well-calibrated uncertainty or just the best possible median forecast.
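Calibration is easy to check empirically: a 90% prediction interval is well calibrated if roughly 90% of realized values fall inside it. The sketch below uses synthetic truths drawn from the same distribution the forecaster samples from, so coverage should land near the nominal level; with a miscalibrated model it would not.

```python
import numpy as np

rng = np.random.default_rng(2)
n_series = 2000
# Per-series predictive samples from a (here, perfectly specified) forecaster.
forecast_samples = rng.normal(0.0, 1.0, size=(n_series, 500))
truths = rng.normal(0.0, 1.0, size=n_series)

lo = np.quantile(forecast_samples, 0.05, axis=1)
hi = np.quantile(forecast_samples, 0.95, axis=1)
coverage = np.mean((truths >= lo) & (truths <= hi))
print(round(float(coverage), 2))  # close to the nominal 0.90
```

Running this kind of coverage check on a held-out window is a cheap way to decide, for your own data, whether a model's intervals are trustworthy enough to drive decisions.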

Open-Source Release and Practical Usage

Moirai is released under the Salesforce open-source ecosystem as part of the uni2ts library. The library provides pretrained checkpoints for all three model sizes, along with utilities for data loading, preprocessing, and inference. Integration follows a straightforward pattern: load a pretrained model, prepare your time series as a pandas DataFrame or GluonTS-compatible dataset, and call the prediction method with your desired forecast horizon and number of samples.

The uni2ts library also supports fine-tuning on custom datasets, which can meaningfully improve accuracy on domain-specific tasks. Salesforce's experiments show that even light fine-tuning on a few hundred series from the target domain can reduce forecast error by 10-20% compared to zero-shot inference.

Where Moirai Fits in the TSFM Landscape

Moirai's strength lies in its generality. If your forecasting problem involves multivariate inputs, if you need calibrated prediction intervals, or if your data spans unusual frequencies, Moirai is a strong default choice. Its main trade-off is computational cost: the any-variate attention mechanism scales with the product of variates and time patches, so very high-dimensional multivariate problems may require careful batching.
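The scaling argument is easy to quantify: attention operates over V × P tokens (V variates, P time patches), so the pairwise score matrix grows with (V × P)². A quick budget check like this, with an illustrative patch size of 32, shows why wide multivariate inputs may need batching:

```python
def attention_tokens(num_variates: int, context_length: int, patch_size: int) -> int:
    """Token count for any-variate attention: variates * time patches."""
    patches = -(-context_length // patch_size)  # ceiling division
    return num_variates * patches

def attention_pairs(num_variates: int, context_length: int, patch_size: int) -> int:
    """Entries in the pairwise attention score matrix."""
    t = attention_tokens(num_variates, context_length, patch_size)
    return t * t

# Same 512-step window, univariate vs. 300 variates:
print(attention_pairs(1, 512, 32))    # 16^2   = 256
print(attention_pairs(300, 512, 32))  # 4800^2 = 23,040,000
```

Going from 1 to 300 variates multiplies the attention cost by 300² here, which is why splitting very wide inputs into variate groups can be the practical move.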

On TSFM.ai, Moirai is available across all three model sizes through our unified forecasting API. You can specify quantile levels, number of samples, and prediction horizon, and the platform handles patching, normalization, and frequency alignment behind the scenes.
