TiRex: Nixtla's Covariate-Native Large-Scale Forecasting Transformer
Nixtla's TiRex brings native covariate support to large-scale time series forecasting with a dedicated encoder for exogenous regressors, a 16K context window, and strong zero-shot performance on covariate-rich datasets.
Most time series foundation models treat forecasting as a univariate problem: ingest a sequence of past values and predict the future. That works well when the target series contains enough information to explain its own dynamics. But in practice, many forecasting problems are driven by external forces — holidays, promotions, weather, pricing changes — that the target series alone cannot anticipate. Nixtla built TiRex to close that gap. Released as the successor to TimeGPT, TiRex is a transformer designed from the ground up to treat covariates as first-class inputs rather than afterthoughts bolted onto a univariate backbone.
From TimeGPT to TiRex
TimeGPT, Nixtla's first foundation model, demonstrated that a single pretrained transformer could generalize across domains and frequencies with strong zero-shot accuracy. However, it shared a limitation with most early TSFMs: covariate support was either absent or handled through simple concatenation, where exogenous variables were appended to the input sequence without any architectural mechanism to model their relationship with the target. This approach works for weakly correlated covariates but struggles when external drivers carry strong, structured signal — think temperature driving energy load or promotional calendars driving retail volume.
TiRex addresses this with a purpose-built architecture. The model retains TimeGPT's strengths in cross-domain generalization while adding a dedicated covariate processing pathway, a substantially longer context window, and a pretraining corpus that pairs target series with covariate signals at scale.
Architecture: Dual-Encoder Transformer
TiRex is a transformer with approximately 300 million parameters, organized around a dual-encoder design. The target series and the covariate inputs are processed through separate encoder branches before being fused through cross-attention layers.
The target encoder operates on the historical values of the forecast variable. Input patches are extracted from the raw series, projected into the model's hidden dimension, and processed through self-attention layers that capture temporal dependencies within the target.
The covariate encoder processes exogenous regressors through a parallel pathway. Each covariate is embedded independently and then combined through attention layers that learn inter-covariate relationships. Critically, covariates are not simply concatenated with the target in the input embedding — they maintain their own representational space until the cross-attention fusion stage.
The cross-attention layers are where covariate signal meets target dynamics. The target encoder's representations serve as queries, and the covariate encoder's representations serve as keys and values. This allows the model to selectively attend to the covariates that are most informative for each segment of the target series. A holiday indicator might strongly influence weekend forecasts but carry little signal on weekday mornings; the cross-attention mechanism can learn this differential relevance during pretraining.
This dual-encoder approach stands in contrast to how other models handle exogenous inputs. Chronos-Bolt concatenates covariates with the target in the input embedding layer, which is simpler but conflates the two signal sources. Moirai treats covariates as additional variate channels in its any-variate attention framework, which is flexible but does not enforce an explicit target-covariate relationship. TiRex's architecture occupies a middle ground: more structured than concatenation, more targeted than treating all channels symmetrically.
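The fusion step described above can be illustrated with a minimal numpy sketch. Toy dimensions and random values stand in for the trained encoder outputs; this is an illustration of the cross-attention pattern, not TiRex's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                   # toy hidden dimension (assumption for illustration)
n_target_patches = 8     # patches produced by the target encoder
n_covariate_tokens = 5   # embedded tokens produced by the covariate encoder

# Stand-ins for the outputs of the two encoder branches.
target_repr = rng.normal(size=(n_target_patches, d))
covariate_repr = rng.normal(size=(n_covariate_tokens, d))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(target_repr, covariate_repr):
    """Target representations act as queries; covariate representations
    supply keys and values, so each target patch attends over covariates."""
    q, k, v = target_repr, covariate_repr, covariate_repr
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (patches, covariate tokens)
    weights = softmax(scores, axis=-1)        # attention over covariates
    return weights @ v, weights               # fused patches, attn weights

fused, weights = cross_attention_fuse(target_repr, covariate_repr)
# fused has one row per target patch; each row is a covariate summary
# weighted by that patch's learned relevance.
```

Because the attention weights are computed per target patch, different segments of the series can draw on different covariates, which is the mechanism behind the differential-relevance behavior described above.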
16K Context Window
TiRex supports a context window of 16,384 time steps, one of the longest among current TSFMs. For comparison, TimesFM operates with a 2,048-token context, and Chronos-Bolt supports up to 2,048 input positions. The extended context is particularly valuable for domains with long-range seasonality: annual cycles in weekly data require at least 52 positions to capture a single period, and multiple cycles are needed for the model to disambiguate seasonality from trend. With 16K positions, TiRex can ingest centuries of weekly data or nearly two years of hourly observations in a single pass.
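The back-of-envelope arithmetic for what a 16,384-step context covers at common frequencies:

```python
CONTEXT = 16_384              # TiRex context window in time steps

hours_per_year = 24 * 365     # 8,760 hourly observations per year
weeks_per_year = 52           # weekly observations per year

hourly_years = CONTEXT / hours_per_year   # ~1.87 years of hourly data
weekly_years = CONTEXT / weeks_per_year   # ~315 years of weekly data

# Number of full annual cycles visible per frequency -- relevant because
# multiple cycles are needed to separate seasonality from trend.
full_annual_cycles_hourly = CONTEXT // hours_per_year
full_annual_cycles_weekly = CONTEXT // weeks_per_year
```

In practice, the weekly figure means the context window is effectively unbounded for weekly data, while hourly series are the frequency where the 16K limit actually binds.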
The long context also benefits covariate utilization. The more historical covariate-target pairs the model can see at inference time, the better it can estimate the relationship between external drivers and the target. This is especially relevant for covariates with irregular effects, such as promotions that occur sporadically and whose impact varies by timing and magnitude.
Training Data: 50 Billion Observations with Covariate Pairs
The pretraining corpus comprises over 50 billion observations drawn from retail, energy, finance, logistics, and other domains. Importantly, a large portion of these observations include paired covariate data — promotional calendars alongside sales series, temperature records alongside energy load, economic indicators alongside financial metrics. This covariate-paired pretraining is what gives TiRex its ability to leverage exogenous inputs at zero-shot inference: the model has already learned the statistical patterns linking common covariate types to target behavior across millions of series.
Zero-Shot and Fine-Tuning Performance
TiRex delivers strong zero-shot performance on standard univariate benchmarks, placing competitively with Chronos-Bolt and TimesFM on the Monash Forecasting Archive and M-competition datasets. On these benchmarks, where covariates are not provided, TiRex's accuracy is in line with other leading TSFMs of similar parameter count.
The differentiation emerges on covariate-rich datasets. On benchmarks that include exogenous regressors — electricity load with temperature, retail demand with promotions and holidays, transportation volume with calendar features — TiRex with covariates outperforms its own covariate-free baseline by 8-15% in weighted quantile loss and outperforms competing models that lack dedicated covariate architectures by similar or larger margins.
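Weighted quantile loss, the metric cited above, can be computed as follows. This uses a common definition (pinball loss summed over quantile levels and scaled by the absolute target sum); benchmark suites differ in minor normalization details.

```python
import numpy as np

def weighted_quantile_loss(y_true, quantile_preds, quantiles):
    """Quantile (pinball) loss averaged over levels, scaled by sum(|y|).

    quantile_preds: dict mapping quantile level -> forecast array
                    aligned with y_true.
    """
    y = np.asarray(y_true, dtype=float)
    total = 0.0
    for q in quantiles:
        diff = y - np.asarray(quantile_preds[q], dtype=float)
        # Pinball loss: under-prediction penalized by q, over- by (1 - q).
        total += np.maximum(q * diff, (q - 1) * diff).sum()
    return 2 * total / (len(quantiles) * np.abs(y).sum())

# A forecast that is exact at every quantile yields zero loss.
y = [100.0, 120.0, 90.0]
preds = {0.1: y, 0.5: y, 0.9: y}
wql = weighted_quantile_loss(y, preds, [0.1, 0.5, 0.9])  # 0.0
```

An 8-15% reduction in this metric means the quantile forecasts sit that much closer to the realized values, relative to the scale of the series.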
TiRex also supports fine-tuning for domain adaptation. If you have a proprietary dataset with domain-specific covariate relationships (e.g., a pharmaceutical supply chain where regulatory approval timelines affect demand), fine-tuning the cross-attention layers on your data can yield significant additional accuracy gains beyond zero-shot inference.
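Fine-tuning only the cross-attention layers amounts to freezing every other parameter group. A PyTorch sketch of the pattern, using a toy module tree as a stand-in (the real checkpoint's parameter names and structure will differ):

```python
import torch.nn as nn

# Hypothetical stand-in for the TiRex module tree; only the naming
# convention matters for the freezing logic below.
class TinyDualEncoder(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.target_encoder = nn.TransformerEncoderLayer(d, 4, batch_first=True)
        self.covariate_encoder = nn.TransformerEncoderLayer(d, 4, batch_first=True)
        self.cross_attention = nn.MultiheadAttention(d, 4, batch_first=True)
        self.head = nn.Linear(d, 1)

def freeze_all_but_cross_attention(model):
    # Keep gradients only for parameters in the cross-attention fusion.
    for name, param in model.named_parameters():
        param.requires_grad = "cross_attention" in name

model = TinyDualEncoder()
freeze_all_but_cross_attention(model)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Restricting updates to the fusion layers keeps the pretrained target and covariate encoders intact while letting the model relearn how your domain's covariates map onto target dynamics, which typically needs far less data than full fine-tuning.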
When to Choose TiRex
TiRex is the strongest choice when your forecasting problem has meaningful external drivers. If you have promotional calendars, weather data, event schedules, pricing changes, or other regressors that demonstrably affect the target variable, TiRex's architecture is designed to extract that signal. The 16K context window makes it particularly well-suited for long-history series where the covariate-target relationship evolves over time.
If your problem is purely univariate with no available covariates, TiRex remains competitive but does not offer a structural advantage over Chronos-Bolt or TimesFM. For multivariate problems where you need to forecast multiple targets simultaneously and model cross-series correlations, Moirai's any-variate attention may be a better fit. For guidance on selecting the right model for your use case, see our post on TSFM.ai model routing.
Availability on TSFM.ai
TiRex is available through the TSFM.ai model catalog and can be accessed via the forecast API. The TiRex-Large checkpoint on HuggingFace serves as the default variant. When you include covariates in your API request, the platform's routing layer automatically selects TiRex as the preferred model unless you specify otherwise. Pass your covariates as named arrays aligned to the target timestamps, and the platform handles embedding, normalization, and inference.
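As a sketch of what such a request body might look like: the field names, endpoint path, and "tirex-large" identifier below are illustrative assumptions, not the documented API schema; consult the TSFM.ai API reference for the actual contract.

```python
import json

# Hypothetical request shape for a forecast call with covariates.
# All field names here are assumptions for illustration.
payload = {
    "model": "tirex-large",
    "freq": "D",
    "horizon": 7,
    "target": {
        "timestamps": ["2024-01-01", "2024-01-02", "2024-01-03"],
        "values": [120.0, 95.5, 130.2],
    },
    # Named covariate arrays aligned to the target timestamps.
    "covariates": {
        "promo_flag": [1, 0, 1],
        "temperature_c": [21.4, 19.8, 22.1],
    },
}
body = json.dumps(payload)
# A client would then POST `body` to the forecast endpoint, e.g.:
# requests.post("https://api.tsfm.ai/v1/forecast", data=body,
#               headers={"Authorization": f"Bearer {API_KEY}"})
```

The key structural point from the description above is that covariates travel as named arrays aligned index-for-index with the target timestamps; the platform takes care of embedding and normalization from there.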