Lag-Llama
time-series-foundation-models/Lag-Llama · 2.45M params | 1K context | $0.00025 per forecast | Apache-2.0
Lag-Llama is an open probabilistic forecasting model from ServiceNow Research, released through the Time Series Foundation Models org. It uses a LLaMA-style decoder-only architecture with lag-based tokenization: lagged historical values are converted into tokens, and the model autoregressively generates full predictive distributions rather than point forecasts. It remains one of the smallest public TSFMs with native uncertainty output. The upstream best-practices guide recommends sweeping context length, starting from the 32-point regime used in training. On TSFM.ai it is currently better suited to probabilistic baseline usage than to aggressive deterministic trend continuation.
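The lag-based tokenization described above can be sketched as follows. This is a minimal illustration of the idea, not the model's implementation: the lag set here is hypothetical, whereas Lag-Llama uses a larger set of seasonality-informed lag indices.

```python
import numpy as np

# Hypothetical lag set for illustration only; the real model uses
# a larger, seasonality-informed set of lag indices.
LAGS = [1, 2, 3, 7, 14]

def lag_tokens(series: np.ndarray, lags=LAGS) -> np.ndarray:
    """Build one token per time step from lagged history values.

    The token at position t is [series[t-1], series[t-2], ...] for each
    lag, so the decoder sees a fixed-width vector of past values per step.
    Only positions with full lag coverage (t >= max lag) produce tokens.
    """
    max_lag = max(lags)
    return np.stack(
        [[series[t - lag] for lag in lags] for t in range(max_lag, len(series))]
    )

series = np.arange(20, dtype=float)
toks = lag_tokens(series)
print(toks.shape)  # (6, 5): 20 - max_lag tokens, one value per lag
```

Each token row is then projected and fed to the decoder autoregressively, which is what lets a very small model exploit long-range temporal dependencies without attending over every raw time step.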
Model Classification
- Family: Lag-Llama
- Type: time series foundation model
Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.
Resources
Training Data
Curated subset of public time-series repositories spanning energy, traffic, weather, and economic domains — trained on 27 datasets from the Monash Time Series Forecasting Repository.
Recommended For
- Probabilistic zero-shot forecasting with well-calibrated uncertainty estimates
- Lightweight deployment where sample-based distribution quality matters
Strengths
- One of the earliest open probabilistic TSFMs; well studied and widely benchmarked
- Lag-based tokenization captures temporal dependencies efficiently with a small model
Limitations
- Smaller capacity than newer large-scale TSFM families; may plateau on complex multivariate workloads
- Focused on forecasting rather than multi-task time-series understanding
- Can under-react on strong deterministic trend extrapolation compared with newer, larger zero-shot forecasters
- Prediction intervals can be very wide with short context; provide at least 100 data points for tighter uncertainty estimates
Not Ideal For
- Strong monotonic trend continuation where you expect the first forecast step to stay close to the recent level
- Using long-context zero-shot quality as the main selection criterion against newer, larger model families
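Because the model returns sampled forecast paths rather than point estimates, prediction intervals are derived as empirical quantiles over the samples. A minimal sketch with synthetic stand-in samples (the array shape and distribution are assumptions for illustration, not actual TSFM.ai output):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the model's sampled forecast paths:
# shape (num_samples, horizon), here 100 paths over a 24-step horizon.
samples = rng.normal(loc=10.0, scale=2.0, size=(100, 24))

# Median forecast and an 80% prediction interval per horizon step.
median = np.quantile(samples, 0.5, axis=0)
lo, hi = np.quantile(samples, [0.1, 0.9], axis=0)

print(median.shape, lo.shape, hi.shape)  # each is (24,)
```

With short context the sampled paths spread out and these quantile bands widen quickly, which is why the guidance above suggests at least 100 history points for tighter intervals.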
Specifications
- Parameters: 2.45M
- Architecture: decoder-only transformer (LLaMA-style) with lag-based tokenization and Student-t mixture output
- Context length: 1,024
- Max context: 1,024
- Minimum history: n/a
- Recommended history: 512
- Input step: n/a
- Required target series: 1
- Temperature: ignored
- Top P: ignored
- Max output: 2,048
- Avg latency: n/a
- Uptime: n/a
- Plan limits: 1,000 rpm free · 1,000,000 rpm with billing
- Accelerator: L40S
- Regions: Virginia, US
- License: Apache-2.0
Pricing
- Per forecast: $0.00025
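At the listed per-forecast price, cost scales linearly with request volume. A quick back-of-the-envelope helper (the usage figures below are hypothetical):

```python
PRICE_PER_FORECAST = 0.00025  # USD, from the pricing table above

def monthly_cost(forecasts_per_day: int, days: int = 30) -> float:
    """Estimated monthly spend for a steady forecast workload."""
    return forecasts_per_day * days * PRICE_PER_FORECAST

# e.g. a hypothetical service issuing 10,000 forecasts per day:
print(f"{monthly_cost(10_000):.2f}")  # 75.00
```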