Lag-Llama
time-series-foundation-models/Lag-Llama · 2.45M params | 1K context | $0.00025 per forecast | Apache-2.0
Lag-Llama is an open probabilistic forecasting model from ServiceNow Research, released through the Time Series Foundation Models org. It uses a LLaMA-style decoder-only architecture with lag-based tokenization: lagged historical values are converted into tokens, and the model autoregressively generates full predictive distributions rather than point forecasts. It remains one of the smallest public TSFMs with native uncertainty output. The upstream best-practices guide recommends sweeping context length, starting from the 32-point regime used in training. On TSFM.ai it is currently better suited to probabilistic baseline usage than to aggressive deterministic trend continuation.
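The lag-based tokenization described above can be sketched as follows. This is a minimal illustration of the idea, not the model's implementation: the lag set here is hypothetical, whereas Lag-Llama uses a larger set of seasonality-informed lag indices.

```python
import numpy as np

# Hypothetical lag set for illustration only; the real model uses
# a larger, seasonality-informed set of lag indices.
LAGS = [1, 2, 3, 7, 14]

def lag_tokens(series: np.ndarray, lags=LAGS) -> np.ndarray:
    """Build one token per time step from lagged history values.

    The token at position t is [series[t-1], series[t-2], ...] for each
    lag, so the decoder sees a fixed-width vector of past values per step.
    Only positions with full lag coverage (t >= max lag) produce tokens.
    """
    max_lag = max(lags)
    return np.stack(
        [[series[t - lag] for lag in lags] for t in range(max_lag, len(series))]
    )

series = np.arange(20, dtype=float)
toks = lag_tokens(series)
print(toks.shape)  # (6, 5): 20 - max_lag tokens, one value per lag
```

Each token row is then projected and fed to the decoder autoregressively, which is what lets a very small model exploit long-range temporal dependencies without attending over every raw time step.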
Model Classification
- Family: Lag-Llama
- Type: time series foundation model
Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.
Resources
Training Data
Curated subset of public time-series repositories spanning energy, traffic, weather, and economic domains — trained on 27 datasets from the Monash Time Series Forecasting Repository.
Recommended For
- Probabilistic zero-shot forecasting with well-calibrated uncertainty estimates
- Lightweight deployment where sample-based distribution quality matters
Strengths
- One of the earliest open probabilistic TSFMs; well studied and widely benchmarked
- Lag-based tokenization captures temporal dependencies efficiently with a small model
Limitations
- Smaller capacity than newer large-scale TSFM families; may plateau on complex multivariate workloads
- Focused on forecasting rather than multi-task time-series understanding
- Can under-react on strong deterministic trend extrapolation compared with newer, larger zero-shot forecasters
- Prediction intervals can be very wide with short context; provide at least 100 data points for tighter uncertainty estimates
Not Ideal For
- Strong monotonic trend continuation where you expect the first forecast step to stay close to the recent level
- Using long-context zero-shot quality as the main selection criterion against newer, larger model families
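Because the model returns sampled forecast paths rather than point estimates, prediction intervals are derived as empirical quantiles over the samples. A minimal sketch with synthetic stand-in samples (the array shape and distribution are assumptions for illustration, not actual TSFM.ai output):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the model's sampled forecast paths:
# shape (num_samples, horizon), here 100 paths over a 24-step horizon.
samples = rng.normal(loc=10.0, scale=2.0, size=(100, 24))

# Median forecast and an 80% prediction interval per horizon step.
median = np.quantile(samples, 0.5, axis=0)
lo, hi = np.quantile(samples, [0.1, 0.9], axis=0)

print(median.shape, lo.shape, hi.shape)  # each is (24,)
```

With short context the sampled paths spread out and these quantile bands widen quickly, which is why the guidance above suggests at least 100 history points for tighter intervals.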
Specifications
- Parameters: 2.45M
- Architecture: decoder-only transformer (LLaMA-style) with lag-based tokenization and Student-t mixture output
- Context length: 1,024
- Max context: 1,024
- Minimum history: n/a
- Recommended history: 512
- Input step: n/a
- Required target series: 1
- Temperature: ignored
- Top P: ignored
- Max output: 2,048
- Avg latency: n/a
- Uptime: n/a
- Plan limits: 1,000 rpm free · 1,000,000 rpm with billing
- Accelerator: L40S
- Regions: Virginia, US
- License: Apache-2.0
Pricing
- Per forecast: $0.00025
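At the listed per-forecast price, cost scales linearly with request volume. A quick back-of-the-envelope helper (the usage figures below are hypothetical):

```python
PRICE_PER_FORECAST = 0.00025  # USD, from the pricing table above

def monthly_cost(forecasts_per_day: int, days: int = 30) -> float:
    """Estimated monthly spend for a steady forecast workload."""
    return forecasts_per_day * days * PRICE_PER_FORECAST

# e.g. a hypothetical service issuing 10,000 forecasts per day:
print(f"{monthly_cost(10_000):.2f}")  # 75.00
```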