
MOMENT-Large

AutonLab/MOMENT-1-large

385M params | 512 context | $0.00025 per forecast | MIT

MOMENT-Large is the large checkpoint in AutonLab's general-purpose time-series foundation-model family. Official sources frame MOMENT as a multi-task representation model that transfers across forecasting, classification, anomaly detection, imputation, reconstruction, and embedding extraction rather than optimizing purely for one forecasting benchmark. The upstream momentfm package warns that only the reconstruction head is pretrained and that forecasting heads must be fine-tuned, so it is best used as a shared backbone across downstream tasks rather than assumed to be the strongest default zero-shot forecaster.

Model Classification

Family

MOMENT

Type

time series foundation model

Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.

Training Data

Timeseries-PILE, built from public forecasting, classification, and anomaly-detection corpora including Informer datasets, Monash, UCR/UEA, and TSB-UAD.

Recommended For

  • Shared backbones across forecasting, anomaly detection, classification, and imputation
  • Teams that want one general-purpose time-series representation model

Strengths

  • Broadest multi-task scope in the hosted catalog
  • Useful when the same deployment needs to cover several downstream tasks

Limitations

  • Not optimized purely around one forecasting leaderboard objective
  • May be heavier than necessary if you only need straightforward zero-shot forecasting
  • The upstream momentfm package warns that only the reconstruction head is pretrained and that forecasting heads must be fine-tuned
  • Hosted forecasting quality can lag specialist zero-shot forecasters because MOMENT is primarily framed as a transferable representation backbone
  • Zero-shot forecasting tends to be trend-blind — predictions may flatten regardless of clear trends in the input
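The trend-blindness noted above can sometimes be mitigated client-side by removing a linear trend before forecasting and restoring it afterwards. A minimal sketch with NumPy; `forecast_fn` is a hypothetical placeholder for whatever client call produces the zero-shot forecast and is not part of the momentfm API:

```python
import numpy as np

def forecast_detrended(history, horizon, forecast_fn):
    """Fit a linear trend, forecast the residual, then add the trend back.

    `forecast_fn(residual, horizon)` is a placeholder for whatever client
    call produces a zero-shot forecast; it is not part of momentfm.
    """
    t = np.arange(len(history), dtype=float)
    slope, intercept = np.polyfit(t, history, deg=1)   # least-squares line
    residual = history - (slope * t + intercept)       # detrended series
    future_t = np.arange(len(history), len(history) + horizon, dtype=float)
    trend = slope * future_t + intercept               # extrapolated trend
    return forecast_fn(residual, horizon) + trend

# Toy example with a stand-in "flat" forecaster that mimics trend-blind output:
history = np.arange(10, dtype=float)                   # perfect upward trend
flat = lambda resid, h: np.zeros(h)                    # predicts no change
print(forecast_detrended(history, 3, flat))            # continues the trend
```

This keeps the model responsible only for the residual, so even a flat continuation inherits the extrapolated trend.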

Not Ideal For

  • Choosing a default forecast-only model when you mostly care about zero-shot continuation quality
  • Short benchmark-style trend extrapolation where specialist forecasting families are stronger

Capabilities

forecasting, classification, anomaly-detection, imputation, retrieval

Tags

moment, multi-task, representation-learning, general-purpose

Specifications

Parameters
385M
Architecture
patch-based encoder-only transformer trained with masked time-series modeling
Context length
512
Max context
512
Minimum history
n/a
Recommended history
512
Input step
n/a
Required target series
1
Temperature
Ignored
Top P
Ignored
Max output
2,048
Avg latency
n/a
Uptime
n/a
Plan limits
1,000 rpm free · 1,000,000 rpm with billing
Accelerator
L40S
Regions
Virginia, US
License
MIT
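With a fixed 512-step context, longer histories must be truncated and shorter ones padded before they reach the model. A minimal client-side sketch; the left-padding convention and zero pad value are assumptions, not something the hosted API mandates:

```python
def fit_context(series, context_length=512, pad_value=0.0):
    """Truncate to the most recent `context_length` points, or left-pad.

    Keeping the newest observations and padding on the older (left) side is
    a common convention for fixed-window models; verify against the API docs.
    """
    if len(series) >= context_length:
        return list(series[-context_length:])          # keep newest points
    padding = [pad_value] * (context_length - len(series))
    return padding + list(series)                      # pad the older side

window = fit_context([1.0, 2.0, 3.0], context_length=5)
# window is [0.0, 0.0, 1.0, 2.0, 3.0]
```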

Pricing

Per forecast
$0.00025
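At $0.00025 per forecast, spend scales linearly with call volume. A quick back-of-envelope helper (the 30-day month is an assumption for estimation only):

```python
PRICE_PER_FORECAST = 0.00025  # USD, from the pricing table above

def monthly_cost(forecasts_per_day, days=30):
    """Estimated monthly spend for a steady daily forecast volume."""
    return forecasts_per_day * days * PRICE_PER_FORECAST

# e.g. 10,000 forecasts/day for a 30-day month
print(round(monthly_cost(10_000), 2))  # -> 75.0
```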

Performance

Average latency
n/a
Availability
n/a
Plan limits
1,000 rpm free · 1,000,000 rpm with billing
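Staying under the 1,000 rpm free-tier limit is easiest with a small client-side throttle. A pure-function sketch; the sliding-window bookkeeping is an assumption about how the caller tracks requests, not part of any SDK:

```python
def seconds_to_wait(call_times, now, rpm=1000):
    """Given timestamps (in seconds) of recent calls, return how long to
    sleep before the next call stays within `rpm` requests per minute."""
    window = [t for t in call_times if now - t < 60.0]  # calls still counted
    if len(window) < rpm:
        return 0.0                                      # capacity remains
    oldest = min(window)
    return 60.0 - (now - oldest)                        # wait for it to age out

# Toy 2-rpm limit: two calls at t=0 and t=1, asking again at t=2
print(seconds_to_wait([0.0, 1.0], now=2.0, rpm=2))      # -> 58.0
```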