MOMENT-Base

online
AutonLab/MOMENT-1-base

~125M params | 512 context | $0.00025 per forecast | MIT

MOMENT-Base is the mid-size checkpoint in AutonLab's MOMENT family, sitting between the lightweight Small variant and the higher-capacity Large model. It uses the same masked time-series modeling architecture as the rest of the family and offers the same multi-task transfer capabilities. The upstream momentfm package warns that only the reconstruction head is pretrained and that forecasting heads must be fine-tuned, so treat zero-shot forecast quality as secondary to backbone reuse.

Model Classification

Family

MOMENT

Type

time series foundation model

Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.

Training Data

Timeseries-PILE, built from public forecasting, classification, and anomaly-detection corpora including Informer datasets, Monash, UCR/UEA, and TSB-UAD.

Recommended For

  • Shared backbones across forecasting, anomaly detection, classification, and imputation
  • Teams that want one general-purpose time-series representation model

Strengths

  • Broadest multi-task scope in the hosted catalog
  • Useful when the same deployment needs to cover several downstream tasks

Limitations

  • Not optimized purely around one forecasting leaderboard objective
  • May be heavier than needed if you only need straightforward zero-shot forecasting
  • The upstream momentfm package warns that only the reconstruction head is pretrained and that forecasting heads must be fine-tuned
  • Hosted forecasting quality can lag specialist zero-shot forecasters because MOMENT is primarily framed as a transferable representation backbone
  • Zero-shot forecasting tends to be trend-blind: predictions may flatten even when the input shows a clear trend
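
The trend-blind limitation above can be checked on your own data before relying on a zero-shot forecast. The sketch below compares the least-squares slope of the history with the slope of the returned forecast. The helper names (`ols_slope`, `looks_flattened`) and the 0.25 threshold are illustrative assumptions, not part of TSFM.ai or momentfm.

```python
from statistics import mean


def ols_slope(values):
    """Least-squares slope of `values` against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a slope")
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def looks_flattened(history, forecast, ratio=0.25):
    """Return True if the forecast keeps less than `ratio` of the history's slope.

    `ratio` is an arbitrary illustrative threshold (assumption); tune it per dataset.
    A forecast whose slope has the opposite sign also counts as flattened.
    """
    history_slope = ols_slope(history)
    if history_slope == 0:
        return False  # no trend in the input, so nothing to flatten
    return ols_slope(forecast) / history_slope < ratio


if __name__ == "__main__":
    history = [0.5 * i for i in range(100)]                   # steady upward trend
    flat_forecast = [history[-1]] * 24                        # trend ignored
    trending_forecast = [history[-1] + 0.5 * (i + 1) for i in range(24)]
    print(looks_flattened(history, flat_forecast))
    print(looks_flattened(history, trending_forecast))
```

If the check fires often on your series, that is a signal to fine-tune a forecasting head or to pick a specialist forecasting model instead.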

Not Ideal For

  • Choosing a default forecast-only model when you mostly care about zero-shot continuation quality
  • Short benchmark-style trend extrapolation where specialist forecasting families are stronger

Capabilities

forecasting, classification, anomaly-detection, imputation, retrieval

Tags

moment, multi-task, representation-learning, general-purpose

Specifications

Parameters: ~125M
Architecture: patch-based encoder-only transformer trained with masked time-series modeling
Context length: 512
Max context: 512
Minimum history: n/a
Recommended history: 512
Input step: n/a
Required target series: 1
Temperature: Ignored
Top P: Ignored
Max output: 1,024
Avg latency: n/a
Uptime: n/a
Plan limits: 1,000 rpm free · 1,000,000 rpm with billing
Accelerator: T4
Regions: Virginia, US
License: MIT
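
Because both the context length and the recommended history are 512 points, longer series need trimming and shorter ones need padding. The sketch below keeps the most recent 512 observations and left-pads shorter inputs, returning a mask that marks real observations. The function name, the zero pad value, and the left-padding convention are assumptions for illustration. The hosted endpoint may handle this itself.

```python
CONTEXT_LENGTH = 512  # from the Specifications listing


def fit_to_context(series, context_length=CONTEXT_LENGTH, pad_value=0.0):
    """Return (values, mask), each exactly `context_length` long.

    Keeps the most recent points of `series`. Shorter inputs are left-padded
    with `pad_value`; mask is 1 for observed points and 0 for padding.
    (Hypothetical helper; the padding convention is an assumption.)
    """
    if len(series) == 0:
        raise ValueError("series must contain at least one observation")
    recent = list(series[-context_length:])
    pad = context_length - len(recent)
    values = [pad_value] * pad + recent
    mask = [0] * pad + [1] * len(recent)
    return values, mask


if __name__ == "__main__":
    values, mask = fit_to_context([float(i) for i in range(600)])
    print(len(values), sum(mask))
    values, mask = fit_to_context([1.0, 2.0, 3.0])
    print(len(values), sum(mask))
```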

Pricing

Per forecast: $0.00025
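
For budgeting, the per-forecast price and the plan limits listed on this page can be combined into a quick estimate. This is a rough sketch that assumes one request produces one billed forecast and that the rate limit is the only throughput constraint. The function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_FORECAST = Decimal("0.00025")  # USD, from the Pricing section
FREE_RPM = 1_000        # requests per minute on the free plan
BILLED_RPM = 1_000_000  # requests per minute with billing enabled


def estimate_cost(num_forecasts):
    """Total cost in USD, assuming one billed forecast per request."""
    if num_forecasts < 0:
        raise ValueError("num_forecasts must be non-negative")
    return PRICE_PER_FORECAST * num_forecasts


def min_minutes(num_forecasts, rpm):
    """Lower bound on wall-clock minutes imposed by the rate limit alone."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return -(-num_forecasts // rpm)  # integer ceiling division


if __name__ == "__main__":
    n = 250_000
    print(f"cost: ${estimate_cost(n)}")
    print(f"free plan: at least {min_minutes(n, FREE_RPM)} min")
    print(f"with billing: at least {min_minutes(n, BILLED_RPM)} min")
```

Decimal is used instead of float so that small per-forecast prices do not pick up rounding error when multiplied by large counts.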

Performance

Average latency: n/a
Availability: n/a
Plan limits: 1,000 rpm free · 1,000,000 rpm with billing