
Time-MoE-200M

Maple728/TimeMoE-200M

200M params | 512 context | $0.50 input | $1.50 output

Time-MoE-200M is a sparse Mixture-of-Experts forecasting model from the Time-MoE family. The official paper and repo position the family as a way to scale time-series model capacity more efficiently than dense decoder-only models while keeping long-context autoregressive forecasting practical. It is a good fit when you want long-context zero-shot forecasting with explicit sparse-expert scaling.
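
As a rough sketch of what zero-shot inference looks like in practice, the snippet below loads the Maple728/TimeMoE-200M checkpoint through Hugging Face transformers and forecasts autoregressively. It assumes the checkpoint exposes a causal-LM-style generate interface via trust_remote_code, as the official repository describes; the input series and horizon here are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint; trust_remote_code pulls in the Time-MoE model class.
model = AutoModelForCausalLM.from_pretrained(
    "Maple728/TimeMoE-200M",
    device_map="cpu",          # or "cuda" if a GPU is available
    trust_remote_code=True,
)

# Toy batch: 2 series, each with a 512-point history (the advertised context length).
context_length = 512
seqs = torch.randn(2, context_length)

# Normalize each series before feeding it to the model.
mean = seqs.mean(dim=-1, keepdim=True)
std = seqs.std(dim=-1, keepdim=True)
normed = (seqs - mean) / std

# Autoregressively forecast the next 96 points, then undo the normalization.
horizon = 96
out = model.generate(normed, max_new_tokens=horizon)   # [batch, context + horizon]
forecast = out[:, -horizon:] * std + mean
print(forecast.shape)  # torch.Size([2, 96])
```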

Model Classification

Family

Time-MoE

Type

time series foundation model

Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.

Training Data

Time-300B, a multi-domain corpus spanning more than nine domains and over 300B time points, per the official paper and repository.

Recommended For

  • Long-context forecasting with sparse-expert scaling
  • Teams exploring MoE behavior in time-series foundation models

Strengths

  • Sparse experts provide large-model capacity while activating only a subset of parameters per step (see the routing sketch after this list)
  • Well suited to long-context autoregressive forecasting
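
The first strength comes down to top-k expert routing: each token is dispatched to a small subset of expert feed-forward networks instead of one large dense FFN, so only a fraction of the parameters are active per step. Below is a minimal, generic sketch of that mechanism; it is not Time-MoE's exact implementation, and the layer sizes, expert count, and k are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k sparse MoE feed-forward layer (illustrative, not Time-MoE's code)."""
    def __init__(self, d_model=384, d_ff=1536, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: [tokens, d_model]
        scores = self.gate(x)                  # [tokens, n_experts]
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)  # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(10, 384)
print(moe(tokens).shape)  # torch.Size([10, 384])
```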

Limitations

  • MoE operational behavior can be less familiar than dense baselines
  • Not the best first pick if you just need a simple compact deployment

Capabilities

forecasting · zero-shot · long-context · high-throughput

Tags

timemoe · moe · autoregressive · long-context

Specifications

Parameters
200M
Architecture
decoder-only transformer with sparse Mixture-of-Experts routing
Context length
512
Max output
1,024
Avg latency
n/a
Uptime
n/a
Rate limit
n/a
Accelerator
NVIDIA GPU
Regions
Virginia, US
License
n/a
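
Given the listed 512-point context window and 1,024-point maximum output, longer histories have to be windowed before inference. A small helper sketch under those limits (the function name is hypothetical; the limits come from the specification table above):

```python
import torch

CONTEXT_LENGTH = 512   # per the specification table
MAX_OUTPUT = 1024      # per the specification table

def prepare_window(series: torch.Tensor, horizon: int) -> torch.Tensor:
    """Hypothetical helper: keep only the most recent CONTEXT_LENGTH points
    and validate the requested horizon against the max output."""
    if horizon > MAX_OUTPUT:
        raise ValueError(f"horizon {horizon} exceeds max output {MAX_OUTPUT}")
    return series[-CONTEXT_LENGTH:]

history = torch.randn(10_000)                 # long raw history
window = prepare_window(history, horizon=96)
print(window.shape)                           # torch.Size([512])
```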

Pricing

Input / 1M tokens
$0.50
Output / 1M tokens
$1.50
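
At the listed rates ($0.50 per 1M input tokens, $1.50 per 1M output tokens), per-request cost is a linear function of the two token counts. A back-of-the-envelope helper, assuming each time point is billed as one token (an assumption, not something this page states):

```python
INPUT_PER_M = 0.50    # USD per 1M input tokens
OUTPUT_PER_M = 1.50   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: a full 512-point context and a 96-point forecast.
print(f"${request_cost(512, 96):.6f}")  # $0.000400
```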

Performance

Average latency
n/a
Availability
n/a
Rate limit
n/a