
Time-MoE-200M

Maple728/TimeMoE-200M

200M params | 512 context | $0.50 input | $1.50 output

Time-MoE-200M is a sparse Mixture-of-Experts forecasting model from the Time-MoE family. The official paper and repo position the family as a way to scale time-series model capacity more efficiently than dense decoder-only models while keeping long-context autoregressive forecasting practical. It is a good fit when you want long-context zero-shot forecasting with explicit sparse-expert scaling.
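
As a rough sketch of what zero-shot inference looks like in practice, the snippet below loads the Maple728/TimeMoE-200M checkpoint through Hugging Face transformers and forecasts autoregressively. It assumes the checkpoint exposes a causal-LM-style generate interface via trust_remote_code, as the official repository describes; the input series and horizon here are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint; trust_remote_code pulls in the Time-MoE model class.
model = AutoModelForCausalLM.from_pretrained(
    "Maple728/TimeMoE-200M",
    device_map="cpu",          # or "cuda" if a GPU is available
    trust_remote_code=True,
)

# Toy batch: 2 series, each with a 512-point history (the advertised context length).
context_length = 512
seqs = torch.randn(2, context_length)

# Normalize each series before feeding it to the model.
mean = seqs.mean(dim=-1, keepdim=True)
std = seqs.std(dim=-1, keepdim=True)
normed = (seqs - mean) / std

# Autoregressively forecast the next 96 points, then undo the normalization.
horizon = 96
out = model.generate(normed, max_new_tokens=horizon)   # [batch, context + horizon]
forecast = out[:, -horizon:] * std + mean
print(forecast.shape)  # torch.Size([2, 96])
```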

Model Classification

Family

Time-MoE

Type

time series foundation model

Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.

Training Data

Time-300B, a multi-domain corpus spanning more than nine domains and over 300B time points, per the official paper and repository.

Recommended For

  • Long-context forecasting with sparse-expert scaling
  • Teams exploring MoE behavior in time-series foundation models

Strengths

  • Sparse experts provide large-model capacity while activating only a subset of parameters per step (see the routing sketch after this list)
  • Well suited to long-context autoregressive forecasting
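
The first strength comes down to top-k expert routing: each token is dispatched to a small subset of expert feed-forward networks instead of one large dense FFN, so only a fraction of the parameters are active per step. Below is a minimal, generic sketch of that mechanism; it is not Time-MoE's exact implementation, and the layer sizes, expert count, and k are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k sparse MoE feed-forward layer (illustrative, not Time-MoE's code)."""
    def __init__(self, d_model=384, d_ff=1536, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: [tokens, d_model]
        scores = self.gate(x)                  # [tokens, n_experts]
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)  # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(10, 384)
print(moe(tokens).shape)  # torch.Size([10, 384])
```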

Limitations

  • MoE operational behavior can be less familiar than dense baselines
  • Not the best first pick if you just need a simple compact deployment

Capabilities

forecasting · zero-shot · long-context · high-throughput

Tags

timemoe · moe · autoregressive · long-context

Specifications

Parameters
200M
Architecture
decoder-only transformer with sparse Mixture-of-Experts routing
Context length
512
Max output
1,024
Avg latency
n/a
Uptime
n/a
Rate limit
n/a
Accelerator
NVIDIA GPU
Regions
Virginia, US
License
n/a
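
Given the listed 512-point context window and 1,024-point maximum output, longer histories have to be windowed before inference. A small helper sketch under those limits (the function name is hypothetical; the limits come from the specification table above):

```python
import torch

CONTEXT_LENGTH = 512   # per the specification table
MAX_OUTPUT = 1024      # per the specification table

def prepare_window(series: torch.Tensor, horizon: int) -> torch.Tensor:
    """Hypothetical helper: keep only the most recent CONTEXT_LENGTH points
    and validate the requested horizon against the max output."""
    if horizon > MAX_OUTPUT:
        raise ValueError(f"horizon {horizon} exceeds max output {MAX_OUTPUT}")
    return series[-CONTEXT_LENGTH:]

history = torch.randn(10_000)                 # long raw history
window = prepare_window(history, horizon=96)
print(window.shape)                           # torch.Size([512])
```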

Pricing

Input / 1M tokens
$0.50
Output / 1M tokens
$1.50
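
At the listed rates ($0.50 per 1M input tokens, $1.50 per 1M output tokens), per-request cost is a linear function of the two token counts. A back-of-the-envelope helper, assuming each time point is billed as one token (an assumption, not something this page states):

```python
INPUT_PER_M = 0.50    # USD per 1M input tokens
OUTPUT_PER_M = 1.50   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: a full 512-point context and a 96-point forecast.
print(f"${request_cost(512, 96):.6f}")  # $0.000400
```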

Performance

Average latency
n/a
Availability
n/a
Rate limit
n/a