Time-MoE-200M
Maple728/TimeMoE-200M · online
200M params | 512 context | $0.50 input | $1.50 output
Time-MoE-200M is a sparse Mixture-of-Experts forecasting model from the Time-MoE family. The official paper and repo position the family as a way to scale time-series model capacity more efficiently than dense decoder-only models while keeping long-context autoregressive forecasting practical. It is a good fit when you want long-context zero-shot forecasting with explicit sparse-expert scaling.
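For a quick zero-shot run, the pattern below follows the usage example in the official Time-MoE repository: load the Hugging Face checkpoint Maple728/TimeMoE-200M with trust_remote_code, instance-normalize the context window, autoregressively generate the horizon, then invert the normalization. Treat it as a minimal sketch; the batch size, horizon, and random placeholder series are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint; Time-MoE ships custom modeling code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(
    "Maple728/TimeMoE-200M",
    device_map="cpu",
    trust_remote_code=True,
)

context_length = 512                    # the endpoint's advertised context window
seqs = torch.randn(2, context_length)   # [batch_size, context_length] placeholder series

# Instance-normalize each series; the model expects roughly standardized inputs.
mean = seqs.mean(dim=-1, keepdim=True)
std = seqs.std(dim=-1, keepdim=True)
normed_seqs = (seqs - mean) / std

# Autoregressive forecast: each generated token is one future time point.
prediction_length = 96
output = model.generate(normed_seqs, max_new_tokens=prediction_length)
normed_forecast = output[:, -prediction_length:]

# Undo the normalization to get forecasts on the original scale.
forecast = normed_forecast * std + mean
```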
Model Classification
Family
Time-MoE
Type
time-series foundation model
Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.
Resources
Training Data
Time-300B, a multi-domain corpus spanning more than nine domains and over 300B time points, per the official paper and repository.
Recommended For
- Long-context forecasting with sparse-expert scaling
- Teams exploring MoE behavior in time-series foundation models
Strengths
- Sparse experts provide large-capacity behavior without a fully dense compute footprint
- Well suited to long-context autoregressive forecasting
Limitations
- MoE operational behavior can be less familiar than dense baselines
- Not the best first choice if you only need a simple, compact deployment
Capabilities
forecasting, zero-shot, long-context, high-throughput
Tags
time-moe, moe, autoregressive, long-context
Specifications
- Parameters
- 200M
- Architecture
- decoder-only transformer with sparse Mixture-of-Experts routing (see the sketch after this list)
- Context length
- 512
- Max output
- 1,024
- Accelerator
- NVIDIA GPU
- Regions
- Virginia, US
- License
- n/a
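"Sparse Mixture-of-Experts routing" above means each token's feed-forward computation is dispatched to only a few experts chosen by a learned gate, which is how the family scales capacity without a dense compute footprint. The PyTorch block below is a schematic top-k gated MoE layer for illustration only, not Time-MoE's actual implementation; the expert count, hidden sizes, and k=2 are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFeedForward(nn.Module):
    """Schematic top-k sparse MoE feed-forward block (illustrative, not Time-MoE's code)."""

    def __init__(self, d_model: int = 384, d_ff: int = 1536, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [batch, seq, d_model]
        logits = self.gate(x)                             # router scores per token
        weights, idx = logits.topk(self.k, dim=-1)        # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route a [batch=2, seq=16, d_model=384] activation through the layer.
y = SparseMoEFeedForward()(torch.randn(2, 16, 384))
```

Only k of the n_experts expert networks run per token, so parameter count grows with n_experts while per-token compute stays near that of a single dense feed-forward block.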
Pricing
- Input / 1M tokens
- $0.50
- Output / 1M tokens
- $1.50
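As a back-of-envelope cost check, the snippet below assumes each time point is billed as one token (an assumption; verify against the endpoint's billing documentation) and prices a hypothetical request with a full 512-point context and a 96-point horizon.

```python
input_rate = 0.50 / 1_000_000    # $ per input token
output_rate = 1.50 / 1_000_000   # $ per output token

# Hypothetical request: full 512-point context, 96-point forecast horizon.
cost = 512 * input_rate + 96 * output_rate
print(f"${cost:.6f} per request")  # -> $0.000400
```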
Performance
- Average latency
- n/a
- Availability
- n/a
- Rate limit
- n/a