Timer-S1
online · bytedance-research/Timer-S1 · 8.3B total / 0.75B active params | 512 context | $0.50 input | $1.50 output | Apache-2.0
Timer-S1 is ByteDance's billion-scale sparse Mixture-of-Experts (MoE) time-series foundation model, with 8.3B total parameters and 0.75B active per token. Using Serial-Token Prediction (STP) and an 11.5K context window, it achieves state-of-the-art MASE and CRPS on the GIFT-Eval leaderboard, and it natively outputs 9-quantile probabilistic forecasts. Apache-2.0 licensed.
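The 9-quantile output format maps directly onto the CRPS metric reported on GIFT-Eval: CRPS can be approximated by averaging the pinball (quantile) loss over the nine levels. A minimal sketch, assuming an evenly spaced 0.1–0.9 quantile grid and using made-up forecast values (neither comes from an actual Timer-S1 response):

```python
# Approximate CRPS for a 9-quantile forecast by averaging pinball losses.
# The 0.1 ... 0.9 grid and the numbers below are illustrative assumptions,
# not values returned by any real Timer-S1 endpoint.

QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball (quantile) loss for one prediction at level q."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1) * diff

def approx_crps(y_true: float, quantile_forecasts: list[float]) -> float:
    """CRPS = 2 * integral of pinball loss over q in (0, 1),
    approximated here by the mean over the 9-level grid."""
    losses = [pinball_loss(y_true, f, q)
              for f, q in zip(quantile_forecasts, QUANTILES)]
    return 2 * sum(losses) / len(losses)

# Toy forecast: nine quantiles of a distribution centred near 10.0
forecast = [8.0, 8.6, 9.1, 9.6, 10.0, 10.4, 10.9, 11.4, 12.0]
print(round(approx_crps(10.2, forecast), 4))  # → 0.4267
```

A sharper (narrower) forecast that still covers the observation lowers this score, which is why CRPS rewards calibrated, not merely wide, quantile bands.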
Model Classification
Family
Timer
Type
time series foundation model
Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.
Resources
Training Data
TimeBench, a curated corpus of one trillion time points spanning diverse domains, augmented to mitigate predictive bias.
Recommended For
- Zero-shot multivariate forecasting with causal autoregressive generation
- Long-horizon prediction across heterogeneous time-series domains
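Causal autoregressive generation means long horizons are produced one step at a time, with each prediction appended to the context before the next call. A minimal sketch of that loop; the one-step predictor here is a toy stand-in (last value plus last difference), not Timer-S1's architecture:

```python
# Rolling autoregressive forecast: each predicted point is fed back into
# the context window. The one-step predictor is a toy stand-in, NOT the
# actual model.

def one_step(context: list[float]) -> float:
    """Toy predictor: extrapolate the last observed difference."""
    if len(context) < 2:
        return context[-1]
    return context[-1] + (context[-1] - context[-2])

def autoregressive_forecast(context: list[float], horizon: int) -> list[float]:
    window = list(context)
    preds = []
    for _ in range(horizon):
        nxt = one_step(window)
        preds.append(nxt)
        window.append(nxt)  # causal: only past values (real or predicted) are visible
    return preds

print(autoregressive_forecast([1.0, 2.0, 3.0], 4))  # → [4.0, 5.0, 6.0, 7.0]
```

This causality is also the root of the imputation limitation noted below: the loop never conditions on values to the right of the point being generated.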
Strengths
- TimeAttention unifies variable-length and multi-resolution inputs
- Strong zero-shot performance from 260B-point pretraining
Limitations
- Smaller model family with fewer checkpoint size options than Chronos or Moirai
- Causal-only architecture limits suitability for bidirectional tasks like imputation
Capabilities
forecasting · quantile-forecasting · zero-shot · long-context
Tags
bytedance · timer · moe · billion-scale · probabilistic · quality-tier
Specifications
- Parameters
- 8.3B total / 0.75B active
- Architecture
- decoder-only sparse MoE transformer with Serial-Token Prediction (STP)
- Context length
- 512
- Max output
- 1,024
- Accelerator
- NVIDIA GPU
- Regions
- Virginia, US
- License
- Apache-2.0
Pricing
- Input / 1M tokens
- $0.50
- Output / 1M tokens
- $1.50
Performance
- Average latency
- n/a
- Availability
- n/a
- Plan limits
- 1,000 rpm free · 1,000,000 rpm with billing