Timer-S1
online · bytedance-research/Timer-S1 · 8.3B total / 0.75B active params | 512 context | $0.50 input | $1.50 output | Apache-2.0
Timer-S1 is ByteDance's billion-scale sparse Mixture-of-Experts (MoE) time-series foundation model, with 8.3B total parameters and 0.75B active per token. Using Serial-Token Prediction (STP) and an 11.5K context window, it achieves state-of-the-art MASE and CRPS on the GIFT-Eval leaderboard, and it natively outputs 9-quantile probabilistic forecasts. Apache-2.0 licensed.
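The 9-quantile output format maps directly onto the CRPS metric reported on GIFT-Eval: CRPS can be approximated by averaging the pinball (quantile) loss over the nine levels. A minimal sketch, assuming an evenly spaced 0.1–0.9 quantile grid and using made-up forecast values (neither comes from an actual Timer-S1 response):

```python
# Approximate CRPS for a 9-quantile forecast by averaging pinball losses.
# The 0.1 ... 0.9 grid and the numbers below are illustrative assumptions,
# not values returned by any real Timer-S1 endpoint.

QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball (quantile) loss for one prediction at level q."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1) * diff

def approx_crps(y_true: float, quantile_forecasts: list[float]) -> float:
    """CRPS = 2 * integral of pinball loss over q in (0, 1),
    approximated here by the mean over the 9-level grid."""
    losses = [pinball_loss(y_true, f, q)
              for f, q in zip(quantile_forecasts, QUANTILES)]
    return 2 * sum(losses) / len(losses)

# Toy forecast: nine quantiles of a distribution centred near 10.0
forecast = [8.0, 8.6, 9.1, 9.6, 10.0, 10.4, 10.9, 11.4, 12.0]
print(round(approx_crps(10.2, forecast), 4))  # → 0.4267
```

A sharper (narrower) forecast that still covers the observation lowers this score, which is why CRPS rewards calibrated, not merely wide, quantile bands.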
Model Classification
Family
Timer
Type
time series foundation model
Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.
Resources
Training Data
TimeBench, a curated corpus of one trillion time points spanning diverse domains, augmented to mitigate predictive bias.
Recommended For
- Zero-shot multivariate forecasting with causal autoregressive generation
- Long-horizon prediction across heterogeneous time-series domains
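Causal autoregressive generation means long horizons are produced one step at a time, with each prediction appended to the context before the next call. A minimal sketch of that loop; the one-step predictor here is a toy stand-in (last value plus last difference), not Timer-S1's architecture:

```python
# Rolling autoregressive forecast: each predicted point is fed back into
# the context window. The one-step predictor is a toy stand-in, NOT the
# actual model.

def one_step(context: list[float]) -> float:
    """Toy predictor: extrapolate the last observed difference."""
    if len(context) < 2:
        return context[-1]
    return context[-1] + (context[-1] - context[-2])

def autoregressive_forecast(context: list[float], horizon: int) -> list[float]:
    window = list(context)
    preds = []
    for _ in range(horizon):
        nxt = one_step(window)
        preds.append(nxt)
        window.append(nxt)  # causal: only past values (real or predicted) are visible
    return preds

print(autoregressive_forecast([1.0, 2.0, 3.0], 4))  # → [4.0, 5.0, 6.0, 7.0]
```

This causality is also the root of the imputation limitation noted below: the loop never conditions on values to the right of the point being generated.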
Strengths
- TimeAttention unifies variable-length and multi-resolution inputs
- Strong zero-shot performance from 260B-point pretraining
Limitations
- Smaller model family with fewer checkpoint size options than Chronos or Moirai
- Causal-only architecture limits suitability for bidirectional tasks like imputation
Capabilities
forecasting · quantile-forecasting · zero-shot · long-context
Tags
bytedance · timer · moe · billion-scale · probabilistic · quality-tier
Specifications
- Parameters
- 8.3B total / 0.75B active
- Architecture
- decoder-only sparse MoE transformer with Serial-Token Prediction (STP)
- Context length
- 512
- Max output
- 1,024
- Accelerator
- NVIDIA GPU
- Regions
- Virginia, US
- License
- Apache-2.0
Pricing
- Input / 1M tokens
- $0.50
- Output / 1M tokens
- $1.50
Performance
- Average latency
- n/a
- Availability
- n/a
- Plan limits
- 1,000 rpm free · 1,000,000 rpm with billing