Timer-S1

online
bytedance-research/Timer-S1

8.3B total / 0.75B active params | 11.5K context | $0.00025 per forecast | Apache-2.0

Timer-S1 is ByteDance's billion-scale sparse mixture-of-experts (MoE) time-series foundation model, with 8.3B total parameters of which 0.75B are active per token. It combines Serial-Token Prediction (STP) with an 11,520-point context window and reports state-of-the-art MASE and CRPS on the GIFT-Eval leaderboard. The model natively outputs probabilistic forecasts at nine quantiles and is released under the Apache-2.0 license.
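To make the two metrics and the nine-quantile output concrete, here is a minimal Python sketch of MASE and of the mean pinball (quantile) loss, a common quantile-based proxy for CRPS. The quantile levels 0.1 through 0.9 are an assumption (the page states only that there are nine), the function names are illustrative, and this is not necessarily how the leaderboard computes its scores.

```python
from typing import Sequence

# Assumed quantile levels; the page says only that there are nine of them.
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Quantile (pinball) loss for one observation at quantile level q."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff


def mean_quantile_loss(
    y_true: Sequence[float],
    quantile_preds: Sequence[Sequence[float]],
    levels: Sequence[float] = QUANTILE_LEVELS,
) -> float:
    """Average pinball loss over all steps and levels.

    quantile_preds[t][k] is the forecast for step t at levels[k].
    """
    total = 0.0
    count = 0
    for y, preds in zip(y_true, quantile_preds):
        for q, p in zip(levels, preds):
            total += pinball_loss(y, p, q)
            count += 1
    return total / count


def mase(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    history: Sequence[float],
    season: int = 1,
) -> float:
    """Mean absolute scaled error against an in-sample seasonal-naive baseline."""
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    naive_errors = [
        abs(history[i] - history[i - season]) for i in range(season, len(history))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale
```

A MASE below 1 means the point forecast beats the seasonal-naive baseline on average; lower is better for both metrics.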

Model Classification

Family: Timer-S1

Type: time series foundation model

Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.

Training Data

TimeBench, a curated corpus with one trillion time points spanning diverse domains, with data augmentation to mitigate predictive bias.

Recommended For

  • Zero-shot univariate forecasting with causal autoregressive generation
  • Long-horizon prediction across heterogeneous time-series domains

Strengths

  • TimeAttention unifies variable-length and multi-resolution inputs
  • Strong zero-shot performance from pretraining on the one-trillion-point TimeBench corpus

Limitations

  • Smaller model family with fewer checkpoint size options than Chronos or Moirai
  • Causal-only architecture limits suitability for bidirectional tasks like imputation
  • Hosted Timer serving works best once you have at least one full 96-point patch of history; very short series are a poor fit
  • The current hosted checkpoint can flatten simple repeated seasonal toy probes more than newer specialist zero-shot models

Not Ideal For

  • Histories shorter than one full 96-point patch
  • Users who need strong seasonal continuation on very small repeated-pattern probes without tuning
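The history constraints above can be checked on the client before a request is sent. The sketch below uses the figures from this page (96-point patch, 11,520-point context, 2,048-point maximum output). The helper name `prepare_history` is hypothetical, and truncating to the most recent points is an assumption: the page does not say whether the hosted service truncates or rejects over-long inputs.

```python
from typing import List, Sequence

PATCH_LENGTH = 96      # one full patch of history, per the limitations listed
MAX_CONTEXT = 11_520   # context length from the specifications
MAX_OUTPUT = 2_048     # maximum output length from the specifications


def prepare_history(series: Sequence[float], horizon: int) -> List[float]:
    """Validate a univariate history and keep only the most recent points.

    Hypothetical client-side helper, not part of any published SDK.
    """
    if not 1 <= horizon <= MAX_OUTPUT:
        raise ValueError(f"horizon must be between 1 and {MAX_OUTPUT}")
    if len(series) < PATCH_LENGTH:
        raise ValueError(
            f"need at least {PATCH_LENGTH} points of history, got {len(series)}"
        )
    # Assumption: when the series exceeds the context window, the most
    # recent MAX_CONTEXT points are the ones worth keeping.
    return list(series[-MAX_CONTEXT:])
```

For example, `prepare_history(list(range(12_000)), horizon=96)` returns the last 11,520 values, while a 50-point series raises `ValueError`.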

Capabilities

forecasting · quantile-forecasting · zero-shot · long-context

Tags

bytedance · timer · moe · billion-scale · probabilistic · quality-tier

Specifications

Parameters: 8.3B total / 0.75B active
Architecture: decoder-only sparse MoE transformer with Serial-Token Prediction (STP)
Context length: 11,520
Max context: 11,520
Minimum history: n/a
Recommended history: n/a
Input step: n/a
Required target series: 1
Temperature: Ignored
Top P: Ignored
Max output: 2,048
Avg latency: n/a
Uptime: n/a
Plan limits: 1,000 rpm free · 1,000,000 rpm with billing
Accelerator: A10G
Regions: Virginia, US
License: Apache-2.0

Pricing

Per forecast: $0.00025
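As a rough budgeting aid, the per-forecast price can be multiplied out for a planned workload. This sketch assumes a flat price with no free allowance or volume discount, and that one request counts as one forecast; neither is confirmed by this page. `Decimal` is used so the small unit price does not pick up floating-point error.

```python
from decimal import Decimal

PRICE_PER_FORECAST = Decimal("0.00025")  # USD, from the pricing listed here


def estimate_cost(num_forecasts: int) -> Decimal:
    """Estimated spend in USD, assuming a flat per-forecast price."""
    if num_forecasts < 0:
        raise ValueError("num_forecasts must be non-negative")
    return PRICE_PER_FORECAST * num_forecasts


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9,} forecasts: ${estimate_cost(n):.2f}")
```

Under these assumptions, one million forecasts come to $250.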

Performance

Average latency: n/a
Availability: n/a
Plan limits: 1,000 rpm free · 1,000,000 rpm with billing
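The plan limits set a floor on how quickly a batch job can finish. The sketch below works that out, assuming "rpm" means requests per minute and that the limit is enforced as a simple per-minute cap; the actual enforcement window is not documented here, and the function names are illustrative.

```python
import math

FREE_RPM = 1_000        # free plan limit listed on this page
BILLED_RPM = 1_000_000  # limit with billing enabled


def min_minutes(num_requests: int, rpm: int) -> int:
    """Fewest whole minutes needed to send num_requests under an rpm cap."""
    return math.ceil(num_requests / rpm)


def request_interval_seconds(rpm: int) -> float:
    """Spacing between requests that spreads them evenly across a minute."""
    return 60.0 / rpm


if __name__ == "__main__":
    backlog = 1_000_000
    print(min_minutes(backlog, FREE_RPM), "minutes on the free plan")
    print(min_minutes(backlog, BILLED_RPM), "minute with billing")
```

On the free plan, a one-million-request backlog needs at least 1,000 minutes (about 16.7 hours) before latency is even considered; pacing requests 0.06 seconds apart stays within that cap.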