
Timer-S1

Status: online
bytedance-research/Timer-S1

8.3B total / 0.75B active params | 512 context | $0.50 input | $1.50 output | Apache-2.0

Timer-S1 is ByteDance's billion-scale sparse MoE time-series foundation model, with 8.3B total parameters and 0.75B active parameters per token. It uses Serial-Token Prediction (STP) and an 11.5K context window to achieve state-of-the-art MASE and CRPS on the GIFT-Eval leaderboard, and it natively outputs 9-quantile probabilistic forecasts. Apache-2.0 licensed.
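The 9-quantile output lends itself to a median-plus-interval summary. Below is a minimal sketch of post-processing such a forecast, assuming quantile levels 0.1 through 0.9; the response shape and helper names are hypothetical illustrations, not the official Timer-S1 client API:

```python
# Hypothetical response shape: one dict per forecast step, mapping
# quantile level -> predicted value. This only illustrates how a
# 9-quantile probabilistic forecast might be post-processed.
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def summarize_forecast(quantile_forecasts):
    """Return (median, lower, upper) series, where lower/upper are the
    bounds of the central 80% prediction interval (q0.1 and q0.9)."""
    median = [step[0.5] for step in quantile_forecasts]
    lower = [step[0.1] for step in quantile_forecasts]
    upper = [step[0.9] for step in quantile_forecasts]
    return median, lower, upper

# Example: two forecast steps with widening uncertainty around the median.
steps = [
    {q: 100.0 + 10 * (q - 0.5) for q in QUANTILE_LEVELS},
    {q: 102.0 + 20 * (q - 0.5) for q in QUANTILE_LEVELS},
]
median, lower, upper = summarize_forecast(steps)
```

Here the second step's 80% interval is twice as wide as the first's, reflecting growing uncertainty with horizon.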

Model Classification

Family

Timer

Type

time series foundation model

Pretrained time-series model exposed on TSFM.ai for zero-shot or few-shot forecasting workloads.

Training Data

TimeBench, a curated corpus with one trillion time points spanning diverse domains, with data augmentation to mitigate predictive bias.

Recommended For

  • Zero-shot multivariate forecasting with causal autoregressive generation
  • Long-horizon prediction across heterogeneous time-series domains
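Causal autoregressive generation means long horizons are produced by feeding each prediction back into the context and sliding the window. A sketch of that loop, using the 512-point context length from the spec table; `predict_next` is a toy stand-in (mean of the last three points), not the actual model:

```python
# Sketch of causal autoregressive long-horizon forecasting: generated
# points are appended to the context, which is truncated to the model's
# context limit before the next step.
CONTEXT_LIMIT = 512  # context length from the spec table

def predict_next(context):
    # Placeholder one-step "model": mean of the last 3 observations.
    return sum(context[-3:]) / 3

def rollout(history, horizon, context_limit=CONTEXT_LIMIT):
    context = list(history)[-context_limit:]
    forecast = []
    for _ in range(horizon):
        y = predict_next(context)
        forecast.append(y)
        context = (context + [y])[-context_limit:]  # slide the window
    return forecast

print(rollout([1.0, 2.0, 3.0], horizon=2))
```

A real deployment would replace `predict_next` with a model call and would typically generate a chunk of steps per call rather than one.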

Strengths

  • TimeAttention unifies variable-length and multi-resolution inputs
  • Strong zero-shot performance from 260B-point pretraining

Limitations

  • Smaller model family with fewer checkpoint size options than Chronos or Moirai
  • Causal-only architecture limits suitability for bidirectional tasks like imputation

Capabilities

forecasting · quantile-forecasting · zero-shot · long-context

Tags

bytedance · timer · moe · billion-scale · probabilistic · quality-tier

Specifications

Parameters
8.3B total / 0.75B active
Architecture
decoder-only sparse MoE transformer with Serial-Token Prediction (STP)
Context length
512
Max output
1,024
Avg latency
n/a
Uptime
n/a
Plan limits
1,000 rpm free · 1,000,000 rpm with billing
Accelerator
NVIDIA GPU
Regions
Virginia, US
License
Apache-2.0

Pricing

Input / 1M tokens
$0.50
Output / 1M tokens
$1.50
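At these rates, per-request cost is linear in token counts. A quick sketch of the arithmetic, with the rates copied from the table above:

```python
# Per-million-token rates from the pricing table above.
INPUT_RATE = 0.50    # USD per 1M input tokens
OUTPUT_RATE = 1.50   # USD per 1M output tokens

def request_cost(input_tokens, output_tokens):
    """Cost in USD for a single request."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# e.g. a 512-token context producing a 1,024-token forecast:
cost = request_cost(512, 1024)
print(f"${cost:.6f}")  # → $0.001792
```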
