Emerging · 400 sources · 6,400 series · Source surface still stabilizing; live row ingestion will follow once the public feed is durable

Impermanent

Impermanent is designed to test whether benchmark wins survive once real time passes. It scores forecasts sequentially on a continuously updating GitHub activity stream, so the benchmark reflects temporal drift and the fact that future observations do not exist at training time.

What this benchmark answers

Does model performance hold up as real time passes and the data distribution shifts?

Methodology

The benchmark uses a prequential protocol: models forecast before outcomes exist, scores accumulate over time, and rankings reflect sustained performance under live temporal change rather than one-off wins on a frozen split.

Impermanent is now part of the benchmark surface, but the public leaderboard feed is still stabilizing. TSFM.ai will publish automated rankings here once upstream exposes a durable machine-readable source.

The temporal contamination problem

Static benchmarks freeze a test split at one point in time. Once that split is public, model authors can overfit to it — intentionally or not — and benchmark scores stop reflecting real-world performance. Impermanent sidesteps this by scoring forecasts before outcomes exist: the future data literally does not exist at evaluation time, so there is no split to leak.

How the prequential protocol works

Models submit forecasts on a rolling basis against a continuously updating stream of GitHub activity data across 400 repositories, four event types (issues opened, PRs merged, pushes, stargazers), and four sampling frequencies, 6,400 series in total. Scores accumulate over time, so a model that performs well early but degrades under distribution shift sees its aggregate ranking fall. This makes Impermanent a measure of sustained forecasting ability, not of one-off performance.
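The accumulation mechanics can be sketched as follows. This is a minimal illustration of prequential scoring, not the TSFM.ai implementation: the function name, the naive last-value model, and the choice of absolute error are all assumptions for the example.

```python
def prequential_score(model, stream):
    """Score a model prequentially: each forecast is made strictly before
    the outcome exists, then the realized error joins a running aggregate."""
    total_err, n = 0.0, 0
    history = []
    for actual in stream:
        forecast = model(history)            # forecast made before `actual` is revealed
        total_err += abs(forecast - actual)  # absolute error; one choice of loss
        n += 1
        history.append(actual)               # only now does the outcome enter history
    return total_err / n if n else float("nan")

# Naive last-value model on a toy activity stream (e.g. stars per week).
naive = lambda hist: hist[-1] if hist else 0.0
score = prequential_score(naive, [3, 4, 4, 6, 5, 7])
```

Because the aggregate is a running mean over the whole stream, a model that wins early but degrades under drift is pulled down by its later errors, which is exactly the property the leaderboard ranking relies on.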

How to interpret it

  • Impermanent is the right benchmark when you care about temporal drift, not just a frozen snapshot.
  • A strong live benchmark should tell you whether early leaderboard wins persist under new data.
  • Because the public interface is still settling, treat the methodology as the durable asset and the rows as an upcoming integration.

Frequently asked questions

What is the Impermanent benchmark?
Impermanent is a live benchmark for time series foundation models that tests temporal generalization. It uses a prequential protocol on continuously updating GitHub activity data, scoring models on forecasts they made before outcomes existed.
Why is Impermanent listed as an emerging benchmark?
The benchmark methodology and dataset are established, but the public machine-readable leaderboard feed is still stabilizing. TSFM.ai will publish automated rankings once the upstream source exposes a durable feed.
What is temporal contamination?
Temporal contamination occurs when benchmark test data exists at the time of model training, allowing models to overfit — intentionally or not — to the specific test split. Impermanent avoids this because future observations do not exist at evaluation time.
What data does Impermanent use?
Impermanent uses GitHub activity data from 400 repositories across 4 event types (issues, pull requests, pushes, stargazers) at 4 frequencies, producing 6,400 total time series.
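The series count is the Cartesian product of those three dimensions. A hypothetical enumeration, purely to show the arithmetic (the identifiers and frequency labels are invented, not the actual Impermanent schema):

```python
from itertools import product

# Dimensions taken from the benchmark description; names are illustrative.
repos = [f"repo_{i:03d}" for i in range(400)]
event_types = ["issues", "pull_requests", "pushes", "stargazers"]
frequencies = ["freq_a", "freq_b", "freq_c", "freq_d"]  # assumed labels

series_ids = [f"{r}/{e}/{f}" for r, e, f in product(repos, event_types, frequencies)]
n_series = len(series_ids)  # 400 * 4 * 4 = 6,400
```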

Related reading

Compare with other TSFM benchmarks

FEV Bench

How well does a model generalize zero-shot to unseen forecasting series?

GIFT-Eval

Which models stay strong across heterogeneous datasets and probabilistic settings?

BOOM

How do models behave on observability telemetry instead of academic datasets?

Sources