Building Production Forecast Pipelines with TSFMs
A practical guide to moving time series foundation models from notebooks to production-grade forecasting systems.
Getting a time series foundation model to produce impressive forecasts in a Jupyter notebook is the easy part. The hard part is wrapping that model in a system that runs reliably every day, handles messy real-world data, serves results at low latency, and alerts you when something goes wrong. This post walks through the architecture of a production forecast pipeline built on TSFMs, covering each stage from data ingestion to delivery.
Stage 1: Data Ingestion and Validation
Every pipeline starts with data. In production, your time series arrive from databases, streaming platforms, APIs, or flat file drops. The first task is ingestion with schema validation: confirm that timestamps parse correctly, values are numeric, and the expected columns are present.
Beyond schema checks, you need statistical validation. Flag series that are entirely constant (likely a dead sensor), series with sudden order-of-magnitude jumps (possible unit changes), and series that stopped updating (stale data). These checks prevent garbage from propagating downstream and producing confidently wrong forecasts.
ingest(source) -> raw_series
validate_schema(raw_series) -> checked_series
validate_statistics(checked_series) -> validated_series
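The statistical checks described above can be sketched in a few lines of pandas. This is a minimal illustration with arbitrary thresholds (the 10x jump ratio and two-day staleness window are assumptions, not recommendations); production rules should be tuned per data source.

```python
import pandas as pd

def validate_statistics(series: pd.Series, max_age_days: float = 2.0) -> list[str]:
    """Flag common data-quality problems before forecasting.

    Expects a Series with a DatetimeIndex. Returns a list of issue codes;
    an empty list means the series passed. Thresholds are illustrative.
    """
    issues = []
    values = series.dropna()
    # Dead sensor: every observation identical.
    if values.nunique() <= 1:
        issues.append("constant_series")
    # Possible unit change: consecutive points jump by 10x or more.
    ratios = (values / values.shift(1)).abs().dropna()
    if ((ratios >= 10) | (ratios <= 0.1)).any():
        issues.append("order_of_magnitude_jump")
    # Stale data: last timestamp too far in the past.
    age = pd.Timestamp.now() - series.index[-1]
    if age > pd.Timedelta(days=max_age_days):
        issues.append("stale_series")
    return issues
```

Returning issue codes rather than raising lets the pipeline quarantine bad series while the rest of the batch proceeds.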
Stage 2: Preprocessing
TSFMs are more robust to messy data than traditional statistical models, but they still benefit from clean inputs. Key preprocessing steps include:
Missing value handling. Short gaps (a few timestamps) can be forward-filled or linearly interpolated. Longer gaps may require marking the series as incomplete and restricting the context window to the contiguous segment after the gap.
Frequency alignment. If your model expects hourly data but your source provides irregular samples, you need to resample. Downsampling (e.g., tick data to minutely) involves aggregation. Upsampling requires interpolation or simply accepting a coarser frequency.
Normalization. Most TSFMs handle normalization internally (instance normalization or reversible normalization), but some pipelines benefit from explicit scaling, especially when combining series with wildly different magnitudes into batched inference calls. Libraries like GluonTS provide utilities for standardized preprocessing and data loading.
fill_gaps(validated_series) -> continuous_series
align_frequency(continuous_series, target_freq) -> aligned_series
normalize(aligned_series) -> model_ready_series
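The three preprocessing steps above compose naturally in pandas. A minimal sketch, assuming hourly alignment and z-score scaling (the `max_gap` interpolation bound is an assumed policy; many TSFMs also normalize internally, as noted above):

```python
import pandas as pd

def preprocess(series: pd.Series, target_freq: str = "h", max_gap: int = 3) -> pd.Series:
    """Fill short gaps, align to a target frequency, and z-score normalize.

    `max_gap` bounds interpolation: longer runs of NaN are left in place so
    the caller can restrict the context window to the contiguous segment
    after the gap.
    """
    # Resample irregular samples onto the model's expected grid (mean-aggregate).
    aligned = series.resample(target_freq).mean()
    # Interpolate only short gaps; longer NaN runs stay untouched.
    filled = aligned.interpolate(limit=max_gap, limit_area="inside")
    # Explicit z-score scaling for cross-series batching.
    return (filled - filled.mean()) / filled.std()
```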
Stage 3: Model Selection
Not every series deserves the same model. A production pipeline should route series to the most appropriate TSFM based on characteristics:
- Univariate, short-horizon, high volume: Chronos-Small or TimesFM offer fast inference with solid accuracy.
- Multivariate with cross-variate dependencies: Moirai's any-variate attention captures inter-series correlations.
- Domain-specific with available fine-tuning data: Lag-Llama or a fine-tuned Moirai variant can outperform general zero-shot models.
- Probabilistic requirements: If calibrated prediction intervals matter, prefer Moirai or Chronos over pure point-forecast models.
This routing can be rule-based or learned from historical forecast accuracy on each series.
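A rule-based version of this router is straightforward. The rules, their ordering, and the model identifiers below are illustrative mappings of the options discussed above, not a recommendation:

```python
def select_model(n_variates: int, horizon: int, has_finetune_data: bool,
                 needs_intervals: bool) -> str:
    """Rule-based router from series characteristics to a TSFM.

    Rule order encodes priority: cross-variate structure first, then
    fine-tuning availability, then probabilistic needs, then speed.
    """
    if n_variates > 1:
        return "moirai"            # any-variate attention for cross-series deps
    if has_finetune_data:
        return "lag-llama-finetuned"
    if needs_intervals:
        return "chronos"           # calibrated quantile forecasts
    if horizon <= 24:
        return "chronos-small"     # fast univariate short-horizon default
    return "timesfm"
```

A learned router would replace this function with a classifier trained on per-series backtest accuracy.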
Stage 4: Inference
Batching is critical for throughput. Group series by frequency and context length so they can be stacked into tensors and processed in a single GPU forward pass. Padding and masking handle length differences within a batch.
For latency-sensitive applications, consider caching model weights in GPU memory and using persistent inference servers rather than cold-starting on each request. Tools like vLLM can help manage GPU-optimized inference. ONNX export or TensorRT optimization can reduce per-series latency from hundreds of milliseconds to single-digit milliseconds.
batch(model_ready_series, batch_size=64) -> batches
for batch in batches:
    raw_forecasts = model.predict(batch, horizon=H, quantiles=[0.1, 0.5, 0.9])
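The padding and masking mentioned above can be sketched in NumPy. This is a minimal hand-rolled collator; real pipelines would typically use the model library's own batching utilities:

```python
import numpy as np

def pad_and_stack(series_list: list[np.ndarray], pad_value: float = 0.0):
    """Left-pad variable-length contexts to a common length and stack them.

    Returns (batch, mask), where mask is 1.0 for real observations and 0.0
    for padding, so attention can ignore padded positions.
    """
    max_len = max(len(s) for s in series_list)
    batch = np.full((len(series_list), max_len), pad_value, dtype=np.float32)
    mask = np.zeros((len(series_list), max_len), dtype=np.float32)
    for i, s in enumerate(series_list):
        # Left-pad so the most recent values stay rightmost, closest to the horizon.
        batch[i, max_len - len(s):] = s
        mask[i, max_len - len(s):] = 1.0
    return batch, mask
```

Left-padding keeps the most recent observations aligned at the end of every row, which is what most TSFM context windows expect.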
Stage 5: Post-Processing
Raw model outputs need transformation before they are useful. Apply inverse normalization to return forecasts to original scale. If you ran multiple models, this is where ensemble combination happens, often a simple weighted average based on recent backtest performance.
Calibration is an underappreciated step. Check whether your 90% prediction intervals actually cover 90% of recent observations. If they are systematically too narrow or too wide, apply conformal calibration: adjust quantile levels using a held-out calibration set to achieve the desired coverage.
inverse_normalize(raw_forecasts) -> scaled_forecasts
ensemble(scaled_forecasts, weights) -> combined_forecasts
calibrate_intervals(combined_forecasts, calibration_set) -> final_forecasts
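The interval calibration step can be implemented with split conformal prediction. A minimal sketch: score each calibration point by how far it falls outside its interval, then widen new intervals by the appropriate empirical quantile of those scores:

```python
import numpy as np

def conformal_widen(lo, hi, cal_lo, cal_hi, cal_actuals, target_coverage=0.9):
    """Split-conformal adjustment of a prediction interval.

    The conformity score max(lo - y, y - hi) is negative when y is inside
    its interval and positive when outside; its empirical quantile gives
    the symmetric widening needed to hit the target coverage.
    """
    scores = np.maximum(cal_lo - cal_actuals, cal_actuals - cal_hi)
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * target_coverage) / n))
    return lo - q, hi + q
```

If the model's intervals are systematically too wide, the quantile comes out negative and the adjustment tightens them instead.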
Stage 6: Delivery
Forecasts are useless if they do not reach the consumer. Delivery mechanisms depend on the use case: write to a database table for BI dashboards, return via REST API for real-time applications, push to a message queue for downstream systems, or materialize into a spreadsheet for finance teams who refuse to give up Excel.
Each delivery channel has different requirements for format (JSON, Parquet, CSV), latency (sub-second vs. daily batch), and completeness (do you include all quantiles or just the median?).
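For the REST channel, serialization might look like the sketch below. The payload schema (field names, quantile keys, the `median_only` flag) is an illustrative assumption, not a fixed API contract:

```python
import json
from datetime import datetime, timedelta

def forecast_payload(series_id: str, start: datetime, step_hours: int,
                     quantiles: dict[str, list[float]],
                     median_only: bool = False) -> str:
    """Serialize a forecast as JSON for a REST delivery channel.

    `quantiles` maps level names ("0.1", "0.5", "0.9") to value lists;
    `median_only` trims the payload for consumers that only want points.
    """
    q = {"0.5": quantiles["0.5"]} if median_only else quantiles
    horizon = len(next(iter(q.values())))
    timestamps = [(start + timedelta(hours=step_hours * i)).isoformat()
                  for i in range(horizon)]
    return json.dumps({"series_id": series_id,
                       "timestamps": timestamps,
                       "quantiles": q})
```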
Monitoring and Observability
A deployed forecast pipeline must monitor three things continuously:
Forecast accuracy. Track rolling metrics like MASE, MAPE, and Weighted Quantile Loss against actuals as they arrive. Degrade gracefully: if a model's accuracy drops below a threshold, fall back to a simpler baseline.
Data drift. Monitor input distributions for shifts that might invalidate model assumptions. Sudden changes in mean, variance, or seasonality patterns warrant investigation and may trigger anomaly detection workflows.
System health. Track inference latency, GPU utilization, queue depth, and error rates. Set alerts for anomalies in any of these signals.
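The accuracy-monitoring and fallback logic above can be sketched with MASE, which scales forecast error by a seasonal-naive baseline (values above 1 mean the model is losing to the baseline). The threshold and model names are illustrative:

```python
import numpy as np

def mase(actuals: np.ndarray, forecasts: np.ndarray,
         insample: np.ndarray, season: int = 1) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecast."""
    naive_mae = np.mean(np.abs(insample[season:] - insample[:-season]))
    return float(np.mean(np.abs(actuals - forecasts)) / naive_mae)

def choose_forecaster(rolling_mase: float, threshold: float = 1.0) -> str:
    """Degrade gracefully: fall back to the naive baseline when the TSFM's
    rolling MASE crosses the threshold."""
    return "tsfm" if rolling_mase < threshold else "seasonal_naive_baseline"
```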
Where TSFM.ai Fits
Building this pipeline from scratch requires significant engineering investment. TSFM.ai abstracts the complexity of model selection, batched inference, calibration, and monitoring behind a unified API. You send your time series and receive calibrated probabilistic forecasts, without managing GPU infrastructure, model versioning, or preprocessing logic yourself. Try it in the playground or browse available models. The goal is to let your team focus on the business decisions that forecasts enable, not on the plumbing that produces them.