Network Traffic Forecasting and Telecom Capacity Planning with TSFMs
Telecom and network operators generate some of the richest time series data on earth — CDN throughput, cell tower load, DNS query rates — and time series foundation models can forecast it without per-metric training.
Telecom networks produce time series data at a scale and granularity that few other industries match. Every router, switch, cell tower, and CDN edge node emits continuous streams of counters: bytes transferred, packets dropped, connections opened, latency percentiles, error rates. Traditionally, capacity planning teams have relied on SNMP polling, NetFlow aggregation, and hand-tuned threshold alerts to manage this data. These tools are effective for monitoring what is happening right now, but they offer limited help with the harder question: what will happen in the next hour, day, or week? Time series foundation models offer a fundamentally different approach — one that can forecast network load across thousands of metrics simultaneously, without training a separate model for each one.
The Shape of Network Traffic
Network time series have distinctive patterns that make them both predictable and challenging. CDN traffic and backbone throughput follow strong diurnal cycles driven by human activity: traffic climbs through the morning, peaks in the evening during streaming hours, and drops overnight. Layered on top of the daily rhythm are weekly patterns — weekday enterprise traffic differs sharply from weekend consumer traffic — and longer-term growth trends as subscriber bases expand and per-user bandwidth consumption increases.
Then there are the event-driven spikes. A major sporting event can double CDN throughput in a region within minutes. Election nights produce sustained surges in DNS query rates and API call volumes. Software update rollouts from major vendors create predictable but short-lived bandwidth spikes across ISP backbones. These events break simple seasonal models but follow patterns that a foundation model pretrained on diverse time series can learn to anticipate.
Cell tower load adds spatial complexity. A single tower's traffic profile depends on population density, commute patterns, nearby venues, and even weather. Rush hour shifts load from residential towers to those along transit corridors. A stadium event concentrates thousands of connections onto a handful of nearby cells. These spatial-temporal dynamics create the kind of complex, multi-factor seasonality where zero-shot forecasting excels — the model has seen enough similar patterns during pretraining to produce useful forecasts without site-specific training.
Anomaly Detection for Network Security and Reliability
Beyond forecasting, the same foundation models enable anomaly detection on network telemetry. A DDoS attack manifests as a sudden, sustained spike in inbound traffic or connection attempts that diverges sharply from the model's forecast. A link failure causes traffic to reroute, producing simultaneous anomalies across multiple metrics: throughput drops on the failed link, corresponding increases on backup paths, and latency changes across the affected network segment. BGP routing changes create abrupt shifts in traffic distribution that look like regime changes in the time series.
The forecast-residual approach — flagging observations that fall outside predicted intervals — is well-suited to these problems because it adapts to each metric's individual baseline. A traffic level that is normal for a major peering point would be anomalous for a rural cell site, and the model captures this context without manual threshold configuration. For networks handling millions of concurrent flows, this eliminates the impractical task of setting and maintaining static alert thresholds across thousands of monitoring points.
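The forecast-residual idea above reduces to a simple check once a model has produced per-metric prediction intervals. The sketch below is a minimal illustration, not a real monitoring integration: the `Forecast` container and quantile values are hypothetical stand-ins for whatever a TSFM inference endpoint returns.

```python
from dataclasses import dataclass

@dataclass
class Forecast:
    """Per-timestep prediction interval from a TSFM (e.g. 5th/95th quantiles)."""
    lower: list[float]
    upper: list[float]

def flag_anomalies(observed: list[float], fc: Forecast) -> list[int]:
    """Return indices where an observation falls outside the forecast interval.

    Each metric carries its own interval, so a traffic level that is normal
    for a major peering point is still flagged at a quiet rural cell site.
    """
    return [
        i for i, (y, lo, hi) in enumerate(zip(observed, fc.lower, fc.upper))
        if y < lo or y > hi
    ]

# Example: inbound Mbps on one interface; the spike at index 3 escapes the band.
fc = Forecast(lower=[80, 85, 90, 95], upper=[120, 125, 130, 135])
observed = [100.0, 110.0, 128.0, 400.0]  # sudden sustained spike
print(flag_anomalies(observed, fc))      # [3]
```

Because the interval comes from the forecast rather than a static threshold, the same function serves every monitoring point without per-metric tuning.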
Capacity Planning and Cost Optimization
The core value of network traffic forecasting is capacity planning. Over-provisioning wastes capital expenditure on unused bandwidth, rack space, and power. Under-provisioning risks congestion, packet loss, and SLA violations. The window between these outcomes is often narrow, and the cost of getting it wrong compounds across hundreds of network elements.
TSFMs can forecast peak loads days or weeks ahead with sufficient accuracy to inform provisioning decisions. A CDN operator can predict regional traffic peaks to pre-position content at edge nodes. A mobile operator can forecast per-tower load to schedule maintenance windows during predicted traffic troughs. An ISP can project backbone utilization growth to time capacity upgrades before congestion materializes rather than after.
At shorter horizons — 15 to 60 minutes ahead — forecasts enable predictive auto-scaling. Cloud-based network functions and CDN edge compute can scale capacity in advance of predicted demand rather than reacting to load after latency has already increased. This is the same principle behind Toto, Datadog's domain-specific infrastructure model, which is particularly relevant here given its training on real-world infrastructure telemetry at scale.
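A predictive scaling policy can be as small as one function: size the fleet to an upper forecast quantile plus headroom instead of waiting for a reactive trigger. The per-replica capacity, headroom factor, and request rates below are illustrative assumptions, not parameters from any real autoscaler.

```python
import math

def replicas_for_forecast(p90_rps: float, capacity_per_replica: float,
                          min_replicas: int = 2, headroom: float = 1.2) -> int:
    """Scale edge compute ahead of predicted demand: size the fleet to the
    p90 forecast plus headroom rather than reacting after latency rises."""
    needed = math.ceil(p90_rps * headroom / capacity_per_replica)
    return max(min_replicas, needed)

# Forecast says 48k req/s at the 90th percentile 30 minutes out;
# each replica handles ~5k req/s in this hypothetical service.
print(replicas_for_forecast(48_000, 5_000))  # -> 12
```

The same shape works for any scalable resource — edge compute instances, virtualized network functions, or cache capacity — with only the units changing.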
The cost implications are direct. Right-sizing CDN capacity based on forecasted demand rather than peak-provisioning can reduce bandwidth costs significantly. Predictive scaling of cloud-based network functions avoids both the latency penalty of reactive scaling and the waste of static over-provisioning. Transit and peering commitments can be negotiated against forecasted traffic profiles rather than conservative estimates.
Complementing Traditional Monitoring
TSFMs do not replace SNMP, NetFlow, or sFlow — they complement them. Traditional monitoring tools remain essential for real-time visibility, flow-level analysis, and protocol-specific diagnostics. What foundation models add is the temporal reasoning layer: understanding how current observations relate to historical patterns and projecting that understanding forward.
A practical integration looks like this: NetFlow collectors aggregate traffic data and feed it into a time series store. A forecasting pipeline calls the TSFM inference endpoint on a schedule — hourly for capacity planning horizons, every few minutes for auto-scaling triggers. The resulting forecasts and anomaly scores feed back into the existing monitoring dashboards and alerting systems. Model routing can direct infrastructure metrics to Toto while sending other business time series to general-purpose models like Chronos or TimesFM. For architecture details, see building production forecast pipelines.
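The model-routing step in that pipeline can be sketched as a lookup on the metric name. The prefixes, metric names, and endpoint labels below are hypothetical — a real deployment would route on whatever naming convention its time series store uses.

```python
def route_model(metric_name: str) -> str:
    """Pick an inference endpoint by metric family. The prefixes and model
    labels here are illustrative, not a real routing table."""
    infra_prefixes = ("net.", "cdn.", "cell.", "dns.")
    if metric_name.startswith(infra_prefixes):
        return "toto"       # infrastructure telemetry -> domain-specific model
    return "chronos"        # other business series -> general-purpose model

def forecast_job(series_by_metric: dict[str, list[float]]) -> dict[str, str]:
    """One scheduled pipeline tick: route each series to a model endpoint.
    A real job would POST each context window to the chosen endpoint and
    write the returned quantiles back to the time series store."""
    return {name: route_model(name) for name in series_by_metric}

batch = {
    "net.bytes_in": [1.2e9, 1.3e9, 1.1e9],
    "orders.completed": [410.0, 395.0, 430.0],
}
print(forecast_job(batch))  # {'net.bytes_in': 'toto', 'orders.completed': 'chronos'}
```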
The Cold Start Problem: New Deployments
One of the hardest problems in network capacity planning is forecasting load for new infrastructure — a newly deployed cell tower, a new CDN point of presence, or a freshly provisioned cloud region. There is no historical data to train on, and transferring models from existing sites requires assumptions about similarity that may not hold.
This is where zero-shot transfer from foundation models provides the most leverage. A TSFM pretrained on diverse network traffic patterns can produce reasonable forecasts for a new cell tower from its first few hours of data, because it has internalized the general shape of cellular traffic: the diurnal cycle, the weekday-weekend distinction, the relationship between population density and peak load. The forecasts will not be as accurate as those for a tower with months of history, but they are substantially better than the alternative of no forecast at all or a naive copy from a supposedly similar site.
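To make the cold-start intuition concrete, the sketch below scales a generic diurnal template to a new tower's first few hours of observed load. This is an illustrative baseline, not the TSFM itself — the template and the starting-hour assumption are invented for the example — but it captures the same idea: a strong prior over the shape of cellular traffic lets a few hours of data anchor a full-day forecast.

```python
import math

def cold_start_forecast(first_hours: list[float], horizon: int = 24) -> list[float]:
    """Illustrative cold-start baseline: scale a generic diurnal template
    to a new site's first few hours of load. A zero-shot TSFM plays a
    similar role, with a far richer prior learned from pretraining."""
    # Hypothetical cellular shape: low overnight, peak in the evening
    template = [0.5 + 0.5 * math.sin((h - 9) * math.pi / 12) for h in range(24)]
    n = len(first_hours)
    # Match the template's level to the observed hours (assumes the
    # series starts at hour 0, midnight local time)
    scale = sum(first_hours) / max(sum(template[:n]), 1e-9)
    return [scale * template[h % 24] for h in range(horizon)]

first_6h = [120.0, 95.0, 80.0, 70.0, 75.0, 110.0]  # Mbps, midnight onward
fc = cold_start_forecast(first_6h, horizon=12)
```

As the site accumulates history, the forecast should converge from this prior-dominated estimate toward one driven by the site's own observed pattern — which is exactly the behavior a pretrained model gives you automatically as its context window fills.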
Public network traffic datasets like the CAIDA internet traces and the Abilene backbone dataset provide additional context for evaluating model performance on network-specific patterns, though real production telemetry inevitably differs from research captures.
Getting Started
Network operators can begin experimenting with TSFM-based forecasting without overhauling their monitoring stack. Export a representative set of traffic metrics from your existing monitoring platform, explore model behavior in the playground, and compare forecast accuracy against your current approach. The model catalog lists available models with their context lengths and output characteristics — GPU optimization for inference scaling becomes relevant as you move from experimentation to production deployment across thousands of monitored interfaces.