Agricultural Commodity Forecasting: When Zero-Shot TSFMs Beat Futures-Based USDA Baselines
A new agricultural price forecasting study compares 17 methods on USDA ERS commodity prices from 1997-2025. The surprising result: zero-shot time series foundation models (TSFMs) take the top five monthly forecasting ranks, and Time-MoE beats futures-based USDA benchmarks on wheat and corn despite using only historical cash prices.
Most time series foundation model benchmarks live in familiar territory: electricity load, traffic sensors, weather stations, retail demand, and synthetic pattern collections. A new agricultural forecasting study asks a more operational question: can zero-shot TSFMs forecast commodity prices well enough to challenge the expert systems used by agricultural agencies?
The paper, "The Promise of Time-Series Foundation Models for Agricultural Forecasting," tests 17 forecasting approaches on monthly USDA commodity prices from 1997 through 2025. The dataset covers corn, soybeans, wheat, and cotton. The methods include traditional statistical models, machine learning models, deep learning models trained from scratch, and five TSFMs: Chronos, Chronos-2, TimesFM 2.5, Time-MoE, and Moirai-2.

The result is unusually clean: the five TSFMs take the top five positions on monthly price forecasting. Time-MoE leads overall, followed by Chronos, Chronos-2, TimesFM 2.5, and Moirai-2. More strikingly, when the authors aggregate monthly forecasts into Marketing Year Average price forecasts and compare them with USDA ERS season-average price forecasts, TSFMs beat the USDA benchmark on three of four commodities in the recent evaluation window.
This is not a replacement for USDA forecasting. It is a strong signal that zero-shot temporal pretraining can extract useful commodity-price structure from short historical sequences that traditional and from-scratch models miss.
# Why This Benchmark Is Hard
Agricultural commodity prices are a deliberately awkward test case for TSFMs.
The series are short. Monthly prices from 1997 to 2025 provide only a few hundred observations per commodity, not the thousands or millions that deep learning models usually want. The data also includes structural breaks: the 2008 financial crisis, the 2012 drought, COVID-era disruptions, inflation shocks, and changing biofuel, trade, and supply-chain dynamics.
The target is operationally meaningful. USDA ERS publishes season-average price forecasts, also called Marketing Year Average (MYA) prices. These are not just next-month forecasts. They aggregate monthly prices across the crop marketing year using marketing percentages that reflect when farmers typically sell the crop. For corn and soybeans, the marketing year runs September through August; for wheat, June through May; for cotton, August through July.
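The aggregation step is easy to state in code. The sketch below is a minimal illustration, assuming the MYA is a weighted average of the twelve monthly prices with marketing percentages as weights. The function name, the prices, and the weights are hypothetical and are not taken from the paper or from USDA.

```python
def marketing_year_average(monthly_prices, marketing_weights):
    """Weighted average of monthly prices over one marketing year."""
    if len(monthly_prices) != len(marketing_weights):
        raise ValueError("need one weight per month")
    total_weight = sum(marketing_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise so the weights act as shares of the year's marketings.
    weighted_sum = sum(p * w for p, w in zip(monthly_prices, marketing_weights))
    return weighted_sum / total_weight


# Hypothetical corn marketing year, September through August.
corn_prices = [4.10, 4.05, 4.20, 4.30, 4.35, 4.40, 4.45, 4.50, 4.55, 4.50, 4.40, 4.30]
# Illustrative shares only (they sum to 100): heavier selling near harvest.
corn_weights = [8, 14, 12, 8, 10, 7, 7, 6, 6, 7, 7, 8]

mya = marketing_year_average(corn_prices, corn_weights)
print(f"Marketing Year Average: {mya:.3f}")
```

Because the weights are uneven, an error in a heavily marketed month moves the MYA more than the same error in a lightly marketed month, which is one reason monthly accuracy and MYA accuracy can rank models differently.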
The comparison is also asymmetric. USDA's benchmark forecast uses a futures-plus-basis methodology, incorporating forward-looking futures market information. The TSFMs in the study use historical cash prices only. If a zero-shot TSFM beats that benchmark, it is not because it had more market information. It is because it extracted useful temporal structure from the historical price path.
# The Monthly Forecasting Result
The first evaluation is monthly price forecasting across 768 forecast instances. Here, the TSFMs dominate the model ranking:

| Rank | Model | Monthly MAE | Model type |
|---|---|---|---|
| 1 | Time-MoE | 0.693 | Foundation |
| 2 | Chronos | 0.734 | Foundation |
| 3 | Chronos-2 | 0.736 | Foundation |
| 4 | TimesFM 2.5 | 0.736 | Foundation |
| 5 | Moirai-2 | 0.751 | Foundation |

That ordering matters less than the grouping. All five foundation models outrank the traditional, machine learning, and deep learning baselines. Deep learning models trained from scratch perform poorly because each commodity history is short: a few hundred monthly observations is too little to fit them well. Foundation models arrive with temporal priors learned elsewhere, then apply them zero-shot.
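For readers who want the metric and the simplest baseline pinned down, here is a minimal sketch of mean absolute error and a seasonal-naive forecaster for monthly data. The function names and the toy series are assumptions for illustration, and the paper's exact evaluation protocol may differ.

```python
def seasonal_naive(history, horizon, season_length=12):
    """Forecast each future month with the value from the same month one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Cycle through the last observed season for horizons longer than one year.
    return [last_season[h % season_length] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy monthly series: two years of made-up prices, then three held-out months.
history = [4.0 + 0.1 * (m % 12) for m in range(24)]
actual_next = [4.05, 4.12, 4.18]
forecast = seasonal_naive(history, horizon=3)
print(mean_absolute_error(actual_next, forecast))
```

A baseline this simple is worth keeping in any comparison because it costs nothing to run and gives the other scores a floor to be judged against.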
The commodity-specific picture is more nuanced:
- Time-MoE leads on corn and wheat.
- Chronos leads on soybeans.
- Seasonal naive is strongest on cotton in parts of the evaluation.
- No single model dominates every commodity.
That last point is worth keeping. Agricultural forecasting is not one distribution. Corn and wheat behave differently from soybeans, and cotton behaves differently again. A model routing strategy is more natural than declaring a universal winner.
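A routing strategy of this kind can be as small as a lookup table built from held-out errors. The sketch below assumes you already have a validation MAE per commodity and model. The function name and every number in it are placeholders for illustration, not results from the paper.

```python
def build_routing_table(validation_mae):
    """Map each commodity to the model with the lowest validation MAE."""
    routing = {}
    for commodity, scores in validation_mae.items():
        if not scores:
            raise ValueError(f"no scores for {commodity}")
        # min over model names, compared by their MAE.
        routing[commodity] = min(scores, key=scores.get)
    return routing


# Placeholder validation errors; replace with your own backtest results.
validation_mae = {
    "corn": {"time_moe": 0.30, "chronos": 0.34, "seasonal_naive": 0.41},
    "soybeans": {"time_moe": 0.52, "chronos": 0.47, "seasonal_naive": 0.60},
    "cotton": {"time_moe": 0.09, "chronos": 0.10, "seasonal_naive": 0.08},
}

routing = build_routing_table(validation_mae)
print(routing)
```

With only a handful of validation points per commodity, a hard winner-takes-all rule like this can flip from one backtest to the next, so it is safer to rebuild the table periodically than to treat it as fixed.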
# The USDA Comparison
The second evaluation is the more interesting one for practitioners. The authors convert monthly forecasts into MYA forecasts and compare them with USDA ERS season-average price forecasts for marketing years 2017-2024, excluding the COVID shock year.

The headline results:

| Commodity | USDA MAE | Best evaluated model | Model MAE | Improvement vs. USDA |
|---|---|---|---|---|
| Corn | 0.328 | Time-MoE | 0.267 | +18.5% |
| Soybeans | 0.396 | TimesFM | 0.369 | +6.9% |
| Wheat | 1.201 | Time-MoE | 0.542 | +54.9% |
| Cotton | 0.045 | Seasonal naive | 0.094 | -110.5% |

The wheat result is the standout. Time-MoE cuts MAE by more than half relative to USDA in the study's recent evaluation window. Corn is also meaningfully better. Soybeans show a smaller edge for TimesFM. Cotton is the counterexample: USDA remains much better, and all evaluated models struggle.
That pattern is more useful than a single accuracy headline. It suggests TSFMs can be valuable complements to expert forecasts, especially where historical price dynamics contain structure not fully captured by futures-plus-basis methods. It also shows where agency forecasts still dominate.
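The improvement column in the table can be sanity-checked with one line of arithmetic. The sketch below assumes improvement means the relative reduction in MAE against USDA, which the article does not define explicitly. The function name is hypothetical, and the inputs are the rounded MAE values from the table.

```python
def improvement_vs_usda(usda_mae, model_mae):
    """Percentage reduction in MAE relative to USDA; negative means the model is worse."""
    if usda_mae <= 0:
        raise ValueError("USDA MAE must be positive")
    return 100.0 * (usda_mae - model_mae) / usda_mae


# (USDA MAE, best model MAE) pairs, copied as rounded values from the table.
table = {
    "corn": (0.328, 0.267),
    "soybeans": (0.396, 0.369),
    "wheat": (1.201, 0.542),
    "cotton": (0.045, 0.094),
}

recomputed = {
    name: round(improvement_vs_usda(usda, model), 1)
    for name, (usda, model) in table.items()
}
print(recomputed)
```

With the rounded inputs this gives roughly 18.6, 6.8, 54.9, and -108.9 percent, against published values of +18.5, +6.9, +54.9, and -110.5. The gaps are what you would expect if the published percentages were computed from unrounded MAEs, and the cotton gap is the largest because its denominator is so small.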
# Why Time-MoE Fits This Domain
Time-MoE is a sparse mixture-of-experts model. Instead of running every parameter for every input, a router activates selected experts for a given token. That architecture is a good match for heterogeneous forecasting corpora because different experts can specialize in different temporal regimes.
Commodity prices are regime-heavy. A quiet marketing year, a drought year, and a trade-shock year do not look like variations of the same smooth seasonal curve. An MoE model can, in principle, route different contexts toward different expert sub-networks. The paper does not prove that the router is choosing economically meaningful regimes, but Time-MoE's performance on wheat and corn is consistent with the idea that sparse specialization helps when the data-generating process changes.
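To make the routing idea concrete, here is a generic top-k gating sketch in plain Python. It is not Time-MoE's actual implementation, and details such as where the softmax is applied or whether a shared expert exists may differ there. The function names and toy experts are assumptions for illustration.

```python
import math


def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected logits only, shifted by the max for stability.
    peak = max(router_logits[i] for i in top)
    exp_scores = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exp_scores)
    return [(i, s / total) for i, s in zip(top, exp_scores)]


def moe_output(token, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](token) for i, w in top_k_route(router_logits, k))


# Four toy "experts" standing in for feed-forward sub-networks.
experts = [lambda x: x, lambda x: 2 * x, lambda x: -x, lambda x: x + 1]
router_logits = [0.1, 2.0, -1.0, 2.0]

print(top_k_route(router_logits, k=2))
print(moe_output(3.0, experts, router_logits, k=2))
```

Only two of the four experts are evaluated for this input, which is the sparsity the paragraph above describes: capacity grows with the number of experts while per-token compute stays tied to k.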
Chronos, Chronos-2, TimesFM, and Moirai-2 also do well, which means the result is not just a Time-MoE story. The stronger claim is that pretrained TSFMs, as a class, beat models trained from scratch when the target domain has short histories and recurring but unstable structure.
# What This Means for Forecasting Systems
The operational takeaway is not "replace USDA forecasts with TSFMs." The study's authors are careful about scope, and practitioners should be too. The USDA benchmark includes institutional knowledge, futures prices, and a transparent methodology. It is optimized for public reporting and policy use, not just backtest accuracy.
The stronger architecture is an ensemble:

- Use agency forecasts as a high-information baseline. USDA ERS forecasts incorporate market and domain information that historical-price-only TSFMs do not see.
- Run multiple TSFMs as independent zero-shot signals. Time-MoE, Chronos, Chronos-2, TimesFM, and Moirai-2 produce different errors across commodities. Their disagreement is useful.
- Route by commodity and horizon. Wheat and corn may justify Time-MoE as a primary signal. Soybeans may prefer TimesFM or Chronos depending on the evaluation split. Cotton should likely weight USDA much more heavily until a TSFM demonstrates robust gains.
- Use probabilistic intervals where available. The paper evaluates point forecasts for comparability with USDA. In production, prediction intervals matter because price-risk decisions need downside and upside scenarios, not only means.
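
One way to combine these signals is a per-commodity blend of the agency forecast with the median TSFM forecast, reporting the TSFM spread alongside it. This is a sketch of one possible design, not something the paper proposes. The function name, the weight, and all forecast values are hypothetical.

```python
from statistics import median


def blend_forecast(usda_forecast, tsfm_forecasts, usda_weight):
    """Blend a USDA forecast with the median TSFM forecast and report TSFM spread."""
    if not 0.0 <= usda_weight <= 1.0:
        raise ValueError("usda_weight must be in [0, 1]")
    if not tsfm_forecasts:
        raise ValueError("need at least one TSFM forecast")
    values = list(tsfm_forecasts.values())
    tsfm_median = median(values)
    blended = usda_weight * usda_forecast + (1.0 - usda_weight) * tsfm_median
    # Range of the TSFM forecasts as a crude disagreement signal.
    disagreement = max(values) - min(values)
    return blended, disagreement


# Made-up season-average forecasts for one commodity.
tsfm_forecasts = {"time_moe": 4.20, "chronos": 4.30, "timesfm": 4.50}
blended, disagreement = blend_forecast(4.40, tsfm_forecasts, usda_weight=0.5)
print(f"blended={blended:.2f}, disagreement={disagreement:.2f}")
```

Setting `usda_weight` close to 1.0 for cotton and lower for wheat or corn would encode the routing advice above, and a large disagreement value is a cue to widen intervals or escalate to a human analyst.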
# Limits of the Study
The study is compelling, but the limitations matter:
- The USDA comparison window is small. Marketing years 2017-2024, excluding the COVID year, leave seven annual observations per commodity. That is enough to motivate follow-up work, not enough to declare a permanent winner.
- The TSFMs use price history only. That makes the benchmark result impressive, but it also means the setup is not representative of a best-possible production system. Commodity forecasting should include futures curves, basis, stocks-to-use ratios, weather, acreage, policy, exports, and macro conditions when available.
- Cotton is a real failure case. USDA wins decisively. Any system that uses TSFMs blindly across all commodities would underperform exactly where the expert benchmark is strongest.
- Point forecasts hide risk quality. Moirai-2 and Chronos-2 can emit quantiles, but the benchmark reduces outputs to point forecasts. Future evaluations should compare calibration, CRPS, and interval coverage, especially for hedging and procurement.
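
For a sense of what such an evaluation would compute, here is a minimal sketch of pinball (quantile) loss and empirical interval coverage. The function names and toy numbers are assumptions for illustration, and nothing here comes from the paper.

```python
def pinball_loss(actual, quantile_forecast, q):
    """Quantile loss for one observation at quantile level q (0 < q < 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    diff = actual - quantile_forecast
    # Under-forecasts are penalised by q, over-forecasts by (1 - q).
    return max(q * diff, (q - 1.0) * diff)


def interval_coverage(actuals, lowers, uppers):
    """Fraction of actual values that fall inside their prediction intervals."""
    if not actuals or not (len(actuals) == len(lowers) == len(uppers)):
        raise ValueError("inputs must be non-empty and equal in length")
    hits = sum(1 for a, lo, hi in zip(actuals, lowers, uppers) if lo <= a <= hi)
    return hits / len(actuals)


# Toy check: a nominal 80% interval that covers two of four outcomes.
coverage = interval_coverage([1, 2, 3, 4], [0, 0, 4, 3], [2, 1, 5, 5])
print(coverage)
print(pinball_loss(10.0, 8.0, q=0.9))
```

Averaging pinball loss over a dense grid of quantile levels approximates CRPS up to a constant factor, so a model that emits quantiles can be scored on both calibration and sharpness. With only seven marketing years per commodity, though, coverage estimates would be very noisy.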
# Why This Matters Beyond Agriculture
Agriculture is a useful stress test for TSFMs because it combines short histories, high stakes, external shocks, and an expert benchmark with real institutional authority. If zero-shot models can add signal here, they are likely useful in other sparse industrial domains: specialty chemicals, industrial inputs, wholesale energy contracts, freight rates, and regional construction materials.
The broader lesson echoes what we have seen in retail demand planning and supply chain forecasting: pretrained temporal representations are most valuable when there are too many related series to hand-model, too little data per individual series to train deep models, and enough recurring structure for transfer learning to help.
Commodity-price forecasting will not be solved by zero-shot TSFMs alone. But this benchmark shows they deserve a seat at the table alongside econometric models, futures-based expert forecasts, and domain-specific feature pipelines.
Primary sources: Agricultural TSFM forecasting paper, USDA ERS Season-Average Price Forecasts, Time-MoE paper, Time-MoE model weights.