Improving deterministic forecasts of maximum and minimum temperature using machine learning
Abstract
Accurate forecasts of near-surface temperature are essential for heatwave and cold-wave warnings and for impact-based decision support over India. Deterministic numerical weather prediction (NWP) models show systematic, regionally varying biases that grow with lead time, so bias correction is needed to make these forecasts reliable. This study applies a multivariate machine-learning (ML) bias-correction framework to location-specific 2 m maximum (Tmax) and minimum (Tmin) temperature forecasts from the operational NWP model at the National Centre for Medium Range Weather Forecasting (NCMRWF), using data from 179 India Meteorological Department (IMD) stations for the period 2019–2024. Four ML methods were used to bias-correct the forecasts at these stations: Random Forest (RF), eXtreme Gradient Boosting (XGB), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNNs). The ML models were assessed with continuous metrics, namely mean error (ME), root mean square error (RMSE), and correlation/Taylor diagnostics. Categorical skill for extremes was assessed with the equitable threat score (ETS) and Heidke Skill Score (HSS), for Tmax ≥ 30/35 °C in MAMJ (March–June) and Tmin ≤ 10/15 °C in DJF (December–February), together with Relative Economic Value (REV). ML post-processing substantially reduces bias and error across stations and lead times. For Tmax, the RMSE improvement increases with lead time: typically ∼10–15% at Day-1, ∼20–30% by Day-5, and frequently >30–40% (locally reaching ∼50–60%) by Day-9, especially for XGB/LSTM. For Tmin, the improvements are strongest: XGB improves RMSE by ∼25–40% at Day-1, increasing to ∼40–60% by Day-7 to Day-9 at many stations. Categorical verification shows consistently higher ETS/HSS values after bias correction at most stations. Winter Tmin shows large gains for both thresholds, particularly for Tmin ≤ 15 °C.
REV analysis indicates that ML-corrected forecasts remain economically useful over a wider range of cost–loss ratios and retain value at longer lead times compared to the raw model. Overall, XGB provides the most consistent improvement across regions and metrics, RF is generally second-best, LSTM shows competitive performance, particularly for Tmax and at longer lead times, while CNN performs worst. SHAP-based analysis links the corrections to physically meaningful drivers, with Tmax corrections dominated by boundary-layer/land-surface predictors and Tmin corrections dominated by radiative and synoptic controls.
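
The verification measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (rmse_improvement_pct, contingency, ets, hss, rev) and the eight-value toy series are hypothetical and purely synthetic. The sketch assumes the textbook 2x2 contingency-table definitions of ETS and HSS and the standard cost-loss form of REV; the study's exact verification setup may differ.

```python
import numpy as np

def rmse(forecast, observed):
    """Root mean square error of a forecast series."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((f - o) ** 2)))

def rmse_improvement_pct(raw, corrected, observed):
    """Percentage RMSE reduction of the corrected forecast relative to the raw one."""
    r = rmse(raw, observed)
    return 100.0 * (r - rmse(corrected, observed)) / r

def contingency(forecast, observed, threshold, above=True):
    """2x2 table (hits, false alarms, misses, correct negatives).

    above=True  -> event is value >= threshold (e.g. Tmax >= 35 degC)
    above=False -> event is value <= threshold (e.g. Tmin <= 10 degC)
    """
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    f_evt = f >= threshold if above else f <= threshold
    o_evt = o >= threshold if above else o <= threshold
    a = int(np.sum(f_evt & o_evt))    # hits
    b = int(np.sum(f_evt & ~o_evt))   # false alarms
    c = int(np.sum(~f_evt & o_evt))   # misses
    d = int(np.sum(~f_evt & ~o_evt))  # correct negatives
    return a, b, c, d

def ets(a, b, c, d):
    """Equitable threat score: threat score adjusted for hits expected by chance."""
    n = a + b + c + d
    a_random = (a + b) * (a + c) / n
    return (a - a_random) / (a + b + c - a_random)

def hss(a, b, c, d):
    """Heidke skill score for a 2x2 contingency table."""
    return 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))

def rev(a, b, c, d, alpha):
    """Relative economic value at cost-loss ratio alpha (0 < alpha < 1).

    Assumes both events and non-events occur in the sample, so the
    hit rate and false alarm rate are defined.
    """
    n = a + b + c + d
    s = (a + c) / n   # observed base rate
    h = a / (a + c)   # hit rate
    f = b / (b + d)   # false alarm rate
    num = min(alpha, s) - f * alpha * (1 - s) + h * s * (1 - alpha) - s
    den = min(alpha, s) - s * alpha
    return num / den

# Synthetic toy series (degC), for illustration only.
observed  = [36, 34, 37, 33, 38, 32, 35, 31]
raw       = [34, 33, 35, 36, 36, 30, 33, 29]
corrected = [36, 34, 36, 34, 38, 33, 35, 32]

raw_table = contingency(raw, observed, threshold=35, above=True)
cor_table = contingency(corrected, observed, threshold=35, above=True)

print("RMSE improvement (%):", rmse_improvement_pct(raw, corrected, observed))
print("ETS raw/corrected:", ets(*raw_table), ets(*cor_table))
print("HSS raw/corrected:", hss(*raw_table), hss(*cor_table))
print("REV raw at alpha=0.5:", rev(*raw_table, alpha=0.5))
```

For a cold-event threshold such as Tmin ≤ 10 °C, the same helpers apply with above=False. Evaluating rev over a range of alpha values gives the kind of value curve used to compare raw and corrected forecasts across cost-loss ratios.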
