Keeping machine learning models sharp: Data drift in time series forecasting

Model performance often takes center stage in machine learning (ML). Yet behind every successful model is the data it relies on – a dynamic and evolving foundation that can make or break the accuracy of predictions.

One possible source of errors is data drift, which occurs when the data distribution at inference time deviates significantly from the data the model was trained on. A striking, if simplified, example would be Netflix’s rating system: imagine a shift from the like/not like/love structure to a more granular one-to-five scale. Models trained on the original data would struggle to interpret user preferences and produce flawed recommendations.

Data drift can have a costly impact if it goes unnoticed. For example, one source reported that a predictive maintenance model’s accuracy dropped from 95% to below 60% in just 18 months because changes in machine operating data went undetected.

This article expands on the issue of data drift, focusing on time series forecasting and showing how understanding and carefully managing data ensures more reliable forecasts.

Data drift in the ML life cycle

To map out the data drift challenge, we start by looking at the broader ML life cycle. The well-defined process, which you’ll likely recognize, starts with training the model on historical data with a specific distribution – the way variables are spread within a dataset. For instance, in energy forecasting, wind speeds in a local dataset might range between 4 and 10 m/s at 10 meters above ground. The model learns from these patterns.

At inference time, the trained model generates predictions based on new, incoming data, ideally with a similar distribution to the one used during training. However, in practice, the new data distribution can differ significantly from the one at training time. This is the point at which data drift can manifest itself. The model may struggle to make accurate predictions if it cannot generalize beyond what it has learned. The result: degraded performance.

Below is a visualization of two different probability distributions, with the significant difference between the training and new data distributions signaling potential drift. A common statistical measure to track drift is the Jensen-Shannon distance, which quantifies the similarity between two distributions.

Data drift: Different distributions in training vs new data
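To make this concrete, here is a minimal sketch of how the Jensen-Shannon distance can be computed for a single feature using NumPy and SciPy. The sample data and variable names are illustrative assumptions, not taken from our pipelines:

```python
# Minimal sketch: quantifying drift between training and new data with the
# Jensen-Shannon distance. The synthetic wind-speed samples are illustrative.
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
train_wind = rng.normal(loc=7.0, scale=1.5, size=10_000)  # distribution seen at training time
new_wind = rng.normal(loc=9.0, scale=2.0, size=10_000)    # shifted distribution at inference time

# Bin both samples over a shared range so the histograms are comparable,
# then normalize them into probability distributions.
bins = np.histogram_bin_edges(np.concatenate([train_wind, new_wind]), bins=30)
p, _ = np.histogram(train_wind, bins=bins, density=True)
q, _ = np.histogram(new_wind, bins=bins, density=True)
p, q = p / p.sum(), q / q.sum()

# Jensen-Shannon distance in [0, 1] (base-2 logarithm); larger values
# indicate a bigger gap between the two distributions.
distance = jensenshannon(p, q, base=2)
print(f"Jensen-Shannon distance: {distance:.3f}")
```

In practice, the distance would be tracked over time and compared against a threshold chosen for the feature in question.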

Real-world causes and consequences

Data drift can occur in various real-world scenarios. For example, in finance, fraud detection models need regular updates because fraudsters come up with new methods. Similarly, a global economic event like the COVID-19 pandemic caused significant changes in consumer behavior, affecting credit risk models.

At Dexter Energy, where we process large amounts of weather and market data to provide short-term energy trading signals, we have observed data drift in both power price and generation forecasting.

Power price forecasting

The 2021-2022 geopolitical crisis triggered sudden and extreme volatility in natural gas prices, leading to spikes in electricity costs across Europe and beyond. Forecasting models trained on historical, pre-crisis data encountered a new, steep increase in prices and struggled to make accurate predictions.
This change in gas prices exemplifies data drift, where new data distributions no longer align with the training data, affecting forecast accuracy.

Power generation forecasting

Power generation forecasts rely heavily on weather model data (numerical weather predictions, or NWP). Drift can arise from changes in the underlying NWP models.

For example, a weather model provider could change a key variable, like the definition of a wind speed feature. Consequently, data fed into a wind power forecasting model would begin to differ from the training data, which is based on the previous definition. Such changes can lead to data drift and have serious implications for model performance – unless the drift is detected and acted upon.

Innovation in data drift detection: ML monitoring tools

Common methods to detect data drift include comparing distributions using summary statistics like the mean or variance, or more advanced statistical tests. Notably, as the ML field matures, several tools have emerged to automate these techniques and offer more sophisticated solutions for detecting data drift.
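As a rough illustration of these two approaches, the sketch below compares summary statistics and runs a two-sample Kolmogorov-Smirnov test with SciPy. The data and the 0.05 significance level are assumptions chosen for demonstration only:

```python
# Rough illustration: two simple drift checks on a single feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=7.0, scale=1.5, size=5_000)  # training-time data
current = rng.normal(loc=8.5, scale=1.5, size=5_000)    # recent inference-time data

# 1) Compare summary statistics.
print(f"mean shift:     {current.mean() - reference.mean():+.2f}")
print(f"variance ratio: {current.var() / reference.var():.2f}")

# 2) Two-sample Kolmogorov-Smirnov test: a small p-value suggests the two
#    samples are unlikely to come from the same distribution.
statistic, p_value = ks_2samp(reference, current)
if p_value < 0.05:
    print(f"KS test flags drift (statistic={statistic:.3f}, p={p_value:.1e})")
else:
    print("KS test finds no significant drift")
```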

The challenge lies in selecting the right tool for your specific configuration – the Machine Learning, AI, and Data (MAD) Landscape is a good place to start.

ML monitoring tools compare a target dataset against a reference dataset and mark splits in the data – critical points where the distribution begins to diverge significantly from the reference. In the image below, inspired by an open-source tool, the split is marked by a red line:

Data drift detection with ML monitoring tool
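To illustrate the idea behind such tools (this is a sketch of the concept, not any specific tool’s API), the snippet below slides a window over a time series, computes the Jensen-Shannon distance against a reference window, and flags the first point where it exceeds a threshold. The window size and threshold are assumed values, not tuned ones:

```python
# Sketch of split detection: compare rolling windows of a time series against
# a reference window and flag the first point where the Jensen-Shannon
# distance exceeds a threshold (analogous to the red line in the figure).
import numpy as np
from scipy.spatial.distance import jensenshannon


def find_drift_split(series, window=500, threshold=0.2):
    """Return the index where drift is first flagged, or None if no drift."""
    bins = np.histogram_bin_edges(series, bins=30)
    ref_hist, _ = np.histogram(series[:window], bins=bins, density=True)
    ref_hist = ref_hist / ref_hist.sum()

    for start in range(window, len(series) - window, window):
        cur_hist, _ = np.histogram(series[start:start + window], bins=bins, density=True)
        cur_hist = cur_hist / cur_hist.sum()
        if jensenshannon(ref_hist, cur_hist, base=2) > threshold:
            return start
    return None


# Synthetic example: the distribution shifts halfway through the series.
rng = np.random.default_rng(1)
series = np.concatenate([rng.normal(7, 1.5, 5_000), rng.normal(9, 1.5, 5_000)])
print("Drift flagged at index:", find_drift_split(series))
```

Dedicated monitoring tools wrap this kind of logic in richer statistics, visualization, and alerting.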

ML monitoring tools often come with overlapping functionality but also offer unique features. Some of these features include:

  • Different types of statistical tests to identify drift, suited to specific cases;
  • Univariate or multivariate drift detection, i.e. at the level of a single feature or across multiple features;
  • Live drift visualization capabilities;
  • Root cause analysis functionality;
  • Specialized features for time series data.

Through our implementations, we’ve found that while these tools are generally useful in detecting data drift, additional care is needed to identify it at an early stage. Moreover, we do not recommend relying on automation before fully understanding a tool through careful monitoring and experimentation.

In fact, detecting issues in time-sensitive forecasts typically requires a blend of automation and human expertise; to ensure this balance, we’ve adopted a human-in-the-loop approach. My colleague Leon wrote an article detailing the reasoning behind and the application of human-in-the-loop for accurate wind power forecasts.

Know your data, trust your forecast

Is merely detecting data drift enough to ensure a well-performing model? Certainly not. The next step is to investigate the drift through a domain knowledge lens. Depending on the outcome, possible follow-up actions include retraining the model, calibration, or even temporarily removing the drifted feature. We’ll cover some of these options in a future article.

Monitoring model performance remains a must in an ML setup, especially for weather-dependent power generation forecasting. It’s worth remembering that performance degradation may not always be due to a faulty model, but may instead stem from external factors such as seasonality, larger-than-usual errors in weather forecasts, or delayed or inaccurate historical production data. In this context, data drift can serve as a valuable early warning system, flagging potential issues before they impact accuracy.

By integrating drift detection into our workflows, and constantly improving and automating our processes, we can deliver products that empower our users to make informed decisions in the volatile energy market. Get in touch if you’d like to learn more!