Concept Drift and the Impact of COVID-19 on Data Science

Yaron Haviv | April 30, 2020

Modern business applications leverage Machine Learning (ML) and Deep Learning (DL) models to analyze real-world and large-scale data, to predict or to react intelligently to events. Unlike data analysis for research purposes, models deployed in production are required to handle data at scale and often in real-time, and must provide accurate results and predictions for end-users.

In production, these models must often be agile enough to continuously handle massive streams of real-time data. At times, however, such data streams change because the environment itself has changed: consumer preferences shift, technologies evolve, catastrophic events occur. These shifts produce constantly changing patterns in the data, which ultimately degrade the predictive ability of models built, trained and tested on patterns that are suddenly no longer relevant.

Referred to as “concept drift”, this change in the meaning of incoming data streams and in what they predict is nothing new. While concept drift has always been an issue for data science, its impact has accelerated sharply and reached unprecedented levels due to the COVID-19 pandemic. It is likely to occur again as the world starts to prepare for COVID recovery and human behavior is altered once more.

The current wave of concept drift stems from the massive changes in human behavior and economic activity brought on by social distancing, self-isolation, lockdowns and other responses to the pandemic. With these responses upsetting most assumptions about market trends, fraud prediction, demand forecasts and so on, many models built to predict patterns, outcomes and behaviors are no longer viable.

Today’s concept drift in machine learning and how it impacts businesses

Nothing lasts forever — not even carefully constructed models trained using mountains of well-labeled data. Concept drift leads to the divergence of decision boundaries for new data from those of models built from earlier data. Currently, its impact on predictive models built for various applications across industries is becoming pervasive with far-reaching consequences.

For instance, there has been a sharp decline in in-store shopping and an unprecedented increase in the number and volume of items purchased online. The type of items consumers buy online has also changed — from accessories to groceries, food items and other essentials. Soon these trends will shift again as the world prepares for a return to normalcy.

ML models built for retail businesses now deliver predictions that are no longer valid. Since businesses no longer have accurate predictions to guide operational decisions, they are unable to adequately optimize supply chain activities.

Concept drift is also impacting models designed to predict fraud across various industries. For example, models were previously trained to view the purchase of a one-way flight ticket as a strong indicator of airline fraud. This is no longer the case. With the onset and spread of the coronavirus, many legitimate fliers purchased one-way tickets. It will likely take some time before one-way purchases become a valid indicator of fraud again.
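To make the one-way ticket example concrete, here is a minimal sketch of how a rule that worked on earlier data can lose its value once behavior changes. The records and counts below are invented purely for illustration, and `rule_predicts_fraud` is a hypothetical stand-in for a trained fraud model.

```python
# Hypothetical, hand-made records: (is_one_way_ticket, is_fraud).
# The counts are invented for illustration, not taken from real data.
before = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 90
after = [(True, True)] * 8 + [(True, False)] * 42 + [(False, False)] * 50


def rule_predicts_fraud(is_one_way: bool) -> bool:
    """A frozen rule learned from earlier data: one-way ticket means fraud."""
    return is_one_way


def precision(records) -> float:
    """Share of flagged bookings that really were fraudulent."""
    flagged = [is_fraud for is_one_way, is_fraud in records
               if rule_predicts_fraud(is_one_way)]
    return sum(flagged) / len(flagged) if flagged else 0.0


precision_before = precision(before)  # 8 of 10 flagged bookings are fraud
precision_after = precision(after)    # 8 of 50 flagged bookings are fraud
print(f"precision before: {precision_before:.2f}, after: {precision_after:.2f}")
```

The rule itself never changed. What changed is the relationship between the signal and the outcome, which is exactly what concept drift means.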

Insurance isn’t left out. Prior to this pandemic era, predictive models were used to evaluate various factors to ascertain customers’ risk profiles and thus arrive at pricing for various insurance policies. As a result of self-isolation and restriction of movement, along with a change in risk associated with demographics, many of these factors are no longer the predictors they used to be. In addition, there is an introduction of a previously unseen variety of data that requires new categories and labels.

Essentially, data scientists can no longer rely on historical data alone to train models and then deploy them in real-world scenarios. The ripple effect of the pandemic shows us that we need to be more agile, adaptive and leverage better strategies for keeping deployed models responsive, and making sure they provide the value they were built to provide.

How ML models are affected

Before they are deployed in real-world scenarios, AI and ML models are trained on mountains of raw data. However, there’s a catch: once deployed, even models that continue to learn and adapt are still based on the same logic they were originally built on. Models in production don’t account for variables that were absent from their training data, and they don’t factor in evolving real-world trends.

As a result, model predictions deteriorate over time and no longer serve their purpose. Models trained to predict human behavior are especially prone to such degradation in extreme situations such as the current pandemic, which has completely shifted how people spend their time and what they purchase.

Under such evolving conditions, drift detection and adaptation mechanisms are indispensable. Monitoring models to detect drift and adjust accordingly is a continuous process.

Mechanisms must be in place to track errors in an ongoing manner and enable adaptation of predictive models to rapidly changing environments while maintaining accuracy — otherwise, these models will become obsolete and may generate results that are no longer accurate or productive for the organization.
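One simple way to track errors in an ongoing manner is to compare the error rate over a recent window of predictions against the error rate seen at training time. The sketch below assumes ground-truth labels eventually arrive for each prediction. The class name `ErrorRateMonitor` and its parameters (`baseline_error`, `window`, `tolerance`) are hypothetical choices for illustration, not part of any particular MLOps product.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks recent prediction errors and flags possible drift.

    All parameter names and default values are illustrative assumptions.
    """

    def __init__(self, baseline_error: float, window: int = 100,
                 tolerance: float = 0.10):
        self.baseline_error = baseline_error  # error rate seen at training time
        self.tolerance = tolerance            # allowed increase before alarming
        self.recent = deque(maxlen=window)    # 1 = wrong, 0 = correct

    def record(self, prediction, actual) -> None:
        self.recent.append(0 if prediction == actual else 1)

    def error_rate(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def drift_suspected(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alarms.
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.error_rate() > self.baseline_error + self.tolerance


monitor = ErrorRateMonitor(baseline_error=0.05, window=20, tolerance=0.10)
for _ in range(20):
    monitor.record(prediction=1, actual=1)  # model is still right
stable = monitor.drift_suspected()
for _ in range(10):
    monitor.record(prediction=1, actual=0)  # behavior changed, model is wrong
drifted = monitor.drift_suspected()
print(stable, drifted)
```

A production setup would likely use a statistical drift test rather than a fixed tolerance, but the shape is the same: keep measuring, compare against a baseline, and raise a signal that can trigger adaptation.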

MLOps makes adaptation to new situations feasible and fast

There’s more to data science projects than creating ML models and deploying them. Monitoring and maintaining model performance is a continuous process — one that’s made easier with the adoption of MLOps. While you can continually re-label data and retrain models, this is an extremely expensive, cumbersome and time-consuming approach. 

Right now, data scientists need to leverage MLOps automation to detect, understand and reduce the impact of concept drift on production models, and to automate as much of the process as possible. Given the track record of DevOps in facilitating the rapid design and deployment of software with high degrees of visibility and quality, it makes sense for data science teams to apply MLOps to the development, deployment and management of ML models.

MLOps enables data science teams to leverage change management strategies to either:

  • Continuously update models as new data instances are received, or
  • Update models once concept or data drift is detected

With this, you can obtain new data to retrain and adjust models, even if the new data set is considerably smaller. Where possible, teams can create and engineer new data in a way that accounts for missing data.
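The two strategies above differ only in what triggers an update. A minimal sketch of that decision follows, assuming a pipeline that knows how many new rows have arrived and whether a drift detector has fired. `UpdateStrategy` and `should_retrain` are hypothetical names used for illustration.

```python
from enum import Enum


class UpdateStrategy(Enum):
    CONTINUOUS = "continuous"  # update whenever new data instances arrive
    ON_DRIFT = "on_drift"      # update only when drift has been detected


def should_retrain(strategy: UpdateStrategy, new_rows: int,
                   drift_detected: bool) -> bool:
    """Decide whether the pipeline should kick off a retraining run."""
    if strategy is UpdateStrategy.CONTINUOUS:
        return new_rows > 0
    return drift_detected


# New data has arrived, but no drift has been flagged yet.
continuous_decision = should_retrain(UpdateStrategy.CONTINUOUS, 500, False)
on_drift_decision = should_retrain(UpdateStrategy.ON_DRIFT, 500, False)
print(continuous_decision, on_drift_decision)
```

Continuous updating keeps models fresh at the cost of more training runs, while drift-triggered updating spends compute only when the monitoring signal says it is needed.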

Most importantly, MLOps automation enables teams to apply these change management strategies in fast iterations, since there’s no longer time for long-term development. The data science lifecycle must be executed in much faster cycles and this can only be achieved through automation.

Understandably, rapid iteration cycles based on small data sets may result in imperfect models. This is fine as long as such models deliver business value and can be integrated into a workable solution. Once deployed, these models can be improved over time.

Those that adapt will survive

Data science needs to adapt quickly to the fast-paced changes happening all over the world. Many businesses are currently in a tough spot, and having the right data, insights and intelligence to react quickly to the unprecedented transitions brought about by the pandemic can make or break some companies.

This is where MLOps automation can provide immense value — by enabling data scientists to track and monitor the impact of their AI applications and to be able to quickly adjust to new situations in production. Rather than creating solutions based on stored models that were trained using static data, ML teams need to design and store recipes for generating models on demand. This enables them to quickly and efficiently create new models based on fresh data, and deploy them quickly.
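As a rough illustration of storing a recipe rather than a trained model, the sketch below keeps only the instructions for building a model and regenerates it from whatever data is current. `ModelRecipe` is a hypothetical name, the "model" is a deliberately trivial recent-average forecaster, and the demand figures are invented.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class ModelRecipe:
    """Everything needed to rebuild a model, stored instead of the model."""
    name: str
    window: int  # how many of the most recent observations to train on

    def build(self, history: Sequence[float]) -> Callable[[], float]:
        recent = list(history)[-self.window:]
        if not recent:
            raise ValueError("no data to train on")
        level = mean(recent)
        return lambda: level  # stand-in model: forecast the recent average


recipe = ModelRecipe(name="daily_demand", window=3)

# Same recipe, two snapshots of (invented) demand data.
old_model = recipe.build([100, 102, 98])
new_model = recipe.build([100, 102, 98, 40, 42, 38])
print(old_model(), new_model())
```

Because the recipe is what gets versioned and stored, producing a model that reflects fresh data is a matter of rerunning `build`, which is the kind of step an automated pipeline can trigger on demand.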

Data science teams need to continuously monitor and detect concept drift to ensure the integrity of their AI models and the value they provide to the organization. MLOps speeds up development, deployment and management of models, thus enabling the creation of AI applications that can adapt rapidly to changes in the environment. Using MLOps automation, businesses can monitor and detect changes that impact their AI models, make swift changes to their AI applications and get new solutions to market faster and in a much more agile way. Those that can adapt quickly, make necessary amendments to their models to ensure accuracy and harness AI to their advantage will come out on top.

For a deep dive on how to detect and remediate concept drift in production, view our on-demand webinar: https://go.iguazio.com/mlopslive/webinar3