
Machine Learning Lifecycle

The machine learning lifecycle is the cyclical process that data science projects follow. It defines each step that an organization should follow to take advantage of machine learning and artificial intelligence (AI) to derive practical business value.

There are four major stages of a machine learning project, which must follow a particular sequence.

1. Data Collection and Preparation

Machine learning depends on data, so every ML project begins there. ML teams must get access to data sets, usually spread across a variety of locations and stored in various formats, and then transform the raw data into a usable form. The data must be organized and catalogued so that it can be analyzed.

Usually, the raw data cannot be used as-is for any number of reasons. For example:

  1. The data requires clean-up because of quality issues such as missing fields or null values.
  2. The data must be joined with reference information, or arrives encoded and must be decoded.
  3. The data must be converted to categorical or numerical values so it can be processed automatically.
  4. The data needs to be transformed with groupings or aggregations.
  5. The data is unstructured (text, JSON, image, or audio formats) and must be converted to tabular or vector formats.
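To make these preparation steps concrete, here is a minimal sketch using pandas. The column names and fill strategies are illustrative, not prescriptive; real pipelines would choose them to fit the data set.

```python
import pandas as pd

# Hypothetical raw data exhibiting the quality issues listed above.
raw = pd.DataFrame({
    "age": [34, None, 29],          # missing field
    "country": ["US", "DE", "US"],  # categorical text, needs encoding
    "amount": [120.0, 80.0, None],  # null value
})

# Clean-up: fill missing numeric fields (median for age, zero for amount).
raw["age"] = raw["age"].fillna(raw["age"].median())
raw["amount"] = raw["amount"].fillna(0.0)

# Convert categories to numeric codes so models can process them.
raw["country_code"] = raw["country"].astype("category").cat.codes

# Aggregate: total amount per country.
totals = raw.groupby("country")["amount"].sum()
```

The same logic, applied here to three rows, is what gets automated in a pipeline once data volumes grow.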


The ML lifecycle begins with manual exploratory data analysis and feature engineering on small data sets. Then, to increase the accuracy of the model, ML teams must work on larger data sets. As the data sets grow and complexity increases, it's smart to automate the process of collecting and preparing the data.

ML teams often build separate data pipelines which use stream processing, NoSQL, and containerized micro-services to create operational or real-time pipelines.

2. Model Development Pipeline

When developing models, data scientists generally go through the following process:

  1. Manually extract data from external sources
  2. Label, explore, and enrich the data to identify potential patterns and features
  3. Train and validate the model
  4. Evaluate and test the model
  5. Repeat until the model meets the business goals
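The train-validate-repeat loop in steps 3–5 can be sketched with scikit-learn. The synthetic data, the candidate parameters, and the accuracy target are all stand-ins for real project choices:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the extracted and labelled data set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train and evaluate, repeating until a (hypothetical) business goal is met.
TARGET_ACCURACY = 0.8
for C in (0.01, 0.1, 1.0):  # candidate regularization strengths
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    if accuracy >= TARGET_ACCURACY:  # stop once the goal is reached
        break
```

An ML pipeline automates exactly this kind of loop, sweeping parameters or algorithms and recording each run's evaluation results.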

This ML model development lifecycle is time-consuming, and it's just the beginning of the larger ML lifecycle. To work more efficiently, ML teams can build machine learning pipelines that automatically collect and prepare data, choose the best features, run the training with different parameters or algorithms, and evaluate models for accuracy.

ML pipelines can be triggered manually, but it's usually better to trigger them automatically when significant changes in code, parameters, or logic occur, or when drift is detected.
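A trigger policy like the one described can be as simple as the following sketch. The function name, inputs, and threshold are illustrative; a real system would wire this into its CI/CD or orchestration tooling:

```python
# Toy trigger policy: rerun the training pipeline when code or parameters
# change, or when a drift score crosses a threshold. Names and the 0.2
# threshold are illustrative assumptions, not a standard.
def should_retrain(code_changed: bool, params_changed: bool,
                   drift_score: float, drift_threshold: float = 0.2) -> bool:
    return code_changed or params_changed or drift_score > drift_threshold

# Example: no code or parameter changes, but drift detected in production data.
trigger = should_retrain(False, False, drift_score=0.35)
```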

3. Building Online ML Services

Once an ML model is working in the lab, it needs to be integrated with live data and the business application or front-end services. Deploying a model to the live environment is called moving it to production, and the process is managed with a production pipeline.

Components of production pipelines typically include:

  1. A mechanism for collecting and validating data, plus the feature engineering logic
  2. Model serving service(s)
  3. API services and/or application integration logic
  4. A service to monitor the model and its data
  5. A service to monitor resources and send alerts
  6. A service to log features, events, telemetry, and data
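Several of these components (feature validation, model serving, and event logging) can be illustrated together in one small sketch. The feature schema, function names, and stub model are all hypothetical:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-serving")

REQUIRED_FEATURES = ("age", "amount")  # illustrative feature schema

def serve(request: dict, model) -> dict:
    """Validate incoming features, run the model, and log the event."""
    missing = [f for f in REQUIRED_FEATURES if f not in request]
    if missing:
        return {"error": f"missing features: {missing}"}
    prediction = model.predict([[request[f] for f in REQUIRED_FEATURES]])
    logger.info("served prediction=%s for request=%s", prediction, request)
    return {"prediction": prediction[0]}

# A stub model stands in for a real trained model behind the API.
class StubModel:
    def predict(self, rows):
        return [1 for _ in rows]

response = serve({"age": 42, "amount": 99.0}, StubModel())
```

In production, the `serve` function would sit behind an API service, and the log lines would feed the monitoring and telemetry components.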


Overall, this is a typical production pipeline development and deployment flow:

  1. Develop production components:
  • API services and application integration logic
  • Feature collection, validation, and transformation
  • Model serving graphs
  2. Test real-time pipelines using simulated data
  3. Deploy real-time pipelines to production
  4. Monitor models and data for drift
  5. Trigger retraining of models and re-engineer data if needed
  6. Upgrade components of the pipeline (non-disruptively) if needed

4. Continuous Monitoring, Governance, and Retraining

To maintain quality and mitigate the liabilities of AI services, ML teams need to track data, code, and experiments; monitor data to detect quality problems; and monitor models to detect concept drift and declining accuracy.

Because ML models are typically deployed in very dynamic environments, ML teams need a mechanism to detect constantly changing patterns in real-world data and react quickly. Machine learning model monitoring in production is a core component of MLOps: it keeps deployed models current and predicting as accurately as possible, and ensures they deliver value long-term.
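One common way to quantify drift between training-time and production data is the population stability index (PSI). This is a sketch of one such metric, not the only way to detect drift; the thresholds of roughly 0.1 (moderate) and 0.25 (significant) are a common rule of thumb, not a standard:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare two samples' distributions; a larger PSI means more drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid division by zero and log(0) on empty bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 1_000)  # training-time distribution
shifted = rng.normal(1.0, 1.0, 1_000)   # drifted production data
psi_same = population_stability_index(baseline, baseline)
psi_drift = population_stability_index(baseline, shifted)
```

A monitoring service would compute a score like this per feature on a schedule, and a high value could feed the retraining trigger described earlier.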
