The machine learning lifecycle is an iterative, multidirectional process composed of three main phases:

- Use case assessment and data collection
- Model development and training
- Model deployment and monitoring

In this lifecycle, the second phase is the most experimental. Here, data scientists perform feature engineering to ensure that the collected raw data is transformed into its best representation for model learning. Model training can then begin.

Feature engineering and model training are intertwined and iterative, but model training can be seen as the pivotal step of the machine learning model development process. This is because feature engineering ultimately aims to enable the most effective performance for model training.

This article presents an introduction to model training, a discussion of its importance, a walk-through of how to train machine learning (ML) models during experimentation, and a conclusion on productionizing model retraining.

Model training is the process of feeding engineered data to a parametrized machine learning algorithm in order to output a model with optimal learned trainable parameters that minimize an objective function.

Let’s dissect the different parts of this definition:

- **Feeding engineered data:** The input to any ML model is data. Even the most advanced machine learning model can only be as good as the data from which it learns. The simple concept of “garbage in, garbage out” explains why feature engineering is so relevant for, and intertwined with, model training. Feature engineering should be performed with awareness of common hidden errors in the data set that can bias the training, including data leakage, where the target is indirectly represented in one or more features.
- **Parametrized machine learning algorithm:** ML algorithms are coded procedures with a set of input parameters, known as “hyperparameters”. The developer can customize the hyperparameters to tune the algorithm’s learning to the specific data set and use case. The documentation of each algorithm should highlight implementation details, including the complete set of tunable hyperparameters.
- **Model with optimal learned trainable parameters:** Machine learning algorithms have another set of parameters, known as “trainable parameters”, which correspond to the coefficients automatically learned during model training. Trainable parameters let the trained model derive an output from an unseen input at prediction time, within an expected range of accuracy. Each algorithm learns in its own specific way, so each has a unique set of trainable parameters. For example, a decision tree learns which decision variable and split threshold to use at each node, while a neural network learns the weights and biases of each layer.
- **Minimize an objective function:** An objective function defines how a machine learning model learns its trainable parameters. The model adjusts its trainable parameters so as to optimize (i.e., minimize or maximize) the value output by the objective function. Loss functions are the type of objective function most commonly used in ML training, often accompanied by a regularization term. A loss function measures how well the algorithm models the training data by computing an error between the estimated and the true output value. In gradient-based training, the higher the error, the more the trainable parameters are updated for that training iteration.
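To make the distinction between hyperparameters, trainable parameters, and the objective function concrete, here is a minimal NumPy sketch. It assumes a linear model trained with a mean squared error loss plus an L2 regularization term (ridge regression); the synthetic data and the names `objective`, `fit_ridge`, and `lam` are illustrative choices, not part of any particular library.

```python
import numpy as np

def objective(w, X, y, lam):
    """Regularized loss: mean squared error plus an L2 penalty on the weights."""
    residuals = X @ w - y
    return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

def fit_ridge(X, y, lam):
    """Return the weights that minimize `objective` (closed-form solution)."""
    n_samples, n_features = X.shape
    # Setting the gradient of the objective to zero gives
    # (X^T X + n_samples * lam * I) w = X^T y.
    A = X.T @ X + n_samples * lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

lam = 0.01                 # hyperparameter: chosen by the developer
w = fit_ridge(X, y, lam)   # trainable parameters: learned from the data

print("learned weights:", w)
print("objective at learned weights:", objective(w, X, y, lam))
```

Changing `lam` changes how the algorithm learns, while `w` is whatever the training procedure finds for that setting.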

Model training happens in multiple consecutive iterations whereby the training data, divided into batches (typically of 32 to 1024 samples), is fed to the algorithm multiple times. This allows the algorithm to learn the data’s underlying patterns.
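The iterative, batched nature of training can be sketched with mini-batch gradient descent on a linear model. This is one common way to implement such a loop, not the only one; the function name `train_minibatch`, the learning rate, the number of epochs, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def train_minibatch(X, y, batch_size=32, epochs=20, learning_rate=0.05, seed=0):
    """Fit linear-regression weights with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)   # trainable parameters, initialized to zero
    history = []               # full-data mean squared error after each epoch
    for _ in range(epochs):
        order = rng.permutation(n_samples)   # reshuffle the data every epoch
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            error = X[batch] @ w - y[batch]
            # Gradient of the batch MSE: a larger error gives a larger update.
            grad = 2.0 * X[batch].T @ error / len(batch)
            w -= learning_rate * grad
        history.append(float(np.mean((X @ w - y) ** 2)))
    return w, history

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.7])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w, history = train_minibatch(X, y)
print("learned weights:", w)
print("MSE after first and last epoch:", history[0], history[-1])
```

Each pass over the full data set is one epoch, and each batch produces one parameter update.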

Machine learning is a discipline at the intersection of artificial intelligence and computer science. We use terminology and concepts from the latter to understand ML.

- An algorithm is a coded procedure where the rules to solve the associated task are known; a machine learning algorithm specifically aims to perform pattern recognition on data to automatically learn how to solve a task where rules are unknown.
- A machine learning model is the program created by running the algorithm on data, and is used for prediction. It is often referred to as a “trained model.”

Model training aims to build the best mathematical representation of the relationship between data and a target (supervised) or among the data itself (unsupervised).

Metrics such as accuracy define how well the model has learned this representation, i.e., they report the model’s performance. The better the model performs, the greater the benefits of using it in real life. These benefits could include increased revenue, reduced costs, or improved user experience.

Investing time and resources for optimal model training means having access to the right expertise and an appropriate engineering backbone setup within a production-first approach to ML. Such an investment can prove a real differentiator for business success. In fact, leading ML-driven businesses achieve 44% higher productivity and 40% better customer experience—among other gains—than their counterparts.

The process of training ML models can be divided into four steps.

The first step is to split the data into a training set and an evaluation set. The training set is used for model training, and the evaluation set for performance evaluation of the trained model. It is essential that these sets do not intersect and that data in the evaluation set has not been seen during training, in order to ensure an unbiased performance estimate.
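A minimal sketch of such a split, using a random shuffle of row indices, is shown below. The helper name `split_indices` and the 80/20 ratio are assumptions. A plain random split is only appropriate when rows are independent, so time-ordered or grouped data generally needs a different strategy.

```python
import numpy as np

def split_indices(n_samples, eval_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into disjoint train/evaluation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_eval = int(round(n_samples * eval_fraction))
    eval_idx, train_idx = order[:n_eval], order[n_eval:]
    return train_idx, eval_idx

X = np.arange(200, dtype=float).reshape(100, 2)   # toy features
y = np.arange(100, dtype=float)                   # toy targets

train_idx, eval_idx = split_indices(len(X))
X_train, y_train = X[train_idx], y[train_idx]
X_eval, y_eval = X[eval_idx], y[eval_idx]

# Sanity check: no sample appears in both sets.
overlap = set(train_idx.tolist()) & set(eval_idx.tolist())
print(len(X_train), len(X_eval), len(overlap))   # prints: 80 20 0
```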

The second step is algorithm selection. First, we should select a simple algorithm or heuristic to use as a baseline against which to compare the final trained model’s performance.
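As an illustration of a baseline comparison, the sketch below assumes a regression task. The baseline heuristic always predicts the mean training target, and the candidate model is an ordinary least squares fit. The data is synthetic and the variable names are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true and predicted targets."""
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic regression data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_eval = X[:400], X[400:]
y_train, y_eval = y[:400], y[400:]

# Baseline heuristic: always predict the mean target seen in training.
baseline_pred = np.full(len(y_eval), y_train.mean())

# Candidate model: ordinary least squares with an intercept column.
A_train = np.column_stack([X_train, np.ones(len(X_train))])
A_eval = np.column_stack([X_eval, np.ones(len(X_eval))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
model_pred = A_eval @ coef

baseline_mse = mse(y_eval, baseline_pred)
model_mse = mse(y_eval, model_pred)
print("baseline MSE:", baseline_mse)
print("model MSE:", model_mse)
```

A trained model that cannot clearly beat this kind of baseline on the evaluation set is a signal to revisit the features or the algorithm choice.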

Then, it is common to select multiple algorithms for training, unless one specific algorithm is clearly the best fit for the use case and data. The most appropriate algorithm(s) to deploy depends on training and inference speed, costs, data size and type, available infrastructure, and desired offline performance.

Some of the most common machine learning modeling techniques are:

- Linear regression, SVM, random forest, boosted trees, and neural networks*, for supervised learning
- K-means for unsupervised learning

* For deep learning, there is a follow-up phase of “model architecture development” to define the exact layers—optionally on top of pretrained networks—to be used for the final neural network model.

The third step is hyperparameter tuning. Each algorithm has a set of default hyperparameters, which is unlikely to be the most performant for any given use case and data set. To maximize the performance of each algorithm, we perform hyperparameter tuning on a data subset before training the final model on the complete data set.

During tuning, we should evaluate each hyperparameter selection on a separate validation set, so that the evaluation set stays unseen until the final model evaluation.
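The sketch below shows one simple way to tune a hyperparameter with a held-out validation set. It assumes a ridge regression model whose only hyperparameter is the regularization strength `lam`, a small hand-picked grid of candidate values, and synthetic data; none of these choices come from a specific library.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression for MSE + lam * ||w||^2."""
    n_samples, n_features = X.shape
    A = X.T @ X + n_samples * lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data: only 3 of 20 features are informative (illustrative assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(scale=1.0, size=300)

# Three disjoint sets: training, validation (for tuning), evaluation (final check).
X_train, y_train = X[:180], y[:180]
X_val, y_val = X[180:240], y[180:240]
X_eval, y_eval = X[240:], y[240:]

candidates = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]
val_scores = {
    lam: mse(X_val, y_val, fit_ridge(X_train, y_train, lam))
    for lam in candidates
}
best_lam = min(val_scores, key=val_scores.get)

# Refit with the chosen hyperparameter on training + validation data,
# then touch the evaluation set exactly once.
X_fit = np.vstack([X_train, X_val])
y_fit = np.concatenate([y_train, y_val])
w_final = fit_ridge(X_fit, y_fit, best_lam)
eval_mse = mse(X_eval, y_eval, w_final)

print("best lam:", best_lam)
print("evaluation MSE:", eval_mse)
```

Because the evaluation set plays no role in choosing `best_lam`, its score remains an unbiased estimate of the final model's performance.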

The fourth step is model training itself: the process of fitting the tuned algorithm to the training data.

End-to-end model training is a highly experimental process that requires many iterations. For each selected algorithm, we can expect to repeat steps 3 and 4 multiple times, and to frequently update the feature set provided as input. This is why a robust and user-friendly experiment tracking tool such as MLRun, which keeps the process systematic, repeatable, and reproducible, is so important for the success of data science initiatives.

In production, we can expect to retrain the model periodically as new data comes in, to limit the impact of concept drift. Model retraining is best automated to run on a schedule, and possibly on a trigger, within the monitored end-to-end production system.
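As a rough sketch of how schedule-based and trigger-based retraining might be combined, consider the hypothetical check below. The function name `should_retrain`, the 30-day maximum age, and the 1.5x error tolerance are illustrative assumptions rather than recommended values or the API of any monitoring tool.

```python
from statistics import mean

def should_retrain(recent_errors, reference_error, tolerance=1.5,
                   days_since_training=0, max_age_days=30):
    """Return True when a scheduled or performance-based retraining is due."""
    # Schedule: the model is older than the allowed maximum age.
    too_old = days_since_training >= max_age_days
    # Trigger: the recent average error is well above the error at training time.
    degraded = (
        len(recent_errors) > 0
        and mean(recent_errors) > tolerance * reference_error
    )
    return too_old or degraded

# Recent errors close to the reference and a fresh model: no retraining needed.
print(should_retrain([0.10, 0.12], reference_error=0.10))

# Recent errors far above the reference: the trigger fires.
print(should_retrain([0.30, 0.40], reference_error=0.10))

# No monitoring data yet, but the model is 45 days old: the schedule fires.
print(should_retrain([], reference_error=0.10, days_since_training=45))
```

In a real system, a check like this would run inside the monitoring pipeline and launch the same training workflow that was used during experimentation.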

A production-first approach aims to develop the infrastructure for the complete model lifecycle first, and to push models into production fast within an agile process. This kind of approach can accelerate the end-to-end data science process by up to 12x! No surprise, then, that a production-first approach is the new paradigm for model training and prototyping.