

What Is Model Tuning in Machine Learning?

While machine learning and artificial intelligence are often used interchangeably, machine learning is actually a specialized subfield of artificial intelligence: AI systems in general operate on encoded domain knowledge, while ML algorithms learn to make predictions by extracting that knowledge directly from data.

ML can be applied with various learning techniques, the most common being supervised learning. In supervised learning, the model adjusts its trainable parameters during a training phase to fit the patterns that map features to labels; this adjustment is performed progressively by splitting the training data into batches and iterating over them for many consecutive epochs.

Crucially, all ML techniques, from supervised to reinforcement learning, rely on adjusting trainable parameters to enable learning. Each ML algorithm has a set of hyperparameters that define how this adjustment is performed, and how these hyperparameters are set dictates how well the algorithm will learn, i.e., how accurate the model will be. Setting hyperparameters is the remit of model fine-tuning, or model tuning for short.

Below, we’ll explore in detail what hyperparameters and model tuning are, explain why model tuning is important, and walk through the steps necessary to successfully tune your machine learning models.

What Is Model Tuning?

Tuning a machine learning model is the process of configuring the implementation-specific parameters that act as control knobs guiding its learning, both for the model’s structure and for its training regime.

Specifically, hyperparameters guide how the model learns its trainable parameters.

To understand model tuning, we need to clarify the difference between two types of parameters:

  • Trainable parameters are the internal values of a model that are learned from the data; they are typically saved out of the box as part of the trained model.
  • Hyperparameters are the external values of an algorithm that are configured by the user; they typically need to be saved manually for traceability, often in JSON format.

While model training focuses on learning optimal trainable parameters, model tuning focuses on finding optimal hyperparameters.

It’s particularly important to understand the difference between these two, since practitioners commonly refer to either as simply “parameters,” leaving it to context to identify the exact type, which can lead to confusion and misunderstandings.
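The distinction can be made concrete with a toy example. Below, a linear model y = w·x is fit by gradient descent: the learning rate and number of epochs are hyperparameters set by the user, while the weight w is a trainable parameter learned from the data. All names and values are illustrative, not recommendations.

```python
import json

# Hyperparameters: external values configured by the user
# (illustrative choices, not recommendations).
hyperparams = {"learning_rate": 0.1, "epochs": 200}

# Toy data following y = 2x, so the optimal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Trainable parameter: an internal value learned from the data.
w = 0.0
for _ in range(hyperparams["epochs"]):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= hyperparams["learning_rate"] * grad

print(round(w, 2))  # learned trainable parameter, close to 2.0

# Hyperparameters are not stored inside the fitted model, so persist
# them manually (e.g., as JSON) for traceability.
print(json.dumps(hyperparams))
```

Note that only `w` would be saved as part of the trained model; the JSON dump is what keeps the run reproducible.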

Each algorithm, and sometimes each implementation of an algorithm, has its own set of hyperparameters, but algorithms of the same class usually share at least a small subset of them. When developing a pipeline for model training, always refer to the algorithm’s implementation for details about its hyperparameters. We recommend reviewing the official documentation for XGBoost and LightGBM, two of the most widely used and successful implementations of tree-based algorithms, for in-depth examples.

While all hyperparameters affect the model’s learning capability, some are more influential than others, and it’s typical to tune only these for time and computational efficiency. For a neural network in TensorFlow Keras, we may want to tune:

  • Parameters such as the number of hidden units, number of layers, and activation functions, for the model’s structure
  • Parameters such as the learning rate, batch size, and number of epochs, for the model’s training regime, which for neural networks is tied to the selected optimizer
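The split above can be captured in a plain configuration dictionary, which also makes it easy to persist. This is a hedged sketch: the key names, values, and the `model_fn`/`train_fn` callables are hypothetical stand-ins for real Keras build and fit steps.

```python
# Illustrative hyperparameter configuration for a Keras-style model,
# split into structural and training-regime settings
# (names and values are assumptions, not recommendations).
hyperparams = {
    "structure": {
        "hidden_units": 64,
        "num_layers": 2,
        "activation": "relu",
    },
    "training": {
        "learning_rate": 1e-3,
        "batch_size": 128,
        "epochs": 20,
    },
}

def build_and_train(hp, model_fn, train_fn):
    """Wire the two hyperparameter groups into hypothetical
    build/train callables."""
    model = model_fn(**hp["structure"])       # structure -> model
    return train_fn(model, **hp["training"])  # regime -> training
```

Keeping structure and training regime in separate groups mirrors the two bullet points above and keeps the saved JSON self-explanatory.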

Moving beyond the algorithmic perspective, most practitioners nowadays refer to any parameter that affects model performance and can take multiple values as a hyperparameter. This also includes data processing choices, e.g., which transformations are performed or which features are used as input.

Why Is Model Tuning Important?

Just as feature engineering transforms data into its best form for learning, model tuning assigns the best settings to an algorithm for learning.

All implementations of machine learning algorithms come with a default set of hyperparameters that have been proven to typically perform well. However, relying on the defaults in a real-world application is too high a risk to take, as it is unlikely, if not impossible, that the default hyperparameter configuration will provide optimal performance for every use case.

In fact, it is well known that the performance of ML algorithms varies greatly with the hyperparameter selection. Each model and dataset combination requires its own tuning, which is particularly relevant to keep in mind for automated re-training.


What Are the Steps to Tuning Machine Learning Models?

After a data scientist selects the most appropriate algorithm for a given use case and performs the relevant feature engineering, they must determine the optimal hyperparameters for training. Even with extensive prior experience, picking them by intuition alone is unrealistic.

While it’s a good idea to hand-pick a few plausible hyperparameter selections to confirm the use case is feasible and can achieve the expected offline performance, performing extensive hyperparameter tuning by hand is inefficient, error-prone, and difficult to reproduce.

Instead, hyperparameter tuning should be automated; this is what is typically referred to as “optimization.”

During experimentation, automated tuning refers to finding the optimal hyperparameter configuration via a reproducible tuning approach. There are three steps to model tuning and optimization, covered below.

1. Select Relevant Hyperparameters and Define Their Value Ranges

The more hyperparameters are selected and the wider their value ranges, the more combinations the tuning configuration has to cover.

For example, if we define the batch size as an integer with possible values in [32, 64, 128, 256, 512, 1024], and five other hyperparameters each also have 6 possible values, there are 6^6 = 46,656 combinations.
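This combinatorial growth is easy to check directly. The search space below is illustrative; only the batch-size values come from the example above, the other hyperparameter names and ranges are assumptions.

```python
import math
from itertools import product

# Illustrative search space: batch size plus five other hyperparameters,
# each with 6 candidate values (names and ranges are assumptions).
search_space = {
    "batch_size": [32, 64, 128, 256, 512, 1024],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
    "num_layers": [1, 2, 3, 4, 5, 6],
    "hidden_units": [16, 32, 64, 128, 256, 512],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "epochs": [5, 10, 20, 40, 80, 160],
}

# Total number of grid combinations: product of the range sizes.
n_combinations = math.prod(len(v) for v in search_space.values())
print(n_combinations)  # 46656

# itertools.product can enumerate every configuration lazily.
grid = product(*search_space.values())
first = dict(zip(search_space, next(grid)))
```

Enumerating lazily matters: materializing all 46,656 configurations up front is wasteful when most tuning approaches only ever visit a fraction of them.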

Selecting all hyperparameters with exhaustive ranges is often infeasible, so an educated compromise between efficiency and completeness of the search space is always made.

2. Select the Tuning Approach and Define Its Parameters

The most common tuning approaches are:

  • Grid search: Exhaustively tries all hyperparameter combinations; its complexity grows exponentially with the number of hyperparameters, so it is rarely used in practice
  • Random search: Randomly samples the value range of each hyperparameter until a maximum threshold is reached in terms of the number of trials, running time, or utilized resources
  • Bayesian optimization: Sequentially chooses the next hyperparameter configuration to trial based on the results of the previous iterations
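As a sketch, random search over a toy search space might look like the following. The `objective` function is a stand-in for a real train-and-evaluate step, and the space and values are assumptions for illustration.

```python
import random

# Illustrative search space (assumed values, not recommendations).
search_space = {
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "batch_size": [32, 64, 128],
}

def objective(config):
    # Stand-in for training a model and returning a validation score;
    # here we simply reward one particular learning rate.
    return 1.0 if config["learning_rate"] == 1e-2 else 0.5

def random_search(space, objective, max_trials=50, seed=0):
    """Randomly sample configurations until the trial budget is spent."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(max_trials):  # threshold: maximum number of trials
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(search_space, objective)
```

The same loop structure covers grid search (iterate over all combinations instead of sampling), while Bayesian optimization would replace the random sampling with a model-driven choice of the next configuration.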

Each tuning approach comes with its own set of parameters to specify, including:

  • Optimization metric: A metric, such as validation accuracy, used to evaluate the model trained with each trialed hyperparameter configuration
  • Early stopping rounds: The number of training steps to perform without an improvement in the optimization metric before ending the trial 
  • Maximum parallel trials: The number of trials to run in parallel

This last parameter can be set to a large value for approaches with independent trials, such as grid and random search; it should be kept small for sequential approaches such as Bayesian optimization, which need the results of previous trials to choose the next configuration.
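The early-stopping-rounds parameter can be sketched as a simple counter over the metric history; the history values below are illustrative.

```python
def should_stop(metric_history, early_stopping_rounds):
    """Return True when the best metric value has not improved for
    `early_stopping_rounds` consecutive steps (higher is better)."""
    if len(metric_history) <= early_stopping_rounds:
        return False  # not enough steps observed yet
    best_so_far = max(metric_history[:-early_stopping_rounds])
    recent = metric_history[-early_stopping_rounds:]
    return max(recent) <= best_so_far  # no improvement in the window

# Validation accuracy per step: improves, then plateaus.
history = [0.60, 0.72, 0.80, 0.79, 0.80, 0.78]
print(should_stop(history, early_stopping_rounds=3))  # True
```

Real frameworks (e.g., XGBoost's `early_stopping_rounds` or Keras's `EarlyStopping` callback) implement the same idea, often with a minimum-improvement threshold added.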

3. Start the Tuning Job

This starts a series of parallel or sequential training runs, each with a specific hyperparameter selection within the allowable ranges, as dictated by the configured tuning approach.

It is fundamental to keep track of all of the runs, metadata, and artifacts collaboratively via a robust experimentation framework.

How to Productionize Model Tuning

Ideally, data scientists and machine learning engineers should agree on a productionizable tuning approach before experimentation begins. When this doesn’t happen, the tuning approach and hyperparameter selection may need to be revised for efficiency during productionization, as considerations such as re-training the same model or tuning multiple models take priority.

During productionization, automated tuning refers to setting up tuning as part of the automated re-training pipeline, often as a conditional flow alongside standard training with the last optimal hyperparameter configuration. The default flow should be to tune at each re-training run, as the data will have changed over time.
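Such a conditional flow can be sketched as follows; `tune_fn` and `train_fn` are hypothetical stand-ins for real pipeline steps, not the API of any particular framework.

```python
def retraining_pipeline(data, last_best_hyperparams, tune_fn, train_fn,
                        tune_this_run=True):
    """Conditionally tune, otherwise reuse the last optimal
    hyperparameters (tune_fn/train_fn are hypothetical stand-ins
    for real pipeline steps)."""
    if tune_this_run:
        # Default flow: data drifts over time, so re-tune each run.
        hyperparams = tune_fn(data)
    else:
        # Fallback flow: reuse the last known-good configuration.
        hyperparams = last_best_hyperparams
    model = train_fn(data, hyperparams)
    return model, hyperparams
```

Returning the hyperparameters alongside the model keeps each re-training run traceable, whichever branch was taken.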

Many tuning solutions are available, from self-managed ones like Hyperopt and skopt to managed tools like AWS SageMaker and Google Cloud’s Vizier. These solutions focus on the experimentation phase with varying degrees of traceability and ease of collaboration.

Iguazio provides a state-of-the-art tuning solution via MLRun, which is seamlessly incorporated within a unique platform that handles both experimentation and productionization following MLOps best practices with simplicity, flexibility, and scalability.