What is Model Deployment?

Model deployment is the process of putting machine learning models into production. This makes the model’s predictions available to users, developers or systems, so they can make data-driven business decisions, interact with an application (for example, one that recognizes a face in an image) and so on.

Model deployment is considered a challenging stage for data scientists. It is often not seen as their core responsibility, and there are technological and mindset differences between model development and training on one side and the organizational tech stack on the other. Requirements such as versioning, testing and scaling make deployment difficult. These organizational and technological silos can be overcome with the right model deployment frameworks, tools and processes.

The Importance of Model Deployment

Only models that are deployed to production provide business value to customers and users. According to various analyses, between 60% and 90% of models don’t make it to production. Deploying machine learning models makes them available for decision-making, predictions and insights, depending on the specific end-product.

For example, let’s say a data scientist has built a model that runs a sentiment analysis on YouTube comments. After building, debugging and training, the model achieves excellent accuracy scores and the data scientist is happy with the results. A high accuracy score is great, but while the model remains in the research environment its value is only theoretical, and the model can’t be tested on real-life data (where it might perform differently). So, even if it’s the highest performing SOTA NLP analysis model in the world, the model only provides value after it has been tested and deployed into production, where it can analyze real data.

Challenges of Machine Learning Model Deployment

There are a number of reasons model deployment is a resource-intensive and challenging process: 

  • Silos: Data scientists are focused on training and optimizing models. In many organizations, data science is siloed from the rest of the machine learning lifecycle. The ‘throw it over the wall’ approach is notorious for creating bottlenecks, duplicate work, and general chaos. The DevOps and IT teams that manage and maintain the production system are usually unfamiliar with the ML frameworks and files that the data scientists use to write and produce the models. In some cases, the models need to be re-coded, a tedious process that can take weeks.
  • Preparing for the live environment: In the lab, models are developed and trained based on existing data. However, in production, models will have to work with real (sometimes real-time) data from external sources. Usually the data needs to be processed before it is passed to the model for inference. Then, the output and predictions need to be properly consumed by the applications. These processes need to be prepared and orchestrated to ensure a smooth and successful deployment.
  • Monitoring models in production: Deployment doesn’t end once the model is in production. Models need to change continually because business needs, data and even the model’s use cases change. To keep models relevant and valuable, their performance in production needs to be evaluated, and underperforming models need to be flagged so they can be retrained and redeployed.
  • Infrastructure management: Inference on a small amount of data doesn’t require 8 GPUs, but peak traffic does require substantial computational resources to maintain high performance. Managing this requires the right processes and tools.
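
As an illustration of the monitoring challenge above, here is a minimal sketch of flagging a model for retraining when its accuracy in production drops. It assumes that ground-truth labels eventually arrive for live predictions, which is not always the case. The class name, the window size and the tolerated drop are hypothetical values chosen for the example, not part of any specific tool.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent labeled predictions (illustrative only)."""

    def __init__(self, baseline_accuracy, max_drop=0.05, window=100):
        self.baseline_accuracy = baseline_accuracy  # accuracy measured before deployment
        self.max_drop = max_drop                    # tolerated drop before flagging
        self.outcomes = deque(maxlen=window)        # True for each correct prediction

    def record(self, prediction, actual):
        """Store whether a live prediction matched the label that arrived later."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag the model once a full window falls below the tolerated accuracy."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.accuracy() < self.baseline_accuracy - self.max_drop
```

In practice the flag would feed an alert or a retraining trigger rather than being checked by hand. Accuracy is only one possible signal; monitoring the input data distribution is a common alternative when labels are delayed or unavailable.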

Automating Machine Learning Model Deployment

Automating the deployment of models helps reduce friction and improve scalability and repeatability. By using CI/CD tools and integrating them into the MLOps pipeline, data scientists can continuously train their models and retrain if drift is detected.

When automating the model deployment pipeline, it is important to verify that retraining was conducted correctly and that the outputs make sense. If the metrics show anomalies, the retrained model should probably not be deployed. So, automate with care: add alerts and triggers to your automation so that only an accurate model is deployed.
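
The check described above can be sketched as a small gate that an automated pipeline runs before promoting a retrained model. This is a hedged illustration, not a prescribed implementation: the function name `should_deploy`, the `max_regression` tolerance and the assumption that every metric is a score between 0 and 1 where higher is better are all made up for the example.

```python
def should_deploy(candidate_metrics, production_metrics, max_regression=0.01):
    """Decide whether a retrained model may replace the production model.

    Both arguments are dicts mapping a metric name to a score in [0, 1],
    where higher is better (an assumption of this sketch).
    Returns (decision, reasons); reasons lists every failed check.
    """
    reasons = []
    for name, prod_value in production_metrics.items():
        cand_value = candidate_metrics.get(name)
        if cand_value is None:
            reasons.append(f"{name}: missing from candidate metrics")
        elif not 0.0 <= cand_value <= 1.0:
            # Also catches NaN, because every comparison with NaN is False.
            reasons.append(f"{name}: value {cand_value} is outside [0, 1]")
        elif cand_value < prod_value - max_regression:
            reasons.append(f"{name}: {cand_value:.3f} regresses from {prod_value:.3f}")
    return (not reasons, reasons)


ok, reasons = should_deploy(
    candidate_metrics={"accuracy": 0.80, "f1": 0.91},
    production_metrics={"accuracy": 0.92, "f1": 0.90},
)
if not ok:
    # A real pipeline would raise an alert here instead of printing.
    print("Deployment blocked:", "; ".join(reasons))
```

Returning the reasons alongside the decision makes the alert actionable: whoever receives it can see which metric failed without rerunning the evaluation.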

Model Deployment and Iguazio

Model deployment can be a complex and time-consuming process. That’s why many ML teams turn to MLOps tools to ease the burden. MLRun is Iguazio’s open-source ML orchestration tool that, among other things, automates the deployment of real-time production pipelines.

With MLRun Serving, the ML team can work together to compose a series of steps (which can include data processing, model ensembles, model servers, post-processing steps, and so on). To see an example of how this works, check out the Advanced Model Serving Graph Notebook Example. MLRun Serving can compose complex and distributed graphs that include elements like streaming data, data/document/image processing, NLP, model monitoring and more.
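
To make the idea of composing a series of steps concrete, here is a plain-Python sketch of a three-step serving flow for a sentiment use case. It deliberately does not use MLRun’s API: `build_graph`, the step functions and the keyword-based `toy_sentiment_model` are hypothetical stand-ins that only illustrate the concept of chaining pre-processing, a model and post-processing.

```python
def preprocess(event):
    # Normalize the raw comment text before inference.
    return {"text": event["text"].strip().lower()}


def toy_sentiment_model(features):
    # Stand-in for a trained model: counts a few hard-coded keywords.
    positive = {"great", "love", "good"}
    negative = {"bad", "hate", "awful"}
    words = features["text"].split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return {"score": score}


def postprocess(prediction):
    # Turn the raw score into a response the application can consume.
    score = prediction["score"]
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"label": label, "score": score}


def build_graph(*steps):
    """Chain steps so that the output of each one feeds the next."""
    def run(event):
        for step in steps:
            event = step(event)
        return event
    return run


serve = build_graph(preprocess, toy_sentiment_model, postprocess)
print(serve({"text": "  I LOVE this video, great editing  "}))
# prints {'label': 'positive', 'score': 2}
```

Keeping each step as an independent function is what lets different team members own different steps, and lets a serving framework distribute or scale them separately.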

With MLRun you can:

  • Easily build and deploy distributed real-time computation graphs
  • Auto-scale and optimize resource utilization (which is especially important for deep learning use cases)
  • Leverage built-in model monitoring 
  • Debug in the IDE/notebook, and then deploy to production with a single command