Automating MLOps for Deep Learning: How to Operationalize DL With Minimal Effort

Guy Lecker | February 22, 2022

Operationalizing AI pipelines is notoriously complex. For deep learning applications, the challenge is even greater, due to the complexity of the data types involved. Without a holistic view of the pipeline, operationalization can take months and require extensive data science and engineering resources. In this blog post, I'll show you how to move deep learning pipelines from the research environment to production with minimal effort and without writing a single line of code. To automate the entire process, we will use MLRun, an open source MLOps orchestration framework.

This blog post is based on a talk I gave with Iguazio co-founder and CTO Yaron Haviv at the Open Data Science Conference, called “Automating MLOps for Deep Learning”, which you can watch here.

The Challenges of Operationalizing Deep Learning Models 

When developing deep learning models, the prevalent mindset among data scientists today is to begin in the research environment. Data scientists take the data (images, text, etc.) from an object store, data lake or warehouse, prepare it, train the model and evaluate it. However, this interactive and iterative process quickly breaks down when moved into production, where it becomes a convoluted mess.

This is because the production environment requires the data scientist to make their model a reliable asset of the product. Like any other aspect of the product, the model requires performance monitoring, tests and evaluation, logging of every training/retraining process, and versioning, in order to pass the best version of the model to the next team in the production pipeline. Moreover, the data is no longer conveniently processed and located as it was in the research environment, so it must be processed to be usable by the model. Predictions should run fast and at scale, so resource management is also required for the model to perform well on large amounts of data.

Today, operationalizing this process requires a large amount of resources, including data engineers, ML engineers, software developers, DevOps, data scientists, and more. The prevailing paradigm where these teams are siloed from each other only makes this process more challenging.

Instead of applying brute force and continuing in this manner, we need to change the way we address the MLOps pipeline. We can close these gaps by approaching it in a holistic manner and using one tool to orchestrate the entire pipeline. That’s why we advocate for a mindset shift: instead of developing in a research environment, make the research environment part of the production environment.

Moving From Development to Production: Considerations for Deep Learning Use Cases

Moving from development to production is of course not simply about taking the training code and the model files and serving them. Rather, it requires understanding how to do many complex operations:

  • Packaging the code with its dependencies, parameters and containers, then scripting and building it
  • Partitioning data and load balancing for distributed training of large/complex models, e.g., training the model on multiple containers and resources (like GPUs or TPUs)
  • Tuning workloads, scaling up and down so they are efficient in response time and in cost; this includes adding and releasing GPUs, running in parallel, query tuning and caching tricks
  • Making it production-proof through monitoring, logging, troubleshooting, versioning and security hardening
  • Automating the process through CI/CD, workloads, rolling upgrades, scheduled evaluations and retraining triggers, so every change is automatically pushed

Why Automate the DL Process?

The process described above is complicated and long. In the traditional AI development approach, it is completely manual. A data scientist develops on Jupyter, PyCharm or Visual Studio. Then, different stakeholders use the output to build a container on Kubernetes, scale to multiple containers, operationalize, perform integrated data engineering and deploy. In addition, adding instrumentation is required to gain visibility for monitoring. These steps are performed in silos, which creates complexity.

By automating the entire process from a piece of code to a fully working service on Kubernetes, the process becomes more efficient, fast and continuous. Automation is end-to-end: starting with code development and pushing it to a microservice, through auto-scaling, logging, monitoring, integrations, and more. Instrumentation is built-in, so tracking takes place automatically and at the application level.

Orchestrating the DL Pipeline with the Open Source Framework MLRun

MLRun is an open source project built and maintained by Iguazio for automating the MLOps pipeline with a “production-first” mindset, and is particularly suited to enable MLOps for deep learning use cases. MLRun is made up of five key components:

  1. Feature store - for building and sharing features, including their metadata.
  2. Model development - MLRun integrates seamlessly with the most widely used ML/DL frameworks, such as scikit-learn, TensorFlow, PyTorch, XGBoost and more. With a single line of code, MLRun reads and wraps your training code, providing automatic logging, profiling, versioning and distributed training.
  3. Real-time serving pipeline - for deploying data and AI pipelines with serverless technology. This is especially complicated in deep learning, because it requires deploying complex processes like NLP and image processing.
  4. Model monitoring - for monitoring, detecting data/concept drift and remediation.
  5. CI/CD for ML/DL - across code, data and models.

MLRun's automation is enabled through serverless computing. The engines take the code, turn it into a horizontally scaling service, and everything else is built into that service.

Now let's dive deep into each component and see how it enables MLOps automation, specifically for deep learning.

MLRun Component #1: The Feature Store

The feature store enables building, sharing and reusing features. More than that, it is a data transformation service for both structured and unstructured data. In addition, once a feature is deployed, it can be used in training, real-time inference and monitoring. This makes it a fast, simple and scalable solution for feature development.

The MLRun feature store provides a seamless integration with production data and model monitoring and can interact with multiple tools: APIs for data labeling and data exploration, notebooks for interactive development, monitoring tools and more.

The Feature Store for Deep Learning

Tabular data is increasingly used for DL models, and the feature store makes constructing such features very easy. Moreover, MLRun's graph computation capabilities, on top of the serverless computations, support the different transformations required for deep learning. Data can come from various sources, like an HTTP endpoint, a Kafka stream, a video stream, an S3 bucket, etc. Since MLRun is Pythonic, it supports various data structures: structured, textual, visual and even composite.

Learn how IHS Markit is using MLRun to scale NLP pipelines by processing documents with composite data.

MLRun Component #2: Model Development

MLRun is integrated with the training code, enabling resource management, distributed training, model versioning, artifact logging, experiment comparison, hyperparameter tuning and more.

When running a job, MLRun lets you specify the required amount of resources, or even a range. This allows you to track the consumption of computational resources and map them to applications, projects and services. These metrics can all be seen on an aggregated dashboard in MLRun.

MLRun Component #3: Real-time Pipelines

Enriching the data when pushing events can be done with pre-processing, building an ensemble and post-processing. However, if the processes along the pipeline are siloed, there could be a mismatch between the model and the rest of the system.

MLRun enables testing and debugging the pipeline with a simulator for the Notebook or IDE environment. Once a model is confirmed to be working, it can be deployed in a single command. The entire pipeline will then be deployed on serverless functions in production or in a test cluster, which can then be integrated with a CI system.

Deploying For Deep Learning

Distributing the workload across multiple containers is important for deep learning workloads, because they often require pre-processing data, resizing images, fetching images from an object store, extracting frames from a video feed, etc. Doing all of this on a GPU node is wasteful, since the GPUs sit idle during these CPU-bound steps.

The costs associated with deep learning can be significant. MLRun helps reduce costs by distributing the workload so that some steps in the computation graph run on containers with GPUs while others run on CPU-only containers.

MLRun Component #4: Built-in Model Monitoring

Model monitoring in production identifies and mitigates data/concept drift and supplies relevant information regarding the usage of the model and the data being sent to it. MLRun can also automatically trigger retraining. To learn more about monitoring drift for deep learning, contact us.

MLRun Component #5: CI/CD for ML/DL

Building an automated flow ensures that the flow runs every time there is a new piece of code, a parameter change, a refreshed dataset, etc. This flow includes training, testing and optimization. However, automation is not just about running the different steps in the pipeline.

Automation also requires working with versioned code, datasets, artifacts and models; building complex workflows from local or library functions; supporting pipeline engines like Kubeflow Pipelines, GitHub Actions or Jenkins; integrating with reporting tools like TensorBoard; and more.

These are all supported by MLRun, without the need to write a single line of code.

Efficient Resource Management for Deep Learning

Controlling and managing resources during automation is very important, especially when using GPUs. MLRun can orchestrate different types of instances and spot instances to lower costs throughout the automation flow, as discussed in the model development section. This includes GPU sharing, to make the most of your GPU investment and ensure that resources are not sitting idle and holding up other tasks.

Deep Learning MLRun Demo

To see an example of how to use MLRun with a deep learning application, check out the demo here. This demo showcases a mask detection application designed to slow the spread of COVID-19 in public spaces. The example demonstrates how MLRun can be added to existing training and serving code (in TensorFlow and in PyTorch) to train a model that classifies whether a person in an image is wearing a mask, using MLRun's deep learning auto-logging and distributed training, and serve it over an HTTP endpoint.

Conclusion

A comprehensive MLOps technology can help data scientists generate a significant impact, and help move deep learning models to production in a more efficient and scalable way. MLRun is an open source ML orchestration framework that enables MLOps for deep learning, through its feature store, automation capabilities, real-time pipelines, model development API and model monitoring. To see the entire presentation about MLRun and deep learning, including a video of running the demo, you can view the entire webinar here.
