How to Deploy Your Hugging Face Model to Production at Scale - MLOps Live #20, Oct 25 at 12pm ET
In any AI project, most of the complexity arises from the data: ingesting, exploring, processing, storing, and monitoring it, and more. That’s why the concept of a machine learning pipeline is so important, regardless of the use case or the particular technologies involved. Deep learning use cases are especially complex to bring to production, given the scale of data required and the critical data-related tasks that scale entails. As complexity increases, the success or failure of an AI project can easily hinge on how effectively the pipeline is designed.
Typically, deep learning operations comprise the following stages: data collection and ingestion, data preparation, model training and validation, deployment, and ongoing monitoring.
When approaching deep learning pipelines, the prevalent mindset among most enterprise ML teams today is to begin by developing in the research environment. Data scientists take the data (which can be images, video, text, etc.) from an object store, data lake or warehouse, prepare it, train the model and validate it. However, when data scientists throw the model “over the wall” to engineers, this process quickly becomes a convoluted mess.
Like any other component of the product, the model needs to be a reliable asset, with requirements such as performance monitoring, logging and scalability. These requirements mean that “beginning with the end in mind” is a critical factor in accelerating the path to production.
There are various tools that ML teams can use to develop deep learning models. Popular deep learning frameworks include TensorFlow and PyTorch.
MLRun is an open source framework, built and maintained by Iguazio, that automates and orchestrates the machine learning pipeline end to end.
MLRun is seamlessly integrated with the most popular ML/DL frameworks, like scikit-learn, TensorFlow, PyTorch, XGBoost and more. With a few lines of code, MLRun takes training code and runs it in Kubernetes, providing automatic logging, profiling, versioning and distributed training.
Deep learning pipeline management is important, especially when it comes to resource management, because deep learning models involve heavy computational workloads. ML teams working with deep learning use cases need a way to control and manage resources, especially when using GPUs. MLRun can help lower costs by orchestrating different types of instances and spot instances throughout the pipeline flow. This includes GPU sharing and scale to zero options, to make the most efficient use of GPU investments.
Check out the demo here to see an example of how to use MLRun with a deep learning application. The demo shows how MLRun can be added to existing training and serving code (in this case, TensorFlow or PyTorch) with minimal effort; to be specific, only one line of code is added.