An ML pipeline is an automated sequence of steps that takes a model from development to production. Each step runs its own code and handles a different type of processing within the pipeline.
A typical pipeline needs separate functions for collecting data sets, cleaning and preparing the data, running calculations on the data, creating feature sets, running training processes, selecting the best model from those training runs, deploying the model to an inference layer, and finally, monitoring the model.
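As a rough illustration of how such steps chain together, here is a minimal sketch in plain Python. Everything in it is hypothetical: the step functions, the shared `ctx` dictionary, the tiny in-memory data set, and the "model" (a single weight `w` in `y = w * x`) are assumptions made for the example, not part of any specific MLOps framework. A real pipeline would run each step as its own job on an orchestrator.

```python
from typing import Any, Callable, Dict, List

# Each step takes the shared context and returns it with new entries added.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def collect(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical data set of (feature, label) pairs, including one bad row.
    ctx["raw"] = [(1.0, 2.0), (2.0, 4.1), (None, 1.0), (3.0, 6.2), (4.0, 7.9)]
    return ctx


def clean(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Drop rows with missing values.
    ctx["clean"] = [(x, y) for x, y in ctx["raw"] if x is not None and y is not None]
    return ctx


def train(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # "Train" by scoring a few candidate weights with mean squared error.
    data = ctx["clean"]
    candidates = [1.0, 2.0, 3.0]
    ctx["scores"] = {
        w: sum((y - w * x) ** 2 for x, y in data) / len(data) for w in candidates
    }
    return ctx


def select_best(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Pick the candidate with the lowest error.
    ctx["best_w"] = min(ctx["scores"], key=ctx["scores"].get)
    return ctx


def deploy(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for an inference layer: expose a predict function.
    w = ctx["best_w"]
    ctx["predict"] = lambda x: w * x
    return ctx


def run_pipeline(steps: List[Step]) -> Dict[str, Any]:
    # Run the steps in order, passing the context from one to the next.
    ctx: Dict[str, Any] = {}
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_pipeline([collect, clean, train, select_best, deploy])
print(result["best_w"], result["predict"](5.0))
```

Keeping every step behind the same signature is one possible design choice: it lets steps be reordered, swapped, or run on different backends without changing the runner.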
An operational pipeline needs to run those processes in a scalable way, so it should run on a framework like Kubernetes, which lets you scale up or down based on your load. The pipeline also needs to support different frameworks. For example, you may want to run your data preparation with Spark or Dask, and for processing data in real time you may need frameworks that can read streaming data, such as Nuclio or Spark Streaming.
It is also important for a pipeline to track and capture relevant metrics and logs, so the user can easily compare different runs, identify issues within the pipeline, and find the root cause of those issues.
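To make the idea of run tracking concrete, here is a small sketch of what capturing metrics and logs per run might look like. The `RunRecord` class and the `compare_runs` and `failed_runs` helpers are hypothetical names invented for this example, and the convention that error lines start with `ERROR` is an assumption. Dedicated tracking tools store far more detail, but the principle is the same.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunRecord:
    """Metrics and log lines captured for a single pipeline run."""
    name: str
    metrics: Dict[str, float] = field(default_factory=dict)
    logs: List[str] = field(default_factory=list)

    def log(self, message: str) -> None:
        self.logs.append(message)

    def log_metric(self, key: str, value: float) -> None:
        self.metrics[key] = value


def compare_runs(runs: List[RunRecord], metric: str,
                 higher_is_better: bool = True) -> List[RunRecord]:
    # Rank the runs that reported the metric, best first.
    scored = [r for r in runs if metric in r.metrics]
    return sorted(scored, key=lambda r: r.metrics[metric], reverse=higher_is_better)


def failed_runs(runs: List[RunRecord]) -> List[RunRecord]:
    # Assumed convention: a log line starting with "ERROR" marks a failed run.
    return [r for r in runs if any(line.startswith("ERROR") for line in r.logs)]


run_a = RunRecord("run-a")
run_a.log("INFO training started")
run_a.log_metric("accuracy", 0.91)

run_b = RunRecord("run-b")
run_b.log("INFO training started")
run_b.log_metric("accuracy", 0.94)

run_c = RunRecord("run-c")
run_c.log("ERROR feature store unreachable")

all_runs = [run_a, run_b, run_c]
ranking = [r.name for r in compare_runs(all_runs, "accuracy")]
broken = [r.name for r in failed_runs(all_runs)]
print(ranking, broken)
```

With records like these, comparing runs is a sort on a metric, and finding the root cause of a failure starts from the log lines attached to the run that failed.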
Businesses seeking to generate profit with machine learning will benefit from MLOps. Businesses that are already further along on their data science journey will be more likely to see the advantages of MLOps in the near term.
Lean businesses without existing AI services can leverage MLOps to get to market faster. Open-source tools like Kubeflow and MLRun can help teams get their experiments up and running, while larger enterprises and businesses looking to accelerate the rollout of new AI services will benefit the most from a managed MLOps platform like Iguazio.