At a high level, a pipeline in ML is a way of linking the sequential components of an ML project's workflow and describing the relationships between them. In ML projects it is hard to keep track of when and where steps like data preparation, training, and monitoring take place, so a pipeline serves as a step-by-step map of what needs to happen and when. Working with pipelines makes ML projects easy to compose, share, reproduce, and stitch together.
Kubeflow Pipelines in particular is a set of services and a UI that enable users to create and manage ML pipelines. Users can write their own code or build on a large set of predefined components and algorithms contributed by teams at companies such as Google, IBM, Amazon, Microsoft, NVIDIA, Iguazio, and others.
Much like a function, which ingests inputs and parameters and produces outputs, each Kubeflow component is Python code packaged as a Docker image that executes one step of the ML pipeline.
Kubeflow launches one or more Kubernetes pods for each step in your pipeline.
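To make the function analogy concrete, here is a minimal plain-Python sketch (illustrative only, not the actual Kubeflow Pipelines SDK) of two "components" whose outputs feed the next step's inputs; a pipeline definition captures exactly this kind of wiring, except that each step would run as a containerized pod:

```python
def prep_data(raw):
    """'Data prep' component: takes raw records, produces cleaned features."""
    return [x for x in raw if x is not None]

def train(features):
    """'Training' component: consumes the prep step's output."""
    return {"model": "mean", "value": sum(features) / len(features)}

# The pipeline is just the declared wiring between steps; in Kubeflow,
# each call would instead launch a containerized step in Kubernetes pods.
features = prep_data([1, 2, None, 3])
model = train(features)
```

Because each step only depends on its declared inputs and outputs, steps can be swapped, reused, or rerun independently, which is what makes pipelines composable.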
Kubeflow can also run on Nuclio, a high-performance serverless platform that runs over Docker or Kubernetes and automates the development, operation, and scaling of code.
The benefits of using Kubeflow are:
- Scalability: Easily spin up more resources when needed, and release them when you don’t.
- Composability: Each step is independent, which simplifies the orchestration of the whole pipeline.
  - For example, each step can use a different ML-specific framework.
- Portability: Define the whole pipeline in one place and run it across environments without worrying about the underlying services.
Kubeflow Pipelines includes:
- A UI to manage and track experiments, jobs, and runs
- An engine to schedule multi-step ML workflows
- An SDK to define and manage pipelines and their components
- Notebooks to interact with the system via the SDK
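To illustrate what the scheduling engine does conceptually, here is a small Python toy (an assumption-laden sketch, not Kubeflow's actual engine) that runs named steps only after all of their declared dependencies have finished:

```python
def run_pipeline(steps, deps):
    """Run each step once, after all of its dependencies have finished.

    steps: dict mapping step name -> zero-argument callable
    deps:  dict mapping step name -> list of prerequisite step names
    """
    finished, order = set(), []
    while len(finished) < len(steps):
        # Steps whose prerequisites are all done and that haven't run yet.
        ready = [s for s in steps if s not in finished
                 and all(d in finished for d in deps.get(s, []))]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for s in ready:
            steps[s]()          # in Kubeflow, this would launch a pod
            finished.add(s)
            order.append(s)
    return order

# Hypothetical three-step workflow: train depends on prep,
# monitor depends on train.
log = []
order = run_pipeline(
    {"prep": lambda: log.append("prep"),
     "train": lambda: log.append("train"),
     "monitor": lambda: log.append("monitor")},
    {"train": ["prep"], "monitor": ["train"]},
)
```

The real engine does far more (containers, retries, caching, UI tracking), but the core idea is the same: the pipeline is a dependency graph, and the engine executes it in a valid order.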