Introducing the Platform
Welcome to the Iguazio MLOps Platform
An initial introduction to the Iguazio MLOps platform and the platform tutorials
- Platform Overview
- Data Science Workflow
- The Tutorial Notebooks
- Getting-Started Tutorial
- End-to-End Use-Case Application and How-To Demos
- Installing and Updating the MLRun Python Package
- Additional platform Resources
The Iguazio MLOps platform ("the platform") is a fully integrated and secure data science platform as a service (PaaS), which simplifies development, accelerates performance, facilitates collaboration, and addresses operational challenges. The platform incorporates the following components:
- A data science workbench that includes Jupyter Notebook, integrated analytics engines, and Python packages
- The MLRun open-source MLOps orchestration framework for ML model management with experiment tracking and pipeline automation
- Managed data and machine-learning (ML) services over a scalable Kubernetes cluster
- A real-time serverless functions framework for model serving (Nuclio)
- Integration with third-party data sources such as Amazon S3, HDFS, SQL databases, and streaming or messaging protocols
- Real-time dashboards based on Grafana
Data Science Workflow
The platform provides a complete data science workflow in a single ready-to-use platform that includes all the required building blocks for creating data science applications from research to production:
- Collect, explore, and label data from various real-time or offline sources
- Run ML training and validation at scale over multiple CPUs and GPUs
- Deploy models and applications into production with serverless functions
- Log, monitor, and visualize all your data and services
The Tutorial Notebooks
The home directory of the platform's running-user directory (/User/<running user>) contains pre-deployed tutorial Jupyter notebooks with code samples and documentation to assist you in your development, including an MLRun demos repository with end-to-end use-case applications (see the next sections).
- To view and run the tutorials from the platform, you first need to create a Jupyter Notebook service.
- The welcome.ipynb notebook and main README.md file provide the same introduction in different formats.
Start by running the getting-started tutorial to familiarize yourself with the platform and experience firsthand some of its main capabilities.
End-to-End Use-Case Application and How-To Demos
Iguazio provides full end-to-end use-case application and how-to demos that demonstrate how to use the platform, its MLRun service, and related tools to address data science requirements for different industries and implementations.
These demos are available in the MLRun demos repository.
Use the provided update demos script to get updated demos from this repository.
By default, the script retrieves the files from the latest release that matches the version of the installed mlrun package (see Installing and Updating the MLRun Python Package). The files are copied to the /v3io/users/<username>/demos directory, where <username> is the name of the running user ($V3IO_USERNAME) unless you set the -u|--user flag to another username.
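The destination-path rule described above can be sketched in Python as follows. This is only an illustration: the helper name `resolve_demos_dir` is hypothetical and is not part of the update script. It mirrors the documented behavior that the username defaults to `$V3IO_USERNAME` unless another username is given.

```python
import os
from pathlib import PurePosixPath
from typing import Optional


def resolve_demos_dir(user: Optional[str] = None) -> str:
    """Return the demos directory for a user (hypothetical helper).

    An explicit username plays the role of the script's -u|--user flag;
    otherwise the V3IO_USERNAME environment variable is used.
    """
    username = user or os.environ.get("V3IO_USERNAME")
    if not username:
        raise ValueError("No username given and V3IO_USERNAME is not set")
    # The /v3io/users/<username>/demos layout is taken from the text above.
    return str(PurePosixPath("/v3io/users") / username / "demos")


if __name__ == "__main__":
    print(resolve_demos_dir("jane"))
```

Calling `resolve_demos_dir()` without an argument falls back to the running user's name from the environment, which matches the script's default behavior as described.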
Note: Before running the script, close any open files in the demos directory.
# Get additional demos
!/User/update-demos.sh
For full usage instructions, run the script with the
End-to-End Use-Case Application Demos
|Mask detection||This demo contains three notebooks that:
1. Train and evaluate a model for detecting whether an image includes a person who is wearing a mask, by using TensorFlow Keras or PyTorch.
2. Serve the model as a serverless function at an HTTP endpoint.
3. Write an automatic pipeline that downloads a dataset of images, trains and evaluates the model, and then optimizes the model (using ONNX) and serves it.|
|Fraud prevention||This demo shows the usage of MLRun and the feature store. Fraud prevention is a particular challenge because it requires processing raw transactions and events in real time and responding quickly enough to block transactions before they occur. Consider, for example, a case where you would like to evaluate the average transaction amount. When training the model, it is common to take a DataFrame and just calculate the average. However, in real-time (online) scenarios, this average has to be calculated incrementally.|
|News Article||This demo creates an NLP pipeline that summarizes a news article and extracts keywords from it, given the article URL. It uses state-of-the-art transformer models, such as BERT, to perform these NLP tasks. Additionally, it uses MLRun's real-time inference graphs to create the pipeline. This allows for easy containerization and deployment of the pipeline on top of a production-ready Kubernetes cluster.|
|NetOps Demo: Predictive Network Operations/Telemetry||This demo shows how to build an automated machine-learning (ML) pipeline for predicting network outages based on network-device telemetry, also known as Network Operations (NetOps). The demo implements feature engineering, model training, testing, inference, and model monitoring (with concept-drift detection). The demo uses an offline/real-time metrics simulator to generate semi-random network telemetry data that is used across the pipeline.|
|Stock Prediction||This demo combines several Iguazio capabilities: model serving, the feature store, and MLRun frameworks.|
|Building Production Pipelines With AzureML and MLRun||This demo contains 3 notebooks where you:
1. Use the MLRun feature store to ingest and prepare data.
2. Create an offline feature vector (snapshot) for training.
3. Run the AzureML AutoML service as an automated step (function) in MLRun.
4. View and compare the AzureML models by using MLRun tools.
5. Build a real-time serving pipeline with multiple stages.|
|Converting existing ML code to an MLRun project||Demonstrates how to convert existing ML code to an MLRun project. The demo implements an MLRun project for taxi ride-fare prediction based on a Kaggle notebook with an ML Python script that uses data from the New York City Taxi Fare Prediction competition.|
|Running a Spark job for reading a CSV file||Demonstrates how to run a Spark job that reads a CSV file and logs the data set to an MLRun database.|
|Running a Spark job for analyzing data||Demonstrates how to create and run a Spark job that generates a profile report from an Apache Spark DataFrame based on pandas profiling.|
|Running a Spark Job with Spark Operator||Demonstrates how to use Spark Operator to run a Spark job over Kubernetes with MLRun.|
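The fraud-prevention description in the table above notes that an average must be calculated incrementally in online scenarios. A minimal sketch of that idea in plain Python follows. The `RunningMean` class is a hypothetical illustration, not the feature store's API, and it deliberately omits the time windows that a production aggregation would typically use.

```python
class RunningMean:
    """Keep an average up to date one value at a time (illustrative only)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, amount: float) -> float:
        """Fold one new transaction amount into the running average."""
        self.count += 1
        # Shift the mean toward the new value by 1/count of the difference,
        # so earlier values never need to be stored or re-read.
        self.mean += (amount - self.mean) / self.count
        return self.mean


if __name__ == "__main__":
    tracker = RunningMean()
    for amount in [10.0, 20.0, 60.0]:
        tracker.update(amount)
    print(tracker.mean)
```

Unlike calculating the average over a whole DataFrame, each update here needs only the previous mean and the count, which is what makes the calculation suitable for a stream of events.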
Installing and Updating the MLRun Python Package
The demo applications and many of the platform tutorials use MLRun — Iguazio's end-to-end open-source MLOps solution for managing and automating your entire analytics and machine-learning life cycle, from data ingestion through model development to full pipeline deployment in production.
MLRun is available in the platform via a default (pre-deployed) shared platform service (
However, to use MLRun from Python code (such as in the demo and tutorial notebooks), you also need to install the MLRun Python package (
The version of the installed package must match the version of the platform's MLRun service and must be updated whenever the service's version is updated.
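The version-matching requirement above can be checked from a notebook with a short sketch like the following. It rests on two assumptions: the package is registered under the distribution name `mlrun`, and you supply the service version yourself, because this sketch does not query the service. The helper names are hypothetical, and the exact-match comparison is one simple reading of "must match".

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "mlrun") -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_match(package_version: Optional[str], service_version: str) -> bool:
    """Compare two version strings, ignoring whitespace and a leading 'v'."""
    if package_version is None:
        return False
    normalize = lambda value: value.strip().lstrip("v")
    return normalize(package_version) == normalize(service_version)


if __name__ == "__main__":
    # "1.0.0" is a placeholder; replace it with your service's version.
    print(versions_match(installed_version(), "1.0.0"))
```

If the check returns `False`, the package is either missing or out of step with the service and needs to be installed or updated.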
The platform provides an
/User data mount.
Use the following command to run this script for the initial package installation (after creating a new Jupyter Notebook service) and whenever the MLRun service is updated (run the command for each Jupyter Notebook service):
Additional platform Resources
You can find more information and resources in the MLRun documentation:
▶ View the MLRun documentation
You might also find the following resources useful:
- Introduction video
- In-depth platform overview with a description of the steps for developing a full data science workflow from development to production
- Platform Services
- nuclio-jupyter SDK for creating and deploying Nuclio functions with Python and Jupyter Notebook
- Python SDK for management APIs: a Python SDK for controlling and performing operations on the Iguazio system via its REST API
Creating Virtual Environments in Jupyter Notebook
A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects. Virtual environments make it easy to cleanly separate projects and avoid problems with different dependencies and version requirements across components. See Creating Python Virtual Environments with Conda for step-by-step instructions for using conda to create your own Python virtual environments, which appear as custom kernels in Jupyter Notebook.
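After selecting a custom kernel, it can be useful to confirm which Python environment the notebook is actually running in. The following sketch reports the interpreter path and, when available, the active conda environment name. The `describe_environment` helper is hypothetical, and the `CONDA_DEFAULT_ENV` variable is set by conda activation, so it might be absent in a kernel that was started without activating the environment.

```python
import os
import sys


def describe_environment() -> dict:
    """Collect basic facts about the Python environment of this process."""
    return {
        # Path of the interpreter that runs the current kernel or script.
        "executable": sys.executable,
        # Root directory of the environment the interpreter belongs to.
        "prefix": sys.prefix,
        # Name of the activated conda environment, or None if not set.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

The `prefix` value is usually the most reliable indicator, because it points to the environment directory even when no conda variables are set.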
Updating the Tutorial Notebooks
You can use the provided igz-tutorials-get.sh script to get updated platform tutorials from the tutorials GitHub repository. By default, the script retrieves the files from the latest release that matches the current platform version. For details, see the update-tutorials.ipynb notebook.
The v3io Directory
The v3io directory that you see in the file browser of the Jupyter UI displays the contents of the v3io data mount for browsing the platform data containers.
For information about the platform's data containers and how to reference data in these containers, see Data Containers.
The Iguazio support team will be happy to assist with any questions.