MLOps Live


Session #11

Handling Large Datasets in Data Preparation & ML Training Using MLOps


In this technical training session, we explore how to use Dask, Kubernetes and MLRun to scale data preparation and training for large datasets. Dask is an open-source Python library for parallel computing. Combined with MLRun, an open-source MLOps orchestration tool, and run over Kubernetes, it can handle large-scale datasets.

Watch this session to explore:

• An overview of the tools available for large-scale data processing in Python (PySpark, Dask, Vaex and more), and how they are used with existing ML frameworks
• How Dask lets you run the same native Python code at scale, without having to learn other technologies such as Spark
• How to run Dask in a distributed and elastic way over Kubernetes to improve resource utilization
• How to deploy Dask-based data engineering and ML pipelines with MLRun and Kubeflow, in one click
• Further optimizations for handling large-scale data effectively and efficiently