Tapping into more compute power is the next frontier of data science. Data scientists need it to complete increasingly complex machine learning (ML) and deep learning (DL) tasks in a reasonable time. Otherwise, faced with long waits for compute jobs to finish, they are tempted to test smaller datasets or run fewer iterations just to get results sooner.
NVIDIA GPUs are an excellent way to deliver the compute power that data science teams demand, but they bring their own challenges. Unlike a CPU, a GPU cannot easily be shared among multiple parallel workloads or containers. As a result, a GPU sits idle once its assigned task completes, wasting both money and working time.
The solution lies in using orchestration, clustering, and a shared data layer to combine containers, so that you can harness multiple GPUs to speed up tasks and allocate tasks to them as needed. We use MLRun, an open-source ML orchestration framework, to define serverless ML functions that can run either locally or in dynamically provisioned containers. The whole system runs as one logical unit, sharing the same code and data through a low-latency shared data plane.
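To make the idea of allocating tasks to a pool of GPUs concrete, here is a minimal sketch in plain Python. It is a toy simulation, not MLRun's scheduler or API: the `Task` class, the `schedule` function, and the task durations are all hypothetical names and values chosen for illustration. Each task simply goes to whichever GPU becomes free first, so no GPU waits while work is queued.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    duration: float  # hypothetical run time, in arbitrary units


def schedule(tasks, num_gpus):
    """Greedily assign each task to the GPU that becomes free first.

    Returns (assignments, makespan). `assignments` maps a GPU index to the
    list of task names it ran, in order; `makespan` is the time at which
    the last GPU finishes.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    # Min-heap of (time_when_free, gpu_index); every GPU starts idle at t=0.
    free_at = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(free_at)

    assignments = {gpu: [] for gpu in range(num_gpus)}
    for task in tasks:
        start, gpu = heapq.heappop(free_at)  # earliest available GPU
        assignments[gpu].append(task.name)
        heapq.heappush(free_at, (start + task.duration, gpu))

    makespan = max(finish for finish, _ in free_at)
    return assignments, makespan


if __name__ == "__main__":
    jobs = [Task("a", 4), Task("b", 2), Task("c", 2), Task("d", 1)]
    print(schedule(jobs, num_gpus=1))
    print(schedule(jobs, num_gpus=2))
```

In this toy model, the same four jobs finish sooner on two GPUs than on one because the queue keeps both devices busy. A real orchestrator also has to handle container provisioning, data locality, and failures, which this sketch leaves out.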
MLRun builds on Kubernetes and Kubeflow, using the Kubernetes API and Kubeflow custom resources (CRDs). Every task executed through MLRun is tracked by the MLRun service controller, while a versioned database stores all the inputs, outputs, logs, and artifacts. You can browse the database using a simple UI, the SDK, or REST APIs, and link MLRun functions into an automated Kubeflow pipeline to generate an end-to-end workflow.
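The tracking idea can be illustrated with a small in-memory sketch. This is not MLRun's database or SDK: `RunRecord`, `RunStore`, and their methods are hypothetical names, and the store here is just a Python dictionary. It only shows the general pattern of recording inputs, outputs, and logs for every run and keeping each run as a new version.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    name: str
    inputs: dict
    outputs: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)


class RunStore:
    """Toy versioned store: every tracked run is appended as a new version."""

    def __init__(self):
        self._runs = {}  # task name -> list of RunRecord, oldest first

    def track(self, name, func, **inputs):
        """Run `func(**inputs)`, record the run, and return (version, record)."""
        record = RunRecord(name=name, inputs=dict(inputs))
        record.logs.append(f"starting {name}")
        record.outputs["result"] = func(**inputs)
        record.logs.append(f"finished {name}")

        versions = self._runs.setdefault(name, [])
        versions.append(record)
        return len(versions), record  # version numbers start at 1

    def get(self, name, version=None):
        """Return a stored run; the latest version if none is given."""
        versions = self._runs[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


if __name__ == "__main__":
    store = RunStore()

    def train(lr, epochs):
        # Stand-in for a real training job.
        return lr * epochs

    store.track("train", train, lr=0.5, epochs=4)
    store.track("train", train, lr=0.1, epochs=10)
    print(store.get("train", version=1))
    print(store.get("train"))
```

Because every run is appended rather than overwritten, earlier inputs and results stay available for comparison, which is the property a versioned run database provides.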
Using GPU as a Service (GPUaaS) in this way simplifies and automates data science, boosting productivity and significantly reducing time to market. For more information about GPUaaS, read the full article on Towards Data Science.