Running Applications over GPUs
The platform supports accelerated code execution over NVIDIA graphics processing units (GPUs):
You can run Nuclio serverless functions on GPUs.
You can run GPU applications that use one of the following supported GPU libraries from a platform Jupyter Notebook service with the GPU flavor:
The platform has a default (pre-deployed) shared single-instance tenant-wide Kubeflow MPI Operator service (mpi-operator), which facilitates Uber's Horovod distributed deep-learning framework.
Horovod, which is preinstalled as part of the platform's Jupyter Notebook service, is widely used for creating machine-learning models that are trained simultaneously over multiple GPUs or CPUs.
You can use Horovod to convert a single-GPU TensorFlow, Keras, or PyTorch model-training program to a distributed multi-GPU program. The objective is to speed up your model training with minimal changes to your existing single-GPU code and without complicating the execution. Note that you can also run Horovod code over CPUs with just minor modification. For an example of using Horovod on the platform, see the image-classification-with-distributed-training demo.
- To run Horovod code, ensure that the mpi-operator platform service is enabled. (This service is enabled by default.)
- Horovod applications allocate GPUs dynamically from among the available GPUs in the system; they don't use the GPU resources of the parent Jupyter Notebook service. See also the Jupyter GPU resources note.
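Under the hood, Horovod keeps the distributed model replicas synchronized by allreduce-averaging the gradients across all workers after each training step. The following toy stdlib sketch illustrates only that averaging step; it is not Horovod's API, and the gradient values are made up for illustration:

```python
# Toy sketch of the gradient averaging that Horovod's allreduce performs.
# Each "worker" computes gradients on its own data shard; every worker then
# applies the same averaged gradient, keeping the model replicas in sync.
# (Illustrative stdlib code only -- not Horovod's API.)

def allreduce_average(per_worker_grads):
    """Average same-shaped gradient vectors across workers."""
    n_workers = len(per_worker_grads)
    n_params = len(per_worker_grads[0])
    return [
        sum(g[i] for g in per_worker_grads) / n_workers
        for i in range(n_params)
    ]

# Four workers, each with gradients for two parameters.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(allreduce_average(grads))  # [4.0, 5.0]
```

In actual Horovod code you don't implement this yourself: you initialize with `hvd.init()` and wrap your optimizer with `hvd.DistributedOptimizer()`, which performs this averaging efficiently via ring-allreduce.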
You can use NVIDIA's RAPIDS open-source libraries suite to execute end-to-end data science and analytics pipelines entirely on GPUs.
To use the cuDF and cuML RAPIDS libraries, you need to create a RAPIDS Conda environment.
For example, you can run the following command from a Jupyter notebook or terminal to create a RAPIDS Conda environment named rapids:
conda create -n rapids -c rapidsai -c nvidia -c anaconda -c conda-forge -c defaults ipykernel rapids=0.17 python=3.7 cudatoolkit=11.0
For more information about using Conda to create Python virtual environments, see the platform documentation.
For a comparison of performance benchmarks between the cuDF RAPIDS GPU DataFrame library and pandas DataFrames, see the platform's demo tutorials.
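cuDF is designed as a near drop-in replacement for pandas, so typical DataFrame code can be moved to the GPU by swapping the import. The sketch below runs on pandas; with a RAPIDS environment active, the same calls work after replacing the import with `import cudf as pd`. The sales data is hypothetical, for illustration only:

```python
import pandas as pd  # in a RAPIDS environment: import cudf as pd

# Hypothetical sales data for illustration.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "revenue": [100.0, 250.0, 150.0, 50.0],
})

# Group and aggregate -- the call is identical in pandas and cuDF.
totals = df.groupby("region")["revenue"].sum()
print(totals["east"], totals["west"])  # 250.0 300.0
```

Because the APIs match, you can prototype on CPUs with pandas and switch to cuDF only where the GPU speedup pays off.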
RAPIDS applications use the GPU resources of the parent Jupyter Notebook service. Therefore, you must configure at least one GPU resource for this service: from the dashboard Services page, select your Jupyter Notebook service for editing, select the Common Parameters tab, and set the Resources | GPU | Limit field to a value greater than zero. See also the Jupyter GPU resources note.
For more information about using RAPIDS to run applications over GPUs, see Ingesting and Preparing Data.
Jupyter GPU Resources Note
In environments with GPUs, you can use the common