Running Applications over GPUs
The platform supports accelerated code execution over NVIDIA graphics processing units (GPUs):
You can run Nuclio serverless functions on GPUs.
You can run GPU applications that use one of the following supported GPU libraries from a platform Jupyter Notebook service with the GPU flavor:
The platform has a default (pre-deployed) shared single-instance tenant-wide Kubeflow MPI Operator service (`mpi-operator`), which facilitates Uber's Horovod distributed deep-learning framework.
Horovod, which is preinstalled as part of the platform's Jupyter Notebook service, is widely used for creating machine-learning models that are trained simultaneously over multiple GPUs or CPUs.
You can use Horovod to convert a single-GPU TensorFlow, Keras, or PyTorch model-training program into a distributed multi-GPU program. The objective is to speed up your model training with minimal changes to your existing single-GPU code and without complicating the execution. Note that you can also run Horovod code over CPUs with only minor modifications. For an example of using Horovod on the platform, see the image-classification-with-distributed-training demo.
- To run Horovod code, ensure that the `mpi-operator` platform service is enabled. (This service is enabled by default.)
- Horovod applications allocate GPUs dynamically from among the available GPUs in the system; they don't use the GPU resources of the parent Jupyter Notebook service. See also the Jupyter GPU resources note.
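The conversion described above typically amounts to a handful of additions to the single-GPU script. The following is a minimal sketch of those additions for a Keras program; the model, data, and hyperparameters are illustrative placeholders, and running it over multiple GPUs requires a GPU environment with TensorFlow and Horovod installed:

```python
# Sketch of the typical Horovod additions to a single-GPU Keras script.
# The model and data below are placeholders for your own training code.
import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # initialize Horovod (one worker process per GPU)

# Pin each worker process to a single local GPU
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Placeholder data and model
x = np.random.rand(1024, 8).astype("float32")
y = np.random.randint(10, size=1024)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Scale the learning rate by the number of workers, and wrap the
# optimizer so that gradients are averaged across all workers
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

# Broadcast the initial variables from rank 0 so all workers start in sync
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

# Print training progress from rank 0 only
model.fit(x, y, batch_size=32, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

The same script runs unmodified as a single-process job or as a multi-worker job; on the platform, the `mpi-operator` service handles launching one such process per allocated GPU.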
You can use NVIDIA's RAPIDS open-source libraries suite to execute end-to-end data science and analytics pipelines entirely on GPUs.
To use the cuDF and cuML RAPIDS libraries, you need to create a RAPIDS Conda environment.
For example, you can run the following command from a Jupyter notebook or terminal to create a RAPIDS Conda environment named `rapids`:
conda create -n rapids -c rapidsai -c nvidia -c anaconda -c conda-forge -c defaults ipykernel rapids=0.17 python=3.7 cudatoolkit=11.0
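Once the environment is active, cuDF exposes a pandas-like DataFrame API whose operations execute on the GPU. The following hedged sketch shows the pattern; it must run inside the RAPIDS Conda environment on a service with an allocated GPU:

```python
# Sketch of a pandas-style operation running on the GPU with cuDF
# (requires the RAPIDS Conda environment and an allocated GPU).
import cudf

# Build a GPU DataFrame and aggregate it entirely on the GPU
df = cudf.DataFrame({
    "key": ["a", "b", "a", "b"],
    "value": [10, 20, 30, 40],
})
result = df.groupby("key")["value"].sum()

# Convert back to pandas only when CPU-side interoperability is needed
print(result.to_pandas())
```

Because the API mirrors pandas, existing DataFrame pipelines can often be moved to the GPU by changing little more than the import.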
For more information about using Conda to create Python virtual environments, see the platform's documentation.
For a comparison of performance benchmarks between the cuDF RAPIDS GPU DataFrame library and pandas DataFrames, see the published RAPIDS benchmarks.
RAPIDS applications use the GPU resources of the parent Jupyter Notebook service. Therefore, you must configure at least one GPU resource for this service: from the dashboard Services page, select to edit your Jupyter Notebook service, select the Common Parameters tab, and set the Resources | GPU | Limit field to a value greater than zero. See also Jupyter and GPU Resources.
Jupyter and GPU Resources
In environments with GPUs, you can use the common Resources | GPU | Limit configuration field of the Jupyter Notebook service to allocate GPUs to the service.
A Jupyter service that uses GPUs should be configured with the scale-to-zero option, which automatically frees up resources, including GPUs, when the service becomes idle.
When configuring your Jupyter Notebook service, take the following into account:
- While the Jupyter Notebook service is enabled and not scaled to zero, it monopolizes the configured number of GPUs even when the GPUs aren't in use.
- RAPIDS applications use the GPUs that were allocated for the Jupyter Notebook service from which the code is executed.
- Horovod applications allocate GPUs dynamically and don't use the GPUs of the parent Jupyter Notebook service. Therefore, on systems with limited GPU resources you might need to reduce the amount of GPU resources allocated to the Jupyter Notebook service or set it to zero to successfully run the Horovod code over GPUs.