The Jupyter Notebook Service
Overview
Jupyter is a project for development of open-source software, standards, and services for interactive computation across multiple programming languages. The Platform comes preinstalled with the JupyterLab web-based user interface, including Jupyter Notebook and JupyterLab Terminals, which are available via a Jupyter Notebook user application service.
Jupyter Notebook is an open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text; it's currently the leading industry tool for data exploration and training. Jupyter Notebook supports integration with all key analytics services, enabling users to perform all stages of the data science flow, from data collection to production, from a single interface using various APIs and tools to concurrently access the same data without having to move the data. Your Jupyter Notebook code can execute Spark jobs (for example, using Spark DataFrames); run SQL queries using Trino; define, deploy, and trigger Nuclio serverless functions; send web-API requests; use pandas and V3IO Frames DataFrames; use the Dask library to scale the use of pandas DataFrames; and more.
Virtual Environments
You can use Conda and pip, which are available as part of the Jupyter Notebook service, to install Python packages such as Dask and machine-learning and computation packages. Whether you use pip or Conda depends on your needs; the platform provides a few options.
Jupyter comes with a few prebaked Conda environments:
- base:
*/conda
- jupyter:
/conda/envs/jupyter
- mlrun-base:
/conda/envs/mlrun-base
- mlrun-extended:
/conda/envs/mlrun-extended
The prebaked environments are persistent for pip, but are not persistent for Conda. If you are only using pip, you can use the prebaked Conda environments. If you need to use Conda, create or clone an environment. When you create or clone an environment, it is saved to the V3IO fuse mount by default (/User/.conda/envs/<env name>) and is persistent for both pip and Conda. Since MLRun is pip-based, it's recommended to use pip whenever possible to avoid dependency conflicts.
See full details and examples in Creating Python Virtual Environments with Conda.
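The persistence rule above can be expressed as a simple path check. The following is a minimal sketch, not a platform API: the function name `conda_changes_persist` and the environment name `myenv` are hypothetical, and the only fact taken from this page is the default location `/User/.conda/envs` for created or cloned environments.

```python
from pathlib import PurePosixPath

# Default location for created or cloned environments (V3IO fuse mount),
# as stated in this section.
USER_ENVS_ROOT = PurePosixPath("/User/.conda/envs")


def conda_changes_persist(env_prefix: str) -> bool:
    """Return True if the environment lives under the user environments root.

    Hypothetical helper: environments under /User/.conda/envs are described
    as persistent for both pip and Conda; prebaked environments are not
    persistent for Conda.
    """
    return USER_ENVS_ROOT in PurePosixPath(env_prefix).parents


# "myenv" is an illustrative environment name.
print(conda_changes_persist("/User/.conda/envs/myenv"))
print(conda_changes_persist("/conda/envs/mlrun-base"))
```

From a notebook, you could pass `sys.prefix` to check the environment of the running kernel.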
You can install new system packages within the Jupyter image by running the apt-get command. However, these packages are not persistently stored on the V3IO and exist only within the container, meaning that they are deleted when the Jupyter service restarts. If you need these packages to persist, add their installation commands to the startup hook script, which runs before Jupyter is launched. You can also use the script to add Jupyter extensions or make other modifications. The Jupyter startup script is located at /User/.igz/startup-hook.sh.
If the script exists, it is executed just before Jupyter is launched (after all other launch steps and configurations). Any failure of the script is ignored in order to avoid unnecessary Jupyter downtime.
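As an illustration of the startup hook described above, the following sketch generates a hook script that reinstalls system packages with apt-get. It rests on several assumptions that this page does not confirm: that the script is run with bash, that apt-get can be called without extra privileges, and that no executable bit is required. The helper name `write_startup_hook` and the package name in the usage note are hypothetical.

```python
from pathlib import Path

# Hook location checked by the service, as stated in this section.
HOOK_PATH = Path("/User/.igz/startup-hook.sh")


def write_startup_hook(packages, hook_path=HOOK_PATH):
    """Write a hook script that reinstalls the given system packages.

    Hypothetical helper; the shebang and the absence of sudo are assumptions.
    """
    lines = [
        "#!/bin/bash",
        "# Reinstall system packages that do not survive a service restart.",
        "apt-get update",
        "apt-get install -y " + " ".join(packages),
    ]
    hook_path = Path(hook_path)
    # Create the parent directory if it does not exist yet.
    hook_path.parent.mkdir(parents=True, exist_ok=True)
    hook_path.write_text("\n".join(lines) + "\n")
    return hook_path
```

For example, calling `write_startup_hook(["graphviz"])` from a notebook would write a script that reinstalls the graphviz package each time the service starts. Because script failures are ignored, check that the packages are actually present after a restart.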
Jupyter Flavors
In addition, you can use Jupyter terminals to execute shell commands, such as file-system and installation commands. As part of the configuration of the platform's Jupyter Notebook service, you select a specific Jupyter flavor, and you can optionally define environment variables for the service.
Resources
The platform provides tutorial Jupyter notebooks with code examples ranging from getting-started examples to full end-to-end demo applications, including detailed documentation.
Start out by reading the introductory
Configuring the Service
Pod Priority
Pods (services, or jobs created by those services) can have priorities, which indicate the relative importance of one pod to the other pods on the node. The priority is used for scheduling: a lower-priority pod can be evicted to allow scheduling of a higher-priority pod. Pod priority is relevant for all pods created by the service.
Eviction uses these values in conjunction with the pod's priority to determine what to evict. See more details in Interactions between Pod priority and quality of service.
Pod priority is specified through Priority classes, which map to a priority value. The priority values are: High, Medium, Low. The default is Medium.
Configure the default priority for a service, which is applied to the service itself or to all subsequently created user jobs, in the service's Common Parameters tab, in the User jobs defaults section, using the Priority class drop-down list.
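To make the ordering of the priority classes concrete, the following simplified sketch lists the pods that have a lower priority than an incoming pod. It is an illustration only, not the Kubernetes scheduler's algorithm: the numeric ranks are hypothetical (the platform exposes only the class names), and real eviction also weighs other factors.

```python
# Hypothetical numeric ranks for the three priority classes.
PRIORITY_RANK = {"Low": 0, "Medium": 1, "High": 2}
DEFAULT_PRIORITY = "Medium"  # Default class, as stated in this section.


def eviction_candidates(pods, incoming_priority=DEFAULT_PRIORITY):
    """Return names of pods with a lower priority than the incoming pod.

    `pods` maps a pod name to its priority class, or to None when the
    default applies. Results are ordered from lowest priority upward.
    """
    incoming = PRIORITY_RANK[incoming_priority]
    ranked = [
        (PRIORITY_RANK[priority or DEFAULT_PRIORITY], name)
        for name, priority in pods.items()
    ]
    return [name for rank, name in sorted(ranked) if rank < incoming]


# Illustrative pod names.
pods = {"train": "Low", "notebook": None, "serving": "High"}
print(eviction_candidates(pods, "High"))  # ['train', 'notebook']
```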
Jupyter Flavors
You can set the custom
- Jupyter Full Stack: A full version of Jupyter for execution over central processing units (CPUs).
- Jupyter Full Stack with GPU: A full version of Jupyter for execution over graphics processing units (GPUs). This flavor is available only in environments with GPUs and is sometimes referred to in the documentation as the Jupyter "GPU flavor". For more information about the platform's GPU support, see Running Applications over GPUs.
This parameter is in the Custom Parameters tab of the service.
Associate the Jupyter Service with a Trino Service
If you have multiple Trino services in the same cluster, you can associate the Jupyter service with a specific Trino service. See The Trino Service (formerly Presto).
In the
Environment Variables
You can add environment variables to a Jupyter Notebook service in the
Persistent Volume Claims (PVCs)
You can connect existing cluster Persistent Volume Claims (PVCs) to a Jupyter Notebook service in the
SSH
You can configure secure connectivity to the Jupyter service using SSH, which enables debugging from remote IDEs such as PyCharm and VSCode.
Enable SSH and configure the port in the
The SSH port must be in the range of 30000–32767, and the SSH connection must be made as the user "iguazio", regardless of the identity of the running user of the Jupyter service.
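The SSH constraints above can be checked before you configure the service or a remote IDE. The following is a minimal sketch; the helper names and the host name in the usage note are hypothetical, while the port range and the user name come from this section.

```python
SSH_PORT_MIN = 30000
SSH_PORT_MAX = 32767
SSH_USER = "iguazio"  # Required regardless of the service's running user.


def validate_ssh_port(port: int) -> int:
    """Return the port if it is in the allowed range, else raise ValueError."""
    if not SSH_PORT_MIN <= port <= SSH_PORT_MAX:
        raise ValueError(
            f"SSH port {port} is outside {SSH_PORT_MIN}-{SSH_PORT_MAX}"
        )
    return port


def ssh_command(host: str, port: int) -> str:
    """Build an ssh command line for the Jupyter service."""
    return f"ssh -p {validate_ssh_port(port)} {SSH_USER}@{host}"
```

For example, `ssh_command("jupyter.example.com", 30022)` returns `ssh -p 30022 iguazio@jupyter.example.com`, where the host name is a placeholder for the address of your cluster.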
Node Selection
You can assign jobs and functions to a specific node or a node group, to manage your resources, and to differentiate between processes and their respective nodes. A typical example is a workflow that you want to only run on dedicated servers.
When specified, the service or the pods of a function can only run on nodes whose labels match the node selector entries configured for the service. You can also specify labels that were assigned to app nodes by an iguazio IT Admin user. See Setting Labels on App Nodes.
Configure the key-value node selector pairs in the
If node selection for the service is not specified, the selection criteria default to the Kubernetes default behavior, and jobs run on a random node. Node selection is relevant for all cloud services.
See more about Kubernetes nodeSelector.
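The matching rule described above, in which a node is eligible only when its labels contain every key-value pair of the node selector, can be sketched as follows. This is an illustration of the rule rather than platform or Kubernetes code; the function names, node names, and labels are hypothetical.

```python
def node_matches(selector: dict, node_labels: dict) -> bool:
    """Return True if the node carries every key-value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def eligible_nodes(selector: dict, nodes: dict) -> list:
    """Return the names of nodes whose labels satisfy the selector.

    `nodes` maps a node name to its label dictionary. An empty selector
    matches every node, which corresponds to no node selection.
    """
    return [name for name, labels in nodes.items() if node_matches(selector, labels)]


# Illustrative node names and labels.
nodes = {
    "node-a": {"pool": "dedicated", "gpu": "true"},
    "node-b": {"pool": "shared"},
}
print(eligible_nodes({"pool": "dedicated"}, nodes))  # ['node-a']
```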
Custom Jupyter Image
You can specify a custom Jupyter image to optimize the Jupyter notebook runtime for your application needs.
- Store the image in an available Docker registry. You can also store a script in this location, and it is run as part of the initialization steps.
- Select Custom image from the Flavor drop-down list, then specify the:
  - Docker registry
  - Image name