The Jupyter Notebook Service

Overview

Jupyter is a project for development of open-source software, standards, and services for interactive computation across multiple programming languages. The Platform comes preinstalled with the JupyterLab web-based user interface, including Jupyter Notebook and JupyterLab Terminals, which are available via a Jupyter Notebook user application service.

Jupyter Notebook is an open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text; it's currently the leading industry tool for data exploration and training. Jupyter Notebook supports integration with all key analytics services, enabling users to perform all stages of the data science flow, from data collection to production, from a single interface using various APIs and tools to concurrently access the same data without having to move the data. Your Jupyter Notebook code can execute Spark jobs (for example, using Spark DataFrames); run SQL queries using Trino; define, deploy, and trigger Nuclio serverless functions; send web-API requests; use pandas and V3IO Frames DataFrames; use the Dask library to scale the use of pandas DataFrames; and more.

You can use Conda and pip, which are available as part of the Jupyter Notebook service, to install Python packages such as Dask and machine-learning and computation packages. By default, Python packages installed on Jupyter are installed in a non-persistent location. Such packages are removed when the service is restarted and must then be reinstalled. To persist the installed Python packages, create a Conda environment in a persistent location and install the packages into that environment:
conda create -p /User/my_env -y python=3.8.x ipykernel
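To confirm from a notebook that the active kernel runs from a persistent environment, you can compare the interpreter prefix with the persistent root. The following is a minimal sketch rather than a platform API: the helper name is_persistent_prefix is hypothetical, and it assumes that /User is the persistent location, as in the command above.

```python
import sys
from pathlib import PurePosixPath


def is_persistent_prefix(prefix: str, root: str = "/User") -> bool:
    """Return True if the interpreter prefix is located under the persistent root."""
    prefix_path = PurePosixPath(prefix)
    root_path = PurePosixPath(root)
    # Compare whole path components so that "/Users/..." does not match "/User".
    return prefix_path == root_path or root_path in prefix_path.parents


# sys.prefix is the directory of the environment that runs this kernel.
print(f"{sys.prefix} persistent: {is_persistent_prefix(sys.prefix)}")
```

If the check prints False, packages installed with pip or Conda in that kernel are likely to be lost when the service restarts.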

Iguazio lets you install new system packages within the Jupyter image by running the apt-get command. However, those packages are not stored persistently on the V3IO and exist only within the container, so they are deleted when the Jupyter service restarts. If you need those packages to persist, add their installation commands to the startup hook script, which runs before Jupyter is launched. You can also use the script to add Jupyter extensions or make other modifications. The Jupyter startup script is /User/.igz/startup-hook.sh. If the file exists, it is executed just before Jupyter is launched (after all other launch steps and configurations). Any failure of the script is ignored, to avoid unnecessary Jupyter downtime.
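As an illustration, the hook script can be generated from a notebook. This is a sketch under stated assumptions: the helper names are hypothetical, the shebang line and the use of apt-get update before installation are assumptions, and it is assumed that the hook runs with sufficient privileges to install system packages. Only the script path comes from the description above.

```python
from pathlib import Path
from typing import Iterable

# Hook location as described in the text above.
HOOK_PATH = Path("/User/.igz/startup-hook.sh")


def build_hook_script(packages: Iterable[str]) -> str:
    """Build the text of a hook script that reinstalls the given system packages."""
    lines = ["#!/bin/bash", "apt-get update"]
    package_list = " ".join(packages)
    if package_list:
        lines.append(f"apt-get install -y {package_list}")
    return "\n".join(lines) + "\n"


def write_hook_script(packages: Iterable[str], path: Path = HOOK_PATH) -> Path:
    """Write the hook script to the given path, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(build_hook_script(packages))
    return path
```

For example, write_hook_script(["graphviz"]) would write a script that reinstalls the graphviz package (an arbitrary example) on each launch. Because script failures are ignored, check that the packages are actually present after a restart.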

In addition, you can use Jupyter terminals to execute shell commands, such as file-system and installation commands. When you configure the platform's Jupyter Notebook service, you select a specific Jupyter flavor, and you can optionally define environment variables for the service.

Iguazio provides tutorial Jupyter notebooks with code examples ranging from getting-started examples to full end-to-end demo applications, including detailed documentation. Start out by reading the introductory welcome.ipynb notebook (available also as a Markdown README.md file), which is similar to the introduction on the documentation site. Then, proceed to the getting-started tutorial.

Configuring the Service

Pod Priority

Pods (services, or jobs created by those services) can have priorities, which indicate the relative importance of one pod to the other pods on the node. The priority is used for scheduling: a lower priority pod can be evicted to allow scheduling of a higher priority pod. Pod priority is relevant for all pods created by the service.
When deciding which pods to evict, Kubernetes considers a pod's quality of service in conjunction with its priority. For more details, see Interactions between Pod priority and quality of service.

Pod priority is specified through Priority classes, which map to a priority value. The priority values are: High, Medium, Low. The default is Medium.

Configure the default priority for a service, which is applied to the service itself or to all subsequently created user jobs, in the service's Common Parameters tab: in the User jobs defaults section, select a value from the Priority class drop-down list.

Jupyter Flavors

You can set the custom Flavor parameter of the Jupyter Notebook service to one of the following flavors to install a matching Jupyter Docker image:

Jupyter Full Stack
A full version of Jupyter for execution over central processing units (CPUs).
Jupyter Full Stack with GPU
A full version of Jupyter for execution over graphics processing units (GPUs). This flavor is available only in environments with GPUs and is sometimes referred to in the documentation as the Jupyter "GPU flavor". For more information about the platform's GPU support, see Running Applications over GPUs.

This parameter is in the Custom Parameters tab of the service.

Associate the Jupyter Service with a Trino Service

If you have multiple Trino services in the same cluster, you can associate the Jupyter service with a specific Trino service. See The Trino Service (formerly Presto).
In the Custom Parameters tab of the Jupyter service, select the service from the Trino drop-down list, or press Create new.. to open the Create a new service page.

Environment Variables

You can add environment variables to a Jupyter Notebook service in the Custom Parameters tab of the service.
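Notebook code can then read such a variable at run time. The sketch below assumes that variables defined for the service are exposed to the notebook kernel as ordinary process environment variables; the helper read_setting and the variable name MY_DATA_BUCKET are hypothetical and used only for illustration.

```python
import os


def read_setting(name: str, default: str = "") -> str:
    """Return the value of an environment variable, or a default if it is unset."""
    return os.environ.get(name, default)


# MY_DATA_BUCKET is a hypothetical variable name, not one defined by the platform.
bucket = read_setting("MY_DATA_BUCKET", default="not-set")
print(f"MY_DATA_BUCKET = {bucket}")
```

Supplying a default keeps the notebook usable when the variable has not been defined for the service.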

Persistent Volume Claims (PVCs)

You can connect existing cluster Persistent Volume Claims (PVCs) to a Jupyter Notebook service in the Custom Parameters tab of the service.

SSH

You can configure secure connectivity to the Jupyter service using SSH, which enables debugging from remote IDEs such as PyCharm and VS Code. Enable SSH and configure the port in the Custom Parameters tab. When SSH is configured, you can get the authentication key from the User SSH option in the service menu.
The SSH port must be in the range 30000–32767, and the SSH connection must use the user "iguazio", regardless of the identity of the running user of the Jupyter service.
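The port and user constraints above can be checked before attempting a connection. The following sketch builds an OpenSSH command line; the helper name, host name, and key path are hypothetical, and it assumes that the authentication key obtained from the service menu is saved as a private key file that can be passed with the standard -i option.

```python
import shlex

SSH_USER = "iguazio"              # required user, per the text above
PORT_RANGE = range(30000, 32768)  # 30000 to 32767, inclusive


def build_ssh_command(host: str, port: int, key_path: str) -> str:
    """Return an ssh command line for the Jupyter service, validating the port first."""
    if port not in PORT_RANGE:
        raise ValueError(f"SSH port {port} is outside the range 30000-32767")
    args = ["ssh", "-i", key_path, "-p", str(port), f"{SSH_USER}@{host}"]
    # shlex.join quotes arguments where needed so the result is safe to paste in a shell.
    return shlex.join(args)
```

For example, build_ssh_command("jupyter.example.com", 30022, "/tmp/key") returns a command that connects as iguazio on port 30022; the same host, port, user, and key values are what a remote IDE would need in its SSH interpreter settings.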

Node Selection

You can assign jobs and functions to a specific node or a node group, to manage your resources, and to differentiate between processes and their respective nodes. A typical example is a workflow that you want to run only on dedicated servers.

When specified, the service or the pods of a function can only run on nodes whose labels match the node selector entries configured for the service. You can also specify labels that were assigned to app nodes by an Iguazio IT Admin user. See Setting Labels on App Nodes.

Configure the key-value node selector pairs in the Custom Parameters tab of the service.

If node selection is not specified for the service, the selection criteria default to the Kubernetes default behavior, and jobs can run on any node that the scheduler selects.
Node selection is relevant for all cloud services.

See more about Kubernetes nodeSelector.
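The matching rule can be illustrated with a small model: a node is eligible only if its labels contain every key-value pair in the node selector. This is a sketch of the matching semantics only, not a call to the Kubernetes API; the function names, node names, and labels are hypothetical.

```python
from typing import Dict, List


def node_matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Return True if the node labels contain every key-value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def eligible_nodes(selector: Dict[str, str], nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of the nodes whose labels satisfy the selector."""
    return [name for name, labels in nodes.items() if node_matches(selector, labels)]


# Hypothetical app nodes and their labels.
nodes = {
    "node-a": {"pool": "gpu", "zone": "east"},
    "node-b": {"pool": "cpu", "zone": "east"},
}
print(eligible_nodes({"pool": "gpu"}, nodes))
```

An empty selector matches every node in this model, which corresponds to leaving node selection unspecified.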

Custom Jupyter Image

You can specify a custom Jupyter image to optimize the Jupyter notebook runtime for your application needs.

  1. Store the image in an available Docker registry. You can also store a script in this location; it is run as part of the initialization steps.
  2. Select Custom image from the Flavor drop-down list, then specify the:
    • Docker registry
    • Image name
