6. Interactive Computing on Cyclone with Jupyter Notebooks

6.1. Overview

This tutorial introduces participants to running Jupyter Notebooks directly on Cyclone’s compute nodes, enabling interactive workflows for data analysis, AI model development, and other computational tasks. Participants will gain an understanding of the benefits of using Jupyter Notebooks in an HPC environment and learn the step-by-step process to launch and access them. By the end of the tutorial, users will be equipped with the knowledge to set up and interact with Jupyter Notebooks efficiently on Cyclone.

6.1.1. Why Jupyter Notebooks on HPC?

Jupyter Notebooks offer a highly interactive environment that seamlessly combines code execution, visualizations, and narrative explanations, making them ideal for tasks like data exploration, visualization, and AI model development. Their intuitive, web-based interface simplifies complex workflows, lowering the learning curve for users across various expertise levels.
Leveraging Jupyter Notebooks on HPC systems amplifies these benefits by providing access to powerful compute resources, such as CPUs and GPUs, that can handle large-scale datasets and perform demanding AI training or numerical simulations. This integration enables users to work interactively and efficiently, tackling computational challenges beyond the capabilities of local machines.

6.2. Learning Objectives

By the end of this tutorial, participants will be able to:
  1. Understand the advantages of using Jupyter Notebooks on HPC systems for interactive computing.
  2. Follow the steps to configure and launch Jupyter Notebooks on Cyclone’s compute nodes.
  3. Establish secure SSH tunnels to access notebooks from a local browser.
  4. Optimize resource allocation for Jupyter Notebook sessions using SLURM scripts.

6.3. Prerequisites

  1. T01 - Introduction to HPC Systems: This tutorial gives you basic knowledge of HPC systems and the associated terminology.

  2. T02 - Accessing and Navigating Cyclone: This tutorial gives you basic knowledge on how to connect to, copy files to, and navigate the HPC system.


6.4. Workflow Steps

Running Jupyter Notebooks on an HPC system involves allocating resources using a SLURM script and establishing a secure connection to access the notebook interface in your local browser or in VSCode.
To launch and access a notebook on Cyclone's compute nodes, follow this workflow:
  1. Create a Clean Environment: Create a clean conda environment with Jupyter Notebook and relevant dependencies.
  2. Write a SLURM Script: Create a SLURM job script specifying the resources required for your Jupyter session, such as CPUs, memory, or GPUs.
  3. Submit the Script: Use the sbatch command to submit the script to the HPC scheduler, which will allocate the requested resources and launch the Jupyter Notebook server.
  4. Create an SSH Tunnel: Establish a secure SSH tunnel to forward the notebook's port from the remote HPC system to your local machine, enabling browser access.
  5. Open the Notebook: Use the forwarded port to access the Jupyter Notebook interface in your web browser, enabling an interactive and powerful environment for your tasks.

6.5. Initial Setup

First, establish a connection to Cyclone using SSH:
ssh username@cyclone.hpcf.ac.cy
⚠️ Replace username with your actual Cyclone username. If you encounter connection issues, refer back to Tutorial 02 - Accessing and Navigating Cyclone.
Next, create an environment with the necessary dependencies.
⚠️ During these steps you might see this in your terminal:
Proceed ([y]/n)?
Just type the letter y and then press Enter to continue.
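ℹ️ If you prefer to skip these confirmation prompts entirely, conda accepts a -y flag that answers "yes" automatically, for example:
conda create --name notebookEnv -y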
First, create a simple conda environment:
module load Anaconda3
conda create --name notebookEnv
Your terminal should look something like this:
(base) [gkosta@front02 ~]$ module load Anaconda3
(base) [gkosta@front02 ~]$ conda create --name notebookEnv
Retrieving notices: ...working... done
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

    environment location: /nvme/h/gkosta/.conda/envs/notebookEnv


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate notebookEnv
#
# To deactivate an active environment, use
#
#     $ conda deactivate
To activate the environment, type
conda activate notebookEnv
You should now see the name of the environment before your username:
(base) [gkosta@front02 ~]$ conda activate notebookEnv
(notebookEnv) [gkosta@front02 ~]$
Once the environment is active, install Jupyter Notebook and its dependencies by running the following:
(notebookEnv) [gkosta@front02 ~]$ conda install -c conda-forge notebook
Proceed ([y]/n)? y

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
⚠️ This installation might take a few minutes. Be patient and don't interrupt the process.
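💡 You can verify the installation by asking Jupyter for its version from within the activated environment:
(notebookEnv) [gkosta@front02 ~]$ jupyter --version
If this prints the versions of the Jupyter components instead of a "command not found" error, the environment is ready.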

6.6. Launching Jupyter on a Compute Node

We'll use a pre-configured SLURM script to launch our Jupyter server. Let's break down the key components:

Step 1: SLURM Environment Setup

We first set up the basic SLURM directives so our job can be submitted using sbatch:
#!/bin/bash -l

#SBATCH --job-name=jupyter_test
#SBATCH --partition=gpu             # Partition
#SBATCH --nodes=1                   # Number of nodes
#SBATCH --gres=gpu:1                # Number of GPUs
#SBATCH --ntasks-per-node=1         # Number of tasks
#SBATCH --cpus-per-task=10          # Number of cpu cores
#SBATCH --mem=20G                   # Total memory per node
#SBATCH --output=job.%j.out         # Stdout (%j=jobId)
#SBATCH --error=job.%j.err          # Stderr (%j=jobId)
#SBATCH --time=1:00:00              # Walltime
#SBATCH -A <your_project_id>        # Accounting project

In this instance, we're requesting resources from 1 node (--nodes=1) in the GPU partition (--partition=gpu) with:
  • 1 GPU (--gres=gpu:1)
  • 1 hour of walltime (--time=1:00:00)
  • 20GB of RAM (--mem=20G)
  • 10 CPU cores (--cpus-per-task=10)
The job name is jupyter_test and the usage will be deducted from the account your_project_id.
⚠️ Remember to replace your_project_id with your allocated project budget.
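ℹ️ If you are unsure which partitions you can submit to, SLURM's sinfo command lists them along with their node counts and states:
sinfo -s
Adjust the #SBATCH directives above to match a partition you have access to; the gpu partition and the resource sizes shown here are just one example.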

Step 2: Activate conda Environment

We load the Anaconda3 module and activate the environment created previously:
# Load any necessary modules and activate environment
module load Anaconda3

conda activate notebookEnv

Step 3: Configure the Jupyter Server

This piece of the SLURM script initialises some basic variables so we can securely connect to our Jupyter server:
# Add our environment as a notebook kernel
python -m ipykernel install --user --name=notebookEnv

# Compute node hostname
HOSTNAME=$(hostname)

# Generate random ports for Jupyter
JUPYTER_PORT=$(shuf -i 10000-60000 -n 1)

# Generate a random password for Jupyter Notebook
PASSWORD=$(openssl rand -base64 12)

# Hash the password using Jupyter's built-in function
HASHED_PASSWORD=$(python -c "from jupyter_server.auth import passwd; print(passwd('$PASSWORD'))")
Let's look at the above code snippet step by step:
First, we add our environment as a notebook kernel. This is done so we can efficiently manage our Python packages. You can add more environments for different use cases; for example, you can have one conda environment for PyTorch and another for TensorFlow.
# Add our environment as a notebook kernel
python -m ipykernel install --user --name=notebookEnv
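ℹ️ This step assumes the ipykernel package is present in the environment; it is normally pulled in as a dependency of the notebook package we installed earlier. If the command fails with a missing-module error, install it explicitly:
conda install -c conda-forge ipykernel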
Then, we retrieve the hostname or IP of the compute node:
# Compute node hostname
HOSTNAME=$(hostname)
Next, we generate a random port number to reduce the chance of colliding with a port that is already in use. Additionally, we generate a random hashed password to prevent unauthorised usage of your Jupyter server and HPC resources.
# Generate random ports for Jupyter
JUPYTER_PORT=$(shuf -i 10000-60000 -n 1)

# Generate a random password for Jupyter Notebook
PASSWORD=$(openssl rand -base64 12)

# Hash the password using Jupyter's built-in function
HASHED_PASSWORD=$(python -c "from jupyter_server.auth import passwd; print(passwd('$PASSWORD'))")
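ℹ️ shuf picks a random port but does not guarantee it is free. If you want to be extra careful, a small loop can re-draw until an unused port is found; a minimal sketch, assuming the ss utility is available on the compute node:
# Keep drawing while the chosen port is already in use
while ss -tln | grep -q ":${JUPYTER_PORT} "; do
    JUPYTER_PORT=$(shuf -i 10000-60000 -n 1)
done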

Step 4: Launching the Jupyter Server

We launch the Jupyter server with the variables we just generated. Feel free to change the --notebook-dir option to point at whatever directory you want.
# Run Jupyter notebook
jupyter notebook --port=$JUPYTER_PORT --NotebookApp.password="$HASHED_PASSWORD" --notebook-dir="$HOME" --no-browser > jupyter.log 2>&1 &
The jupyter command starts a blocking process, meaning it would keep control of our bash session until the process ends. So we redirect its output to the jupyter.log file and leave it running as a background process with &.
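ℹ️ Once the job is running, you can confirm the server started correctly by inspecting this log, for example:
tail jupyter.log
The last lines should show the server listening on the chosen port; any startup errors will appear here as well.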

Step 5: Generating Connection Commands

Since we want to connect from our personal machine, a laptop for example, to the Jupyter server running on the compute node, we'll need an SSH tunnel. This tunnel first creates a jump connection through the front node to our assigned compute node, and then binds the port our server is running on to the same port on our local machine. We've prepared code which automatically generates this command for you:
LOGIN_HOST="cyclone.hpcf.cyi.ac.cy"


# Prepare the message to be displayed and saved to a file
CONNECTION_MESSAGE=$(cat <<EOF
==================================================================
Run this command to connect on your jupyter notebooks remotely
ssh -N -J ${USER}@${LOGIN_HOST} ${USER}@${HOSTNAME} -L ${JUPYTER_PORT}:localhost:${JUPYTER_PORT}


Jupyter Notebook is running at: http://localhost:$JUPYTER_PORT
Password to access the notebook: $PASSWORD
==================================================================
EOF
)

# Print the connection details to both the terminal and a txt file
echo "$CONNECTION_MESSAGE" | tee ./connection_info.txt

wait
The wait command at the end keeps the job alive for as long as the background Jupyter process runs; without it, the script would exit immediately and SLURM would terminate the server.

The Complete Script

The complete script for Steps 1-5 is listed below for your convenience:
#!/bin/bash -l

#SBATCH --job-name=jupyter_test
#SBATCH --partition=gpu             # Partition
#SBATCH --nodes=1                   # Number of nodes
#SBATCH --gres=gpu:1                # Number of GPUs
#SBATCH --ntasks-per-node=1         # Number of tasks
#SBATCH --cpus-per-task=10          # Number of cpu cores
#SBATCH --mem=20G                   # Total memory per node
#SBATCH --output=job.%j.out         # Stdout (%j=jobId)
#SBATCH --error=job.%j.err          # Stderr (%j=jobId)
#SBATCH --time=1:00:00              # Walltime
#SBATCH -A <your_project_id>        # Accounting project


# Load any necessary modules and activate environment
module load Anaconda3

conda activate notebookEnv

# Add our environment as a notebook kernel
python -m ipykernel install --user --name=notebookEnv

# Compute node hostname
HOSTNAME=$(hostname)

# Generate random ports for Jupyter
JUPYTER_PORT=$(shuf -i 10000-60000 -n 1)

# Generate a random password for Jupyter Notebook
PASSWORD=$(openssl rand -base64 12)

# Hash the password using Jupyter's built-in function
HASHED_PASSWORD=$(python -c "from jupyter_server.auth import passwd; print(passwd('$PASSWORD'))")


# Run Jupyter notebook
jupyter notebook --port=$JUPYTER_PORT --NotebookApp.password="$HASHED_PASSWORD" --notebook-dir="$HOME" --no-browser > jupyter.log 2>&1 &

sleep 5


LOGIN_HOST="cyclone.hpcf.cyi.ac.cy"


# Prepare the message to be displayed and saved to a file
CONNECTION_MESSAGE=$(cat <<EOF
==================================================================
Run this command to connect on your jupyter notebooks remotely
ssh -N -J ${USER}@${LOGIN_HOST} ${USER}@${HOSTNAME} -L ${JUPYTER_PORT}:localhost:${JUPYTER_PORT}


Jupyter Notebook is running at: http://localhost:$JUPYTER_PORT
Password to access the notebook: $PASSWORD
==================================================================
EOF
)

# Print the connection details to both the terminal and a txt file
echo "$CONNECTION_MESSAGE" | tee ./connection_info.txt

wait
To create the script:
[gkosta@front02 ~]$ cd $HOME
[gkosta@front02 ~]$ mkdir tutorial_06
[gkosta@front02 ~]$ cd tutorial_06
[gkosta@front02 tutorial_06]$ touch launch_notebook.sh
[gkosta@front02 tutorial_06]$ nano launch_notebook.sh     # copy the Bash code above
[gkosta@front02 tutorial_06]$ chmod +x launch_notebook.sh # make the script executable

Step 6: Job Submission

Now that everything is configured, let's submit this SLURM script and see what it does.
Submit launch_notebook.sh from inside the tutorial_06 directory using the following command:
[gkosta@front02 tutorial_06]$ sbatch launch_notebook.sh
Submitted batch job 1034638
In this instance, 1034638 is your job ID. To view the status of your job you can use the squeue command:
squeue -u $USER
The output will look like this:
[gkosta@front02 tutorial_06]$ sbatch launch_notebook.sh
Submitted batch job 1034638
[gkosta@front02 tutorial_06]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1034638       gpu jupyter_   gkosta  R      38:09      1 gpu01
ℹ️ Under the ST column you can see the status of your job. In this case R means it's running. If you see CF, the node is still being configured; waiting a few minutes should be enough for it to get ready and for your job to start. If you see PD, your job is pending resource allocation, meaning there aren't enough free resources and your job has been placed in the queue.
When you're sure your job is running, you should see some new files generated in your directory:
[gkosta@front02 tutorial_06]$ ls -l
total 5
-rw-r--r-- 1 gkosta p166   382 Dec 17 13:09 connection_info.txt
-rw-r--r-- 1 gkosta p166    81 Dec 17 13:48 job.1034638.err
-rw-r--r-- 1 gkosta p166   474 Dec 17 13:09 job.1034638.out
-rw-r--r-- 1 gkosta p166  8308 Dec 17 13:50 jupyter.log
-rwxr-xr-x 1 gkosta p166  1977 Dec 17 12:47 launch_notebook.sh
  • job.1034638.out is your job's output stream redirection
  • job.1034638.err is your job's error stream redirection
  • jupyter.log is your Jupyter server's log output
  • connection_info.txt contains the information on how to access the Jupyter server on the compute node.
ℹ️ The only file we are interested in here is connection_info.txt, which is described in detail in the next section. Unless you are debugging something, the remaining files shouldn't concern you.
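💡 When you finish working, cancel the job so the allocated resources are released, using scancel with your job ID:
scancel 1034638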

6.7. Connect to the Jupyter Server

We'll look at two different ways to run notebooks on the Jupyter server we just launched on a compute node of Cyclone:
  1. Browser
  2. VSCode

6.7.1. Locating the Connection Information

Before we connect to the Jupyter server, we need to create the SSH tunnel to securely forward ports from Cyclone to our local machine. The connection info is stored in a text file named connection_info.txt, located in the directory the SLURM script launch_notebook.sh was executed from (i.e. in $HOME/tutorial_06). To view its contents you can use your VSCode editor if you're following from Tutorial 03 - Setting up and Using Development Tools, or simply use the cat command from your terminal:
[gkosta@front02 tutorial_06]$ cat ./connection_info.txt
==================================================================
Run this command to connect on your jupyter notebooks remotely
ssh -N -J gkosta@cyclone.hpcf.cyi.ac.cy gkosta@gpu01 -L 11083:localhost:11083


Jupyter Notebook is running at: http://localhost:11083
Password to access the notebook: s23un9qxYjpenFnE
==================================================================

6.7.2. Establishing the SSH Tunnel

Locate the tunneling command in connection_info.txt. It looks like:
ssh -N -J <username>@cyclone.hpcf.cyi.ac.cy <username>@<batch-node> -L <port>:localhost:<port>
In my case, the command for SSH Tunneling would be:
ssh -N -J gkosta@cyclone.hpcf.cyi.ac.cy gkosta@gpu01 -L 11083:localhost:11083
In other words, running this command creates a secure connection for user gkosta between compute node gpu01 and our local machine, jumping through Cyclone's login node (-J) and forwarding port 11083 on both ends (-L). The -N flag tells SSH not to run any remote command, since we only need the port forwarding. For your case, the command will be adjusted with your own username, allocated compute node and port.
Now, open a new terminal and run your own SSH Tunneling command on your local machine:
gkosta@gkosta-dell:~$ ssh -N -J gkosta@cyclone.hpcf.cyi.ac.cy gkosta@gpu01 -L 11083:localhost:11083

ℹ️ The SSH command is blocking, meaning nothing will be printed when you run the above command. You may be prompted though for your key's passphrase. Otherwise, your cursor will stay there blinking with the connection established. Minimise the window and you are ready for the next step.
‼️ The SSH command should be run on a fresh local terminal, NOT the one already connected to Cyclone.
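💡 If you'd rather not keep a terminal window open, SSH's -f flag sends the tunnel to the background after authentication:
ssh -f -N -J gkosta@cyclone.hpcf.cyi.ac.cy gkosta@gpu01 -L 11083:localhost:11083
A backgrounded tunnel can later be terminated with the pkill command shown in the Notes and Troubleshooting section.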

6.7.3. Connecting via Browser

With the SSH tunnel running, our local machine is now connected to the compute node via port 11083 in the above example. To open the Jupyter Notebook interface, pick your favourite browser and visit the link printed in connection_info.txt, in our case http://localhost:11083.
You should reach a page asking for the password looking like this:

browser_asking_password

The password can again be found in connection_info.txt. Once we input the password, which in this example is s23un9qxYjpenFnE, and press the Log In button, we're in!

jupyter_home_page

Let's create a new notebook! Click the New button:

home_new_button

Now we can see several options:

new_button_options

  • Notebook kernels: The various Python kernels available for use
    • Python 3 (ipykernel): Default Python kernel
    • notebookEnv: The custom kernel we added
    ℹ️ Note that both kernels here use the same Python interpreter, i.e., the one in our conda environment.
  • Terminal: Launches a terminal session on the compute node. You can use this for running htop or nvidia-smi to view hardware utilisation.
  • Console: Launches an interactive Python shell.
  • New File: Creates a new file; this might be a text file, a Python script, or whatever you want.
  • New Folder: Creates a new folder.
If you click on any of the Python kernel options, a new tab will open in your browser with a notebook:

notebook_helloworld

As you can see, in this case we have selected notebookEnv. In other words, this notebook now runs on 1 GPU on the GPU node gpu01 of Cyclone, using the environment configured in the notebookEnv kernel.

6.7.4. Connecting via VSCode

Alternatively, you can use VSCode to view and run notebooks in a similar manner. To do this, we need to have some extensions installed. Searching for jupyter in the Extensions tab of VSCode should show you something like this:

jupyter_extension_selection

Click Install on the one circled and wait for the installation to finish. Once that's done, open a folder on your local machine:

newfile_vscode

For this example we have created a folder called vs_tutorial_06, but it can be any folder you'd like:

repo_folder

Right-click inside the folder and create a New File:

example_file_explorer

Name the file example_notebook.ipynb. Make sure to add the .ipynb extension at the end!

create_notebook_file

Now the notebook should open in your VSCode window, and you are ready to connect it to the Jupyter server running on the compute node. Do this by pressing the Select Kernel button at the top right of your screen and selecting a remote server:

select_kernel_button

Then you will see this in the top middle of your screen:

kernel_source_picker

Select Existing Jupyter Server...
Then add the URL found inside your connection_info.txt:

existing_server_url

Next, add the password, again found inside connection_info.txt:

existing_server_password

Finally, give your connection a display name; this can be anything you want:

server_display_name

Select the appropriate kernel:

kernel_selection

That's it, your notebook is now running remotely on the compute node! Adding a couple of cells and calling nvidia-smi shows us the 1 GPU available on gpu01:

notebook_nvidia_smi


6.8. Notes and Troubleshooting

6.8.1. Port Conflicts in SSH Tunnel

You see the error message "Address already in use" or are unable to connect to the specified port.
  • Check if the port is already in use:
lsof -i :PORT_NUMBER     # On your local machine
  • Kill any existing SSH tunnels:
pkill -f "ssh -N -J"
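  • Alternatively, forward the remote port to a different free local port; the first port after -L is the local one:
ssh -N -J <username>@cyclone.hpcf.cyi.ac.cy <username>@<batch-node> -L 18888:localhost:<port>
The notebook would then be reachable at http://localhost:18888 instead.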

6.8.2. SSH Authentication Issues

SSH key authentication failures
  • Verify your SSH key is properly added to Cyclone:
ssh-add -l               # List loaded keys
ssh-add ~/.ssh/id_rsa    # Add your key if needed
  • Check key permissions:
chmod 600 ~/.ssh/id_rsa
chmod 700 ~/.ssh
💡 If you are still facing SSH/connection issues, the Notes and Troubleshooting section in Tutorial 02 might be beneficial.

6.8.3. Activating Conda in a SLURM Script

When initialising conda inside a SLURM script, such as when running conda activate notebookEnv in launch_notebook.sh, you might come across the error "Failure to initialise Conda". If this happens, add the following after loading the Anaconda module (i.e. after module load Anaconda3):
__conda_setup="$('/nvme/h/buildsets/eb_cyclone_rl/software/Anaconda3/2023.03-1/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/nvme/h/buildsets/eb_cyclone_rl/software/Anaconda3/2023.03-1/etc/profile.d/conda.sh" ]; then
        . "/nvme/h/buildsets/eb_cyclone_rl/software/Anaconda3/2023.03-1/etc/profile.d/conda.sh"
    else
        export PATH="/nvme/h/buildsets/eb_cyclone_rl/software/Anaconda3/2023.03-1/bin:$PATH"
    fi
fi
unset __conda_setup
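ℹ️ The long path above corresponds to the Anaconda3/2023.03-1 module. If the module version on Cyclone differs, adjust the path accordingly; after module load Anaconda3, you can locate the correct prefix with:
which conda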

6.8.4. General Debugging Tips

Check the job output files for errors:
cat job.[jobid].out
cat job.[jobid].err
⚠️ Replace [jobid] with your Job ID.
These commands print the contents of the job's output and error streams, which may contain information that guides you to the problem. Some examples:
  • The conda environment name might be wrong.
  • Package dependency issues inside your conda environment.
  • The project you're requesting resources from might not have access to the partition you requested.