Jupyter

From Cheaha
[http://jupyter.org/ Jupyter Notebook] is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. For more information on Jupyter Notebook, see the [http://jupyter.org/documentation official documentation].


= Jupyter On Demand =


As of 2019, UAB Research Computing provides access to Cheaha via [https://rc.uab.edu On Demand]. To access it, follow the steps below.
 
== 1. Click [https://rc.uab.edu On Demand] ==
 
 
== 2. Select Interactive App and pick Jupyter Notebook ==
 
[[File:JupyterNotebookStart.png|500px]]
 
== 3. Load in Anaconda ==
<pre>
module load Anaconda3/5.3.1
</pre>
The following should also work for an updated version of Anaconda:
<pre>
module load Anaconda3
</pre>
 
== 4. GPU option ==
 
If you require running on a '''''GPU''''', please add the following to your environment:
<pre>
module load cuda92/toolkit/9.2.88
module load CUDA/9.2.88-GCC-7.3.0-2.30
</pre>
 
Additionally, you will need to request a GPU as shown below by including the pascalnodes argument:
 
[[File:PascalNodes.png|500px]]
 
== 5. Click Launch ==
 
[[File:LaunchJupyter.png|500px]]
 
Wait until you receive an email or the Launch button turns blue. This can happen in about 10-20 seconds or may take much longer, depending on the resources requested (CPU count and memory).
 
== 6. Connect to Jupyter Notebook ==
 
== 7. Test PyTorch with a new notebook ==
 
[[File:TestPytorch.png|500px]]
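Step 7 can be done with a short cell like the following. This is a minimal sketch, assuming the selected kernel may provide PyTorch; it reports gracefully instead of raising ImportError when torch is absent:

```python
# Minimal PyTorch sanity check for a new notebook cell.
# Assumes the chosen kernel/environment may provide torch; if it does not,
# this reports that instead of failing with ImportError.
import importlib.util

def torch_status():
    """Return a one-line status string describing the torch installation."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed in this environment"
    import torch
    t = torch.rand(2, 3)  # tiny tensor op to confirm the library actually works
    return (f"torch {torch.__version__}, "
            f"cuda available: {torch.cuda.is_available()}, "
            f"sample shape: {tuple(t.shape)}")

print(torch_status())
```

If the CUDA modules from step 4 were loaded and a GPU was requested on pascalnodes, the status should report cuda available: True.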
 
= Adding Custom Conda Environments to Jupyter =
 
See the [https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf Conda Cheat Sheet] for some helpful Conda documentation. Additional info (some specific to Cheaha) is below.
 
== Preparing to build a conda env on Cheaha ==
Do NOT build your new conda env on the head node; use an interactive job to create the new environment with the command below.
<pre>
srun --ntasks=1 --cpus-per-task=2 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash
</pre>
 
== Set up your .condarc file on Cheaha ==
This tells conda where to look for the installed environments. This file should be in your home directory on Cheaha (/data/user/username/.condarc). Make sure to include the '.' at the beginning of the filename.
<pre>
channels:
  - defaults
envs_dirs:
  - /data/user/username/python/env
</pre>
Use the data directory rather than the home directory for your conda environment, as it can get quite large.
The file will be hidden and not visible in the Jupyter environment. To view or edit it, start a terminal and run 'nano ~/.condarc' or 'vi ~/.condarc'.
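For illustration, the .condarc content above can be rendered for a given user with a small helper. This is a hypothetical sketch; the function name and the example username "jdoe" are made up:

```python
# Hypothetical helper: render the .condarc content shown above for a user.
# On Cheaha the result would be saved to /data/user/<username>/.condarc.
def render_condarc(username):
    return (
        "channels:\n"
        "  - defaults\n"
        "envs_dirs:\n"
        f"  - /data/user/{username}/python/env\n"
    )

print(render_condarc("jdoe"))
```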
 
== Build your conda environment ==
Once you are off the head node, you can start building your conda environment.
<pre>
conda create --name mynlp python=3.7
</pre>


To update an existing environment:
<pre>
conda env update --prefix ./scibert --file scibert.yml --prune
</pre>


== Add ipykernel to your conda environment ==
Adding the ipykernel package is what makes your environment show up in Jupyter notebooks. Insert these lines in your .yml file. You can edit the file in the Jupyter environment.
<pre>
name: scibert
channels:
  - defaults
dependencies:
  - ipykernel=5.1.2
  - _libgcc_mutex=0.1=main
  - alabaster=0.7.12=py37_0
</pre>


== Add other packages of interest ==
Add additional packages of interest to your conda environment, for example "torch", "transformers", etc. Pip can be used as well.
<pre>
conda install torch
</pre>


You can see your conda environment packages using
<pre>
conda list
</pre>


== (Optional) Export a YAML file containing your environment to Cheaha ==
If you already have a functional conda environment elsewhere (like your laptop), you can export it and copy it to Cheaha. The command below exports the "scibert" environment built on another machine. You can use any filename, but keep the extension .yml.
<pre>
conda env export --no-builds > scibert.yml
</pre>
Copy the file to Cheaha. You can use the "Upload" button in Jupyter.


To replicate the new conda environment on Cheaha, execute the following command:
<pre>
conda env create --file scibert.yml --prefix /data/user/username/python/env/scibert
</pre>
If you use the --name argument instead to specify the environment name, conda will create a subdirectory with the same name as the yml filename. That is confusing, so using --prefix keeps the environment path cleaner.
== Load environment on Jupyter Notebook ==
[[File:JupyterCustomEnv.jpeg|500px]]
= Jupyter by Proxy =
'''''(no longer required as of August 2019; use the On Demand option instead and this only as a fallback)'''''
The Cheaha cluster supports Jupyter notebooks for data analysis, but such jobs should be run using the SLURM job submission system to avoid overloading the head node. To run a Jupyter Notebook on Cheaha, log in to Cheaha from your client machine and start an [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Job interactive job].
One important note is that Cheaha only supports OpenSSH, so you should be able to use native ssh from Mac or Linux machines. Windows 10 supports OpenSSH as well, but it is not enabled by default. On updated Windows 10 machines, a '''Developer Command Prompt''' (available via searching from the Start Menu) is able to run OpenSSH via the ssh command, similar to Mac and Linux. Another option for Windows machines is installing Cygwin. PuTTY has been [[Setting_Up_VNC_Session#Port-forwarding_from_Windows_Systems|tested]], but does not work reliably on Cheaha for proxying connections.


The Jupyter notebooks are built with [[Anaconda]], a free and open source distribution of python and R for scientific computing. If you need additional packages, you can create your own [[Python_Virtual_Environment]] just for that purpose.
 
== 1. Start the Jupyter Notebook ==
<pre>
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash
module load Anaconda3/5.2.0
unset XDG_RUNTIME_DIR
jupyter notebook --no-browser --ip=$host
</pre>
A headless Jupyter notebook should now be running on a compute node. The next step is to proxy this connection to your local machine.


== 2. Proxy Connection Locally ==
Now, start up a '''new''' tab/terminal/window on your client machine and log in to Cheaha again, using
<pre>
ssh -L 88XX:c00XX:88XX BLAZERID@cheaha.rc.uab.edu
</pre>
Note:
* '''c00XX''' is the compute node where you started the jupyter notebook, for example c0047
* '''88XX''' is the port that the notebook is running on, for example 8888
* For Windows users, you can find instructions for port forwarding [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session#Port-forwarding_from_Windows_Systems here]
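The mapping from compute node and port to the ssh command can be sketched as a small helper. This is purely illustrative; the function name and the example values ("c0047", 8888, "jdoe") are placeholders:

```python
def tunnel_command(node, port, blazerid):
    """Build the ssh port-forwarding command for Cheaha from step 2."""
    return f"ssh -L {port}:{node}:{port} {blazerid}@cheaha.rc.uab.edu"

print(tunnel_command("c0047", 8888, "jdoe"))
# -> ssh -L 8888:c0047:8888 jdoe@cheaha.rc.uab.edu
```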
== 3. Copy notebook URL ==
After running the jupyter notebook command the server should start running in headless mode and provide you with a URL including a port # (typically but not always 8888) and a compute node on cheaha (for example C0047) that looks something like this:
<pre>
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://c0047:8888/?token=73da89e0eabdeb9d6dc1241a55754634d4e169357f60626c&token=73da89e0eabdeb7d6dc1241a55754634d4e169357f60626c
</pre>
Copy the URL shown above into your clipboard/buffer for pasting into the browser as described in step 4.


== 4. Access Notebook through Local Browser via Proxy Connection ==
Now access the link on your client machine's browser locally using the link generated by jupyter notebook, '''substituting localhost for c00XX'''. Make sure you have the correct port as well.
<pre>
http://localhost:88XX/?token=73da89e0eabdeb9d6dc1241a55754634d4e169357f60626c&token=73da89e0eabdeb7d6dc1241a55754634d4e169357f60626c
</pre>
A Jupyter notebook should then open in your browser connected to the compute node.
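The localhost substitution described above can also be expressed programmatically. This is an illustrative sketch using only the standard library, not part of Jupyter; the token in the example URL is made up:

```python
from urllib.parse import urlsplit, urlunsplit

def localize(notebook_url, local_port):
    """Rewrite a compute-node notebook URL (e.g. http://c0047:8888/?token=...)
    into the localhost URL reachable through the SSH tunnel."""
    parts = urlsplit(notebook_url)
    return urlunsplit((parts.scheme, f"localhost:{local_port}",
                       parts.path, parts.query, parts.fragment))

print(localize("http://c0047:8888/?token=abc123", 8888))
# -> http://localhost:8888/?token=abc123
```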
== Jupyter Options ==
=== DeepNLP option (development in progress) ===
For the use of additional libraries (pytorch, spacy) related to Deep Learning and/or NLP after loading Anaconda3/5.2.0 run:
<pre>
conda activate /share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP
</pre>
=== Heavy Data IO option ===
Additionally, if you anticipate large IO data transfers, adjust the run command to set a higher data rate limit as shown below:
<pre>
jupyter notebook --no-browser --ip=$host --NotebookApp.iopub_data_rate_limit=1.0e10
</pre>
=== Memory Heavy option ===
<pre>
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=16384 --time=08:00:00 --partition=medium --job-name=POSTag --pty /bin/bash
</pre>


=== GPU Option ===
Finally, if your job requires a GPU then add the [https://docs.uabgrid.uab.edu/wiki/Slurm#Requesting_for_GPUs gres and partition arguments] as shown below:
<pre>
srun --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4096 --time=08:00:00 --partition=pascalnodes --job-name=JOB_NAME --gres=gpu:1 --pty /bin/bash
</pre>

Latest revision as of 13:13, 24 September 2020
