Jupyter

From Cheaha

Revision as of 14:54, 1 November 2018

Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. For more information, see the Jupyter Notebook documentation.

Jupyter on Cheaha

The Cheaha cluster supports Jupyter notebooks for data analysis, but these jobs should run through the SLURM job submission system to avoid overloading the head node. To run a Jupyter notebook on Cheaha, log in to Cheaha from your client machine and start an interactive job.

Note that Cheaha supports only OpenSSH clients. Mac and Linux users can use the native ssh command. Windows 10 also includes OpenSSH, but it is not enabled by default. On updated Windows 10 machines, a Developer Command Prompt (found by searching from the Start Menu) can run the ssh command just as on Mac and Linux. Another option on Windows is to install Cygwin. PuTTY has been tested (see the Port-forwarding from Windows Systems section of Setting Up VNC Session), but it does not work reliably on Cheaha for proxying connections.
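Before setting up the proxy, you may want to confirm that your ssh client is OpenSSH. The sketch below assumes a Bash shell, such as on Mac, Linux, or Cygwin. `is_openssh` is a hypothetical helper, not a Cheaha tool. It relies on `ssh -V` printing a version banner that begins with "OpenSSH" for OpenSSH clients, including the Windows 10 build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if an ssh client identifies itself as OpenSSH.
# With no argument it checks the local client; `ssh -V` writes its banner to stderr.
# A banner string can also be passed in directly, which is handy for testing.
is_openssh() {
  local banner=${1:-$(ssh -V 2>&1)}
  case $banner in
    OpenSSH*) return 0 ;;   # e.g. "OpenSSH_8.9p1 ..." or "OpenSSH_for_Windows_7.7p1 ..."
    *)        return 1 ;;   # any other client, or ssh not installed
  esac
}
```

For example, `if is_openssh; then echo "OpenSSH found"; else echo "OpenSSH not found"; fi` reports on the local client.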

Jupyter is provided through Anaconda, a free and open-source distribution of Python and R for scientific computing. If you need additional packages, you can create your own Python virtual environment for that purpose (see the Python_Virtual_Environment page).

1. Start the Jupyter Notebook

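# Request an interactive session on a compute node (4 CPUs, 4096 MB per CPU, 8 hours)
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash
# Load the Anaconda distribution that provides Jupyter
module load Anaconda3/5.2.0
# Clear the inherited runtime directory, which may not be usable on the compute node
unset XDG_RUNTIME_DIR
# Start the notebook server without opening a browser, bound to the address in $host
jupyter notebook --no-browser --ip=$host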

A headless Jupyter notebook should now be running on a compute node. The next step is to proxy this connection to your local machine.

2. Proxy Connection Locally

Next, open a new terminal window or tab on your client machine and log in to Cheaha again with port forwarding enabled:

ssh -L 88XX:c00XX:88XX BLAZERID@cheaha.rc.uab.edu

Note:

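  • c00XX is the compute node where you started the Jupyter notebook, for example c0047
  • 88XX is the port the notebook is running on, for example 8888
  • BLAZERID is your BlazerID

Both the compute node and the port appear in the URL printed by the jupyter notebook command (see step 3).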
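The tunnel command can also be assembled from the node and port. In this minimal sketch, `jupyter_tunnel_cmd` is a hypothetical Bash helper. It only prints the command so you can check it before running it, and it assumes the same port number is used on your machine and on the compute node, as in the ssh command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ssh port-forwarding command for a notebook
# running on NODE at PORT, logging in to Cheaha as BLAZERID.
# Usage: jupyter_tunnel_cmd NODE PORT BLAZERID
jupyter_tunnel_cmd() {
  if [ "$#" -ne 3 ]; then
    echo "usage: jupyter_tunnel_cmd NODE PORT BLAZERID" >&2
    return 1
  fi
  local node=$1 port=$2 blazerid=$3
  # Forward local PORT to PORT on the compute node, through the Cheaha login node
  printf 'ssh -L %s:%s:%s %s@cheaha.rc.uab.edu\n' "$port" "$node" "$port" "$blazerid"
}
```

For example, `jupyter_tunnel_cmd c0047 8888 BLAZERID` prints `ssh -L 8888:c0047:8888 BLAZERID@cheaha.rc.uab.edu`.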

3. Copy notebook settings

After you run the jupyter notebook command in step 1, the server starts in headless mode and prints a URL. The URL contains the compute node name (for example c0047) and the port number (typically, but not always, 8888). It looks something like this:

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://c0047:8888/?token=73da89e0eabdeb9d6dc1241a55754634d4e169357f60626c

Copy this URL to your clipboard. You will paste it into your browser in step 4.
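If you prefer not to read the node and port off the URL by eye, they can be extracted with plain Bash parameter expansion. `parse_jupyter_url` is a hypothetical helper. It assumes the URL has the form shown above: a scheme, then `node:port`, then an optional path and token.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the compute node and port from a Jupyter URL,
# separated by a space, e.g. "http://c0047:8888/?token=..." -> "c0047 8888".
parse_jupyter_url() {
  local url=$1
  local rest=${url#*://}        # drop the scheme: "c0047:8888/?token=..."
  local hostport=${rest%%/*}    # keep everything before the first "/": "c0047:8888"
  local node=${hostport%%:*}    # part before ":" is the compute node
  local port=${hostport##*:}    # part after ":" is the port
  if [ "$node" = "$hostport" ]; then
    echo "no port found in URL: $url" >&2
    return 1
  fi
  printf '%s %s\n' "$node" "$port"
}
```

For example, `read -r node port < <(parse_jupyter_url "$url")` stores the two values in the shell variables `node` and `port`.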

4. Access Notebook through Local Browser via Proxy Connection

On your client machine, open the URL generated by jupyter notebook in your local browser, replacing the compute node name (c00XX) with localhost. Make sure the port matches the one you forwarded in step 2.

http://localhost:88XX/?token=73da89e0eabdeb9d6dc1241a55754634d4e169357f60626c
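The substitution can be done mechanically, which avoids typos in the long token. This is a minimal sketch assuming the URL starts with `http://` or `https://`. `localize_jupyter_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the host part of a Jupyter URL to "localhost",
# keeping the scheme, port, path, and token unchanged.
localize_jupyter_url() {
  # -E enables extended regular expressions in both GNU and BSD (macOS) sed.
  # The host is everything after the scheme up to the first ":" or "/".
  printf '%s\n' "$1" | sed -E 's#^(https?://)[^:/]+#\1localhost#'
}
```

For example, `localize_jupyter_url 'http://c0047:8888/?token=abc'` prints `http://localhost:8888/?token=abc`.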

A Jupyter notebook should then open in your browser connected to the compute node.

Jupyter Options

DeepNLP option (development in progress)

To use additional deep learning and NLP libraries (such as PyTorch and spaCy), run the following after loading Anaconda3/5.2.0:

conda activate /share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP
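You can confirm that the environment is active before starting Jupyter. This sketch relies on conda setting CONDA_PREFIX to the root of the active environment. `in_deepnlp_env` and `check_deepnlp_imports` are hypothetical helpers. The second one assumes PyTorch and spaCy are importable as `torch` and `spacy` in that environment.

```shell
#!/usr/bin/env bash
# Path of the shared DeepNLP environment, taken from the conda activate command above
DEEPNLP_ENV=/share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP

# Hypothetical helper: succeed only if DeepNLP is the currently active conda environment.
# conda sets CONDA_PREFIX to the root directory of the active environment.
in_deepnlp_env() {
  [ "${CONDA_PREFIX:-}" = "$DEEPNLP_ENV" ]
}

# Hypothetical helper: try importing the libraries and print their versions.
check_deepnlp_imports() {
  python -c 'import torch, spacy; print("torch", torch.__version__, "spacy", spacy.__version__)'
}
```

For example, `in_deepnlp_env && check_deepnlp_imports` checks the environment first and only then tries the imports.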

Heavy Data IO option

If you expect your notebooks to send large amounts of output to the browser, start Jupyter with a higher IOPub data rate limit. Use this command in place of the jupyter notebook command from step 1:

jupyter notebook --no-browser --ip=$host --NotebookApp.iopub_data_rate_limit=1.0e10 
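If you use this option often, the same setting can go in your Jupyter configuration file instead. The sketch assumes the classic Notebook config file at ~/.jupyter/jupyter_notebook_config.py, which `jupyter notebook --generate-config` creates. `set_iopub_limit` is a hypothetical helper. It appends the setting only if no uncommented line already sets it, because a generated config contains commented-out defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a higher IOPub data rate limit persistent by
# appending it to a Jupyter config file, unless an active setting is already there.
# Usage: set_iopub_limit [CONFIG_FILE] [LIMIT]
set_iopub_limit() {
  local config=${1:-$HOME/.jupyter/jupyter_notebook_config.py}
  local limit=${2:-1.0e10}
  mkdir -p "$(dirname "$config")"
  touch "$config"
  # Skip if an uncommented line already sets the option ("#c.NotebookApp..." does not count)
  if grep -Eq '^[[:space:]]*c\.NotebookApp\.iopub_data_rate_limit' "$config"; then
    return 0
  fi
  printf 'c.NotebookApp.iopub_data_rate_limit = %s\n' "$limit" >> "$config"
}
```

After running it once, a plain `jupyter notebook --no-browser --ip=$host` picks up the higher limit from the config file.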

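Memory Heavy option

If your job needs more memory, increase --mem-per-cpu (given in megabytes). For example, this command requests 16384 MB (16 GB) per CPU:

srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=16384 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash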
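Because --mem-per-cpu is a per-CPU amount (megabytes by default in Slurm), the total memory for the job is that value multiplied by --cpus-per-task. The small arithmetic helper below, `total_mem_mb` (hypothetical), compares the standard request from step 1 with the memory-heavy one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total memory (MB) a job receives from --cpus-per-task and --mem-per-cpu
total_mem_mb() {
  local cpus=$1 mem_per_cpu_mb=$2
  echo $(( cpus * mem_per_cpu_mb ))
}

total_mem_mb 4 4096    # standard job from step 1: prints 16384 (16 GB)
total_mem_mb 4 16384   # memory heavy job: prints 65536 (64 GB)
```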

GPU Option

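Finally, if your job requires a GPU, add the --gres argument and use the pascalnodes partition, as described under Requesting for GPUs on the Slurm page (https://docs.uabgrid.uab.edu/wiki/Slurm#Requesting_for_GPUs):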

srun --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4096 --time=08:00:00 --partition=pascalnodes --job-name=JOB_NAME --gres=gpu:1 --pty /bin/bash
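Once the GPU session starts, you can check that a GPU was actually allocated before launching Jupyter. This sketch assumes Slurm exports CUDA_VISIBLE_DEVICES for jobs that request --gres=gpu, as it typically does when GPU gres is configured. `count_visible_gpus` is a hypothetical helper. Running nvidia-smi on the node is another way to check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES,
# a comma-separated list of device indices such as "0" or "0,1".
count_visible_gpus() {
  local devices=${CUDA_VISIBLE_DEVICES:-}
  if [ -z "$devices" ]; then
    echo 0                          # unset or empty: no GPU visible to this job
    return 0
  fi
  local -a ids
  IFS=, read -ra ids <<< "$devices" # split the list on commas into an array
  echo "${#ids[@]}"
}
```

For example, `[ "$(count_visible_gpus)" -ge 1 ] || echo "no GPU allocated"` warns before you start the notebook on a node without a GPU.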