Jupyter: Difference between revisions
Revision as of 03:21, 15 January 2020
Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. For more information, see the Jupyter project documentation.
Jupyter On Demand
As of 2019, UAB Research Computing allows access to cheaha via On Demand. To access it:
1. Click On Demand
2. Select Interactive App and pick Jupyter Notebook
3. Load in Anaconda
module load Anaconda3/5.3.1
The following should also work for an updated version of Anaconda.
module load Anaconda3
4. If you require running on a GPU, please add the following to your environment.
module load cuda92/toolkit/9.2.88
module load CUDA/9.2.88-GCC-7.3.0-2.30
Additionally, you will need to request a GPU by selecting the pascalnodes partition.
5. Click Launch
Wait until you receive an email or get a blue Launch button. This can happen in about 10-20 seconds, or it may take much longer depending on the resources requested (CPU count and memory).
6. Connect to Jupyter Notebook
7. Test PyTorch with a new notebook
Adding Custom Conda Environments to Jupyter
Export YAML file containing your environment to cheaha
Wherever your working environment is, export it to cheaha. The command below exports the scibert environment, which I set up on a different server.
conda env export > scibert.yml
Set up your cheaha .condarc file
channels:
  - defaults
envs_dirs:
  - /data/user/ozborn/Conda_Env
Use the data directory rather than the home directory for your conda environments, as they can get quite large.
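As a minimal sketch of the step above, the helper below writes a .condarc with the same two keys. The function name write_condarc and its arguments are hypothetical, not part of conda or cheaha; it refuses to overwrite an existing file so that current settings are not lost.

```shell
# Write a minimal .condarc that keeps conda environments in a chosen directory.
# Usage: write_condarc ENVS_DIR [TARGET_FILE]
# write_condarc is a hypothetical helper name, not a conda command.
write_condarc() {
    local envs_dir="$1"
    local target="${2:-$HOME/.condarc}"
    # Do not clobber an existing configuration.
    if [ -e "$target" ]; then
        echo "$target already exists; edit it by hand instead." >&2
        return 1
    fi
    # Make sure the environment directory exists before pointing conda at it.
    mkdir -p "$envs_dir" || return 1
    cat > "$target" <<EOF
channels:
  - defaults
envs_dirs:
  - $envs_dir
EOF
}
```

For the example above, this would be called as write_condarc /data/user/ozborn/Conda_Env, substituting your own data directory.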
Add ipykernel to your conda environment
Adding this conda package to the exported YAML file is what makes your environment show up in Jupyter notebooks.
name: scibert
channels:
  - defaults
dependencies:
  - ipykernel=5.1.2
  - _libgcc_mutex=0.1=main
  - alabaster=0.7.12=py37_0
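Before building, it can help to confirm that the YAML file really lists ipykernel. The check below is a rough sketch: has_ipykernel is a hypothetical helper name, and it only looks for a dependency line that starts with "- ipykernel", so it will not catch packages installed under a pip section.

```shell
# Succeed if the environment file has a dependency line for ipykernel.
# Usage: has_ipykernel FILE
# has_ipykernel is a hypothetical helper name used only for illustration.
has_ipykernel() {
    # Match lines such as "  - ipykernel" or "  - ipykernel=5.1.2".
    grep -Eq '^[[:space:]]*-[[:space:]]*ipykernel([=[:space:]]|$)' "$1"
}
```

A typical call would be has_ipykernel scibert.yml || echo "add ipykernel to the dependencies".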
Build your conda environment
Do NOT build the environment on the head node; use an interactive job to create it.
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash
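Once the interactive shell has started, the environment can be created from the exported file. The sketch below assumes the file is named scibert.yml as in the export step and that the Anaconda3 module is the one to load. build_env is a hypothetical wrapper; it relies on Slurm setting SLURM_JOB_ID inside a job to avoid an accidental build on the head node.

```shell
# Create a conda environment from a YAML file, but only inside a Slurm job.
# Usage: build_env YAML_FILE
# build_env is a hypothetical wrapper, not a conda or Slurm command.
build_env() {
    local yaml_file="$1"
    # Slurm sets SLURM_JOB_ID inside jobs; it is normally unset on the head node.
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a Slurm job; start an interactive job first." >&2
        return 1
    fi
    module load Anaconda3
    conda env create -f "$yaml_file"
}
```

Inside the interactive job this would be run as build_env scibert.yml.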
Jupyter by Proxy
(No longer required as of August 2019. Use the OnDemand option instead, and this only as a fallback.)
The cheaha cluster supports Jupyter notebooks for data analysis, but such jobs should be run through the SLURM job submission system to avoid overloading the head node. To run a Jupyter Notebook on cheaha, log in to cheaha from your client machine and start an interactive job.
One important note is that cheaha only supports openssh. You should be able to use native ssh from Mac or Linux machines. Windows 10 supports openssh as well, but it is not enabled by default. On updated Windows 10 machines, a Developers Command Prompt (available via searching from the Start Menu) is able to run openssh via the ssh command, as on Mac and Linux. Another option for Windows machines is the installation of Cygwin. PuTTY has been tested, but does not work reliably on cheaha for proxying connections.
The Jupyter notebook is built with Anaconda, a free and open source distribution of Python and R for scientific computing. If you need additional packages, you can create your own Python_Virtual_Environment just for that purpose.
1. Start the Jupyter Notebook
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash
module load Anaconda3/5.2.0
unset XDG_RUNTIME_DIR
jupyter notebook --no-browser --ip=$host
A headless Jupyter notebook should now be running on a compute node. The next step is to proxy this connection to your local machine.
2. Proxy Connection Locally
Now open a new tab, terminal, or window on your client machine and log in to cheaha again, using
ssh -L 88XX:c00XX:88XX BLAZERID@cheaha.rc.uab.edu
Note:
- c00XX is the compute node where you started the jupyter notebook, for example c0047
- 88XX is the port that the notebook is running, for example 8888
- Windows users can find instructions for port forwarding here
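The placeholders in the ssh command can be filled in mechanically once the node and port are known. The sketch below only prints the command so it can be checked before running; tunnel_cmd is a hypothetical helper name.

```shell
# Print the ssh port-forwarding command for a notebook on a compute node.
# Usage: tunnel_cmd NODE PORT BLAZERID
# tunnel_cmd is a hypothetical helper name used only for illustration.
tunnel_cmd() {
    local node="$1" port="$2" user="$3"
    # The same port is used on the local machine and on the compute node.
    printf 'ssh -L %s:%s:%s %s@cheaha.rc.uab.edu\n' "$port" "$node" "$port" "$user"
}
```

With the example values from the notes, tunnel_cmd c0047 8888 BLAZERID prints ssh -L 8888:c0047:8888 BLAZERID@cheaha.rc.uab.edu.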
3. Copy notebook URL
After you run the jupyter notebook command, the server should start in headless mode and print a URL that includes a port number (typically but not always 8888) and a compute node on cheaha (for example c0047). The output looks something like this:
Copy/paste this URL into your browser when you connect for the first time, to login with a token:
http://c0047:8888/?token=73da89e0eabdeb9d6dc1241a55754634d4e169357f60626c&token=73da89e0eabdeb7d6dc1241a55754634d4e169357f60626c
Copy the URL shown above into your clipboard/buffer for pasting into the browser as described in step 4.
4. Access Notebook through Local Browser via Proxy Connection
Now open the link generated by jupyter notebook in the browser on your client machine, substituting localhost for c00XX. Make sure you have the correct port as well.
http://localhost:88XX/?token=73da89e0eabdeb9d6dc1241a55754634d4e169357f60626c&token=73da89e0eabdeb7d6dc1241a55754634d4e169357f60626c
A Jupyter notebook should then open in your browser connected to the compute node.
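The hostname substitution from step 4 can also be done with sed rather than by hand. This is a small sketch; to_local_url is a hypothetical helper name, and it assumes the local port of the tunnel is the same as the port in the original URL.

```shell
# Replace the compute-node hostname in a notebook URL with localhost.
# Usage: to_local_url URL
# to_local_url is a hypothetical helper name used only for illustration.
to_local_url() {
    # Keep the scheme, swap everything up to the first ":" or "/" for localhost.
    printf '%s\n' "$1" | sed -E 's#^(https?://)[^:/]+#\1localhost#'
}
```

Quote the URL when calling the function, since the token part contains characters such as ? and & that the shell would otherwise interpret.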
Jupyter Options
DeepNLP option (development in progress)
To use additional libraries related to deep learning and/or NLP (PyTorch, spaCy), run the following after loading Anaconda3/5.2.0:
conda activate /share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP
Heavy Data IO option
Additionally, if you anticipate large IO data transfers, adjust the run command to set a higher data rate limit as shown below:
jupyter notebook --no-browser --ip=$host --NotebookApp.iopub_data_rate_limit=1.0e10
Memory Heavy option
If your job needs more memory, raise the --mem-per-cpu value in the srun command, for example:
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=16384 --time=08:00:00 --partition=medium --job-name=POSTag --pty /bin/bash
GPU Option
Finally, if your job requires a GPU then add the gres and partition arguments as shown below:
srun --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4096 --time=08:00:00 --partition=pascalnodes --job-name=JOB_NAME --gres=gpu:1 --pty /bin/bash
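The memory and GPU variants differ from the base srun command only in a few arguments, so they can be generated from one template. The sketch below prints the command instead of running it; srun_cmd is a hypothetical helper name, and the fixed time limit and job name are simply copied from the examples on this page.

```shell
# Print an srun command for an interactive job on cheaha.
# Usage: srun_cmd CPUS MEM_PER_CPU PARTITION [GPUS]
# srun_cmd is a hypothetical helper name used only for illustration.
srun_cmd() {
    local cpus="$1" mem="$2" partition="$3" gpus="${4:-0}"
    local cmd="srun --ntasks=1 --cpus-per-task=$cpus --mem-per-cpu=$mem"
    cmd="$cmd --time=08:00:00 --partition=$partition --job-name=JOB_NAME"
    # Only request a GPU resource when one or more GPUs are asked for.
    if [ "$gpus" -gt 0 ]; then
        cmd="$cmd --gres=gpu:$gpus"
    fi
    printf '%s --pty /bin/bash\n' "$cmd"
}
```

For instance, srun_cmd 4 16384 medium gives the memory-heavy form, and srun_cmd 1 4096 pascalnodes 1 gives the GPU form.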