Research Storage

Research Storage is a scalable storage fabric designed to grow with your research operations. Data set sizes are growing dramatically. Keeping enough storage on hand to manage Big Data is a challenge for everyone. The Research Storage service provides nearly unlimited capacity to hold the data important to your research. The service is built using flexible technologies that can support any research data requirement.

Introduction

The following description of research storage locations and hardware assumes that all files and data will be placed on Cheaha with the intent of being used for computation related to a legitimate research need. Compute storage is not intended for archival or backup purposes, and is not intended to store personal, non-research and non-educational data. Educational data should be limited to coursework requiring computational research. Research Computing does not own any data placed by users on Cheaha. Backup services are not provided by the Research Computing department and must be maintained by the data owner.

The following terms may be useful:

  • GB - gigabyte
  • TB - terabyte (1024 GB)
  • PB - petabyte (1024 TB)
  • $ - anything following a $ symbol is a shell variable. All of the variables used here should be predefined in your shell environment on Cheaha
  • $USER - a shell variable containing your username/blazerid or XIAS account username

All users are provided with a 5 TB personal allocation under `/data/user/$USER`, also known as `$USER_DATA`. Users with collaboration needs such as PI labs, large-scale research projects or external collaborators on projects may request space under `/data/project`, also known as `$SHARE_PROJECT`. The default shared space is 50 TB.

Users may also make use of global and node-local scratch space. Global scratch space is `/scratch/$USER` also known as `$USER_SCRATCH` and has 1 PB total. Node-local scratch space is `/scratch/local` or `$LOCAL_SCRATCH` and has 1 TB per node. Files in these locations should be cleaned up by users regularly when used, and as soon as jobs are completed for `$LOCAL_SCRATCH`.
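
All of these variables are predefined in your login shell on Cheaha, so you can print them to confirm where each location points. A minimal check (the paths shown in the comments are illustrative):

echo $USER_DATA      # /data/user/<blazerid>
echo $SHARE_PROJECT  # /data/project
echo $USER_SCRATCH   # /scratch/<blazerid>
echo $LOCAL_SCRATCH  # /scratch/local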

Description

Home Directory aka `$HOME`

Points to `/home/$USER`. Your home directory, where the operating system and most software store personal configuration files. Historically this was a separate filesystem from other storage locations, with a 20 GB quota. The previous hardware was retired and its data was moved onto the primary `$USER_DATA` hardware and merged into that quota.

User Data aka `$USER_DATA`

Points to `/data/user/$USER`. The preferred location for storing personal, research-computing-related files, scripts, and code. It has a 5 TB quota and is replicated for robustness.

Projects aka `$SHARE_PROJECT`

Points to `/data/project/`. The preferred location for storing shared, research-computing-related files, scripts, and code. To use this space you must make a formal request to Support using the information from the Project Request page. Quotas are allocated on request for legitimate research projects or labs, and the data under each quota must be owned by a particular PI on campus. The default quota size is 50 TB, but a smaller quota may be requested. Larger quotas, or increases to an existing quota, require justification.

Scratch aka `$USER_SCRATCH`

Points to `/scratch/$USER`. The preferred location for storing any temporary files related to research computing. Total space is 1 PB (petabyte) or 1024 TB shared among all users of the cluster. Users should delete files placed here on a regular basis or on job completion.
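
Because this space is shared by every user of the cluster, it helps to find and remove stale files regularly. A minimal sketch (the 30-day threshold is only an example, not a site policy):

find $USER_SCRATCH -type f -mtime +30 -print   # list files untouched for more than 30 days
# after reviewing the list, the same criteria can delete them:
# find $USER_SCRATCH -type f -mtime +30 -delete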

`$LOCAL_SCRATCH`

Points to `/scratch/local`. The preferred location for storing small quantities of temporary files for currently running jobs. It is local to each compute node and not shared between nodes, but it is shared between all users of a given node. Typically this space is about 1 TB and offers the highest I/O performance for a single-node job. Users must delete files placed here on job completion.
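
A common pattern is to give each job its own directory under `$LOCAL_SCRATCH` and remove it when the job finishes. A minimal sketch for a batch job script (the directory naming and cleanup trap are illustrative, not a required convention):

# create a unique per-job working directory on node-local scratch
workdir=$(mktemp -d -p "$LOCAL_SCRATCH")
# remove it automatically when the script exits, even on failure
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
# ... run the I/O-intensive steps of the job here ...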

To review a project quota, use the following command at the terminal, replacing <PROJECT-SLUG> with the name of your project directory. Use only the part immediately after `/data/project/`, not the full directory path.

/usr/lpp/mmfs/bin/mmlsquota --block-size=auto -v -j <PROJECT-SLUG> data

You can also add a function to your `.bashrc` file as follows. Use it as `project_quota <PROJECT-SLUG>`.

function project_quota() {
  /usr/lpp/mmfs/bin/mmlsquota --block-size=auto -v -j $1 data
}
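
If the function was just added to `.bashrc`, reload your shell configuration (or start a new login shell) before calling it, for example:

source ~/.bashrc
project_quota <PROJECT-SLUG>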

Another method is to use the standard disk usage utility, `du`.

du -sh <path>

The `-s` flag gives a single total for the folder rather than a size for every item in the path, and `-h` prints the size in human-readable units (e.g. GB). This approach is slower than querying the quota, especially for very large folders.
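
For example, to see how much of the 5 TB personal allocation is currently in use:

du -sh $USER_DATA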

sloss

Not a shell variable, refers to `/data/project/sloss`. A special project location under `$SHARE_PROJECT` for projects that are at most a few TB. Essentially a foundry for project spaces that start small but may grow and graduate into a full-fledged project space.

Request Storage

To request storage space, please contact Support using the information from the Project Request page.

How to Access

Research storage may be accessed in a number of ways which are described at Data Movement.