Data Movement

There are various tools you can use to move data within the HPC cluster, such as mv, cp, and scp. One of the most powerful tools for data movement on Linux is rsync, which we will be using in the examples below.
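
Throughout this page rsync is invoked in the form shown below. The -a (archive) flag copies recursively while preserving permissions, timestamps, and symbolic links, and -P is shorthand for --partial --progress, so an interrupted transfer can be resumed and progress is reported as it runs. SOURCE_PATH and DESTINATION_PATH are placeholders for your own paths.

rsync -aP SOURCE_PATH DESTINATION_PATH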

Procedure

Job Scripts

If the data that you are moving is large, you should always use either an interactive session or a job script for your data movement. This keeps the transfer from tying up and slowing down the login nodes for a long time, and instead performs it on a compute node. A general rule of thumb is that if your transfer takes more than a minute, perform it as a job.
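
If you are not sure how large a transfer will be, a quick size check on the source directory is a reasonable proxy (this check is only a suggestion, not part of the cluster instructions):

du -sh SOURCE_PATH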

Interactive session

  • Start an interactive session using srun
srun --ntasks=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash
  • Once you have moved from login001 to a c00XX compute node, start an rsync process to begin the transfer:
[build@c0051 Salmon]$ rsync -aP SOURCE_PATH DESTINATION_PATH
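
As an optional sanity check before the real transfer, rsync can be run with -n (--dry-run), which only lists what would be copied without transferring anything. The prompt below is just an illustration:

[build@c0051 Salmon]$ rsync -aPn SOURCE_PATH DESTINATION_PATH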

Job Script

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --ntasks=1
#SBATCH --partition=express
#
# Time format = HH:MM:SS, DD-HH:MM:SS
#
#SBATCH --time=10:00
#
# Minimum memory required per allocated CPU, in megabytes.
#
#SBATCH --mem-per-cpu=2048
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS

rsync -aP SOURCE_PATH DESTINATION_PATH

NOTE:

  • Please change the time required and the corresponding partition according to your needs.
  • After making your modifications to the job script, submit it using: sbatch JOB_SCRIPT
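
For example, assuming you saved the script above as data_move.job (the filename is just a placeholder), submitting it and checking on its status might look like:

sbatch data_move.job
squeue -u $USER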

Moving data from Lustre to GPFS Storage

SGE and Lustre will be taken offline on December 18, 2016, and decommissioned. All data remaining on Lustre after this date will be deleted.

Instructions for migrating data to /data/scratch/$USER location:

  • Log in to the new hardware (hostname: cheaha.rc.uab.edu).
  • You will notice that your /scratch/user/$USER is also mounted on the new hardware. It is a read-only mount, provided to help you move your data.
  • Start an rsync process using: rsync -aP /scratch/user/$USER/ /data/scratch/$USER. If the data you are transferring is large, either start an interactive session for this transfer or create a job script (a sketch of such a script is shown at the end of this section).

Data in /home or /rstore is not affected and remains the same on both the new and the old hardware, so you do not need to move it.
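
As a sketch only, combining the pieces above, the migration command can be wrapped in the same kind of job script shown earlier. The job name, output file, partition, time limit, and memory here are placeholders to adjust for your own transfer. Note the trailing slash on the source path, which tells rsync to copy the contents of the directory rather than the directory itself:

#!/bin/bash
#
#SBATCH --job-name=lustre-migration
#SBATCH --output=lustre-migration.txt
#SBATCH --ntasks=1
#SBATCH --partition=medium
#
# Time format = HH:MM:SS, DD-HH:MM:SS
#
#SBATCH --time=08:00:00
#
# Minimum memory required per allocated CPU, in megabytes.
#
#SBATCH --mem-per-cpu=2048
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS

# Copy everything under the old Lustre scratch area into the new GPFS scratch area.
rsync -aP /scratch/user/$USER/ /data/scratch/$USER

Submit it with sbatch as described above. If you wish, you can check the result afterwards by re-running the same rsync command; a second run only copies files that are missing or have changed.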