Data Movement

NOTE: This page is under construction.

There are various native Linux commands that you can use to move your data within the HPC cluster, such as mv, cp, and scp. One of the most powerful tools for data movement on Linux is rsync, which we'll be using in the examples below.

rsync and scp can also be used for moving data from local storage to Cheaha.
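As a minimal sketch (the BlazerID blazerid, the local directory, and the destination path are all placeholders, assuming the login host cheaha.rc.uab.edu), pushing a directory from your local machine to Cheaha could look like this:

# Push a local directory to your scratch space on Cheaha with rsync.
# Run this from your local machine; replace blazerid and both paths with your own.
rsync -aP ~/my_dataset/ blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/my_dataset/

# The same copy with scp (no resume or incremental transfer):
scp -r ~/my_dataset blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/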

RSync

To find more information about any of the above-mentioned tools, such as flags and usage, you can use man TOOL_NAME.

[build@c0051 ~]$ man rsync

NAME
       rsync - a fast, versatile, remote (and local) file-copying tool

SYNOPSIS
       Local:  rsync [OPTION...] SRC... [DEST]

       Access via remote shell:
         Pull: rsync [OPTION...] [USER@]HOST:SRC... [DEST]
         Push: rsync [OPTION...] SRC... [USER@]HOST:DEST

       Access via rsync daemon:
         Pull: rsync [OPTION...] [USER@]HOST::SRC... [DEST]
               rsync [OPTION...] rsync://[USER@]HOST[:PORT]/SRC... [DEST]
         Push: rsync [OPTION...] SRC... [USER@]HOST::DEST
               rsync [OPTION...] SRC... rsync://[USER@]HOST[:PORT]/DEST

       Usages with just one SRC arg and no DEST arg will list the source files
       instead of copying.

DESCRIPTION
 .
 .
 .

If you are interested in the various methods and tools that can be used for moving data, this page provides a very good guide: How to transfer large amounts of data via network.

Privacy

Do not store sensitive information on this filesystem, as it is not encrypted. Note that your data will be stored on the cluster filesystem, and while it is not accessible to ordinary users, it could be accessible to the cluster administrator(s).

Jobs

If the data that you are moving is large, you should always use either an interactive session or a job script for the transfer. This ensures that the data-movement process is not tying up and slowing down the login nodes for a long time, and instead runs on a compute node.

Interactive session

  • Start an interactive session using srun
srun --ntasks=1 --mem-per-cpu=1024 --time=08:00:00 --partition=medium --job-name=DATA_TRANSFER --pty /bin/bash

NOTE: Please change the time required and the corresponding partition according to your needs.

  • Start an rsync process to begin the transfer once you have moved from the login001 node to a c00XX compute node:
[build@c0051 Salmon]$ rsync -aP SOURCE_PATH DESTINATION_PATH
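For instance, a hypothetical transfer of a results directory from your home area into your space under /data/scratch (both paths are assumptions; adjust them to your own layout) could look like:

# Hypothetical example: copy a results directory from your home area to scratch.
rsync -aP $HOME/results/ /data/scratch/$USER/results/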

Job Script

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --ntasks=1
#SBATCH --partition=express
#
# Time format = HH:MM:SS, DD-HH:MM:SS
#
#SBATCH --time=10:00
#
# Minimum memory required per allocated CPU, in megabytes.
#
#SBATCH --mem-per-cpu=2048
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS

rsync -aP SOURCE_PATH DESTINATION_PATH

NOTE:

  • Please change the time required and the corresponding partition according to your needs.
  • After modifying the given job script, submit it using sbatch JOB_SCRIPT, as shown in the sketch below.
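As a usage sketch, assuming the job script above was saved as data_transfer.job (a hypothetical filename), submitting it and checking on it in the queue could look like this:

# Submit the data-transfer job script, then check its status in the queue.
sbatch data_transfer.job
squeue -u $USER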

FileZilla

Examples

This section provides various use cases where you would need to move your data.

Moving data from local storage to HPC

\\TODO

Moving data from rstore to /data/scratch

\\TODO