UploadLargeData
Revision as of 20:01, 8 July 2011 by Curtish@uab.edu (talk | contribs) (→transfer and uncompress (slow))
Attention: Research Computing Documentation has Moved
https://docs.rc.uab.edu/
Please use the new documentation url https://docs.rc.uab.edu/ for all Research Computing documentation needs.
As a result of this move, we have deprecated use of this wiki for documentation. We are providing read-only access to the content to facilitate migration of bookmarks and to serve as an historical record. All content updates should be made at the new documentation site. The original wiki will not receive further updates.
Thank you,
The Research Computing Team
Load and Link approach
transfer and uncompress (slow)
- log in to cheaha.uabgrid.uab.edu (linux)
- create directory for this data set in your scratch dir
- mkdir /lustre/scratch/user/proj1
- make sure that the directory can be reached by the galaxy user (execute permission on each directory level)
- chmod og+x /lustre/scratch/user
- chmod og+x /lustre/scratch/user/proj1
- transfer the files with SCP
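- UNCOMPRESS fastq.gz files!!
- cd /lustre/scratch/user/proj1
- to uncompress all files in parallel through qrsh, use one of the following
- find `pwd` -name "*.gz" -exec ksh -c 'qrsh "gzip -d \{}" &' \;
- ls -1 *.gz | xargs -I _f_ ksh -c 'qrsh -cwd gzip -d _f_ &'
- or uncompress a single file directly
- gzip -d filename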
- make sure the files are readable by galaxy
- if you're in galaxy-admin UNIX group you can do
- chgrp galaxy-admin *.fastq
- chmod g+r *.fastq
- if you're not, then you have to make it readable to the world (o=other)
- chmod o+r /lustre/scratch/user/proj1/*
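The permission and uncompress steps above can be collected into one small shell function. This is only a sketch: `prepare_dataset_dir` is a hypothetical helper name, it uncompresses serially with plain `gzip -d` instead of submitting jobs through qrsh, and it always uses the world-readable fallback (`o+r`) rather than the galaxy-admin group route.

```shell
#!/bin/sh
# Sketch only: prepare_dataset_dir is a hypothetical helper, not part of
# Cheaha or Galaxy. Run it after the files have been transferred with scp.
prepare_dataset_dir() {
    dir=$1
    parent=$(dirname "$dir")

    # Create the data set directory if it does not exist yet.
    mkdir -p "$dir" || return 1

    # Execute permission on both directory levels lets other users
    # (such as galaxy) traverse into the data set directory.
    chmod og+x "$parent" "$dir" || return 1

    # Uncompress every .gz file, one after another (no qrsh here).
    find "$dir" -type f -name '*.gz' -exec gzip -d {} \; || return 1

    # Fallback for users outside galaxy-admin: make files world-readable.
    find "$dir" -type f -exec chmod o+r {} \;
}
```

For the example paths used on this page, the call would be `prepare_dataset_dir /lustre/scratch/user/proj1`.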
link into galaxy dataset (fast)
- get Admin privileges on galaxy
- either get Shantanu to make you admin in Galaxy
- or grab someone who is (John, Curtis)
- In Galaxy GUI:
- admin > Manage Data Libraries > create new library
- add Datasets
- Upload option: Upload files from system path
- Get a list of absolute path names using one of the following
- cd /lustre/scratch/user/proj1 THEN RUN find `pwd` -name "*.fastq"
- find /lustre/scratch/user/proj1 -name "*.fastq"
- paste list of absolute path names into URL/Text box in Web Admin GUI
- Change "Copy data into Galaxy?" to "Link to files without copying into Galaxy"
- Put something mnemonic in Message box.
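Before pasting the path list into the Galaxy form, it can help to check that every file in it is actually readable. The sketch below wraps the `find` command from the steps above and adds a permission check. Both function names are hypothetical, and the check assumes that group-read or other-read permission is enough for the galaxy user, as the chmod steps on this page suggest.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Print the absolute path of every .fastq file under a directory,
# one per line, ready to paste into the Galaxy path box.
list_fastq_paths() {
    dir=$(cd "$1" && pwd) || return 1
    find "$dir" -type f -name '*.fastq' | sort
}

# Print .fastq files that neither the group nor others can read;
# Galaxy would presumably fail to link these.
list_unreadable_fastq() {
    dir=$(cd "$1" && pwd) || return 1
    find "$dir" -type f -name '*.fastq' ! -perm -040 ! -perm -004 | sort
}
```

Running `list_unreadable_fastq /lustre/scratch/user/proj1` should print nothing once the permissions are set; `list_fastq_paths /lustre/scratch/user/proj1` then gives the list to paste.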
link data into a history (fast)
Once the datasets are in the data library, select them and, at the bottom of the page, use "For selected datasets: <Import to histories>" to bring them into a history so you can compute on them.