Galaxy File Uploads: Difference between revisions

From Cheaha
(started notes on Galaxy upload method)
 
m (added section privacy with Sensitive information template)
 
(12 intermediate revisions by 3 users not shown)
Latest revision as of 15:49, 3 May 2018

UAB Galaxy supports data import in three ways:

Method | Limitation
Direct file uploads using a web browser | Only files smaller than 2 GB
Fetching data from external URLs through Galaxy (ftp/http) | Cannot access some password-protected sites, such as the HudsonAlpha GSL
Importing files via the Cheaha file system | Requires an account on Cheaha, but the command line can be avoided

Privacy

Do not store sensitive information on this filesystem. It is not encrypted. Note that your data will be stored on the cluster filesystem, and while not accessible to ordinary users, it could be accessible to the cluster administrator(s).

Direct file uploads using a web browser

Uploading files through a web browser is convenient, but it is not recommended for files larger than 2 GB because of browser limitations. Browser-based upload in Galaxy also gives no feedback on upload progress and can be unreliable. It is therefore recommended to stage data on a Galaxy-accessible file system and then import it into Galaxy.
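The 2 GB guideline can be turned into a quick check before choosing an upload method. The following is a minimal sketch and not part of Galaxy: the helper names are hypothetical, and reading "2 GB" as 2 * 1024**3 bytes is an assumption, since the page does not say which unit is meant.

```python
import os

# Assumption: "2 GB" is interpreted as 2 GiB (2 * 1024**3 bytes).
BROWSER_LIMIT_BYTES = 2 * 1024**3


def suggest_upload_method(size_bytes: int) -> str:
    """Return 'browser' for files under the limit, 'filesystem' otherwise.

    Hypothetical helper for illustration only.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "browser" if size_bytes < BROWSER_LIMIT_BYTES else "filesystem"


def suggest_for_path(path: str) -> str:
    """Look up the size of a local file and apply the same rule."""
    return suggest_upload_method(os.path.getsize(path))
```

A file at or above the limit is routed to the file-system import described in the next section.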

Importing files via the Cheaha file system

The UAB Galaxy instance is configured to look for files in the '/scratch/importfs/galaxy/$USER' and '/scratch/user/$USER' directories on Cheaha. Data files can be copied to Cheaha using scp, or downloaded directly onto Cheaha using tools like wget, curl or ftp. WinSCP is a Windows-friendly drag-and-drop tool for the same purpose. Please refer to the Cheaha_GettingStarted#Access page for getting access to Cheaha.
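As a rough illustration of staging a file with scp, the sketch below only assembles the command line for the two directories named above; it does not run anything. The host name `cheaha.example.org` and the user name `jdoe` are placeholders (assumptions), so substitute the login host and account from the Cheaha_GettingStarted#Access page.

```python
import shlex

# Assumption: placeholder host name, not the real Cheaha login host.
CHEAHA_HOST = "cheaha.example.org"


def importfs_dir(user: str) -> str:
    """Drop-off directory that Galaxy imports from (files are moved)."""
    return f"/scratch/importfs/galaxy/{user}"


def user_scratch_dir(user: str) -> str:
    """User scratch directory that data libraries copy from."""
    return f"/scratch/user/{user}"


def build_scp_command(local_file, user, host=CHEAHA_HOST, dest_dir=None):
    """Return the scp argument list for copying local_file to Cheaha."""
    if dest_dir is None:
        dest_dir = importfs_dir(user)
    return ["scp", local_file, f"{user}@{host}:{dest_dir}/"]


if __name__ == "__main__":
    # "jdoe" is a hypothetical account name used only for this example.
    cmd = build_scp_command("reads.fastq.gz", "jdoe")
    print(shlex.join(cmd))
```

Passing `dest_dir=user_scratch_dir("jdoe")` would target the scratch directory instead of the importfs drop-off directory.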

The following list gives an overview of the UAB Galaxy import methods.

  1. importfs or file drop-off mode: The UAB Galaxy platform is configured to import files from the $GALAXY_IMPORTFS directory on Cheaha (/scratch/importfs/galaxy/$USER). The Galaxy application moves files from this import directory to its internal datasets directory, so the original file no longer exists in the import directory afterwards. See the Galaxy_Importfs page for more details on this upload method.
  2. Data Library: Galaxy has a concept of 'Data Libraries', which are data containers for organizing files hierarchically, similar to directories on a desktop. Data libraries also provide other features for data organization and sharing. They support file uploads using a web browser, fetching from external URLs, and copying existing directories from a file system. The file-system copy is similar to the importfs option described above, but it copies files to the internal datasets directory rather than moving them. The UAB Galaxy platform is configured to copy files from the $USER_SCRATCH directory (/scratch/user/$USER). See the Galaxy_Data_Libraries page for more details on data libraries.
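The practical difference between the two modes is what happens to the source file. The sketch below mimics that difference with temporary directories and Python's shutil; it is an illustration of move versus copy semantics under that assumption, not Galaxy's actual import code, and all function and directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def import_by_move(src: Path, datasets_dir: Path) -> Path:
    """importfs-style import: the source file is gone afterwards."""
    return Path(shutil.move(str(src), str(datasets_dir / src.name)))


def import_by_copy(src: Path, datasets_dir: Path) -> Path:
    """Data-library-style import: the source file stays in place."""
    return Path(shutil.copy2(src, datasets_dir / src.name))


def demo() -> dict:
    """Stage one file per mode in temporary directories and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        importfs = root / "importfs"   # stands in for the importfs directory
        scratch = root / "scratch"     # stands in for the user scratch directory
        datasets = root / "datasets"   # stands in for Galaxy's datasets directory
        for directory in (importfs, scratch, datasets):
            directory.mkdir()

        moved_src = importfs / "a.txt"
        moved_src.write_text("sample A")
        copied_src = scratch / "b.txt"
        copied_src.write_text("sample B")

        import_by_move(moved_src, datasets)
        import_by_copy(copied_src, datasets)

        return {
            "moved_source_exists": moved_src.exists(),
            "copied_source_exists": copied_src.exists(),
            "datasets": sorted(p.name for p in datasets.iterdir()),
        }


if __name__ == "__main__":
    print(demo())
```

In practice this means a file staged in the importfs directory should be treated as consumed by the import, while a file in the scratch directory remains available after it has been added to a data library.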