Galaxy

From UABgrid Documentation
== Adding Novel Datasets ==

=== Prerequisites ===

You should have checked out your own Galaxy instance and be running it from git, as described in http://projects.uabgrid.uab.edu/galaxy/wiki/GalaxyDevelopment

=== Introduction ===

To add a new dataset, a series of dependent files must be created and configured on cheaha.uabgrid.uab.edu. The configuration files are located in or under:
* /share/apps/galaxy/galaxy-latest (production Galaxy instance)
* $HOME/projects/galaxy/galaxy (personal Galaxy instance)

The dependent files should be located in or under:
* /lustre/project/public_datasets

Some of the older datasets are still located in or under:
* /lustre/project/galaxy/public_dataset

Here I describe how to set up a basic genome, covering only three critical pieces:
* bwa
* bowtie
* samtools

You will need an account on Cheaha, and you must also be a member of the galaxy-admin group.

=== FASTA File ===
Download your FASTA file (doing any conversions needed) and place it in:

/lustre/project/public_datasets/primary/MY_GENOME/MY_GENOME.fa

You should follow the naming conventions in $GALAXY/tool-data/shared/ucsc/builds.txt, as in this example entry:
* sacCer2 S. cerevisiae June 2008 (SGD/sacCer2) (sacCer2)

For instance, if builds.txt already has an entry for your build of S. cerevisiae, use its dbkey (the leftmost column) as MY_GENOME; in this case it would be sacCer2. In some cases (tree shrew, obscure chimeric mouse genomes you construct yourself) there will be no entry. You will then need to create one yourself by editing $HOME/projects/galaxy/galaxy/tool-data/shared/ucsc/builds.txt.

Make sure the extension is .fa. Don't worry if there are multiple files; they can be concatenated together, as shown below.
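You can also check from the shell whether a dbkey is already registered. The following is a minimal sketch and not one of the UAB scripts. It assumes builds.txt holds one tab-separated `<dbkey><TAB><description>` entry per line, with `#` comment lines. The helper names `lookup_build` and `add_build` are hypothetical.

```shell
# Hypothetical helpers for checking/adding a dbkey in Galaxy's builds.txt.
# Assumption: one "<dbkey><TAB><description>" entry per line; lines that
# start with "#" are comments.

# Print the description registered for a dbkey; exit status 1 if it is absent.
lookup_build() {
  local dbkey="$1" builds="$2"
  awk -F'\t' -v key="$dbkey" '
    /^#/ { next }                             # skip comment lines
    $1 == key { print $2; found = 1; exit }   # first matching dbkey wins
    END { exit !found }                       # 0 if found, 1 otherwise
  ' "$builds"
}

# Append "<dbkey><TAB><description>" unless the dbkey is already listed.
add_build() {
  local dbkey="$1" desc="$2" builds="$3"
  if lookup_build "$dbkey" "$builds" > /dev/null; then
    echo "$dbkey is already listed in $builds" >&2
    return 0
  fi
  printf '%s\t%s\n' "$dbkey" "$desc" >> "$builds"
}
```

For example, `lookup_build sacCer2 $GALAXY/tool-data/shared/ucsc/builds.txt` prints the description if the build is already known. Because `add_build` appends only when the dbkey is absent, running it twice does not create a duplicate entry.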
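
=== Directory Creation (example for sacCer2) ===

<pre>
# Create the genome directory, named after the dbkey, and work inside it
mkdir -p /lustre/project/public_datasets/primary/sacCer2
cd /lustre/project/public_datasets/primary/sacCer2
# Download the per-chromosome FASTA archive from UCSC and unpack it
wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer2/bigZips/chromFa.tar.gz
tar xzvf chromFa.tar.gz
# Concatenate the chromosomes and the 2-micron plasmid into one FASTA file
cat chr*.fa 2micron.fa > sacCer2.fa
</pre>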
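Before indexing, it can be worth checking the concatenated file. The minimal sketch below uses a hypothetical helper, `check_fasta`, which is not part of the UAB scripts. It counts the sequences and rejects the file if any sequence name occurs more than once, since duplicate names make later lookups by chromosome name ambiguous.

```shell
# Hypothetical sanity check for a concatenated FASTA file: print the number of
# sequences, or fail (status 1) if any sequence name occurs more than once.
check_fasta() {
  local fa="$1" dups
  # Sequence name = first word of each ">" header line, without the ">"
  dups="$(grep '^>' "$fa" | awk '{ print substr($1, 2) }' | sort | uniq -d)"
  if [ -n "$dups" ]; then
    printf 'duplicate sequence names in %s:\n%s\n' "$fa" "$dups" >&2
    return 1
  fi
  printf '%s: %s sequences\n' "$fa" "$(grep -c '^>' "$fa")"
}
```

Running `check_fasta sacCer2.fa` in the directory above reports the sequence count. You can compare that count against the number of chromosome files you unpacked.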

=== Index Creation ===

Indexes for Bowtie, BWA and SAM (samtools) are created in /lustre/project/public_datasets/derived by the script $GALAXY/scripts/uab/index_genome.bsh. If you need additional types of indexes, please update that script. Note: $GALAXY will likely be your personal Galaxy instance, /home/''USER''/projects/galaxy/galaxy.

<pre>
$GALAXY/scripts/uab/index_genome.bsh sacCer2
</pre>
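The sketch below does not reproduce what index_genome.bsh runs internally. It only illustrates the standard commands for the three index types. The `PRIMARY`/`DERIVED` variables and the per-tool subdirectory layout under derived/ are assumptions, and `emit_index_commands` is a hypothetical name. The function prints the commands rather than running them, so they can be reviewed first.

```shell
# Print (without running) standard index-building commands for one dbkey.
# Assumptions: FASTA at $PRIMARY/<dbkey>/<dbkey>.fa and a hypothetical
# per-tool layout $DERIVED/<dbkey>/{bwa,bowtie,sam}/; the real
# index_genome.bsh may do this differently.
PRIMARY="${PRIMARY:-/lustre/project/public_datasets/primary}"
DERIVED="${DERIVED:-/lustre/project/public_datasets/derived}"

emit_index_commands() {
  local dbkey="$1" tool
  local fa="$PRIMARY/$dbkey/$dbkey.fa" out="$DERIVED/$dbkey"
  # One directory per index type, each holding a symlink to the shared FASTA
  for tool in bwa bowtie sam; do
    printf 'mkdir -p %s/%s && ln -sf %s %s/%s/%s.fa\n' \
      "$out" "$tool" "$fa" "$out" "$tool" "$dbkey"
  done
  # BWA index; for genomes larger than about 2 GB (e.g. hg19) add "-a bwtsw"
  printf 'bwa index %s/bwa/%s.fa\n' "$out" "$dbkey"
  # Bowtie index: bowtie-build <reference.fa> <index basename>
  printf 'bowtie-build %s/bowtie/%s.fa %s/bowtie/%s\n' "$out" "$dbkey" "$out" "$dbkey"
  # samtools FASTA index (writes <dbkey>.fa.fai next to the FASTA)
  printf 'samtools faidx %s/sam/%s.fa\n' "$out" "$dbkey"
}
```

`emit_index_commands sacCer2 | bash` would execute the printed commands. Printing first makes it easy to compare them with what index_genome.bsh actually does.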

=== Configuration File Update ===

Follow the detailed instructions Shantanu has posted for creating a git branch for your changes; they can be found here:
http://projects.uabgrid.uab.edu/galaxy/wiki/GalaxyDevelopment

Once your branch is prepared, register the genome as follows:
* add your new dbkey to tool-data/shared/ucsc/builds.txt
* run the script $GALAXY/scripts/uab/register_genome_with_galaxy.bsh to update the tool-data/*.loc files and copy the .len file.
<pre>
$GALAXY/scripts/uab/register_genome_with_galaxy.bsh sacCer2
</pre>
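The registration script copies a .len file, which is a table of sequence names and lengths. If you ever need to generate one by hand, the following is a minimal sketch. It assumes Galaxy's tab-separated `<name><TAB><length>` format, with the file conventionally placed at tool-data/shared/ucsc/chrom/<dbkey>.len. Compare its output with an existing .len file before relying on it. `make_len_file` is a hypothetical name.

```shell
# Write "<sequence name><TAB><length>" for every sequence in a FASTA file.
# Hypothetical helper; the output format is assumed to match Galaxy's .len files.
make_len_file() {
  local fa="$1" out="$2"
  awk '
    /^>/ {                                  # header line: flush the previous sequence
      if (name != "") print name "\t" len
      name = substr($1, 2); len = 0; next   # name = first word after ">"
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # count residues, ignoring whitespace
    END { if (name != "") print name "\t" len }
  ' "$fa" > "$out"
}
```

For example: `make_len_file /lustre/project/public_datasets/primary/sacCer2/sacCer2.fa tool-data/shared/ucsc/chrom/sacCer2.len`.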

You will then have to do the necessary "git add", "git commit", etc., and send the patch to Shantanu.

This will be committed, and others can add to it to update BLAST databases, perM, and other indices as needed.
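The git step can look roughly like the minimal sketch below. It makes three assumptions. First, you are already on the branch prepared per the GalaxyDevelopment instructions. Second, the registration changes live under tool-data/. Third, the branch was started from master; pass a different base otherwise. `make_genome_patch` is a hypothetical name. Run `git status` first so you know what will be staged.

```shell
# Hypothetical helper: commit the genome registration on the current branch and
# write every commit not yet on the base branch into a single patch file.
make_genome_patch() {
  local dbkey="$1" base="${2:-master}"
  git add tool-data || return 1               # builds.txt, *.loc and .len changes
  git commit -q -m "Register $dbkey genome" || return 1
  # --stdout concatenates the patches for all commits in $base..HEAD into one file
  git format-patch --stdout "$base" > "$dbkey.patch"
}
```

`make_genome_patch sacCer2` leaves sacCer2.patch in the repository root, ready to send.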

=== Final Steps ===
Log in to your local Galaxy and check that you can run your job. If it all works out, contact Shantanu to have the changes pushed to production. It is best to send a patch or set up your directory so he can pull from it.

== Available datasets ==

{| border="1"
! Genome !! Downloaded !! BLAST Database !! BWA Index !! Bowtie Index !! PerM Index !! SAM Index !! SRMA Dict
|-
! hg19 (by chromosome)
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! Mouse (mm9)
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! Vaccinia Western Reserve
| Yes || No || Yes || Yes || Yes || Yes || Yes
|-
! Mycoplasma pneumoniae (M129)
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! Mycoplasma pneumoniae (FH)
| Yes || Yes || Yes || Yes || Yes || Yes || Yes
|-
! Chromosome 11 Mouse Contigs
| Yes || No || Yes || Yes || Yes || Yes || Yes
|-
! S. cerevisiae (sacCer2)
| Yes || No || Yes || Yes || No || Yes || Yes
|-
! C. elegans (ce6 and WS226)
| Yes || No || Yes || Yes || No || Yes || Yes
|-
! Tree Shrew 62
| Yes || No || Yes || Yes || No || Yes || Yes
|}

Revision as of 15:30, 7 March 2013



Overview

UAB Galaxy is a platform for experimental biology and comparative genomics, designed to help you analyze multiple alignments, compare genomic annotations, profile metagenomic samples, and more, all from your web browser. It is built on Galaxy, backed by the Cheaha compute cluster, and powered by UABgrid.

The primary uses of UAB Galaxy are to provide a simple web interface for next-generation (short-read) sequencing (NGS) analysis of genomic and transcriptomic datasets, using tools like BWA, Bowtie, TopHat and Cufflinks, and for simple sequence manipulation via the EMBOSS toolkit.

Using Galaxy / Tutorials

There are numerous general tutorials online at the Penn State public Galaxy site that are worth looking at.

There are also several UAB tutorials on NGS analysis with Galaxy, created for HPC Boot Camp 2011, as well as a nice talk by Jeremy Goecks from Research Computing Day 2011.

Galaxy@UAB

The UAB Galaxy instance can be accessed at https://galaxy.uabgrid.uab.edu using BlazerID credentials; no account on the cluster is needed. However, if you do have a cluster account, the tools installed for Galaxy (BWA, etc.) can also be run from the command line.
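For cluster account holders, a classic single-end alignment with the installed versions (bwa 0.5.9, samtools 0.1.12a) can be sketched as follows. The function only prints the commands; pipe its output to bash or paste it into a job script. It assumes bwa and samtools are on your PATH and that the reference has already been indexed with `bwa index`. `emit_align_commands` and the file names are hypothetical.

```shell
# Print a single-end bwa 0.5.x / samtools 0.1.x alignment pipeline for review.
# Assumes bwa and samtools are on PATH and REF was indexed with "bwa index".
emit_align_commands() {
  local ref="$1" reads="$2" prefix="$3"
  # Find candidate alignment positions (suffix-array coordinates) for each read
  printf 'bwa aln %s %s > %s.sai\n' "$ref" "$reads" "$prefix"
  # Turn those coordinates into single-end SAM records
  printf 'bwa samse %s %s.sai %s > %s.sam\n' "$ref" "$prefix" "$reads" "$prefix"
  # SAM -> BAM (-S: input is SAM, -b: output BAM)
  printf 'samtools view -bS %s.sam > %s.bam\n' "$prefix" "$prefix"
  # samtools 0.1.x sort takes an output prefix and writes <prefix>.sorted.bam
  printf 'samtools sort %s.bam %s.sorted\n' "$prefix" "$prefix"
  # Index the sorted BAM (writes <prefix>.sorted.bam.bai)
  printf 'samtools index %s.sorted.bam\n' "$prefix"
}
```

For example, `emit_align_commands sacCer2.fa reads.fastq sample1 | bash` runs the whole pipeline. Printing first lets you check paths before submitting anything to the cluster.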

Available Tools

The following short list highlights some of the important tools available:

{| border="1"
! Software !! Version !! Information
|-
! bwa
| 0.5.9-r26 || Align genomic short reads to a reference genome
|-
! bowtie
| 0.12.7 || Align genomic short reads to a reference genome
|-
! samtools
| 0.1.12a || Alignment (SAM/BAM file) manipulations
|-
! velvet
| 1.1.03 || De novo assembly
|-
! TopHat
| 1.4.0 || Align transcriptome (RNA-seq) short reads to a reference genome
|-
! Cufflinks
| 1.3.0 || Reconstruct and quantify transcript levels from TopHat alignments
|-
! [http://en.wikipedia.org/wiki/EMBOSS EMBOSS]
| 6.3.1 || European Molecular Biology Open Software Suite: sequence manipulation and format conversion
|}


Installed Genome Indexes

You can always use your own genome by uploading its FASTA file into your history, but alignments against installed (pre-indexed) genomes run much more quickly, because the aligner's index does not have to be built first. If you need an additional genome installed, please contact galaxy-help@vo.uabgrid.uab.edu.

{| border="1"
! dbkey !! Genome !! Sequences (name=length in bp)
|-
| mm9 || Mouse July 2007 (NCBI37/mm9) (mm9)
|-
| mm10 || Mouse Dec. 2011 (GRCm38/mm10) (mm10)
|-
| hg18 || Human Mar. 2006 (NCBI36/hg18) (hg18)
|-
| hg19 || Human Feb. 2009 (GRCh37/hg19) (hg19)
|-
| sacCer2 || S. cerevisiae June 2008 (SGD/sacCer2) (sacCer2)
|-
| sacCer3 || S. cerevisiae Apr. 2011 (SacCer_Apr2011/sacCer3) (sacCer3)
|-
| ce10 || C. elegans Oct. 2010 (WS220/ce10) (ce10)
|-
| rn4 || Rat Nov. 2004 (Baylor 3.4/rn4) (rn4)
|-
| rn5 || Rat Mar. 2012 (RGSC 5.0/rn5) (rn5)
|-
| danRer7 || Zebrafish Jul. 2010 (Zv9/danRer7) (danRer7)
|-
| eschColi_APEC_O1 || Escherichia coli APEC O1 || chr=5082025
|-
| eschColi_CFT073 || Escherichia coli CFT073 || chr=5231428
|-
| eschColi_EC4115 || Escherichia coli EC4115 || chr=5572075,plasmid_pO157=94644,plasmid_pEC4115=37452
|-
| eschColi_K12 || Escherichia coli K12 || chr=4639675
|-
| eschColi_EDL993 || Escherichia coli O157:H7 EDL933 || NC_007414=92077,NC_002655=5528445
|-
| eschColi_O157H7 || Escherichia coli O157:H7 EDL933 || NC_007414=92077,NC_002655=5528445
|-
| eschColi_TW14359 || Escherichia coli TW14359 || chr=5528136,plasmid_pO157=94601
|}
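For the E. coli builds, the last column lists each sequence as name=length, with lengths in base pairs. As a small worked example, the sketch below sums those lengths to get a genome size. `total_length` is a hypothetical helper.

```shell
# Sum the lengths in a "name=length,name=length,..." list from the table above.
total_length() {
  # One "name=length" pair per line, then add up the field after "="
  printf '%s\n' "$1" | tr ',' '\n' | awk -F= '{ sum += $2 } END { print sum }'
}
```

For example, `total_length 'chr=5572075,plasmid_pO157=94644,plasmid_pEC4115=37452'` prints 5704171 (5,572,075 + 94,644 + 37,452 bp) for eschColi_EC4115.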

Public instance

A public instance of Galaxy maintained by Penn State University is at http://usegalaxy.org/

Support

To facilitate interaction among UAB Galaxy users, share experience, and provide peer support, we have established a galaxy-user group. To join this group and participate in email discussions, please subscribe to the galaxy-user group; online archives of these discussions are also available. In addition to the galaxy-user mailing list, we have a galaxy-annce mailing list for announcements related to UAB's Galaxy instance. galaxy-annce is a low-volume list for announcing service status, system updates, etc. Please note that the email discussions are a public forum: only post information you are authorized to share and are comfortable making public.

