Galaxy: Difference between revisions
Revision as of 19:52, 12 September 2011
Overview
The UAB Galaxy platform for experimental biology and comparative genomics is designed to help you analyze multiple alignments, compare genomic annotations, profile metagenomic samples, and more from your web browser. This platform is built on Galaxy, backed by the Cheaha compute cluster, and powered by UABgrid. Documentation on the UAB installation can be found on the UAB Galaxy wiki.
Galaxy@UAB
The UAB Galaxy instance can be accessed at http://galaxy.uabgrid.uab.edu using BlazerID credentials. HTTPS/SSL access will be available soon. The UAB Galaxy instance is using revision 50e249442c5a from the upstream Galaxy repository.
Temporary Protocol for moving large sequence files (>2GB) to UAB's galaxy instance (or very large numbers of files).
Hardware
Behind the scenes, the Galaxy server at UAB is powered by the Cheaha cluster.
Available Tools
The following tools are currently available through the Galaxy platform. More detailed descriptions will be added soon.
Software | Version | Information |
---|---|---|
bwa | 0.5.9 | Further information |
bowtie | 0.12.7 | Further information |
lastz | 1.02.00 | Further information |
samtools | 0.1.12a | Further information |
Legacy blast (megablast) | 2.2.25 | Further information |
srma | 0.1.15 | Further information |
velvet | 1.1.03 | Further information |
Top Hat | 1.2.0 | Further information |
Cuff Links | 1.0.1 | Further information |
Lift Over | 26-Apr-2011 18:26 2.6M | Further information |
R | R-2.13.0 | Further information |
RPy | 1.0.3 | Further information |
ps2pdf | ?? | Further information |
MACS | 1.4.0rc2 | Further information |
taxonomy2tree | r3 | Further information |
sputnik | NA | Further information |
beam2 | Unknown | Further information |
addscores | NA | Further information |
clustalw | 2.1 | Further information |
gmaj | NA | Further information |
gpass | NA | Further information |
HYPHY | 2.0020110330 beta | Further information |
laj | NA | Further information |
pass2 | NA | Further information |
twoBitToFa | NA | Further information |
Perl | revision 5 version 8 subversion 8 | Further information |
perM | 3.3 | Further information |
GNUPlot | 4.4.3 | Further information |
Numpy | 1.6.0 | Further information |
numexpr | 1.4.2 | Further information |
hdf5 | 1.8.7 | Further information |
Cython | 0.14.1 | Further information |
Python Tables (tables) | 2.2.1 | Further information |
FastX Toolkit | 0.0.13 | Further information |
Using Galaxy
Online Tutorials
There are numerous tutorials online, including those on the main Penn State Galaxy site, that are worth looking at. There are also various published workflows that contain helpful information.
UAB Galaxy DNA-Seq Step-by-Step Tutorial
UAB Galaxy RNA-Seq Step-by-Step Tutorial
Adding Novel Datasets
Prerequisites
You should have checked out your own galaxy instance and run it from git as described in http://projects.uabgrid.uab.edu/galaxy/wiki/GalaxyDevelopment
Introduction
In order to add a new data set, a series of dependent files must be created and configured on cheaha.uabgrid.uab.edu. The configuration files are located in or under:
- /share/apps/galaxy/galaxy-latest
The dependent files should be located in or under:
- /lustre/project/public_datasets
Some of the older data sets are still located in or under:
- /lustre/project/galaxy/public_dataset
I describe here setting up a basic genome and include only a description of how to set up 3 critical pieces:
- bwa
- bowtie
- samtools
Obviously you will need to have an account on cheaha but you will also need to be in the galaxy-admin group.
FASTA File
Download your FASTA file (doing any conversions needed) and place the file in: /lustre/project/public_datasets/primary/MY_GENOME/MY_GENOME.fa
You should follow the naming conventions in tool-data/shared/ucsc/builds.txt as shown below.
- sacCer2 S. cerevisiae June 2008 (SGD/sacCer2) (sacCer2)
For instance, if there is already an entry for your build of S. cerevisiae, then use the dbkey (leftmost column) in builds.txt to name MY_GENOME. In this case it would be sacCer2. In some cases (tree shrew, obscure chimeric mouse genomes you construct yourself) there will be no entry. You will then need to create one yourself by adding it to builds.txt.
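The lookup described above can be sketched as a small shell function. This is only an illustration: `lookup_build` is a hypothetical helper, and it assumes builds.txt is tab-delimited with the dbkey in the first column and the description in the second. The demo runs against a throwaway copy rather than the real file, and `myNewGenome` is a made-up key.

```shell
#!/bin/bash
# Sketch: look up a dbkey in a builds.txt-style file.
# Assumption: tab-delimited, dbkey first, description second.

# lookup_build BUILDS_FILE DBKEY
# Prints the description for DBKEY; returns 1 if there is no entry.
lookup_build() {
    awk -F'\t' -v key="$2" \
        '$1 == key { print $2; found = 1 } END { exit !found }' "$1"
}

# Demo on a temporary file so the sketch runs anywhere.
demo_builds=$(mktemp)
printf 'sacCer2\tS. cerevisiae June 2008 (SGD/sacCer2) (sacCer2)\n' > "$demo_builds"

if desc=$(lookup_build "$demo_builds" sacCer2); then
    echo "sacCer2 already has an entry: $desc"
fi
if ! lookup_build "$demo_builds" myNewGenome > /dev/null; then
    echo "myNewGenome has no entry; one must be added to builds.txt"
fi
rm -f "$demo_builds"
```

On cheaha the first argument would be the tool-data/shared/ucsc/builds.txt file of your own Galaxy checkout.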
Make sure the extension is .fa. Don't worry if there are multiple files; they can be concatenated together as shown below.
Directory Creation (example for sacCer2)
    mkdir /lustre/project/public_datasets/primary/sacCer2
    cd /lustre/project/public_datasets/primary/sacCer2
    wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer2/bigZips/chromFa.tar.gz
    tar xzvf chromFa.tar.gz
    cat chr*.fa 2micron.fa > sacCer2.fa
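After concatenating, it may be worth confirming that no sequence was dropped. The sketch below is one possible check, not part of the documented procedure: `count_fasta_records` is a hypothetical helper, the sequences are tiny made-up stand-ins, and the comparison assumes each per-chromosome file holds exactly one sequence.

```shell
#!/bin/bash
# count_fasta_records FILE: print the number of FASTA header lines (">") in FILE.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Demo with made-up sequences in a scratch directory.
workdir=$(mktemp -d)
printf '>chrI\nACGT\n'    > "$workdir/chrI.fa"
printf '>chrII\nGGCC\n'   > "$workdir/chrII.fa"
printf '>2micron\nTTAA\n' > "$workdir/2micron.fa"

# Same concatenation pattern as in the example above.
inputs=("$workdir"/chr*.fa "$workdir/2micron.fa")
cat "${inputs[@]}" > "$workdir/combined.fa"

# Assumption: one sequence per input file, so the counts should match.
expected=${#inputs[@]}
actual=$(count_fasta_records "$workdir/combined.fa")
if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual sequences in the combined file"
else
    echo "Mismatch: $expected input files, $actual sequences" >&2
fi
rm -rf "$workdir"
```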
Index Creation
A script that has not yet been fully tested has been written to index the genomes; it can be run in place of the individual indexing commands.
    #!/bin/bash
    # Argument #1 = GENOME_NAME - should be of the form ce6, sacCer2, etc.
    # This script builds the indices for a new genome. It assumes that
    # /lustre/project/public_datasets/primary/$1 and
    # /lustre/project/public_datasets/primary/$1/$1.fa exist.
    # The job scripts are copied from the existing sacCer2 setup, with the
    # genome name substituted.

    mkdir /lustre/project/public_datasets/derived/$1
    cd /lustre/project/public_datasets/derived/$1
    mkdir bowtie
    mkdir bowtie/color
    mkdir bowtie/base
    mkdir bwa
    mkdir perm
    mkdir sam

    # Bowtie
    cd /lustre/project/public_datasets/derived/$1/bowtie/base
    cat /lustre/project/public_datasets/derived/sacCer2/bowtie/base/reindex_bowtie_sacCer2.bsh | perl -pe "s/sacCer2/$1/g;" > reindex_bowtie_$1.bsh
    cat /lustre/project/public_datasets/derived/sacCer2/bowtie/base/submit_index_job | perl -pe "s/sacCer2/$1/g;" > submit_index_job
    chmod +x submit_index_job
    chmod +x reindex_bowtie_$1.bsh
    /lustre/project/public_datasets/derived/$1/bowtie/base/submit_index_job

    # BWA
    cd /lustre/project/public_datasets/derived/$1/bwa
    cat /lustre/project/public_datasets/derived/sacCer2/bwa/reindex_bwa_sacCer2.bsh | perl -pe "s/sacCer2/$1/g;" > reindex_bwa_$1.bsh
    cat /lustre/project/public_datasets/derived/sacCer2/bwa/submit_index_job | perl -pe "s/sacCer2/$1/g;" > submit_index_job
    # Link the primary FASTA file into this directory
    ln -s /lustre/project/public_datasets/primary/$1/$1.fa
    chmod +x submit_index_job
    chmod +x reindex_bwa_$1.bsh
    /lustre/project/public_datasets/derived/$1/bwa/submit_index_job

    # SAM and SRMA
    cd /lustre/project/public_datasets/derived/$1/sam
    cat /lustre/project/public_datasets/derived/sacCer2/sam/reindex_sam_sacCer2.bsh | perl -pe "s/sacCer2/$1/g;" > reindex_sam_$1.bsh
    cat /lustre/project/public_datasets/derived/sacCer2/sam/submit_index_job | perl -pe "s/sacCer2/$1/g;" > submit_index_job
    # Link the primary FASTA file into this directory
    ln -s /lustre/project/public_datasets/primary/$1/$1.fa
    chmod +x submit_index_job
    chmod +x reindex_sam_$1.bsh
    /lustre/project/public_datasets/derived/$1/sam/submit_index_job
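Once the submitted index jobs have finished, a quick check that the main output files exist can catch a failed job early. The function below is a sketch under assumptions: `check_indices` is a hypothetical helper, and the locations assume the reindex jobs write their output next to the paths that are later registered in the .loc files. The file names checked (`.1.ebwt` from bowtie-build, `.bwt` from bwa index, `.fai` from samtools faidx) are only one representative file per tool, not the full set each tool produces.

```shell
#!/bin/bash
# check_indices DERIVED_ROOT GENOME
# Prints each expected index file that is missing or empty;
# returns non-zero if any are.
check_indices() {
    local root=$1 genome=$2 missing=0 f
    for f in \
        "$root/$genome/bowtie/base/$genome.1.ebwt" \
        "$root/$genome/bwa/$genome.fa.bwt" \
        "$root/$genome/sam/$genome.fa.fai"
    do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call on cheaha, using the derived path from the script above:
# check_indices /lustre/project/public_datasets/derived sacCer2
```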
Configuration File Update
Follow the detailed instructions that Shantanu has posted for creating a git branch for your changes; they can be found here: http://projects.uabgrid.uab.edu/galaxy/wiki/GalaxyDevelopment
I have written a small script called update_indices.bsh that updates the bowtie, bwa, samtools and srma indices, as shown below:
    #!/bin/bash
    # Argument #1 GENOME_NAME (ex. ce6)
    # Argument #2 through 8 GENOME DESCRIPTION (C. elegans May 2008 (WS190/ce6) (ce6))
    # Try to use description from tool-data/shared/ucsc/builds.txt if available
    # Galaxy .loc files are tab-delimited, so make sure the separators between
    # fields in the lines below are literal tabs.
    echo "$1 $1 $2 $3 $4 $5 $6 $7 $8 /lustre/project/public_datasets/derived/$1/bowtie/base/$1" >> /home/ozborn/projects/galaxy/galaxy/tool-data/bowtie_indices.loc
    echo "$1 $1 $2 $3 $4 $5 $6 $7 $8 /lustre/project/public_datasets/derived/$1/bowtie/base/$1" >> /home/ozborn/projects/galaxy/galaxy/tool-data/bowtie_indices_color.loc
    echo "$1 $1 $2 $3 $4 $5 $6 $7 $8 /lustre/project/public_datasets/derived/$1/bwa/$1.fa" >> /home/ozborn/projects/galaxy/galaxy/tool-data/bwa_index.loc
    echo "$1 $1 $2 $3 $4 $5 $6 $7 $8 /lustre/project/public_datasets/derived/$1/bwa/$1.fa" >> /home/ozborn/projects/galaxy/galaxy/tool-data/bwa_index_color.loc
    echo "index $1 /lustre/project/public_datasets/derived/$1/sam/$1.fa" >> /home/ozborn/projects/galaxy/galaxy/tool-data/sam_fa_indices.loc
    echo "$1 $1 $2 $3 $4 $5 $6 $7 $8 /lustre/project/public_datasets/derived/$1/sam/$1.fa" >> /home/ozborn/projects/galaxy/galaxy/tool-data/srma_index.loc
This will be committed and others can add to it to update blast databases, perM, and other indices as needed.
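Because the .loc files are tab-delimited, building each entry with `printf` makes the field separators explicit. The following is a minimal sketch rather than a replacement for update_indices.bsh: `loc_line` is a hypothetical helper, and the four-column layout (unique id, dbkey, display name, index path) is inferred from the echo lines in the script. The three-column sam_fa_indices.loc entry is not covered.

```shell
#!/bin/bash
# loc_line GENOME DESCRIPTION INDEX_PATH
# Prints one four-column, tab-delimited .loc entry:
#   unique id, dbkey, display name, index path
# The genome name is used for both the id and the dbkey, as in the script.
loc_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$1" "$2" "$3"
}

derived=/lustre/project/public_datasets/derived
loc_line ce6 "C. elegans May 2008 (WS190/ce6) (ce6)" "$derived/ce6/bwa/ce6.fa"
```

Passing the description as a single quoted argument avoids depending on how many words it contains. Once the printed line looks right, it can be appended with `>>` to the matching file under tool-data.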
Final Steps
Log in to your local galaxy and see if you can run your job. If it all works out, contact Shantanu and push the changes to production. It is best to send a patch or set up your directory so he can pull from it.
Available datasets
Genome | Downloaded | Blast Database | BWA Index | Bowtie Index | PerM Index | Sam Index | SRMA Dict |
---|---|---|---|---|---|---|---|
hg19 (by chromosome) | Yes | Yes | No | Yes | Yes | Yes | Yes |
Mouse (mm9) | Yes | Yes | No | Yes | Yes | Yes | Yes |
Vaccinia Western Reserve | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Mycoplasma pneumoniae (M129) | Yes | Yes | No | Yes | Yes | Yes | Yes |
Mycoplasma pneumoniae (FH) | Yes | Yes | No | Yes | Yes | Yes | Yes |
Chromosome 11 Mouse Contigs | Yes | Yes | No | Yes | Yes | Yes | Yes |
Public instance
A public instance of Galaxy maintained by Penn State University is at http://usegalaxy.org/
Support
In order to facilitate interaction among UAB Galaxy users, share experience, and provide peer-support we have established a galaxy-users group. To join this group and participate in email discussions please subscribe to the galaxy-user group. On-line archives of these discussions are available here. Please note, the email discussions are a public forum. You are advised to only post information you are authorized to share and comfortable with being public.