Galaxy

From UABgrid Documentation

Revision as of 16:46, 7 September 2011


Overview

The UAB Galaxy platform for experimental biology and comparative genomics is designed to help you analyze multiple alignments, compare genomic annotations, profile metagenomic samples, and more, all from your web browser. This platform is built on Galaxy, backed by the Cheaha compute cluster, and powered by UABgrid. Documentation on the UAB installation can be found on the UAB Galaxy wiki.

Galaxy@UAB

The UAB Galaxy instance can be accessed at http://galaxy.uabgrid.uab.edu using BlazerID credentials. HTTPS/SSL access will be available soon. The instance runs revision 50e249442c5a of the upstream Galaxy repository.

Temporary Protocol for moving large sequence files (>2GB) to UAB's galaxy instance (or very large numbers of files).

Hardware

Behind the scenes, the Galaxy server at UAB is powered by the Cheaha cluster.

Available Tools

The following tools are currently available through the Galaxy platform. More detailed descriptions will be added soon.

Software                  Version                            Information
bwa                       0.5.9                              Further information
bowtie                    0.12.7                             Further information
lastz                     1.02.00                            Further information
samtools                  0.1.12a                            Further information
Legacy blast (megablast)  2.2.25                             Further information
srma                      0.1.15                             Further information
velvet                    1.1.03                             Further information
Top Hat                   1.2.0                              Further information
Cuff Links                1.0.1                              Further information
Lift Over                 26-Apr-2011 18:26 2.6M             Further information
R                         R-2.13.0                           Further information
RPy                       1.0.3                              Further information
ps2pdf                    ??                                 Further information
MACS                      1.4.0rc2                           Further information
taxonomy2tree             r3                                 Further information
sputnik                   NA                                 Further information
beam2                     Unknown                            Further information
addscores                 NA                                 Further information
clustalw                  2.1                                Further information
gmaj                      NA                                 Further information
gpass                     NA                                 Further information
HYPHY                     2.0020110330 beta                  Further information
laj                       NA                                 Further information
pass2                     NA                                 Further information
twoBitToFa                NA                                 Further information
Perl                      revision 5 version 8 subversion 8  Further information
perM                      3.3                                Further information
GNUPlot                   4.4.3                              Further information
Numpy                     1.6.0                              Further information
numexpr                   1.4.2                              Further information
hdf5                      1.8.7                              Further information
Cython                    0.14.1                             Further information
Python Tables (tables)    2.2.1                              Further information
FastX Toolkit             0.0.13                             Further information

Adding Novel Datasets

Prerequisites

You should have checked out your own Galaxy instance and be running it from git, as described in http://projects.uabgrid.uab.edu/galaxy/wiki/GalaxyDevelopment

Introduction

In order to add a new data set, a series of dependent files must be created and configured on cheaha.uabgrid.uab.edu. The configuration files are located in or under:

  • /share/apps/galaxy/galaxy-latest

The dependent files should be located in or under:

  • /lustre/project/public_datasets

Some of the older data sets are still located in or under:

  • /lustre/project/galaxy/public_dataset

I describe here setting up a basic genome and include only a description of how to set up 3 critical pieces:

  • bwa
  • bowtie
  • samtools

Obviously you will need to have an account on cheaha but you will also need to be in the galaxy-admin group.

FASTA File

Download your FASTA file (doing any conversions needed) and place the file in: /lustre/project/public_datasets/primary/MY_GENOME/MY_GENOME.fa

You should follow the naming conventions in tool-data/shared/ucsc/builds.txt as shown below.

  • sacCer2 S. cerevisiae June 2008 (SGD/sacCer2) (sacCer2)

For instance, if there is already an entry for your build of S. cerevisiae, then use the dbkey (leftmost column) in builds.txt to name MY_GENOME. In this case it would be sacCer2. In some cases (tree shrew, obscure chimeric mouse genomes you construct yourself) there will be no entry. In that case you will need to create an entry yourself and add it to builds.txt.
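
Before choosing a name, it can help to check whether builds.txt already lists the dbkey. The following is a minimal sketch, assuming the dbkey is the first whitespace-delimited field on each line; has_dbkey is a hypothetical helper name, not something shipped with Galaxy.

```shell
# has_dbkey BUILDS_FILE DBKEY
# Hypothetical helper: succeeds (exit 0) if DBKEY is the first field
# of any line in BUILDS_FILE, and fails (exit 1) otherwise.
# Assumption: the dbkey is the leftmost whitespace-delimited column.
has_dbkey() {
    local builds_file=$1 dbkey=$2
    awk -v key="$dbkey" '$1 == key { found = 1 } END { exit !found }' "$builds_file"
}

# Example, run from the top of a Galaxy checkout:
#   has_dbkey tool-data/shared/ucsc/builds.txt sacCer2 && echo "sacCer2 is already listed"
```

The comparison is an exact match on the first field, so a partial name such as sacCer does not match sacCer2.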

Make sure the extension is .fa. Don't worry if there are multiple files; they can be concatenated together as shown below.

Directory Creation (example for sacCer2)

mkdir /lustre/project/public_datasets/primary/sacCer2
cd /lustre/project/public_datasets/primary/sacCer2
wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer2/bigZips/chromFa.tar.gz
tar xzvf chromFa.tar.gz
cat chr*.fa 2micron.fa > sacCer2.fa
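
After concatenating, a quick sanity check is to confirm that the combined file holds as many FASTA records as its parts. This is a minimal sketch; count_fasta_records and check_concat are hypothetical helper names, and a record is assumed to be any line starting with ">".

```shell
# count_fasta_records FILE...
# Prints the number of FASTA header lines (lines starting with '>')
# across all the given files.
count_fasta_records() {
    cat -- "$@" | grep -c '^>'
}

# check_concat COMBINED PART...
# Succeeds if COMBINED has the same number of records as the PART files together.
check_concat() {
    local combined=$1
    shift
    [ "$(count_fasta_records "$combined")" -eq "$(count_fasta_records "$@")" ]
}

# Example, run inside the primary/sacCer2 directory:
#   check_concat sacCer2.fa chr*.fa 2micron.fa || echo "record count mismatch"
```

The check only compares record counts; it does not compare sequence content.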

A script that has not been fully tested is available to index the genomes. It can be run instead of the per-tool commands in the sections below.

#!/bin/bash
# Argument #1 = GENOME_NAME - should be of the form ce6, sacCer2, etc.
# This script builds the indices for a new genome. It assumes that:
# /lustre/project/public_datasets/primary/$1 and /lustre/project/public_datasets/primary/$1/$1.fa exist
mkdir /lustre/project/public_datasets/derived/$1
cd /lustre/project/public_datasets/derived/$1
mkdir bowtie
mkdir bowtie/color
mkdir bowtie/base
mkdir bwa
mkdir perm
mkdir sam
#Bowtie
cd /lustre/project/public_datasets/derived/$1/bowtie/base
cat /lustre/project/public_datasets/derived/sacCer2/bowtie/base/reindex_bowtie_sacCer2.bsh | perl -pe "s/sacCer2/$1/g;" > reindex_bowtie_$1.bsh
cat /lustre/project/public_datasets/derived/sacCer2/bowtie/base/submit_index_job  | perl -pe "s/sacCer2/$1/g;" > submit_index_job
chmod +x submit_index_job
chmod +x reindex_bowtie_$1.bsh
/lustre/project/public_datasets/derived/$1/bowtie/base/submit_index_job
#BWA
cd /lustre/project/public_datasets/derived/$1/bwa
cat /lustre/project/public_datasets/derived/sacCer2/bwa/reindex_bwa_sacCer2.bsh | perl -pe "s/sacCer2/$1/g;" > reindex_bwa_$1.bsh
cat /lustre/project/public_datasets/derived/sacCer2/bwa/submit_index_job  | perl -pe "s/sacCer2/$1/g;" > submit_index_job
ln -s /lustre/project/public_datasets/primary/$1/$1.fa
chmod +x submit_index_job
chmod +x reindex_bwa_$1.bsh
/lustre/project/public_datasets/derived/$1/bwa/submit_index_job
#SAM
cd /lustre/project/public_datasets/derived/$1/sam
cat /lustre/project/public_datasets/derived/sacCer2/sam/reindex_sam_sacCer2.bsh | perl -pe "s/sacCer2/$1/g;" > reindex_sam_$1.bsh
cat /lustre/project/public_datasets/derived/sacCer2/sam/submit_index_job  | perl -pe "s/sacCer2/$1/g;" > submit_index_job
ln -s /lustre/project/public_datasets/primary/$1/$1.fa
chmod +x submit_index_job
chmod +x reindex_sam_$1.bsh
/lustre/project/public_datasets/derived/$1/sam/submit_index_job
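
The script states its assumptions in comments but does not check them. A pre-flight check along the following lines could be run first. This is only a sketch: check_genome_inputs and its optional ROOT argument are hypothetical, and limiting genome names to letters, digits, and underscores is an assumption, made because the name is substituted into a regular expression by the script.

```shell
# check_genome_inputs GENOME [ROOT]
# Hypothetical pre-flight check. Prints the path of the primary FASTA on success.
# ROOT defaults to the public_datasets directory used on this page.
check_genome_inputs() {
    local genome=$1
    local root=${2:-/lustre/project/public_datasets}
    local fasta

    # Assumption: genome names contain only letters, digits, and underscores.
    case $genome in
        ''|*[!A-Za-z0-9_]*)
            echo "usage: check_genome_inputs GENOME [ROOT] (GENOME like ce6 or sacCer2)" >&2
            return 2
            ;;
    esac

    fasta="$root/primary/$genome/$genome.fa"
    if [ ! -s "$fasta" ]; then
        echo "missing or empty FASTA: $fasta" >&2
        return 1
    fi
    echo "$fasta"
}

# Example:
#   check_genome_inputs sacCer2 || exit 1
```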

Bowtie Indices (Required for tophat too)

Switch to your bowtie base directory and copy in the 2 bowtie scripts from another directory, for example sacCer2.

cd /lustre/project/public_datasets/derived/MY_GENOME/bowtie/base
cp /lustre/project/public_datasets/derived/sacCer2/bowtie/base/reindex_bowtie_sacCer2.bsh .
cp /lustre/project/public_datasets/derived/sacCer2/bowtie/base/submit_index_job .

Find/Replace sacCer2 with the name of your genome (MY_GENOME) in both files. Run:

./submit_index_job

This will submit the job on cheaha.
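
The find/replace step can also be done non-interactively. The sketch below uses sed rather than an editor; copy_with_genome is a hypothetical helper name, and it assumes the genome names contain no characters that are special to sed (plain names such as sacCer2 or ce6 are fine).

```shell
# copy_with_genome SRC DEST OLD NEW
# Hypothetical helper: writes SRC to DEST with every occurrence of OLD
# replaced by NEW, then marks DEST executable.
# Assumption: OLD and NEW contain no sed metacharacters or slashes.
copy_with_genome() {
    local src=$1 dest=$2 old=$3 new=$4
    sed "s/$old/$new/g" "$src" > "$dest" && chmod +x "$dest"
}

# Example, run inside derived/MY_GENOME/bowtie/base:
#   src=/lustre/project/public_datasets/derived/sacCer2/bowtie/base
#   copy_with_genome "$src/reindex_bowtie_sacCer2.bsh" reindex_bowtie_MY_GENOME.bsh sacCer2 MY_GENOME
#   copy_with_genome "$src/submit_index_job" submit_index_job sacCer2 MY_GENOME
```

The example renames the reindex script to carry the new genome name, following what the indexing script earlier on this page does. Whether submit_index_job expects that name depends on its contents, which are not shown here.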

The next step is to add the index to the bowtie configuration files in your personal galaxy project directory. For example, to add the treeshrew genome, add the line

treeshrew62     treeshrew62     Tree Shrew Build 62     /lustre/project/public_datasets/derived/treeshrew62/bowtie/base/treeshrew6

to the bottom of:

  • tool-data/bowtie_indices.loc

and

  • tool-data/bowtie_indices_color.loc
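
Appending the line by hand works, but it is easy to add the same genome twice. The following sketch appends a four-column entry only if its first column is not already present. add_loc_entry is a hypothetical helper, and the tab-separated four-column layout is an assumption based on the example line above.

```shell
# add_loc_entry LOC_FILE BUILD_ID DBKEY DISPLAY_NAME INDEX_PATH
# Hypothetical helper: appends one tab-separated line to LOC_FILE unless a line
# whose first field equals BUILD_ID is already there.
# Assumption: .loc columns are separated by tabs.
add_loc_entry() {
    local loc_file=$1 build_id=$2 dbkey=$3 name=$4 index_path=$5
    if [ -f "$loc_file" ] &&
       awk -F'\t' -v id="$build_id" '$1 == id { found = 1 } END { exit !found }' "$loc_file"
    then
        echo "entry for $build_id already present in $loc_file" >&2
        return 0
    fi
    printf '%s\t%s\t%s\t%s\n' "$build_id" "$dbkey" "$name" "$index_path" >> "$loc_file"
}

# Example, run from the top of your Galaxy checkout
# (INDEX_PATH stands for the index base path of your genome):
#   add_loc_entry tool-data/bowtie_indices.loc treeshrew62 treeshrew62 "Tree Shrew Build 62" INDEX_PATH
```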


BWA Indices

Switch to your bwa directory and copy in the 2 bwa scripts from another directory, for example sacCer2.

cd /lustre/project/public_datasets/derived/MY_GENOME/bwa
cp /lustre/project/public_datasets/derived/sacCer2/bwa/reindex_bwa_sacCer2.bsh .
cp /lustre/project/public_datasets/derived/sacCer2/bwa/submit_index_job .

Find/Replace sacCer2 with the name of your genome (MY_GENOME) in both files. Run:

./submit_index_job

This will submit the job on cheaha.
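
The indexing script earlier on this page also links the primary FASTA file into the bwa directory before submitting the job. Whether the manual procedure needs the same link depends on what reindex_bwa_sacCer2.bsh expects, which is not shown here. If it does, a sketch such as the following could create it; link_primary_fasta and its optional ROOT argument are hypothetical.

```shell
# link_primary_fasta GENOME DEST_DIR [ROOT]
# Hypothetical helper: creates DEST_DIR/GENOME.fa as a symlink to the primary
# FASTA, and does nothing if something already exists at that path.
link_primary_fasta() {
    local genome=$1 dest_dir=$2
    local root=${3:-/lustre/project/public_datasets}
    local fasta="$root/primary/$genome/$genome.fa"
    local link="$dest_dir/$genome.fa"

    if [ ! -f "$fasta" ]; then
        echo "no such FASTA: $fasta" >&2
        return 1
    fi
    # Leave an existing file or link untouched so the call can be repeated.
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    ln -s "$fasta" "$link"
}

# Example:
#   link_primary_fasta MY_GENOME /lustre/project/public_datasets/derived/MY_GENOME/bwa
```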


The next step is to add the index to the bwa configuration files in your personal galaxy project directory. For example, to add the treeshrew genome, add the line

treeshrew62     treeshrew62     Tree Shrew Build 62     /lustre/project/public_datasets/derived/treeshrew62/bwa/Tupaia_belangeri.TREESHREW.62.dna.nonchromosomal.fa

to the bottom of:

  • tool-data/bwa_indices.loc

and

  • tool-data/bwa_indices_color.loc

Sam and FAIDX indices

Switch to your sam directory and copy in the 2 sam scripts from another directory, for example sacCer2.

cd /lustre/project/public_datasets/derived/MY_GENOME/sam
cp /lustre/project/public_datasets/derived/sacCer2/sam/reindex_sam_sacCer2.bsh .
cp /lustre/project/public_datasets/derived/sacCer2/sam/submit_index_job .

Find/Replace sacCer2 with the name of your genome (MY_GENOME) in both files. Run:

./submit_index_job

This will submit the job on cheaha.


The next step is to add the index to the sam configuration files in your personal galaxy project directory. For example, to add the treeshrew genome, add the line

index   treeshrew62     /lustre/project/public_datasets/derived/treeshrew62/sam/Tupaia_belangeri.TREESHREW.62.dna.nonchromosomal.fa

to the bottom of:

  • tool-data/sam_fa_indices.loc

Also add the index to the srma configuration file (again using the tree shrew example):

treeshrew62     treeshrew62     Tree Shrew Build 62     /lustre/project/public_datasets/derived/treeshrew62/sam/Tupaia_belangeri.TREESHREW.62.dna.nonchromosomal.fa

in the file

  • tool-data/srma_index.loc
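
Once the index jobs have finished, it is worth confirming that every path listed in a .loc file actually resolves to something on disk. The sketch below prints the paths that do not. missing_loc_paths is a hypothetical helper; it assumes tab-separated columns with the path in the last column, and treats the path as a prefix so that index base names (as used for bowtie) also match.

```shell
# missing_loc_paths LOC_FILE
# Hypothetical helper: prints the last column of every entry for which
# no file or directory starting with that path exists.
# Lines starting with '#' and lines without a tab are skipped.
missing_loc_paths() {
    local loc_file=$1 path
    awk -F'\t' '!/^#/ && NF > 1 { print $NF }' "$loc_file" |
    while IFS= read -r path; do
        # The trailing * lets an index base name match its generated files.
        ls -d -- "$path"* >/dev/null 2>&1 || printf '%s\n' "$path"
    done
}

# Example, run from the top of your Galaxy checkout:
#   missing_loc_paths tool-data/sam_fa_indices.loc
```

Empty output means every listed path was found.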

Final Steps

Log in to your local galaxy and see if you can run your job. If it all works out, contact Shantanu and push the changes to production.

Available datasets

Genome                         Downloaded  Blast Database  BWA Index  Bowtie Index  PerM Index  Sam Index  SRMA Dict
hg19 (by chromosome)           Yes         Yes             No         Yes           Yes         Yes        Yes
Mouse (mm9)                    Yes         Yes             No         Yes           Yes         Yes        Yes
Vaccinia Western Reserve       Yes         Yes             Yes        Yes           Yes         Yes        Yes
Mycoplasma pneumoniae (M129)   Yes         Yes             No         Yes           Yes         Yes        Yes
Mycoplasma pneumoniae (FH)     Yes         Yes             No         Yes           Yes         Yes        Yes
Chromosome 11 Mouse Contigs    Yes         Yes             No         Yes           Yes         Yes        Yes

Public instance

A public instance of Galaxy maintained by Penn State University is at http://usegalaxy.org/

Support

In order to facilitate interaction among UAB Galaxy users, share experience, and provide peer-support we have established a galaxy-users group. To join this group and participate in email discussions please subscribe to the galaxy-user group. On-line archives of these discussions are available here. Please note, the email discussions are a public forum. You are advised to only post information you are authorized to share and comfortable with being public.
