TopHat
Revision as of 14:26, 3 August 2011
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
TopHat is a collaborative effort between the University of Maryland Center for Bioinformatics and Computational Biology and the University of California, Berkeley Departments of Mathematics and Molecular and Cell Biology.
The latest stable release of TopHat available on [[Cheaha]] is 1.2.0, provided as part of the [http://main.g2.bx.psu.edu/ Galaxy] module. TopHat can be run through the [http://galaxy.uabgrid.uab.edu/ UAB Galaxy] interface or directly on the cluster by submitting jobs to SGE.
==Using TopHat==
TopHat is free software that runs on Linux and Mac OS X.
===TopHat on your Desktop===
TopHat can be downloaded from http://tophat.cbcb.umd.edu/ and installed on your desktop.
Installation instructions are available at http://tophat.cbcb.umd.edu/tutorial.html
===TopHat on Cheaha===
TopHat is pre-installed on the [[Cheaha]] research computing system, so users can run it directly on the cluster without installing any software.
There are two ways to use TopHat on Cheaha:
1. Directly on the cluster, using SGE submit scripts.
2. Through the [http://galaxy.uabgrid.uab.edu/ UAB Galaxy] interface.
====Direct Use through SGE submit scripts====
These instructions show how to create and submit a TopHat job on [[Cheaha]].
First, create a working directory for the job, replacing 'USERNAME' with your Cheaha username.
You can run your job from any directory, but it is recommended that the job directory be on scratch (the Lustre filesystem) rather than in your home directory.
<pre>
$ mkdir -p /lustre/scratch/USERNAME/jobs/tophat
$ cd /lustre/scratch/USERNAME/jobs/tophat
</pre>
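As an alternative to editing 'USERNAME' by hand, the same directory can be built from the standard $USER environment variable. This is a minimal sketch, assuming $USER matches your Cheaha username and that scratch is mounted at /lustre/scratch; the function name make_tophat_jobdir and the SCRATCH_ROOT override are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: create the TopHat job directory on scratch from $USER
# instead of a hand-edited USERNAME. Assumes $USER is your Cheaha username and
# that scratch is /lustre/scratch unless SCRATCH_ROOT (also hypothetical) is set.
make_tophat_jobdir() {
    local root="${SCRATCH_ROOT:-/lustre/scratch}"
    local jobdir
    # Refuse to build a path with an empty user component
    if [ -z "$USER" ]; then
        echo "USER is not set" >&2
        return 1
    fi
    jobdir="$root/$USER/jobs/tophat"
    mkdir -p "$jobdir" || return 1
    # Print the path so the caller can cd into it
    echo "$jobdir"
}
```

Usage on Cheaha: `jobdir=$(make_tophat_jobdir) && cd "$jobdir"`.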
Next, copy all the files required by TopHat to the working directory.
TopHat requires at least the Bowtie index of the reference genome and the sequence read files. If the genome index has not already been built, build it as follows: load the Galaxy module, which provides TopHat and Bowtie, and run bowtie-build on the reference FASTA file.
<pre>
$ module load galaxy/galaxy
$ bowtie-build /lustre/scratch/USERNAME/jobs/tophat/example.fa example
</pre>
This builds the Bowtie index for the genome 'example'. Alternatively, you can download a pre-built index from an external source.
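Whether the index was built with bowtie-build or downloaded, TopHat refers to it only by its basename, so it is worth confirming the index files are all present before submitting. The sketch below uses a hypothetical helper, check_bowtie_index, and assumes a Bowtie 1 index with the standard six .ebwt files; it is not part of TopHat or Bowtie.

```shell
# Hypothetical helper: confirm all six Bowtie 1 (.ebwt) index files exist for
# a basename. Prints each missing file to stderr; returns 0 only if none is missing.
check_bowtie_index() {
    local base="$1" missing=0 suffix
    for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
        if [ ! -f "$base.$suffix" ]; then
            echo "missing: $base.$suffix" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run it in the job directory with the basename given to bowtie-build, which must also be the index argument on the tophat line of the submit script.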
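Next, create a job submit script called 'tophatSubmit', like the one below, and edit the following parameters:
* -l s_rt: an appropriate soft wall time limit, slightly below h_rt (9:55:00 against 10:00:00 in the example)
* -l h_rt: the maximum wall time for your job
* -N: job name
* -M: your email address
* -pe smp numberOfProcessors: the number of processors to run on in parallel, all on the same node of Cheaha (-pe smp 8 uses 8 processors; 12 is the maximum available)
* -l vf: the maximum memory needed for each task
* tophat -p: the number of TopHat threads, which should match the -pe smp value (inside a running job SGE also provides this count as $NSLOTS)
* In the example, the genome is 'example', the paired-end reads are in read1 and read2, -r 20 is the expected mean inner distance between mate pairs (required by TopHat for paired-end reads), and output is written to the testTophat/ folder.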
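The right -r value depends on your sequencing library: it is the mean fragment length minus the length of both reads, so 300 bp fragments sequenced with 50 bp reads give 200. A minimal sketch of that arithmetic, using a hypothetical helper named mate_inner_dist:

```shell
# Hypothetical helper: expected mate inner distance for TopHat's -r option,
# computed as mean fragment length minus the lengths of the two mates.
# Usage: mate_inner_dist <fragment_len> <read_len>
mate_inner_dist() {
    local fragment_len="$1" read_len="$2"
    echo $(( fragment_len - 2 * read_len ))
}
```

For instance, the script's -r 20 would correspond to 100 bp fragments read with 40 bp mates; these numbers are illustrative only, so substitute your library's actual fragment and read lengths.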
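<pre>
#!/bin/bash
#$ -S /bin/bash
#
# Execute in the current working directory
#$ -cwd
#
# Job runtime (10 hours), stdout and stderr merged into one log
#$ -l h_rt=10:00:00,s_rt=9:55:00
#$ -j y
#
# Job Name and email
#$ -N tophat
#$ -M username@uab.edu
#
# Run on 8 processors of one node
#$ -pe smp 8
#
# Export the submission environment to the job
#$ -V
#
# Amount of Memory needed (RAM) 3G
#$ -l vf=3G
#
# Load the appropriate module(s)
module load galaxy/galaxy
#
# Align paired-end reads read1/read2 against the 'example' Bowtie index;
# -p must match the -pe smp slot count, -r is the mean mate inner distance
tophat -p 8 -r 20 -o testTophat/ example read1 read2
</pre>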
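The submit script is not run directly; it is handed to SGE with qsub, and qstat -u lists your queued and running jobs. The sketch below wraps these two standard SGE commands in a hypothetical helper, submit_tophat, which first checks that the script exists and that qsub is on your PATH (assumed to be the case on a Cheaha login node).

```shell
# Hypothetical helper: submit an SGE job script and list your jobs.
# Returns 1 if the script is missing, 2 if qsub is not available,
# otherwise the exit status of qsub/qstat.
submit_tophat() {
    local script="${1:-tophatSubmit}"
    if [ ! -f "$script" ]; then
        echo "submit script not found: $script" >&2
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on a Cheaha login node" >&2
        return 2
    fi
    qsub "$script" && qstat -u "$USER"
}
```

Run it from the job directory as `submit_tophat tophatSubmit`. Because the script uses -cwd, -N tophat and -j y, SGE writes the combined job log to tophat.o<jobid> in that directory, while TopHat's results go to testTophat/.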