Attention: Research Computing Documentation has Moved
https://docs.rc.uab.edu/

Please use the new documentation url https://docs.rc.uab.edu/ for all Research Computing documentation needs.

As a result of this move, we have deprecated use of this wiki for documentation. We are providing read-only access to the content to facilitate migration of bookmarks and to serve as an historical record. All content updates should be made at the new documentation site. The original wiki will not receive further updates.

Thank you,

The Research Computing Team

Galaxy RNA-Seq Step-by-Step Tutorial/Protocol

Useful Web Resources

http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
http://cufflinks.cbcb.umd.edu/
http://tophat.cbcb.umd.edu/
http://cufflinks.cbcb.umd.edu/tutorial.html

Upload data

For this tutorial, we will have 2 samples, sequenced with paired-end reads on an Illumina machine. That gives us 4 FASTQ files to upload (forward and reverse reads for each sample).

Set filetype and genome

To start, you must move the data (FASTQ files) from the sequencing center into the Galaxy instance. Be sure to specify the filetype (fastqsanger for UAB and HudsonAlpha) and the organism that was sequenced ("Genome database"); in this tutorial, that is "mm9" for mouse.
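If you are unsure whether files from another source really are Sanger-encoded (fastqsanger means Phred+33 quality characters), a quick check before upload can help. The sketch below is a common heuristic and not part of Galaxy; the function name and the 10,000-record limit are illustrative assumptions, and it assumes plain four-line FASTQ records.

```python
import io


def guess_quality_offset(handle, max_records=10000):
    """Return "phred33", "phred64", or "unknown" from the quality characters seen.

    Assumes plain four-line FASTQ records (no wrapped sequence lines).
    """
    lowest = None
    for line_number, line in enumerate(handle):
        if line_number >= 4 * max_records:
            break
        if line_number % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        quality = line.rstrip("\r\n")
        if quality:
            smallest = min(ord(ch) for ch in quality)
            lowest = smallest if lowest is None else min(lowest, smallest)
    if lowest is None:
        return "unknown"
    # Characters below ';' (ASCII 59) cannot occur in the +64 encodings.
    return "phred33" if lowest < 59 else "phred64"


example = io.StringIO("@read1\nACGT\n+\nII#5\n")
print(guess_quality_offset(example))  # prints "phred33"
```

A "phred64" answer only means that no low character was seen, so a file containing nothing but very high quality Phred+33 reads could be misclassified. Treat the result as a hint and confirm with the sequencing center.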

Check quality of data

At this step, we check the quality of sequencing. This says nothing about the quality of the sample, or whether it was the right sample. We'll check that later.

Menu: [NGS: QC and manipulation > FASTQ QC > Fastqc]
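To make concrete what a per-base quality check looks at, here is a minimal sketch that averages the Phred score at each read position. It is not how FastQC is implemented, only an illustration of the idea; the function name is hypothetical, and it assumes four-line FASTQ records with the Phred+33 (fastqsanger) offset.

```python
import io


def mean_quality_by_position(handle, offset=33):
    """Return the mean Phred score at each read position (0-based list)."""
    totals = []  # sum of scores seen at each position
    counts = []  # number of reads long enough to reach each position
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:
            continue  # skip header, sequence, and '+' lines
        for position, ch in enumerate(line.rstrip("\r\n")):
            if position == len(totals):
                totals.append(0)
                counts.append(0)
            totals[position] += ord(ch) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


reads = io.StringIO("@r1\nACG\n+\nII#\n@r2\nAC\n+\n5I\n")
print(mean_quality_by_position(reads))  # prints [30.0, 40.0, 2.0]
```

A steady drop in the mean toward the 3' end of the reads is the usual pattern to look for. The Fastqc tool reports this, along with several other checks, without any scripting.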

Align using TopHat

For the moment, TopHat is the only NGS aligner for transcript data, because it is the only one that handles splicing (reads that span exon-exon junctions). In addition, it can be set to detect indels relative to the reference genome.

We'll run TopHat once for each sample (twice, in this case).

Menu: [NGS: RNA Analysis > Tophat for Illumina ]

Parameters:

Reads (FASTQs)

As we're doing "paired" reads, we will need to provide 2 FASTQ files: the forward and reverse reads. In order to have a place to specify the 2nd FASTQ file, we must set the pulldown to "paired-end" reads.
The "Mean Inner Distance between Mate Pairs" is a value you must get from your sequencing center. It is the mean length of the sequenced fragments minus the combined length of the two reads, that is, the unsequenced gap between the mates. For many RNAseq experiments, it is around 150-175.
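If the sequencing center gives you the mean fragment length and the read length rather than the inner distance itself, the arithmetic is simple. The sketch below uses a hypothetical helper name and assumes both mates have the same length and that the fragment length excludes adapters.

```python
def mean_inner_distance(mean_fragment_length, read_length):
    """Inner distance = fragment length minus the two sequenced ends."""
    return mean_fragment_length - 2 * read_length


# 300 bp fragments sequenced as 2 x 50 bp paired-end reads
print(mean_inner_distance(300, 50))  # prints 200
```

A negative result means the two mates overlap, which happens when the fragments are shorter than twice the read length.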

Genome (Built-in or FASTA from history)

We must provide TopHat with the reference genome to align to. In our case, this is mouse, and we'll use the already installed "mm9" genome build. If you have a genome that is not on the list, you can either have us add it, or you can upload a FASTA file of the genome into your history and point TopHat at that.

TopHat settings to use

Defaults

This is the easy thing to do, but, as the manual said, "There is no such thing (yet) as an automated gearshift in splice junction identification. It is all like stick-shift driving in San Francisco. In other words, running this tool with default parameters will probably not give you meaningful results."

Full parameter list

The most common things to change:

Allow indel search (from NO to YES)
Minimum isoform fraction (to 0 if looking for rare isoforms)
Maximum/minimum intron length (defaults are for mammals; organisms with shorter introns will do better with stricter settings, such as a smaller maximum)
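For readers who later run TopHat outside Galaxy, the settings above roughly correspond to command-line options. The sketch below only assembles an argument list and does not run anything. The helper name, index name, and file names are hypothetical, and the indel option is left out because its flag differs between TopHat releases. Check `tophat --help` for your installed version before relying on any of this.

```python
import shlex


def tophat_command(index_base, reads_1, reads_2, mate_inner_dist,
                   min_isoform_fraction=None,
                   min_intron_length=None,
                   max_intron_length=None):
    """Build a TopHat argument list; options left as None keep TopHat's defaults."""
    cmd = ["tophat", "--mate-inner-dist", str(mate_inner_dist)]
    if min_isoform_fraction is not None:
        cmd += ["--min-isoform-fraction", str(min_isoform_fraction)]
    if min_intron_length is not None:
        cmd += ["--min-intron-length", str(min_intron_length)]
    if max_intron_length is not None:
        cmd += ["--max-intron-length", str(max_intron_length)]
    # Positional arguments: genome index, then forward and reverse reads
    cmd += [index_base, reads_1, reads_2]
    return cmd


# Hypothetical file names; isoform fraction 0 keeps rare isoforms
cmd = tophat_command("mm9", "sample1_R1.fastq", "sample1_R2.fastq", 150,
                     min_isoform_fraction=0.0)
print(shlex.join(cmd))
```

Building a list rather than a single string keeps file names with spaces intact if the command is later handed to `subprocess.run`.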

TopHat output

accepted_hits: .BAM (data) and .BAI (index)
splice_junctions
deletions (if indel search is on)
insertions (if indel search is on)

QC alignment

* %mapped
* visualize in IGV or IGB
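One way to estimate %mapped is to count the distinct reads that appear in the alignments and divide by the number of reads that went in. The sketch below works on SAM text lines (for example, accepted_hits converted from BAM to SAM). The function name is hypothetical, and it assumes you know the total number of input reads, counting both mates of every pair.

```python
def percent_mapped(sam_lines, total_input_reads):
    """Percentage of input reads with at least one alignment in the SAM lines."""
    if total_input_reads <= 0:
        raise ValueError("total_input_reads must be positive")
    mapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # FLAG bit 0x4: this read is unmapped
        mate = 2 if flag & 0x80 else 1  # FLAG bit 0x80: second mate of a pair
        mapped.add((name, mate))  # a set counts multi-mapped reads only once
    return 100.0 * len(mapped) / total_input_reads


sam = [
    "@HD\tVN:1.0",
    "r1\t99\tchr1\t100\t50\t4M\t=\t300\t204\tACGT\tIIII",
    "r1\t147\tchr1\t300\t50\t4M\t=\t100\t-204\tACGT\tIIII",
    "r2\t99\tchr2\t500\t1\t4M\t=\t700\t204\tACGT\tIIII",
    "r2\t355\tchr3\t900\t1\t4M\t=\t950\t54\tACGT\tIIII",
]
print(percent_mapped(sam, total_input_reads=6))  # prints 50.0
```

Holding every read name in a set is fine for a quick check on a subsample but uses a lot of memory on a full lane of data.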

Construct transcripts : Cufflinks

* De novo vs existing annotation
* UAB Modified Reference annotations

Compare transcript levels: cuffdiff/cuffcompare
