UAB Galaxy RNA Seq Step by Step Tutorial: Difference between revisions

From Cheaha
Revision as of 20:33, 13 September 2011

Galaxy RNA-Seq Step-by-Step Tutorial/Protocol

Useful Web Resources

http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
http://cufflinks.cbcb.umd.edu/
http://tophat.cbcb.umd.edu/
http://cufflinks.cbcb.umd.edu/tutorial.html

Upload data

For this tutorial, we will have 2 samples, sequenced with paired-end reads on an Illumina machine. That gives us 4 FASTQ files to upload (forward and reverse reads for each sample).
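As a rough illustration of how 2 paired-end samples become 4 files, the sketch below groups FASTQ file names into forward/reverse pairs. The file names and the "_R1"/"_R2" naming pattern are assumptions made for this example; the names you receive from the sequencing center may differ.

```python
import re
from collections import defaultdict

# Hypothetical file names; real names depend on the sequencing center.
fastq_files = [
    "sampleA_R1.fastq",
    "sampleA_R2.fastq",
    "sampleB_R1.fastq",
    "sampleB_R2.fastq",
]


def group_paired_files(names):
    """Group FASTQ names into {sample: {"R1": name, "R2": name}}."""
    # Assumed naming pattern: <sample>_R1.fastq / <sample>_R2.fastq
    pattern = re.compile(r"^(?P<sample>.+)_(?P<mate>R[12])\.fastq$")
    pairs = defaultdict(dict)
    for name in names:
        match = pattern.match(name)
        if match is None:
            raise ValueError(f"unexpected FASTQ name: {name}")
        pairs[match.group("sample")][match.group("mate")] = name
    # Every sample needs both a forward (R1) and a reverse (R2) file.
    for sample, mates in pairs.items():
        if set(mates) != {"R1", "R2"}:
            raise ValueError(f"sample {sample} is missing a mate file")
    return dict(pairs)


pairs = group_paired_files(fastq_files)
for sample, mates in sorted(pairs.items()):
    print(sample, mates["R1"], mates["R2"])
```

Checking the pairing before upload makes it easier to pick the right forward and reverse files later, when the aligner asks for both.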

Set filetype and genome

To start, you must move the data (FASTQ) from the sequencing center into the Galaxy instance. Be sure to specify the filetype (fastqsanger for UAB and HudsonAlpha) and the organism that was sequenced ("Genome database"); in this tutorial, that is "mm9" for mouse.
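The fastqsanger filetype means that quality scores are stored as ASCII characters with an offset of 33 (Phred+33). The sketch below is one simple way to sanity-check that assumption on a few quality lines before choosing the filetype; the function names are made up for this example, and the check is only a heuristic.

```python
SANGER_OFFSET = 33  # fastqsanger stores Phred scores as chr(score + 33)


def guess_quality_offset(quality_lines):
    """Return 33 if the lines can only be Phred+33, else None (ambiguous).

    Characters below ';' (ASCII 59) do not occur in the older +64
    encodings, so seeing one implies a +33 (Sanger-style) file.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    if lowest < 59:
        return SANGER_OFFSET
    return None  # could be +64, or +33 with uniformly high qualities


def decode_sanger(quality_line):
    """Convert a fastqsanger quality string into a list of Phred scores."""
    return [ord(ch) - SANGER_OFFSET for ch in quality_line]


print(guess_quality_offset(["!!IIII"]))
print(decode_sanger("!I"))
```

If the result is ambiguous, check with the sequencing center rather than guessing, since the wrong filetype shifts every quality score.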


Check quality of data

At this step, we check the quality of sequencing. This says nothing about the quality of the sample, or whether it was the right sample. We'll check that later.

[NGS: QC and manipulation > FASTQ QC > Fastqc] or []
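To give a feel for what a sequencing-quality check measures, here is a minimal sketch that computes the mean quality score at each read position. It is not how FastQC is implemented and covers only one of the metrics such a tool reports. It assumes Phred+33 (fastqsanger) qualities and FASTQ records of exactly four lines each; the two reads are made-up data.

```python
import io


def mean_quality_per_position(handle, offset=33):
    """Return the mean Phred score at each read position.

    Assumes 4-line FASTQ records, so every 4th line is a quality string.
    """
    totals = []
    counts = []
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:
            continue  # skip header, sequence and '+' lines
        quality = line.rstrip("\n")
        for position, ch in enumerate(quality):
            if position == len(totals):
                totals.append(0)
                counts.append(0)
            totals[position] += ord(ch) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two made-up reads; the second one drops in quality toward its end.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nII5!\n"
)
means = mean_quality_per_position(example)
print(means)
```

A drop in mean quality toward the end of the reads is a common pattern and is the kind of signal to look for in the QC report.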


Align using TopHat

For the moment, TopHat is the only NGS aligner for transcript data - it's the only one that handles splicing. In addition, it can be set to detect indels relative to the reference genome.

We'll run TopHat once for each sample (twice, in this case). Parameters:

Reads (FASTQs)
As we're doing "paired" reads, we will need to provide 2 FASTQ files: the forward and reverse reads. In order to have a place to specify the 2nd FASTQ file, we must set the pulldown to "paired-
Genome (Built-in or FASTA from history)
We also need to provide it with the reference genome to align to. In our case this is mouse, and we'll use the already installed "mm9" genome build. If you have a genome that is not on the list, you can either have us add it, or you can upload a FASTA file of the genome into your history and point TopHat at that.

[NGS: RNA Analysis > Tophat for Illumina ]
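For readers who want to relate the Galaxy form to the command-line tool, the sketch below assembles a roughly equivalent TopHat command for each sample without running it. The exact command Galaxy runs is not shown in this tutorial, so treat this as an approximation. The index name, file names, output directories, and the mate inner distance of 200 are placeholders chosen for the example.

```python
import shlex


def tophat_command(index_base, forward_fastq, reverse_fastq,
                   output_dir, mate_inner_dist):
    """Build (but do not run) a paired-end TopHat command line.

    General form: tophat [options] <index_base> <reads_1> <reads_2>
    """
    return [
        "tophat",
        "-o", output_dir,            # output directory
        "-r", str(mate_inner_dist),  # expected inner distance between mates
        index_base,
        forward_fastq,
        reverse_fastq,
    ]


# Hypothetical samples: one TopHat run per sample, two runs in total.
samples = {
    "sampleA": ("sampleA_R1.fastq", "sampleA_R2.fastq"),
    "sampleB": ("sampleB_R1.fastq", "sampleB_R2.fastq"),
}

commands = [
    tophat_command("mm9", forward, reverse, f"tophat_{name}", 200)
    for name, (forward, reverse) in sorted(samples.items())
]
for command in commands:
    print(" ".join(shlex.quote(part) for part in command))
```

The two positional read arguments correspond to the forward and reverse FASTQ inputs in the Galaxy form, and the index corresponds to the "Genome" setting.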




QC alignment

* %mapped
* visualize in IGV or IGB
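The "%mapped" figure can be approximated from alignment records in SAM format, as in the sketch below. It counts alignment records rather than distinct reads, and it assumes unmapped reads are present in the input. If the aligner writes only mapped reads to its output, the total would have to come from the read count of the original FASTQ files instead. The SAM lines are made-up data.

```python
UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def percent_mapped(sam_lines):
    """Return the percentage of primary SAM records that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # skip non-primary records
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    if total == 0:
        raise ValueError("no alignment records found")
    return 100.0 * mapped / total


# Made-up records: two mapped, two unmapped, one secondary (ignored).
example_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t300\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
result = percent_mapped(example_sam)
print(result)
```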

Construct transcripts: Cufflinks

* De novo vs. existing annotation
* UAB Modified Reference annotations

Compare transcript levels: cuffdiff/cuffcompare