Galaxy DNA-Seq Tutorial

From Cheaha
Revision as of 21:46, 13 September 2011

Linking to data

Link the Mark Pritchard Vaccinia virus data set into your history.

  • Start with a blank history: there should be no numbered items in the history pane on the right-hand side of the screen. If there are, create a new history.
  • Select "Shared Data" from the top of the screen to bring up the Shared Data screen.
  • Select "Mark Pritchard Vaccinia WR" from the alphabetically sorted list.
  • Click the checkbox at the top of the list to select all 6 files.
  • Select "import to current history".
  • Click "Analyze Data" in the main menu at the top of the screen. This returns you to the main page, and your history pane should now look like the image below.

HistoryPane.jpg

  • Notice that with just 3 viruses you already have over 4 GB of data.


Formatting and Grooming Data

  • Click the pencil icon on one of the virus datasets to bring up its attributes. Your screen should look something like this:

DataType.jpg

  • The important thing to notice is the data type. In Galaxy, the data type a tool expects must match the data type of the dataset in your history pane EXACTLY; otherwise, that dataset will not appear in the tool's drop-down menu for data selection.
  • There are several variants of the FASTQ format; see the Wikipedia article on FASTQ for an overview. Galaxy requires everything to be in Sanger format before it can be used. If you know your data is in Sanger format, select fastqsanger as the data type. If it is not, select fastq and run the FASTQ Groomer.
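The FASTQ variants differ mainly in how per-base quality scores are written as ASCII characters. A Phred score Q relates to the base-call error probability p by Q = -10 log10(p); Sanger format stores Q as the character with code Q + 33, while Illumina 1.3+ files use Q + 64. The Python sketch below is only a rough illustration of what "grooming" involves, under stated assumptions: it is not the FASTQ Groomer's actual code, the function names are hypothetical, it guesses the encoding with a simple lowest-character heuristic, it handles only unwrapped 4-line records, and it does not convert Solexa-scaled scores.

```python
# Illustrative sketch only (assumption: this is NOT Galaxy's FASTQ Groomer).
# Guesses the quality offset of FASTQ text and rewrites Phred+64
# (Illumina 1.3+) qualities as Sanger Phred+33.
from typing import List, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def parse_fastq(text: str) -> List[Record]:
    """Split FASTQ text into 4-line records (assumes no wrapped sequences)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        records.append((header, seq, plus, qual))
    return records


def guess_offset(records: List[Record]) -> int:
    """Heuristic: return 33 (Sanger) or 64 (Phred+64) from the lowest quality char."""
    if not records:
        raise ValueError("no records to inspect")
    lowest = min(ord(c) for _, _, _, qual in records for c in qual)
    if lowest < 59:   # below ';' only occurs in Phred+33 data
        return 33
    if lowest >= 64:  # '@' or above: assume Phred+64 (very high-quality
        return 64     # Sanger data could also look like this)
    # ASCII 59-63 could be Solexa scores or Sanger Q26-Q30: do not guess.
    raise ValueError("ambiguous quality encoding; inspect the file manually")


def to_sanger(records: List[Record], offset: int) -> List[Record]:
    """Re-encode quality strings from the given offset to Phred+33."""
    if offset == 33:
        return list(records)
    converted = []
    for header, seq, plus, qual in records:
        chars = []
        for c in qual:
            q = ord(c) - offset  # recover the numeric Phred score
            if q < 0:
                raise ValueError(f"negative quality in {header!r}; not Phred+{offset}")
            chars.append(chr(q + 33))  # Sanger stores Q as ASCII Q + 33
        converted.append((header, seq, plus, "".join(chars)))
    return converted


if __name__ == "__main__":
    # Hypothetical read: 'h' is Q40 and '@' is Q0 in Phred+64.
    recs = parse_fastq("@read1\nAC\n+\nh@\n")
    offset = guess_offset(recs)
    for rec in to_sanger(recs, offset):
        print("\n".join(rec))
```

Run on the hypothetical read above, the quality line `h@` (Q40, Q0 in Phred+64) becomes `I!` in Sanger encoding. In Galaxy itself you would simply set the data type or run the FASTQ Groomer; the sketch only shows why the encoding matters.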


Assessing the quality of the data

Performing cleanup

Short-read alignment to a reference genome using BWA

Looking at differences with SnpEff

De novo assembly (time permitting)

Viewing results in IGV