GalaxyNgs

Latest revision as of 17:10, 18 August 2011

Doing NGS in Galaxy

active power users

* David Crossman / Genetics & CCTS / dkcrossm@uab.edu
* Curtis Hendrickson / CCTS / curtish@uab.edu
* John Osborne / CCTS / ozborn@uab.edu

Transferring large datasets to the cluster

Files larger than 2 GB cannot be uploaded through a browser. See Docs - UploadLargeData (http://docs.uabgrid.uab.edu/wiki/UploadLargeData).

RNAseq

* Useful RNA-seq tutorial pages by Jeremy on Galaxy Pages (http://main.g2.bx.psu.edu/page/list_published?f-username=jeremy)
* Tophat
  * formula for "mean inner pair distance": fragment size minus twice the read length
    * HudsonAlpha 50bp paired ends: 150, normally.
* Cufflinks
  * To run Cufflinks so that transcripts have usable gene names, use one of the patched GTF files in the Shared Data Library "Patched GTF annotation files for Cufflinks" (created by David C & Curtis H). These have the gene symbol/name patched into the gene_id attribute.
  * NOTE: if you supply a GTF of existing annotation, Cufflinks will NOT discover novel splice variants, because Galaxy uses the -G flag instead of the -g flag (see the Cufflinks docs).
  * BUG: Cufflinks runs but produces all-zero output if there are "_" (underscores) in the chromosome names (TreeShrew problem: names were GeneScaffold_####!). Still under investigation; it may instead have been caused by the sheer number of chromosomes (scaffolds).
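
The Tophat "mean inner pair distance" value can be worked out from the library's fragment size and the read length. The sketch below is only an illustration: the function name is made up, and the 250 bp fragment size is an assumption chosen because it reproduces the 150 quoted for 50bp paired ends. Check the real insert size reported for your own library.

```python
def mean_inner_distance(fragment_size: int, read_length: int) -> int:
    """Distance between the inner ends of the two mates.

    fragment_size is the sequenced fragment length without adapters;
    both mates are assumed to have the same read_length.
    """
    return fragment_size - 2 * read_length


# Assumed example: a 250 bp fragment sequenced as 2 x 50 bp reads.
print(mean_inner_distance(250, 50))  # prints 150
```

If the facility reports the total library size including adapters, subtract the adapter length before applying the formula.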


whole genome DNA

Exome

Command Line Tools

sam_chr_coverage

Compute read coverage by chromosome (by John, ozborn@uab.edu).

* First argument: the SAM file to process.
* Second argument: the reference genome FASTA used to create that SAM alignment. Available genome abbreviations: hg19, mm9, mycoplasma_fh, mycoplasma_m129, vaccinia_wr_genome.

{{{ /share/apps/galaxy/src/read_coverage/sam_chr_coverage sam_reads mm9 }}}
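
For readers who want to see what a per-chromosome coverage calculation looks like, here is a minimal Python sketch. It is not the sam_chr_coverage source. As simplifying assumptions, it takes chromosome lengths from the @SQ header lines of the SAM file (the real tool takes a reference genome argument instead) and approximates depth as the total length of mapped reads divided by chromosome length, ignoring clipping and indels. The function name chr_coverage is hypothetical.

```python
from collections import defaultdict


def chr_coverage(sam_lines):
    """Return {chromosome: approximate mean depth} from SAM text lines."""
    lengths = {}                      # chromosome -> length, from @SQ headers
    aligned_bases = defaultdict(int)  # chromosome -> bases in mapped reads
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, seq = int(fields[1]), fields[2], fields[9]
        if flag & 0x4 or rname == "*":
            continue  # unmapped read
        if seq != "*":
            aligned_bases[rname] += len(seq)
    return {c: aligned_bases[c] / n for c, n in lengths.items()}
```

It can be applied to a file with something like `with open("sam_reads") as fh: print(chr_coverage(fh))`. Chromosomes with no mapped reads are reported with a depth of 0.0.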


q_gunzip

Take a list of files (or directories) and qsub a gunzip job for every .gz file listed or findable in the directory tree.

Author: Curtis curtish@uab.edu

Note: Symbolic links are NOT followed!!

dir [additional file
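
The behaviour described for q_gunzip can be sketched in Python as follows. This is an illustration only, not Curtis's script: the function names are hypothetical, the commands are built as strings rather than submitted, and it assumes a scheduler whose qsub accepts a job script on standard input. Queue and resource options are left out because they are site-specific.

```python
import os
import shlex


def find_gz(paths):
    """Yield every .gz file named in paths or found under a listed directory."""
    for path in paths:
        if os.path.islink(path):
            continue  # symbolic links are not followed
        if os.path.isdir(path):
            # os.walk does not descend into symlinked directories by default
            for root, _dirs, files in os.walk(path):
                for name in files:
                    full = os.path.join(root, name)
                    if name.endswith(".gz") and not os.path.islink(full):
                        yield full
        elif path.endswith(".gz") and os.path.isfile(path):
            yield path


def qsub_commands(paths):
    """Build one shell pipeline per file that pipes a gunzip job to qsub."""
    return [
        f"echo {shlex.quote('gunzip ' + shlex.quote(gz))} | qsub"
        for gz in find_gz(paths)
    ]
```

Printing the result of `qsub_commands(["some_dir"])` shows one command line per compressed file, which can be reviewed before anything is submitted.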

patch_GTF_with_gene_map.pl

[not yet available] Update the gene_id field in the GTF to contain the gene name, for use with Cufflinks, so that it reports in terms of gene names. Gene names can be pulled from a text map file or from the gene_name attribute within the GTF itself.

Author: Curtis curtish@uab.edu

{{{ /share/apps/galaxy/galaxy-tools/bin/patch_gtf_gene_id -src mm9_head10.gtf [-map UCSC_mm9_fake.txt] -prog 3 }}}
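
To make the patching idea concrete, here is a minimal Python sketch of rewriting gene_id on a single GTF line. It is not the patch_gtf_gene_id tool and does not model its -src, -map or -prog options. The function name and the map format (a dict from the original gene_id value to a gene name) are assumptions made for the example.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]*)"')
GENE_NAME = re.compile(r'gene_name "([^"]*)"')


def patch_gtf_line(line, gene_map=None):
    """Return the GTF line with gene_id replaced by a gene name, if known.

    The name is looked up in gene_map first (hypothetical format: original
    gene_id -> gene name), then in the line's own gene_name attribute.
    Lines with no usable name are returned unchanged.
    """
    if line.startswith("#"):
        return line
    ending = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    attrs = cols[8]
    old = GENE_ID.search(attrs)
    if old is None:
        return line
    name = gene_map.get(old.group(1)) if gene_map else None
    if name is None:
        found = GENE_NAME.search(attrs)
        name = found.group(1) if found else None
    if name is None:
        return line
    cols[8] = attrs[:old.start()] + f'gene_id "{name}"' + attrs[old.end():]
    return "\t".join(cols) + ending
```

Applying the function to every line of a GTF file and writing the results out gives a file in which Cufflinks would see gene names in the gene_id attribute, which is the effect described for the patched files in the Shared Data Library.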