Galaxy: Difference between revisions

From Cheaha


= Overview =
[http://bitbucket.org/galaxy/galaxy-central/wiki/Home Galaxy] is an easy-to-use, open-source, scalable framework for tool and data integration. Galaxy provides access to tools (mainly for comparative genomics) through a web-based interface. The [http://bitbucket.org/galaxy/galaxy-central/wiki/ImplementationInfo Galaxy framework] is implemented in the Python programming language.


= End Users =


A public instance of Galaxy maintained by Penn State University is at http://usegalaxy.org/


= Developers and Core Production Installation =


== Quickstart ==
To get started with installing your own Galaxy instance, the only prerequisite is Python (2.4, 2.5 or 2.6). Four simple steps will [http://bitbucket.org/galaxy/galaxy-central/wiki/GetGalaxy get you started with your own Galaxy instance]:
# Clone the Galaxy distribution with Mercurial: <pre>hg clone http://www.bx.psu.edu/hg/galaxy galaxy_dist</pre> or [http://bitbucket.org/galaxy/galaxy-dist/get/tip.tar.gz download a source tarball].
# Run <code>setup.sh</code>.
# Run <code>run.sh</code>.
# Open http://localhost:8080 in a web browser.
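After step 4 it can help to confirm from a script that the server is actually answering before opening a browser. The Python 3 sketch below polls a URL until any HTTP response comes back; the function name <code>wait_for_galaxy</code>, the 30-second default timeout and the polling interval are our own assumptions, not part of Galaxy.

```python
import time
import urllib.error
import urllib.request


def wait_for_galaxy(url="http://localhost:8080/", timeout=30.0, interval=0.5):
    """Poll url until it answers an HTTP request.

    Hypothetical helper: returns the HTTP status code, or None if nothing
    answered within timeout seconds. Any HTTP response, even an error page
    such as 404, counts as "up" because it proves a server is listening.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                return response.status
        except urllib.error.HTTPError as err:
            return err.code  # server answered, but with an error status
        except OSError:
            pass  # connection refused or timed out: not listening yet
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

For example, <code>wait_for_galaxy()</code> run right after <code>run.sh</code> returns 200 once the start page is served, or None if Galaxy never came up.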


* Galaxy runs on a local web server provided by [http://pythonpaste.org/script/ PasteScript], which is written in Python. PasteScript is based on Python's library module [http://docs.python.org/library/simplehttpserver.html SimpleHTTPServer] and is implemented with the help of the Python package [http://www.owlfish.com/software/wsgiutils/index.html WSGIUtils].
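Galaxy is a WSGI application (hence the name universe_wsgi.ini), which is why a generic Python HTTP server can host it. As an illustration of that calling contract only, the sketch below uses the standard library's <code>wsgiref</code> instead of Paste; <code>hello_app</code> and <code>call_app</code> are hypothetical names and nothing here is Galaxy code.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A minimal WSGI application: receives the request environ and a
    start_response callable, and returns an iterable of byte strings."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process (no socket) and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To serve the same app over HTTP for a quick manual test, <code>wsgiref.simple_server.make_server("127.0.0.1", 8080, hello_app).serve_forever()</code> works the same way a Paste server hosts Galaxy.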


== CentOS School of Medicine Installation ==
This section describes installing Galaxy on CentOS 5.5. I would suggest that others considering a Galaxy installation try a distribution not based on Red Hat, to avoid dealing with system Python issues: CentOS 5 ships Python 2.4, which system tools such as yum depend on, so it cannot simply be upgraded. Failing that, I would use virtualenv or start with a completely separate installation of Python 2.6.


=== Temporary Directory and the $TEMP variable ===
It is advisable to set the $TEMP variable to a directory with plenty of free space; otherwise /tmp is used, which is often too small to handle large files. See:
<br>
http://lists.bx.psu.edu/pipermail/galaxy-user/2009-September/000704.html
<br>
Here, run.sh was not picking up my exported TEMP, so I modified run.sh itself to <code>export TEMP=/home/galaxy/galaxy_temp_dir</code>.
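To see which directory Python-based code will actually use for temporary files, and whether it has room, a small check like the one below can be run as the galaxy user. It relies on Python's <code>tempfile</code> module, which consults TMPDIR, then TEMP, then TMP, so a set TMPDIR overrides TEMP; whether each Galaxy tool honors $TEMP is tool-specific. The 50 GB threshold is an arbitrary assumption.

```python
import shutil
import tempfile


def temp_dir_report(min_free_gb=50.0):
    """Return (directory, free_gb, big_enough) for Python's temp directory.

    tempfile caches its choice, so the module-level cache is cleared first
    to pick up the current TMPDIR/TEMP/TMP environment variables.
    """
    tempfile.tempdir = None
    directory = tempfile.gettempdir()
    free_gb = shutil.disk_usage(directory).free / 1e9
    return directory, free_gb, free_gb >= min_free_gb
```

Running <code>print(temp_dir_report())</code> with the same environment as run.sh shows whether the TEMP export took effect.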


=== Apache and Postgres Setup ===


* For [http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ProductionServer deployment to production environments], the Galaxy documentation suggests putting a proxy server such as Apache or nginx in front of Galaxy to serve static content and to handle authentication and authorization. Following [http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ApacheProxy Galaxy's ApacheProxy documentation], add the following to Apache's httpd.conf to proxy Galaxy on port 80:
<pre>
<Proxy http://localhost:8080>
        Order deny,allow
        Allow from all
</Proxy>

RewriteEngine on
RewriteRule ^/galaxy$ /galaxy/ [R]
RewriteRule ^/galaxy/static/style/(.*) /Users/pnm/project/galaxy_dist/static/june_2007_style/blue/$1 [L]
RewriteRule ^/galaxy/static/scripts/(.*) /Users/pnm/project/galaxy_dist/static/scripts/packed/$1 [L]
RewriteRule ^/galaxy/static/(.*) /Users/pnm/project/galaxy_dist/static/$1 [L]
RewriteRule ^/galaxy/favicon.ico /Users/pnm/project/galaxy_dist/static/favicon.ico [L]
RewriteRule ^/galaxy/robots.txt /Users/pnm/project/galaxy_dist/static/robots.txt [L]
RewriteRule ^/galaxy(.*) http://localhost:8080$1 [P]
</pre>
* Change the paths above to where you have cloned/installed Galaxy.
* http://localhost/galaxy should now bring up Galaxy on port 80.
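The order of these rules matters: the specific static-file rules must come before the catch-all proxy rule. The Python sketch below models that routing so the effect of each rule can be checked against sample URLs. It is a simplification: it treats every rule as final, whereas in mod_rewrite only [L] and [P] stop processing and a bare [R] lets later rules run. <code>GALAXY_ROOT</code> is the example path from the config above, and <code>route</code> is a hypothetical helper.

```python
import re

GALAXY_ROOT = "/Users/pnm/project/galaxy_dist"  # change to your Galaxy directory
BACKEND = "http://localhost:8080"

# (pattern, target, action) in the same order as the httpd.conf rules.
# action: "redirect" ([R]), "file" ([L], served from disk), "proxy" ([P]).
RULES = [
    (r"^/galaxy$", "/galaxy/", "redirect"),
    (r"^/galaxy/static/style/(.*)", GALAXY_ROOT + "/static/june_2007_style/blue/{0}", "file"),
    (r"^/galaxy/static/scripts/(.*)", GALAXY_ROOT + "/static/scripts/packed/{0}", "file"),
    (r"^/galaxy/static/(.*)", GALAXY_ROOT + "/static/{0}", "file"),
    (r"^/galaxy/favicon.ico", GALAXY_ROOT + "/static/favicon.ico", "file"),
    (r"^/galaxy/robots.txt", GALAXY_ROOT + "/static/robots.txt", "file"),
    (r"^/galaxy(.*)", BACKEND + "{0}", "proxy"),
]


def route(path):
    """Return (action, target) for the first rule matching path, or
    (None, path) if no rule applies and Apache would serve it normally."""
    for pattern, target, action in RULES:
        match = re.match(pattern, path)
        if match:
            return action, target.format(*match.groups())
    return None, path
```

For instance, <code>route("/galaxy/history/list")</code> falls through every static rule and is proxied to http://localhost:8080/history/list.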


Note: if you are using CentOS/Red Hat as your platform, you may need to adjust or disable SELinux to allow Apache to access parts of the file system that are not covered by the default Directory directives shipped in the Apache config. If this bites you, you will see unexplained "permission denied" errors in Apache's error_log.




In production mode it is recommended to use something other than SQLite and the Python web server, preferably PostgreSQL and Apache. When setting up Apache to proxy requests to the Python web server on CentOS, it is critical to override the default CentOS SELinux policy to allow proxying, as shown below.
<pre>
setsebool -P httpd_can_network_relay=1
</pre>


Additionally, redirects to the file system are blocked by the SELinux security policy. One workaround is to switch SELinux to permissive mode by running:
<pre>
setenforce 0
</pre>
A better solution is to allow access to the required directories using chcon, like this:
<pre>
chcon -R -t httpd_user_content_t galaxy_dist/
</pre>
This has been done and appears to work, allowing SELinux to be turned back on (setenforce 1).
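To confirm which mode SELinux is in before and after running setenforce, the mode can be read from the selinuxfs enforce file (getenforce reports the same information). The sketch below assumes selinuxfs is mounted at /sys/fs/selinux on newer kernels or /selinux on older releases such as CentOS 5; <code>selinux_mode</code> is a hypothetical helper.

```python
from pathlib import Path

# Assumed selinuxfs mount points: newer kernels first, then older releases.
ENFORCE_FILES = ("/sys/fs/selinux/enforce", "/selinux/enforce")


def selinux_mode(candidates=ENFORCE_FILES):
    """Return "enforcing", "permissive", or None if no enforce file is readable
    (SELinux disabled or not present on this system)."""
    for name in candidates:
        try:
            value = Path(name).read_text().strip()
        except OSError:
            continue  # missing or unreadable: try the next mount point
        return "enforcing" if value == "1" else "permissive"
    return None
```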


Additionally, PostgreSQL's ident authentication for local connections should be changed. One workaround is to make PostgreSQL trust all local users, as shown below in /var/lib/pgsql/data/pg_hba.conf. Note that trust lets any local system user connect as any database user without a password.
<pre>
local  all        all                              trust
</pre>
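Because trust skips authentication entirely, it is worth being able to list exactly which pg_hba.conf records use it. The sketch below is a simplified parser written for this page, not a PostgreSQL tool: it assumes host records give the address in CIDR form (one field) and does not handle the older address-plus-netmask form or quoted values.

```python
def trust_entries(pg_hba_text):
    """Return (line_number, line) for each pg_hba.conf record whose
    authentication method is trust."""
    found = []
    for number, raw in enumerate(pg_hba_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split()
        # local records: TYPE DATABASE USER METHOD [OPTIONS]
        # host records:  TYPE DATABASE USER ADDRESS METHOD [OPTIONS]
        index = 3 if fields[0] == "local" else 4
        if len(fields) > index and fields[index] == "trust":
            found.append((number, raw.strip()))
    return found
```

Running it on the file's contents, e.g. <code>trust_entries(open("/var/lib/pgsql/data/pg_hba.conf").read())</code>, should list the line added above.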


=== SMTP Configuration ===
The SMTP server is set in lib/galaxy/config.py, where it is the default value:
<pre>
self.smtp_server = kwargs.get( 'smtp_server', "vera.dpo.uab.edu" )
</pre>
and in universe_wsgi.ini:
<pre>
#smtp_server = vera.dpo.uab.edu
</pre>
Because the universe_wsgi.ini line is commented out, the default from config.py is the value that takes effect; uncomment it to override the default.
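The interplay between the two settings can be modelled in a few lines: config.py's <code>kwargs.get</code> falls back to its hard-coded default whenever universe_wsgi.ini does not set smtp_server. The sketch assumes the setting lives in the ini file's [app:main] section; <code>smtp_server_from_ini</code> is a hypothetical helper, not Galaxy code.

```python
import configparser

# Default hard-coded in lib/galaxy/config.py (see the snippet above).
DEFAULT_SMTP_SERVER = "vera.dpo.uab.edu"


def smtp_server_from_ini(ini_text, default=DEFAULT_SMTP_SERVER):
    """Mimic kwargs.get('smtp_server', default) for a universe_wsgi.ini text.

    Assumes the option lives in the [app:main] section. A commented-out
    line (starting with #) is ignored, so the default is returned.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("app:main", "smtp_server", fallback=default)
```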


=== Shibboleth Installation ===
The instructions for the base RPM install on the SP (your Galaxy box) are here:


https://spaces.internet2.edu/display/SHIB2/NativeSPLinuxInstall


When installing with yum, both the 32-bit and 64-bit packages get installed, and a 64-bit Apache then fails to load the 32-bit module with this error:

<pre>Cannot load /usr/lib/shibboleth/mod_shib_22.so into server: /usr/lib/shibboleth/mod_shib_22.so: wrong ELF class: ELFCLASS32 </pre>

This can be fixed by editing /etc/httpd/conf.d/shib.conf and changing the LoadModule line from
<pre>
LoadModule mod_shib /usr/lib/shibboleth/mod_shib_22.so
</pre>
to
<pre>
LoadModule mod_shib /usr/lib64/shibboleth/mod_shib_22.so
</pre>
The Shibboleth SP will then start correctly.
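The "wrong ELF class" message means a 32-bit module was handed to a 64-bit Apache. Before editing shib.conf, a module's class can be checked by reading byte 4 (EI_CLASS) of its ELF header, as in this sketch (the <code>file</code> command reports the same); <code>elf_class</code> is a hypothetical helper.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}  # values of the EI_CLASS byte


def elf_class(path):
    """Return "ELFCLASS32", "ELFCLASS64", or None if path is not an ELF file."""
    with open(path, "rb") as handle:
        header = handle.read(5)  # 4-byte magic number followed by EI_CLASS
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])
```

For example, <code>elf_class("/usr/lib64/shibboleth/mod_shib_22.so")</code> should report ELFCLASS64 on the 64-bit install.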
 
=== Python ===
As of February 2011, Galaxy assumes a Python version newer than 2.4, while CentOS 5 ships Python 2.4. A completely separate, Galaxy-specific install of Python 2.6.6 was therefore done on the CentOS 5 Galaxy VM. Some directions are here:
http://bda.ath.cx/blog/2009/04/08/installing-python-26-in-centos-5-or-rhel5/
 
=== RPy, R, Numpy,Nose, Atlas, BLAS, LAPACK, gFortran or g77 ===
RPy requires R, and R must be rebuilt entirely with the shared library option enabled (--enable-R-shlib); this is not a major problem. R can also be installed without shared libraries using yum install R.

RPy strongly suggests installing Numerical Python (numpy), which depends on ATLAS, BLAS and LAPACK and needs Nose for testing. These packages can be built with either the gfortran or the g77 Fortran compiler, but they must all use the same one. I chose gfortran and renamed g77 to disable_g77, because numpy searches for g77 first. Numpy also had installation problems: I originally copied its directory into the Python lib directories by hand, as the INSTALL instructions seemed to suggest that no install step is required and that setup would handle it. This is not the case; "python setup.py install" must be run. It is worth checking that numpy can be imported.
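A quick way to do that check, and to locate numpy's C headers (relevant to the arrayobject.h workaround further down), is the Python 3 sketch that follows; it should be run with the same interpreter Galaxy uses. <code>check_module</code> and <code>numpy_include_dir</code> are our names; <code>numpy.get_include()</code> is numpy's own function for locating its headers.

```python
import importlib


def check_module(name):
    """Return the module's __version__ ("unknown" if it has none) when it
    imports cleanly, or None when the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def numpy_include_dir():
    """Directory containing numpy's C headers, or None if numpy is missing."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.get_include()
```

For example, <code>check_module("numpy")</code> returns the installed version string, and <code>numpy_include_dir()</code> shows where the compiler should look for numpy/arrayobject.h.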
 
On CentOS, yum can install ATLAS, BLAS and LAPACK, but RPy cannot be built against them due to Fortran incompatibilities.

The atlas and atlas-dev packages install easily with yum, but they are not compatible with numpy.
 
Additionally, steps on installing mercurial can be found here:
http://jake.murzy.com/post/2992010793/installing-and-setting-up-mercurial-rhel-centos-apache
 
The current issue is that RPy is old and did not recognize the installed R version, because part of the version number has two digits. I fixed the RPy code to handle this, but RPy still looks for files in numpy that no longer exist; they probably existed in an earlier version of numpy.
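We do not know exactly how RPy mis-parsed the version, but a common failure with two-digit components is comparing versions as strings, where "2.12" sorts before "2.9". The sketch below parses the <code>R --version</code> banner into integers instead; <code>parse_r_version</code> is a hypothetical helper, not RPy code.

```python
import re


def parse_r_version(text):
    """Extract (major, minor, patch) as integers from output such as
    'R version 2.12.1 (2010-12-16)'."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no R version found in {text!r}")
    return tuple(int(part) for part in match.groups())
```

Integer tuples compare component by component, so (2, 12, 1) correctly ranks above (2, 9, 2) even though the string "2.12.1" sorts below "2.9.2".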
 
Another workaround, from http://www.mail-archive.com/rpy-list@lists.sourceforge.net/msg01573.html: substitute line 77 of src/RPy.h
<pre>
#include <Rdevice.h> /* must follow Graphics.h */
</pre>
with
<pre>
#include <Rembedded.h> /* must follow Graphics.h */
</pre>
and it builds and seems to work.
 
I also worked around arrayobject.h not being found by manually copying the numpy include directory into:
/usr/local/galaxy-python/lib/python2.6/numpy/core/numpy/include

The proper solution is to update Galaxy so that it can use rpy2.
 
 
 
=== NextGen Mapping Setup (bowtie, bwa, lastz, megablast, srma) ===

Instructions here:
https://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup
The setup requires fastx and RPy to work.

I placed the fastx binaries in /usr/local/bin as suggested, rather than trying anything more advanced at this point. This overlaps with the EMBOSS binaries, which is probably not a good idea in the future.
Placed links to bowtie, bwa and lastz in /usr/local/bin as well.<br>
Downloaded srma-0.1.14.jar and copied it to ~/galaxy_dist/tool-data/shared/jars/ <br>
megablast from NCBI was downloaded and installed as an RPM without problems; everything went into /usr/bin. <br>
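Once the binaries or links are in place, it is worth confirming that each tool resolves on the PATH the Galaxy process will see. The sketch uses <code>shutil.which</code>; the tool list is only an example drawn from this section and <code>find_tools</code> is a hypothetical helper.

```python
import shutil

# Example list taken from the tools mentioned above; adjust as needed.
NGS_TOOLS = ("bowtie", "bwa", "lastz", "megablast")


def find_tools(names=NGS_TOOLS, path=None):
    """Map each tool name to the executable shutil.which finds, or None.

    path overrides the PATH environment variable when given
    (an os.pathsep-separated string such as "/usr/local/bin:/usr/bin").
    """
    return {name: shutil.which(name, path=path) for name in names}
```

Running <code>find_tools()</code> as the galaxy user should show /usr/local/bin or /usr/bin locations for every entry; a None marks a tool Galaxy will not find.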
 
=== Post Processing ===
SAMtools is required for post-processing, though it cannot convert megablast output. Building it failed with:
<pre>
gcc -g -Wall -O2  -o samtools bam_tview.o bam_maqcns.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o libbam.a -lm  -lcurses  -lz -Lbcftools -lbcf
libbam.a(bam_import.o): In function `__bam_get_lines':
/usr/local/src/samtools-0.1.12a/bam_import.c:76: undefined reference to `gzopen64'
libbam.a(bam_import.o): In function `sam_open':
/usr/local/src/samtools-0.1.12a/bam_import.c:442: undefined reference to `gzopen64'
libbam.a(bam_import.o): In function `sam_header_read2':
/usr/local/src/samtools-0.1.12a/bam_import.c:126: undefined reference to `gzopen64'
collect2: ld returned 1 exit status
make[1]: *** [samtools] Error 1
make[1]: Leaving directory `/usr/local/src/samtools-0.1.12a'
make: *** [all-recur] Error 1
</pre>
It looked like both the 32-bit and 64-bit zlib were on the system, and the build was preferentially picking up the 32-bit library. The 32-bit zlib has 64 packages depending on it, so I removed it without dependency checks:
<pre>
rpm -e --nodeps zlib-1.2.3-3.i386
rpm -e --nodeps zlib-devel-1.2.3-3.i386
</pre>
Unfortunately this did not fix the build; the problem may instead be the lack of a gzopen64 function in the CentOS version of zlib. Apparently the zlib authors now develop on Ubuntu, another reason to switch. For more information see this post: <br>
http://forums.fedoraforum.org/showthread.php?t=228945
<br>
However, I did find a version of libz in /usr/local/lib which, according to
<pre>objdump -T /usr/local/lib/libz.so | grep gzopen</pre>
contains gzopen64, unlike the standard CentOS version in /usr/lib64.
<br>
Make still failed after I linked against this gzopen64-containing zlib, because it could not find the library passed to gcc as -lbc. That turned out to be a typo on my part; the actual fix was to adjust the CFLAGS setting in the Makefile to:
<PRE>
CFLAGS=        -g -Wall -O2 -L/usr/local/lib #-m64 #-arch ppc
</PRE>
I then re-installed the 32-bit zlib, as it has a number of CentOS dependencies I don't want to break:
<PRE>
yum install zlib-1.2.3-3.i386
yum install zlib-devel-1.2.3-3.i386
</PRE>
samtools also needs this version of the zlib library at run time, so I set Galaxy's LD_LIBRARY_PATH to search /usr/local/lib first so that the correct library is loaded.
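The objdump check above can also be scripted: <code>ctypes</code> loads a shared library through the dynamic loader and reports whether a symbol is exported. This is a sketch; <code>has_symbol</code> is our name. When given a bare library name rather than a path, the loader searches LD_LIBRARY_PATH, so the same call also shows which libz a process will pick up.

```python
import ctypes


def has_symbol(library, symbol):
    """Return True if the shared library exports symbol.

    library is a path such as "/usr/local/lib/libz.so", or None to search
    the running process itself. Returns False if the library cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return False
    return hasattr(lib, symbol)  # attribute lookup fails for missing symbols
```

For example, <code>has_symbol("/usr/local/lib/libz.so", "gzopen64")</code> should return True for the library found above, and False for the stock CentOS libz in /usr/lib64.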
 
 
=== GNUPlot ===
GNUPlot is required for some plotting, including viewing FASTQ quality statistics. I ran:
<PRE>yum install gnuplot</PRE>
However, this gives the following error:
<PRE>line 0: undefined variable inside</PRE>
which is also described here:
http://cell-innovation.nig.ac.jp/wiki/tiki-view_forum_thread.php?topics_offset=1&forumId=7&comments_parentId=103

Despite the error, plotting still works.
I tried upgrading to gnuplot 4.4.1 and 4.4.2, but other dependency issues (libg, and newer versions of Cairo and Pango) make it not worthwhile to try on this VM.
 
=== SNP Analysis ===
SNPeff (http://snpeff.sourceforge.net/) has a Galaxy integration but does not come set up with the base install of Galaxy. It requires Java, which installed easily with:
<pre>
yum install java
</pre>
SNPeff should go in /usr/local/snpeff
 
=== Errors/Bugs ===
As reported in an email I sent to galaxy-dev, the migration script
lib/galaxy/model/migrate/versions/0068_rename_sequencer_to_external_services.py
contains SQL errors in two places: it tries to rename a SEQUENCE with incorrect syntax. ALTER TABLE should be used, not ALTER SEQUENCE (PostgreSQL releases before 8.3 do not accept ALTER SEQUENCE ... RENAME TO, while ALTER TABLE ... RENAME TO also works on sequences).
The problem remains as of 7 February 2011.
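If the migration has to be patched locally until a fix is released, the change is mechanical. This Python sketch shows the rewrite on a SQL string; the function name and the sequence names in the tests are hypothetical, and the real script may build its SQL differently, so both locations should still be checked by hand.

```python
import re

# Matches "ALTER SEQUENCE <old> RENAME TO <new>" in any letter case.
ALTER_SEQUENCE_RENAME = re.compile(
    r"\bALTER\s+SEQUENCE\s+(\S+)\s+RENAME\s+TO\s+(\S+)", re.IGNORECASE)


def use_alter_table_for_sequence_rename(sql):
    """Rewrite 'ALTER SEQUENCE a RENAME TO b' as 'ALTER TABLE a RENAME TO b';
    any other SQL is returned unchanged."""
    return ALTER_SEQUENCE_RENAME.sub(r"ALTER TABLE \1 RENAME TO \2", sql)
```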
 
== Cheaha Cluster Installation Setup Notes ==
 
Software installation proceeds as described here:
http://dev.uabgrid.uab.edu/wiki/GalaxyUsage

# Download or copy the archive/zip files of the various tools into the $GALAXY_BASEDIR/archives directory.
# Extract the archived files into the $GALAXY_BASEDIR/src directory.
# Install these tools into the $GALAXY_BASEDIR/galaxy-tools directory.
# Confirm with the system administrator before making any changes to the universe_wsgi.ini file. Typically, Galaxy tool installers and users should not need to change this file; we will change this policy if we discover any shortcomings.
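Steps 1 and 2 are the same for every tool and can be scripted; step 3 depends on each tool's build system. This sketch assumes only the directory names listed above (archives, src and galaxy-tools under $GALAXY_BASEDIR) and uses <code>shutil.unpack_archive</code>, which picks the format (.zip, .tar.gz, .tar.bz2 and so on) from the file extension. <code>stage_archive</code> is a hypothetical helper.

```python
import shutil
from pathlib import Path


def stage_archive(archive, basedir):
    """Copy archive into <basedir>/archives (step 1) and unpack it into
    <basedir>/src (step 2); return the src directory.

    <basedir>/galaxy-tools is created for step 3, but building and
    installing into it is left to each tool's own instructions.
    """
    base = Path(basedir)
    archives, src = base / "archives", base / "src"
    for directory in (archives, src, base / "galaxy-tools"):
        directory.mkdir(parents=True, exist_ok=True)
    copied = shutil.copy2(archive, archives)  # returns the copied file's path
    shutil.unpack_archive(copied, src)  # format chosen from the extension
    return src
```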
 
== Next Gen Sequencing Software (order of install on "cheaha") ==
 
* Python - 2.6.6 already installed
* bwa-0.5.9 - Built from source without problems (make). Then created a link in /share/apps/galaxy/galaxy-tools/bin to the bwa binary in the src directory.
* bowtie-0.12.7 - Built from source without problems (the SOM install previously used binaries). Created links in /share/apps/galaxy/galaxy-tools/bin to the bowtie files in the src directory.
* EMBOSS-6.3.1 - Built from source. Commands: 1) ./configure --prefix=/share/apps/galaxy/galaxy-tools --exec-prefix=/share/apps/galaxy/galaxy-tools 2) make 3) make install
* lastz-distrib-1.02.00 - Built from source. Commands: 1) make 2) make test. Install: linked lastz and lastz_D in the src directory into galaxy-tools/bin.
* samtools-0.1.12a - Built from source without problems (make). Install: linked samtools and bcftools into /share/apps/galaxy/galaxy-tools/bin.
* snpEff 1.9 - Downloaded binaries; they provide a Galaxy interface XML file. Placed the tool in /share/apps/galaxy/galaxy-tools/snpEff_v1_9. Created a snpEff directory in /share/apps/galaxy/galaxy-latest/tools containing symbolic links to snpEff.config, snpEff.xml and the .jar in /share/apps/galaxy/galaxy-tools/snpEff_v1_9.
* blast-2.2.25 - Downloaded the legacy BLAST binaries, which are needed for MegaBlast. Galaxy actually uses version 2.2.22, but hopefully this will be fine. This was not installed earlier on the SOM cluster. Created symbolic links in the galaxy-tools/bin directory.
* srma-0.1.15 - Single jar download containing class files only; no compiling. Put it in the archives directory and in /share/apps/galaxy/galaxy-tools/lib/srma-0.1.15/srma-0.1.15.jar, where it is used, and linked it (instead of copying it) into $GALAXY_PATH/tool-data/shared/jars (/share/apps/galaxy/galaxy-latest/tool-data/shared/jars).
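Most of these installs end with the same step: a symbolic link in /share/apps/galaxy/galaxy-tools/bin pointing at the built binary under src. A Python version of that step, which replaces any stale link, might look like this; <code>link_into_bin</code> is our name, not part of any Galaxy tooling.

```python
from pathlib import Path


def link_into_bin(binary, bin_dir):
    """Create (or refresh) bin_dir/<binary name> as a symlink to binary.

    The link target is the binary's resolved absolute path, so the link
    keeps working regardless of the current directory.
    """
    bin_path = Path(bin_dir)
    bin_path.mkdir(parents=True, exist_ok=True)
    link = bin_path / Path(binary).name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace an old link left by a previous version
    link.symlink_to(Path(binary).resolve())
    return link
```

A call such as <code>link_into_bin("/share/apps/galaxy/src/bwa-0.5.9/bwa", "/share/apps/galaxy/galaxy-tools/bin")</code> reproduces the bwa step; the src path in that example is hypothetical.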
 
=== Reference Genome Building and Indexing ===
Added Vaccinia Western Reserve to /home/ozborn/galaxy/galaxy-latest/tool-data/shared/ucsc/builds.txt.
Copied the old School of Medicine galaxy_ng_indices to /home/ozborn/galaxy/database/galaxy_ng_indices.
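Adding a build means appending one line to builds.txt. The sketch below assumes the file's usual layout: one build per line as a tab-separated dbkey and description (the same pairs shown in the genome table on this page), with # comment lines. The dbkey "vacWR" used in the tests is a made-up example, not the key actually used for Vaccinia Western Reserve, and <code>add_build</code> is a hypothetical helper.

```python
from pathlib import Path


def add_build(builds_file, dbkey, description):
    """Append 'dbkey<TAB>description' to builds.txt unless dbkey is present.

    Returns True if a line was added, False if the dbkey already existed.
    """
    path = Path(builds_file)
    text = path.read_text() if path.exists() else ""
    existing = {
        line.split("\t", 1)[0]
        for line in text.splitlines()
        if line and not line.startswith("#")
    }
    if dbkey in existing:
        return False
    with path.open("a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # keep the previous last line intact
        handle.write(f"{dbkey}\t{description}\n")
    return True
```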
 
= References =
* [http://bitbucket.org/galaxy/galaxy-central/wiki/Home Galaxy Wiki]
* [http://bitbucket.org/galaxy/galaxy-central/wiki/ImplementationInfo Galaxy Architecture]
* [http://bitbucket.org/galaxy/galaxy-central/wiki/GetGalaxy Get Galaxy]
* [http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ProductionServer Galaxy Advanced Config]

Latest revision as of 14:48, 13 December 2017

Overview

The UAB Galaxy platform for experimental biology and comparative genomics is designed to help you analyze multiple alignments, compare genomic annotations, profile metagenomic samples and more from your web browser. This platform is built on Galaxy, backed by the Cheaha compute cluster, and powered by UABgrid.

The primary uses of UAB Galaxy are to provide a simple web interface for NGS (short read sequencing) analysis for genomic and transcriptomic datasets, using tools like BWA, Bowtie, Tophat and Cufflinks, as well as simple sequence manipulation via the EMBOSS toolkit.

Using Galaxy / Tutorials

There are numerous general tutorials online at the Penn State public Galaxy site that are worth looking at.

There are also several UAB tutorials on NGS Analysis with Galaxy, created for HPC Boot Camp 2011 and a nice talk by Jeremy Goecks during Research Computing Day 2011.

Support

UAB galaxy-users list-serv (subscribe and search): https://listserv.uab.edu/scgi-bin/wa?SUBED1=GALAXY-HELP&A=1

UAB galaxy-help list-serv: email galaxy-help@listserv.uab.edu to contact the admins of the UAB Galaxy instance.

Privacy

Note that your data will be stored on the cluster filesystem, and while not accessible to ordinary users, it can be easily accessed by any of the galaxy or cluster administrators. It is not encrypted. Do not store sensitive information in this system.

Galaxy@UAB

The UAB Galaxy instance can be accessed at https://galaxy.uabgrid.uab.edu using BlazerID credentials. No account on the cluster is needed. However, the tools installed for galaxy (BWA, etc) can be accessed via the command line if you have an account on the cluster.

Loading Data

See Galaxy_File_Uploads.

Available Tools

Following is a partial list highlighting some of the important tools available. Additional tools can be installed upon request. To search for tools already integrated into the Galaxy system, see the Galaxy ToolShed.


{| border="1"
! Software !! Version !! Information
|-
! bwa
| 0.5.9-r26 || Align genomic short reads to a reference genome
|-
! bowtie
| 0.12.7 || Align genomic short reads to a reference genome
|-
! tophat
| 1.4.0 || Align transcriptome short reads to a reference genome
|-
! cufflinks, cuffdiff, cuffcompare
| 1.3.0 || Reconstruct and quantify transcript levels from tophat alignments
|-
! samtools
| 0.1.12a || Alignment (SAM/BAM file) manipulations
|-
! velvet
| 1.1.03 || De novo assembly
|-
! EMBOSS
| 6.3.1 || European Molecular Biology Open Software Suite - sequence manipulation and format conversion
|}

Installed Genome Indexes

You can always use your own genome by uploading the .fasta into your history, but alignments against installed (pre-indexed) genomes run much more quickly. If you need an additional genome installed, please contact galaxy-help@vo.uabgrid.uab.edu.

{| border="1"
! dbkey !! Genome !! Accessions (sequence=length in bp)
|-
| hg19 || Human Feb. 2009 (GRCh37/hg19) (hg19)
|-
| hg18 || Human Mar. 2006 (NCBI36/hg18) (hg18)
|-
| hg17 || Human May 2004 (NCBI35/hg17) (hg17)
|-
| hg16 || Human July 2003 (NCBI34/hg16) (hg16)
|-
| mm10 || Mouse Dec. 2011 (GRCm38/mm10) (mm10)
|-
| mm9 || Mouse July 2007 (NCBI37/mm9) (mm9)
|-
| mm8 || Mouse Feb. 2006 (NCBI36/mm8) (mm8)
|-
| mm7 || Mouse Aug. 2005 (NCBI35/mm7) (mm7)
|-
| mm6
|-
| mm5
|-
| sacCer3 || S. cerevisiae Apr. 2011 (SacCer_Apr2011/sacCer3) (sacCer3)
|-
| sacCer2 || S. cerevisiae June 2008 (SGD/sacCer2) (sacCer2)
|-
| ce10 || C. elegans Oct. 2010 (WS220/ce10) (ce10)
|-
| rn5 || Rat Mar. 2012 (RGSC 5.0/rn5) (rn5)
|-
| rn4 || Rat Nov. 2004 (Baylor 3.4/rn4) (rn4)
|-
| danRer7 || Zebrafish Jul. 2010 (Zv9/danRer7) (danRer7)
|-
| eschColi_APEC_O1 || Escherichia coli APEC O1 || chr=5082025
|-
| eschColi_CFT073 || Escherichia coli CFT073 || chr=5231428
|-
| eschColi_EC4115 || Escherichia coli EC4115 || chr=5572075,plasmid_pO157=94644,plasmid_pEC4115=37452
|-
| eschColi_K12 || Escherichia coli K12 || chr=4639675
|-
| eschColi_EDL993 || Escherichia coli O157:H7 EDL933 || NC_007414=92077,NC_002655=5528445
|-
| eschColi_O157H7 || Escherichia coli O157:H7 EDL933 || NC_007414=92077,NC_002655=5528445
|-
| eschColi_TW14359 || Escherichia coli TW14359 || chr=5528136,plasmid_pO157=94601
|}
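For the E. coli entries, the last column lists each sequence in the build with its length in base pairs (for example, the 4,639,675 bp K12 chromosome). A small parser makes the total genome size easy to compute; <code>parse_accessions</code> is a hypothetical helper written for this page.

```python
def parse_accessions(field):
    """Turn 'chr=5572075,plasmid_pO157=94644' into
    {'chr': 5572075, 'plasmid_pO157': 94644}.

    Each comma-separated item is assumed to be sequence_name=length_in_bp.
    """
    lengths = {}
    for item in field.split(","):
        name, _, length = item.strip().partition("=")
        lengths[name] = int(length)
    return lengths
```

Summing the values for eschColi_EC4115 gives its total size of 5,704,171 bp across the chromosome and both plasmids.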

Additional Genomes that can be quickly installed

These are pre-indexed genomes we can easily download from Penn State's Galaxy Data-Cache

Organisms

  • AaegL1
  • Acropora_digitifera
  • AgamP3
  • Arabidopsis_thaliana_TAIR10
  • Arabidopsis_thaliana_TAIR9
  • Araly1
  • Bombyx_mori_p50T_2.0
  • CpipJ1
  • Homo_sapiens_AK1
  • Homo_sapiens_nuHg19_mtrCRS
  • Hydra_JCVI
  • IscaW1
  • PhumU1
  • Physcomitrella_patens_patens
  • Ptrichocarpa_156
  • Saccharomyces_cerevisiae_S288C_SGD2010
  • Schizosaccharomyces_pombe_1.1
  • Spur_v2.6
  • Sscrofa9.58
  • Tcacao_1.0
  • Tcas_3.0
  • Theobroma_cocoa
  • Zea_mays_B73_RefGen_v2
  • ailMel1
  • anoCar1
  • anoCar2
  • anoGam1
  • apiMel1
  • apiMel2
  • apiMel3
  • apiMel4.5
  • aplCal1
  • bighorn_sheep
  • borEut13
  • bosTau2
  • bosTau3
  • bosTau4
  • bosTau5
  • bosTau6
  • bosTau7
  • bosTauMd3
  • braFlo1
  • caeJap1
  • caePb1
  • caePb2
  • caeRem2
  • caeRem3
  • calJac1
  • calJac3
  • canFam1
  • canFam2
  • canFam3
  • cavPor2
  • cavPor3
  • cb3
  • ce10
  • ce2
  • ce3
  • ce4
  • ce5
  • ce6
  • ce7
  • ce8
  • ce9
  • choHof1
  • chrPic1
  • ci2
  • danRer2
  • danRer3
  • danRer4
  • danRer5
  • danRer6
  • danRer7
  • dasNov1
  • dasNov2
  • dipOrd1
  • dm1
  • dm2
  • dm3
  • dp3
  • dp4
  • droAna1
  • droAna2
  • droAna3
  • droEre1
  • droEre2
  • droGri1
  • droGri2
  • droMoj1
  • droMoj2
  • droMoj3
  • droPer1
  • droSec1
  • droSim1
  • droVir1
  • droVir2
  • droVir3
  • droWil1
  • droYak1
  • droYak2
  • echTel1
  • emf
  • equCab1
  • equCab2
  • equCab2_chrM
  • eriEur1
  • felCat3
  • felCat4
  • fr1
  • fr2
  • fr3
  • galGal2
  • galGal3
  • galGal4
  • gasAcu1
  • geoFor1
  • gorGor1
  • gorGor3
  • hetGla1
  • hetGla2
  • hg16
  • hg17
  • hg18
  • hg19
  • hg_g1k_v37
  • lMaj5
  • loxAfr3
  • loxAfr4
  • macEug1
  • melGal1
  • melUnd1
  • micMur1
  • mm10
  • mm5
  • mm6
  • mm7
  • mm8
  • mm9
  • monDom4
  • monDom5
  • myoLuc1
  • myoLuc2
  • nomLeu1
  • nomLeu2
  • ochPri2
  • ornAna1
  • oryCun1
  • oryCun2
  • oryLat1
  • oryLat2
  • oryza_sativa_japonica_nipponbare_IRGSP4.0
  • otoGar1
  • oviAri1
  • pUC18
  • panTro1
  • panTro2
  • panTro3
  • papHam1
  • petMar1
  • phiX
  • ponAbe2
  • priPac1
  • rheMac2
  • rheMac3
  • rn3
  • rn4
  • rn5
  • sacCer1
  • sacCer2
  • sacCer3
  • sarHar1
  • sorAra1
  • strPur2
  • strPur3
  • susScr2
  • susScr3
  • taeGut1
  • tarSyr1
  • tetNig1
  • tetNig2
  • triCas2
  • tupBel1
  • venter1
  • xenTro1
  • xenTro2
  • xenTro3

Microbes

  • Staphylococcus_aureus_aureus_USA300_FPR3757
  • Xanthomonas_oryzae_PXO99A
  • acidBact_ELLIN345
  • acidCell_11B
  • acidCryp_JF_5
  • acidJS42
  • acinSp_ADP1
  • actiPleu_L20
  • aerPer1
  • aeroHydr_ATCC7966
  • alcaBork_SK2
  • alkaEhrl_MLHE_1
  • anabVari_ATCC29413
  • anaeDeha_2CP_C
  • anapMarg_ST_MARIES
  • aquiAeol
  • archFulg1
  • arthFB24
  • azoaSp_EBN1
  • azorCaul2
  • baciAnth_AMES
  • baciHalo
  • baciSubt
  • bactThet_VPI_5482
  • bartHens_HOUSTON_1
  • baumCica_HOMALODISCA
  • bdelBact
  • bifiLong
  • blocFlor
  • bordBron
  • borrBurg
  • bradJapo
  • brucMeli
  • buchSp
  • burk383
  • burkCeno_AU_1054
  • burkCeno_HI2424
  • burkCepa_AMMD
  • burkMall_ATCC23344
  • burkPseu_1106A
  • burkThai_E264
  • burkViet_G4
  • burkXeno_LB400
  • caldMaqu1
  • caldSacc_DSM8903
  • campFetu_82_40
  • campJeju
  • campJeju_81_176
  • campJeju_RM1221
  • candCars_RUDDII
  • candPela_UBIQUE_HTCC1
  • carbHydr_Z_2901
  • caulCres
  • chlaPneu_CWL029
  • chlaTrac
  • chloChlo_CAD3
  • chloTepi_TLS
  • chroSale_DSM3043
  • chroViol
  • clavMich_NCPPB_382
  • colwPsyc_34H
  • coryEffi_YS_314
  • coxiBurn
  • cytoHutc_ATCC33406
  • dechArom_RCB
  • dehaEthe_195
  • deinGeot_DSM11300
  • deinRadi
  • desuHafn_Y51
  • desuPsyc_LSV54
  • desuRedu_MI_1
  • desuVulg_HILDENBOROUG
  • dichNodo_VCS1703A
  • ehrlRumi_WELGEVONDEN
  • ente638
  • enteFaec_V583
  • erwiCaro_ATROSEPTICA
  • erytLito_HTCC2594
  • eschColi_APEC_O1
  • eschColi_CFT073
  • eschColi_EC4115
  • eschColi_EDL933
  • eschColi_K12
  • eschColi_MG1655
  • eschColi_O157H7
  • eschColi_TW14359
  • flavJohn_UW101
  • franCcI3
  • franTula_TULARENSIS
  • fusoNucl
  • geobKaus_HTA426
  • geobMeta_GS15
  • geobSulf
  • geobTher_NG80_2
  • geobUran_RF4
  • gloeViol
  • glucOxyd_621H
  • gramFors_KT0803
  • granBeth_CGDNIH1
  • haemInfl_KW20
  • haemSomn_129PT
  • haheChej_KCTC_2396
  • halMar1
  • haloHalo1
  • haloHalo_SL1
  • haloWals1
  • heliAcin_SHEEBA
  • heliHepa
  • heliPylo_26695
  • heliPylo_HPAG1
  • heliPylo_J99
  • hermArse
  • hypeButy1
  • hyphNept_ATCC15444
  • idioLoih_L2TR
  • jannCCS1
  • lactLact
  • lactPlan
  • lactSali_UCC118
  • lawsIntr_PHE_MN1_00
  • legiPneu_PHILADELPHIA
  • leifXyli_XYLI_CTCB0
  • leptInte
  • leucMese_ATCC8293
  • listInno
  • magnMC1
  • magnMagn_AMB_1
  • mannSucc_MBEL55E
  • mariAqua_VT8
  • mariMari_MCS10
  • mculMari1
  • mesoFlor_L1
  • mesoLoti
  • metAce1
  • metMar1
  • metaSedu
  • methAeol1
  • methBark1
  • methBoon1
  • methBurt2
  • methCaps_BATH
  • methFlag_KT
  • methHung1
  • methJann1
  • methKand1
  • methLabrZ_1
  • methMari_C5_1
  • methMari_C7
  • methMaze1
  • methPetr_PM1
  • methSmit1
  • methStad1
  • methTher1
  • methTherPT1
  • methVann1
  • moorTher_ATCC39073
  • mycoGeni
  • mycoTube_H37RV
  • myxoXant_DK_1622
  • nanEqu1
  • natrPhar1
  • neisGono_FA1090_1
  • neisMeni_FAM18_1
  • neisMeni_MC58_1
  • neisMeni_Z2491_1
  • neorSenn_MIYAYAMA
  • nitrEuro
  • nitrMult_ATCC25196
  • nitrOcea_ATCC19707
  • nitrWino_NB_255
  • nocaFarc_IFM10152
  • nocaJS61
  • nostSp
  • novoArom_DSM12444
  • oceaIhey
  • oenoOeni_PSU_1
  • onioYell_PHYTOPLASMA
  • orieTsut_BORYONG
  • paraDeni_PD1222
  • paraSp_UWE25
  • pastMult
  • pediPent_ATCC25745
  • peloCarb
  • peloLute_DSM273
  • peloTher_SI
  • photLumi
  • photProf_SS9
  • picrTorr1
  • pireSp
  • polaJS66
  • polyQLWP
  • porpGing_W83
  • procMari_CCMP1375
  • propAcne_KPA171202
  • pseuAeru
  • pseuHalo_TAC125
  • psycArct_273_4
  • psycIngr_37
  • pyrAby1
  • pyrAer1
  • pyrFur2
  • pyrHor1
  • pyroArse1
  • pyroCali1
  • pyroIsla1
  • ralsEutr_JMP134
  • ralsSola
  • rhizEtli_CFN_42
  • rhodPalu_CGA009
  • rhodRHA1
  • rhodRubr_ATCC11170
  • rhodSpha_2_4_1
  • rickBell_RML369_C
  • roseDeni_OCH_114
  • rubrXyla_DSM9941
  • saccDegr_2_40
  • saccEryt_NRRL_2338
  • saliRube_DSM13855
  • saliTrop_CNB_440
  • salmEnte_PARATYPI_ATC
  • salmTyph
  • salmTyph_TY2
  • shewANA3
  • shewAmaz
  • shewBalt
  • shewDeni
  • shewFrig
  • shewLoihPV4
  • shewMR4
  • shewMR7
  • shewOnei
  • shewPutrCN32
  • shewW318
  • shigFlex_2A
  • siliPome_DSS_3
  • sinoMeli
  • sodaGlos_MORSITANS
  • soliUsit_ELLIN6076
  • sphiAlas_RB2256
  • stapAure_MU50
  • stapMari1
  • streCoel
  • strePyog_M1_GAS
  • sulSol1
  • sulfAcid1
  • sulfToko1
  • symbTher_IAM14863
  • synePCC6
  • syneSp_WH8102
  • syntAcid_SB
  • syntFuma_MPOB
  • syntWolf_GOETTINGEN
  • therAcid1
  • therElon
  • therFusc_YX
  • therKoda1
  • therMari
  • therPend1
  • therPetr_RKU_1
  • therTeng
  • therTher_HB27
  • therTher_HB8
  • therVolc1
  • thioCrun_XCL_2
  • thioDeni_ATCC25259
  • thioDeni_ATCC33889
  • trepPall
  • tricEryt_IMS101
  • tropWhip_TW08_27
  • uncuMeth_RCI
  • ureaUrea
  • vermEise_EF01_2
  • vibrChol1
  • vibrChol_O395_1
  • vibrFisc_ES114_1
  • vibrPara1
  • vibrVuln_CMCP6_1
  • vibrVuln_YJ016_1
  • wiggBrev
  • wolbEndo_OF_DROSOPHIL
  • woliSucc
  • xantCamp
  • xyleFast
  • yersPest_CO92
  • zymoMobi_ZM4
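The organisms and microbes listed above can be pulled from the data cache with rsync. The sketch below only assembles the command and, if asked, runs it. It assumes the cache's rsync endpoint is `rsync://datacache.g2.bx.psu.edu/indexes/`, with one directory per build name, as Penn State's Galaxy documentation described at the time. Check the current host and layout before relying on it. `rsync_command`, `fetch`, and the destination path are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

# ASSUMPTION: rsync endpoint of Penn State's Galaxy Data Cache; verify before use.
DATACACHE = "rsync://datacache.g2.bx.psu.edu/indexes"


def rsync_command(build: str, dest: str, subdir: str = "") -> List[str]:
    """Build the rsync argv for one build (e.g. 'sacCer3' or 'phiX')."""
    # Reject path tricks so only a plain directory name is requested.
    if not build or "/" in build or build.startswith("."):
        raise ValueError(f"not a plain build name: {build!r}")
    source = f"{DATACACHE}/{build}"
    if subdir:
        # Optional sub-directory, e.g. a single index type (hypothetical layout).
        source += "/" + subdir.strip("/")
    # No trailing slash on the source, so rsync creates <dest>/<build>
    # instead of copying the build's contents directly into <dest>.
    return ["rsync", "-avzP", source, dest]


def fetch(build: str, dest: str, subdir: str = "", dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or create dest and run rsync."""
    cmd = rsync_command(build, dest, subdir)
    if dry_run:
        print(shlex.join(cmd))
    else:
        Path(dest).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    fetch("sacCer3", "/data/galaxy/indexes")
```

With `dry_run=True` (the default) it only prints the command. For example, `fetch("sacCer3", "/data/galaxy/indexes")` prints `rsync -avzP rsync://datacache.g2.bx.psu.edu/indexes/sacCer3 /data/galaxy/indexes`. Downloaded indexes still have to be registered in the matching Galaxy `.loc` files before tools can use them.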