Galaxy: Difference between revisions
Revision as of 19:45, 1 March 2011
Overview
Galaxy is an easy-to-use, open-source, scalable framework for tool and data integration. Galaxy provides access to tools (mainly comparative genomics) through an interface (e.g., a web-based interface). The Galaxy framework is implemented in the Python programming language.
End Users
A public instance of Galaxy maintained by Penn State University is at http://usegalaxy.org/
Developers
To get started with installing your own Galaxy instance, the only required component is Python (2.4, 2.5, or 2.6). Four simple steps will get you started with your own Galaxy instance:
- clone the mercurial galaxy distribution
hg clone http://www.bx.psu.edu/hg/galaxy galaxy_dist
or get a source tarball
- execute setup.sh
- execute run.sh
- go to http://localhost:8080
- Galaxy runs on a local web server, PasteScript, written in Python. PasteScript is based on Python's library module SimpleHTTPServer and implemented with the help of the Python package WSGIUtils
- For deployment to production environments, Galaxy documentation suggests using a proxy server like Apache/Nginx to serve up static content and for handling authnz. As mentioned in Galaxy's ApacheProxy documentation, here's how to edit Apache's httpd.conf to proxy Galaxy on port 80
<Proxy http://localhost:8080>
    Order deny,allow
    Allow from all
</Proxy>
RewriteEngine on
RewriteRule ^/galaxy$ /galaxy/ [R]
RewriteRule ^/galaxy/static/style/(.*) /Users/pnm/project/galaxy_dist/static/june_2007_style/blue/$1 [L]
RewriteRule ^/galaxy/static/scripts/(.*) /Users/pnm/project/galaxy_dist/static/scripts/packed/$1 [L]
RewriteRule ^/galaxy/static/(.*) /Users/pnm/project/galaxy_dist/static/$1 [L]
RewriteRule ^/galaxy/favicon.ico /Users/pnm/project/galaxy_dist/static/favicon.ico [L]
RewriteRule ^/galaxy/robots.txt /Users/pnm/project/galaxy_dist/static/robots.txt [L]
RewriteRule ^/galaxy(.*) http://localhost:8080$1 [P]
- Change the path to where you have cloned/installed Galaxy
- http://localhost/galaxy should bring up Galaxy on port 80 now
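The path change in the rewrite rules can be scripted instead of edited by hand. This is a minimal sketch: the function name `set_galaxy_path` and the file names in the usage line are made up for illustration, and it assumes the new path contains no `|` or `&` characters, which sed would treat specially.

```shell
# Hypothetical helper: print an Apache config with the example Galaxy path
# (/Users/pnm/project/galaxy_dist) replaced by the real install location.
#   $1 = config file containing the rewrite rules
#   $2 = absolute path of your own galaxy_dist directory
set_galaxy_path() {
    # '|' is used as the sed delimiter because the paths contain '/'.
    sed "s|/Users/pnm/project/galaxy_dist|$2|g" "$1"
}

# Usage (file names are examples only):
#   set_galaxy_path galaxy-proxy.conf /home/galaxy/galaxy_dist > galaxy-proxy.conf.new
```

The function writes to standard output rather than editing in place, so the result can be reviewed before it replaces the live configuration.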
Note: if you are using CentOS/RedHat as your platform, you may need to adjust or disable SELinux in order to allow Apache to access parts of the file system that are not part of the shipped default Directory directives of the Apache config. You will see unexplained "permission denied" errors in the errors.log if you are bitten by this.
Apache and Postgres Setup
In production mode it is recommended that something other than SQLite and the Python web server be used, preferably Postgres and Apache. When setting up Apache to proxy requests to the Python web server on CentOS, it is critical that the default CentOS security policy be overridden to allow proxying, as shown below.
setsebool -P httpd_can_network_relay=1
Additionally, redirects to the file system are blocked by the SELinux security policy. The current workaround for that is to run:
setenforce 0
A better solution will be to allow access to the required directories using chcon like this:
chcon -R -t httpd_user_content_t galaxy_dist/
This has been done and appears to work, allowing selinux to be turned back on.
Additionally, postgres ident privileges should be changed. One workaround is to set postgres to trust all local users as shown below in /var/lib/pgsql/data/pg_hba.conf
local all all trust
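Whether the trust entry is actually present (and not commented out) can be checked with a small grep. This is a sketch only; the function name `has_local_trust` is hypothetical. Note that Postgres has to reload its configuration before a change to pg_hba.conf takes effect.

```shell
# Hypothetical helper: succeed if the given pg_hba.conf contains an
# uncommented "local all all trust" entry. Fields may be separated by
# any amount of whitespace.
has_local_trust() {
    grep -Eq '^[[:space:]]*local[[:space:]]+all[[:space:]]+all[[:space:]]+trust[[:space:]]*$' "$1"
}

# Usage:
#   has_local_trust /var/lib/pgsql/data/pg_hba.conf && echo "local trust enabled"
```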
SMTP Configuration
In lib/galaxy/config.py:
self.smtp_server = kwargs.get( 'smtp_server', "vera.dpo.uab.edu" )
and in universe_wsgi.ini:
#smtp_server = vera.dpo.uab.edu
Shibboleth Installation
The instructions for the base rpm install on the SP (your galaxy box) are here:
https://spaces.internet2.edu/display/SHIB2/NativeSPLinuxInstall
A yum install pulls in both the 32 bit and 64 bit packages, which gives this error:
Cannot load /usr/lib/shibboleth/mod_shib_22.so into server: /usr/lib/shibboleth/mod_shib_22.so: wrong ELF class: ELFCLASS32
This can be fixed by editing "/etc/httpd/conf.d/shib.conf" and changing the LoadModule line from "LoadModule mod_shib /usr/lib/shibboleth/mod_shib_22.so" to "LoadModule mod_shib /usr/lib64/shibboleth/mod_shib_22.so". The Shibboleth SP will then start correctly.
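The same LoadModule change can be expressed as a sed substitution. This is a sketch under the assumption that the line looks exactly like the one quoted above; the function name `use_lib64_mod_shib` is made up for illustration.

```shell
# Hypothetical helper: print shib.conf with the mod_shib LoadModule line
# pointed at the 64 bit library directory. All other lines pass through
# unchanged, and a line that already uses /usr/lib64 is left alone.
use_lib64_mod_shib() {
    sed 's|^\([[:space:]]*LoadModule[[:space:]]\{1,\}mod_shib[[:space:]]\{1,\}\)/usr/lib/shibboleth/|\1/usr/lib64/shibboleth/|' "$1"
}

# Usage:
#   use_lib64_mod_shib /etc/httpd/conf.d/shib.conf > shib.conf.new
```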
Python
Galaxy makes some assumptions about python being > 2.4 as of February 2011. A completely new galaxy specific install of python 2.6.6 was done on the CentOS 5 galaxy VM. Some directions are found here: http://bda.ath.cx/blog/2009/04/08/installing-python-26-in-centos-5-or-rhel5/
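A quick way to confirm that an interpreter meets the "> 2.4" assumption is to compare its major.minor version in the shell. This is only a sketch; the function name `python_newer_than_2_4` is hypothetical.

```shell
# Hypothetical helper: succeed when a version string such as "2.6" or
# "2.6.6" is newer than the 2.4 series.
python_newer_than_2_4() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second component only
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -gt 4 ]; }
}

# Usage (not run here): read the version from the interpreter itself.
#   ver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
#   python_newer_than_2_4 "$ver" || echo "Python $ver is too old" >&2
```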
RPy, R, Numpy, Nose, Atlas, BLAS, LAPACK, gFortran or g77
RPy requires R. R must be entirely rebuilt with the shared library option turned on. Not a major problem. R can also be installed without shared libs using yum install R.
RPy strongly suggests that Numerical Python (Numpy) be installed, which depends on Atlas, BLAS and LAPACK and needs Nose for testing. These packages can be built with either the GNU or the g77 Fortran compiler, but they must all use the same one. I chose GNU and renamed g77 to diable_g77, as numpy searches for g77 first. Numpy had problems installing: I originally copied the directory manually into the Python lib directories, as the INSTALL instructions seem to suggest that no install step is required and that it will be handled by setup. This is not the case; "python setup.py install" must be run. It is worth checking that numpy can be imported.
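The import check can be done from the shell without starting an interactive session. This is a minimal sketch: the `PYTHON` variable and the function name `can_import` are assumptions for illustration, and `PYTHON` should point at the Galaxy-specific interpreter if that is not the first one on the PATH.

```shell
# Interpreter to test. PYTHON is an assumed variable; if it is unset,
# fall back to whichever python is found on the PATH.
PYTHON="${PYTHON:-$(command -v python || command -v python3)}"

# Hypothetical helper: succeed if the named module can be imported.
can_import() {
    "$PYTHON" -c "import $1" >/dev/null 2>&1
}

# Usage:
#   can_import numpy || echo "numpy is not importable by $PYTHON" >&2
```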
On CentOS, yum install can install atlas, blas and lapack, but rpy cannot be built against them due to Fortran incompatibilities.
The atlas and atlas-dev packages install easily with yum install, but they won't be compatible with numpy.
Additionally, steps on installing mercurial can be found here: http://jake.murzy.com/post/2992010793/installing-and-setting-up-mercurial-rhel-centos-apache
The current issue is that rpy is old and didn't recognize the version of R (since it used 2 digits). I fixed the rpy code to handle this, but rpy still looks for non-existent files in numpy. These files probably existed in an earlier version of numpy.
Another workaround from: http://www.mail-archive.com/rpy-list@lists.sourceforge.net/msg01573.html
Just substitute line 77 of src/RPy.h
#include <Rdevice.h> /* must follow Graphics.h */
with
#include <Rembedded.h> /* must follow Graphics.h */
it builds and seems to work
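The header substitution above can also be applied with sed, which avoids relying on the line number. This is a sketch only; the function name `patch_rpy_header` is hypothetical, and it assumes the include appears in RPy.h in the form shown.

```shell
# Hypothetical helper: print RPy.h with the Rdevice.h include replaced by
# Rembedded.h. The trailing comment and all other lines are left untouched.
patch_rpy_header() {
    sed 's|#[[:space:]]*include[[:space:]]*<Rdevice\.h>|#include <Rembedded.h>|' "$1"
}

# Usage:
#   patch_rpy_header src/RPy.h > RPy.h.new && mv RPy.h.new src/RPy.h
```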
Also worked around the failure to find arrayobject.h by manually copying the numpy include directory into: /usr/local/galaxy-python/lib/python2.6/numpy/core/numpy/include
The proper solution is to update galaxy so that it can use rpy2...
NextGen Sequencing Setup
Instructions here: https://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup Requires fastx and RPy to work.
NextGen Mapping Setup (bowtie, bwa, lastz, megablast, srma)
Placing fastx binary files into /usr/local/bin as suggested, rather than trying to do something more advanced at this point. This does overlap with the EMBOSS binaries, probably not a good idea in the future.
Placed links to bowtie, bwa and lastz in /usr/local/bin as well.
Downloaded and copied srma-0.1.14.jar to ~/galaxy_dist/tool-data/shared/jars/
megablast from NCBI was downloaded and run as an RPM, no problems. Stuck everything in /usr/bin
Post Processing
SAMtools is required, though it can't convert megablast output. There was a problem building it:
gcc -g -Wall -O2 -o samtools bam_tview.o bam_maqcns.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o libbam.a -lm -lcurses -lz -Lbcftools -lbcf
libbam.a(bam_import.o): In function `__bam_get_lines':
/usr/local/src/samtools-0.1.12a/bam_import.c:76: undefined reference to `gzopen64'
libbam.a(bam_import.o): In function `sam_open':
/usr/local/src/samtools-0.1.12a/bam_import.c:442: undefined reference to `gzopen64'
libbam.a(bam_import.o): In function `sam_header_read2':
/usr/local/src/samtools-0.1.12a/bam_import.c:126: undefined reference to `gzopen64'
collect2: ld returned 1 exit status
make[1]: *** [samtools] Error 1
make[1]: Leaving directory `/usr/local/src/samtools-0.1.12a'
make: *** [all-recur] Error 1
Looks like both 32 bit and 64 bit zlib were on the system, and the build was preferentially using the 32 bit one. The 32 bit version has 64 packages dependent on it, so I removed it without touching those dependencies:
rpm -e --nodeps zlib-1.2.3-3.i386
rpm -e --nodeps zlib-devel-1.2.3-3.i386
Unfortunately this failed to fix the build, as the problem may be due to the lack of a gzopen64 function in the CentOS version of zlib. Apparently the authors of zlib are developing on ubuntu now, another reason to switch. For more information see the post here:
http://forums.fedoraforum.org/showthread.php?t=228945
However, I did manage to find a version of libz in /usr/local/lib. Running:
objdump -T /usr/local/lib/libz.so | grep gzopen
indicated that it contained gzopen64, unlike the CentOS standard version in /usr/lib64.
Unfortunately, make continued to fail after I linked in the zlib library containing gzopen64, because it could not find libbc, as specified by the gcc parameter -lbc. However, this turned out to be a typo on my part; the actual fix was to adjust the CFLAGS setting in the Makefile to:
CFLAGS= -g -Wall -O2 -L/usr/local/lib #-m64 #-arch ppc
I then re-installed the 32 bit zlib version as it has a number of CentOS dependencies I don't want to break:
yum install zlib-1.2.3-3.i386
yum install zlib-devel-1.2.3-3.i386
Also, samtools needs the /usr/local/lib version of the zlib library at run time, so I set the galaxy LD_LIBRARY_PATH to check /usr/local/lib first in order to load the correct library.
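One way to put /usr/local/lib first without stacking it up on repeated runs is sketched below. The function name `prepend_lib_dir` is hypothetical, and where the export line belongs (run.sh or the galaxy user's profile) is an assumption.

```shell
# Hypothetical helper: print a library search path with directory $1 at the
# front. $2 is the existing path and may be empty. If $1 is already the
# first entry, the path is printed unchanged.
prepend_lib_dir() {
    case ":$2:" in
        ":$1:"*) printf '%s\n' "$2" ;;
        *)       printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Usage:
#   LD_LIBRARY_PATH=$(prepend_lib_dir /usr/local/lib "$LD_LIBRARY_PATH")
#   export LD_LIBRARY_PATH
```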
Temporary Directory and the $TEMP variable
It is advisable to set the $TEMP variable to a directory with lots of space, otherwise /tmp will be used which often isn't large enough to handle large files. See:
http://lists.bx.psu.edu/pipermail/galaxy-user/2009-September/000704.html
In this case I modified run.sh (which wasn't picking up my TEMP export) to include:
export TEMP=/home/galaxy/galaxy_temp_dir
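A slightly more defensive version of the TEMP setting creates the directory and reports the free space on its volume, so a too-small location is noticed before a large job fails. This is a sketch only; the function name `use_temp_dir` is made up for illustration.

```shell
# Hypothetical helper: create the directory if needed, export it as TEMP,
# and print the available space on the filesystem that holds it.
use_temp_dir() {
    mkdir -p "$1" || return 1
    TEMP=$1
    export TEMP
    # df -P gives one header line, then one line per filesystem;
    # column 4 is the available space in 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print $4 " KB free on " $6 }'
}

# Usage:
#   use_temp_dir /home/galaxy/galaxy_temp_dir
```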
GNUPlot
Gnuplot is required for some plotting, including viewing fastq quality statistics. I did:
yum install gnuplot
However this gives an error as shown below
line 0: undefined variable inside
and also described here: http://cell-innovation.nig.ac.jp/wiki/tiki-view_forum_thread.php?topics_offset=1&forumId=7&comments_parentId=103
But it still works! I tried upgrading to gnuplot 4.4.1 and 4.4.2 but there are other dependency issues on libg, newer versions of Cairo and pango that make it not worthwhile to try on this VM.
SNP Analysis
Apparently there is a program called SNPeff (http://snpeff.sourceforge.net/) which is integrated with galaxy but doesn't come set up with the base install of galaxy. It requires java, which was installed easily with:
yum install java
SNPeff should go in /usr/local/snpeff
Errors/Bugs
As per the galaxy-dev email I sent, the file lib/galaxy/model/migrate/versions/0068_rename_sequencer_to_external_services.py contains SQL errors: it tries to rename a SEQUENCE with incorrect syntax. ALTER TABLE should be used, not ALTER SEQUENCE. The file has this error in 2 locations. The problem remains as of February 7th, 2011.