@lab Wiki: Difference between revisions

From Cheaha
Latest revision as of 14:24, 27 October 2008

Supporting Local VO Resources

The @lab contributes many of its own resources to the VO. These pages will help document the configuration and management of those resources.

  • VO Web Servers

Drupal CMS

The Drupal CMS is a flexible web development platform that enables construction of all manner of web sites, in addition to its support for website-in-a-box configurations.

While Drupal is powerful, it has its limitations. One of its biggest drawbacks is backwards compatibility. It's worth reading the Drupal team's stance on backward compatibility (http://drupal.org/node/65922) and the Drupal Upgrade Guide (http://drupal.org/upgrade/tutorial-introduction) for a broader perspective on this.

The project takes the approach that they want to be free to adopt the latest practices without retaining the weight of legacy code interfaces. They reserve the right to change interfaces between releases (and most often do). Drupal won't break data between releases, though, and will provide an upgrade path for the core modules.

These restrictions mostly affect Drupal extensions. In other words, the world for extension developers is one of constant change, making the utility of casual extensions questionable. Themes fall under this category, somewhat, so using a theme abstraction layer (Xtemplate, PHPTemplate) is advisable.

The impact of this approach is limited on UABgrid because Drupal is simply a stand-alone application. Rather than use many of the extensions available for Drupal, our system framework is based on integration across applications. This makes the internal interfaces of Drupal less critical. We can build on the core functionality and focus our efforts, if necessary, on a limited set of extensions that are critical to our operations.

At some point, Drupal (and many other CMSes) will hopefully gain a stable interface, similar to the approach taken by operating systems.

Requirements

Drupal mainly requires PHP and a relational database. The specific requirements can be found on their site: http://drupal.org/requirements. Of interest is their transition from PHP4 to PHP5, as this impacts which versions run on specific Linux releases using vendor-supplied packages.

Upgrading Drupal

Important: before starting, please read the Drupal Upgrade Guide (http://drupal.org/upgrade/tutorial-introduction).

The @lab Drupal site has aged significantly. It has not been updated since 4.3.x. In order to upgrade to the latest release in the 4.x line (4.7.6 as of this writing), the update.php scripts of all intermediate versions need to be applied to the database. This is pretty easy to do. It just requires that you install the intermediate releases, configure them to point at the database, and step through the update.php scripts.

Links to older versions of Drupal are not too hard to find, at least back to 4.5. The basic URL structure for the release page is http://drupal.org/drupal-x.x.x and the download file is http://drupal.org/files/projects/drupal-x.x.x.tar.gz. Before the 4.5.0 release, the release pages don't have well-known page names, so Google seems to be the best option here: just search for drupal-x.x.x, where the x's are the version number. The file structure changes, but it's still guessable: just replace "tar.gz" with "tgz".
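
The URL pattern described above can be written down as a small helper. This is only a sketch in Python: the function name release_urls is made up for illustration, and the ".tgz" guess for releases before 4.5.0 changes nothing but the suffix, so it will be wrong if the directory on the server differs as well.

```python
def release_urls(version):
    """Return (release_page, download_url) for a Drupal 4.x version string.

    release_page is None for releases before 4.5.0, because those pages
    have no well-known name and have to be searched for.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) >= (4, 5):
        page = f"http://drupal.org/drupal-{version}"
        suffix = "tar.gz"
    else:
        page = None
        suffix = "tgz"  # assumption: only the file suffix changes
    download = f"http://drupal.org/files/projects/drupal-{version}.{suffix}"
    return page, download


for v in ("4.4.0", "4.5.0", "4.6.0"):
    print(v, release_urls(v))
```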

For our upgrade path, the relevant release pages and download links are:

Some additional important notes for upgrading can be found on the Drupal site: http://drupal.org/upgrade/

UPDATE - the @lab drupal install was already at 4.4.0 as evidenced by the CHANGES.txt file, so that instance isn't needed. Jpr@uab.edu 17:21, 19 July 2007 (CDT)

Configuring instances for upgrade

The 4.4 and 4.5 releases still use the includes/conf.php file. After that, Drupal switches to the sites/default/settings.php approach. They all use the $db_url and $base_url configuration values, though, so it should be easy to just plug in the original site values. The $base_url is optional in 4.7.6.
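
As a quick reference, the rule for where each release keeps its configuration can be sketched like this in Python. The helper name config_path is hypothetical, and the sketch only covers the 4.x releases discussed on this page.

```python
def config_path(version):
    """Return the config file, relative to the Drupal root, for a 4.x release."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) <= (4, 5):
        # 4.4 and 4.5 still keep their settings in includes/conf.php
        return "includes/conf.php"
    # later releases use the per-site settings file
    return "sites/default/settings.php"


for v in ("4.4.0", "4.5.0", "4.6.0", "4.7.6"):
    print(v, config_path(v))
```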

Perl script to change config across many versions for upgrading:


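 #!/usr/bin/perl -w
 # setsite.pl: rewrite the $db_url and $base_url values of a Drupal config
 # file read from STDIN and print the result to STDOUT.
 use strict;

 die "usage: $0 db_url base_url < config > config.new\n" unless @ARGV == 2;
 my ($sitedb, $siteurl) = @ARGV;

 while (<STDIN>) {
     # Replace everything between the first and the last quote on the line.
     s/(["']).*(["'])/$1$sitedb$2/  if /^\$db_url =/;
     s/(["']).*(["'])/$1$siteurl$2/ if /^\$base_url =/;
     print;
 }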

Name the above script "setsite.pl" and run it as follows:

 # cd to top-level of multi-drupal site dir
 for file in $(grep -rl '^\$db_url =' *)
 do
     # the first path component is the per-version directory
     webpath=$(echo "$file" | cut -d / -f 1)
     ./setsite.pl mysql://dbuser:dbpass@localhost/dbname \
         "http://host/$webpath" < "$file" > "$file.new"
     mv "$file.new" "$file"
 done
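
To see what the substitution in setsite.pl does to a single config line, here is an equivalent sketch in Python. It is an illustration only, not a replacement for the script; the function name set_site and the sample values are made up.

```python
import re

# Same pattern as in setsite.pl: from the first quote to the last quote on the line.
QUOTED = re.compile(r"""(["']).*(["'])""")


def set_site(line, sitedb, siteurl):
    """Return line with the $db_url or $base_url value replaced."""
    if line.startswith("$db_url ="):
        value = sitedb
    elif line.startswith("$base_url ="):
        value = siteurl
    else:
        return line  # every other line passes through unchanged
    # A function replacement keeps backslashes in the value literal.
    return QUOTED.sub(lambda m: m.group(1) + value + m.group(2), line, count=1)


print(set_site('$db_url = "mysql://old:old@localhost/old";',
               "mysql://dbuser:dbpass@localhost/dbname", "http://host/site"))
```

Lines that are commented out (for example starting with "#") do not start with $db_url and are therefore left alone, just as with the anchored match in the Perl script.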
 

Performing the Upgrade

The upgrade is performed on a version-by-version basis. After running the above config scripts, go to the version-based URL and invoke the 'update.php' script. You'll need to manually turn off the auth access check at the start of this file in order to allow unauthenticated access to the script. This is OK only if you're doing this in an isolated environment protected from outsiders.

4.4.0 to 4.5.0

In order to prepare for later upgrades (specifically the 4.6.0 one, which changes some core abstractions), you need to log into the existing 4.4.0 site, change to the default bluemarine theme, and disable non-core modules.

For the @lab site we don't have our bluemarine theme anymore. It's been replaced with a variant of Xtemplate. We can wait till after the 4.5.0 update to switch to bluemarine from the out-of-box 4.5.0 upgrade install.

The non-core modules need to be turned off now, though, because they won't be there in the out-of-box code used for the upgrades.

 update system set status = 0 where type = 'module' and name not in (
    'admin', 'aggregator', 'archive', 'block', 'blog', 'blogapi', 'book',
    'comment', 'drupal', 'filter', 'forum', 'help', 'legacy', 'locale',
    'menu', 'node', 'page', 'path', 'ping', 'poll', 'profile', 'queue',
    'search', 'statistics', 'story', 'system', 'taxonomy', 'throttle',
    'tracker', 'upload', 'user', 'watchdog' );

Make sure your $base_url is set correctly in the config file. This depends on how you're invoking the update.php script. Keep this in mind if you are ssh-tunneling to the dev server from a remote location.

The 4.4.0 update.php script runs smoothly without error.

4.5.0 to 4.6.0

This is a more involved update due to some changes in architecture. See the update instructions on the Drupal site (http://drupal.org/node/30699) for details. We basically follow the steps, except that the modules were turned off earlier and the bluemarine theme was enabled after the 4.5.0 upgrade.

Before running the update.php script there are a couple of data errors that need to be corrected. The 2004-11-07 update changes the sessions table to avoid duplicates (ALTER TABLE sessions ADD PRIMARY KEY sid (sid)). This will cause update.php to fail if duplicate session IDs exist. The easiest thing is just to delete the existing sessions.

 delete from sessions;

The two-column term_node table is changed in the 2005-03-21 update to use both columns in a key to avoid duplicates (ALTER TABLE term_node ADD PRIMARY KEY (tid,nid)). There may be many duplicates, and they have to be fixed manually. You can check whether you'll be affected by running this SQL statement in a mysql client:

 select nid, tid, count(1) as num
          from term_node
          group by nid,tid
          having num > 1
          order by num;

The following script should take care of it. Just call it something like fix_term_node.php and invoke it before update.php:

<?php
// Connecting, selecting database
$link = mysql_connect('dbhost', 'dbuser', 'dbpass')
   or die('Could not connect: ' . mysql_error());
echo 'Connected successfully<br />';
mysql_select_db('dbname') or die('Could not select database');

// Get the entries with duplicates; run this query manually to sanity check the count below
$query = "select nid, tid, count(1) as num
          from term_node
          group by nid,tid
          having num > 1
          order by num";
$result = mysql_query($query) or die('Query failed: ' . mysql_error());

if (mysql_num_rows($result) == 0) {
     echo "No rows found, nothing to do, so exiting";
     exit;
}

// The query returns each duplicated (nid, tid) pair once. Keep whole pairs
// rather than keying on nid alone, so a node with several duplicated terms
// keeps all of them.
$pairs = array();
while ($row = mysql_fetch_assoc($result)) {
     $pairs[] = array((int) $row["nid"], (int) $row["tid"]);
}
$count = count($pairs);
print "Target replace: $count<br />";
// Uncomment the exit() below for a dry run that only reports the count
//exit();

// Step through the unique pairs and clean up the db:
// delete all copies of a pair, then insert it back once
foreach ($pairs as $pair) {
     list($nid, $tid) = $pair;
     $delnode = "delete from term_node where tid = $tid and nid = $nid";
     $insnode = "insert into term_node (tid, nid) values ($tid, $nid)";
     #print "$delnode<br />";
     #print "$insnode<br />";
     mysql_query($delnode) or die("Query failed ($delnode): " . mysql_error());
     mysql_query($insnode) or die("Query failed ($insnode): " . mysql_error());
}

// Closing connection
mysql_close($link);
?>
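
The delete-and-reinsert idea can be tried out without touching the live database. The following is a minimal sketch in Python that uses an in-memory SQLite table as a stand-in for the MySQL term_node table. SQLite, the function name dedupe_term_node and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def dedupe_term_node(conn):
    """Collapse duplicated (nid, tid) rows to one row each; return pairs fixed."""
    dupes = conn.execute(
        "select nid, tid from term_node group by nid, tid having count(1) > 1"
    ).fetchall()
    for nid, tid in dupes:
        # remove every copy of the pair, then put a single copy back
        conn.execute("delete from term_node where nid = ? and tid = ?", (nid, tid))
        conn.execute("insert into term_node (nid, tid) values (?, ?)", (nid, tid))
    conn.commit()
    return len(dupes)


conn = sqlite3.connect(":memory:")
conn.execute("create table term_node (nid integer, tid integer)")
conn.executemany(
    "insert into term_node (nid, tid) values (?, ?)",
    [(1, 10), (1, 10), (1, 11), (1, 11), (1, 11), (2, 10)],
)
fixed = dedupe_term_node(conn)
rows = sorted(conn.execute("select nid, tid from term_node").fetchall())
print(fixed, rows)
```

Node 1 in the sample data has two different duplicated terms, which is the case where both pairs have to survive the clean-up.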

4.6.0 to 4.7.6

There are several significant changes between 4.6.0 and 4.7.x highlighted on the Drupal site (http://drupal.org/node/57649). The biggest concern is about the $base_url and the impact of re-rooting a site on relative URLs in the site data. We will leave that as it is for now and monitor the impact.

There were problems upgrading from 4.6.0 to 4.7.6. The upgrade broke at the 4.7.0 boundary. Upgrade to 4.6.6 first (the last 4.6.x release prior to 4.7.0), and then go to 4.7.0 directly. After that, upgrade to 4.7.6. See the comment posted to this issue discussion at drupal.org: http://drupal.org/node/109659#comment-256251
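
Putting the sections above together, the full sequence of hops used for the @lab site can be listed programmatically. This is a small sketch in Python; the names UPGRADE_PATH and remaining_steps are made up for illustration.

```python
# Versions stepped through for the @lab upgrade, in order.
UPGRADE_PATH = ["4.4.0", "4.5.0", "4.6.0", "4.6.6", "4.7.0", "4.7.6"]


def remaining_steps(current):
    """Return the (from, to) hops still needed, starting at version current."""
    i = UPGRADE_PATH.index(current)  # raises ValueError for an unknown version
    return list(zip(UPGRADE_PATH[i:], UPGRADE_PATH[i + 1:]))


for src, dst in remaining_steps("4.4.0"):
    print(f"run update.php of {dst} against the {src} database")
```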

Theme Upgrade

The atlabit xtemplate-based theme was upgraded from the 4.4.0 site instance to the 4.7.6 site instance. The upgrade requires installing the xtemplate theme engine (http://drupal.org/project/xtemplate) for 4.7.6, since it was dropped from core at 4.7.0.

The basic upgrade wasn't too difficult. After the xtemplate engine was installed in the themes/engines subfolder, the atlabit tree was copied in place. The theme was enabled through the admin UI and it came up OK except for the header. The logo was missing and so were the secondary and primary links. The logo was easily fixed by correcting the xtemplate.xtmpl file, which had the path coded in it. The primary and secondary links were lost from the variable table and had to be manually moved forward from the 4.4.0 db to the 4.7.6 db. This didn't cause them to appear, though.

Rather than debug the problem, we placed the link data directly in the xtemplate.xtmpl file for atlabit. The real solution is to migrate the theme to the phptemplate engine instead, so we didn't want to invest the time in heavy debugging.

Post Upgrade Notes

Having completed the @lab site upgrade to a reasonably functional state, the http://lab.ac.uab.edu site has been re-enabled. This was done by defining a name-based vhost for just the IP hosting the atlab. It's possible to mix and match name-based and IP-based vhosts (http://httpd.apache.org/docs/2.0/vhosts/name-based.html) as long as you're dealing with distinct IP addresses. The working dir for the upgrade was copied over to the document root for the atlab vhost. Only the xtemplate.css file had to be changed to record the path difference for the header background.

Open Issues

Open issues remain, but the site is functional.

  • Filters still need to be installed to support the phpwiki formatting on many posts.
  • Some paths for urls in the content sections were converted by the 4.7.0 upgrade to use the working dir of the upgrade. These need to be identified and fixed.
  • Subpages that were originally on the site may still need to be restored.

Problems

PHP maximum execute time exceeded

There have been some problems with the @lab site since the upgrade. On July 27, 2007 the kernel dumped on the server, a fully patched CentOS 4 line system. The postmortem seems to indicate a slow leak of httpd processes due to a PHP timeout limit being exceeded when querying the database:

PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /var/www/vhosts/atlab/includes/database.mysql.inc on line 105

This seems to leave an httpd process hanging, or hanging around, and then the httpd parent spawns a replacement to keep the pool at 10. Over time these processes fill up the system and an out-of-memory (resources) crash results. There was only the one crash, but this makes sense as the cause. It seems to be triggered by search engines crawling the site. At first I thought search.uab.edu was causing the problem, but it seems all search engines will trigger it eventually.

I've tried a few things to overcome this problem. There is a log_slow_queries option (http://dev.mysql.com/doc/refman/4.1/en/slow-query-log.html) for mysqld that will record queries lasting longer than 10 seconds, but no query was ever logged even though the above error reoccurred.

I found that the search index feature in Drupal was at 0%, so I ran cron.php until it was fully indexed at 100%, but the problem still remained. Stepping through cron.php revealed a data error in the variable table, which shows up as an error from PHP's unserialize() function (http://us2.php.net/unserialize):
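
To judge how often the timeout occurs, the error log can be scanned for the fatal error shown above. A minimal sketch in Python follows; the function name count_timeouts is made up, and the sample list simply repeats the log line quoted on this page next to an unrelated notice.

```python
import re
from collections import Counter

# Matches the PHP fatal error quoted above and captures limit, file and line.
TIMEOUT = re.compile(
    r"PHP Fatal error:\s+Maximum execution time of (\d+) seconds exceeded "
    r"in (\S+) on line (\d+)"
)


def count_timeouts(lines):
    """Count timeout errors per (file, line number) in an iterable of log lines."""
    hits = Counter()
    for line in lines:
        match = TIMEOUT.search(line)
        if match:
            hits[(match.group(2), int(match.group(3)))] += 1
    return hits


fatal = ("PHP Fatal error:  Maximum execution time of 30 seconds exceeded in "
         "/var/www/vhosts/atlab/includes/database.mysql.inc on line 105")
sample = [fatal, "PHP Notice:  unserialize(): Error at offset 2 of 725 bytes", fatal]
hits = count_timeouts(sample)
print(hits)
```

With a real log, pass the open file object instead of the sample list, for example count_timeouts(open(path)) where path is the location of the Apache error log on your system.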

 PHP Notice:  unserialize(): Error at offset 2 of 725 bytes in /var/www/vhosts/atlab/includes/bootstrap.inc on line 244, referer: http://www.google.com/
 PHP Notice:  unserialize(): Error at offset 2 of 172 bytes in /var/www/vhosts/atlab/includes/bootstrap.inc on line 244, referer: http://www.google.com/
 PHP Notice:  unserialize(): Error at offset 149 of 149 bytes in /var/www/vhosts/atlab/includes/bootstrap.inc on line 244, referer: http://www.google.com/

The unserialize error presents itself predictably at cron.php runs and has some precedent in a Drupal bug (http://drupal.org/node/49694) and in image module issues (http://drupal.org/node/157882).

The robots.txt file requests that /search not be followed, but I'm not sure if that's the path they're entering from. Unfortunately the error is not accompanied by any useful debugging info, except the generic line number of the database wrapper script, i.e. no query or URL is logged.
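
Whether a given URL is covered by the robots.txt rule can be checked with Python's standard urllib.robotparser module. This is only a sketch: the robots.txt content below is an assumed minimal rule, not a copy of the file on the site, and crawl_allowed is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: a single rule asking all crawlers to stay out of /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(url, agent="*"):
    """Return True if the parsed robots.txt lets agent fetch url."""
    return parser.can_fetch(agent, url)


print(crawl_allowed("http://lab.ac.uab.edu/search/node/drupal"))
print(crawl_allowed("http://lab.ac.uab.edu/node/1"))
```

Keep in mind that robots.txt is only a request. A crawler that ignores it, or that reaches the search pages through a path the rule does not match, will still trigger the queries.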

The search feature for anonymous users (crawlers) has been turned off in the Drupal config for now to see if that avoids the problem. If it does, that isolates the problem to the site search and might imply complex internal data gathering. This would be similar to a problem experienced in ConnoteaCode when search engines stepped through the /tag tree.

System Configuration Notes

Getting Firefox to use Java on openSUSE 10.2

Ensuring that Firefox can use Java as a plugin requires that you install Sun Java (1.5 preferred) via YaST. Be sure to include the -plugin RPM; that's the one that contains the critical part.

After the YaST install completes run the following set of commands:

cd /usr/lib/firefox/plugins
ln -s /usr/lib/jvm/java-1.5.0-sun-1.5.0_update8/jre/plugin/i386/ns7/libjavaplugin_oji.so .

References:

  • Manually installing the JRE (http://www.suseforums.net/index.php?showtopic=36347)

Emerging Docs

UABgridBootCamp

AgileProgramming/xp