Load Balancing Apache and Plone
This How-to applies to:
All
This How-to is intended for:
Advanced Server Administrator
There are many solutions to this issue; this might be one of the neater ones.
Purpose
With Plone, and indeed other web application engines, it is possible to run a number of front-end servers in order to spread the load and effectively increase a web application's overall capacity. This implicitly provides a degree of fail-over should you need to take down one of the front-end servers or if one should fail.
Q. Why is this difficult? After all, load balancing is a standard feature in Apache2.
A. True, but session multiplexing is not a standard feature, and raw load balancing will break a Plone setup as soon as you try to log in.
The issue most people have is lack of access to the underlying Plone instances on the individual front-end servers. On ISP-based setups the individual instances are owned and run by third parties, so changing login procedures in order to set session variables for Apache to use in load balancing is not really an option.
Prerequisites
You will need a number of Plone application servers (although this should work with *any* application server) and a recent copy of Apache 2.x. (I'm using 2.2.4; earlier versions should also work.) Your Apache server will need to be suitably configured to allow proxy pass-through and URL rewriting; we won't cover this here.
Note that this solution 'ties' a session to the first front-end server that services it and only reverts to a different server should the servicing server fail. In this instance (typically) the user would find themselves unexpectedly logged out, unless you are sharing the SESSION information via ZEO, something that's not always as reliable as one might like.
Step by step
First, as a little background: we use this setup to run over 100 shared Plone instances over six back-end Zope servers on one ZEO server, with many hundreds of thousands of visits per day (so we think it works).
We run Apache on a stand-alone Ubuntu Server (well, it's actually a Xen instance) running the "Ubuntu Gutsy" release.
Pretty much all of our configuration goes into /etc/apache2/httpd.conf as follows:
Defining the nodes
The beauty of this solution is that you don't need to touch the Zope servers or the login processes in order to make Apache handle proper session stickiness. Let's define two Zope servers as follows:
<VirtualHost *:80>
    ServerName node1
    RewriteEngine On
    # Back-end Zope server that this node fronts
    RewriteRule . - [E=MYHOST:zope1]
    RewriteRule . - [E=MYPORT:8080]
    # Sticky-session cookie: BALANCEID=balancer.<MYHOST>
    RewriteRule . - [CO=BALANCEID:balancer.%{ENV:MYHOST}:.%{HTTP:X-Forwarded-Server}:1200]
    # Proxy the request through to the Zope server
    RewriteRule /(.*)$ http://%{ENV:MYHOST}:%{ENV:MYPORT}/$1 [L,P]
</VirtualHost>
<VirtualHost *:80>
    ServerName node2
    RewriteEngine On
    RewriteRule . - [E=MYHOST:zope2]
    RewriteRule . - [E=MYPORT:8080]
    RewriteRule . - [CO=BALANCEID:balancer.%{ENV:MYHOST}:.%{HTTP:X-Forwarded-Server}:1200]
    RewriteRule /(.*)$ http://%{ENV:MYHOST}:%{ENV:MYPORT}/$1 [L,P]
</VirtualHost>
Each node forwards any request to its respective Zope server. Before forwarding the request, however, it sets a cookie that Apache can reference later to send subsequent requests from the same session to the same server.
Explanation
[E=MYHOST:zope1] Sets an environment variable called MYHOST to the value "zope1".
[E=MYPORT:8080] Sets MYPORT to "8080".
[CO=...] Sets a cookie called BALANCEID to "balancer.<MYHOST>" (for example "balancer.zope1") within the cookie domain of the requested server, i.e. whatever is in the X-Forwarded-Server HTTP header line, with a lifetime of 1200. Note that mod_rewrite interprets the cookie lifetime in minutes, so 1200 means 20 hours rather than 20 minutes.
All in all it's fairly obvious once you get your head around how cookies are handled (i.e. per domain), how Apache sets up a cookie and which parameters are needed. Strictly speaking we don't need the variables MYHOST and MYPORT, but in real life we relegate the last two lines to an "include" file and use it to shorten the spec for all six Zope instances. It's also handy documentation.
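The include-file arrangement mentioned above might look like the following sketch. The file name /etc/apache2/zope-node.inc is an assumption chosen for illustration; the rules themselves are the ones from the node definitions.

```apache
# --- /etc/apache2/zope-node.inc (hypothetical file name) ---
# Shared by every node: set the sticky cookie, then proxy to the Zope server.
RewriteRule . - [CO=BALANCEID:balancer.%{ENV:MYHOST}:.%{HTTP:X-Forwarded-Server}:1200]
RewriteRule /(.*)$ http://%{ENV:MYHOST}:%{ENV:MYPORT}/$1 [L,P]

# --- httpd.conf: each node now only names its own Zope host and port ---
<VirtualHost *:80>
    ServerName node1
    RewriteEngine On
    RewriteRule . - [E=MYHOST:zope1]
    RewriteRule . - [E=MYPORT:8080]
    Include /etc/apache2/zope-node.inc
</VirtualHost>
```

With this layout, adding another Zope server means copying the short virtual host and changing only ServerName and MYHOST.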
Defining a Balancer
Apache's design for balancers has a lot going for it, but ease of definition probably isn't at the top of the list. Here's an example balancer that references the two nodes above.
<Proxy balancer://bronze>
    # Primary members; each route matches the suffix of the BALANCEID cookie
    BalancerMember http://node1 lbset=1 min=1 max=6 smax=16 loadfactor=1 timeout=10 retry=60 route=zope1
    BalancerMember http://node2 lbset=1 min=1 max=6 smax=16 loadfactor=1 timeout=10 retry=60 route=zope2
    # Hot standby (status=+H), only used once the lbset=1 members have failed
    BalancerMember http://backup lbset=2 min=1 max=4 smax=8 loadfactor=1 timeout=10 retry=60 status=+H
    ProxySet lbmethod=byrequests stickysession=BALANCEID nofailover=Off maxattempts=2 timeout=20
    ErrorDocument 503 /ERROR_503.html
    Allow from all
</Proxy>
Note that the balancer references a "backup" node that we have not defined here; it would be defined in exactly the same way as node1 and node2 above. The differences are in its BalancerMember line: it has lbset=2, which means it is only tried after all the members with lbset=1 have failed, and it has no route, so the balancer won't stick a session to the backup node if both zope1 and zope2 fail (i.e. traffic moves back onto zope1/2 as soon as they become available again).
Note also that the cookie is set to "balancer.zope<n>", whereas the route on the matching BalancerMember is simply "zope<n>": the balancer uses the part of the cookie value after the dot as the route.
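For completeness, a backup node following the same pattern could be sketched as below. The back-end host name zope3 is hypothetical; substitute whichever Zope server acts as your standby. As with node1 and node2, the name "backup" is assumed to resolve to the Apache server itself.

```apache
<VirtualHost *:80>
    ServerName backup
    RewriteEngine On
    # zope3 is a hypothetical standby Zope server
    RewriteRule . - [E=MYHOST:zope3]
    RewriteRule . - [E=MYPORT:8080]
    RewriteRule . - [CO=BALANCEID:balancer.%{ENV:MYHOST}:.%{HTTP:X-Forwarded-Server}:1200]
    RewriteRule /(.*)$ http://%{ENV:MYHOST}:%{ENV:MYPORT}/$1 [L,P]
</VirtualHost>
```

Because no BalancerMember carries route=zope3, a cookie set by this node matches nothing in the balancer, so the next request is balanced normally.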
Defining a balanced Virtual server
Ok, so now we have a balancer with a couple of sticky Zope servers, what do we do with them?
Here's a sample virtual host configuration that will typically live in /etc/apache2/sites-enabled (under Ubuntu).
<VirtualHost *:80>
    ServerName linux.co.uk
    DocumentRoot /tmp
    # Rewrite settings are not inherited by virtual hosts, so enable the engine here
    RewriteEngine On
    RewriteRule ^/(.*) balancer://bronze/VirtualHostBase/http/%{HTTP_HOST}:80/plone/linux/VirtualHostRoot/$1 [L,P]
</VirtualHost>
Note that with Plone we make use of the built-in "VirtualHostMonster" to handle URL remapping within Zope itself. Effectively, each request makes three passes through Apache.
The initial pass enters the virtual host definition and is proxied to the balancer.
The second pass enters the balancer, where one of two things happens: either it picks a route according to the balancer's lbmethod if there is no session cookie for that particular site, or it uses the pre-existing cookie to determine which Zope server to forward the request to.
The third pass activates either zope1 or zope2 and actually forwards the request to the appropriate Zope server, so Apache does all the session tracking work for you without any modifications to Zope, Plone or anything else.
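Since the balancer and nodes are defined once, hosting a further site should only need another virtual host of the same shape. In this sketch the server name example.org and the Zope path /plone/example are hypothetical placeholders.

```apache
<VirtualHost *:80>
    # example.org and /plone/example are hypothetical; use your own site and Zope path
    ServerName example.org
    DocumentRoot /tmp
    RewriteEngine On
    # Same "bronze" balancer, different VirtualHostMonster path
    RewriteRule ^/(.*) balancer://bronze/VirtualHostBase/http/%{HTTP_HOST}:80/plone/example/VirtualHostRoot/$1 [L,P]
</VirtualHost>
```

The cookie domain comes from the X-Forwarded-Server header, so each site gets its own BALANCEID cookie and sticks independently.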
Further information
There's lots of cryptic documentation out on the web about how to do this. I think the main "Plone" examples involve modifying the login process and setting cookies the first time you log in. This is fine for a single application but a non-starter when you're talking about mass Plone hosting.
Useful points of reference:
- Apache2 mod_proxy
- Apache2 mod_proxy_balancer
- Another approach from Plone.org
- Some recent postings on the subject from Plone.org

