Thursday 12 July 2012 6:31:30 pm
In the previous installment in this series, we showed how to configure eZFind for high availability.
Now we tackle the problem of load balancing...
Having set up eZFind with replication and failover is fine and dandy, but what about scalability?
If your website is pushing a few thousand search requests per second, it might be a good idea to actually implement a master/slave solution with one Solr master node and many slaves, where the master receives the indexing requests and the slaves receive the search requests.
As eZ Publish does not natively support this configuration (as it does, e.g., for the MySQL database), a little bit of HTTP magic is needed to split the requests sent by eZ to Solr.
This is really not as scary as it sounds: all communication between the two parties happens via plain HTTP requests, and they are always initiated by the eZ side. This means that any stock reverse proxy with load-balancing capabilities will do the trick: Varnish, Squid, Nginx and Apache, to name a few.
The only thing you need to know: the Solr communication protocol closely follows the HTTP model, with GET requests used for searches and POST requests for updates. Alas, history has shown that the Jetty servlet container, shipped by default to run Solr with eZFind, has trouble with big GET requests, and it is not uncommon to find configurations where POST calls are used for searches as well (with all the policy and language filters plus faceting options, the search query string can get huge indeed). This means that our reverse proxy has to decide where to route each request by analysing its URL, not its HTTP method.
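And without further ado, here it is: a sample configuration for Apache (tested with Apache 2.2). Note that you will need mod_proxy enabled for this, along with the modules it relies on here: mod_proxy_http (to talk to HTTP backends) and mod_proxy_balancer (for the balancer:// pools and the balancer-manager page).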
Note that this is for 2 servers, and that the master server also receives search requests, but it should be easy enough to tailor to your needs.
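As a minimal sketch, the modules mentioned above can be loaded with LoadModule directives along these lines. The `modules/` path is an assumption: it varies between distributions, and Debian-style systems usually enable modules with `a2enmod proxy proxy_http proxy_balancer` instead.

```apache
# Core reverse-proxy support (ProxyPass, <Proxy>, ProxyRequests)
LoadModule proxy_module modules/mod_proxy.so
# Lets mod_proxy forward requests to http:// backends such as Solr
LoadModule proxy_http_module modules/mod_proxy_http.so
# Provides balancer:// pools and the balancer-manager handler
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
```

On Apache 2.4 the balancer additionally needs mod_slotmem_shm and a load-balancing method module such as mod_lbmethod_byrequests, and access control uses `Require` rather than `Order`/`Allow`.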
###########
# An Apache reverse-proxy configuration to load-balance Solr
###########
# @todo test support for pinging backends for health
# @bug https://issues.apache.org/bugzilla/show_bug.cgi?id=52402

# Definition of the vhost used as reverse proxy: we use a dedicated port number
Listen 8983
<VirtualHost *:8983>

    # Not a forward proxy
    ProxyRequests Off

    # Search requests: we balance them
    # (this rule must come before the /solr one: the first matching ProxyPass wins)
    ProxyPass /solr/select balancer://solrcluster
    # All the rest: we send to master
    # (including the css/images used for admin console)
    ProxyPass /solr balancer://solrmaster

    # Definition of members in the upstream servers pool
    # (10.0.0.2 is the master, 10.0.0.3 the slave: replace with your own addresses)
    <Proxy balancer://solrcluster>
        BalancerMember http://10.0.0.3:8983/solr/select disablereuse=On
        BalancerMember http://10.0.0.2:8983/solr/select disablereuse=On
    </Proxy>
    <Proxy balancer://solrmaster>
        BalancerMember http://10.0.0.2:8983/solr
    </Proxy>
    # (defined, but not referenced by any ProxyPass rule above)
    <Proxy balancer://solrslaves>
        BalancerMember http://10.0.0.3:8983
    </Proxy>

    # Allow external scripts to check/enable/disable cluster members
    # nb: it is of no use outside vhost, as it will report no info then
    <Location /balancer-manager>
        SetHandler balancer-manager
        Order Allow,Deny
        Allow from localhost 127
    </Location>
</VirtualHost>
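As one possible way of tailoring it, here is a sketch of a three-server variant in which the master no longer serves searches and the search pool contains two slaves. The address 10.0.0.4 is a hypothetical second slave, and this vhost is meant to replace the one above rather than sit alongside it, since both listen on port 8983.

```apache
# Sketch: 10.0.0.2 = master (updates only), 10.0.0.3 and 10.0.0.4 = slaves (searches)
# 10.0.0.4 is a hypothetical extra slave: adjust all addresses to your setup
Listen 8983
<VirtualHost *:8983>
    ProxyRequests Off

    # Searches go to the slaves only (more specific rule first)
    ProxyPass /solr/select balancer://solrslaves
    # Updates, admin console and everything else go to the master
    ProxyPass /solr balancer://solrmaster

    # Members keep the /solr/select path, since ProxyPass strips that prefix
    <Proxy balancer://solrslaves>
        BalancerMember http://10.0.0.3:8983/solr/select disablereuse=On
        BalancerMember http://10.0.0.4:8983/solr/select disablereuse=On
    </Proxy>
    <Proxy balancer://solrmaster>
        BalancerMember http://10.0.0.2:8983/solr
    </Proxy>
</VirtualHost>
```

Keeping the master out of the search pool leaves it free for indexing; the trade-off is that freshly indexed content only shows up in search results once the slaves have replicated it from the master.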