Load balancing eZ Find for Fun and Profit

By Gaetano Giunta

Saturday 07 July 2012 4:19:51 pm

A common question I encounter in my consulting work is: how can Solr be configured to achieve scalability and high availability?

There seems to be a lot of confusion around this topic, and the solutions I find deployed vary a lot. Often they are far from optimal, and offer little in terms of either availability or scalability.

I will present here a simple configuration that needs little more than stock Apache and Solr servers to achieve both high availability and scalability. Read on for the details...

Intro

Although eZ Publish supports so-called "clustered" configurations for absorbing high traffic loads and achieving high availability, little documentation/support is given out of the box for the search engine.

Multiple frontend webservers are easy to set up, using the eZDFS configuration.

The database can be set up in master-slave mode (MySQL only) or, for Oracle, with RAC, by telling eZ the IP addresses of all servers.

But for Solr, only one IP address can be used by eZFind.

One of the easiest solutions for achieving high availability of the search engine is to use the native replication capabilities of Solr.

Architecture

One Solr Master server, one Solr Slave server. Solr replicates the index from Master to Slave.

eZ Publish sends queries to only 1 server at any given time.

Configuration files

If you are using eZFind 2.7, you will find that the java/solr/conf/solrconfig.xml file already contains all the needed bits.

If you are running eZFind 2.6 or lower, you will need to edit that file. At the end of it, append the following code:

  <requestHandler name="/replication" class="solr.ReplicationHandler" >
 
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">elevate.xml</str>
  </lst>
 
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${master.core.url:localhost\:8983}/${solr.core.name}/replication</str>
    <str name="pollInterval">${poll.time:'00:00:10'}</str>
  </lst>
 
  </requestHandler>

With this done, you are ready to run two Solr servers side by side. You could run two processes on the same box, but that would add little in terms of fault tolerance. So let's imagine you run the two Solr processes on two separate servers, both listening on the default port 8983.

Starting the processes

Master server: run

java -Denable.master=true -jar start.jar

Slave server: run

java -Denable.slave=true -Dmaster.core.url=<IP_OF_MASTER> -jar start.jar

If all goes well, you will have the two servers up and running.

The slave will poll the master every 10 seconds for new data (this can be changed by editing solrconfig.xml or by using -Dpoll.time on the slave command line).

The master server will make the index available for replication to the slave after each commit and optimize operation, as well as at startup (these are the replicateAfter entries in the configuration above).

You can check that everything works by connecting to the Solr admin console on both servers. You will find a replication link at the top of the page, which can be used to check replication status. NB: if no data is sent to Solr, the index will not replicate, so do not worry if you don't see it working as soon as the servers are started.
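The same status check can also be scripted, since the Solr replication handler answers HTTP requests with a command parameter such as details or indexversion. The snippet below is a minimal sketch, not part of eZ Find: the function names are made up for illustration, and the base URL (for example http://192.0.2.10:8983/solr) is an assumption that depends on how your Solr core is exposed.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Builds the URL of a replication handler command.
# $1 = base URL of the Solr core (assumption: adjust to your setup)
# $2 = replication command, e.g. "details" or "indexversion"
replication_url() {
    printf '%s/replication?command=%s\n' "$1" "$2"
}

# Prints the replication details reported by one server (requires curl).
replication_details() {
    curl -s --max-time 5 "$(replication_url "$1" details)"
}
```

Calling replication_details once with the master's base URL and once with the slave's should let you compare the index versions that the two servers report.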

And on the eZ side?

eZ Publish will send all requests to the IP of the master.

To achieve real high-availability, you will need to work a bit more:

- set up a virtual IP that will be used by the Solr cluster to receive requests

- configure eZ to use that IP in solr.ini.append.php

- assign that IP to the Solr Master server

- put in place some polling mechanism that periodically checks if Solr is alive on master and, if not, fails over the IP to the slave
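The polling mechanism from the last step could look roughly like the sketch below, meant to run on the Slave server. It is only an illustration under several assumptions: a Linux host with curl and the ip command, a Solr ping handler reachable at /admin/ping on the master, and placeholder values for every address and interface name. All variable and function names are hypothetical. It also leaves out things a production setup needs, such as making sure the master has released the IP and announcing the move to the network, which is why dedicated tools such as keepalived or Heartbeat are usually a better choice.

```shell
#!/bin/sh
# All values below are placeholders for illustration; override them via the environment.
SOLR_PING_URL="${SOLR_PING_URL:-http://192.0.2.10:8983/solr/admin/ping}"
VIP="${VIP:-192.0.2.100/24}"
IFACE="${IFACE:-eth0}"
MAX_FAILURES="${MAX_FAILURES:-3}"
POLL_SECONDS="${POLL_SECONDS:-5}"

# Returns 0 if the master answers the ping request with a successful HTTP status.
solr_alive() {
    curl -sf --max-time 3 -o /dev/null "$SOLR_PING_URL"
}

# Claims the virtual IP on this host (needs root privileges).
take_over_vip() {
    ip addr add "$VIP" dev "$IFACE"
}

# Polls the master and takes over the VIP after MAX_FAILURES consecutive failures.
watch_master() {
    failures=0
    while [ "$failures" -lt "$MAX_FAILURES" ]; do
        if solr_alive; then
            failures=0
        else
            failures=$((failures + 1))
        fi
        sleep "$POLL_SECONDS"
    done
    take_over_vip
}
```

Running watch_master at the end of the script starts the loop. Requiring several consecutive failures before moving the IP avoids failing over on a single slow response.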

A couple more notes about failovers and failbacks:

- the 10-second polling interval means that there is a small chance that, if the master Solr fails, some content will not be updated in the index (i.e. you can lose at most 10 seconds of indexed data, unless you use delayed indexing, in which case it can be more)

- as soon as the VIP is failed over, the index gets updated on the Slave server and stops being updated on the Master server. If (or rather when) a failback happens, the Slave server will happily replicate the Master's index again, and forget all the indexed content it has acquired in the meantime. This means you probably want to prevent uncontrolled failbacks. In the simplest scenario, if your monitoring system sends you an email upon VIP failover, you can have the sysadmin log in to the Master server and disable the Solr service. For a proper failback later on, strategies include a full reindexation, a copy of the index from slave to master, or restarting both Solr processes with their command-line options swapped.
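To illustrate the last strategy, here is a small sketch that prints the start command for each role, reusing the options shown earlier in this post. The helper name solr_start_cmd is hypothetical and the IP address in the usage note is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: prints the Solr start command for a given role.
# $1 = role ("master" or "slave")
# $2 = address of the server to replicate from (slave role only)
solr_start_cmd() {
    case "$1" in
        master)
            echo "java -Denable.master=true -jar start.jar"
            ;;
        slave)
            echo "java -Denable.slave=true -Dmaster.core.url=$2 -jar start.jar"
            ;;
        *)
            echo "usage: solr_start_cmd master|slave [master_address]" >&2
            return 1
            ;;
    esac
}
```

After a failover, the former slave holds the most recent index, so it would be restarted with the command printed by solr_start_cmd master, and the former master with the one printed by solr_start_cmd slave 192.0.2.11, where the address is that of the former slave.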

Last words of advice:

- there is currently a bug in Solr regarding the replication of dictionary data. A workaround will be posted shortly...

But what about scalability?

So far we have only improved the availability of the setup, making sure that the search engine functionality can survive downtime of the search server.

To actually make eZ spread the load of search requests across multiple Solr servers, a little bit of reverse proxy magic is needed. Stay tuned for the second installment in this series...
