eZ Community » Forums » eZ Publish 5 Platform » Search Service with Solr

Search Service with Solr

Friday 15 August 2014 8:19:54 pm - 6 replies

Hi,

I know that this http://share.ez.no/forums/ez-publish-5-platform/search-with-ez-5 topic was posted a while ago; however, I wanted to find out what progress has been made on using Solr as the search handler for the search service. According to that previous topic the handler was a prototype at that point, which is now over a year ago.

Saturday 16 August 2014 2:29:03 am

Hi Michael,

the last status is that we got a bit stuck because of Location Search, and we are currently evaluating Elasticsearch to see if its join system is more suitable for the problem; so far it seems to solve some issues, but not all.

In short the problem is:

Location Search currently has full search functionality and requires a location to be the "root" of the search for it to work. This effectively means that every content object will have to be indexed 1+x times, where x is the number of locations.

A separate issue is the current translation approach in the new stack, which (as pointed out in a recent post by Donat) requires all languages to be returned for fields unless a field filter, which is currently not implemented, is used. This means the documents will be large if you have many languages.

So some of the choices we have, which we would like to get feedback on (a blog post is planned for after vacation), are:

  • How many "search" features can we drop from Location Search? We assume 99% of users use it as a location list/tree fetch, so we assume we can drop quite a few features there, to either reduce the index duplication issue or remove it completely if we model location search differently
  • Would it be OK to launch it with a suboptimal index size first and improve it later?
  • Are there any use cases out there for searching against several languages at the same time?
    In other words, AND is the question here: AND( Field.x = "good morning", Field.y = "Bonjour" )
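For reference, such a cross-language AND could be expressed against the public API roughly like this (a sketch only; the field identifiers and search values are placeholders, and $searchService is assumed to come from the repository):

```php
<?php
// Sketch only: assumes the eZ Publish 5 public API. The field
// identifiers "title_en" / "title_fr" and the values are placeholders.
use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion;

$query = new Query();
$query->criterion = new Criterion\LogicalAnd(array(
    new Criterion\Field('title_en', Criterion\Operator::EQ, 'good morning'),
    new Criterion\Field('title_fr', Criterion\Operator::EQ, 'Bonjour'),
));

// e.g. $searchService = $repository->getSearchService();
$searchResult = $searchService->findContent($query);
```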

Modified on Saturday 16 August 2014 2:30:58 am by André R

Monday 18 August 2014 5:43:36 pm

Hi André,

Thanks for the information. One thing that I did notice with the search is very high memory usage for large result sets, and the increase does not seem to be linear either. I was able to narrow it down to the actual conversion of the search result hits to PHP objects (Content and Location) via the ContentService and the LocationService. While the LocationService uses less memory, it still requires a lot. Is this something which should be expected, or an issue which needs to be addressed?
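For context, the two code paths being compared look roughly like this in the public API (a sketch only; the empty queries are placeholders for a real set of criteria):

```php
<?php
// Sketch only: assumes the eZ Publish 5 public API and an available
// $searchService; the empty queries stand in for real criteria.
use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\LocationQuery;

// findContent(): each search hit's valueObject is a full Content object
$contentResult = $searchService->findContent(new Query());

// findLocations(): each hit's valueObject is a (lighter) Location object
$locationResult = $searchService->findLocations(new LocationQuery());

foreach ($contentResult->searchHits as $hit) {
    /** @var \eZ\Publish\API\Repository\Values\Content\Content $content */
    $content = $hit->valueObject;
}
```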

Tuesday 19 August 2014 10:30:27 am

About memory usage: does it happen only in dev mode, or also in prod?

Also, is it bigger than the memory used for an equivalent content fetch (with/without SPI cache)?

Tuesday 19 August 2014 2:41:37 pm

Besides Gaetano's hints that the SPI persistence cache and logging (in dev mode) might have something to do with it: if you or someone else can pinpoint where this is happening and, if possible, suggest ways to improve it, then we would gladly do something about it. If this is indeed in the search service and not in Persistence/Legacy, then it is an additional issue to the one Donat pointed out.

This would be of great help, as the core team does not have the same daily experience using the kernel out in the wild as you guys have.

Tuesday 19 August 2014 4:26:20 pm

I don't really know anything about the SPI cache or how to measure memory in prod mode, since I am using the profiler toolbar to read the memory for the overall page load. As for caching, I did memory comparisons after a cache clear while in dev mode, in order to get an accurate read on the maximum amount of memory needed.

I'm currently working on a project which has some pages that require a large number of content items/locations to be returned. There is one page which needs 387 items for the page content, in addition to the search pulls for the side menu. This page regularly goes over the PHP memory limit of 1024MB, especially after a cache clear.

As part of debugging, I started removing parts of the design. As I added each part back in, I checked the memory usage until I hit a "mystery" jump. The page starts out at ~140MB usage, and then if I add either the side menu or the 387-item pull, the page jumps to ~900MB usage, even though the apparent memory for the 387-item search was only ~80MB (I used "memory_get_usage" around the search). I did find the log call in the search service (in both findLocations and findContent) and commented it out, resulting in an overall reduction of memory usage of ~200MB. Using "findLocations" to retrieve the 387 items did decrease the search's memory usage down to ~12MB, but the page overall still used ~800MB.

Because of these results, I suspected that the issue may not actually be with the search service, but with the conversion of search results into PHP objects, either Content or Location. To test this, I made a static array of both the Content IDs and the Location IDs for the 387 items. In a custom controller I looped over each ID list and loaded the respective PHP object using either the LocationService or the ContentService, wrapping the loop in "memory_get_usage" calls to measure usage. The page without calls to either service starts at 140MB overall memory usage. When using the content service to load content in the loop, the measured usage is 50.75MB, with 613.8MB overall. Interestingly, when using the location service to load locations in the loop, the measured usage is 0MB, with 236.8MB overall. While this does improve performance, in order to access any of the field values the content still has to be fetched.
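As a side note on methodology: the delta reported by wrapping a block in "memory_get_usage" calls can differ a lot from the peak memory reached inside that block, which may explain some of the "mystery" numbers. A minimal self-contained illustration (no eZ code involved):

```php
<?php
// Self-contained illustration: delta vs peak memory around a code block.

$before = memory_get_usage();

// Simulate building a large result set, then discarding most of it.
$items = array();
for ($i = 0; $i < 100000; $i++) {
    $items[] = str_repeat('x', 100);
}
$kept = array_slice($items, 0, 10);
unset($items); // most of the memory is freed again here

$after = memory_get_usage();
$peak  = memory_get_peak_usage();

$delta = $after - $before;
// $delta only reflects what is still held ($kept), while $peak
// reflects the full 100k-element array that existed in between.
printf("delta: %d bytes, peak: %d bytes\n", $delta, $peak);
```

So a small delta around a search call does not rule out a large transient allocation inside it.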

NOTE: as a reminder, all my testing was on page load after a cache clear, and the system is in dev mode with debug enabled. The version is the 5.3 enterprise release.

Wednesday 20 August 2014 11:16:24 am

A few random notes:

1. to fetch the hundreds of objects per page which are used to build menus / side columns, I hope you are using subcontrollers which have dedicated HTTP expiration rules. This way your page will be fast and use little memory for 99% of accesses.

2. when measuring memory used by a code block between points A and B, please distinguish between the peak memory used, which might occur at any point between A and B, and the delta mem(B) - mem(A). Maybe you are confusing the numbers you measure yourself with the ones given by the Symfony profiler, which use different scales

3. never design your code based on memory usage in dev mode. For starters, you design your code based on need (location != content). Second, you only measure data in prod mode, not dev, as the correlation is definitely not linear. Third, you generally have to worry more about the performance of caches: making sure they are effective is generally a better goal than trying to optimize for the page-completely-uncached scenario. A dirty but effective trick to avoid a user getting the cold-cache page is to warm up the caches using cronjobs.

4. when testing code in loops, it might be a good test to trigger a PHP garbage collection at the end of each loop pass. This might have a beneficial effect - or none - depending on your specific code

5. about the SPI cache: it is used to cache each content and location PHP object, to avoid requesting it from the db multiple times. By default it is on, enabled via the "stash" block of configuration in ezpublish.yml. It stores a copy of each cached object both on disk and in memory. You can try setting stash.caches.default.inMemory = false to see if it saves you memory

6. can't you change the php.ini value on the dev machine so that PHP can use 2 or 3 GB of RAM?

7. last but not least: it seems to me that the "not enough memory to run a dev environment" problem is popping up frequently enough to warrant opening a feature request. Here you go: https://jira.ez.no/browse/EZP-23275
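Regarding point 5, the relevant configuration lives under the "stash" block in ezpublish.yml. A sketch of what that might look like (the exact driver list depends on your install, and the inMemory key name should be checked against your StashBundle version):

```yaml
# ezpublish.yml (sketch; drivers and key names depend on your setup)
stash:
    caches:
        default:
            drivers: [ FileSystem ]
            inMemory: false   # skip the extra in-memory copy to save RAM
```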
