eZ Find / Solr config for partial search on latin chars

Friday 13 May 2011 5:16:29 pm - 2 replies

Hi,

We have an install of eZ Find 2.3 configured with ReversedWildcardFilterFactory. It works well for partial wildcard searches, but we still have issues when searching for words that start with a Latin character (such as Ø) when the query is two, three or four characters long.

An example:

We have indexed an object named Ølbolle

Query - result:

øl - no result

ølb - no result

ølbo - no result

ølbol - no result

ølboll - ølbolle

ølbolle - ølbolle

When searching for words starting with non-Latin characters, this is not an issue.

Our index analyzer is set up as follows:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="ISOLatin1AccentFilterFactory"/>
  <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>

Any experience / suggestions greatly appreciated.

Modified on Friday 13 May 2011 6:47:51 pm by Bjørnar Grøtterud

Monday 16 May 2011 8:58:44 pm

Hello Bjørnar

I guess you mean that you are using normal wildcards (asterisks) after the characters, and that these are added in your template logic.

The culprit is actually the ISOLatin filter factory: when doing wildcard searches, the analysis steps are not applied to the query term. There is still no stable resolution for this in Solr (where the "problem" actually lies). The same goes for lowercasing, by the way.

If you really want to rely so much on wildcards (which I don't actually recommend either), the best option is to remove the ISOLatin "normalisation" as well (in both the index and query analysis steps).
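
As a rough sketch of what that change could look like on the index side, the analyzer from the first post is repeated here with only the ISOLatin1AccentFilterFactory line dropped; everything else is copied from that post. The query analyzer is not shown in this thread, so the assumption is that the same filter line would be removed there too, and that the content is reindexed afterwards.

```xml
<!-- Index analyzer from the first post, minus the ISOLatin1AccentFilterFactory line. -->
<!-- Assumption: the query analyzer drops the same filter, and the index is rebuilt. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>
```

With the accent filter gone, the indexed term keeps its ø, so a wildcard term that skips analysis can still line up with it as long as the casing matches.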

Furthermore, I think you did not remove the stemming step for the query part of your text field type in schema.xml (otherwise you should not get a match for ølboll). You should absolutely remove the stemming part there as well.
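
The query analyzer of the text field type is not shown in this thread, so the following is only a hypothetical sketch of a query-side chain that contains neither a stemming filter nor an accent filter. The filter list and attribute values are assumptions modelled on the index analyzer in the first post, not the actual schema.xml.

```xml
<!-- Hypothetical query analyzer: no SnowballPorterFilterFactory, no ISOLatin1AccentFilterFactory. -->
<!-- The filters and attribute values below are assumptions, not taken from the real schema.xml. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```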

It would be good if you could email me your schema.xml for closer inspection.

Cheers

Paul

Thursday 19 May 2011 9:42:11 am

Hi Paul,

thanks for your reply.

The reason we added wildcards is that we found this to be the only solution for partial search on both the first and the last part of a word. It actually works well, except for Latin chars.

I'll try your suggestion of removing the ISOLatin1AccentFilterFactory.

We have already removed the SnowballPorterFilterFactory from the query part in our schema.xml.

I'll send you our schema.xml, thanks for taking a look!

Best regards

Bjørnar
