Facets : Truncated values on ezstring

Monday 12 July 2010 12:33:20 pm - 22 replies

Hello,

I am trying to configure eZ Find on a website, but when I ask for a "class/attribute" facet on an ezstring datatype, all results are truncated.

For example, if you have the keyword "my-house", you get these results:

  • my
  • house
  • hous

I understand the possible benefit of this in certain cases, but how can I modify this behavior? Is there a way to tell Solr "Hey, please disable this word-truncation feature"?

Modified on Wednesday 01 June 2011 12:41:33 pm by H-Works Agency

Monday 12 July 2010 6:18:31 pm

The problem (or feature) you are experiencing is the result of how Solr tokenizes text. A word delimiter filter applied at index time splits words that contain a dash. The resulting tokens are then used for faceting.
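
To see why a single value can turn into several facet entries, here is a toy Python sketch of an analysis chain. It is an illustration only, not Solr code: `split_on_delimiters`, `toy_stem` and `facet_terms` are hypothetical helpers, and the stemming rule is a crude stand-in. It assumes the "text" field type also applies a stemmer, which would explain shortened values such as "hous".

```python
import re


def split_on_delimiters(value):
    """Split a value on runs of non-alphanumeric characters (toy word delimiter)."""
    return [part for part in re.split(r"[^0-9A-Za-z]+", value) if part]


def toy_stem(token):
    """Crude stand-in for a stemmer: drop a trailing 's', then a trailing 'e'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    if len(token) > 3 and token.endswith("e"):
        token = token[:-1]
    return token


def analyze(value):
    """Return the terms this toy analyzed text field would index for a value."""
    return [toy_stem(part) for part in split_on_delimiters(value)]


def facet_terms(value, analyzed=True):
    """A facet counts indexed terms, so an unanalyzed field keeps the value whole."""
    return analyze(value) if analyzed else [value]


print(facet_terms("my-house"))                  # ['my', 'hous']
print(facet_terms("Paris"))                     # ['Pari']
print(facet_terms("my-house", analyzed=False))  # ['my-house']
```

The last call is the behavior you normally want for facets: the value is indexed verbatim, with no splitting or stemming.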

There is new functionality in eZ Find 2.2 for this (using special fields for faceting), but I haven't explored it yet.

But you can always tune schema.xml.

Tuesday 13 July 2010 12:20:34 am

Indeed, in eZ Find 2.2 you can define dedicated field types for attributes in a facet context. This was introduced exactly so that you can have both meaningful search results (in that case you usually want this "break up") and facets/sorting (where you want verbatim strings).

What datatype is used for keywords? Judging from your results, you are using either eZ Find 2.0, or eZ Find 2.1+ with a text field.

Paul

Monday 26 July 2010 3:17:47 pm

Thank you for this information.

This solr query syntax looks very powerful.

Modified on Thursday 07 October 2010 6:49:24 pm by H-Works Agency

Tuesday 27 July 2010 2:45:30 pm

In case you do want to tune schema.xml, here is the information you need. Leave in the line:

<dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

but add a definition underneath it for your own field, changing type="text" to type="long", e.g.:

<field name="attr_dc_coverage_t" type="long" indexed="true" stored="true" multiValued="true"/>

Tuesday 05 October 2010 7:18:22 pm

Hello everyone and thank you for the answers.

For example, my facet results for a city attribute are "Paris, Pari". But the letter "s" is not a word separator, is it?

I tried Sebastiaan's answer by adding:

<field name="attr_ville_t" type="long" indexed="true" stored="true" multiValued="true"/>

just after the mentioned line, but it doesn't change anything.

Modified on Wednesday 01 June 2011 12:39:48 pm by H-Works Agency

Tuesday 05 October 2010 8:18:26 pm

This looks interesting too: on http://wiki.apache.org/solr/Analy...ters#solr.WordDelimiterFilterFactory check out the entry for
solr.WordDelimiterFilterFactory, which has an option preserveOriginal="1", which causes the original token to be indexed without modifications (in addition to the tokens produced due to other options).

for example:

<fieldtype name="subword" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            />
etc...
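
As a rough picture of what preserveOriginal changes, here is a toy Python sketch. It is not Solr's implementation: `word_delimiter` is a hypothetical helper, and the order of the emitted tokens is an assumption. Keep in mind that facets are built from the terms produced at index time, so an option set only on an analyzer of type "query" would not be expected to change facet values.

```python
import re


def word_delimiter(token, preserve_original=False):
    """Toy word delimiter: split on non-alphanumerics, optionally keep the original."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if preserve_original and parts != [token]:
        # Emit the unmodified token in addition to the generated parts.
        return [token] + parts
    return parts


print(word_delimiter("my-house"))                          # ['my', 'house']
print(word_delimiter("my-house", preserve_original=True))  # ['my-house', 'my', 'house']
```

In this toy model the split parts are still emitted alongside the original, so a facet on such a field would list "my" and "house" as well as "my-house".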

Wednesday 06 October 2010 2:09:25 am

You can control the way content is indexed by defining a mapping between ez-datatypes and solr field-types. This can be configured in ezfind.ini[.append.php] independently for searching, sorting, faceting and filtering. For faceting the solr-field-type "string" is probably what you want.

[SolrFieldMapSettings]
# this is the configuration for searching
DatatypeMap[ezstring]=text
...
 
# for sorting
DatatypeMapSort[]
DatatypeMapSort[ezstring]=string
...
 
# for faceting 
DatatypeMapFacet[]
DatatypeMapFacet[ezstring]=string
...
 
# for filtering
DatatypeMapFilter[]
DatatypeMapFilter[ezstring]=string
..

Remember to run updatesearchindexsolr.php after you make these changes. Hope this helps.

Modified on Wednesday 06 October 2010 2:10:18 am by Patrick Kaiser

Wednesday 06 October 2010 10:47:02 am

Damn, still not working.

I added these settings to ezfind.ini (which seems cleaner than modifying the system-wide schema.xml):

  • DatatypeMap[ezstring]=string
  • DatatypeMapSort[ezstring]=string
  • DatatypeMapFilter[ezstring]=string
  • DatatypeMapFacet[ezstring]=string
  • Default=string

Then I reran updatesearchindexsolr.php -s $siteaccess_name --clean-all

My ezstring attribute still returns truncated facet values.

Modified on Wednesday 06 October 2010 5:39:11 pm by H-Works Agency

Thursday 07 October 2010 9:16:09 am

Martin, did you also try the option below in schema.xml?

preserveOriginal="1"

Modified on Thursday 07 October 2010 9:16:45 am by S V

Thursday 07 October 2010 11:44:21 am

I tried putting this everywhere, but tweaking ezfind.ini or schema.xml seems to have no effect on what Solr or eZ Find returns.

Even deleting or deliberately breaking schema.xml doesn't change anything: after running "updatesearchindexsolr.php", all facet results remain the same!

Could someone tell me which schema.xml we have to edit? Here is the list I found:

  • ./java/solr/conf/schema.xml
  • ./java/solr.multicore/eng-GB/conf/schema.xml
  • ./java/solr.multicore/fre-FR/conf/schema.xml
  • ./java/solr.multicore/nor-NO/conf/schema.xml

None of these seems to be used. If I delete all of these files, nothing changes.

Modified on Monday 18 October 2010 3:41:13 pm by H-Works Agency

Monday 18 October 2010 4:03:48 pm

Two quick checks:

Did you restart solr after editing schema.xml?
Did you delete your previous index first and then commit?
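
For the second check, "delete and commit" refers to Solr's XML update messages: a delete-by-query that matches every document, followed by a commit. A minimal Python sketch follows. The base URL is an assumption (adjust host, port and path to your install), and `build_reset_requests` and `send` are hypothetical helper names. Running updatesearchindexsolr.php with --clean-all should achieve the same from the eZ Find side.

```python
import urllib.request

SOLR_BASE_URL = "http://localhost:8983/solr"  # assumption: adjust to your setup


def build_reset_requests(base_url=SOLR_BASE_URL):
    """Return (url, body) pairs that delete every document and then commit."""
    update_url = base_url.rstrip("/") + "/update"
    return [
        (update_url, "<delete><query>*:*</query></delete>"),
        (update_url, "<commit/>"),
    ]


def send(url, body):
    """POST one XML update message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    for url, body in build_reset_requests():
        print(url, body)  # call send(url, body) against a running Solr instead
```

A commit only makes pending changes visible. It does not reload schema.xml, so a restart is still needed after schema edits.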

Monday 18 October 2010 5:04:30 pm

Thank you. In fact, I hadn't restarted Solr.

What do you mean by deleting the previous index and committing? Does commit mean restarting Solr with the new schema.xml?

When I add my directive: <field name="attr_ville_t" type="long" indexed="true" stored="true" multiValued="true"/>

Solr then crashes (curl error 7).

Monday 18 October 2010 5:52:27 pm

Hello Patrick,

What are those modifications to ezfind.ini supposed to do?

Are they supposed to modify the way facets are returned through DatatypeMapFilter[]?

I really don't get it, as nothing ever changes no matter what I modify in this file.

Monday 18 October 2010 6:00:20 pm

This looks interesting too: on http://wiki.apache.org/solr/Analy...ters#solr.WordDelimiterFilterFactory check out the entry for
solr.WordDelimiterFilterFactory, which has an option preserveOriginal="1", which causes the original token to be indexed without modifications (in addition to the tokens produced due to other options).

for example:

<fieldtype name="subword" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            />
etc...

This is what I get after adding preserveOriginal="1" to schema.xml on line 221 (then restarting Solr, then removing extension/ezfind/java/(...)/data/*, then rerunning updatesearchindexsolr.php):

<body><h2>HTTP ERROR: 500</h2><pre>Severe errors in solr configuration.  Check your log files for more detailed information on what may be wrong.  If you want solr to continue after configuration errors, change:    &lt;abortOnConfigurationError&gt;false&lt;/abortOnConfigurationError&gt;  in solr.xml

Monday 18 October 2010 6:10:55 pm

If you follow my directions, there should be no need to even touch schema.xml.

If you didn't configure multicore Solr, then your schema.xml is this one: ./java/solr/conf/schema.xml

Before proceeding, replace your messed-up one with the original file. Restart Solr and make sure it runs.

Then really make sure the attribute you want to facet on is of type ezstring (perhaps you are using eztext or something?). You could also try the ezkeyword datatype, which should work "out of the box".

Did this help?

Monday 18 October 2010 6:31:51 pm

OK, Patrick.

Thanks, all this information is a great help for me to finally be able to use eZ Find on production projects.

My attribute is a simple "ezstring" attribute holding city names.

If I just add DatatypeMapFacet[ezstring]=lckeyword to ezfind.ini, it does not change anything after rerunning updatesearchindexsolr.php.

My results are still truncated like this:

  • Paris becomes Pari
  • Rennes becomes Renn
  • ...etc

Do I need to insert DatatypeMapFacet[ezstring]=string?

Modified on Monday 18 October 2010 7:04:45 pm by H-Works Agency

Monday 18 October 2010 7:13:02 pm

I meant you should try adding a new attribute of type ezkeyword to your class in addition to your existing city attribute. Edit a few objects and add content in the new keyword field.

Adjust ezfind.ini:

DatatypeMapFacet[]
DatatypeMapFacet[ezstring]=string
DatatypeMapFacet[ezkeyword]=lckeyword

Clear the cache and check in the admin interface that the siteaccess you are using for your facet tests picks up the right ezfind.ini settings (it really seems that your ezfind.ini settings are not being used).

Then rerun updatesearchindexsolr.php -s YOUR_SITEACCESS --clean-all

Then you can try faceting on both fields; both should actually work.

Tuesday 19 October 2010 7:57:42 pm

You were right, I had a loading problem with my ezfind.ini: an extension loading order problem.

Now everything works with your directives!

Thanks a lot!

Modified on Tuesday 19 October 2010 7:58:38 pm by H-Works Agency

Friday 01 April 2011 1:08:25 pm

Hi everybody,

I have the same problem with an ezobjectrelationlist attribute.
A space between words is treated as a separator rather than as a normal character.
I tried setting up ezfind.ini this way:
DatatypeMap[ezobjectrelationlist]=text
DatatypeMapSort[ezobjectrelationlist]=string
DatatypeMapFacet[ezobjectrelationlist]=string
DatatypeMapFilter[ezobjectrelationlist]=string

I set preserveOriginal="1" on

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">...

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>...

I restarted Solr and reindexed, but facets on the ezobjectrelationlist attribute are still truncated.
Could someone help me?

Bye

Friday 01 April 2011 4:15:42 pm

I solved it by using http://projects.ez.no/ezfsolrdocumentfieldobjectrelation and making some modifications to it.

Now my problem is filtering on values that contain white space...

Bye
