Solr Indexing Error

Tuesday 07 April 2009 2:04:14 pm - 3 replies

Hi there, while running the eZ Find 2 indexing I noticed that some data is not indexed >_<

After some digging I found out that Solr::addDocs() has some serious issues:

    function addDocs( $docs = array(), $commit = true, $optimize = false )
    {
        if ( !is_array( $docs ) ) {
            echo "docs is not an array\n";
            return false;
        }
        if ( count( $docs ) == 0 ) {
            echo "docs is empty\n";
            return false;
        }
        // Wrap the XML of every document in a single <add> request.
        $postString = '<add>';
        foreach ( $docs as $doc ) {
            $postString .= $doc->docToXML();
        }
        $postString .= '</add>';
        $updateResult = $this->postQuery( '/update', $postString, 'text/xml' );
        echo $updateResult; // prints the raw response from Solr
        // (rest of the method not shown)
    }

The last echo outputs some Java errors:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
<body><h2>HTTP ERROR: 500</h2><pre>ParseError at [row,col]:[25,1]
Message: An invalid XML character (Unicode: 0xc) was found in the element content of the document. ParseError at [row,col]:[25,1]
Message: An invalid XML character (Unicode: 0xc) was found in the element content of the document.
  at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(
  at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(
  at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(
  at org.apache.solr.core.SolrCore.execute(
  at org.apache.solr.servlet.SolrDispatchFilter.execute(
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
  at org.mortbay.jetty.servlet.ServletHandler.handle(
  at org.mortbay.jetty.servlet.SessionHandler.handle(
  at org.mortbay.jetty.handler.ContextHandler.handle(
  at org.mortbay.jetty.webapp.WebAppContext.handle(
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(
  at org.mortbay.jetty.handler.HandlerCollection.handle(
  at org.mortbay.jetty.handler.HandlerWrapper.handle(
  at org.mortbay.jetty.Server.handle(
  at org.mortbay.jetty.HttpConnection.handleRequest(
  at org.mortbay.jetty.HttpConnection$RequestHandler.content(
  at org.mortbay.jetty.HttpParser.parseNext(
  at org.mortbay.jetty.HttpParser.parseAvailable(
  at org.mortbay.jetty.HttpConnection.handle(
  at org.mortbay.thread.BoundedThreadPool$
<p>RequestURI=/solr/update</p><p><i><small><a href="">Powered by Jetty://</a></small></i></p><br/>

Obviously the generated XML cannot be parsed, so the resulting content is not indexed!
The content object contains binary PDF files and images.

Has anyone got a fix for eZ Find stable?
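
The character in the error message, Unicode 0xc, is a form feed, which XML 1.0 does not allow in element content. Text extracted from binary files can carry such control characters. A minimal sketch of one possible workaround, assuming the text can be cleaned before it reaches docToXML(); the helper name stripXmlControlChars() is hypothetical and not part of eZ Find:

```php
<?php
/**
 * Replace C0 control characters that XML 1.0 forbids with a space.
 * Hypothetical helper, not part of eZ Find.
 * Tab (0x09), line feed (0x0A) and carriage return (0x0D) are kept.
 */
function stripXmlControlChars( $text )
{
    // Works on raw bytes: in UTF-8 the bytes 0x00-0x1F never occur
    // inside a multi-byte sequence, so umlauts etc. are left alone.
    return preg_replace( '/[\x00-\x08\x0B\x0C\x0E-\x1F]/', ' ', $text );
}

echo stripXmlControlChars( "page one\x0Cpage two" ), "\n";
```

The control character is replaced by a space rather than deleted so that the words on either side of a page break do not run together. This only covers control characters; it does not deal with the code points U+FFFE and U+FFFF, which XML 1.0 also excludes.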

Tuesday 07 April 2009 2:12:41 pm

I use both




The last one is a shell script based on the xpdf tool pdftotext:

/usr/bin/pdftotext -enc "UTF-8" "$1" -
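
Note that pdftotext inserts a form feed (0x0C) between pages unless it is called with -nopgbrk, and 0x0C is the character the Solr parser rejected above. A minimal sketch of building such a command line from PHP, assuming the binary lives at /usr/bin/pdftotext; the helper name buildPdfToTextCommand() is hypothetical:

```php
<?php
/**
 * Build a pdftotext command line for one PDF file.
 * Hypothetical helper; the default binary path is an assumption.
 */
function buildPdfToTextCommand( $pdfPath, $binary = '/usr/bin/pdftotext' )
{
    // -enc UTF-8 : emit UTF-8 text
    // -nopgbrk   : do not insert form feeds between pages
    // trailing - : write the text to standard output
    return escapeshellcmd( $binary ) . ' -enc UTF-8 -nopgbrk '
        . escapeshellarg( $pdfPath ) . ' -';
}

echo buildPdfToTextCommand( '/tmp/annual report.pdf' ), "\n";
```

escapeshellarg() quotes the file name, so paths containing spaces are passed to pdftotext as a single argument.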

Thursday 09 April 2009 7:15:04 am

Another option is to use the eZ Tika extension, which allows indexing of a large variety of binary file types such as MS Word, MS Office, PDF, Excel and ODF:


Saturday 09 May 2009 1:40:23 am

It seems that a UTF-8 double-byte character is being interpreted as two 8-bit characters, which is not correct.

I think I have the same issue here (German umlauts and the like) and found a promising site that explains the Tomcat/Solr charset settings:

Perhaps this is the issue: XML cannot contain those characters, so the Solr XML parser crashes. Can you try it out? I currently don't have access to an eZ installation.
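
To tell a charset problem apart from a stray control character, it may help to inspect the text before it is posted to Solr. A minimal sketch with two hypothetical helpers (neither is part of eZ Find). Keep in mind that the character reported in the error above, 0xc, is a form feed, which is valid UTF-8 but not allowed in XML 1.0:

```php
<?php
// Hypothetical helpers, not part of eZ Find.

/** True if $text is a well-formed UTF-8 byte sequence. */
function isValidUtf8( $text )
{
    // With the /u modifier PCRE refuses malformed UTF-8 subjects.
    return preg_match( '//u', $text ) === 1;
}

/** True if $text contains a C0 control character other than tab, LF, CR. */
function hasXmlControlChars( $text )
{
    return preg_match( '/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $text ) === 1;
}

$umlautLatin1 = "\xFC";      // "ü" in ISO-8859-1: not valid UTF-8
$umlautUtf8   = "\xC3\xBC";  // "ü" in UTF-8
$formFeed     = "a\x0Cb";    // valid UTF-8, but not valid XML 1.0 content

var_dump( isValidUtf8( $umlautLatin1 ) );     // bool(false)
var_dump( isValidUtf8( $umlautUtf8 ) );       // bool(true)
var_dump( hasXmlControlChars( $formFeed ) );  // bool(true)
```

If the first check fails, the charset settings are the more likely culprit; if only the second one fires, the text contains control characters that need to be stripped.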


