
Import XML Data Topic

Wednesday 07 December 2005 10:09:59 pm - 54 replies

I created this thread to discuss further evolution (and issues) of the ImportXMLData contrib. This should help keep the contrib comment area for messages directly related to using the contrib rather than implementation details.

Best regards

Olivier

Wednesday 07 December 2005 10:20:07 pm

Salut,

I changed quite a few things in ImportXML.
Some could be useful for everyone, like setting the publication date based on an XML field and handling a few more attribute types than you did; others are quite specific (e.g. finding the parent's node based on a value in an XML field).

The big issue on my side was memory: it just doesn't handle a file bigger than a few hundred records and XML fields (the XPath library seems to be quite sub-optimal, to say the least, and eZ on the other hand...).

I had to reimplement it to run from the shell, and it worked like a charm!

Don't hesitate to contact me by mail if you want some of these features (I probably won't have the time to clean up the hacks, but you might find a few things to reuse).

Thanks for your extension!

X+

Wednesday 07 December 2005 11:17:32 pm

Xavier,

I already dropped you an email saying I was waiting for your changes but unfortunately you could not answer. I'll retry...

Olivier

Thursday 08 December 2005 7:53:57 am

"Known issues:" still shows "UTF-8 is not supported", while "Changelog" shows "1.4.1 .... - UTF-8 support "

Thursday 08 December 2005 9:16:08 am

Sorry, but

$fieldValue = utf8_decode($xPathEngine->wholeText("$path_item/$fieldName"."[1]"));

does not solve the problem. I'm still getting ??? instead of UTF-8 symbols.
I do not understand why utf8_decode is used. As I understand from the documentation (http://lt.php.net/manual/en/function.utf8-decode.php), this function converts a string to ISO-8859-1. After such a conversion, UTF-8 symbols that have no ISO-8859-1 equivalent are lost...

The problem is that xml_parser reads the file as ISO-8859-1 and ignores the encoding specified in the XML file.

This problem should be fixed in the XPath engine. I put this in XPath.class.php at line 1680:

      $parser = xml_parser_create('UTF-8');

This is the only way I found to get it working with UTF-8...
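
For readers who want to try the fix outside of the XPath class, here is a minimal sketch of the same idea using PHP's expat functions directly. The sample document and the $collected variable are made up for illustration. As far as I know, older PHP versions defaulted to ISO-8859-1 when no encoding was passed to xml_parser_create(), while newer ones detect the source encoding themselves, so treat the explicit 'UTF-8' arguments as a way to remove the ambiguity rather than as something every setup needs. The closure syntax requires a newer PHP than the one discussed in this thread.

```php
<?php
// Sketch: force expat to treat both input and output as UTF-8.
// The sample document and $collected are illustrative only.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
     . "<list><item>\xC4\x85\xC4\x8D\xC4\x99</item></list>"; // UTF-8 bytes of three Lithuanian letters

$collected = '';

$parser = xml_parser_create('UTF-8');
// Request UTF-8 output explicitly instead of relying on a version-dependent default.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_set_character_data_handler($parser, function ($parser, $data) use (&$collected) {
    // The handler may be called several times for one text node, so append.
    $collected .= $data;
});

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)) . "\n");
}

// Print the bytes in hex so the result does not depend on the terminal encoding.
echo bin2hex($collected), "\n";
```

If the hex dump matches the bytes that went in, the text survived parsing unchanged; running the result through utf8_decode() afterwards would turn all three letters into question marks.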

 

Thursday 08 December 2005 10:57:43 pm

Now I get it, vytis!

I actually tested utf8_decode with characters convertible to iso-8859-1 so it *seemed* to work.
For the moment the situation is as follows:
if the "is UTF8" checkbox is ticked, it will use utf8_decode(); otherwise it will not.
This is broken, so I will remove this checkbox asap. In the meantime, do not tick it.

The best and only (known) way to import UTF-8 for now is yours.

I will change the doc and the code accordingly.

Olivier

Friday 09 December 2005 4:41:08 pm

The import function has limitations: I cannot import more than 340 records in one go... :(

Tuesday 13 December 2005 2:21:03 pm

Is there any way to run the import script from the command line?

Wednesday 14 December 2005 9:28:02 pm

Surely.
I guess we need to add a file called import-cli.php that would parse the arguments and call the

function &importXMLData( $xmldata, $datatype, $remove, $movetotrash) 

in importXMLDatafunctioncollection.

Of course the context should be set appropriately; if not, I think the call:

$class =& eZContentClass::fetchByIdentifier( $identifiantClasse );

will fail, as will many other kernel-related eZ API calls.

I think we should have a look at how runcronjob.php is written.

Another option would be to wait for Xavier's hacks to this extension, because I know he is using a script approach to run it.

Thursday 15 December 2005 8:02:01 am

I spent two days trying to write such code, without success... Finally I found that the administrator had updated PHP, but without MySQL support...
I moved my site to another server and will test my code there. If it works, I will post it here.

Thursday 15 December 2005 9:36:54 am

Sorry Olivier,

I know I'm late, but I can't find the time to clean up the mess of custom things I've added. I'll try to do that this weekend; the delay is just ridiculous.

Thanks for your patience

X+

Thursday 15 December 2005 1:09:13 pm

Finally I made it! Now you can import data from the command line. I wrote an additional script to read the XML file and initialize the eZ user object in Olivier's script. I will clean up the debug prints and post it here.

The idea is great, but the import script is very slow: it takes about 51 s to import 100 entries.
I need to import ~23 000 entries. At the current speed it will take about 3.3 hours...

How fast is your import algorithm, Xavier?
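
The arithmetic behind that estimate, as a small sketch. The function name is made up for illustration, and it assumes the import time grows linearly with the number of entries, which the memory problems reported in this thread may well contradict for large files.

```php
<?php
// Hypothetical helper: extrapolate total import time from one measured batch.
// Assumes time scales linearly with the number of entries.
function estimateImportSeconds($totalEntries, $batchSize, $secondsPerBatch)
{
    return $totalEntries / $batchSize * $secondsPerBatch;
}

$seconds = estimateImportSeconds(23000, 100, 51);
printf("%d s, about %.1f hours\n", $seconds, $seconds / 3600);
```

With the figures quoted above (51 s per 100 entries, 23 000 entries) this comes to 11 730 s, a bit over three hours.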

Thursday 15 December 2005 2:55:23 pm

Here it is:

<?php
//
// Created on: <2005-12-15 14:52:57 vytis>
//
// This file may be distributed and/or modified under the terms of the
// "GNU General Public License" version 2 as published by the Free
// Software Foundation and appearing in the file LICENSE included in
// the packaging of this file.
//
// This file is provided AS IS with NO WARRANTY OF ANY KIND, INCLUDING
// THE WARRANTY OF DESIGN, MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE.
//
// The "GNU General Public License" (GPL) is available at
// http://www.gnu.org/copyleft/gpl.html.
//
// Contact licence@ez.no if any conditions of this licencing isn't clear to
// you.
//

include_once( 'lib/ezutils/classes/ezcli.php' );
include_once( 'kernel/classes/ezscript.php' );

$cli =& eZCLI::instance();
$script =& eZScript::instance( array( 'description' => ( "eZ publish Script Executor\n\n" .
                                                         "Allows execution of simple PHP scripts which uses eZ publish functionality,\n" .
                                                         "when the script is called all necessary initialization is done\n" .
                                                         "\n" .
                                                         "ezexec.php myscript.php" ),
                                      'use-session' => true,
                                      'use-modules' => true,
                                      'use-extensions' => true ) );

$script->startup();

$options = $script->getOptions( "",
                                "[scriptfile]",
                                array() );

if ( count( $options['arguments'] ) < 6 ) // six arguments are required; index 5 (user's ID) is read below
{
    $script->shutdown( 1, "Usage of import script:\n SiteAccess, \n XML file, \n datatype, \n remove? (0 - no, 1 - yes) , \n move to trash? (0 - no, 1 - yes),\n user's ID");
    die();
}

$script->setUseSiteAccess($options['arguments'][0]);

$options = $script->getOptions( "",
                                "[scriptfile]",
                                array() );
$script->initialize();


include_once ('extension/importXMLData/modules/importXMLData/importXMLDatafunctioncollection.php');
include_once ('kernel/classes/ezcontentclass.php');
 
$xmldata = file_get_contents($options['arguments'][1]);
importXMLDataFunctionCollection::importXMLData($xmldata, $options['arguments'][2], $options['arguments'][3], $options['arguments'][4], $options['arguments'][5]);

$script->shutdown();
?>

This needs some modifications to the import script:
1. I didn't log in to the system. Instead, I pass the user's ID as a parameter in importXMLDatafunctioncollection.php:

	function &importXMLData( $xmldata, $datatype, $remove, $movetotrash, $userID)

So, you should delete from importXMLDatafunctioncollection.php:

		  $user =& eZUser::currentUser();
		  // set user ID 
		  $userID =& $user->attribute( 'contentobject_id' );

2. Additionally, I added some debug prints to see the progress of the import.
At the beginning of the function:

 
$cli =& eZCLI::instance();

 

Then:

 
		$cli->output( "Preparing list for import" );
		$paths_item = $xPathEngine->match("//$listTag/$itemTag");
		$cli->output("List size: ".count($paths_item));


Then I changed:

 
	$ii=0;  
	foreach ($paths_item as $path_item) 
	{
		$ii++;
		if( $ii % 100 == 0) // plain modulo; bcmod() would require the bcmath extension
		{
			$cli->output("\n imported $ii of ".count($paths_item) );
		}		
		foreach($fieldNameList as $fieldName) 
		{
 

Good luck.
Next, I'm going to make a shell script to import multiple XML documents. I think this is useful when you need to import several thousand records, because xPathEngine uses too much memory.
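
One way to prepare such a multi-file import is to cut the big document into small ones first. The following is only a sketch: splitXmlList() and its parameters are hypothetical names, not part of the extension, and it relies on PHP's XMLReader class, which needs a newer PHP than the one used in this thread. It assumes the items are direct children of the list element and that the item tag name does not occur anywhere else in the document.

```php
<?php
// Hypothetical helper, not part of the ImportXMLData extension.
// Streams $sourceFile with XMLReader and writes files holding at most
// $chunkSize items each, so no file has to be loaded into memory as a whole.
function splitXmlList($sourceFile, $listTag, $itemTag, $chunkSize, $outputPrefix)
{
    $reader = new XMLReader();
    if (!$reader->open($sourceFile)) {
        throw new RuntimeException("Cannot open $sourceFile");
    }

    $files = array();
    $buffer = array();

    // Writes the buffered items as one small, well-formed list document.
    $flush = function () use (&$files, &$buffer, $listTag, $outputPrefix) {
        if (count($buffer) === 0) {
            return;
        }
        $name = sprintf('%s-%04d.xml', $outputPrefix, count($files) + 1);
        $body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<$listTag>\n"
              . implode("\n", $buffer) . "\n</$listTag>\n";
        file_put_contents($name, $body);
        $files[] = $name;
        $buffer = array();
    };

    // Move to the first item element.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $itemTag) {
            $found = true;
            break;
        }
    }

    while ($found) {
        $buffer[] = $reader->readOuterXml();
        if (count($buffer) >= $chunkSize) {
            $flush();
        }
        // Skip the current item's subtree and jump to the next element with that name.
        $found = $reader->next($itemTag);
    }
    $flush();
    $reader->close();

    return $files;
}
```

Each returned file name could then be passed to the command-line import script in turn, for example from a shell loop, so that every run only has to hold one small document in memory.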

Friday 16 December 2005 9:40:57 am

Hi,

Yes, the import is dead slow and yes, XPath swallows all the memory it can find (and more). I modified a few things in Olivier's script to release memory.

I didn't properly benchmark it, but it took very long. In my case it was a one-shot import, so it didn't matter too much.

X+

Friday 03 February 2006 3:02:47 pm

Hey.

I've tried to change the importer so that it's possible to import utf-8 files, but it doesn't work...

$parser = xml_parser_create('UTF-8');

doesn't work.

My problem is that I can't import characters like "ä", "ö", "ü".

Any ideas?? Thanks a lot...

Monday 06 February 2006 8:16:08 am

It should work.
I had a similar problem when I tested this extension. The problem may be that your data file is not saved as UTF-8.
If you use Windows, open the data file with Notepad and choose "Save as" with a different name; the dialog lets you choose the encoding. If UTF-8 is selected by default in the "Save as" dialog, your file is already in UTF-8; if not, save it as UTF-8 and try importing the new file.
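
The same check can be scripted instead of done by hand in an editor. A minimal sketch, where isValidUtf8() is a made-up helper name: it only tells you whether the bytes form well-formed UTF-8, so a plain ASCII file passes too.

```php
<?php
// Hypothetical helper: true if $bytes is well-formed UTF-8.
function isValidUtf8($bytes)
{
    // With the /u modifier PCRE validates the subject as UTF-8 before
    // matching, and preg_match() returns false on invalid input.
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8("\xC3\xA4")); // "ä" encoded as UTF-8
var_dump(isValidUtf8("\xE4"));     // "ä" encoded as ISO-8859-1
```

To test a data file, pass it the result of file_get_contents() on that file. If it returns false, the file is most likely saved in ISO-8859-1 or a Windows code page and should be re-saved as UTF-8 before importing.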

Monday 06 February 2006 9:02:01 am

Hm, thanks for your reply, but it still doesn't work...

btw: I'm using eZ version 3.6.0

Monday 06 February 2006 9:32:19 am

I used it on eZ 3.6.4. What do you see instead of the letters with umlauts?
Maybe UTF-8 is not set in your template?

Monday 06 February 2006 10:05:17 am

Hm, OK, I can't write the symbols here, they get converted into HTML entities... I made a screenshot:

http://www.philip-kahlen.de/import_error.gif

Modified on Monday 06 February 2006 10:16:57 am by Philip K.

Thursday 09 February 2006 8:37:17 am

Do you have this at the top of your templates?

{*?template charset=utf-8?*}

I had several cases where UTF-8 was not displayed because that header was missing.

I put this header in all templates of the xmlimport extension.

But I still think that your data file is in a different encoding. For editing UTF-8 files I recommend Notepad++ from sourceforge.net.

Tuesday 14 February 2006 11:58:54 am

Hello,

I have a little question: is this import system compatible with the ezxmltext format?
