eZ Community » Forums » Install & configuration » update to 3.8 and codepage problems
expandshrink

update to 3.8 and codepage problems

update to 3.8 and codepage problems

Thursday 04 May 2006 1:06:23 pm - 13 replies

Hi,

Have updated a website from exp 3.7.4 to 3.8 and now I have problems (probably) related with codepage.
I have ç, Ã, ó, ê instead of ç, à, ó, ê.

Before update to 3.8, have updated the database from "latin1_swedish_ci" collation to "utf-8" collation on all tables.
The new table (3.7.4 with utf-8) works fine in the 3.7.4 instalation without any problems.

Have problems just after update the database to 3.8

What is wrong ?

axel

Thursday 04 May 2006 1:55:28 pm

It seems like your eZ installation doesn't know the data in the database is UTF8.

Do you have this in site.ini.append:

[DatabaseSettings]
Charset=utf8

Thursday 04 May 2006 2:59:27 pm

yes, I have utf8 enabled in the site.ini.append.php

And I guess is not
Charset=utf8
but
Charset=utf-8

Both sites (ezp 3.7.4 and 3.8) have this setting enabled.

Works fine n 3.7.4 but not in 3.8

Also have Charset=utf-8 in i18n.ini.append.php

Modified on Thursday 04 May 2006 3:02:33 pm by Alexandre Cunha

Thursday 04 May 2006 3:05:58 pm

Do you see the message <b>SET NAMES 'utf8'</b> in the debug output?

Thursday 04 May 2006 3:15:39 pm

Thanks for your help Kristof,

Yes, I see <b>SET NAMES 'utf8'</b> on both sites (ezp 3.7.4 and 3.8).

What means that ?

Modified on Thursday 04 May 2006 3:16:32 pm by Alexandre Cunha

Thursday 04 May 2006 3:41:50 pm

That means that it's really using UTF8 to read the data from your database. So we aren't any further yet happy.gif Emoticon

Do you have the problem with those characters with both simple text line/text field attributes, as well as with ezxmltext attributes?

Thursday 04 May 2006 4:42:29 pm

 

Do you have the problem with those characters with both simple text line/text field attributes, as well as with ezxmltext attributes?
 

Nice question ....

Seems the problem is only in [XML block] (Message in the weblog class, Intro and Body in the Article class)
Titles (weblog and Article classes) are not affected by this problem.

Your question guides me to the solution ...

Looking to field "data_text" in the table "ezcontentobject_attribute":

some "data_text" begins with:

<?xml version="1.0" encoding="UTF-8"?>"

And this seams the problem because all new content have in the "data_text":

 	<?xml version="1.0" encoding="utf-8"?>

Have replace on sample content "iso-8859-1" by "utf-8" and solves the problem.

The next problem is: How to replace all "iso-8859-1" by "utf-8" ?

Modified on Thursday 04 May 2006 4:54:29 pm by Alexandre Cunha

Thursday 04 May 2006 6:10:45 pm

Some time ago I've written a command line script which we used to convert all our eZ databases to UTF8. It is made for running on a latin-1 (default) encoded database, so you will need to run it on the initial database. We've used it with success on 5 sites (2 rather big and 3 small ones).

First it goes looking for all attributes who use utf8 encoded XML in the db. It will fetch them and reinsert their content into the db with latin-1 encoding.

Then it uses the SQL command "CONVERT TO CHARACTER SET utf8" on all tables, which will convert all data to UTF8.

As a final step, it again goes through all attributes looking for latin-1 encoded XML (the xml encoding attribute is not right now, since we converted everything to utf8 in the previous step). It will replace the iso-8859-1 string with utf-8 and then reinserts the data. I know the way this step is done now is not 100% safe, because it replaces every instance of "iso-8859-1" with "utf-8". But that was no issue for us.

#!/usr/bin/env php
<?php

include_once( 'lib/ezutils/classes/ezcli.php' );
include_once( 'kernel/classes/ezscript.php' );
include_once( 'lib/ezdb/classes/ezdb.php' );

$cli =& eZCLI::instance();
$script =& eZScript::instance( array( 'description' => ( "Changes your eZ publish database tables to use UTF8" ),
                                      'use-session' => false,
                                      'use-modules' => false,
                                      'use-extensions' => true ) );

$script->startup();

$options = $script->getOptions();

$db =& eZDB::instance();

// first store XML saved as utf-8 back as iso-8859-1
$selectSQL = "SELECT id, version, data_text FROM ezcontentobject_attribute where data_text LIKE '<?xml%utf-8%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    include_once( 'lib/ezxml/classes/ezxml.php' );

    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text'];

        $xml = new eZXML();
        $doc =& $xml->domTree( $value );

        if ( $doc )
        {
            $value = $doc->toString( 'iso-8859-1' );
            $success = $db->query( "UPDATE ezcontentobject_attribute SET data_text='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

            if ( !$success )
            {
                $cli->output( 'unable to update the db.' );
            }
        }
        else
        {
            $cli->output( 'error while creating DOM document' );
        }
    }

    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$selectSQL = "SELECT id, version, data_text5 FROM ezcontentclass_attribute where data_text5 LIKE '<?xml%utf-8%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    include_once( 'lib/ezxml/classes/ezxml.php' );

    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting class attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text5'];

        $xml = new eZXML();
        $doc =& $xml->domTree( $value );

        if ( $doc )
        {
            $value = $doc->toString( 'iso-8859-1' );
            $success = $db->query( "UPDATE ezcontentclass_attribute SET data_text5='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

            if ( !$success )
            {
                $cli->output( 'unable to update the db.' );
            }
        }
        else
        {
            $cli->output( 'error while creating DOM document' );
        }
    }

    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$tables = $db->arrayQuery( 'SHOW tables' );

foreach ( $tables as $table )
{
    $tableName = false;
    
    foreach( array_keys( $table ) as $i )
    {
         $tableName = $table[$i];
         break;  
    }
    
    if ( $tableName )
    {
        $cli->output( 'Changing table: ' . $tableName );
        $db->query( 'ALTER TABLE ' . $db->escapeString( $tableName ) . ' CONVERT TO CHARACTER SET utf8' );
    }
}

$selectSQL = "SELECT id, version, data_text FROM ezcontentobject_attribute where data_text LIKE '<?xml%iso-8859-1%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text'];
        $value = str_replace( 'iso-8859-1', 'utf-8', $value );
        $success = $db->query( "UPDATE ezcontentobject_attribute SET data_text='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

        if ( !$success )
        {
            $cli->output( 'unable to update the db.' );
        }
    }
    
    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$selectSQL = "SELECT id, version, data_text5 FROM ezcontentclass_attribute where data_text5 LIKE '<?xml%iso-8859-1%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );

while ( count( $result ) > 0 )
{
    foreach ( array_keys( $result ) as $i )
    {
        $cli->output( 'converting class attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
        $value = $result[$i]['data_text5'];
        $value = str_replace( 'iso-8859-1', 'utf-8', $value );
        $success = $db->query( "UPDATE ezcontentclass_attribute SET data_text5='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );

        if ( !$success )
        {
            $cli->output( 'unable to update the db.' );
        }
    }
    
    unset( $result );
    $result = $db->arrayQuery( $selectSQL );
}

$script->shutdown();

?>

If you want to run something similar on your database as it is now, you can try to use only the last step.

Modified on Thursday 04 May 2006 6:30:34 pm by Kristof Coomans

Thursday 04 May 2006 9:02:30 pm

Very nice script. Have you sumbit it in the "contibutions" section of this site ?

Because my problem is only a "find and replace" solution, I have do this:

 

    UPDATE ezcontentobject_attribute SET data_text = REPLACE(data_text, 'encoding="iso-8859-1"?>', 'encoding="utf-8"?>')
 

I put this here because can help other users.

Kristof, thank you very mutch for your help.

axel

Friday 05 May 2006 8:41:52 am

No, I didn't post it yet as a contribution. I first wanted to see what you thought about it happy.gif Emoticon

Your SQL query is much better (and faster) than my step 3 blunk.gif Emoticon Maybe we can change my script a bit so it uses your query instead of mine, and then post it as a contribution?

Thursday 11 May 2006 2:16:58 pm

Maybe I ill test it in the next 2 or 3 weeks when I update another site from ezp3.7 to ezp3.8

Maybe interest other users.

 

Tuesday 16 May 2006 1:49:26 pm

Hi,

I'm very interested in such a contribution happy.gif Emoticon

Tuesday 19 August 2008 6:22:14 pm

hi Kristof,

I am using your script to do the uft-8 conversion while upgrading 3.6.1 to 3.8.0. Scripts seems fine, no problems at all.

But in the debug on both adminstration and the public websites I am geting following message.

SET NAMES 'utf8'

I am running the script with the following sequence.

1. Changing the Charset=utf-8 in override/i18n.ini.append.php
[CharacterSettings]
Charset=utf-8

2. Leaving the Charset empty in override/site.ini.append.php
[DatabaseSettings]
Charset=

3. Then running the utf8 conversion script.

4. Running the other upgrade scripts for upgrading to 3.8.0

Can you please guide me, I am still getting the debug Notice.....

Cheers.

Modified on Tuesday 19 August 2008 7:07:40 pm by John Smith

Wednesday 20 August 2008 1:01:02 pm

Anyone please????

expandshrink

You must be logged in to post messages in this topic!

36 542 Users on board!

Forums menu

Proudly Developed with from