eZ Community » Forums » General » Convert from iso-8859-1 encoding to...
expandshrink

Convert from iso-8859-1 encoding to utf-8

Convert from iso-8859-1 encoding to utf-8

Thursday 18 May 2006 3:45:30 pm - 19 replies

Hi,

I have searched the forums and documentation on how to convert my database from iso-8859-1 encoding to utf-8 without luck.

Can someone give me some pointers to where such info is located?

The site is running eZ Publish 3.8 on Apache 2.0.55, PHP 4.4.2 and MySQL 4.1.19

Modified on Tuesday 11 January 2011 9:56:57 am by Nicolas Pastorino

Thursday 18 May 2006 5:46:59 pm

Hi Hans-Henry

You probably know about the script I posted in the forum ( http://ez.no/community/forum/inst.../update_to_3_8_and_codepage_problems ).

You can save it into bin/php/dbconvertutf8.php and then run it like any other CLI script that comes with eZ publish. Of course run it on a test installation first, do not immediately use it on your live portal(s) blunk.gif Emoticon

Then in i18n.ini.append, configure the right charset:

[CharacterSettings]
Charset=utf8

And make sure that in site.ini.append, the Charset setting under the DatabaseSettings group is left empty (then the charset from i18n will be used):

[DatabaseSettings]
Charset=

Let us know if it worked.

Good luck!

Friday 19 May 2006 8:12:05 am

Hi,

I had similar problems as Alexandre Cunha had in the post you suggested.
The Norwegian letters Æ, Ø and Å were similar to ç, Ã, ó so I had to revert to a backup version of the database. I had cleared cache and debug reported SET NAMES 'utf8'.

Any suggestions how to fix this?

Friday 19 May 2006 8:43:59 am

First revert the charset configuration. Then run the script. And finally set the charset back to utf-8.

Tuesday 06 June 2006 9:28:24 am

hallo,

how can I write in Greek ( Greece ) letters?

Tuesday 06 June 2006 10:42:09 am

Hello George

Welcome to the eZ community!

What exactly do you mean with "writing in Greece"? Do you want the user to be able to input Greece characters into content attributes? Please start another topic then and try to explain clearly what you want to achieve. This topic is about converting an existing eZ publish database from iso-8859-1 to utf-8.

Thank you!

Friday 23 June 2006 9:32:26 pm

Hi Hans-Henry

If you are still having difficulties with this, you might like to try this sql which if applied to a 3.8 site does the entire conversion. It's rather nice in that it does the whole thing with sql. I've run this on a number of installs now to convert to utf-8 and it's worked for me. After some testing, I just ran it on a 4-language site and it has converted everything cleanly. I think it should be easily adaptable to 3.7, just make sure all the tables are listed, although I haven't tried it.

Of course, make sure you keep backups!

ALTER TABLE ezapprove_items CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezbasket CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezbinaryfile CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_group CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_group_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_message_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_participant_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_status CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_notification_rule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_profile CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_simple_message CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentbrowsebookmark CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentbrowserecent CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclass CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclassgroup CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclass_attribute CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclass_classgroup CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_attribute CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_tree CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_version CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontent_language CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcurrencydata CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezdiscountrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezdiscountsubrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezdiscountsubrule_value CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezenumobjectvalue CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezenumvalue CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezforgot_password CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezgeneral_digest_user_settings CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezimage CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezimagefile CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezimagevariation CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezinfocollection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezinfocollection_attribute CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezkeyword CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezkeyword_attribute_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmedia CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmessage CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmodule_run CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmultipricedata CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznode_assignment CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznotificationcollection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznotificationcollection_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznotificationevent CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezoperation_memento CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder_status CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder_status_history CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpackage CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpaymentobject CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpdf_export CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpending_actions CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpolicy CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpolicy_limitation CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpolicy_limitation_value CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpreferences CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcategory CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcollection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcollection_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcollection_item_opt CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrole CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrss_export CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrss_export_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrss_import CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_object_word_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_return_count CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_search_phrase CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_word CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsession CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsite_data CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsubtree_notification_rule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eztipafriend_counter CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eztipafriend_request CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eztrigger CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezurl CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezurlalias CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezurl_object_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuservisit CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_accountkey CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_discountrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_role CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_setting CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezvatrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezvatrule_product_category CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezvattype CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezview_counter CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezwaituntildatevalue CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezwishlist CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_assign CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_event CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_group CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_group_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_process CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
UPDATE ezcontentobject_attribute SET data_text = REPLACE(data_text, 'xml version="1.0" encoding="UTF-8"','xml version="1.0" encoding="UTF-8"');

To use:

1. Copy above code into a file e.g. 'ezutf8.sql' on your server
2. run the command:

mysql -u USERNAME -p DATABASENAME < ezutf8.sql

(replace USERNAME and DATABASE with your database connection settings - you will be prompted for the password)

3. Change your ini settings as described above.

4. Clear all caches.

Alternatively, you could paste the sql into phpmyadmin, although for large operations I prefer to use the command line.

Good luck
Matthew

Friday 29 December 2006 5:01:43 am

I have successfully used rather crude approach to convert from latin1 to utf-8. Has worked for me though.
Basicly:

1. make a database dump with mysqldump
2. use iconv to convert the dump
3. use sed (or some other replacement tool) to convert table creation passages in it to charset utf8
4. use sed to change back double-converted characters. This is necessary because ezp (3.6 at least) apparently saves fields in mixed charsets depending on browser settings of user, so even if site is set up to latin, database often also contains utf chars.
5. import the dump.

Here is a portion of a longer script i use to semiautomatically convert 3.6.2 sites to 3.8.3 :

iconv -f LATIN1 -t UTF-8 $SITENAME.sql > $SITENAME.sql.iconv
sed -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8 COLLATE=utf8_bin/g' \
    -e 's/encoding=\\"iso-8859-1\\"/encoding=\\"UTF-8\\"/g' \
    -e 's/\xC3\x83\xC2\(.\)/\xC3\1/g' \
    -e 's/\xC3\x82\xC2\(.\)/\xC2\1/g' \
    -e 's/\xc3\xa2\xc2\x80\xc2\(.\)/\xe2\x80\1/g' \
    $SITENAME.sql.iconv > $SITENAME.sql.iconv.utf8

Modified on Friday 02 March 2007 5:53:24 pm by zurgutt -

Friday 16 February 2007 11:51:04 pm

Hi,

with serialized fields recently introduced in ezp, you also have to take care of the length of serialized strings when you translate from latin1 to utf8 and vice versa. For example, the following serialized field is false after translating from utf8 to latin1 because the length of Abrégé is 6 and not 8 anymore :
a:2:{s:6:"fre-FR";s:8:"Abrégé";s:16:"always-available";s:6:"fre-FR";}
The correct one is
a:2:{s:6:"fre-FR";s:6:"Abrégé";s:16:"always-available";s:6:"fre-FR";}

For me, that was the reason why, after migration, some attributes name had vanished when displaying objects.

The fields concerned by that issue are at least, serialized_name_list in ezcontentclass_attribute and ezcontentclass tables.

Cheers
Berthy

Monday 19 February 2007 1:59:48 am

oh, crap sad.gif Emoticon

This would be impossible to fix in some dump conversion script.. and massaging the dump seems to be only conceivable way for total charset-upgrade that i can see.. so one just has to light lots of black candles, sprinkle around goat blood and pray that someone has not used accented chars in any names :P

Monday 19 February 2007 11:20:01 am

Hi zurgutt,

Could you share with us the script you mention or is that too specific ?

X+

Monday 19 February 2007 12:27:40 pm

Hi guys,

you'll find here http://cpjc.free.fr/index.php?ezpublish what I did to migrate FROM utf-8 TO iso-8859-1. I know the topic is the contrary but that may provide guidance, well I hope happy.gif Emoticon

Cheers
Berthy

Modified on Monday 19 February 2007 12:28:05 pm by Christophe Berthelé

Tuesday 20 February 2007 12:42:46 am

Xavier, look a couple of posts up, the bit with iconv and sed, thats about it.

If you mean the full 3.6 to 3.8 script then rest of it is quite setup specific - creates directory for new site, symlinks as much as possible from central location (kernel, base styles etc), copies necessary dirs from 3.6 (see upgrading instructions), runs the database charset conversion and import, then runs official upgrade scripts. If anyone wants it, ask me on #ezpublish.

zrg

Modified on Tuesday 20 February 2007 3:27:00 am by zurgutt -

Tuesday 20 February 2007 9:39:13 am

Hi,

I've upgraded from 3.6-site with iso-8859-1 on apache 1.3 and mysql 4, to 3.8-site with utf-8 on apache 2.2 and mysql 5.

I made a documentation, of what I did, and scripts I used to do the charset conversion etc.

Click and read happy.gif Emoticon
http://home.hit.no/~jonnybe/eZ/docs/upgrading-to-3.8-experiences.pdf

-Jonny

Tuesday 20 February 2007 12:28:01 pm

Many thanks jonny.

May I suggest you to create a page about it on http://www.ezpedia.org ?

X+

Friday 30 November 2007 2:57:20 pm

Hi Jonny

I discovered there's some unneeded SQL in your tutorial:

UPDATE ezcontentobject_name SET name = REPLACE(name, 'xml version="1.0" encoding="iso-8859-1"','xml version="1.0" encoding="UTF-8"');

The name column of the ezcontentobject_name table does not contain any serialized xml.

Tuesday 18 December 2007 5:09:07 pm

---

Modified on Wednesday 19 December 2007 11:47:38 am by laurent le cadet

Wednesday 25 March 2009 6:51:35 am

Hi all,
I m a newbie here thought of joining this forum because find it nice, this forum interest me I personally liked it.

Friday 08 January 2010 8:02:30 am

The reply has been removed because of violation of forum rules.

Tuesday 20 April 2010 8:00:55 am

The reply has been removed because of violation of forum rules.

expandshrink

You must be logged in to post messages in this topic!

36 542 Users on board!

Forums menu

Proudly Developed with from