eZ Community » Forums » Developer » Remove huge list of ezUser
expandshrink

Remove huge list of ezUser

Remove huge list of ezUser

Tuesday 29 January 2013 2:03:33 am - 16 replies

I need to get a developer database version from an existing database of a huge site. I figured out that the most heavy data in the ez database are in these tables: ezcontentobject, ezcontentobject_attribute and ezuser.

I need to cancel lots of useless data for a development database. My idea is to create a script that fetch objects and remove them. For this i thought something like this

 eZContentObjectTreeNode::removeSubtrees($deleteIDArray);

I want to do something similar also for ezUser, but i only find this way to cancel a single ezUser one-by-one, that is not very clever for a database with 200k and more users.

foreach ($userList as $user) {$user->removeUser($user->id());}

Does someone has a better way to erase multiple ezUser with a single query?

I also need to know all the tables that can be truncated to save space and performance in a development environment, and any advice related to this problem. Thanks in advice!

Modified on Tuesday 29 January 2013 2:07:04 am by ruben rizzi

Tuesday 29 January 2013 11:38:27 am

Maybe you can first move all unwanted users to a specific group, the remove the node corresponding to that group?

This way you also make sure you're not deleting anon + admin users blunk.gif Emoticon

Tuesday 29 January 2013 1:34:22 pm

Are you sure you wouldn't be better off approaching this from the other way - export whatever content and users you want and import them to a fresh db?

What Gaetano said...

If you have a range or you have the users in specific groups then whatever script you are using should work for users too - they are, in the end, just content objects.

You shouldn't have to truncate anything.  Except maybe the ezsession table if it's a really old version of ezpublish... pre 4.3 I think.

Tuesday 29 January 2013 6:33:10 pm

Consider using Batchtool for this, ref. http://projects.ez.no/batchtool . This may take a lot of time for a big database, but there's really no good way around that when it's the ezcontentobject_attribute table you need to trim down. Tell me how well Batchtool works for a huge database, haven't really stress tested it. happy.gif Emoticon

Just importing specific content doesn't really give a good basis for development, since it's typically the content tree structure, roles, policies, classes, sections, content states and everything else you want to have available when developing and debugging.

Modified on Tuesday 29 January 2013 6:33:55 pm by Arne Bakkebo

Wednesday 30 January 2013 1:56:30 am

Thanks everyone!

Gaetano: it's a good idea, but when i try to delete the User Group node, i have no result, in fact the code:

$node = eZContentObjectTreeNode::fetch( 299 );
$cli->output("Can remove? " . var_export($node->canRemove(), 1) );

display "Can remove? false"

I also try to use the id of the Anonimous User Group node, thinking that this issue depends on the huge number of children, but also in this case i get the same results, so there's another reason

Steven: i like your idea, but it's not very suitable in this scenario: too many different contents, too much criptical mechanism

Arne: i think the site is too complex to use a script to get efficient custom query to reduce database size. But i think i'll give it a try.

Wednesday 30 January 2013 7:35:07 am

You could try eep:
https://github.com/mugoweb/eep

See also:
http://www.mugo.ca/Blog/Command-line-tool-for-eZ-Publish-called-eep

You would do

cd <your ez root>
eep use ezroot .
eep contentclass fetchallinstances user > foo.txt
## here, edit foo.txt to remove lines representing users you want to keep (admin, etc)
cat foo.txt | awk '$1=="|" {print $2}' \
     xargs -IOID eep contentobject delete OID

Almost certainly you will have to dump the users in chunks, so modify that step to something like:

eep contentclass fetchallinstances --limit=10000

And back up the db first!

Wednesday 30 January 2013 6:31:54 pm

About canRemove: all cli scripts run as anon uer by default. Since most permission-checkin in eZ is done at the level of views, the scripts still do work even when doing content-related stuff.

But in this case, it might be better if you pass a username and password to your script, so that it logs you in as admin (it's part of the default options offered by ezscript, if you just find out how to activate it)

Wednesday 30 January 2013 7:53:09 pm

My experimental, optimized suggestion:

  • Do an SQL: SELECT contentobject_id FROM ezcontentobject_tree WHERE path_string LIKE '<your user group path string>%';
  • Then another SQL: DELETE FROM ezontentobject_attribute WHERE contentobject_id IN <all selected contentobject_id>;

Or combine the two, with subqueries or otherwise. Some SQL expert can probably optimize this.

Note that I have not tried this, and there may definitely be unforseen consequences. But if you don't relate to these users in admin, I can't see that it should be a problem.

ezcontentobject_attribute is the one huge, problematic db table, so this procedure should lighten the database somewhat. Feel free to tell us how it's working. happy.gif Emoticon

Wednesday 27 February 2013 5:26:40 pm

I chose batchtool extension, i find it very good. I create a bash script that removes object inside any section and subsection, and also the public users (i deleted about 500k objects). In my example the bash repeat the batchtool command in this format:

 php runcronjobs.php batchtool --filter="fetchnodelist;parent=386532;classname=gossip_gallery_folder:gossip_news:video;offset=20" --operation="nodedelete"

After that i also used flatten.php to remove useless version. After this procedure Phpmyadmin shows the same database size as before (i think that's because is innodb). So i dumped it and i restored it from my dump, and i saved only 1,6 GB in the ezcontentobject_attribute table (from 6 to 4.4).

There's other script like flatten that i can use to reduce database size? Can someone tell me what are the most memory consuming ezobject type? Is there a way to see memory usation for objects of the same class, or the memory used for each section? I really need to figure out why my operation saves only 1/3, 1/4 when I'm expecting a very consistent memory reduction

Modified on Wednesday 27 February 2013 5:29:39 pm by ruben rizzi

Wednesday 27 February 2013 8:41:53 pm

Cool that you tested Batchtool with 500k objects. How much time did that take? happy.gif Emoticon

I think you should have a look at how many records are contained in the other tables in the database. Sounds to me like something is taking up a huge amount of space here.

Modified on Wednesday 27 February 2013 8:42:22 pm by Arne Bakkebo

Wednesday 27 February 2013 9:26:56 pm

ezcontentobject_attribute and ezcontentobject are the big tables, in my scenario these 2 tables occupies more than 50% of database space.

The script took 8 hours to erase 250k users object (with 2g of memory use). Then i launched the other script that deleted 1000 object each for a 90-180m of memory use, 250 times with a batch. I dont know how much time the second part took me because i did other operation, but i think about 4-6 hours

Modified on Wednesday 27 February 2013 9:36:31 pm by ruben rizzi

Thursday 07 March 2013 8:49:36 am

Hi, Ruben.

Don't know if you're still working on this, but for completeness I'll add my own solution to this problem.

I needed a small database for automatic testing, and set up an sql file (attached below) for cleaning up a large bulk of the database. I got the database I've tested it on down to about 5% of original size, which is ok, but it still takes about 15 seconds to import it, which amounts to a lot if we use it on a lot of tests. It's an acceptable beginning though.

Within the sql, you need to specify a list of contentclass_id's for which all content objects can be obliterated. Also, you need to specify a list of path_string's to obliterate a complete subtree in the tree structure.

The sql will also take care of removing all versions of the content objects except the current published one, and will remove all objects without a node and all nodes without an object (and related tables), as well as truncate a bunch of tables you possibly don't require while testing.

In my project I'm using this in combination with Batchtool, to be able to remove some of the content objects but not all of them (that's not trivial to do with a line of sql), so that I have some left to test on for strategic classes. I noticed that the kernel function used by Batchtool to delete an object apparently only deletes the node, despite the inline comments claiming otherwise. My sql file takes care of that though.

 /* SQL file to make a clean and small database for testing purposes */
/* Do not run on production server!!! The consequences will be severe!!! */
 
/* Unless you need to test functionality related to object states */
TRUNCATE ezcobj_state;
TRUNCATE ezcobj_state_group;
TRUNCATE ezcobj_state_group_language;
TRUNCATE ezcobj_state_language;
TRUNCATE ezcobj_state_link;
 
/* Unless you need to test functionality related to collaboration */
TRUNCATE ezcollab_group;
TRUNCATE ezcollab_item;
TRUNCATE ezcollab_item_group_link;
TRUNCATE ezcollab_item_message_link;
TRUNCATE ezcollab_item_participant_link;
TRUNCATE ezcollab_item_status;
TRUNCATE ezcollab_notification_rule;
TRUNCATE ezcollab_profile;
TRUNCATE ezcollab_simple_message;
 
/* Can be removed if eZ Find is used (or search is not tested) */
TRUNCATE ezsearch_object_word_link;
TRUNCATE ezsearch_return_count;
TRUNCATE ezsearch_search_phrase;
TRUNCATE ezsearch_word;
 
/* Unless you need to test functionality related to eZ Survey */
TRUNCATE ezsurvey;
TRUNCATE ezsurveymetadata;
TRUNCATE ezsurveyquestion;
TRUNCATE ezsurveyquestionmetadata;
TRUNCATE ezsurveyquestionresult;
TRUNCATE ezsurveyrelatedconfig;
TRUNCATE ezsurveyresult;
 
TRUNCATE ezuservisit;
TRUNCATE ezimagefile;
TRUNCATE ezoperation_memento;
TRUNCATE ezpending_actions;
TRUNCATE ezprest_token;
TRUNCATE ezcontentobject_trash;
TRUNCATE ezcontentbrowsebookmark;
TRUNCATE ezcontentbrowserecent;
TRUNCATE ezforgot_password;
TRUNCATE ezpreferences;
 
/* Clean up any versions not matching current content version */
DELETE FROM t1 USING ezcontentobject_attribute AS t1 LEFT JOIN ezcontentobject AS object ON object.id = t1.contentobject_id WHERE  t1.version != object.current_version;
DELETE FROM t1 USING ezcontentobject_version AS t1 LEFT JOIN ezcontentobject AS object ON object.id = t1.contentobject_id WHERE  t1.version != object.current_version;
DELETE FROM t1 USING ezcontentobject_name AS t1 LEFT JOIN ezcontentobject AS object ON object.id = t1.contentobject_id WHERE  t1.content_version != object.current_version;
DELETE FROM t1 USING eznode_assignment AS t1 LEFT JOIN ezcontentobject AS object ON object.id = t1.contentobject_id WHERE  t1.contentobject_version != object.current_version;
 
/* Remove any content under specified subtrees that is not required for testing */
DELETE FROM ezcontentobject_tree WHERE path_string LIKE '/1/2/16622/%'; /* Drafts and imported */
 
/* Clean up any objects not referenced by any nodes */
DELETE FROM object USING ezcontentobject AS object LEFT JOIN ezcontentobject_tree AS node ON object.id = node.contentobject_id WHERE node.contentobject_id IS NULL;
 
/* Remove any content based on classes not required for testing */
DELETE FROM ezcontentobject WHERE contentclass_id = 16; /* article */
DELETE FROM ezcontentobject WHERE contentclass_id = 33; /* image */
DELETE FROM ezcontentobject WHERE contentclass_id = 28; /* file */
DELETE FROM ezcontentobject WHERE contentclass_id = 158; /* file_notified */
DELETE FROM ezcontentobject WHERE contentclass_id = 34; /* link */
DELETE FROM ezcontentobject WHERE contentclass_id = 43; /* event */
 
/* Clean up any records relating to objects that does not exist any more */
DELETE FROM t1 USING ezcontentobject_attribute AS t1 LEFT JOIN ezcontentobject AS object ON t1.contentobject_id = object.id WHERE object.id IS NULL;
DELETE FROM t1 USING ezcontentobject_tree AS t1 LEFT JOIN ezcontentobject AS object ON t1.contentobject_id = object.id WHERE object.id IS NULL;
DELETE FROM t1 USING ezcontentobject_version AS t1 LEFT JOIN ezcontentobject AS object ON t1.contentobject_id = object.id WHERE object.id IS NULL;
DELETE FROM t1 USING ezcontentobject_name AS t1 LEFT JOIN ezcontentobject AS object ON t1.contentobject_id = object.id WHERE object.id IS NULL;
DELETE FROM t1 USING ezcontentobject_link AS t1 LEFT JOIN ezcontentobject AS object ON t1.from_contentobject_id = object.id WHERE  object.id IS NULL;
DELETE FROM t1 USING ezcontentobject_link AS t1 LEFT JOIN ezcontentobject AS object ON t1.to_contentobject_id = object.id WHERE  object.id IS NULL;
DELETE FROM t1 USING ezcontentobject_link AS t1 LEFT JOIN ezcontentobject AS object ON t1.from_contentobject_id = object.id AND t1.from_contentobject_version = object.current_version WHERE object.id IS NULL;
DELETE FROM node USING eznode_assignment AS node LEFT JOIN ezcontentobject AS object ON node.contentobject_id = object.id WHERE  object.id IS NULL;
DELETE FROM user USING ezuser AS user LEFT JOIN ezcontentobject AS object ON user.contentobject_id = object.id WHERE  object.id IS NULL;
DELETE FROM user USING ezuser_accountkey AS user LEFT JOIN ezcontentobject AS object ON user.user_id = object.id WHERE  object.id IS NULL;
DELETE FROM user USING ezuser_setting AS user LEFT JOIN ezcontentobject AS object ON user.user_id = object.id WHERE  object.id IS NULL;
DELETE FROM t1 USING ezurlalias_ml AS t1 LEFT JOIN ezcontentobject_tree AS node ON SUBSTR( t1.action, 8 ) = node.node_id WHERE  node.node_id IS NULL;
DELETE FROM file USING ezbinaryfile AS file LEFT JOIN ezcontentobject_attribute AS attribute ON file.contentobject_attribute_id = attribute.id WHERE  attribute.id IS NULL;
DELETE FROM keyword USING ezkeyword_attribute_link AS keyword LEFT JOIN ezcontentobject_attribute AS attribute ON keyword.objectattribute_id = attribute.id WHERE  attribute.id IS NULL;
DELETE FROM keyword USING ezkeyword AS keyword LEFT JOIN ezkeyword_attribute_link AS attribute ON keyword.id = attribute.keyword_id WHERE  attribute.id IS NULL;
DELETE FROM rating USING ezstarrating AS rating LEFT JOIN ezcontentobject_attribute AS attribute ON rating.contentobject_attribute_id = attribute.id WHERE  attribute.id IS NULL;
DELETE FROM rating USING ezstarrating_data AS rating LEFT JOIN ezcontentobject_attribute AS attribute ON rating.contentobject_attribute_id = attribute.id WHERE  attribute.id IS NULL;
DELETE FROM url USING ezurl_object_link AS url LEFT JOIN ezcontentobject_attribute AS attribute ON url.contentobject_attribute_id = attribute.id WHERE  attribute.id IS NULL;
DELETE FROM url USING ezurl AS url LEFT JOIN ezurl_object_link AS url_link ON url.id = url_link.url_id WHERE  url_link.url_id IS NULL;
 
/* Clean up url alias ml increment table, need to set latest record value here */
DELETE FROM ezurlalias_ml_incr WHERE id < 103037;
 
/* Unless you need to test functionality related to eZ XML Export */
TRUNCATE ezxport_available_cclass_attr;
TRUNCATE ezxport_available_cclasses;
TRUNCATE ezxport_customers;
TRUNCATE ezxport_export_object_log;
TRUNCATE ezxport_exports;
DELETE FROM ezxport_process_logs;
 
/* Unless you need to test functionality related to SQLIImport */
TRUNCATE sqliimport_item;
TRUNCATE sqliimport_scheduled;

Wednesday 13 March 2013 3:19:12 pm

Thanks Arne Bakkebo, your script saves me some GB!

I have another question, i see that you truncate table ezsearch_object_word_link. That one is very big: can i simply truncate it or there's some other table dependencies?

ezurlalias, ezurlalias_ml, ezurlalias_ml_bcknewurls, ezurlalias_ml_incr, ezurlalias_ml_witholdurls are very big too, i found some other topic saying that you can truncate ezurlalias_ml and then use updateniceurls.php to regenerate it: but if i want truncate ALL of them, i mess up my db?

ezimagefile table is the biggest, i can simply truncate it, but i prefer deleting all the reference not linked to the corresponding ezcontentobject_attribute. Does this query is correct for you?

 DELETE FROM t1 USING ezimagefile AS t1 LEFT JOIN ezcontentobject_attribute AS object ON t1.contentobject_attribute_id = object.id WHERE object.id IS NULL;

Thursday 14 March 2013 2:52:16 am

If you use ezfind, you should drop storage of table ezsearch_object_word_link, as it is not used anymore

Thursday 14 March 2013 8:04:48 pm

Hi, Ruben.

All ezsearch_* tables relate to the built in/default search in eZ Publish. As Gaetano says, if you're using eZ Find (as you probably are on a big site), they are no longer in use and can be removed. Even if you're not, the tables can be truncated and you can reindex the content after reducing the database with bin/php/updatesearchindex.php.

You will not mess up your database if truncating ezurlalias_ml table, but you won't be able to use nice urls (only urls on the format /content/view/full/xxx). And you can recreate the nice urls by running the script you mention. However, eZ also stores any historic urls for content objects that have been renamed, and redirects them to the new url. These older urls will be lost if you truncate the ezurlalias_ml table, which may cause Google and bookmarked links to fail.

Your ezimagefile query looks sensible to me. Try it. But not on production database! happy.gif Emoticon

How much are you able to reduce your 6GB database now?

Modified on Thursday 14 March 2013 8:07:51 pm by Arne Bakkebo

Tuesday 19 March 2013 11:27:25 am

No only ezcontentobject_attribute was 6 gb happy.gif Emoticon
With batchtool deleting most of the backend visible content, with trash.php, with flatten.php, with cleanup.php, and with the manual table truncation, i reduced database from 14,6GB to 8,6GB

ezcontentobject_attribute table from 6,4gb to 4,6gb

It's not a bad result, but sadly is still unacceptable for a development purpose

Monday 20 October 2014 5:07:27 am

The problem is that batchtool is not optimized for removing content. I will create a batch that will eliminate all the recurrences soon, so hopefully the community could benefit from that.

expandshrink

You must be logged in to post messages in this topic!

36 542 Users on board!

Forums menu

Proudly Developed with from