
Re: xml, minidom, ElementTree

Started by: Stefan Behnel <stefan_ml@behnel.de>
First post: 2011-12-14 08:22 +0100
Last post: 2011-12-14 08:22 +0100
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#17183 — Re: xml, minidom, ElementTree

From: Stefan Behnel <stefan_ml@behnel.de>
Date: 2011-12-14 08:22 +0100
Subject: Re: xml, minidom, ElementTree
Message-ID: <mailman.3627.1323847363.27778.python-list@python.org>

Terry Reedy, 14.12.2011 06:01:
> On 12/13/2011 6:21 PM, Ethan Furman wrote:
>> In the near future I will need to parse and rewrite parts of XML files
>> created by a third-party program (PrintShopMail, for the curious).
>> It contains both binary and textual data.
>>
>> There has been some strong debate about the merits of minidom vs
>> ElementTree.
>>
>> Recommendations?
>
> People's reactions to the DOM interface seem quite varied, with a majority,
> perhaps, being negative. I personally would look at both enough to
> understand the basic API model to see where *I* fit.

The API is one thing, yes, but there's also the fact that MiniDOM doesn't 
scale. If your XML files are of a notable size (a couple of MB), MiniDOM 
may simply not be able to handle them. I collected some numbers in a blog 
post. Note that this is using a recent CPython 3.3 build which has an 
optimised Unicode string implementation, thus yielding lower memory 
requirements on average than Py2.x.

http://blog.behnel.de/index.php?p=197

MiniDOM needs 5-10 times as much memory as cElementTree, which puts its
footprint at about two orders of magnitude above the size of the
serialised file. You may be able to stuff one such file into memory, but
you'll quickly get into trouble when you try to do parallel processing or
otherwise use more than one document at a time.
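
For anyone who wants to check the memory difference on their own data, here is a rough sketch. It is not the benchmark from the blog post. It assumes Python 3.4 or later (tracemalloc did not exist when this was written), and it uses xml.etree.ElementTree, which has picked up the C accelerator automatically since 3.3. The make_document() and measure() helpers and the synthetic <item> data are made up for illustration; substitute a real file to get meaningful numbers.

```python
import tracemalloc
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET


def make_document(n_items=2000):
    """Build a synthetic XML document as bytes (hypothetical test data)."""
    rows = "".join(
        '<item id="%d"><name>item %d</name><value>%d</value></item>' % (i, i, i * 2)
        for i in range(n_items)
    )
    return ("<root>" + rows + "</root>").encode("utf-8")


def measure(parse, data):
    """Parse data and return (tree, bytes still allocated after parsing)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    tree = parse(data)  # keep a reference so the tree stays alive
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return tree, after - before


data = make_document()
dom, dom_bytes = measure(minidom.parseString, data)
et_root, et_bytes = measure(ET.fromstring, data)

print("serialised size: %d bytes" % len(data))
print("minidom tree:    %d bytes" % dom_bytes)
print("ElementTree:     %d bytes" % et_bytes)
```

tracemalloc only sees allocations made through Python's allocators, so treat the output as an approximation of the in-memory tree size rather than the process footprint. The exact ratio will depend on the Python version and on the shape of the document.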

Stefan
