Re: xml, minidom, ElementTree

From Stefan Behnel <stefan_ml@behnel.de>
Subject Re: xml, minidom, ElementTree
Date Wed, 14 Dec 2011 08:22:25 +0100
Newsgroups comp.lang.python
Message-ID <mailman.3627.1323847363.27778.python-list@python.org>

Terry Reedy, 14.12.2011 06:01:
> On 12/13/2011 6:21 PM, Ethan Furman wrote:
>> In the near future I will need to parse and rewrite parts of XML files
>> created by a third-party program (PrintShopMail, for the curious).
>> It contains both binary and textual data.
>>
>> There has been some strong debate about the merits of minidom vs
>> ElementTree.
>>
>> Recommendations?
>
> People's reactions to the DOM interface seem quite varied, with a majority,
> perhaps, being negative. I personally would look at both enough to
> understand the basic API model to see where *I* fit.
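
To make the API comparison concrete, here is a minimal sketch of the same small edit done through both interfaces. The document structure (`job`, `field`, the `name` attribute) and the helper names are hypothetical and chosen only for illustration; the real PrintShopMail format is not shown in this thread.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the actual PrintShopMail format.
SAMPLE = '<job><field name="title">Draft</field><field name="owner">Ethan</field></job>'


def retitle_with_minidom(xml_text, new_title):
    """Replace the text of the field named 'title' using the DOM interface."""
    doc = minidom.parseString(xml_text)
    for field in doc.getElementsByTagName("field"):
        if field.getAttribute("name") == "title":
            # In the DOM, element text lives in a separate child Text node.
            field.firstChild.data = new_title
    return doc.documentElement.toxml()


def retitle_with_elementtree(xml_text, new_title):
    """Replace the text of the field named 'title' using ElementTree."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") == "title":
            # In ElementTree, leading text is an attribute of the element itself.
            field.text = new_title
    return ET.tostring(root, encoding="unicode")


dom_result = retitle_with_minidom(SAMPLE, "Final")
et_result = retitle_with_elementtree(SAMPLE, "Final")
print(dom_result)
print(et_result)
```

The main difference in feel is where text lives: the DOM models it as child nodes, while ElementTree attaches it to the element as `.text` (and `.tail` for text that follows an element).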

The API is one thing, yes, but there's also the fact that MiniDOM doesn't
scale in terms of memory. If your XML files are of a notable size (a couple
of MB), MiniDOM may simply not be able to handle them. I collected some
numbers in a blog post. Note that they were measured with a recent CPython
3.3 build, which has an optimised Unicode string implementation and thus
lower memory requirements on average than Py2.x.

http://blog.behnel.de/index.php?p=197

MiniDOM's memory consumption is higher than cElementTree's by a factor of
5-10, which makes its in-memory tree two orders of magnitude larger than the
serialised file. You may be able to stuff one such file into memory, but
you'll quickly get into trouble when you try to do parallel processing or
otherwise use more than one document at a time.
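
A rough way to check the memory comparison above on your own data is to parse the same bytes with both libraries and record the peak allocation. This is a minimal sketch, not the method used for the blog post numbers: it assumes a modern Python where `tracemalloc` is available and where `xml.etree.ElementTree` uses its C accelerator automatically (the separate `cElementTree` module was folded in as of Python 3.3). The synthetic document and the helper names are hypothetical, and `tracemalloc` only sees allocations made through Python's allocators, so treat the output as an estimate.

```python
import tracemalloc
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

N_ITEMS = 5000  # arbitrary size for the synthetic document


def make_sample_xml(n_items):
    """Build a synthetic XML document (hypothetical structure, illustration only)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append(
            '<item id="%d"><name>item %d</name><value>%d</value></item>' % (i, i, i * 2)
        )
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def peak_memory_while_parsing(parse, data):
    """Return (parsed tree, peak traced bytes while parse(data) ran)."""
    tracemalloc.start()
    try:
        tree = parse(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return tree, peak


data = make_sample_xml(N_ITEMS)
dom, dom_peak = peak_memory_while_parsing(minidom.parseString, data)
root, et_peak = peak_memory_while_parsing(ET.fromstring, data)

print("serialised size:  %d bytes" % len(data))
print("minidom peak:     %d bytes (%.1fx file size)" % (dom_peak, dom_peak / len(data)))
print("ElementTree peak: %d bytes (%.1fx file size)" % (et_peak, et_peak / len(data)))
```

To try it on a real file, replace `make_sample_xml(N_ITEMS)` with the bytes read from that file; the ratios depend heavily on the document's shape and on the Python version.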

Stefan
