Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!eternal-september.org!feeder.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Marko Rauhamaa <marko@pacujo.net>
Newsgroups: comp.lang.python
Subject: Re: parsing multiple root element XML into text
Date: Fri, 09 May 2014 15:31:20 +0300
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <8738gjf813.fsf@elektro.pacujo.net>
References: <0e5e9a24-3663-4293-a530-239486cf28fc@googlegroups.com> <87oaz7uvo4.fsf@dpt-info.u-strasbg.fr> <87a9arfdha.fsf@elektro.pacujo.net> <87k39vupnc.fsf@dpt-info.u-strasbg.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: mx05.eternal-september.org; posting-host="ff5cf27ef3d5b31f034d3b72bdc27a41"; logging-data="26988"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+e2xaYNFOl9YIf2xVjPsfo"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)
Cancel-Lock: sha1:HMQgcvV9TstYoYk3qOdrClqp0vg= sha1:manJh2r18ZRq90GBQc6SIloKuHQ=
Xref: csiph.com comp.lang.python:71168

Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:

> Marko Rauhamaa <marko@pacujo.net> writes:
>> Sometimes the XML elements come through a pipe as an endless
>> sequence. You can still use the wrapping technique and a SAX parser.
>> However, the other option is to write a tiny XML scanner that
>> identifies the end of each element. Then, you can cut out the
>> complete XML element and hand it over to a DOM parser.
>
> Well maybe, even though I see no point in doing so. If the whole
> transaction is a single document and you need to get sub-elements on
> the fly, just use the SAX parser: there is no need to use a "tiny XML
> scanner" (whatever that is), and building a DOM for a part of the
> document in your SAX handler is easy if needed (for the OP's case a
> simple state machine would be enough, probably).

An example is XMPP: <URL:
http://en.wikipedia.org/wiki/XMPP#XMPP_via_HTTP_and_WebSocket_transports>.

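There, the "document" is a single stream that is potentially infinitely
long, so it never ends and cannot be parsed as a whole. The elements
inside it are the individual messages.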

The programmer would rather process each message as a complete DOM tree
than follow the meandering callbacks of a SAX parser.
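
A rough sketch of the idea in Python, under a few assumptions: the data
arrives in arbitrary chunks, the root element is opened once and never
closed, and every direct child of the root counts as one message. It
does not use a hand-written scanner; it leans on the standard library's
xml.etree.ElementTree.XMLPullParser to find where each child ends. The
name iter_elements is made up for the illustration.

```python
import xml.etree.ElementTree as ET


def iter_elements(chunks):
    """Yield each complete direct child of the root as an Element tree.

    `chunks` is any iterable of str or bytes fragments; a fragment may
    end in the middle of a tag. The root element need not ever close.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0      # nesting level of the element currently open
    root = None    # the never-ending outer element
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    root = elem
            else:  # "end"
                depth -= 1
                if depth == 1:
                    # A direct child of the root is complete. Detach it
                    # so the root does not grow without bound.
                    root.remove(elem)
                    yield elem
```

Each yielded object is an ordinary Element, so the caller can use
find(), get(), iteration and so on without writing any event handlers.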


Marko