Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!eternal-september.org!feeder.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Marko Rauhamaa <marko@pacujo.net>
Newsgroups: comp.lang.python
Subject: Re: parsing multiple root element XML into text
Date: Fri, 09 May 2014 13:33:37 +0300
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <87a9arfdha.fsf@elektro.pacujo.net>
References: <0e5e9a24-3663-4293-a530-239486cf28fc@googlegroups.com> <87oaz7uvo4.fsf@dpt-info.u-strasbg.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: mx05.eternal-september.org; posting-host="ff5cf27ef3d5b31f034d3b72bdc27a41"; logging-data="10850"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19rmunhnIkRmjRQgxVJn//x"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)
Cancel-Lock: sha1:Z785lCMbX41cOc80zOpwCbli7F8= sha1:UWmJfLDXTl0tNmpSzCJqteG4HOI=
Xref: csiph.com comp.lang.python:71165

Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:

> Technically speaking, this is not a well-formed XML document (it is a
> well-formed external general parsed entity, though). If you have other
> XML processors in your workflow, they will/should reject it.

Sometimes the XML elements come through a pipe as an endless sequence.
You can still use the wrapping technique and a SAX parser. However, the
other option is to write a tiny XML scanner that identifies the end of
each element. Then, you can cut out the complete XML element and hand it
over to a DOM parser.
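
A rough sketch of the wrapping technique, assuming Python's standard
xml.sax module and its incremental feed() interface. The names
RecordHandler, parse_stream and on_record, and the <wrapper> element,
are made up for illustration. An artificial start tag is fed first, so
the parser sees the incoming elements as children of a single root:

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Report each child of the artificial wrapper element as one record."""

    def __init__(self, on_record):
        super().__init__()
        self.on_record = on_record  # hypothetical callback: (name, text)
        self.depth = 0
        self.text = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:         # depth 1 is the wrapper itself
            self.text = []

    def characters(self, content):
        if self.depth >= 2:         # ignore whitespace between records
            self.text.append(content)

    def endElement(self, name):
        if self.depth == 2:         # a top-level record just ended
            self.on_record(name, "".join(self.text))
        self.depth -= 1


def parse_stream(chunks, on_record):
    """Feed text chunks to a SAX parser behind a fake root element."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(RecordHandler(on_record))
    parser.feed("<wrapper>")        # open the root that the stream lacks
    for chunk in chunks:            # chunks may split elements anywhere
        parser.feed(chunk)
    parser.feed("</wrapper>")       # only reached if the stream ends
    parser.close()
```

With a truly endless pipe the loop never finishes and the closing tag is
never sent; the callback simply fires as records complete. Depending on
the underlying expat version, an event may not arrive until some more
data has been fed.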

Such a scanner can be really small and nonrecursive because of the
well-formedness rules of XML: tags must nest properly, so a simple
depth counter is enough to find where each element ends.
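
One possible shape for such a scanner, as a minimal sketch rather than
a definitive implementation. ElementScanner and its methods are
hypothetical names. It assumes the input is well-formed and contains no
DOCTYPE declaration with an internal subset. It counts element depth,
skips over comments, CDATA sections and processing instructions, and
ignores '>' inside quoted attribute values:

```python
import xml.dom.minidom


class ElementScanner:
    """Cut complete top-level XML elements out of a stream of text."""

    def __init__(self):
        self.buf = ""    # text not yet handed out
        self.pos = 0     # scan position in self.buf
        self.start = 0   # where the current top-level element began
        self.depth = 0   # number of currently open elements

    def feed(self, text):
        """Add text; return the elements that are now complete."""
        self.buf += text
        done = []
        while True:
            lt = self.buf.find("<", self.pos)
            if lt < 0:
                if self.depth == 0:
                    self.buf = ""          # only filler between elements
                self.pos = len(self.buf)
                break
            end = self._markup_end(lt)
            if end < 0:                    # markup incomplete, wait
                self.pos = lt
                break
            kind = self.buf[lt + 1]
            if kind not in "!?":           # a start, end or empty tag
                if kind == "/":
                    self.depth -= 1
                else:
                    if self.depth == 0:
                        self.start = lt
                    if self.buf[end - 2] != "/":
                        self.depth += 1    # not a self-closing tag
                if self.depth == 0:
                    done.append(self.buf[self.start:end])
                    self.buf = self.buf[end:]
                    end = 0
            self.pos = end
        return done

    def _markup_end(self, lt):
        """Index just past the markup starting at lt, or -1 if incomplete."""
        buf = self.buf
        for opener, closer in (("<!--", "-->"), ("<![CDATA[", "]]>"),
                               ("<?", "?>")):
            head = buf[lt:lt + len(opener)]
            if head == opener:
                i = buf.find(closer, lt + len(opener))
                return -1 if i < 0 else i + len(closer)
            if len(head) < len(opener) and opener.startswith(head):
                return -1                  # could still become this opener
        quote = None                       # ordinary tag: skip quoted '>'
        for i in range(lt + 1, len(buf)):
            c = buf[i]
            if quote:
                if c == quote:
                    quote = None
            elif c in "\"'":
                quote = c
            elif c == ">":
                return i + 1
        return -1


if __name__ == "__main__":
    scanner = ElementScanner()
    for chunk in ['<msg id="1"><body>hi</body></msg><msg id', '="2"/>']:
        for element in scanner.feed(chunk):
            dom = xml.dom.minidom.parseString(element)
            print(dom.documentElement.getAttribute("id"))
```

Each string returned by feed() is a complete element, so it can go
straight to a DOM parser, as the demo at the bottom does with
xml.dom.minidom.parseString().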


Marko