From: Marko Rauhamaa
Newsgroups: comp.lang.python
Subject: Re: parsing multiple root element XML into text
Date: Fri, 09 May 2014 13:33:37 +0300
Organization: A noiseless patient Spider
Message-ID: <87a9arfdha.fsf@elektro.pacujo.net>
References: <0e5e9a24-3663-4293-a530-239486cf28fc@googlegroups.com> <87oaz7uvo4.fsf@dpt-info.u-strasbg.fr>

Alain Ketterlin:

> Technically speaking, this is not a well-formed XML document (it is a
> well-formed external general parsed entity, though). If you have other
> XML processors in your workflow, they will/should reject it.

Sometimes the XML elements come through a pipe as an endless sequence.
You can still use the wrapping technique and a SAX parser. However, the
other option is to write a tiny XML scanner that identifies the end of
each element. Then, you can cut out the complete XML element and hand it
over to a DOM parser.

Such a scanner can be really small and nonrecursive because of the
well-formedness rules of XML: every start tag has a matching end tag, so
a simple depth counter is enough to find where each element ends.


Marko