Groups > comp.lang.python > #71157 > unrolled thread

parsing multiple root element XML into text

Started by	Percy Tambunan <percy.tambunan@gmail.com>
First post	2014-05-09 01:59 -0700
Last post	2014-05-09 21:46 +0200
Articles	20 on this page of 21 — 7 participants


  parsing multiple root element XML into text Percy Tambunan <percy.tambunan@gmail.com> - 2014-05-09 01:59 -0700
    Re: parsing multiple root element XML into text Marko Rauhamaa <marko@pacujo.net> - 2014-05-09 12:01 +0300
    Re: parsing multiple root element XML into text Chris Angelico <rosuav@gmail.com> - 2014-05-09 19:02 +1000
      Re: parsing multiple root element XML into text Percy Tambunan <percy.tambunan@gmail.com> - 2014-05-11 21:12 -0700
        Re: parsing multiple root element XML into text Peter Otten <__peter__@web.de> - 2014-05-12 10:22 +0200
    Re: parsing multiple root element XML into text Stefan Behnel <stefan_ml@behnel.de> - 2014-05-09 11:13 +0200
    Re: parsing multiple root element XML into text Chris Angelico <rosuav@gmail.com> - 2014-05-09 19:15 +1000
    Re: parsing multiple root element XML into text Alain Ketterlin <alain@dpt-info.u-strasbg.fr> - 2014-05-09 11:51 +0200
      Re: parsing multiple root element XML into text Marko Rauhamaa <marko@pacujo.net> - 2014-05-09 13:33 +0300
        Re: parsing multiple root element XML into text Alain Ketterlin <alain@dpt-info.u-strasbg.fr> - 2014-05-09 14:01 +0200
          Re: parsing multiple root element XML into text Marko Rauhamaa <marko@pacujo.net> - 2014-05-09 15:31 +0300
            Re: parsing multiple root element XML into text Marko Rauhamaa <marko@pacujo.net> - 2014-05-09 15:38 +0300
              Re: parsing multiple root element XML into text Stefan Behnel <stefan_ml@behnel.de> - 2014-05-09 15:55 +0200
                Re: parsing multiple root element XML into text Marko Rauhamaa <marko@pacujo.net> - 2014-05-09 18:29 +0300
              Re: parsing multiple root element XML into text Burak Arslan <burak.arslan@arskom.com.tr> - 2014-05-09 19:52 +0300
              Re: parsing multiple root element XML into text Stefan Behnel <stefan_ml@behnel.de> - 2014-05-09 21:51 +0200
            Re: parsing multiple root element XML into text Alain Ketterlin <alain@dpt-info.u-strasbg.fr> - 2014-05-09 17:50 +0200
              Re: parsing multiple root element XML into text Marko Rauhamaa <marko@pacujo.net> - 2014-05-09 19:15 +0300
                Re: parsing multiple root element XML into text Alain Ketterlin <alain@dpt-info.u-strasbg.fr> - 2014-05-09 19:16 +0200
                  Re: parsing multiple root element XML into text Marko Rauhamaa <marko@pacujo.net> - 2014-05-09 21:04 +0300
                    Re: parsing multiple root element XML into text Stefan Behnel <stefan_ml@behnel.de> - 2014-05-09 21:46 +0200

Page 1 of 2 [1] 2 Next page →

#71157 — parsing multiple root element XML into text

From	Percy Tambunan <percy.tambunan@gmail.com>
Date	2014-05-09 01:59 -0700
Subject	parsing multiple root element XML into text
Message-ID	<0e5e9a24-3663-4293-a530-239486cf28fc@googlegroups.com>

Hi, I would like to parse this XML, which has multiple root elements:

<object class="EnumDnSched">
  <field name="enumDn">
    <value>343741014</value>
  </field>
  <field name="naptrFlags">
    <value>nu</value>
  </field>
</object>
<object class="EnumDnSched">
  <field name="enumDn">
    <value>343741015</value>
  </field>
  <field name="naptrFlags">
    <value>nu</value>
  </field>
</object>

into this:
create enumdnsched 4.1.0.1.4.7.3.4.3.2.6.e164.arpa -set naptrFlags=nu
create enumdnsched 5.1.0.1.4.7.3.4.3.2.6.e164.arpa -set naptrFlags=nu

Can anyone give a good example of how to do that? I would really appreciate it.

Thanks,
Percy


#71158

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-09 12:01 +0300
Message-ID	<87fvkjfhqk.fsf@elektro.pacujo.net>
In reply to	#71157

Percy Tambunan <percy.tambunan@gmail.com>:

> Hai, I would like to parse this multiple root element XML 

How about creating a file-like object that wraps the multi-root file
into a single-root document?


Marko
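A minimal sketch of Marko's suggestion, assuming the input is available as a binary file object and does not begin with an XML declaration (a `<?xml ...?>` line would end up after the synthetic `<root>` and make the document invalid). The class name `RootWrapper` and the root tag are hypothetical; ElementTree only needs an object whose `read(size)` method returns bytes, with `b""` signalling the end. The sample is the data from the original post, condensed to one field per line.

```python
import io
import xml.etree.ElementTree as ET


class RootWrapper:
    """Read-only file-like object: b'<root>', the wrapped stream, then b'</root>'."""

    def __init__(self, stream, tag=b"root"):
        self._parts = [io.BytesIO(b"<" + tag + b">"),
                       stream,
                       io.BytesIO(b"</" + tag + b">")]

    def read(self, size=-1):
        if size is None or size < 0:
            # Read everything that is left from all parts.
            data = b"".join(part.read() for part in self._parts)
            self._parts = []
            return data
        while self._parts:
            data = self._parts[0].read(size)
            if data:
                return data
            self._parts.pop(0)  # this part is exhausted; move on to the next
        return b""  # all parts exhausted: signals EOF to the caller


DATA = b"""<object class="EnumDnSched">
  <field name="enumDn"><value>343741014</value></field>
  <field name="naptrFlags"><value>nu</value></field>
</object>
<object class="EnumDnSched">
  <field name="enumDn"><value>343741015</value></field>
  <field name="naptrFlags"><value>nu</value></field>
</object>
"""

root = ET.parse(RootWrapper(io.BytesIO(DATA))).getroot()
for obj in root:
    print(obj.get("class"), obj.find("field/value").text)
```

With a real file, `RootWrapper(open("input.xml", "rb"))` would be passed instead (the file name is only an example). Nothing is concatenated up front, so the wrapper works for large inputs as well.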


#71159

From	Chris Angelico <rosuav@gmail.com>
Date	2014-05-09 19:02 +1000
Message-ID	<mailman.9815.1399626164.18130.python-list@python.org>
In reply to	#71157

On Fri, May 9, 2014 at 6:59 PM, Percy Tambunan <percy.tambunan@gmail.com> wrote:
> Hai, I would like to parse this multiple root element XML

Easy fix might be to wrap it in <root> and </root>, which will give
you a new root. Would that help?

ChrisA


#71367

From	Percy Tambunan <percy.tambunan@gmail.com>
Date	2014-05-11 21:12 -0700
Message-ID	<c7a885fa-11fc-4365-ab19-d70262541f3a@googlegroups.com>
In reply to	#71159

On Friday, May 9, 2014 4:02:42 PM UTC+7, Chris Angelico wrote:
> On Fri, May 9, 2014 at 6:59 PM, Percy Tambunan <percy.tambunan@gmail.com> wrote:
> > Hai, I would like to parse this multiple root element XML
>
> Easy fix might be to wrap it in <root> and </root>, which will give
> you a new root. Would that help?
>
> ChrisA

Thanks, Chris, for the idea.
Any suggestions on how to make it print like this:

create enumdnsched 4.1.0.1.4.7.3.4.3.2.6.e164.arpa -set naptrFlags=nu 
create enumdnsched 5.1.0.1.4.7.3.4.3.2.6.e164.arpa -set naptrFlags=nu


#71378

From	Peter Otten <__peter__@web.de>
Date	2014-05-12 10:22 +0200
Message-ID	<mailman.9911.1399882939.18130.python-list@python.org>
In reply to	#71367

Percy Tambunan wrote:

> On Friday, May 9, 2014 4:02:42 PM UTC+7, Chris Angelico wrote:
>> On Fri, May 9, 2014 at 6:59 PM, Percy Tambunan <percy.tambunan@gmail.com>
>> wrote:
>> > Hai, I would like to parse this multiple root element XML
>>
>> Easy fix might be to wrap it in <root> and </root>, which will give
>> you a new root. Would that help?
>>
>> ChrisA
> 
> Thanks chris for the idea.
> Any suggestion to make it print like this:
> 
> create enumdnsched 4.1.0.1.4.7.3.4.3.2.6.e164.arpa -set naptrFlags=nu
> create enumdnsched 5.1.0.1.4.7.3.4.3.2.6.e164.arpa -set naptrFlags=nu

[Stefan Behnel]

> ElementTree's XMLParser() can be used efficiently for this. Something like
> this should work:
> 
>     from xml.etree.ElementTree import XMLParser
> 
>     parser = XMLParser()
>     parser.feed(b'<root>')
>     parser.feed(real_input_data)
>     parser.feed(b'</root>')
>     root = parser.close()
> 
>     for subtree in root:
>         ...
 
Have you tried to integrate Stefan's example into your script? If so, what 
is the current state of your code, and what problems prevented you from 
completing it?

Expect help to fix these problems, but not necessarily a ready-to-run 
solution.

If you have not made an attempt yet, look here for ideas on how you can use
XPath to extract interesting data from the subtree:

https://docs.python.org/dev/library/xml.etree.elementtree.html#example
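One possible way to finish the script along the lines Peter suggests, offered as a sketch rather than a definitive solution. Two assumptions deserve a loud label: the domain is built by prefixing the number with `62` and reversing the digits (the prefix is inferred only from the desired output in the original post; it does not appear in the XML), and the command keyword is simply the lowercased `class` attribute. `to_commands`, `enum_domain` and `COUNTRY_CODE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: inferred from the desired output, not present in the XML.
COUNTRY_CODE = "62"


def enum_domain(number, country_code=COUNTRY_CODE):
    """'343741014' -> '4.1.0.1.4.7.3.4.3.2.6.e164.arpa' (digits reversed)."""
    return ".".join(reversed(country_code + number)) + ".e164.arpa"


def to_commands(data):
    # Wrap the multi-root input in a synthetic root so it parses as one document.
    root = ET.fromstring(b"<root>" + data + b"</root>")
    commands = []
    for obj in root.findall("object"):
        # ElementTree's limited XPath supports [@attr='value'] predicates.
        number = obj.findtext("field[@name='enumDn']/value")
        flags = obj.findtext("field[@name='naptrFlags']/value")
        commands.append("create %s %s -set naptrFlags=%s"
                        % (obj.get("class").lower(), enum_domain(number), flags))
    return commands


DATA = b"""<object class="EnumDnSched">
  <field name="enumDn"><value>343741014</value></field>
  <field name="naptrFlags"><value>nu</value></field>
</object>
<object class="EnumDnSched">
  <field name="enumDn"><value>343741015</value></field>
  <field name="naptrFlags"><value>nu</value></field>
</object>
"""

for line in to_commands(DATA):
    print(line)
```

If the real input carries an XML declaration at the top, it would have to be stripped before wrapping, since a declaration is only allowed at the very start of a document.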


#71160

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2014-05-09 11:13 +0200
Message-ID	<mailman.9816.1399626847.18130.python-list@python.org>
In reply to	#71157

Chris Angelico, 09.05.2014 11:02:
> On Fri, May 9, 2014 at 6:59 PM, Percy Tambunan wrote:
>> Hai, I would like to parse this multiple root element XML
> 
> Easy fix might be to wrap it in <root> and </root>, which will give
> you a new root.

ElementTree's XMLParser() can be used efficiently for this. Something like
this should work:

    from xml.etree.ElementTree import XMLParser

    parser = XMLParser()
    parser.feed(b'<root>')
    parser.feed(real_input_data)
    parser.feed(b'</root>')
    root = parser.close()

    for subtree in root:
        ...

Stefan
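A runnable variant of Stefan's sketch, assuming the input arrives as a binary stream (a file opened in `'rb'` mode, a pipe, or, for the demonstration, an `io.BytesIO`). Feeding fixed-size chunks avoids building one big string by concatenation; chunk boundaries may fall anywhere, even in the middle of a tag. `parse_fragment` and the chunk sizes are illustrative choices, and the sample is a shortened version of the original post's data.

```python
import io
from xml.etree.ElementTree import XMLParser


def parse_fragment(stream, chunk_size=64 * 1024):
    """Parse a multi-root byte stream by wrapping it in a synthetic <root>."""
    parser = XMLParser()
    parser.feed(b"<root>")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # the parser buffers incomplete constructs itself
    parser.feed(b"</root>")
    return parser.close()  # the synthetic root Element


sample = (b'<object class="EnumDnSched"><field name="enumDn">'
          b'<value>343741014</value></field></object>'
          b'<object class="EnumDnSched"><field name="enumDn">'
          b'<value>343741015</value></field></object>')

root = parse_fragment(io.BytesIO(sample), chunk_size=16)
for subtree in root:
    print(subtree.get("class"), subtree.findtext("field/value"))
```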


#71161

From	Chris Angelico <rosuav@gmail.com>
Date	2014-05-09 19:15 +1000
Message-ID	<mailman.9817.1399626951.18130.python-list@python.org>
In reply to	#71157

On Fri, May 9, 2014 at 7:13 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
> Chris Angelico, 09.05.2014 11:02:
>> On Fri, May 9, 2014 at 6:59 PM, Percy Tambunan wrote:
>>> Hai, I would like to parse this multiple root element XML
>>
>> Easy fix might be to wrap it in <root> and </root>, which will give
>> you a new root.
>
> ElementTree's XMLParser() can be used efficiently for this. Something like
> this should work:
>
>     from xml.etree.ElementTree import XMLParser
>
>     parser = XMLParser()
>     parser.feed(b'<root>')
>     parser.feed(real_input_data)
>     parser.feed(b'</root>')
>     root = parser.close()
>
>     for subtree in root:
>         ...

That looks good to me :)

ChrisA


#71163

From	Alain Ketterlin <alain@dpt-info.u-strasbg.fr>
Date	2014-05-09 11:51 +0200
Message-ID	<87oaz7uvo4.fsf@dpt-info.u-strasbg.fr>
In reply to	#71157

Percy Tambunan <percy.tambunan@gmail.com> writes:

> Hai, I would like to parse this multiple root element XML 
>
> <object class="EnumDnSched">
[...]
> </object>
> <object class="EnumDnSched">
[...]
> </object>

Technically speaking, this is not a well-formed XML document (it is a
well-formed external general parsed entity, though). If you have other
XML processors in your workflow, they will/should reject it.

The easiest fix is to wrap this inside a root element (see other
messages in this thread), or use a DTD-declared entity to include this
fragment in a document.

-- Alain.
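Alain's point is easy to observe with the standard library: ElementTree (which uses expat) rejects the bare fragment with a `ParseError` when it meets the second top-level element, while the same bytes wrapped in a root element parse fine. The sample is a shortened version of the original post's input, and `try_parse` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

fragment = (b'<object class="EnumDnSched"><value>343741014</value></object>\n'
            b'<object class="EnumDnSched"><value>343741015</value></object>\n')


def try_parse(data):
    """Return the root tag, or the parser's error message if data is not well-formed."""
    try:
        return ET.fromstring(data).tag
    except ET.ParseError as err:
        return "ParseError: %s" % err


print(try_parse(fragment))                           # rejected: two top-level elements
print(try_parse(b"<root>" + fragment + b"</root>"))  # accepted once wrapped
```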


#71165

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-09 13:33 +0300
Message-ID	<87a9arfdha.fsf@elektro.pacujo.net>
In reply to	#71163

Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:

> Technically speaking, this is not a well-formed XML document (it is a
> well-formed external general parsed entity, though). If you have other
> XML processors in your workflow, they will/should reject it.

Sometimes the XML elements come through a pipe as an endless sequence.
You can still use the wrapping technique and a SAX parser. However, the
other option is to write a tiny XML scanner that identifies the end of
each element. Then, you can cut out the complete XML element and hand it
over to a DOM parser.

Such a scanner can be really small and nonrecursive because of the
well-formedness rules of XML.


Marko
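A minimal sketch of the kind of "tiny XML scanner" Marko describes: it does not parse XML, it only counts start and end tags to find where each top-level element ends, then cuts the bytes out and hands them to a real parser. It is nonrecursive, as he says, but it is a simplification that assumes well-formed input; in particular a DOCTYPE with an internal subset is not handled. `TOKEN` and `split_elements` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Comments, CDATA sections, processing instructions and <!...> declarations are
# matched first so that a '<' or '>' inside them is not mistaken for a tag.
# Only the last alternative (a tag) defines groups 1 and 2; quoted attribute
# values are skipped as a whole because they may legally contain '>'.
TOKEN = re.compile(
    rb"<!--.*?-->"
    rb"|<!\[CDATA\[.*?\]\]>"
    rb"|<\?.*?\?>"
    rb"|<![^>]*>"
    rb"|<(/?)(?:[^>\"']|\"[^\"]*\"|'[^']*')*?(/?)>",
    re.DOTALL,
)


def split_elements(data):
    """Yield each complete top-level element of a multi-root byte string."""
    depth = 0
    start = 0
    for m in TOKEN.finditer(data):
        if m.group(1) is None:
            continue  # not a tag: depth is unaffected
        closing = m.group(1) == b"/"
        empty = m.group(2) == b"/"
        if not closing and depth == 0:
            start = m.start()  # a new top-level element starts here
        if closing:
            depth -= 1
        elif not empty:
            depth += 1
        if depth == 0:
            yield data[start:m.end()]  # element complete: cut it out


DATA = (b'<object class="EnumDnSched"><field name="enumDn">'
        b'<value>343741014</value></field></object>\n'
        b'<!-- a comment between elements -->\n'
        b'<object class="EnumDnSched"><field name="enumDn">'
        b'<value>343741015</value></field></object>\n')

for chunk in split_elements(DATA):
    element = ET.fromstring(chunk)  # each chunk is a well-formed document
    print(element.findtext("field/value"))
```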


#71167

From	Alain Ketterlin <alain@dpt-info.u-strasbg.fr>
Date	2014-05-09 14:01 +0200
Message-ID	<87k39vupnc.fsf@dpt-info.u-strasbg.fr>
In reply to	#71165

Marko Rauhamaa <marko@pacujo.net> writes:

> Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:
>
>> Technically speaking, this is not a well-formed XML document (it is a
>> well-formed external general parsed entity, though). If you have other
>> XML processors in your workflow, they will/should reject it.
>
> Sometimes the XML elements come through a pipe as an endless sequence.
> You can still use the wrapping technique and a SAX parser. However, the
> other option is to write a tiny XML scanner that identifies the end of
> each element. Then, you can cut out the complete XML element and hand it
> over to a DOM parser.

Well maybe, even though I see no point in doing so. If the whole
transaction is a single document and you need to get sub-elements on the
fly, just use the SAX parser: there is no need to use a "tiny XML
scanner" (whatever that is), and building a DOM for a part of the
document in your SAX handler is easy if needed (for the OP's case a
simple state machine would be enough, probably).

-- Alain.
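A sketch of the SAX approach with "a simple state machine" for the OP's case, using the standard library's `xml.sax`. The handler class, its attributes and `parse_records` are hypothetical. The expat-based reader returned by `xml.sax.make_parser()` supports incremental `feed()`, so the `<root>` wrapping trick from earlier in the thread applies unchanged.

```python
import xml.sax


class ObjectHandler(xml.sax.ContentHandler):
    """Collect each <object> as a {field name: value} dict."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._fields = None  # dict for the <object> being read, else None
        self._name = None    # name attribute of the current <field>
        self._text = None    # text pieces while inside <value>, else None

    def startElement(self, name, attrs):
        if name == "object":
            self._fields = {}
        elif name == "field":
            self._name = attrs.get("name")
        elif name == "value":
            self._text = []

    def characters(self, content):
        if self._text is not None:  # ignore whitespace between elements
            self._text.append(content)

    def endElement(self, name):
        if name == "value":
            self._fields[self._name] = "".join(self._text)
            self._text = None
        elif name == "object":
            self.records.append(self._fields)
            self._fields = None


def parse_records(data):
    handler = ObjectHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(b"<root>")  # synthetic root, as suggested earlier in the thread
    parser.feed(data)
    parser.feed(b"</root>")
    parser.close()
    return handler.records


DATA = b"""<object class="EnumDnSched">
  <field name="enumDn"><value>343741014</value></field>
  <field name="naptrFlags"><value>nu</value></field>
</object>
<object class="EnumDnSched">
  <field name="enumDn"><value>343741015</value></field>
  <field name="naptrFlags"><value>nu</value></field>
</object>
"""

print(parse_records(DATA))
```

The `characters()` callback may deliver the text of one `<value>` in several pieces, which is why the handler collects a list and joins it at the end tag.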


#71168

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-09 15:31 +0300
Message-ID	<8738gjf813.fsf@elektro.pacujo.net>
In reply to	#71167

Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:

> Marko Rauhamaa <marko@pacujo.net> writes:
>> Sometimes the XML elements come through a pipe as an endless
>> sequence. You can still use the wrapping technique and a SAX parser.
>> However, the other option is to write a tiny XML scanner that
>> identifies the end of each element. Then, you can cut out the
>> complete XML element and hand it over to a DOM parser.
>
> Well maybe, even though I see no point in doing so. If the whole
> transaction is a single document and you need to get sub-elements on
> the fly, just use the SAX parser: there is no need to use a "tiny XML
> scanner" (whatever that is), and building a DOM for a part of the
> document in your SAX handler is easy if needed (for the OP's case a
> simple state machine would be enough, probably).

An example is <URL:
http://en.wikipedia.org/wiki/XMPP#XMPP_via_HTTP_and_WebSocket_transports>.

The "document" is potentially infinitely long. The elements are
messages.

The programmer would rather process the elements as DOM trees than
follow the meandering SAX parser.


Marko


#71169

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-09 15:38 +0300
Message-ID	<87y4ybdt46.fsf@elektro.pacujo.net>
In reply to	#71168

Marko Rauhamaa <marko@pacujo.net>:

> Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:
>
>> Marko Rauhamaa <marko@pacujo.net> writes:
>>> Sometimes the XML elements come through a pipe as an endless
>>> sequence. You can still use the wrapping technique and a SAX parser.
>>> However, the other option is to write a tiny XML scanner that
>>> identifies the end of each element. Then, you can cut out the
>>> complete XML element and hand it over to a DOM parser.
>>
>> Well maybe, even though I see no point in doing so. If the whole
>> transaction is a single document and you need to get sub-elements on
>> the fly, just use the SAX parser: there is no need to use a "tiny XML
>> scanner" (whatever that is), and building a DOM for a part of the
>> document in your SAX handler is easy if needed (for the OP's case a
>> simple state machine would be enough, probably).
>
> An example is <URL:
> http://en.wikipedia.org/wiki/XMPP#XMPP_via_HTTP_and_WebSocket_transports>.
>
> The "document" is potentially infinitely long. The elements are
> messages.
>
> The programmer would rather process the elements as DOM trees than
> follow the meandering SAX parser.

In fact, the best thing would be if the DOM parser supported the use
case out of the box: give the partial, whole or oversize document to the
parser. If the document isn't complete, the parser should indicate the
need for more input. If there are bytes after the document is
successfully finished, the parser should leave the excess bytes in the
pipeline.

IOW, if the DOM parser knows full well where the document ends, why
must the application tell it?

Marko


#71171

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2014-05-09 15:55 +0200
Message-ID	<mailman.9821.1399643762.18130.python-list@python.org>
In reply to	#71169

Marko Rauhamaa, 09.05.2014 14:38:
> Marko Rauhamaa:
>> Alain Ketterlin:
>>> Marko Rauhamaa writes:
>>>> Sometimes the XML elements come through a pipe as an endless
>>>> sequence. You can still use the wrapping technique and a SAX parser.
>>>> However, the other option is to write a tiny XML scanner that
>>>> identifies the end of each element. Then, you can cut out the
>>>> complete XML element and hand it over to a DOM parser.
>>>
>>> Well maybe, even though I see no point in doing so. If the whole
>>> transaction is a single document and you need to get sub-elements on
>>> the fly, just use the SAX parser: there is no need to use a "tiny XML
>>> scanner" (whatever that is), and building a DOM for a part of the
>>> document in your SAX handler is easy if needed (for the OP's case a
>>> simple state machine would be enough, probably).
>>
>> An example is <URL:
>> http://en.wikipedia.org/wiki/XMPP#XMPP_via_HTTP_and_WebSocket_transports>.
>>
>> The "document" is potentially infinitely long. The elements are
>> messages.
>>
>> The programmer would rather process the elements as DOM trees than
>> follow the meandering SAX parser.
> 
> In fact, the best thing would be if the DOM parser supported the use
> case out of the box: give the partial, whole or oversize document to the
> parser. If the document isn't complete, the parser should indicate the
> need for more input. If there are bytes after the document is
> successfully finished, the parser should leave the excess bytes in the
> pipeline.
> 
> IOW, if the DOM parser knows full well where the document ends, why
> must the application tell it to it?

In fact, XMPP traffic has a root element. And I agree that a tree is much
easier to handle than SAX events. ElementTree has gained a nice API in
Py3.4 that supports this in a much saner way than SAX, using iterators.
Basically, you just dump in some data that you received and get back an
iterator over the elements (and their subtrees) that it generated from it.
Intercept the right top-level elements and you get your next subtree as
soon as it's ready.

https://docs.python.org/3.4/library/xml.etree.elementtree.html#pull-api-for-non-blocking-parsing

It's also supported by recent versions of lxml, which additionally has
easy-to-use support for the sending side with its xmlfile() tool.

http://lxml.de/parsing.html#incremental-event-parsing

http://lxml.de/api.html#incremental-xml-generation

Stefan
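A minimal sketch of the pull API (`XMLPullParser`, Python 3.4+) applied to the OP's fragment: a synthetic `<root>` is fed first, the input follows in arbitrary chunks (as it might arrive from a pipe or socket), and each top-level element is handed out as soon as its end tag has been parsed. `iter_top_level` is a hypothetical helper; tracking depth rather than tag names means it works for any top-level tag.

```python
from xml.etree.ElementTree import XMLPullParser


def iter_top_level(chunks):
    """Yield each complete top-level element as soon as its end tag arrives."""
    parser = XMLPullParser(events=("start", "end"))
    parser.feed(b"<root>")  # synthetic root makes the stream one document
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # just closed a direct child of the synthetic root
                    yield elem
                    elem.clear()  # drop its children and text once processed
    parser.feed(b"</root>")
    parser.close()


DATA = b"""<object class="EnumDnSched">
  <field name="enumDn"><value>343741014</value></field>
</object>
<object class="EnumDnSched">
  <field name="enumDn"><value>343741015</value></field>
</object>
"""

chunks = [DATA[i:i + 10] for i in range(0, len(DATA), 10)]
for obj in iter_top_level(chunks):
    print(obj.get("class"), obj.findtext("field/value"))
```

Because each element is cleared when the loop asks for the next one, it has to be processed inside the loop body; collecting them with `list()` would leave only empty shells.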


#71176

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-09 18:29 +0300
Message-ID	<87tx8zdl7e.fsf@elektro.pacujo.net>
In reply to	#71171

Stefan Behnel <stefan_ml@behnel.de>:

> ElementTree has gained a nice API in Py3.4 that supports this in a
> much saner way than SAX, using iterators.

Good to know.


Marko


#71181

From	Burak Arslan <burak.arslan@arskom.com.tr>
Date	2014-05-09 19:52 +0300
Message-ID	<mailman.9825.1399654376.18130.python-list@python.org>
In reply to	#71169


On 05/09/14 16:55, Stefan Behnel wrote:
> ElementTree has gained a nice API in
> Py3.4 that supports this in a much saner way than SAX, using iterators.
> Basically, you just dump in some data that you received and get back an
> iterator over the elements (and their subtrees) that it generated from it.
> Intercept on the right top elements and you get your next subtree as soon
> as it's ready.

Hi Stefan,

Here's a small script:

    from collections import deque
    from lxml import etree

    events = etree.iterparse(istr, events=("start", "end"))
    stack = deque()
    for event, element in events:
        if event == "start":
            stack.append(element)
        elif event == "end":
            stack.pop()

        if len(stack) == 0:
            break

        print(istr.tell(), "%5s, %4s, %s" % (event, element.tag, element.text))

where istr is an input-stream. (Fully working example:
https://gist.github.com/plq/025005a71e8135c46800)

I was expecting to have istr.tell() return the position where the first
root element ends, which would make it possible to continue parsing with
another call to etree.iterparse(). But istr.tell() returns the position
of EOF after the first call to next() on the iterator it returns.
Without the stack check, the loop eventually throws an exception and the
offset value in that exception is None.

So I'm lost here: how would it be possible to parse the OP's document with lxml?

Best,
Burak


#71186

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2014-05-09 21:51 +0200
Message-ID	<mailman.9827.1399665133.18130.python-list@python.org>
In reply to	#71169

Burak Arslan, 09.05.2014 18:52:
> On 05/09/14 16:55, Stefan Behnel wrote:
>> ElementTree has gained a nice API in
>> Py3.4 that supports this in a much saner way than SAX, using iterators.
>> Basically, you just dump in some data that you received and get back an
>> iterator over the elements (and their subtrees) that it generated from it.
>> Intercept on the right top elements and you get your next subtree as soon
>> as it's ready.
> 
> Here's a small script:

A bit hard to read, though.


>     events = etree.iterparse(istr, events=("start", "end"))
>     stack = deque()
>     for event, element in events:
>     if event == "start":
>     stack.append(element)
>     elif event == "end":
>     stack.pop()
>      
>     if len(stack) == 0:
>     break
>      
>     print(istr.tell(), "%5s, %4s, %s" % (event, element.tag, element.text))
> 
> where istr is an input-stream. (Fully working example:
> https://gist.github.com/plq/025005a71e8135c46800)
> 
> I was expecting to have istr.tell() return the position where the first
> root element ends, which would make it possible to continue parsing with
> another call to etree.iterparse(). But istr.tell() returns the position
> of EOF after the first call to next() on the iterator it returns.

Correct, because it finished parsing. It controls the reading process and
reads ahead, that's how iterparse() works.


> Without the stack check, the loop eventually throws an exception and the
> offset value in that exception is None.
> 
> So I'm lost here, how it'd possible to parse OP's document with lxml?

See my earlier post. Instead of XMLParser, just use the XMLPullParser for
incremental (non-blocking) parsing and processing.

To make this clear, though: to use an XML parser, you need well-formed XML,
and that means exactly one root element.

Stefan
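The read-ahead Stefan describes can be observed with the standard library's `iterparse` (Burak's script used lxml, whose buffering details may differ): after the very first event the stream position is already at the end of a small input, so `tell()` cannot mark where an element ends. The input here is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

data = b"<root><object>1</object><object>2</object></root>"
stream = io.BytesIO(data)
events = ET.iterparse(stream, events=("start", "end"))

event, element = next(events)  # the first event: start of <root>
position = stream.tell()       # how far the parser has already read
print(event, element.tag, position, len(data))
```

For inputs larger than the parser's internal read size the position would land somewhere in the middle instead, but still not at any element boundary.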


#71177

From	Alain Ketterlin <alain@dpt-info.u-strasbg.fr>
Date	2014-05-09 17:50 +0200
Message-ID	<87fvkjuf2c.fsf@dpt-info.u-strasbg.fr>
In reply to	#71168

Marko Rauhamaa <marko@pacujo.net> writes:

> Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:
>
>> Marko Rauhamaa <marko@pacujo.net> writes:
>>> Sometimes the XML elements come through a pipe as an endless
>>> sequence. You can still use the wrapping technique and a SAX parser.
>>> However, the other option is to write a tiny XML scanner that
>>> identifies the end of each element. Then, you can cut out the
>>> complete XML element and hand it over to a DOM parser.
>>
>> Well maybe, even though I see no point in doing so. If the whole
>> transaction is a single document and you need to get sub-elements on
>> the fly, just use the SAX parser: there is no need to use a "tiny XML
>> scanner" (whatever that is), and building a DOM for a part of the
>> document in your SAX handler is easy if needed (for the OP's case a
>> simple state machine would be enough, probably).
>
> An example is <URL:
> http://en.wikipedia.org/wiki/XMPP#XMPP_via_HTTP_and_WebSocket_transports>.
>
> The "document" is potentially infinitely long. The elements are
> messages.
>
> The programmer would rather process the elements as DOM trees than
> follow the meandering SAX parser.

which does an exact traversal of the potential DOM tree... (assuming a
DOM is even defined on a non-well-formed XML document).

Anyway, my point was only to warn the OP that he is not doing XML.

-- Alain.


#71179

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-09 19:15 +0300
Message-ID	<87lhubdj2j.fsf@elektro.pacujo.net>
In reply to	#71177

Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:

> which does an exact traversal of potential the DOM tree... (assuming a
> DOM is even defined on a non well-formed XML document).
>
> Anyway, my point was only to warn the OP that he is not doing XML.

I consider that one of the multitude of flaws in XML.

Compare that with the close analogue: S expressions. Every lisp/scheme
command interface takes in a sequence of unframed S expressions and
interprets them in sequence. I have built similar interfaces using XML:
the command interface accepts a sequence of XML elements and acts on
them upon reaching the end of a complete element.


Marko


#71182

From	Alain Ketterlin <alain@dpt-info.u-strasbg.fr>
Date	2014-05-09 19:16 +0200
Message-ID	<87bnv6vpn0.fsf@dpt-info.u-strasbg.fr>
In reply to	#71179

Marko Rauhamaa <marko@pacujo.net> writes:

> Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:
>
>> which does an exact traversal of potential the DOM tree... (assuming a
>> DOM is even defined on a non well-formed XML document).
>>
>> Anyway, my point was only to warn the OP that he is not doing XML.
>
> I consider that one of the multitude of flaws in XML.

I consider such use cases one of the multitude of misuses of XML.

> Compare that with the close analogue: S expressions. [...]

How do you specify the encoding of sexprs? How can you require that an
attribute value must match the value of an id-attribute? or whatever
insanely complex integrity rule that XML Schemas lets you express? And
so on.

If all you need to do is parse properly bracketed input, go with sexprs,
or json, or yaml, or pickle if both ends are python programs. Using XML
for such a trivial task is looking for trouble.

-- Alain.


#71183

From	Marko Rauhamaa <marko@pacujo.net>
Date	2014-05-09 21:04 +0300
Message-ID	<87bnv6de1j.fsf@elektro.pacujo.net>
In reply to	#71182

Alain Ketterlin <alain@dpt-info.u-strasbg.fr>:

> How do you specify the encoding of sexprs? How can you require that an
> attribute value must match the value of an id-attribute? or whatever
> insanely complex integrity rule that XML Schemas lets you express? And
> so on.

I don't suppose there is a universal schema language available for S
expressions, nor do I suppose many people want such a thing. You specify
S expression encoding the way you specify Python library functions.

I think the worst part of XML is that you can't parse it without a DTD
or schema.

> If all you need to do is parse properly bracketed input, go with
> sexprs, or json, or yaml, or pickle if both ends are python programs.

I was very hopeful about json until I discovered they require the parser
to dynamically support five different character encodings.

XML at least standardized on UTF-8.

I have found ast.literal_eval() to be highly usable.

Pickle I would advise you to stay away from: <URL:
http://lwn.net/Articles/595352/>.


Marko
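To illustrate the remark about `ast.literal_eval()`: it evaluates only Python literals (strings, bytes, numbers, tuples, lists, dicts, sets, booleans and None), so a one-record-per-line format can be read without the risks of `eval()` or of unpickling untrusted data. The record layout is a made-up example mirroring the OP's fields.

```python
import ast

text = """\
{'enumDn': '343741014', 'naptrFlags': 'nu'}
{'enumDn': '343741015', 'naptrFlags': 'nu'}
"""

# One literal per line: each line is a complete, self-delimiting record.
records = [ast.literal_eval(line) for line in text.splitlines() if line.strip()]
print(records)

# Anything that is not a plain literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)
```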
