
Re: problems with xml parsing (python 3.3)

From Dieter Maurer <dieter@handshake.de>
Subject Re: problems with xml parsing (python 3.3)
Date Sun, 28 Oct 2012 08:30:36 +0100
Newsgroups comp.lang.python
Message-ID <mailman.2953.1351409448.27098.python-list@python.org>



jannidis@gmail.com writes:

> I am new to Python and have a problem with the behaviour of the xml parser. Assume we have this xml document:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <bibliography>
>     <entry>
>             Title of the first book.
>         </entry>
>         <entry>
>             <coauthored/>
> Title of the second book.
>         </entry>
> </bibliography>    
>
>
> If I now check for the text of all 'entry' nodes, the text for the node with the empty element isn't shown
>
>
>
> import xml.etree.ElementTree as ET
> tree = ET.ElementTree(file='test.xml')
> root = tree.getroot()
> resultSet = root.findall(".//entry")
> for r in resultSet:
> 	print (r.text)

I do not know "xml.etree" well, but "lxml.etree", which is said to be
largely compatible, handles text nodes quite differently from "DOM":
they are *not* considered children of the parent element. Instead, text
is attached to an element as an attribute: as "text" on the container
element if it comes before the first child element (that is, if the first
DOM node is a text node), or as "tail" on the preceding element otherwise.
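
To make the difference concrete, here is a minimal sketch that parses the same
tiny document with the standard library's "xml.dom.minidom" and with
"xml.etree.ElementTree". The document in SNIPPET is made up for illustration
and does not come from the original question.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Made-up example document (an assumption, not from the original post).
SNIPPET = "<p>Hello <b>bold</b> world</p>"

# DOM: text nodes are ordinary children of <p>.
dom_p = minidom.parseString(SNIPPET).documentElement
dom_children = [node.nodeName for node in dom_p.childNodes]
print(dom_children)    # ['#text', 'b', '#text']

# ElementTree: only <b> is a child; the text lives in attributes.
p = ET.fromstring(SNIPPET)
b = p[0]
print(len(p))          # 1
print(repr(p.text))    # 'Hello '  (text before the first child)
print(repr(b.text))    # 'bold'
print(repr(b.tail))    # ' world'  (text following </b>)
```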

Your code snippet suggests that "xml.etree" behaves identically in
this respect. If so, the "text" of the second "entry" holds only the
whitespace in front of "coauthored", and you would find "Title of the
second book." as the "tail" attribute of the element "coauthored".
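
A small sketch of how this could be checked with "xml.etree.ElementTree".
It assumes an inline copy of the quoted document instead of the file
'test.xml', and the helper name entry_texts is hypothetical. Using
itertext() to collect the full text of each entry is one possible approach,
not the only one.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for test.xml (an assumption; whitespace is normalised).
XML = """<bibliography>
    <entry>
        Title of the first book.
    </entry>
    <entry>
        <coauthored/>
        Title of the second book.
    </entry>
</bibliography>"""


def entry_texts(root):
    """Return the full text of each 'entry', including tails of child elements."""
    # itertext() yields an element's text, then each descendant's text
    # and tail, in document order.
    return ["".join(entry.itertext()).strip() for entry in root.findall(".//entry")]


root = ET.fromstring(XML)
second = root.findall(".//entry")[1]
coauthored = second.find("coauthored")

print(repr(second.text))      # whitespace only: no text precedes <coauthored/>
print(repr(coauthored.tail))  # the title, with surrounding whitespace
print(entry_texts(root))
```

If only the title is wanted, coauthored.tail.strip() would be enough; the
itertext() variant also covers entries without a "coauthored" element.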



