From: Dieter Maurer
Subject: Re: problems with xml parsing (python 3.3)
Date: Sun, 28 Oct 2012 08:30:36 +0100
To: python-list@python.org
References: <97d8de0d-3daa-49be-a91f-c65fc8a9019f@googlegroups.com>
Newsgroups: comp.lang.python

jannidis@gmail.com writes:
> I am new to Python and have a problem with the behaviour of the xml parser.
> Assume we have this xml document:
>
>
>
>
> Title of the first book.
>
>
>
> Title of the second book.
>
>
>
>
> If I now check for the text of all 'entry' nodes, the text for the node
> with the empty element isn't shown
>
>
>
> import xml.etree.ElementTree as ET
> tree = ET.ElementTree(file='test.xml')
> root = tree.getroot()
> resultSet = root.findall(".//entry")
> for r in resultSet:
>     print (r.text)

I do not know about "xml.etree", but the (reportedly) quite compatible
"lxml.etree" handles text nodes quite differently from "DOM": they are
*not* considered children of the parent element. Instead, they are attached
as the attributes "text" and "tail": "text" on the container element if
the first DOM node is a text node, and "tail" on the preceding element
otherwise.

Your code snippet suggests that "xml.etree" behaves identically in this
respect. In that case, you would find "Title of the second book" as the
"tail" attribute of the element "coauthored".
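
A minimal sketch of the "text"/"tail" behaviour described above, using "xml.etree.ElementTree" from the standard library. The markup of the quoted document did not survive, so the XML string below is an assumed reconstruction (an "entry" holding only text, and an "entry" that starts with an empty "coauthored" element); the helper name entry_text is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: reconstructed document; the original markup is not available,
# only the element names "entry" and "coauthored" and the two titles.
XML = """<root>
  <entry>Title of the first book.</entry>
  <entry><coauthored/>Title of the second book.</entry>
</root>"""

root = ET.fromstring(XML)
entries = root.findall(".//entry")

# The second entry begins with a child element, so it has no "text" of its own.
second = entries[1]
coauthored = second.find("coauthored")
print(second.text)       # None
print(coauthored.tail)   # Title of the second book.


def entry_text(entry):
    """Hypothetical helper: join an element's text with the text and
    tail of all of its descendants, in document order."""
    return "".join(entry.itertext()).strip()


titles = [entry_text(e) for e in entries]
print(titles)
```

If the goal is simply "all character data inside each entry", itertext() avoids having to know which child element carries the tail.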