From: Jason Friedman
Cc: python-list
Newsgroups: comp.lang.python
Date: Sun, 23 Jun 2013 13:15:01 -0600
Subject: Re: Finding all instances of a string in an XML file

xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

import xml.etree.ElementTree as etree

tree = etree.fromstring(xml)

def walk(elem, path, token):
    # Extend the path (a tuple of ancestors) with the current element.
    path += (elem,)
    # A match is any attribute whose value equals the token.
    if token in elem.attrib.values():
        yield path
    # Iterating an element yields its direct children.
    for child in elem:
        for match in walk(child, path, token):
            yield match

for path in walk(tree, (), "Property X"):
    print(", ".join("{} {}".format(elem.tag, elem.attrib) for elem in path))

Peter, thank you, that exactly meets my need.
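
For anyone finding this thread later, here is a minimal self-contained sketch of the same recursive walk. It assumes Python 3.3 or later (for `yield from`) and iterates over each element directly instead of calling `getchildren()`, which newer Python versions no longer provide. The helper name `find_paths`, the `SAMPLE` constant, and the trimmed document (DOCTYPE and the "Property Z" entry left out) are illustrative and not part of the original code.

```python
import xml.etree.ElementTree as etree

# Trimmed version of the sample document from the post (illustrative only).
SAMPLE = """<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE="3"/>
    </LEVEL_3>
  </LEVEL_2>
</LEVEL_1>
"""


def find_paths(elem, token, path=()):
    """Yield a tuple of elements (root first) for every element that has
    an attribute whose value equals token."""
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    # Iterating an Element visits its direct children.
    for child in elem:
        yield from find_paths(child, token, path)


root = etree.fromstring(SAMPLE)
# Reduce each matching path to its tag names for display.
matches = [[e.tag for e in p] for p in find_paths(root, "Property X")]
for tags in matches:
    print(" / ".join(tags))
```

With this sample, the loop prints one line per match, each listing the tags from the root down to the matching ATTRIBUTE element.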