From: Jason Friedman
To: python-list
Newsgroups: comp.lang.python
Date: Thu, 20 Jun 2013 19:30:06 -0600
Subject: Finding all instances of a string in an XML file

I have XML which looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE="welcome"/>
  </LEVEL_2>
</LEVEL_1>

The "Property X" string appears twice, and I want to output the "path" that leads to each such appearance. In this case the output would be:

LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "2"}
LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}

My actual XML file is 2000 lines and contains up to 8 levels of nesting.

I have tried this so far (partial code, using the xml.etree.ElementTree module):

def get_path(data_dictionary, val, path):
    for node in data_dictionary[CHILDREN]:
        if node[CHILDREN]:
            if not path or node[TAG] != path[-1]:
                path.append(node[TAG])
            print(CR + "recursing ...")
            get_path(node, val, path)
        else:
            for k, v in node[ATTRIB].items():
                if v == val:
                    print("path- ", path)
                    print("---- " + node[TAG] + " " + str(node[ATTRIB]))

I'm really not even close to getting the output I am looking for.
Python 3.2.2.
Thank you.
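
For reference, here is a minimal sketch of one way the output described above could be produced with xml.etree.ElementTree. It is not the code from my attempt: the names find_paths, format_path and SAMPLE_XML are made up for illustration, the sample is parsed from an inline string with the XML declaration and DOCTYPE lines left out, and json.dumps with sort_keys=True is just one assumed way of getting the double-quoted attribute formatting. The idea is to pass the chain of ancestors down the recursion as an immutable tuple, so each branch sees only its own path.

```python
import json
import xml.etree.ElementTree as ET

# Sample document from the post, without the XML declaration and DOCTYPE.
# For a real file, ET.parse("my.xml").getroot() would be used instead
# ("my.xml" is a hypothetical file name).
SAMPLE_XML = """\
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""


def find_paths(element, value, ancestors=()):
    """Yield a tuple of elements (root first) for every element that has
    at least one attribute equal to `value`."""
    # Build a new tuple instead of mutating a shared list, so sibling
    # branches never see each other's path entries.
    chain = ancestors + (element,)
    if value in element.attrib.values():
        yield chain
    for child in element:
        # Plain loop rather than `yield from`, so this also runs on 3.2.
        for found in find_paths(child, value, chain):
            yield found


def format_path(chain):
    """Render a chain as 'TAG {attributes}, TAG {attributes}, ...'."""
    return ", ".join(
        "{} {}".format(el.tag, json.dumps(el.attrib, sort_keys=True))
        for el in chain
    )


root = ET.fromstring(SAMPLE_XML)
for chain in find_paths(root, "Property X"):
    print(format_path(chain))
```

The search matches any attribute value, not only NAME, which mirrors the `v == val` test in my attempt. To restrict it to NAME, the condition could be changed to `element.get("NAME") == value`.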
