| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: Finding all instances of a string in an XML file |
| Date | 2013-06-21 08:16 +0200 |
| Organization | None |
| References | <CANy1k1igA3PLR-Pgy9w4T1pbn7d2tK9=6tNpnett6iTjACDs7w@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.3652.1371795314.3114.python-list@python.org> |
Jason Friedman wrote:
> I have XML which looks like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE KMART SYSTEM "my.dtd">
> <LEVEL_1>
>   <LEVEL_2 ATTR="hello">
>     <ATTRIBUTE NAME="Property X" VALUE ="2"/>
>   </LEVEL_2>
>   <LEVEL_2 ATTR="goodbye">
>     <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
>     <LEVEL_3 ATTR="aloha">
>       <ATTRIBUTE NAME="Property X" VALUE ="3"/>
>     </LEVEL_3>
>     <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
>   </LEVEL_2>
> </LEVEL_1>
>
> The "Property X" string appears twice, and I want to output the "path"
> that leads to each appearance. In this case the output would be:
>
> LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "2"}
> LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}
>
> My actual XML file is 2000 lines and contains up to 8 levels of nesting.
That's still small, so you can parse the whole document into memory with ElementTree and walk it recursively:
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

import xml.etree.ElementTree as etree

# fromstring() returns the root Element, not an ElementTree
root = etree.fromstring(xml_text)

def walk(elem, path, token):
    """Yield the tuple of elements leading to every element that has
    `token` among its attribute values."""
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    # iterate over the direct children; getchildren() was removed in Python 3.9
    for child in elem:
        yield from walk(child, path, token)

for path in walk(root, (), "Property X"):
    print(", ".join("{} {}".format(elem.tag, elem.attrib) for elem in path))
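The recursive walk keeps the whole tree in memory, which is fine for a 2000-line file. If the file ever grows much larger, a streaming variant built on etree.iterparse might be worth trying; what follows is a sketch, not part of the original reply. It keeps a stack of the currently open elements, copies each element's attributes at its "start" event (attributes are already available there), and clears elements at their "end" event to release memory. It also formats the attributes with json.dumps, so the output uses the double-quoted style from the question instead of Python's dict repr. The names find_paths, format_path and SAMPLE are hypothetical, chosen only for this example.

```python
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample input: the XML from the question, as bytes.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

def find_paths(source, token):
    """Yield a list of (tag, attrib) pairs leading to every element
    whose attribute values include `token`, without building the
    full tree up front. `source` is a filename or binary file object."""
    stack = []  # (tag, attrib copy) for each currently open element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # copy the attributes now, because clear() below wipes them
            stack.append((elem.tag, dict(elem.attrib)))
            if token in elem.attrib.values():
                yield list(stack)
        else:
            stack.pop()
            elem.clear()  # drop children/attributes of the finished element

def format_path(path):
    """Render a path in the question's format, e.g. LEVEL_2 {"ATTR": "hello"}."""
    return ", ".join("{} {}".format(tag, json.dumps(attrib)) for tag, attrib in path)

results = [format_path(p) for p in find_paths(io.BytesIO(SAMPLE), "Property X")]
for line in results:
    print(line)
```

For a real file you would pass its name, e.g. find_paths("data.xml", "Property X"), where "data.xml" is a placeholder. Like the recursive version, this matches attribute values only, not element text.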