Re: Finding all instances of a string in an XML file

From Peter Otten <__peter__@web.de>
Subject Re: Finding all instances of a string in an XML file
Date 2013-06-21 08:16 +0200
Organization None
References <CANy1k1igA3PLR-Pgy9w4T1pbn7d2tK9=6tNpnett6iTjACDs7w@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.3652.1371795314.3114.python-list@python.org>


Jason Friedman wrote:

> I have XML which looks like:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE KMART SYSTEM "my.dtd">
> <LEVEL_1>
>   <LEVEL_2 ATTR="hello">
>     <ATTRIBUTE NAME="Property X" VALUE ="2"/>
>   </LEVEL_2>
>   <LEVEL_2 ATTR="goodbye">
>     <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
>     <LEVEL_3 ATTR="aloha">
>       <ATTRIBUTE NAME="Property X" VALUE ="3"/>
>     </LEVEL_3>
>     <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
>   </LEVEL_2>
> </LEVEL_1>
> 
> The "Property X" string appears twice and I want to output the "path"
> that leads to all such appearances.  In this case the output would be:
> 
> LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X",
> "VALUE": "2"}
> LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"},
> ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}
> 
> My actual XML file is 2000 lines and contains up to 8 levels of nesting.

That's still small, so you can parse the whole document into memory and walk
the tree recursively:

xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

import xml.etree.ElementTree as etree

tree = etree.fromstring(xml)

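def walk(elem, path, token):
    # Extend the path with the current element. Tuples are immutable, so
    # sibling branches never see each other's additions.
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    # Iterate over the element directly; getchildren() was removed in Python 3.9.
    for child in elem:
        yield from walk(child, path, token)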

for path in walk(tree, (), "Property X"):
    print(", ".join("{} {}".format(elem.tag, elem.attrib) for elem in path))
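Note that formatting elem.attrib directly prints the Python repr of the dict,
so the attribute values come out in single quotes rather than the double
quotes shown in the question. If the exact format matters, one possible
variant (a sketch, not the only way) is to serialize the attributes with
json.dumps. The names SAMPLE, find_paths and format_path are made up for this
illustration, and the sample omits the XML declaration and DOCTYPE for
brevity:

```python
import json
import xml.etree.ElementTree as etree

# Same structure as the XML in the question (hypothetical name SAMPLE).
SAMPLE = """<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>
"""

def find_paths(elem, token, path=()):
    """Yield a tuple of elements (root first) for every element that has
    an attribute whose value equals token."""
    path += (elem,)
    if token in elem.attrib.values():
        yield path
    for child in elem:
        yield from find_paths(child, token, path)

def format_path(path):
    # json.dumps renders the attribute dict with double quotes.
    return ", ".join("{} {}".format(e.tag, json.dumps(e.attrib)) for e in path)

lines = [format_path(p) for p in find_paths(etree.fromstring(SAMPLE), "Property X")]
for line in lines:
    print(line)
```

Each printed line is one full path from the root to a matching element; the
match is on exact attribute values, same as in the walk() function above.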
