


Finding all instances of a string in an XML file

Started by Jason Friedman <jsf80238@gmail.com>
First post 2013-06-20 19:30 -0600
Last post 2013-06-20 19:30 -0600
Articles 1, 1 participant




#48836: Finding all instances of a string in an XML file

From Jason Friedman <jsf80238@gmail.com>
Date 2013-06-20 19:30 -0600
Subject Finding all instances of a string in an XML file
Message-ID <mailman.3648.1371778210.3114.python-list@python.org>


I have XML that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE KMART SYSTEM "my.dtd">
<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>

The "Property X" string appears twice, and I want to output the "path"
that leads to each appearance. In this case the output would be:

LEVEL_1 {}, LEVEL_2 {"ATTR": "hello"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "2"}
LEVEL_1 {}, LEVEL_2 {"ATTR": "goodbye"}, LEVEL_3 {"ATTR": "aloha"}, ATTRIBUTE {"NAME": "Property X", "VALUE": "3"}
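One way to produce exactly that output, sketched here with the standard xml.etree.ElementTree module: walk the tree recursively, push each element's tag and attributes onto a shared path list on the way down, pop it on the way back up, and emit a copy of the path whenever an element has an attribute equal to the target. The names `find_paths`, `format_path` and `SAMPLE` are illustrative, not from the post, and the XML declaration and DOCTYPE line are left out of the embedded sample only to keep it self-contained. Attribute keys are sorted with `json.dumps(..., sort_keys=True)` so the output does not depend on dict ordering; for this sample the sorted order happens to match the requested output.

```python
import json
import xml.etree.ElementTree as ET

# Sample document from the question (declaration and DOCTYPE omitted here).
SAMPLE = """<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>"""


def find_paths(elem, target, path=None):
    """Yield a list of (tag, attrib) pairs from the root down to every
    element that has an attribute whose value equals target."""
    if path is None:
        path = []
    path.append((elem.tag, dict(elem.attrib)))  # enter this element
    if target in elem.attrib.values():
        yield list(path)  # copy, because path keeps changing as we walk
    for child in elem:
        for found in find_paths(child, target, path):
            yield found
    path.pop()  # leave this element before returning to the parent


def format_path(path):
    """Render one path as 'TAG {json attributes}, TAG {...}, ...'."""
    return ", ".join("%s %s" % (tag, json.dumps(attrib, sort_keys=True))
                     for tag, attrib in path)


root = ET.fromstring(SAMPLE)
results = [format_path(p) for p in find_paths(root, "Property X")]
for line in results:
    print(line)
```

For a real file, `root = ET.parse("my.xml").getroot()` (file name hypothetical) would replace `ET.fromstring(SAMPLE)`. The nested `for ... yield` loop is used instead of `yield from`, which only arrived in Python 3.3, so the sketch should also run on Python 3.2.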

My actual XML file is 2000 lines and contains up to 8 levels of nesting.
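At 2000 lines and 8 levels of nesting, a recursive walk stays far below Python's default recursion limit of 1000 frames, but if the file grows much larger, `ET.iterparse` can produce the same paths while streaming instead of loading the whole tree first. The following sketch keeps an explicit stack: push on each `start` event, where ElementTree guarantees the element's attributes are already parsed, and pop on the matching `end` event. The function name `stream_matching_paths` and the in-memory `SAMPLE` bytes are assumptions for illustration.

```python
import io
import json
import xml.etree.ElementTree as ET

# Sample document from the question (declaration and DOCTYPE omitted here).
SAMPLE = b"""<LEVEL_1>
  <LEVEL_2 ATTR="hello">
    <ATTRIBUTE NAME="Property X" VALUE ="2"/>
  </LEVEL_2>
  <LEVEL_2 ATTR="goodbye">
    <ATTRIBUTE NAME="Property Y" VALUE ="NULL"/>
    <LEVEL_3 ATTR="aloha">
      <ATTRIBUTE NAME="Property X" VALUE ="3"/>
    </LEVEL_3>
    <ATTRIBUTE NAME="Property Z" VALUE ="welcome"/>
  </LEVEL_2>
</LEVEL_1>"""


def stream_matching_paths(source, target):
    """Yield 'TAG {attrs}, ...' strings for every element with an attribute
    value equal to target, using an explicit stack instead of recursion."""
    stack = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are available on the start event; children are not.
            attrs = json.dumps(dict(elem.attrib), sort_keys=True)
            stack.append("%s %s" % (elem.tag, attrs))
            if target in elem.attrib.values():
                yield ", ".join(stack)
        else:
            stack.pop()   # this element is finished
            elem.clear()  # discard its children to keep memory use low


results = list(stream_matching_paths(io.BytesIO(SAMPLE), "Property X"))
for line in results:
    print(line)
```

Each stack entry is formatted as soon as the element starts, so clearing finished elements afterwards does not affect the reported paths. For a file on disk, a file name can be passed to `ET.iterparse` in place of the `BytesIO` object.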

I have tried this so far (partial code, using the xml.etree.ElementTree
module):
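# Partial code: CHILDREN, TAG, ATTRIB and CR are constants defined elsewhere.
# Each node is a mapping indexed by TAG, ATTRIB and CHILDREN.
def get_path(data_dictionary, val, path):
    for node in data_dictionary[CHILDREN]:
        if node[CHILDREN]:
            # Record this tag unless it repeats the last one, then recurse.
            if not path or node[TAG] != path[-1]:
                path.append(node[TAG])
            print(CR + "recursing ...")
            get_path(node, val, path)
        else:
            # Node without children: report it if any attribute value matches.
            for k, v in node[ATTRIB].items():
                if v == val:
                    print("path- ", path)
                    print("---- " + node[TAG] + " " + str(node[ATTRIB]))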
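Two things in the attempt above keep it from producing the wanted paths: tags are appended to `path` but never removed once a branch is finished, so later results carry stale levels, and attribute values are tested only in the `else` branch, that is, only on elements without children. A minimal corrected sketch of the same dict-based idea follows. The values of `TAG`, `ATTRIB` and `CHILDREN` and the `to_dict` helper are hypothetical, since the post does not show how its dictionaries are built.

```python
import xml.etree.ElementTree as ET

# Hypothetical key names: the post uses the constants TAG, ATTRIB and
# CHILDREN but does not show their values.
TAG, ATTRIB, CHILDREN = "tag", "attrib", "children"


def to_dict(elem):
    """Hypothetical helper: convert an Element tree into nested dicts."""
    return {TAG: elem.tag,
            ATTRIB: dict(elem.attrib),
            CHILDREN: [to_dict(child) for child in elem]}


def get_path(node, val, path, results):
    """Collect every root-to-node path whose last node has val as an
    attribute value. path is shared, so push on entry and pop on exit."""
    path.append((node[TAG], node[ATTRIB]))
    if val in node[ATTRIB].values():  # test every node, not only leaves
        results.append(list(path))    # store a copy of the current path
    for child in node[CHILDREN]:
        get_path(child, val, path, results)
    path.pop()                        # undo the push before returning
    return results


root = ET.fromstring('<LEVEL_1>'
                     '<LEVEL_2 ATTR="hello">'
                     '<ATTRIBUTE NAME="Property X" VALUE="2"/></LEVEL_2>'
                     '<LEVEL_2 ATTR="goodbye"><LEVEL_3 ATTR="aloha">'
                     '<ATTRIBUTE NAME="Property X" VALUE="3"/></LEVEL_3>'
                     '</LEVEL_2></LEVEL_1>')
paths = get_path(to_dict(root), "Property X", [], [])
for p in paths:
    print([tag for tag, _ in p])
```

Because each level is popped when its branch is done, the second match no longer inherits the first LEVEL_2 branch.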

I'm really not even close to getting the output I am looking for.
Python 3.2.2.
Thank you.
