

Groups > comp.lang.python > #31469 > unrolled thread

ElementTree Issue - Search and remove elements

Started by Tharanga Abeyseela <tharanga.abeyseela@gmail.com>
First post 2012-10-17 16:47 +1100
Last post 2012-10-17 09:01 +0200
Articles 3 — 3 participants



Contents

  ElementTree Issue - Search and remove elements Tharanga Abeyseela <tharanga.abeyseela@gmail.com> - 2012-10-17 16:47 +1100
    Re: ElementTree Issue - Search and remove elements Alain Ketterlin <alain@dpt-info.u-strasbg.fr> - 2012-10-17 08:25 +0200
      Re: ElementTree Issue - Search and remove elements Stefan Behnel <stefan_ml@behnel.de> - 2012-10-17 09:01 +0200

#31469 — ElementTree Issue - Search and remove elements

From Tharanga Abeyseela <tharanga.abeyseela@gmail.com>
Date 2012-10-17 16:47 +1100
Subject ElementTree Issue - Search and remove elements
Message-ID <mailman.2323.1350452831.27098.python-list@python.org>

Hi Guys,

I need to remove the parent node if a particular match is found.

ex:


<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<Feed xmlns="http://schemas.xxxx.xx/xx/2011/06/13/xx">
    <TVEpisode>
        <Provider>0x5</Provider>
        <ItemId>http://fxxxxxxl</ItemId>
        <Title>WWE</Title>
        <SortTitle>WWE </SortTitle>
        <Description>WWE</Description>
        <IsUserGenerated>false</IsUserGenerated>
        <Images>
            <Image>
                <ImagePurpose>BoxArt</ImagePurpose>
                <Url>https://xxxxxx.xx/@006548-thumb.jpg</Url>
            </Image>
        </Images>
        <LastModifiedDate>2012-10-16T00:00:19.814+11:00</LastModifiedDate>
        <Genres>
            <Genre>xxxxx</Genre>
        </Genres>
        <ParentalControl>
            <System>xxxx</System>
            <Rating>M</Rating>


If I find <Rating>NC</Rating>, I need to remove the whole <TVEpisode> from
the XML. I have TVseries, Movies, and several other items (they also have
a Rating element). I need to remove every item that has the NC keyword inside
<Rating>.


I'm using the following code.

When I do the following in the Python shell, I can see the result (NC, M, etc.):

>>> x[1].text
'NC'

But when I do this inside the script, I get the following error.

Traceback (most recent call last):
  File "./test.py", line 10, in ?
    x = child.find('Rating').text
AttributeError: 'NoneType' object has no attribute 'text'


But how should I remove the parent node if I find the string "NC"? I
need to do this for all elements (TVEpisode, Movies, TVshow, etc.),
not only TVEpisodes. How can I use Python to remove the parent node
when that string is found?


#!/usr/bin/env python

import elementtree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()


for child in root.findall(".//{http://schemas.CCC.com/CCC/2011/06/13/CC}Rating"):
    x = child.find('Rating').text
    if child[1].text == 'NC':
        print "found"
        root.remove('TVEpisode') ?????
tree.write('output.xml')


Really appreciate your thoughts on this.

Thanks in advance,
Tharanga



#31475

From Alain Ketterlin <alain@dpt-info.u-strasbg.fr>
Date 2012-10-17 08:25 +0200
Message-ID <87pq4hbonj.fsf@dpt-info.u-strasbg.fr>
In reply to #31469

Tharanga Abeyseela <tharanga.abeyseela@gmail.com> writes:

> I need to remove the parent node if a particular match is found.

It looks like you can't get the parent of an Element with elementtree (I
would love to be proven wrong on this).

The solution is to find all nodes that have a Rating (grand-) child, and
then test explicitly for the value you're looking for.

> <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
> <Feed xmlns="http://schemas.xxxx.xx/xx/2011/06/13/xx">
>     <TVEpisode>
[...]
>         <ParentalControl>
>             <System>xxxx</System>
>             <Rating>M</Rating>


> for child in root.findall(".//{http://schemas.CCC.com/CCC/2011/06/13/CC}Rating"):
>     x = child.find('Rating').text
>     if child[1].text == 'NC':
>         print "found"
>         root.remove('TVEpisode') ?????

Your code doesn't work because findall() already returns Rating
elements, and these have no Rating child (so your first call to find()
fails, i.e., returns None). And list indexes start at 0, btw.

Also, Rating is not a child of TVEpisode, it is a child of
ParentalControl.
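
To see the failure described above in isolation, here is a minimal sketch using the standard library's xml.etree.ElementTree. The namespace URI and the cut-down document are hypothetical stand-ins for the redacted feed; only the element names come from the original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the redacted one in the post.
NS = "http://schemas.example.com/feed"

XML = """<Feed xmlns="http://schemas.example.com/feed">
  <TVEpisode>
    <ParentalControl>
      <System>xxxx</System>
      <Rating>NC</Rating>
    </ParentalControl>
  </TVEpisode>
</Feed>"""

root = ET.fromstring(XML)

# findall() hands back the Rating elements themselves.
ratings = root.findall(".//{%s}Rating" % NS)
rating = ratings[0]

print(rating.tag)             # the namespaced tag of the match
print(rating.text)            # 'NC': the text is already on this element
print(len(rating))            # 0: a Rating element has no children
print(rating.find("Rating"))  # None, so a following .text raises AttributeError
```

Since the matched element has no children at all, indexing it with [1] would fail as well.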

Here is my suggestion:

# Find nodes having a ParentalControl child
for child in root.findall(".//*[ParentalControl]"):
    x = child.find("ParentalControl/Rating").text
    if x == "NC":
        ...
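
One way the elided body above could be filled in is sketched below. It is not the poster's code: it assumes that every item (TVEpisode, Movie, and so on) is a direct child of the root Feed element, so that root.remove() can be used, and it uses a hypothetical namespace URI and sample document. The helper name remove_rated is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the redacted one in the post.
NS = "http://schemas.example.com/feed"

XML = """<Feed xmlns="http://schemas.example.com/feed">
  <TVEpisode>
    <Title>Kept</Title>
    <ParentalControl><System>xxxx</System><Rating>M</Rating></ParentalControl>
  </TVEpisode>
  <Movie>
    <Title>Dropped</Title>
    <ParentalControl><System>xxxx</System><Rating>NC</Rating></ParentalControl>
  </Movie>
</Feed>"""


def remove_rated(root, rating="NC", ns=NS):
    """Remove direct children of root whose ParentalControl/Rating equals rating."""
    path = "{%s}ParentalControl/{%s}Rating" % (ns, ns)
    removed = []
    # Iterate over a copy: removing from root while looping over it skips items.
    for item in list(root):
        node = item.find(path)
        if node is not None and node.text == rating:
            root.remove(item)
            removed.append(item)
    return removed


root = ET.fromstring(XML)
removed = remove_rated(root)
remaining_titles = [item.findtext("{%s}Title" % NS) for item in root]
print(remaining_titles)
```

Checking `node is not None` before reading `.text` avoids the AttributeError from the original traceback for items that carry no Rating at all. After the loop, tree.write() can save the pruned document as in the original script.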

Note that a complete XPath implementation would make that simpler: your
query basically is //*[ParentalControl/Rating="NC"]
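
For comparison, the limited XPath subset in recent versions of the standard library's ElementTree can get fairly close to that query by combining a [tag='text'] predicate with a ".." parent step. The sketch below assumes such a version is available; the "f" prefix, the namespace URI, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix and namespace URI; the real URI is redacted in the post.
NS = {"f": "http://schemas.example.com/feed"}

XML = """<Feed xmlns="http://schemas.example.com/feed">
  <TVEpisode>
    <ParentalControl><System>xxxx</System><Rating>M</Rating></ParentalControl>
  </TVEpisode>
  <Movie>
    <ParentalControl><System>xxxx</System><Rating>NC</Rating></ParentalControl>
  </Movie>
</Feed>"""

root = ET.fromstring(XML)

# Select every ParentalControl with a Rating child whose text is 'NC',
# then step up to its parent with "..".
hits = root.findall(".//f:ParentalControl[f:Rating='NC']/..", NS)

# Strip the "{uri}" part of each tag for display.
hit_tags = [elem.tag.split("}", 1)[1] for elem in hits]
print(hit_tags)
```

This only locates the items. Removing them still needs the container element, which here is root itself.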

-- Alain.



#31477

From Stefan Behnel <stefan_ml@behnel.de>
Date 2012-10-17 09:01 +0200
Message-ID <mailman.2328.1350457332.27098.python-list@python.org>
In reply to #31475

Alain Ketterlin, 17.10.2012 08:25:
> It looks like you can't get the parent of an Element with elementtree (I
> would love to be proven wrong on this).

No, that's by design. ElementTree allows you to reuse subtrees in a
document, for example, which wouldn't work if you enforced a single parent.
Also, keeping parent references out simplifies the tree structure
considerably, saves space and time and all that. ElementTree is really
great for what it does.

If you need to access the parent more often in a read-only tree, you can
quickly build up a back reference dict that maps each Element to its parent
by traversing the tree once.
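
A back-reference dict of this kind can be sketched as follows with the standard library's ElementTree. The namespace URI, the sample document, and the names parent_of and flagged are hypothetical. Because the tree is modified at the end, the map is built once, used, and then treated as stale.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the redacted one in the post.
NS = "http://schemas.example.com/feed"

XML = """<Feed xmlns="http://schemas.example.com/feed">
  <TVEpisode>
    <ParentalControl><System>xxxx</System><Rating>M</Rating></ParentalControl>
  </TVEpisode>
  <Movie>
    <ParentalControl><System>xxxx</System><Rating>NC</Rating></ParentalControl>
  </Movie>
</Feed>"""

root = ET.fromstring(XML)

# One pass over the tree: record each element against its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Collect the matches first so the tree is not modified while iterating it.
rating_tag = "{%s}Rating" % NS
flagged = [r for r in root.iter(rating_tag) if r.text == "NC"]

for rating in flagged:
    control = parent_of[rating]   # the ParentalControl element
    item = parent_of[control]     # TVEpisode, Movie, ...
    parent_of[item].remove(item)  # remove the item from its own parent

remaining = [child.tag.split("}", 1)[1] for child in root]
print(remaining)
```

Unlike calling root.remove() directly, this works at any nesting depth, since each item is removed from whatever element actually contains it.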

Alternatively, use lxml.etree, in which Elements have a getparent() method
and in which single parents are enforced (also by design).
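
With lxml, the removal might look like the sketch below, which combines getparent() with the full XPath query from the previous reply. It assumes the third-party lxml package is installed (the import is guarded so the script still loads without it); the "f" prefix, the namespace URI, the sample document, and the helper name drop_rated are hypothetical.

```python
try:
    from lxml import etree  # third-party package, assumed to be installed
except ImportError:
    etree = None

# Hypothetical prefix and namespace URI; the real URI is redacted in the post.
NS = {"f": "http://schemas.example.com/feed"}

XML = b"""<Feed xmlns="http://schemas.example.com/feed">
  <TVEpisode>
    <Title>Kept</Title>
    <ParentalControl><System>xxxx</System><Rating>M</Rating></ParentalControl>
  </TVEpisode>
  <Movie>
    <Title>Dropped</Title>
    <ParentalControl><System>xxxx</System><Rating>NC</Rating></ParentalControl>
  </Movie>
</Feed>"""


def drop_rated(xml_bytes, rating="NC"):
    """Parse xml_bytes and remove every element whose ParentalControl/Rating matches."""
    root = etree.fromstring(xml_bytes)
    # Full XPath: any element with a ParentalControl/Rating equal to $rating.
    query = "//*[f:ParentalControl/f:Rating=$rating]"
    # xpath() returns a list, so removing elements inside the loop is safe.
    for item in root.xpath(query, namespaces=NS, rating=rating):
        item.getparent().remove(item)
    return root


remaining = None
if etree is not None:
    root = drop_rated(XML)
    remaining = [t.text for t in root.xpath("//f:Title", namespaces=NS)]
    print(remaining)
```

No parent map is needed here because each lxml element knows its single parent.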

Stefan




