From: jmp
Newsgroups: comp.lang.python
Subject: Re: Newbie XML problem
Date: Tue, 22 Dec 2015 13:27:33 +0100

On 12/22/2015 05:29 AM, KP wrote:
>
> From my first foray into XML with Python:
>
> I would like to retrieve this list from the XML upon searching for the 'config' with id attribute = 'B'
>
>
> config = {id: 1, canvas: (3840, 1024), comment: "a comment",
>           {id: 4, gate: 3, (0, 0, 1280, 1024)},
>           {id: 5, gate: 2, (1280, 0, 2560, 1024)},
>           {id: 6, gate: 1, (2560, 0, 3840, 1024)}}
>
> I have started to use this code - but this is beginning to feel very non-elegant; not the cool Python code I usually see...
>
> import xml.etree.ElementTree as ET
>
> tree = ET.parse('master.xml')
> master = tree.getroot()
>
> for config in master:
>     if config.attrib['id'] == 'B':
>         ...

It depends a lot on

1/ the xml parser you use
2/ the xml structure

1/ I'm happily using BeautifulSoup. It is really simple to use and yields simple code.

2/ Whenever the code gets complicated, it is because the xml is not properly structured. For instance, in your example 'id' is an attribute of 'config' nodes, which is fine, but for 'panel' nodes it's a child node. There's no point in using a node when only one 'id' can be specified. Filtering by attributes is much easier than filtering by child nodes.
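
To illustrate the attribute versus child-node point with the ElementTree module from the original question, here is a minimal sketch. The XML in SAMPLE is made up for the example (the real master.xml is not shown in this thread); only the names 'config', 'panel', 'id' and 'coordinates' are taken from the discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, assumed for illustration only.
# 'id' is an attribute on <config> but a child node on <panel>.
SAMPLE = """
<master>
  <config id="A">
    <panel><id>1</id><coordinates>0,0,640,480</coordinates></panel>
  </config>
  <config id="B">
    <panel><id>5</id><coordinates>1280,0,2560,1024</coordinates></panel>
    <panel><id>6</id><coordinates>2560,0,3840,1024</coordinates></panel>
  </config>
</master>
"""

master = ET.fromstring(SAMPLE)

# Attribute filter: a single path expression selects the node.
config_b = master.find("config[@id='B']")

# Child-node filter: each candidate has to be inspected by hand.
panel_6 = next(p for p in config_b.findall("panel") if p.findtext("id") == "6")

print(panel_6.findtext("coordinates"))
```

The attribute lookup replaces the explicit loop over 'master', while the child-node lookup still needs one, which is roughly the asymmetry described above.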
Anyway, here's an example using BeautifulSoup:

python 2.7 (fix the print statement if you're using python3)

import bs4

xmlp = bs4.BeautifulSoup(open('test.xml', 'r'), 'xml')

# print all canvas
for cfg in xmlp.find_all('config'):
    print cfg.canvas.text

# find config B panel 6 coordinates
xmlp.find('config', id='B').find(lambda node: node.name=='panel' and node.id.text=='6').coordinates.text

# if panel id were attributes:
xmlp.find('config', id='B').find('panel', id='6').coordinates.text

If you can change the layout of the xml file, you'd better do so: put every value in an attribute whenever you can. Comments can span multiple lines, so for those you probably do need a node.

Properly structured xml will yield proper python code.

cheers,

JM
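
To make the layout advice above concrete, here is a rough sketch of how an attribute-based file could be turned into the kind of structure KP asked for. The XML layout, the 'panels' key and the helper names ints() and load_config() are all assumptions made for this example, not KP's actual file or an existing API. It uses the standard library's ElementTree so it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout, not KP's actual file: single values are attributes,
# only the free-text comment is a child node.
SAMPLE = """
<master>
  <config id="B" canvas="3840,1024">
    <comment>a comment</comment>
    <panel id="4" gate="3" coordinates="0,0,1280,1024"/>
    <panel id="5" gate="2" coordinates="1280,0,2560,1024"/>
    <panel id="6" gate="1" coordinates="2560,0,3840,1024"/>
  </config>
</master>
"""


def ints(text):
    """Turn '0,0,1280,1024' into (0, 0, 1280, 1024)."""
    return tuple(int(part) for part in text.split(","))


def load_config(root, config_id):
    """Return one <config> as a plain dict, or None if the id is unknown."""
    node = root.find("config[@id='%s']" % config_id)
    if node is None:
        return None
    return {
        "id": node.get("id"),
        "canvas": ints(node.get("canvas")),
        "comment": node.findtext("comment", default=""),
        "panels": [
            {
                "id": int(panel.get("id")),
                "gate": int(panel.get("gate")),
                "coordinates": ints(panel.get("coordinates")),
            }
            for panel in node.findall("panel")
        ],
    }


config = load_config(ET.fromstring(SAMPLE), "B")
print(config["canvas"], config["panels"][2]["coordinates"])
```

With every single value stored as an attribute, each lookup is one get() call, and only the comment needs findtext().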