From: Dave Angel
Date: Tue, 13 Nov 2012 13:01:17 -0500
To: Artie Ziff
Subject: Re: xml data or other?
Cc: python-list@python.org
Newsgroups: comp.lang.python

On 11/09/2012 07:54 AM, Artie Ziff wrote:
> Hello,
>
> I want to process XML-like data like this:
>
>
> <description>
> ACPI (Advanced Control Power & Integration) testscript for 2.5
> kernels.
> <\description>
> <test_location>
> ltp/testcases/kernel/device-drivers/acpi/ltpacpi.sh
> <\test_location>
> <\testname>
>
> Is there a name for the format above (perhaps xhtml)?

The only word I can think of is "broken." XML, HTML and XHTML all use forward slashes.

> I'd like to find a python module that can translate it to proper xml.
> Does one exist? etree?
>

I think you've already figured it out. Just take your description and turn it into Python. In other words, replace all "<\" with "</" and all "\>" with " />", although your example doesn't happen to have any of the latter. Tack an XML header on, and try to parse it with etree. If you can't, then let someone manually fix it. Or better, fix the program upstream that's creating this mess.
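A minimal sketch of that approach, assuming the backslashes are the main breakage. The sample's opening tags (in particular a bare `<testname>`) are an assumption, since only the closing tags of the quoted data are known, and `repair` and `parse_or_none` are hypothetical helper names. The ampersand escaping is an extra step beyond the advice above: the bare "&" in the description would otherwise make the parser reject the sample.

```python
import re
import xml.etree.ElementTree as ET

# Sample input. The opening tags are assumed; only the backslashed
# closing tags are known from the quoted data.
BROKEN = r"""<testname>
<description>
ACPI (Advanced Control Power & Integration) testscript for 2.5 kernels.
<\description>
<test_location>
ltp/testcases/kernel/device-drivers/acpi/ltpacpi.sh
<\test_location>
<\testname>"""


def repair(text):
    """Best-effort cleanup (hypothetical helper, not a general fixer)."""
    text = text.replace("<\\", "</")   # closing tags: <\tag> -> </tag>
    text = text.replace("\\>", " />")  # self-closing tags: <tag\> -> <tag />
    # Extra step, not part of the advice above: escape bare ampersands
    # that are not already the start of an entity or character reference.
    text = re.sub(r"&(?!#?\w+;)", "&amp;", text)
    return '<?xml version="1.0"?>\n' + text


def parse_or_none(text):
    """Return the root element, or None if the data still needs a manual fix."""
    try:
        return ET.fromstring(repair(text))
    except ET.ParseError:
        return None


root = parse_or_none(BROKEN)
if root is None:
    print("still broken; fix it by hand, or fix the generator")
else:
    print(root.tag)
    print(root.findtext("description").strip())
    print(root.findtext("test_location").strip())
```

Anything this cleanup doesn't anticipate comes back as None rather than as a guess, which is the point at which a person has to look at the data.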
There isn't a reliable way to "fix" all the possible broken XML it might be creating without reverse engineering it.

-- 
DaveA