Re: xml data or other?

Date Tue, 13 Nov 2012 13:01:17 -0500
From Dave Angel <d@davea.name>
To Artie Ziff <artie.ziff@gmail.com>
Subject Re: xml data or other?
Newsgroups comp.lang.python
Message-ID <mailman.3638.1352829700.27098.python-list@python.org>

On 11/09/2012 07:54 AM, Artie Ziff wrote:
> Hello,
>
> I want to process XML-like data like this:
>
> <testname=ltpacpi.sh>
>     <description>
>         ACPI (Advanced Control Power & Integration) testscript for 2.5
> kernels.
>
>     <\description>
>     <test_location>
>         ltp/testcases/kernel/device-drivers/acpi/ltpacpi.sh
>     <\test_location>
> <\testname>
> <snip...>
>
>
> Is there a name for the format above (perhaps xhtml)?

The only word I can think of is "broken."  XML, HTML, and XHTML all
use forward slashes in their closing tags.

> I'd like to find a python module that can translate it to proper xml.
> Does one exist? etree?
>

I think you've already figured it out.  Just take your description and
turn it into Python.  In other words, replace all "<\" with "</" and
perhaps " \>" with " />", although your example doesn't happen to have
any of the latter.  Tack an XML header on, and try to parse it with etree.
If you can't, then let someone manually fix it.
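
Something along these lines might do as a starting point. It is only a
sketch: the function name and both input strings are made up for
illustration, and it assumes the backslashes are the only thing wrong.
Note that the sample as posted would probably still fail after the
replacement, since "<testname=ltpacpi.sh>" and the bare "&" are not
well-formed XML either.

```python
import xml.etree.ElementTree as ET


def repair_and_parse(text):
    """Apply the textual fixes described above, then try to parse.

    Returns the root Element, or None if the result is still not
    well-formed and needs manual repair. (Hypothetical helper name.)
    """
    fixed = text.replace("<\\", "</").replace(" \\>", " />")
    # Tack a minimal XML declaration on the front.
    document = '<?xml version="1.0"?>\n' + fixed.strip()
    try:
        return ET.fromstring(document)
    except ET.ParseError:
        return None


# Made-up input that only has the backslash problem.
repairable = """
<test>
    <test_location>
        ltp/testcases/kernel/device-drivers/acpi/ltpacpi.sh
    <\\test_location>
    <empty \\>
<\\test>
"""

# Closer to the posted sample: the opening tag and the bare "&" are
# still not well-formed after the replacement.
not_repairable = """
<testname=ltpacpi.sh>
    <description>
        ACPI (Advanced Control Power & Integration) testscript
    <\\description>
<\\testname>
"""

root = repair_and_parse(repairable)
print(root.find("test_location").text.strip())
print(repair_and_parse(not_repairable))  # None: hand this one to a human
```

Returning None rather than raising keeps the "let someone manually fix
it" branch explicit; in a real script you would probably log the
ParseError so the human knows where to look.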

Or better, fix the program upstream that's creating this mess.  There
isn't a reliable way to "fix" all the possible broken XML it might be
creating, without reverse engineering it.



-- 

DaveA
