RE: xml data or other?

From "Prasad, Ramit" <ramit.prasad@jpmorgan.com>
To Artie Ziff <artie.ziff@gmail.com>, "python-list@python.org" <python-list@python.org>
Subject RE: xml data or other?
Date Mon, 19 Nov 2012 21:42:00 +0000
Newsgroups comp.lang.python
Message-ID <mailman.14.1353362801.29569.python-list@python.org> (permalink)


Artie Ziff wrote:
> 
> On 11/9/12 5:50 AM, rusi wrote:
> > On Nov 9, 5:54 pm, Artie Ziff <artie.z...@gmail.com> wrote:
> > # submit corrected input to etree
> I was very grateful for the "leg up" in getting started down the
> right path with my coding. Many thanks to you, rusi. I took your
> excellent advice and have this working.
> 
> import os
> import re
> import sys
> 
> class Converter():
>      PREFIX = """<?xml version="1.0"?>
>      <data>
>      """
>      POSTFIX = "</data>"
>      def __init__(self, data):
>          self.data = data
>          self.writeXML()
>      def writeXML(self):
>          pattern = re.compile('<testname=(.*)>')
>          replaceStr = r'<testname name="\1">'
>          xmlData = re.sub(pattern, replaceStr, self.data)
>          self.dataXML = self.PREFIX + xmlData.replace("\\", "/") + self.POSTFIX
> 
> ###  main
> # input to script is directory:
> # sanitize trailing slash
> testPkgDir = sys.argv[1].rstrip('/')
> # Within each test package directory is doc/testcases
> tcDocDir = "doc/testcases"
> # set input dir, containing broken files
> tcTxtDir = os.path.join(testPkgDir, tcDocDir)
> # set output dir, to write proper XML files
> tcXmlDir = os.path.join(testPkgDir, tcDocDir + "_XML")
> if not os.path.exists(tcXmlDir):
>      os.makedirs(tcXmlDir)
> # iterate through files in input dir
> for filename in os.listdir(tcTxtDir):
>      # set filepaths
>      filepathTXT = os.path.join(tcTxtDir, filename)
>      base = os.path.splitext(filename)[0]
>      fileXML = base + ".xml"
>      filepathXML = os.path.join(tcXmlDir, fileXML)
>      # read broken file, convert to proper XML
>      with open(filepathTXT) as f:
>          c = Converter(f.read())
>      # write the converted XML; the with block closes the file
>      with open(filepathXML, 'w') as xmlFO:   # xmlFileObject
>          xmlFO.write(c.dataXML)
> 
> ###
> 
> I am writing the XML files out so I can see what's happening. My plan
> is to keep the XML data in memory and parse it with xml.etree.ElementTree.
> 
> Unfortunately, xml parsing fails due to angle brackets inside
> description tags. In particular, xml.etree.ElementTree.parse()
> aborts on '<' inside xml data such as the following:
> 
> <testname name="cron_test.sh">
>      <description>
>          This testcase tests if crontab <filename> installs the cronjob
>          and cron schedules the job correctly.
>      <\description>
> 
> ##
> 
> What is the right way to handle the extra angle brackets?
> Substitute on a line-by-line basis, if that works?
> Or learn to write a simple stack-style parser, or a
> recursive descent parser, as it may be called?

I think your description text should be in a CDATA section, so that
the parser reads '<filename>' as character data rather than markup.
http://en.wikipedia.org/wiki/CDATA#CDATA_sections_in_XML
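
A rough sketch of one way to do that wrapping before parsing, not a tested fix for your real files. The names BROKEN, DESCRIPTION and wrap_descriptions are made up for illustration. The sample assumes the closing tag has already been corrected to </description>, that <testname> is closed, and that no description body contains the literal text </description>.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the data quoted above (assumed already
# converted to <testname name="..."> with a proper </description> tag).
BROKEN = """<?xml version="1.0"?>
<data>
<testname name="cron_test.sh">
    <description>
        This testcase tests if crontab <filename> installs the cronjob
        and cron schedules the job correctly.
    </description>
</testname>
</data>
"""

# Non-greedy match so each <description> element is handled separately.
DESCRIPTION = re.compile(r"<description>(.*?)</description>", re.DOTALL)


def wrap_descriptions(xml_text):
    """Wrap the body of every <description> element in a CDATA section."""
    def _wrap(match):
        body = match.group(1)
        # "]]>" would end the CDATA section early, so split it across
        # two adjacent CDATA sections.
        body = body.replace("]]>", "]]]]><![CDATA[>")
        return "<description><![CDATA[" + body + "]]></description>"
    return DESCRIPTION.sub(_wrap, xml_text)


fixed = wrap_descriptions(BROKEN)
root = ET.fromstring(fixed)
# The description text now contains the literal "<filename>".
print(root.find("testname/description").text.strip())
```

ET.fromstring(BROKEN) raises ET.ParseError on the unwrapped sample, while the wrapped text parses and the description comes back with its angle brackets intact.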

~Ramit


Thread

xml data or other? Artie Ziff <artie.ziff@gmail.com> - 2012-11-09 04:54 -0800
  Re: xml data or other? rusi <rustompmody@gmail.com> - 2012-11-09 05:50 -0800
    Re: xml data or other? Artie Ziff <artie.ziff@gmail.com> - 2012-11-18 05:32 -0800
      Re: xml data or other? rusi <rustompmody@gmail.com> - 2012-11-18 07:54 -0800
        Re: xml data or other? rusi <rustompmody@gmail.com> - 2012-11-18 07:58 -0800
    RE: xml data or other? "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-11-19 21:42 +0000
    Re: xml data or other? Stefan Behnel <stefan_ml@behnel.de> - 2012-11-20 06:48 +0100
  Re: xml data or other? shivers.paul@yahoo.co.uk - 2012-11-13 06:05 -0800
