


Re: ElementTree XML parsing problem

From Stefan Behnel <stefan_ml@behnel.de>
Subject Re: ElementTree XML parsing problem
Date Thu, 28 Apr 2011 07:57:28 +0200
Newsgroups comp.lang.python
Message-ID <mailman.925.1303970264.9059.python-list@python.org>



Hegedüs Ervin, 27.04.2011 21:33:
> hello,
>
>> I'm using ElementTree to parse an XML file, but it stops at the
>> second record (id = 002), which contains a non-ASCII
>> character, ä. Here's the XML:
>>
>> <?xml version="1.0"?>
>> <snapshot time="Mon Apr 25 08:47:23 PDT 2011">
>> <records>
>> <record id="001" education="High School" employment="7 yrs" />
>> <record id="002" education="Universität Bremen" employment="3 years" />
>> <record id="003" education="River College" employment="5 yrs" />
>> </records>
>> </snapshot>
>>
>> The complaint offered up by the parser is
>
> I've checked this XML with your script; I think your locale
> settings are wrong.
>
> $ ./parse.py
>
> XML file: test.xml
> 001 High School
> 002 Universität Bremen
> 003 River College
>
> (name of xml file is "test.xml")
>
> So I started changing the encoding declaration of the XML:
>
> <?xml version="1.0" encoding="UTF-8" ?>  - same result
> <?xml version="1.0" encoding="ISO-8859-2" ?>  - same result
> <?xml version="1.0" encoding="ISO-8859-1" ?>  - same result

You probably made this change in an XML-aware editor, which re-saves the
file in whatever encoding the declaration names. If you had only changed
the first line (the XML declaration) without re-encoding the document
itself, the three variants could not all have yielded the same result for
the document given above: the bytes that represent "ä" in UTF-8 differ
from those used in the ISO-8859 encodings.
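
To make the point concrete, here is a minimal sketch (not the original
poster's script) that declares one encoding, stores the bytes in another,
and feeds the result to ElementTree. The helper name parse_education and
the one-record document are assumptions made up for illustration, and the
sketch assumes Python 3.

```python
import xml.etree.ElementTree as ET

# One record from the document above; \u00e4 is the character "ä".
BODY = '<records><record id="002" education="Universit\u00e4t Bremen" /></records>'


def parse_education(declared, actual):
    """Declare one encoding, store the bytes in another, and parse.

    Hypothetical helper for illustration only.
    """
    declaration = '<?xml version="1.0" encoding="%s"?>' % declared
    # The declaration says `declared`, but the bytes are really `actual`.
    data = (declaration + BODY).encode(actual)
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return "ParseError: %s" % exc
    return root.find("record").get("education")


if __name__ == "__main__":
    for declared, actual in [
        ("UTF-8", "UTF-8"),
        ("ISO-8859-1", "ISO-8859-1"),
        ("UTF-8", "ISO-8859-1"),
        ("ISO-8859-1", "UTF-8"),
    ]:
        # ascii() keeps the output printable on any terminal encoding.
        print(declared, actual, ascii(parse_education(declared, actual)))
```

When the declaration matches the stored bytes, the attribute comes back
intact. Declaring UTF-8 over ISO-8859-1 bytes makes the parser reject the
document, and declaring ISO-8859-1 over UTF-8 bytes parses without error
but returns garbled text in place of "ä".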

Stefan


