From: Stefan Behnel
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: ElementTree XML parsing problem
Date: Thu, 28 Apr 2011 07:57:28 +0200

Hegedüs Ervin, 27.04.2011 21:33:
> hello,
>
>> I'm using ElementTree to parse an XML file, but it stops at the
>> second record (id = 002), which contains a non-standard ascii
>> character, ä. Here's the XML:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> The complaint offered up by the parser is
>
> I've checked this xml with your script, I think your locales
> settings are not good.
>
> $ ./parse.py
>
> XML file: test.xml
> 001 High School
> 002 Universität Bremen
> 003 River College
>
> (name of xml file is "test.xml")
>
> So, I started change the codepage mark of xml:
>
> - same result
> - same result
> - same result

You probably changed this in an editor that supports XML and thus saves
the file in the declared encoding. Switching between the three by simply
changing the first line (the XML declaration) and not adapting the
encoding of the document itself would otherwise not yield the same
result for the document given above.

Stefan
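
To illustrate the point about the declaration and the actual byte encoding needing to agree, here is a minimal sketch. The record layout (`records`/`record`/`name`) and the two encodings tried (utf-8 and iso-8859-1) are assumptions made for the example, since the XML and the declarations from the quoted posts did not survive. It builds the same document with every combination of declared and actual encoding and parses each one with ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the XML from the original post is not available.
BODY = '<records><record id="002"><name>Universität Bremen</name></record></records>'


def build(declared, actual):
    """Return bytes whose XML declaration claims `declared` but which are
    really encoded as `actual`."""
    text = '<?xml version="1.0" encoding="%s"?>\n%s' % (declared, BODY)
    return text.encode(actual)


def try_parse(declared, actual):
    """Parse the document; return the name text, or "ParseError" on failure."""
    try:
        root = ET.fromstring(build(declared, actual))
    except ET.ParseError:
        return "ParseError"
    return root.find("record/name").text


# Try every combination of declared and actual encoding.
results = {
    (declared, actual): try_parse(declared, actual)
    for declared in ("utf-8", "iso-8859-1")
    for actual in ("utf-8", "iso-8859-1")
}

for (declared, actual), outcome in sorted(results.items()):
    print("declared=%-10s actual=%-10s -> %r" % (declared, actual, outcome))
```

Only the two combinations where the declaration matches the bytes give back the name intact. Declaring utf-8 over Latin-1 bytes makes the parser reject the document, and declaring iso-8859-1 over UTF-8 bytes parses without complaint but returns garbled text. So getting the same result from all declarations suggests the editor re-encoded the file on each save.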