| From | Mike <Mike@invalid.invalid> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: ElementTree XML parsing problem |
| Date | 2011-04-27 13:43 -0700 |
| Organization | A noiseless patient Spider |
| Message-ID | <ip9v8c$2uq$1@dont-email.me> |
| References | <ip9n72$ol6$1@dont-email.me> <91r8s4Fk28U4@mid.individual.net> |
On 4/27/2011 12:24 PM, Neil Cerutti wrote:
> On 2011-04-27, Mike<Mike@invalid.invalid> wrote:
>> I'm using ElementTree to parse an XML file, but it stops at the
>> second record (id = 002), which contains a non-standard ascii
>> character, ä. Here's the XML:
>>
>> <?xml version="1.0"?>
>> <snapshot time="Mon Apr 25 08:47:23 PDT 2011">
>> <records>
>> <record id="001" education="High School" employment="7 yrs" />
>> <record id="002" education="Universität Bremen" employment="3 years" />
>> <record id="003" education="River College" employment="5 yrs" />
>> </records>
>> </snapshot>
>>
>> The complaint offered up by the parser is
>>
>> Unexpected error opening simple_fail.xml: not well-formed
>> (invalid token): line 5, column 40
>
> It seems to be an invalid XML document, as another poster
> indicated.
>
>> and if I change the line to eliminate the ä, everything is
>> wonderful. The parser is perfectly happy with this
>> modification:
>>
>> <record id="002" education="University Bremen" employment="3 yrs" />
>>
>> I can't find anything in the ElementTree docs about allowing
>> additional text characters or coercing strange ascii to
>> Unicode.
>
> If you're not the one generating that bogus file, then you can
> specify the encoding yourself instead by declaring an XMLParser.
>
> import xml.etree.ElementTree as etree
> # Open in binary mode so the parser, not the file object, decodes the bytes.
> with open('file.xml', 'rb') as xml_file:
>     parser = etree.XMLParser(encoding='ISO-8859-1')
>     root = etree.parse(xml_file, parser=parser).getroot()
>
Thanks, Neil. I'm not generating the file, just trying to parse it. Your
solution is precisely what I was looking for, even if I didn't quite ask
correctly. I appreciate the help!
-- Mike --
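
For anyone finding this thread later, here is a minimal, self-contained sketch of the failure and of the encoding override suggested above. It assumes the file is really ISO-8859-1 (the thread never confirms the actual encoding) and uses an in-memory copy of the sample document rather than simple_fail.xml. The helper name parse_records is hypothetical, made up for this illustration.

```python
import io
import xml.etree.ElementTree as etree

# In-memory stand-in for the file from the thread. ASSUMPTION: the bytes are
# ISO-8859-1 and, as in the original, the XML declaration names no encoding.
XML_BYTES = (
    '<?xml version="1.0"?>\n'
    '<snapshot time="Mon Apr 25 08:47:23 PDT 2011">\n'
    '<records>\n'
    '<record id="001" education="High School" employment="7 yrs" />\n'
    '<record id="002" education="Universit\u00e4t Bremen" employment="3 years" />\n'
    '<record id="003" education="River College" employment="5 yrs" />\n'
    '</records>\n'
    '</snapshot>\n'
).encode('iso-8859-1')


def parse_records(data, encoding=None):
    """Hypothetical helper: return the attribute dict of every <record>.

    encoding=None lets the parser fall back to its default (UTF-8 when the
    document declares nothing); a value such as 'ISO-8859-1' overrides it.
    """
    parser = etree.XMLParser(encoding=encoding)
    root = etree.parse(io.BytesIO(data), parser=parser).getroot()
    return [dict(rec.attrib) for rec in root.iter('record')]


# Without an override the lone 0xE4 byte is not valid UTF-8, so parsing fails.
try:
    parse_records(XML_BYTES)
    default_failed = False
except etree.ParseError as err:
    default_failed = True
    print('default parse failed:', err)

# With the override, all three records come through.
records = parse_records(XML_BYTES, encoding='ISO-8859-1')
print([rec['id'] for rec in records])
```

If you control whatever generates the file, the cleaner fix is to have it declare its encoding in the XML declaration (or write UTF-8), so that no override is needed on the reading side.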