| References | <AANLkTim+8Wj2zAtuYB0+LyqdqdYYny7QHENXTPvx6qK8@mail.gmail.com> <AANLkTinOyE1ofhUo2Y0hWXTXfYKdk1AfG=fWWRAMPAJd@mail.gmail.com> |
|---|---|
| Date | 2011-02-09 09:32 +0100 |
| Subject | Re: Unicode error in sax parser |
| From | Rickard Lindberg <ricli85@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.31.1297240335.1633.python-list@python.org> |
On Tue, Feb 8, 2011 at 5:41 PM, Chris Rebert <clp2@rebertia.com> wrote:
>> Here is a bash script to reproduce my error:
>
> Including the error message and traceback is still helpful, for future
> reference.
Thanks for pointing it out.
>> #!/bin/sh
>>
>> cat > å.timeline <<EOF
> <snip>
>> EOF
>>
>> python <<EOF
>> # encoding: utf-8
>> from xml.sax import parse
>> from xml.sax.handler import ContentHandler
>> parse(u"å.timeline", ContentHandler())
>> EOF
>>
>> If I instead do
>>
>> parse(u"å.timeline".encode("utf-8"), ContentHandler())
>>
>> the script runs without errors.
>>
>> Is this a bug or expected behavior?
>
> Bug; open() figures out the filesystem encoding just fine.
> Bug tracker to report the issue to: http://bugs.python.org/
>
> Workaround:
> parse(open(u"å.timeline", 'r'), ContentHandler())
When I tried your workaround, I still got this error:
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
    parser.parse(filename_or_stream)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
    self.prepareParser(source)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
    self._parser.SetBase(source.getSystemId())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)
The open(...) call itself works fine, but the error still occurs inside the
sax parser, in prepareParser().
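
One thing that might be worth trying, as an untested sketch rather than a
confirmed fix: judging from the traceback, the failure happens when the
source's system ID (apparently the unicode file name, picked up from the open
file object) is passed to SetBase(). If that assumption holds, handing the
parser an InputSource that carries only a byte stream and no system ID should
avoid the encoding step. The helper name parse_without_system_id is made up
for illustration:

```python
# -*- coding: utf-8 -*-
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


def parse_without_system_id(path, handler):
    """Parse the file at `path` without giving its name to the parser.

    Hypothetical helper: it assumes the UnicodeEncodeError comes from the
    unicode file name being used as the system ID.
    """
    parser = make_parser()
    parser.setContentHandler(handler)
    with open(path, "rb") as stream:
        source = InputSource()        # system ID is left as None
        source.setByteStream(stream)  # the parser reads from the open file
        parser.parse(source)


# Example call (file name taken from the script above):
# parse_without_system_id(u"å.timeline", ContentHandler())
```

The trade-off is that the parser no longer knows where the document lives, so
relative references to external entities cannot be resolved and error
messages will not name the file.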
--
Rickard Lindberg