


Re: Opaque error message on UTF-8 decode

Path csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed3a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <python-python-list@m.gmane.org>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Injected-Via-Gmane http://gmane.org/
To python-list@python.org
From Mark Lawrence <breamoreboy@yahoo.co.uk>
Subject Re: Opaque error message on UTF-8 decode
Date Sun, 08 Mar 2015 21:23:50 +0000
References <CAPTjJmr3sDSMdW3=R_oHYH8MW4shb-NF04x9NU_V97KkwW8vKg@mail.gmail.com>
Mime-Version 1.0
Content-Type text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding 7bit
X-Gmane-NNTP-Posting-Host 80-44-150-120.dynamic.dsl.as9105.com
User-Agent Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
In-Reply-To <CAPTjJmr3sDSMdW3=R_oHYH8MW4shb-NF04x9NU_V97KkwW8vKg@mail.gmail.com>
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.19
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.177.1425849845.21433.python-list@python.org>
Lines 37
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1425849845 news.xs4all.nl 2888 [2001:888:2000:d::a6]:37774
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:87162



On 08/03/2015 21:15, Chris Angelico wrote:
>>>> b"\xed\xb4\x80".decode()
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
>
> But 0xED is not a continuation byte, it's a start byte. And it's a
> perfectly valid one:
>
>>>> b"\xed\x9f\xbf".decode()
> '\ud7ff'
>
> Pike is more explicit about what the problem is:
>
>> utf8_to_string("\xed\xb4\x80");
> UTF-8 sequence beginning with 0xed 0xb4 at index 0 would decode to a
> UTF-16 surrogate character.
>
> Is this something where Python's error message could do with
> improvement, or is it not worth the hassle? Should I raise a tracker
> issue about this?
>
> ChrisA
>
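For context (this is not from the thread): UTF-8 as defined in RFC 3629 excludes the surrogate range U+D800..U+DFFF, so after a 0xED lead byte the only legal second bytes are 0x80..0x9F. The sequence 0xED 0xB4 0x80 would encode U+DD00, which is why CPython rejects the second byte and calls it an "invalid continuation byte". The helper below is a minimal sketch, assuming CPython 3, of how a Pike-style message could be built from the documented `start` and `reason` attributes of `UnicodeDecodeError`. The name `explain_utf8_error` is hypothetical and is not part of Python's codecs module.

```python
def explain_utf8_error(data: bytes) -> str:
    """Return "valid UTF-8", or a Pike-style description of why decoding fails.

    Hypothetical helper for illustration; not part of Python's codecs module.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        start = exc.start  # index of the first offending byte
        lead = data[start]
        follow = data[start + 1] if start + 1 < len(data) else None
        # After a 0xED lead byte, UTF-8 only allows 0x80..0x9F as the second byte.
        # 0xA0..0xBF would encode U+D800..U+DFFF, the UTF-16 surrogate range,
        # which RFC 3629 excludes; CPython reports this as an invalid continuation byte.
        if lead == 0xED and follow is not None and 0xA0 <= follow <= 0xBF:
            return (
                f"UTF-8 sequence beginning with 0x{lead:02x} 0x{follow:02x} "
                f"at index {start} would decode to a UTF-16 surrogate character"
            )
        # Otherwise fall back to the codec's own reason string.
        return f"{exc.reason} at index {start}"
    return "valid UTF-8"


print(explain_utf8_error(b"\xed\xb4\x80"))  # the failing example from the thread
print(explain_utf8_error(b"\xed\x9f\xbf"))  # valid: decodes to U+D7FF
# The built-in 'surrogatepass' error handler shows which code point was encoded;
# repr() is used because printing a lone surrogate can fail on a UTF-8 console.
print(repr(b"\xed\xb4\x80".decode("utf-8", "surrogatepass")))
```

Checking only the 0xED lead byte keeps the sketch small; a fuller version would also name the other restricted second-byte ranges, such as overlong forms after 0xE0 and 0xF0 and code points above U+10FFFF after 0xF4.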

I'd raise an issue so there's a formal record that we can refer to in 
the future. Besides, what's one issue like this compared to the "Python 
can't do decimal sums properly" complaint that gets raised every few 
months by newbies :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


