| Header | Value |
|---|---|
| Date | Thu, 26 Jul 2012 14:17:44 +0200 |
| From | Philipp Hagemeister <phihag@phihag.de> |
| To | Stefan Behnel <stefan_ml@behnel.de> |
| Subject | Re: catch UnicodeDecodeError |
| Cc | python-list@python.org |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2602.1343305079.4697.python-list@python.org> |
On 07/26/2012 01:15 PM, Stefan Behnel wrote:
>> exits with a UnicodeDecodeError.
> ... that tells you the exact code line where the error occurred.
Which property of a UnicodeDecodeError includes that information?
On CPython 2.7 and 3.2, I see only start and end, both of which are byte
offsets into the input, not line numbers.
I used the following test script:
e = None
try:
    b'a\xc3\xa4\nb\xff0'.decode('utf-8')
except UnicodeDecodeError as ude:
    e = ude
    print(e.start)  # 5 for this input, 3 for the input b'a\nb\xff0'
    print(dir(e))
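For reference, here is a minimal sketch of what the exception object does carry. The helper name describe_decode_error is made up for illustration; the attributes it reads (encoding, object, start, end, reason) are the documented ones on UnicodeDecodeError, and none of them is a line number.

```python
def describe_decode_error(data, encoding="utf-8"):
    """Return the attributes of the UnicodeDecodeError raised for data, or None.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as ude:
        return {
            "encoding": ude.encoding,  # codec name as reported by the decoder
            "start": ude.start,        # byte offset of the first undecodable byte
            "end": ude.end,            # byte offset just past the undecodable bytes
            "reason": ude.reason,      # human-readable description of the failure
            "bad_bytes": ude.object[ude.start:ude.end],  # the offending bytes
        }
    return None


info = describe_decode_error(b'a\xc3\xa4\nb\xff0')
print(info)
```

Everything here is expressed in bytes of the original input, so any line number has to be derived from the offsets by hand.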
But even if you could somehow determine a line number, this would only
work if the actual encoding uses 0x0a for newline. Most encodings (101
out of 108 applicable ones in CPython 3.2) do include 0x0a in their
representation of '\n', but multi-byte encodings routinely include 0x0a
bytes in their representation of non-newline characters. Therefore, the
most you can do is calculate an upper bound for the line number.
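A rough sketch of that upper bound, assuming the encoding represents '\n' with a 0x0a byte: count the 0x0a bytes that precede the failing offset. The function name line_upper_bound is made up for illustration, and the UTF-16 input is a contrived example of a non-newline character (U+010A) whose encoding contains 0x0a.

```python
def line_upper_bound(data, encoding="utf-8"):
    """Return an upper bound for the 1-based line number of a decode error.

    Hypothetical helper: counts 0x0a bytes before the failing byte offset.
    Returns None if data decodes cleanly.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as ude:
        return data.count(b"\n", 0, ude.start) + 1
    return None


# UTF-8: every 0x0a byte is a real newline, so the bound is exact here.
utf8_bound = line_upper_bound(b'a\xc3\xa4\nb\xff0')

# UTF-16-LE: U+010A encodes to the bytes 0x0a 0x01, so 0x0a shows up
# even though the text contains no newline at all.
fake_newline = '\u010a'.encode('utf-16-le')

# Two such characters plus one dangling byte (truncated input) fail to
# decode; the bound overestimates because the text is a single line.
utf16_bound = line_upper_bound(fake_newline * 2 + b'\x00', 'utf-16-le')

print(utf8_bound, fake_newline, utf16_bound)
```

In the UTF-16 case the count is too high, which is why the result can only be treated as an upper bound unless you know the encoding never uses 0x0a inside other characters.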
- Philipp