| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | Wed, 25 May 2011 19:06:04 -0600 |
| Subject | Re: Python 3.2 bug? Reading the last line of a file |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2103.1306371996.9059.python-list@python.org> |
On Wed, May 25, 2011 at 3:52 PM, MRAB <python@mrabarnett.plus.com> wrote:
> What do you mean by "may include the decoder state in its return value"?
>
> It does make sense that the values returned from tell() won't be in the
> middle of an encoded sequence of bytes.
If you take a look at the source code, tell() returns an integer that
includes decoder state data in its upper bytes. For example:
>>> data = b' ' + '\u0302a'.encode('utf-16')
>>> data
b' \xff\xfe\x02\x03a\x00'
>>> f = open('test.txt', 'wb')
>>> f.write(data)
7
>>> f.close()
>>> f = open('test.txt', 'r', encoding='utf-16')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python32\lib\codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "c:\python32\lib\encodings\utf_16.py", line 61, in _buffer_decode
    codecs.utf_16_ex_decode(input, errors, 0, final)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 6-6: truncated data
The problem, of course, is the initial space, which throws off the
decoder. We can try to seek past it:
>>> f.seek(1)
1
>>> f.read()
'\ufeff\u0302a'
But notice that since we're not reading from the beginning of the
file, the BOM has now been interpreted as data. However:
>>> f.seek(1 + (2 << 65))
73786976294838206465
>>> f.read()
'\u0302a'
And you can see that instead of reading from position
73786976294838206465 it has read from position 1 starting in the "read
a BOM" state. Note that I wouldn't recommend doing anything remotely
like this in production code, not least because the value that I
passed into seek() is platform-dependent. This is just a
demonstration of how the seek() value can include decoder state.
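
To make the packing concrete, here is a minimal sketch that splits such a value back into its fields. It assumes the 64-bits-per-field layout used by the pure-Python TextIOWrapper in CPython's _pyio module, which is an implementation detail and not a documented format. The names unpack_cookie and tell_roundtrip are made up for this illustration. The second helper shows the supported usage: treat whatever tell() returns as opaque and only hand it back to seek().

```python
import os
import tempfile

_FIELD = 1 << 64  # assumption: each field occupies 64 bits of the cookie


def unpack_cookie(cookie):
    """Split a text-mode seek()/tell() value into its fields.

    Mirrors the layout in CPython's _pyio.TextIOWrapper (an
    implementation detail; other versions or platforms may differ).
    """
    rest, position = divmod(cookie, _FIELD)
    rest, dec_flags = divmod(rest, _FIELD)
    rest, bytes_to_feed = divmod(rest, _FIELD)
    need_eof, chars_to_skip = divmod(rest, _FIELD)
    return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip


def tell_roundtrip(text, encoding="utf-16"):
    """Write text to a temporary file, read one character, then check
    that seek(tell()) resumes reading at exactly the same place."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            first = f.read(1)
            cookie = f.tell()   # opaque value: do not do arithmetic on it
            rest = f.read()
            f.seek(cookie)      # restores both position and decoder state
            again = f.read()
        return first, rest, again
    finally:
        os.remove(path)


# The value passed to seek() above: byte position 1 plus decoder flags.
print(unpack_cookie(1 + (2 << 65)))
print(tell_roundtrip("\u0302abc"))
```

Under that assumed layout, the seek() value above splits into byte position 1 with a nonzero decoder-flags field, which matches the behaviour seen: the read starts at byte 1 with the decoder's state overridden.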
Cheers,
Ian