


Re: What happens when python seeks a text file

Return-Path <python-python-list@m.gmane.org>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Injected-Via-Gmane http://gmane.org/
To python-list@python.org
From dieter <dieter@handshake.de>
Subject Re: What happens when python seeks a text file
Date Wed, 29 Jul 2015 07:52:28 +0200
References <22f8fb9f.10135.14ed1d210fb.Coremail.lijpbasin@126.com>
Mime-Version 1.0
Content-Type text/plain; charset=iso-8859-1
Content-Transfer-Encoding 8bit
X-Gmane-NNTP-Posting-Host pd9e0acb6.dip0.t-ipconnect.de
User-Agent Gnus/5.1008 (Gnus v5.10.8) XEmacs/21.4.22 (linux)
Cancel-Lock sha1:m6zwMAUAlXgoo0I9cM9/TdBOhac=
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.20+
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.1058.1438149164.3674.python-list@python.org>
Lines 35
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1438149164 news.xs4all.nl 2877 [2001:888:2000:d::a6]:59120
X-Complaints-To abuse@xs4all.nl
X-Received-Bytes 4346
X-Received-Body-CRC 3260891877
Path csiph.com!usenet.pasdenom.info!news.stben.net!border1.nntp.ams1.giganews.com!nntp.giganews.com!bcyclone05.am1.xlned.com!bcyclone05.am1.xlned.com!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Xref csiph.com comp.lang.python:94720



"=?GBK?B?wO68zsX0?=" <lijpbasin@126.com> writes:

> Hi, I tried using seek to reverse a text file after reading about the
> subject in the documentation:
>
> https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
>
> https://docs.python.org/3/library/io.html#io.TextIOBase.seek
> ...
> However, an exception is raised if a file with the same content encoded in
> GBK is provided:
>
>     $ ./reverse_text_by_seek3.py Moon-gbk.txt
>     [0, 7, 8, 19, 21, 32, 42, 53, 64]
>     低头思故乡
>     举头望明月
>     Traceback (most recent call last):
>       File "./reverse_text_by_seek3.py", line 21, in <module>
>         print(f.readline(), end="")
>     UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 8: illegal multibyte sequence

The "seek" works at the byte level, while decoding works at the
character level, where some characters are composed of several bytes.

The error you observe indicates that you have "seeked" somewhere
inside a character, not at a legal character beginning.
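To make the byte/character distinction concrete, here is a minimal sketch. It is not the original poster's script, which is elided above. It assumes a hypothetical two-line GBK file (`moon-gbk.txt` and `readline_at` are illustrative names). Note that the io documentation only defines text-mode `seek()` for 0 or a value previously returned by `tell()`. In CPython, a plain small integer like the ones below ends up as a raw byte position, and that is the behaviour the sketch relies on.

```python
import os
import tempfile

# Hypothetical sample: two lines of Chinese text, stored as GBK bytes.
poem = "举头望明月\n低头思故乡\n"
data = poem.encode("gbk")

# 12 characters but 22 bytes: each CJK character takes 2 bytes in GBK.
print(len(poem), len(data))  # 12 22


def readline_at(path, offset, encoding):
    """Open `path` in text mode, seek to `offset` and read one line."""
    with open(path, encoding=encoding) as f:
        f.seek(offset)  # CPython treats this small int as a byte position
        return f.readline()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "moon-gbk.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(data)

    # Byte 11 is where line 2 begins, so it is also a character beginning.
    print(readline_at(path, 11, "gbk"), end="")  # 低头思故乡

    # Byte 1 is the second byte of the first character: decoding
    # starts in the middle of a character and raises UnicodeDecodeError.
    try:
        readline_at(path, 1, "gbk")
    except UnicodeDecodeError as exc:
        print("UnicodeDecodeError:", exc.reason)
```

The same file, the same decoder, and only the starting byte differs. That is the whole difference between a clean read and the traceback.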

That you get an error for "gbk" and not for "utf-8" is a bit of
an "accident". The same problem can happen for "utf-8", but the
probability might be slightly lower.
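A small sketch of the UTF-8 case, which simulates a seek by slicing the encoded bytes. `decode_from` is a hypothetical helper, not a library function. It shows that starting inside a multi-byte UTF-8 character fails as well, while starting on a character boundary does not.

```python
# Each CJK character in this line takes 3 bytes in UTF-8.
line = "举头望明月\n"
utf8 = line.encode("utf-8")
print(len(line), len(utf8))  # 6 16


def decode_from(data, offset, encoding):
    """Simulate seeking to byte `offset` and decoding the rest."""
    try:
        return data[offset:].decode(encoding)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: " + exc.reason


# Byte 3 starts the second character, so decoding succeeds.
print(decode_from(utf8, 3, "utf-8"), end="")  # 头望明月
# Byte 1 is a continuation byte inside the first character.
print(decode_from(utf8, 1, "utf-8"))  # UnicodeDecodeError: invalid start byte
```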


Seek only to byte positions that you know are also character
beginnings, e.g. line beginnings.
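One way to follow this advice, sketched under assumptions: read the file forward once in binary mode and record `f.tell()` before each `readline()`. Then seek back to those offsets in reverse order and decode each line separately. The function names are hypothetical. The approach relies on the newline byte 0x0A never occurring inside a multi-byte character. That holds for GBK, whose trail bytes are never below 0x40, and for UTF-8, whose continuation bytes are 0x80 to 0xBF. It does not hold for encodings such as UTF-16. In text mode, the io documentation's rule has the same effect: only pass `seek()` values that `tell()` returned.

```python
def line_start_offsets(path):
    """Return the byte offset at which each line of `path` begins.

    The offsets are collected by reading forward in binary mode, so each
    one is either 0 or the position right after a newline byte.
    """
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()  # byte offset in binary mode
            if not f.readline():
                break
            offsets.append(pos)
    return offsets


def reversed_lines(path, encoding):
    """Read the lines of `path` last-to-first by seeking to line starts."""
    lines = []
    with open(path, "rb") as f:
        for pos in reversed(line_start_offsets(path)):
            f.seek(pos)  # a line start is also a character start here
            lines.append(f.readline().decode(encoding))
    return lines
```

For a GBK file containing "举头望明月\n低头思故乡\n", `line_start_offsets` gives `[0, 11]` and `reversed_lines(path, "gbk")` returns the two lines in reverse order. Decoding one whole line at a time means every decode starts on a character boundary.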



