| From | dieter <dieter@handshake.de> |
|---|---|
| Subject | Re: What happens when python seeks a text file |
| Date | 2015-07-29 07:52 +0200 |
| References | <22f8fb9f.10135.14ed1d210fb.Coremail.lijpbasin@126.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1058.1438149164.3674.python-list@python.org> |
"=?GBK?B?wO68zsX0?=" <lijpbasin@126.com> writes:

> Hi, I tried using seek to reverse a text file after reading about the
> subject in the documentation:
>
> https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
>
> https://docs.python.org/3/library/io.html#io.TextIOBase.seek
> ...
> However, an exception is raised if a file with the same content encoded in
> GBK is provided:
>
> $ ./reverse_text_by_seek3.py Moon-gbk.txt
> [0, 7, 8, 19, 21, 32, 42, 53, 64]
> µÍͷ˼¹ÊÏç
> ¾ÙÍ·ÍûÃ÷Ô
> Traceback (most recent call last):
> File "./reverse_text_by_seek3.py", line 21, in <module>
> print(f.readline(), end="")
> UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 8: illegal multibyte sequence

The "seek" works at the byte level, while decoding works at the character level, where a single character can be composed of several bytes. The error you observe indicates that you have seeked to a position inside a character, not to a legal character beginning.

That you get an error for "gbk" and not for "utf-8" is a bit of an "accident". The same problem can happen with "utf-8", but it is probably somewhat less likely.

Seek only to byte positions that you know are also character beginnings, for example line beginnings.
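To illustrate the advice about seeking only to line beginnings, here is a minimal sketch, not the original poster's script. It assumes the file uses an encoding in which the newline byte never occurs inside a multi-byte character (true for UTF-8 and GBK, not for UTF-16). The helper names `reversed_lines` and `demo` and the sample text are made up for this example. The file is opened in binary mode so that "tell" and "seek" deal in plain byte offsets, and each line is decoded as a whole.

```python
import os
import tempfile


def reversed_lines(path, encoding):
    """Return the lines of a text file, last line first.

    Hypothetical helper: it works on bytes and decodes one whole line
    at a time, so decoding never starts in the middle of a character.
    Assumes b"\n" never appears inside a multi-byte character.
    """
    with open(path, "rb") as f:
        # Pass 1: record the byte offset at which each line begins.
        offsets = []
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)

        # Pass 2: seek only to those line beginnings, then decode.
        lines = []
        for pos in reversed(offsets):
            f.seek(pos)
            lines.append(f.readline().decode(encoding))
    return lines


def demo():
    """Write made-up sample text in two encodings and reverse each file."""
    text = "第一行\n第二行\n第三行\n"
    result = {}
    for encoding in ("utf-8", "gbk"):
        fd, path = tempfile.mkstemp()
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(text.encode(encoding))
            result[encoding] = reversed_lines(path, encoding)
        finally:
            os.remove(path)
    return result
```

Calling `demo()` should give the same reversed list of lines for both encodings, because every seek target is a line beginning and therefore also a character beginning.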