Re: Readlines returns non ASCII character

Subject Re: Readlines returns non ASCII character
References (1 earlier) <CAJ4+4aoMn+93oQsVXnDF2wmnpDfUbg9zna==v2RVh2DP_o4otg@mail.gmail.com> <CAFHq_S6wVanCc6GKRRRWdYwFcqzq2Cv-Cdz3KPz=hX0w9jZdPw@mail.gmail.com> <CAJ4+4aqoYLeeShM+PSR5hxdHa=v_d=XGCY_MifxUPQUwVKuuMQ@mail.gmail.com> <56033F56.1020308@mrabarnett.plus.com> <CALwzidnuUP8bL7cTP05VTCtC_1ok8i4VMXEg-exGc-_egu=beA@mail.gmail.com>
From MRAB <python@mrabarnett.plus.com>
Date 2015-09-24 03:02 +0100
Newsgroups comp.lang.python
Message-ID <mailman.114.1443060146.28679.python-list@python.org>


On 2015-09-24 02:37, Ian Kelly wrote:
> On Wed, Sep 23, 2015 at 6:09 PM, MRAB <python@mrabarnett.plus.com> wrote:
>> On 2015-09-24 00:51, paul.hermeneutic@gmail.com wrote:
>>>
>>>   If this starts at the beginning of the file, then it indicates that
>>> the file is UTF-16 (LE).
>>>
>>> UTF-8          EF BB BF       239 187 191
>>> UTF-16 (BE)    FE FF          254 255
>>> UTF-16 (LE)    FF FE          255 254
>>> UTF-32 (BE)    00 00 FE FF    0 0 254 255
>>> UTF-32 (LE)    FF FE 00 00    255 254 0 0
>>>
>> The "signature" EF BB BF indicates the encoding called "utf-8-sig" by
>> Python. It occurs on Windows.
>>
>> If the file doesn't start with any of these, then it could be using any
>> encoding (except UTF-16 or UTF-32).
>
> Yes, but what does it mean when the signature is 00 FF 00 FE 00 FF and
> occurs not at the beginning but repeatedly throughout the file, as
> appears in the OP's case?
>
> At least, I'm assuming that the high-order bytes are 00 based on what
> the OP posted. I wouldn't be surprised though if they're just being
> mangled by the terminal, if it happens to be a certain one that will
> not be named but uses CP 1252.
>
Yes, a byte-string literal or a hex dump of, say, the first 256 bytes
would've been better.
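
For anyone wanting to produce that kind of dump, here is a minimal sketch. It assumes the file can be opened in binary mode; the names `sniff_bom`, `hex_dump`, `dump_head` and the `path` argument are made up for illustration and do not come from the OP's code. It checks the start of the data against the BOMs in the table quoted above and formats the first 256 bytes as hex. A result of `None` only means no BOM was found; it does not identify the encoding.

```python
import codecs

# BOMs from the table quoted above. The UTF-32 entries are checked first
# because the UTF-32 (LE) BOM starts with the same bytes as the UTF-16 (LE) BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data):
    """Return a label for the BOM that `data` starts with, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def hex_dump(data):
    """Format bytes as space-separated two-digit hex values."""
    return " ".join("{:02X}".format(b) for b in bytearray(data))


def dump_head(path, count=256):
    """Read the first `count` bytes of `path` (hypothetical file name)
    in binary mode and return (hex dump, BOM label or None)."""
    with open(path, "rb") as f:
        head = f.read(count)
    return hex_dump(head), sniff_bom(head)
```

Printing the two values returned by `dump_head` for the problem file would show the raw bytes before any terminal or codec has touched them, which is what is needed to tell real 00 bytes from display mangling.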
