Re: Readlines returns non ASCII character

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: Wed, 23 Sep 2015 19:37:43 -0600
Subject: Re: Readlines returns non ASCII character
To: Python <python-list@python.org>
Newsgroups: comp.lang.python
Message-ID: <mailman.113.1443058705.28679.python-list@python.org>

On Wed, Sep 23, 2015 at 6:09 PM, MRAB <python@mrabarnett.plus.com> wrote:
> On 2015-09-24 00:51, paul.hermeneutic@gmail.com wrote:
>>
>>   If this starts at the beginning of the file, then it indicates that
>> the file is UTF-16 (LE).
>>
>> UTF-8          EF BB BF       239 187 191
>> UTF-16 (BE)    FE FF          254 255
>> UTF-16 (LE)    FF FE          255 254
>> UTF-32 (BE)    00 00 FE FF    0 0 254 255
>> UTF-32 (LE)    FF FE 00 00    255 254 0 0
>>
> The "signature" EF BB BF indicates the encoding called "utf-8-sig" by
> Python. It occurs on Windows.
>
> If the file doesn't start with any of these, then it could be using any
> encoding (except UTF-16 or UTF-32).
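
For reference, the check described in the quoted table could be sketched roughly as below. This is only a minimal sketch: the helper name sniff_bom and the codec labels it returns are my own choices for illustration, and a missing BOM tells you nothing about the actual encoding.

```python
import codecs

# Longer BOMs are tested first, because the UTF-32 (LE) BOM (FF FE 00 00)
# begins with the UTF-16 (LE) BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data):
    """Return a codec label guessed from a leading BOM, or None.

    Hypothetical helper: only the first few bytes of `data` are inspected.
    """
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None


if __name__ == "__main__":
    print(sniff_bom(b"\xff\xfeh\x00i\x00"))  # bytes that start with FF FE
    print(sniff_bom(b"plain ascii"))         # no BOM present
```

Note that decoding with "utf-16-le" or "utf-16-be" leaves the BOM in the text as U+FEFF; the plain "utf-16" and "utf-32" codecs (and "utf-8-sig") consume it instead.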

Yes, but what does it mean when the signature is 00 FF 00 FE 00 FF and
occurs not at the beginning but repeatedly throughout the file, as
appears in the OP's case?

At least, I'm assuming that the high-order bytes are 00 based on what
the OP posted. I wouldn't be surprised though if they're just being
mangled by the terminal, if it happens to be a certain one that will
not be named but uses CP 1252.
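
To illustrate the kind of mangling I mean, here is a small sketch of one way bytes like 00 FF 00 FE could arise. It is a guess for illustration, not a diagnosis of the OP's file: it assumes a UTF-16 (LE) BOM was decoded as CP 1252 somewhere along the way and the result was then written out as UTF-16 (BE). The function name mangle is made up for this example.

```python
def mangle(raw, wrong_codec="cp1252"):
    """Decode `raw` with the wrong single-byte codec, re-encode the
    resulting text as UTF-16 (BE), and return the bytes as spaced hex.

    Hypothetical helper, used only to show the effect of the mix-up.
    """
    text = raw.decode(wrong_codec)        # FF FE become U+00FF and U+00FE
    reencoded = text.encode("utf-16-be")  # each character gains a 00 high byte
    return " ".join("{:02X}".format(b) for b in reencoded)


if __name__ == "__main__":
    print(mangle(b"\xff\xfe"))  # the UTF-16 (LE) BOM from the quoted table
```

Under those assumptions the two BOM bytes come back with 00 high-order bytes in front of each, which is at least consistent with what was posted.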
