Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Ian Kelly <ian.g.kelly@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Date: Tue, 3 Nov 2015 11:39:32 -0700
Lines: 38
Message-ID: <mailman.45.1446576019.8789.python-list@python.org>
References: <662g3blobme52hfoududj27err185v2npm@4ax.com> <20151102204237.6a78abdf@bigbox.christie.dr> <56382F33.8050905@gmail.com> <n19ui8$df7$1@ger.gmane.org> <20151103055018.535e3e42@bigbox.christie.dr> <mailman.21.1446559261.8789.python-list@python.org> <lf5io5junfn.fsf@ling.helsinki.fi> <n1ak7u$jgs$1@ger.gmane.org> <20151103105653.622d5e34@bigbox.christie.dr> <CALwzidn4e_8_LuuzgiS-vQWUZG+WS0cXmy2-kzpXHBLuPqN53Q@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
In-Reply-To: <CALwzidn4e_8_LuuzgiS-vQWUZG+WS0cXmy2-kzpXHBLuPqN53Q@mail.gmail.com>
Precedence: list
Xref: csiph.com comp.lang.python:98184

On Tue, Nov 3, 2015 at 11:33 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase <python.list@tim.thechases.com> wrote:
>> Or even more valuable to me:
>>
>>   with open(..., newline="strip") as f:
>>     assert all(not line.endswith(("\n", "\r")) for line in f)
>>
>> because I have countless loops that look something like
>>
>>   with open(...) as f:
>>     for line in f:
>>       line = line.rstrip('\r\n')
>>       process(line)
>
> What would happen if you read a file opened like this without
> iterating over lines?

I think I'd go with this:

>>> def strip_newlines(iterable):
...     for line in iterable:
...         yield line.rstrip('\r\n')
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Or if I care about optimizing the for loop (but we're talking about
file I/O, so probably not), this might be faster:

>>> import operator
>>> def strip_newlines(iterable):
...     return map(operator.methodcaller('rstrip', '\r\n'), iterable)
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']
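
Whether the map() version really is faster is easy enough to measure rather than guess. Here is a rough sketch using timeit. The names strip_newlines_gen and strip_newlines_map, the sample data, and the repeat count are all made up for the comparison, and the numbers will vary by machine and Python version. Note that in Python 3 map() is lazy, just like the generator, so both have to be wrapped in list() to force the work.

```python
import operator
import timeit


def strip_newlines_gen(iterable):
    # Generator version: one Python-level loop iteration per line.
    for line in iterable:
        yield line.rstrip('\r\n')


def strip_newlines_map(iterable):
    # map() version: the per-line call is dispatched from C.
    return map(operator.methodcaller('rstrip', '\r\n'), iterable)


# Hypothetical in-memory sample standing in for the lines of a file.
sample = ['line %d\n' % i for i in range(10000)]

# Both versions should produce identical results.
assert list(strip_newlines_gen(sample)) == list(strip_newlines_map(sample))

# list() consumes each lazy iterator so the stripping is actually timed.
gen_time = timeit.timeit(lambda: list(strip_newlines_gen(sample)), number=20)
map_time = timeit.timeit(lambda: list(strip_newlines_map(sample)), number=20)
print('generator: %.4fs  map: %.4fs' % (gen_time, map_time))
```

With a real file, the time spent reading from disk will probably swamp any difference between the two.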

Then the iteration is just:
    for line in strip_newlines(f):
        process(line)
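
One nice property of the wrapper, compared with a hypothetical newline="strip" mode on open(), is that it only touches iteration. Calling read() on the same file object still returns the text unchanged, which sidesteps the question I asked above. A minimal sketch, using io.StringIO as a stand-in for a real file (the sample text is invented for illustration):

```python
import io


def strip_newlines(iterable):
    # Yield each line with any trailing CR/LF characters removed.
    for line in iterable:
        yield line.rstrip('\r\n')


# io.StringIO stands in for an open text file here.
f = io.StringIO('one\ntwo\r\nthree')

# Iterating through the wrapper gives lines without terminators,
# including the last line, which has no terminator to begin with.
lines = list(strip_newlines(f))

# The file object itself is untouched: read() still sees the raw text.
f.seek(0)
raw = f.read()

print(lines)
print(repr(raw))
```

Keep in mind that rstrip('\r\n') treats its argument as a set of characters, so it removes any run of trailing '\r' and '\n', not just a single terminator. For lines produced by iterating over a file that makes no practical difference, since each line carries at most one terminator.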