From: Ian Kelly
Newsgroups: comp.lang.python
Subject: Re: Irregular last line in a text file, was Re: Regular expressions
Date: Tue, 3 Nov 2015 11:39:32 -0700

On Tue, Nov 3, 2015 at 11:33 AM, Ian Kelly wrote:
> On Tue, Nov 3, 2015 at 9:56 AM, Tim Chase wrote:
>> Or even more valuable to me:
>>
>>   with open(..., newline="strip") as f:
>>     assert all(not line.endswith(("\n", "\r")) for line in f)
>>
>> because I have countless loops that look something like
>>
>>   with open(...) as f:
>>     for line in f:
>>       line = line.rstrip('\r\n')
>>       process(line)
>
> What would happen if you read a file opened like this without
> iterating over lines?

I think I'd go with this:

>>> def strip_newlines(iterable):
...     for line in iterable:
...         yield line.rstrip('\r\n')
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Or if I care about optimizing the for loop (but we're talking about
file I/O, so probably not), this might be faster:

>>> import operator
>>> def strip_newlines(iterable):
...     return map(operator.methodcaller('rstrip', '\r\n'), iterable)
...
>>> list(strip_newlines(['one\n', 'two\r', 'three']))
['one', 'two', 'three']

Then the iteration is just:

for line in strip_newlines(f):
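
For anyone who wants to run that loop end to end, here is a minimal sketch. It makes a few assumptions that are not in the thread: an in-memory io.StringIO stands in for the opened file, the name strip_newlines_map is invented here so that both variants can live side by side, and appending to a list is only a placeholder for process(line).

```python
import io
import operator


def strip_newlines(iterable):
    """Lazily yield each line with trailing CR/LF characters removed."""
    for line in iterable:
        yield line.rstrip('\r\n')


def strip_newlines_map(iterable):
    """Same result via map(); hypothetical name, lazy on Python 3."""
    return map(operator.methodcaller('rstrip', '\r\n'), iterable)


# Assumed stand-in for a real file. newline='' leaves line endings
# untranslated, and the last line deliberately has no terminator.
f = io.StringIO('one\ntwo\r\nthree', newline='')

processed = []
for line in strip_newlines(f):
    processed.append(line)  # placeholder for process(line)

print(processed)  # ['one', 'two', 'three']
```

Both helpers are lazy on Python 3, so neither reads the whole file up front. Note that rstrip('\r\n') removes any run of trailing CR and LF characters, which is harmless for lines produced by file iteration but worth knowing if the input comes from somewhere else.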