From: Chris Angelico
Date: Mon, 4 May 2015 22:13:31 +1000
Subject: Re: when does newlines get set in universal newlines mode?
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

On Mon, May 4, 2015 at 10:01 PM, Peter Otten <__peter__@web.de> wrote:
> I tried:
>
>>>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
> ...
>>>> f = open("tmp.txt", "rU")
>>>> f.newlines
>>>> f.readline()
> 'alpha\n'
>>>> f.newlines
> # expected: '\r\n'
>>>> f.readline()
> 'beta\n'
>>>> f.newlines
> '\r\n' # expected: ('\r', '\r\n')
>>>> f.readline()
> 'gamma\n'
>>>> f.newlines
> ('\r', '\n', '\r\n')
>
> I believe this is a bug.

I'm not sure it is, actually; imagine the text is coming in one
character at a time (e.g. from a pipe), and it's seen "alpha\r". It knows
that this is a line, so it emits it; but until the next character is
read, it can't know whether it's going to be \r or \r\n. What should
it do? Read another character, which might block? Put "\r" into
.newlines, which might be wrong? Once it sees the \n, it knows that it
was \r\n (or rather, it assumes that files do not have lines of text
terminated by \r followed by blank lines terminated by \n - because
that would be stupid).
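
To make that argument concrete, here is a minimal sketch of a reader that emits a line the moment it sees a terminator but only classifies a \r once the following character has arrived. It is a toy model, not CPython's file implementation; LazyNewlineTracker, feed and run are hypothetical names invented for the illustration.

```python
# Toy model only: NOT CPython's implementation. The class and function
# names here are hypothetical, made up for this illustration.

ORDER = ("\r", "\n", "\r\n")  # fixed display order for the newlines tuple


class LazyNewlineTracker:
    def __init__(self):
        self._seen = set()        # newline kinds classified so far
        self._pending_cr = False  # last char was CR, kind still unknown
        self._buf = []            # characters of the current line

    @property
    def newlines(self):
        kinds = tuple(k for k in ORDER if k in self._seen)
        if not kinds:
            return None
        return kinds[0] if len(kinds) == 1 else kinds

    def _emit(self):
        line = "".join(self._buf) + "\n"  # always translate to LF
        self._buf = []
        return line

    def feed(self, ch):
        # Consume one character; return a finished line, or None.
        if self._pending_cr:
            self._pending_cr = False
            if ch == "\n":
                self._seen.add("\r\n")  # the CR was half of a CRLF
                return None             # its line was already emitted
            self._seen.add("\r")        # the CR stood alone
        if ch == "\r":
            self._pending_cr = True     # cannot classify yet
            return self._emit()
        if ch == "\n":
            self._seen.add("\n")
            return self._emit()
        self._buf.append(ch)
        return None


def run(text):
    # Return (line, newlines-at-that-moment) for each emitted line.
    tracker = LazyNewlineTracker()
    out = []
    for ch in text:
        line = tracker.feed(ch)
        if line is not None:
            out.append((line, tracker.newlines))
    return out


if __name__ == "__main__":
    for line, seen in run("alpha\r\nbeta\rgamma\n"):
        print(repr(line), repr(seen))
```

Fed the same data as the quoted session, the sketch reports None after 'alpha\n', '\r\n' after 'beta\n', and all three kinds after 'gamma\n', which is the sequence Peter observed. The sketch has no end-of-file handling, so a \r that is the very last character is never classified.
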
It may be worth documenting this limitation, but it's not something
that can easily be fixed without removing support for \r newlines -
although that might be an option, given that non-OSX Macs are
basically history now.

ChrisA