when does newlines get set in universal newlines mode?

Started by: arekfu@gmail.com
First post: 2015-05-04 02:50 -0700
Last post: 2015-05-04 09:33 -0600
Articles 13 — 7 participants


Contents

  when does newlines get set in universal newlines mode? arekfu@gmail.com - 2015-05-04 02:50 -0700
    Re: when does newlines get set in universal newlines mode? Peter Otten <__peter__@web.de> - 2015-05-04 14:01 +0200
    Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-04 22:13 +1000
      Re: when does newlines get set in universal newlines mode? Davide Mancusi <arekfu@gmail.com> - 2015-05-04 06:35 -0700
        Re: when does newlines get set in universal newlines mode? Terry Reedy <tjreedy@udel.edu> - 2015-05-04 13:38 -0400
      Re: when does newlines get set in universal newlines mode? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-05 18:31 +1000
        Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-05 18:41 +1000
          Re: when does newlines get set in universal newlines mode? Davide Mancusi <arekfu@gmail.com> - 2015-05-05 02:23 -0700
            Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-05 19:28 +1000
              Re: when does newlines get set in universal newlines mode? Davide Mancusi <arekfu@gmail.com> - 2015-05-05 03:58 -0700
    Re: when does newlines get set in universal newlines mode? Peter Otten <__peter__@web.de> - 2015-05-04 17:17 +0200
    Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-05 01:26 +1000
    Re: when does newlines get set in universal newlines mode? Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-04 09:33 -0600

#89893 — when does newlines get set in universal newlines mode?

From: arekfu@gmail.com
Date: 2015-05-04 02:50 -0700
Subject: when does newlines get set in universal newlines mode?
Message-ID: <3c45772b-77e0-4c17-8b3d-aa246c4b511c@googlegroups.com>

Hi all,

I have a text file with Windows-style line terminators (\r\n) which I open in universal newlines mode (Python 2.7). I would expect the newlines attribute to be set after the first call to the readline() method, but apparently this is not the case:

>>> f=open('test_crlf', 'rU')
>>> f.newlines
>>> f.readline()
'foo\n'
>>> f.newlines
>>> f.readline()
'bar\n'
>>> f.newlines
'\r\n'

On the other hand, the newlines attribute gets set after the first call to readline() on a file with Unix-style line endings.

Is this a bug or a feature?

Thanks in advance,
Davide



#89897

From: Peter Otten <__peter__@web.de>
Date: 2015-05-04 14:01 +0200
Message-ID: <mailman.81.1430740879.12865.python-list@python.org>
In reply to: #89893

arekfu@gmail.com wrote:

> Hi all,
> 
> I have a text file with Windows-style line terminators (\r\n) which I open
> in universal newlines mode (Python 2.7). I would expect the newlines
> attribute to be set after the first call to the readline() method, but
> apparently this is not the case:
> 
> >>> f=open('test_crlf', 'rU')
> >>> f.newlines
> >>> f.readline()
> 'foo\n'
> >>> f.newlines
> >>> f.readline()
> 'bar\n'
> >>> f.newlines
> '\r\n'
>
> On the other hand, the newlines attribute gets set after the first call to
> readline() on a file with Unix-style line endings.
> 
> Is this a bug or a feature?

According to

https://docs.python.org/2.7/library/functions.html#open

"""
If Python is built without universal newlines support a mode with 'U' is the 
same as normal text mode. Note that file objects so opened also have an 
attribute called newlines which has a value of None (if no newlines have yet 
been seen), '\n', '\r', '\r\n', or a tuple containing all the newline types 
seen.
"""

I tried:

>>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
... 
>>> f = open("tmp.txt", "rU")
>>> f.newlines
>>> f.readline()
'alpha\n'
>>> f.newlines 
# expected: '\r\n'
>>> f.readline()
'beta\n'
>>> f.newlines
'\r\n' # expected: ('\r', '\r\n')
>>> f.readline()
'gamma\n'
>>> f.newlines
('\r', '\n', '\r\n')

I believe this is a bug.



#89899

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-05-04 22:13 +1000
Message-ID: <mailman.83.1430741619.12865.python-list@python.org>
In reply to: #89893

On Mon, May 4, 2015 at 10:01 PM, Peter Otten <__peter__@web.de> wrote:
> I tried:
>
> >>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
> ...
> >>> f = open("tmp.txt", "rU")
> >>> f.newlines
> >>> f.readline()
> 'alpha\n'
> >>> f.newlines
> # expected: '\r\n'
> >>> f.readline()
> 'beta\n'
> >>> f.newlines
> '\r\n' # expected: ('\r', '\r\n')
> >>> f.readline()
> 'gamma\n'
> >>> f.newlines
> ('\r', '\n', '\r\n')
>
> I believe this is a bug.

I'm not sure it is, actually; imagine the text is coming in one
character at a time (eg from a pipe), and it's seen "alpha\r". It
knows that this is a line, so it emits it; but until the next
character is read, it can't know whether it's going to be \r or \r\n.
What should it do? Read another character, which might block? Put "\r"
into .newlines, which might be wrong? Once it sees the \n, it knows
that it was \r\n (or rather, it assumes that files do not have lines
of text terminated by \r followed by blank lines terminated by \n -
because that would be stupid).

It may be worth documenting this limitation, but it's not something
that can easily be fixed without removing support for \r newlines -
although that might be an option, given that non-OSX Macs are
basically history now.

ChrisA
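
The one-character-at-a-time argument above can be made concrete with a toy reader. This is a minimal sketch under stated assumptions, not CPython's implementation: the class name LazyNewlineTracker and its attributes are hypothetical, and the reader is fed from an in-memory string. It emits a line as soon as it sees '\r', but only classifies that terminator once the next character arrives.

```python
class LazyNewlineTracker(object):
    """Toy universal-newlines reader fed one character at a time.

    Hypothetical illustration only; not how CPython implements 'rU'.
    """

    # Fixed reporting order, chosen to match the tuple shown in the thread.
    _ORDER = ('\r', '\n', '\r\n')

    def __init__(self, text):
        self._chars = iter(text)
        self._pending_cr = False  # previous line ended at CR; type not yet known
        self._seen = set()        # newline types classified so far

    @property
    def newlines(self):
        found = tuple(nl for nl in self._ORDER if nl in self._seen)
        if not found:
            return None
        return found[0] if len(found) == 1 else found

    def readline(self):
        line = []
        for c in self._chars:
            if self._pending_cr:
                # Only now can the earlier CR be classified.
                self._pending_cr = False
                if c == '\n':
                    self._seen.add('\r\n')
                    continue      # the LF belongs to the previous line
                self._seen.add('\r')
            if c == '\r':
                # Emit the line immediately; do not wait for the next char.
                self._pending_cr = True
                line.append('\n')
                break
            line.append(c)
            if c == '\n':
                self._seen.add('\n')
                break
        return ''.join(line)
```

Fed the same data as the tmp.txt example ("alpha\r\nbeta\rgamma\n"), this sketch leaves newlines at None after the first readline(), reports '\r\n' after the second, and all three types after the third, which is the pattern observed in the interpreter session quoted above.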



#89902

From: Davide Mancusi <arekfu@gmail.com>
Date: 2015-05-04 06:35 -0700
Message-ID: <b41e75a7-d890-4ff8-b83b-2e8e223137b5@googlegroups.com>
In reply to: #89899

>> I believe this is a bug.
>
> I'm not sure it is, actually; imagine the text is coming in one
> character at a time (eg from a pipe), and it's seen "alpha\r". It
> knows that this is a line, so it emits it; but until the next
> character is read, it can't know whether it's going to be \r or \r\n.
> What should it do? Read another character, which might block? Put "\r"
> into .newlines, which might be wrong? Once it sees the \n, it knows
> that it was \r\n (or rather, it assumes that files do not have lines
> of text terminated by \r followed by blank lines terminated by \n -
> because that would be stupid).

I think this is a good point. However, I will probably submit a bug
report anyway and let the devs make their decisions. It is at least a
documentation bug.

Cheers,
Davide



#89917

From: Terry Reedy <tjreedy@udel.edu>
Date: 2015-05-04 13:38 -0400
Message-ID: <mailman.95.1430761138.12865.python-list@python.org>
In reply to: #89902

On 5/4/2015 9:35 AM, Davide Mancusi wrote:
>>> I believe this is a bug.
>>
>> I'm not sure it is, actually; imagine the text is coming in one
>> character at a time (eg from a pipe), and it's seen "alpha\r". It
>> knows that this is a line, so it emits it; but until the next
>> character is read, it can't know whether it's going to be \r or \r\n.
>> What should it do? Read another character, which might block? Put "\r"
>> into .newlines, which might be wrong? Once it sees the \n, it knows
>> that it was \r\n (or rather, it assumes that files do not have lines
>> of text terminated by \r followed by blank lines terminated by \n -
>> because that would be stupid).
>
> I think this is a good point. However, I will probably submit a bug
> report anyway and let the devs make their decisions. It is at least a
> documentation bug.

Be sure to report the exact python binary you are using, as reported 
when you start the interactive interpreter or Idle shell.

-- 
Terry Jan Reedy



#89944

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-05-05 18:31 +1000
Message-ID: <55487fe3$0$12919$c3e8da3$5496439d@news.astraweb.com>
In reply to: #89899

On Monday 04 May 2015 22:13, Chris Angelico wrote:

> It may be worth documenting this limitation, but it's not something
> that can easily be fixed without removing support for \r newlines -
> although that might be an option, given that non-OSX Macs are
> basically history now.

Non-OSX Macs are history, but the text files they created are not.



-- 
Steve



#89945

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-05-05 18:41 +1000
Message-ID: <mailman.112.1430815275.12865.python-list@python.org>
In reply to: #89944

On Tue, May 5, 2015 at 6:31 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Monday 04 May 2015 22:13, Chris Angelico wrote:
>
>> It may be worth documenting this limitation, but it's not something
>> that can easily be fixed without removing support for \r newlines -
>> although that might be an option, given that non-OSX Macs are
>> basically history now.
>
> Non-OSX Macs are history, but the text files they created are not.

True. Like I said, "might be". I could imagine, for instance, a caveat
being put on the newlines attribute saying that "when reading a file
delimited by \r, this may erroneously imply that \r\n is being used,
until a second line has been read". But that's probably more
complexity than it's worth.

ChrisA



#89948

From: Davide Mancusi <arekfu@gmail.com>
Date: 2015-05-05 02:23 -0700
Message-ID: <5728620b-6076-4752-8025-93ed460afba0@googlegroups.com>
In reply to: #89945

I just opened a bug report:

http://bugs.python.org/issue24126

We'll see what they say.



#89949

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-05-05 19:28 +1000
Message-ID: <mailman.115.1430818087.12865.python-list@python.org>
In reply to: #89948

On Tue, May 5, 2015 at 7:23 PM, Davide Mancusi <arekfu@gmail.com> wrote:
> I just opened a bug report:
>
> http://bugs.python.org/issue24126
>
> We'll see what they say.

Cool. I suggest posting in the tracker thread the exact Python
version(s) you've tested this with, in case it matters.

ChrisA



#89959

From: Davide Mancusi <arekfu@gmail.com>
Date: 2015-05-05 03:58 -0700
Message-ID: <6ba588a6-2a82-4111-9462-17cd98116f97@googlegroups.com>
In reply to: #89949

> Cool. I suggest posting in the tracker thread the exact Python
> version(s) you've tested this with, in case it matters.

Done. Good point.



#89909

From: Peter Otten <__peter__@web.de>
Date: 2015-05-04 17:17 +0200
Message-ID: <mailman.91.1430752649.12865.python-list@python.org>
In reply to: #89893

Chris Angelico wrote:

> On Mon, May 4, 2015 at 10:01 PM, Peter Otten <__peter__@web.de> wrote:
>> I tried:
>>
>> >>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
>> ...
>> >>> f = open("tmp.txt", "rU")
>> >>> f.newlines
>> >>> f.readline()
>> 'alpha\n'
>> >>> f.newlines
>> # expected: '\r\n'
>> >>> f.readline()
>> 'beta\n'
>> >>> f.newlines
>> '\r\n' # expected: ('\r', '\r\n')
>> >>> f.readline()
>> 'gamma\n'
>> >>> f.newlines
>> ('\r', '\n', '\r\n')
>>
>> I believe this is a bug.
> 
> I'm not sure it is, actually; imagine the text is coming in one
> character at a time (eg from a pipe), and it's seen "alpha\r". It
> knows that this is a line, so it emits it; but until the next
> character is read, it can't know whether it's going to be \r or \r\n.
> What should it do? Read another character, which might block? Put "\r"
> into .newlines, which might be wrong? Once it sees the \n, it knows
> that it was \r\n (or rather, it assumes that files do not have lines
> of text terminated by \r followed by blank lines terminated by \n -
> because that would be stupid).
> 
> It may be worth documenting this limitation, but it's not something
> that can easily be fixed without removing support for \r newlines -
> although that might be an option, given that non-OSX Macs are
> basically history now.

OK, you convinced me. Then I tried:

>>> with open("tmp.txt", "wb") as f: f.write("0\r\n3\r5\n7")
... 
>>> assert len(open("tmp.txt", "rb").read()) == 8
>>> f = open("tmp.txt", "rU")
>>> f.readline()
'0\n'
>>> f.newlines
>>> f.tell()
3
>>> f.newlines
'\r\n'

Hm, so tell() moves the file pointer? Is that sane?



#89910

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-05-05 01:26 +1000
Message-ID: <mailman.92.1430753207.12865.python-list@python.org>
In reply to: #89893

On Tue, May 5, 2015 at 1:17 AM, Peter Otten <__peter__@web.de> wrote:
> OK, you convinced me. Then I tried:
>
> >>> with open("tmp.txt", "wb") as f: f.write("0\r\n3\r5\n7")
> ...
> >>> assert len(open("tmp.txt", "rb").read()) == 8
> >>> f = open("tmp.txt", "rU")
> >>> f.readline()
> '0\n'
> >>> f.newlines
> >>> f.tell()
> 3
> >>> f.newlines
> '\r\n'
>
> Hm, so tell() moves the file pointer? Is that sane?

... wow. Okay! That's a bit weird.

It's possible that something's being done with internal buffering
(after all, it's horribly inefficient to *actually* read text one byte
at a time, even if that's what's happening conceptually), and that
tell() causes some checks to be done. But that really is rather
strange. I'd be interested to know what happens if another process
writes to a pipe "0\r", then sleeps while the readline() and tell()
happen, and then writes a "\n" - what will that do to newlines?

By the way, it's as well to clarify, with all these examples, what
Python version you're using. There may be significant differences.

ChrisA
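
The chunked-arrival question can be explored without a real pipe, with one large caveat: the sketch below assumes Python 3's io module, not the Python 2.7 'rU' file object this thread is about, so it is an analogy rather than an answer. io.IncrementalNewlineDecoder accepts text in pieces and exposes a newlines attribute; the helper name feed_chunks is hypothetical.

```python
import io


def feed_chunks(chunks):
    """Feed text chunks to a newline decoder.

    Returns a list of (translated_output, newlines) pairs, one per chunk.
    """
    # decoder=None means the input is already text rather than bytes;
    # translate=True maps CR and CRLF to LF, as universal newlines mode does.
    decoder = io.IncrementalNewlineDecoder(None, translate=True)
    states = []
    for chunk in chunks:
        out = decoder.decode(chunk)
        states.append((out, decoder.newlines))
    return states


if __name__ == "__main__":
    # Simulate a writer that sends "0\r", pauses, then sends "\n3".
    for out, seen in feed_chunks(["0\r", "\n3"]):
        print(repr(out), repr(seen))
```

In this decoder a trailing '\r' at the end of a chunk is held back instead of being emitted, so newlines stays None after the first chunk and becomes '\r\n' once the second chunk arrives. That is a different trade-off from the 2.7 behaviour shown in this thread, where the line is returned first and the terminator is classified later.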



#89911

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2015-05-04 09:33 -0600
Message-ID: <mailman.93.1430753672.12865.python-list@python.org>
In reply to: #89893

On Mon, May 4, 2015 at 9:17 AM, Peter Otten <__peter__@web.de> wrote:
> OK, you convinced me. Then I tried:
>
> >>> with open("tmp.txt", "wb") as f: f.write("0\r\n3\r5\n7")
> ...
> >>> assert len(open("tmp.txt", "rb").read()) == 8
> >>> f = open("tmp.txt", "rU")
> >>> f.readline()
> '0\n'
> >>> f.newlines
> >>> f.tell()
> 3
> >>> f.newlines
> '\r\n'
>
> Hm, so tell() moves the file pointer? Is that sane?

If I call readline() followed by tell(), I expect the result to be the
position of the start of the next line. Maybe this is considered safe
because tell() on a pipe raises an exception?
