Groups > comp.lang.python > #89893 > unrolled thread
| Started by | arekfu@gmail.com |
|---|---|
| First post | 2015-05-04 02:50 -0700 |
| Last post | 2015-05-04 09:33 -0600 |
| Articles | 13 — 7 participants |
when does newlines get set in universal newlines mode? arekfu@gmail.com - 2015-05-04 02:50 -0700
Re: when does newlines get set in universal newlines mode? Peter Otten <__peter__@web.de> - 2015-05-04 14:01 +0200
Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-04 22:13 +1000
Re: when does newlines get set in universal newlines mode? Davide Mancusi <arekfu@gmail.com> - 2015-05-04 06:35 -0700
Re: when does newlines get set in universal newlines mode? Terry Reedy <tjreedy@udel.edu> - 2015-05-04 13:38 -0400
Re: when does newlines get set in universal newlines mode? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-05 18:31 +1000
Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-05 18:41 +1000
Re: when does newlines get set in universal newlines mode? Davide Mancusi <arekfu@gmail.com> - 2015-05-05 02:23 -0700
Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-05 19:28 +1000
Re: when does newlines get set in universal newlines mode? Davide Mancusi <arekfu@gmail.com> - 2015-05-05 03:58 -0700
Re: when does newlines get set in universal newlines mode? Peter Otten <__peter__@web.de> - 2015-05-04 17:17 +0200
Re: when does newlines get set in universal newlines mode? Chris Angelico <rosuav@gmail.com> - 2015-05-05 01:26 +1000
Re: when does newlines get set in universal newlines mode? Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-04 09:33 -0600
| From | arekfu@gmail.com |
|---|---|
| Date | 2015-05-04 02:50 -0700 |
| Subject | when does newlines get set in universal newlines mode? |
| Message-ID | <3c45772b-77e0-4c17-8b3d-aa246c4b511c@googlegroups.com> |
Hi all,
I have a text file with Windows-style line terminators (\r\n) which I open in universal newlines mode (Python 2.7). I would expect the newlines attribute to be set after the first call to the readline() method, but apparently this is not the case:
>>> f=open('test_crlf', 'rU')
>>> f.newlines
>>> f.readline()
'foo\n'
>>> f.newlines
>>> f.readline()
'bar\n'
>>> f.newlines
'\r\n'
On the other hand, the newlines attribute gets set after the first call to readline() on a file with Unix-style line endings.
Is this a bug or a feature?
Thanks in advance,
Davide
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-05-04 14:01 +0200 |
| Message-ID | <mailman.81.1430740879.12865.python-list@python.org> |
| In reply to | #89893 |
arekfu@gmail.com wrote:
> Hi all,
>
> I have a text file with Windows-style line terminators (\r\n) which I open
> in universal newlines mode (Python 2.7). I would expect the newlines
> attribute to be set after the first call to the readline() method, but
> apparently this is not the case:
>
> >>> f=open('test_crlf', 'rU')
> >>> f.newlines
> >>> f.readline()
> 'foo\n'
> >>> f.newlines
> >>> f.readline()
> 'bar\n'
> >>> f.newlines
> '\r\n'
> On the other hand, the newlines attribute gets set after the first call to
> readline() on a file with Unix-style line endings.
>
> Is this a bug or a feature?
According to
https://docs.python.org/2.7/library/functions.html#open
"""
If Python is built without universal newlines support a mode with 'U' is the
same as normal text mode. Note that file objects so opened also have an
attribute called newlines which has a value of None (if no newlines have yet
been seen), '\n', '\r', '\r\n', or a tuple containing all the newline types
seen.
"""
I tried:
>>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
...
>>> f = open("tmp.txt", "rU")
>>> f.newlines
>>> f.readline()
'alpha\n'
>>> f.newlines
# expected: '\r\n'
>>> f.readline()
'beta\n'
>>> f.newlines
'\r\n' # expected: ('\r', '\r\n')
>>> f.readline()
'gamma\n'
>>> f.newlines
('\r', '\n', '\r\n')
I believe this is a bug.
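For readers on Python 3, a rough equivalent of the experiment above might look like the sketch below. It is only a sketch: the session in this post is Python 2.7, the helper name `newline_history` and the temporary file are made up for illustration, and the intermediate values of `newlines` depend on how far the text layer reads ahead, so only the state after the last line is something to rely on.

```python
import os
import tempfile

def newline_history(data):
    """Hypothetical helper: write data to a temp file, read it back line by
    line in universal newlines mode, and record f.newlines after each read."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        history = []
        # newline=None selects universal newlines translation in Python 3.
        with open(path, "r", encoding="ascii", newline=None) as f:
            while True:
                line = f.readline()
                if not line:
                    break
                history.append((line, f.newlines))
        return history
    finally:
        os.remove(path)

history = newline_history(b"alpha\r\nbeta\rgamma\n")
for line, seen in history:
    print(repr(line), repr(seen))
```

Once the last line has been returned, all three terminators have been seen, whatever the earlier values were.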
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-04 22:13 +1000 |
| Message-ID | <mailman.83.1430741619.12865.python-list@python.org> |
| In reply to | #89893 |
On Mon, May 4, 2015 at 10:01 PM, Peter Otten <__peter__@web.de> wrote:
> I tried:
>
> >>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
> ...
> >>> f = open("tmp.txt", "rU")
> >>> f.newlines
> >>> f.readline()
> 'alpha\n'
> >>> f.newlines
> # expected: '\r\n'
> >>> f.readline()
> 'beta\n'
> >>> f.newlines
> '\r\n' # expected: ('\r', '\r\n')
> >>> f.readline()
> 'gamma\n'
> >>> f.newlines
> ('\r', '\n', '\r\n')
>
> I believe this is a bug.
I'm not sure it is, actually; imagine the text is coming in one
character at a time (eg from a pipe), and it's seen "alpha\r". It
knows that this is a line, so it emits it; but until the next
character is read, it can't know whether it's going to be \r or \r\n.
What should it do? Read another character, which might block? Put "\r"
into .newlines, which might be wrong? Once it sees the \n, it knows
that it was \r\n (or rather, it assumes that files do not have lines
of text terminated by \r followed by blank lines terminated by \n -
because that would be stupid).
It may be worth documenting this limitation, but it's not something
that can easily be fixed without removing support for \r newlines -
although that might be an option, given that non-OSX Macs are
basically history now.
ChrisA
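The dilemma described in this post can be seen in isolation with Python 3's `io.IncrementalNewlineDecoder`. This is not the Python 2.7 file object the thread is about, and the way the text is split into chunks here is invented for illustration. It does make the same trade-off explicit, though: a trailing "\r" is held back until the next chunk shows whether a "\n" follows.

```python
import io

# decoder=None means the input is already text; translate=True maps every
# newline flavour to "\n", as universal newlines mode does.
dec = io.IncrementalNewlineDecoder(None, translate=True)

first = dec.decode("alpha\r")        # the trailing "\r" is held back
seen_after_first = dec.newlines      # nothing has been recorded yet

second = dec.decode("\nbeta\r")      # the held "\r" plus "\n" is a "\r\n"
seen_after_second = dec.newlines

third = dec.decode("gamma", final=True)  # the second "\r" stood alone
seen_at_end = dec.newlines

print(repr(first), repr(seen_after_first))
print(repr(second), repr(seen_after_second))
print(repr(third), repr(seen_at_end))
```

Holding the "\r" back avoids ever recording a wrong terminator. The cost is that the attribute lags behind the text already delivered, which resembles what the original poster observed.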
| From | Davide Mancusi <arekfu@gmail.com> |
|---|---|
| Date | 2015-05-04 06:35 -0700 |
| Message-ID | <b41e75a7-d890-4ff8-b83b-2e8e223137b5@googlegroups.com> |
| In reply to | #89899 |
>> I believe this is a bug.
>
> I'm not sure it is, actually; imagine the text is coming in one
> character at a time (eg from a pipe), and it's seen "alpha\r". It
> knows that this is a line, so it emits it; but until the next
> character is read, it can't know whether it's going to be \r or \r\n.
> What should it do? Read another character, which might block? Put "\r"
> into .newlines, which might be wrong? Once it sees the \n, it knows
> that it was \r\n (or rather, it assumes that files do not have lines
> of text terminated by \r followed by blank lines terminated by \n -
> because that would be stupid).
I think this is a good point. However, I will probably submit a bug report anyway and let the devs make their decisions. It is at least a documentation bug.
Cheers,
Davide
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2015-05-04 13:38 -0400 |
| Message-ID | <mailman.95.1430761138.12865.python-list@python.org> |
| In reply to | #89902 |
On 5/4/2015 9:35 AM, Davide Mancusi wrote:
>>> I believe this is a bug.
>>
>> I'm not sure it is, actually; imagine the text is coming in one
>> character at a time (eg from a pipe), and it's seen "alpha\r". It
>> knows that this is a line, so it emits it; but until the next
>> character is read, it can't know whether it's going to be \r or \r\n.
>> What should it do? Read another character, which might block? Put "\r"
>> into .newlines, which might be wrong? Once it sees the \n, it knows
>> that it was \r\n (or rather, it assumes that files do not have lines
>> of text terminated by \r followed by blank lines terminated by \n -
>> because that would be stupid).
>
> I think this is a good point. However, I will probably submit a bug
> report anyway and let the devs make their decisions. It is at least a
> documentation bug.
Be sure to report the exact python binary you are using, as reported when you start the interactive interpreter or Idle shell.
--
Terry Jan Reedy
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-05-05 18:31 +1000 |
| Message-ID | <55487fe3$0$12919$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #89899 |
On Monday 04 May 2015 22:13, Chris Angelico wrote:
> It may be worth documenting this limitation, but it's not something
> that can easily be fixed without removing support for \r newlines -
> although that might be an option, given that non-OSX Macs are
> basically history now.
Non-OSX Macs are history, but the text files they created are not.
--
Steve
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-05 18:41 +1000 |
| Message-ID | <mailman.112.1430815275.12865.python-list@python.org> |
| In reply to | #89944 |
On Tue, May 5, 2015 at 6:31 PM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> On Monday 04 May 2015 22:13, Chris Angelico wrote:
>
>> It may be worth documenting this limitation, but it's not something
>> that can easily be fixed without removing support for \r newlines -
>> although that might be an option, given that non-OSX Macs are
>> basically history now.
>
> Non-OSX Macs are history, but the text files they created are not.
True. Like I said, "might be". I could imagine, for instance, a caveat being put on the newlines attribute saying that "when reading a file delimited by \r, this may erroneously imply that \r\n is being used, until a second line has been read". But that's probably more complexity than it's worth.
ChrisA
| From | Davide Mancusi <arekfu@gmail.com> |
|---|---|
| Date | 2015-05-05 02:23 -0700 |
| Message-ID | <5728620b-6076-4752-8025-93ed460afba0@googlegroups.com> |
| In reply to | #89945 |
I just opened a bug report:
http://bugs.python.org/issue24126
We'll see what they say.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-05 19:28 +1000 |
| Message-ID | <mailman.115.1430818087.12865.python-list@python.org> |
| In reply to | #89948 |
On Tue, May 5, 2015 at 7:23 PM, Davide Mancusi <arekfu@gmail.com> wrote:
> I just opened a bug report:
>
> http://bugs.python.org/issue24126
>
> We'll see what they say.
Cool. I suggest posting in the tracker thread the exact Python version(s) you've tested this with, in case it matters.
ChrisA
| From | Davide Mancusi <arekfu@gmail.com> |
|---|---|
| Date | 2015-05-05 03:58 -0700 |
| Message-ID | <6ba588a6-2a82-4111-9462-17cd98116f97@googlegroups.com> |
| In reply to | #89949 |
> Cool. I suggest posting in the tracker thread the exact Python
> version(s) you've tested this with, in case it matters.
Done. Good point.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-05-04 17:17 +0200 |
| Message-ID | <mailman.91.1430752649.12865.python-list@python.org> |
| In reply to | #89893 |
Chris Angelico wrote:
> On Mon, May 4, 2015 at 10:01 PM, Peter Otten <__peter__@web.de> wrote:
>> I tried:
>>
>> >>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
>> ...
>> >>> f = open("tmp.txt", "rU")
>> >>> f.newlines
>> >>> f.readline()
>> 'alpha\n'
>> >>> f.newlines
>> # expected: '\r\n'
>> >>> f.readline()
>> 'beta\n'
>> >>> f.newlines
>> '\r\n' # expected: ('\r', '\r\n')
>> >>> f.readline()
>> 'gamma\n'
>> >>> f.newlines
>> ('\r', '\n', '\r\n')
>>
>> I believe this is a bug.
>
> I'm not sure it is, actually; imagine the text is coming in one
> character at a time (eg from a pipe), and it's seen "alpha\r". It
> knows that this is a line, so it emits it; but until the next
> character is read, it can't know whether it's going to be \r or \r\n.
> What should it do? Read another character, which might block? Put "\r"
> into .newlines, which might be wrong? Once it sees the \n, it knows
> that it was \r\n (or rather, it assumes that files do not have lines
> of text terminated by \r followed by blank lines terminated by \n -
> because that would be stupid).
>
> It may be worth documenting this limitation, but it's not something
> that can easily be fixed without removing support for \r newlines -
> although that might be an option, given that non-OSX Macs are
> basically history now.
OK, you convinced me. Then I tried:
>>> with open("tmp.txt", "wb") as f: f.write("0\r\n3\r5\n7")
...
>>> assert len(open("tmp.txt", "rb").read()) == 8
>>> f = open("tmp.txt", "rU")
>>> f.readline()
'0\n'
>>> f.newlines
>>> f.tell()
3
>>> f.newlines
'\r\n'
Hm, so tell() moves the file pointer? Is that sane?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-05 01:26 +1000 |
| Message-ID | <mailman.92.1430753207.12865.python-list@python.org> |
| In reply to | #89893 |
On Tue, May 5, 2015 at 1:17 AM, Peter Otten <__peter__@web.de> wrote:
> OK, you convinced me. Then I tried:
>
> >>> with open("tmp.txt", "wb") as f: f.write("0\r\n3\r5\n7")
> ...
> >>> assert len(open("tmp.txt", "rb").read()) == 8
> >>> f = open("tmp.txt", "rU")
> >>> f.readline()
> '0\n'
> >>> f.newlines
> >>> f.tell()
> 3
> >>> f.newlines
> '\r\n'
>
> Hm, so tell() moves the file pointer? Is that sane?
... wow. Okay! That's a bit weird.
It's possible that something's being done with internal buffering
(after all, it's horribly inefficient to *actually* read text one byte
at a time, even if that's what's happening conceptually), and that
tell() causes some checks to be done. But that really is rather
strange. I'd be interested to know what happens if another process
writes to a pipe "0\r", then sleeps while the readline() and tell()
happen, and then writes a "\n" - what will that do to newlines?
By the way, it's as well to clarify, with all these examples, what
Python version you're using. There may be significant differences.
ChrisA
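The pipe experiment proposed in this post could be sketched as follows. This is Python 3 code, so it says nothing definite about the 2.7 build used elsewhere in the thread. A thread stands in for "another process", the 0.2 second pause is arbitrary, and `slow_writer` is a name made up for the sketch.

```python
import os
import threading
import time

def slow_writer(fd):
    """Hypothetical writer: send "0\\r", pause, then send "\\n" and close."""
    with os.fdopen(fd, "wb", buffering=0) as w:
        w.write(b"0\r")
        time.sleep(0.2)   # the reader is most likely waiting at this point
        w.write(b"\n")

read_fd, write_fd = os.pipe()
writer = threading.Thread(target=slow_writer, args=(write_fd,))
writer.start()

with os.fdopen(read_fd, "r", encoding="ascii", newline=None) as f:
    line = f.readline()   # cannot return until the byte after "\r" is known
    seen = f.newlines
writer.join()

print(repr(line), repr(seen))
```

In this Python 3 sketch, readline() takes the "read another character, which might block" option: it waits for the byte after the "\r" before returning, so the line and the newlines attribute are already consistent when it does.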
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-05-04 09:33 -0600 |
| Message-ID | <mailman.93.1430753672.12865.python-list@python.org> |
| In reply to | #89893 |
On Mon, May 4, 2015 at 9:17 AM, Peter Otten <__peter__@web.de> wrote:
> OK, you convinced me. Then I tried:
>
> >>> with open("tmp.txt", "wb") as f: f.write("0\r\n3\r5\n7")
> ...
> >>> assert len(open("tmp.txt", "rb").read()) == 8
> >>> f = open("tmp.txt", "rU")
> >>> f.readline()
> '0\n'
> >>> f.newlines
> >>> f.tell()
> 3
> >>> f.newlines
> '\r\n'
>
> Hm, so tell() moves the file pointer? Is that sane?
If I call readline() followed by tell(), I expect the result to be the
position of the start of the next line. Maybe this is considered safe
because tell() on a pipe raises an exception?
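The guess about pipes can at least be checked on Python 3, with the caveat that this is a different I/O implementation from the 2.7 file object in the session above. A minimal sketch, assuming a POSIX-style pipe from `os.pipe()` and reusing the test data from the quoted session:

```python
import os

read_fd, write_fd = os.pipe()
os.write(write_fd, b"0\r\n3\r5\n7")
os.close(write_fd)

with os.fdopen(read_fd, "r", encoding="ascii", newline=None) as f:
    first_line = f.readline()
    try:
        f.tell()
        tell_failed = False
    except OSError as exc:
        # io.UnsupportedOperation derives from OSError; it is raised here
        # because the underlying pipe is not seekable.
        tell_failed = True
        print("tell() raised:", type(exc).__name__)

print(repr(first_line), tell_failed)
```

So on a pipe there is no position to report at all, and tell() never gets the chance to look ahead for a "\n" the way it appears to on a regular file.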