Date: Sat, 23 Apr 2011 22:25:20 +0200
From: =?UTF-8?B?RGFuaWVsIEdlcsW+bw==?= <danger@rulez.sk>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17pre) Gecko/20110414 Lanikai/3.1.10pre
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: detecting newline character
References: <4DB315D7.1020405@rulez.sk>	<mailman.781.1303585944.9059.python-list@python.org> <1450471.fLXkozyCbt@PointedEars.de>
In-Reply-To: <1450471.fLXkozyCbt@PointedEars.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.784.1303590332.9059.python-list@python.org>

On 23.4.2011 21:33, Thomas 'PointedEars' Lahn wrote:
> Chris Rebert wrote:
>
>> On Sat, Apr 23, 2011 at 11:09 AM, Daniel Geržo <danger@rulez.sk> wrote:
>>> I need to detect the newline characters used in the file I am reading.
>>> For this purpose I am using the following code:
>>>
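>>> import codecs
>>> import contextlib
>>>
>>> def _read_lines(self):
>>>     # Open in universal-newline mode and read everything so that
>>>     # fobj.newlines is populated before we inspect it.
>>>     with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
>>>         fobj.readlines()
>>>         if isinstance(fobj.newlines, tuple):
>>>             self.newline = fobj.newlines[0]
>>>         else:
>>>             self.newline = fobj.newlines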
>>>
>>> This works fine if I call codecs.open() without an encoding argument. I am
>>> testing with an ASCII English text file, and in that case fobj.newlines
>>> is correctly detected as '\r\n'. However, when I call codecs.open() with
>>> the encoding='ascii' argument, fobj.newlines is None and I can't figure
>>> out why. Reading the PEP at
>>> http://www.python.org/dev/peps/pep-0278/ I don't see any reason why I
>>> would end up with newlines being None after calling readlines().
>>>
>>> Does anyone have an idea?
>>
>> I would hypothesize that it's an interaction bug between universal
>> newlines and codecs.open().
>>
>> […]
>> I would speculate that the upshot of this is that codecs.open() ends
>> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
>> resulting in strange behavior.
>>
>> If this explanation is correct, then there are 2 bugs:
>> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
>> reject mode strings which involve both.
>> 2. codecs.open() should either reject modes involving "U", or be fixed
>> so that they work as expected.
>
> You might be correct that it is a bug (already fixed in versions newer than
> 2.5), since codecs.open() from my Python 2.6 reads as follows:

Well, I am running this on:
Python 2.7.1 (r271:86832, Mar  7 2011, 14:28:09)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin

So what do you guys advise me to do?
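
One thing I could try, as a minimal sketch only: io.open() (available
since Python 2.6) does its own universal-newline handling in text mode and
exposes a newlines attribute on the file object, so it may sidestep whatever
codecs.open() does with the mode string. The helper name detect_newline is
hypothetical, and I am assuming the file is small enough to read in one go.

```python
import io


def detect_newline(path, encoding="ascii"):
    """Return a newline convention seen in the file, or None if there is none.

    Hypothetical helper: relies on io.open() rather than codecs.open().
    """
    # newline=None enables universal-newline translation; the text wrapper
    # records which line endings it encountered in its .newlines attribute.
    with io.open(path, "r", encoding=encoding, newline=None) as fobj:
        fobj.read()  # .newlines is only populated once data has been read
        seen = fobj.newlines
    # .newlines is a tuple when several conventions were seen in one file.
    if isinstance(seen, tuple):
        return seen[0]
    return seen
```

When the file mixes conventions, newlines is a tuple and this just picks its
first element, the same way my original code does.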

-- 
S pozdravom / Best regards
   Daniel Gerzo