From: Daniel Geržo
Date: Sat, 23 Apr 2011 22:25:20 +0200
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: detecting newline character

On 23.4.2011 21:33, Thomas 'PointedEars' Lahn wrote:
> Chris Rebert wrote:
>
>> On Sat, Apr 23, 2011 at 11:09 AM, Daniel Geržo wrote:
>>> I need to detect the newline characters used in the file I am reading.
>>> For this purpose I am using the following code:
>>>
>>> def _read_lines(self):
>>>     with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
>>>         fobj.readlines()
>>>         if isinstance(fobj.newlines, tuple):
>>>             self.newline = fobj.newlines[0]
>>>         else:
>>>             self.newline = fobj.newlines
>>>
>>> This works fine if I call codecs.open() without the encoding argument; I am
>>> testing with an ASCII English text file, and in that case
>>> fobj.newlines is correctly detected as '\r\n'. However, when I call
>>> codecs.open() with the encoding='ascii' argument, fobj.newlines is None
>>> and I can't figure out why that is the case. Reading the PEP at
>>> http://www.python.org/dev/peps/pep-0278/ I don't see any reason why
>>> I would end up with newlines being None after I call readlines().
>>>
>>> Does anyone have an idea?
>>
>> I would hypothesize that it's an interaction bug between universal
>> newlines and codecs.open().
>>
>> […]
>> I would speculate that the upshot of this is that codecs.open() ends
>> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
>> resulting in strange behavior.
>>
>> If this explanation is correct, then there are 2 bugs:
>> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
>> reject mode strings which involve both.
>> 2. codecs.open() should either reject modes involving "U", or be fixed
>> so that they work as expected.
>
> You might be correct that it is a bug (already fixed in versions newer than
> 2.5), since codecs.open() from my Python 2.6 reads as follows:

Well, I am doing this on:

Python 2.7.1 (r271:86832, Mar 7 2011, 14:28:09)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin

So what do you guys advise me to do?

-- 
S pozdravom / Best regards
Daniel Gerzo
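
One possible workaround, offered only as a sketch and not something proposed in the thread: io.open() (available from Python 2.6 on, and the built-in open() in Python 3) decodes the text and handles universal newlines in the same layer, so its `newlines` attribute should still be filled in when an encoding is given. The function name detect_newline and the demo file below are hypothetical, chosen for illustration.

```python
import io
import os
import tempfile


def detect_newline(path, encoding="ascii"):
    """Return a newline style found in the file, or None if it has no line breaks."""
    # newline=None enables universal-newline translation in the text layer,
    # which records the line endings it encounters while reading.
    with io.open(path, "r", encoding=encoding, newline=None) as fobj:
        fobj.readlines()
        seen = fobj.newlines
    # `newlines` is None, a single string, or a tuple of strings.
    # As in the original snippet, take the first entry of a tuple; the tuple
    # order is not necessarily the order in which the endings appear.
    if isinstance(seen, tuple):
        return seen[0]
    return seen


if __name__ == "__main__":
    # Hypothetical demo file with Windows-style line endings.
    fd, demo_path = tempfile.mkstemp()
    try:
        os.write(fd, b"first line\r\nsecond line\r\n")
        os.close(fd)
        print(repr(detect_newline(demo_path)))
    finally:
        os.remove(demo_path)
```

This sidesteps codecs.open() entirely rather than fixing the mode handling discussed above, so it only helps if switching to io.open() is acceptable in the surrounding code.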