| Date | 2011-04-23 22:25 +0200 |
|---|---|
| From | Daniel Geržo <danger@rulez.sk> |
| Subject | Re: detecting newline character |
| References | <4DB315D7.1020405@rulez.sk> <mailman.781.1303585944.9059.python-list@python.org> <1450471.fLXkozyCbt@PointedEars.de> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.784.1303590332.9059.python-list@python.org> |
On 23.4.2011 21:33, Thomas 'PointedEars' Lahn wrote:
> Chris Rebert wrote:
>
>> On Sat, Apr 23, 2011 at 11:09 AM, Daniel Geržo <danger@rulez.sk> wrote:
>>> I need to detect the newline characters used in the file I am reading.
>>> For this purpose I am using the following code:
>>>
>>>     def _read_lines(self):
>>>         with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
>>>             fobj.readlines()
>>>             if isinstance(fobj.newlines, tuple):
>>>                 self.newline = fobj.newlines[0]
>>>             else:
>>>                 self.newline = fobj.newlines
>>>
>>> This works fine if I call codecs.open() without an encoding argument; I am
>>> testing with an ASCII English text file, and in that case fobj.newlines
>>> is correctly detected as '\r\n'. However, when I call codecs.open() with
>>> the encoding='ascii' argument, fobj.newlines is None and I can't figure
>>> out why. Reading the PEP at http://www.python.org/dev/peps/pep-0278/
>>> I don't see any reason why I would end up with newlines being None after
>>> I call readlines().
>>>
>>> Does anyone have an idea?
>>
>> I would hypothesize that it's an interaction bug between universal
>> newlines and codecs.open().
>>
>> […]
>> I would speculate that the upshot of this is that codecs.open() ends
>> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
>> resulting in strange behavior.
>>
>> If this explanation is correct, then there are 2 bugs:
>> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
>>    reject mode strings which involve both.
>> 2. codecs.open() should either reject modes involving "U", or be fixed
>>    so that they work as expected.
>
> You might be correct that it is a bug (already fixed in versions newer than
> 2.5), since codecs.open() from my Python 2.6 reads as follows:

Well, I am doing this on:

Python 2.7.1 (r271:86832, Mar 7 2011, 14:28:09)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin

So what do you guys advise me to do?

-- 
S pozdravom / Best regards
Daniel Gerzo
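One possible way around the codecs.open()/"U" interaction is io.open() (the io module exists since Python 2.6 and its open() is the built-in open() in Python 3). It decodes and applies universal newlines in a single layer and exposes a `newlines` attribute like the one used above. The following is a minimal sketch, assuming a Python 3 interpreter; the helper name `detect_newline` and its return convention are hypothetical, chosen only for illustration.

```python
import io
import os
import tempfile


def detect_newline(path, encoding="ascii"):
    """Report the newline convention(s) seen in *path* (hypothetical helper).

    Returns a single string such as '\\r\\n' when the file is consistent,
    a tuple of strings when line endings are mixed, or None when no line
    ending was found.
    """
    # newline=None enables universal-newlines mode; the text wrapper then
    # records every line-ending style it translates in fobj.newlines.
    with io.open(path, "r", encoding=encoding, newline=None) as fobj:
        fobj.read()  # .newlines is only populated once data has been read
        return fobj.newlines


if __name__ == "__main__":
    # Write a small CRLF file in binary mode so nothing gets translated.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"first line\r\nsecond line\r\n")
    try:
        print(repr(detect_newline(path)))  # prints '\r\n'
    finally:
        os.remove(path)
```

For a mixed file, CPython's io implementation returns the tuple in a fixed order (for example ('\n', '\r\n')), so taking `newlines[0]` as in the quoted code picks a convention by that ordering, not necessarily the one that occurs first or most often in the file.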
Re: detecting newline character Chris Rebert <clp2@rebertia.com> - 2011-04-23 12:12 -0700
Re: detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-23 21:33 +0200
Re: detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-23 22:25 +0200
Re: detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-24 01:30 +0200
Re: detecting newline character jmfauth <wxjmfauth@gmail.com> - 2011-04-24 00:05 -0700
Re: detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-24 10:21 +0200
Re: detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-24 11:19 +0200
Re: detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-24 11:49 +0200
Re: detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-24 14:50 +0200