| Message-Id | <1450471.fLXkozyCbt@PointedEars.de> |
|---|---|
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
| Organization | PointedEars Software (PES) |
| Date | 2011-04-23 21:33 +0200 |
| Subject | Re: detecting newline character |
| Newsgroups | comp.lang.python |
| References | <4DB315D7.1020405@rulez.sk> <mailman.781.1303585944.9059.python-list@python.org> |
| Followup-To | comp.lang.python |
Chris Rebert wrote:
> On Sat, Apr 23, 2011 at 11:09 AM, Daniel Geržo <danger@rulez.sk> wrote:
>> I need to detect the newline characters used in the file I am reading.
>> For this purpose I am using the following code:
>>
>> def _read_lines(self):
>>     with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
>>         fobj.readlines()
>>         if isinstance(fobj.newlines, tuple):
>>             self.newline = fobj.newlines[0]
>>         else:
>>             self.newline = fobj.newlines
>>
>> This works fine, if I call codecs.open() without encoding argument; I am
>> testing with an ASCII English text file, and in such case the
>> fobj.newlines is correctly detected being as '\r\n'. However, when I call
>> codecs.open() with encoding='ascii' argument, the fobj.newlines is None
>> and I can't figure out why that is the case. Reading the PEP at
>> http://www.python.org/dev/peps/pep-0278/ I don't see any reason why would
>> I end up with newlines being None after I call readlines().
>>
>> Anyone has an idea?
>
> I would hypothesize that it's an interaction bug between universal
> newlines and codecs.open().
>
> […]
> I would speculate that the upshot of this is that codecs.open() ends
> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
> resulting in strange behavior.
>
> If this explanation is correct, then there are 2 bugs:
> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
> reject mode strings which involve both.
> 2. codecs.open() should either reject modes involving "U", or be fixed
> so that they work as expected.
You might be correct that it is a bug (already fixed in versions newer than
2.5), since codecs.open() from my Python 2.6 reads as follows:
def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):
    """
    …
    """
    if encoding is not None:
        if 'U' in mode:
            # No automatic conversion of '\n' is done on reading and writing
            mode = mode.strip().replace('U', '')
            if mode[:1] not in set('rwa'):
                mode = 'r' + mode
        if 'b' not in mode:
            # Force opening of the file in binary mode
            mode = mode + 'b'
    file = __builtin__.open(filename, mode, buffering)
    if encoding is None:
        return file
    info = lookup(encoding)
    srw = StreamReaderWriter(file, info.streamreader, info.streamwriter,
                             errors)
    # Add attributes to simplify introspection
    srw.encoding = encoding
    return srw
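
The mode handling quoted above suggests why fobj.newlines ends up as None: once an encoding is given, the "U" is stripped and the file is opened in binary mode, so no universal-newline tracking takes place. The following is only a sketch of that reasoning, not part of the original post. effective_mode() is a hypothetical helper that copies the mode rewriting from the listing, and detect_newline() shows one possible workaround, on the assumption that io.open() (available since Python 2.6) is acceptable in place of codecs.open().

```python
import io
import os
import tempfile


def effective_mode(mode, encoding):
    """Hypothetical helper: mirrors the mode rewriting in the quoted codecs.open()."""
    if encoding is not None:
        if 'U' in mode:
            # Universal newlines are dropped as soon as an encoding is given
            mode = mode.strip().replace('U', '')
            if mode[:1] not in set('rwa'):
                mode = 'r' + mode
        if 'b' not in mode:
            # The underlying file is always opened in binary mode
            mode = mode + 'b'
    return mode


def detect_newline(path, encoding='ascii'):
    """Return a newline style found in the file, or None if it has no newlines."""
    # Assumption: io.open() decodes the text and, with newline=None,
    # still records the line endings it encountered in .newlines.
    with io.open(path, 'r', encoding=encoding, newline=None) as fobj:
        fobj.read()
        newlines = fobj.newlines
    if isinstance(newlines, tuple):
        # Mixed line endings: take the first entry, as the original code did
        return newlines[0]
    return newlines


if __name__ == '__main__':
    print(effective_mode('rU', 'ascii'))   # mode passed on to the built-in open()
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as out:
            out.write(b'one\r\ntwo\r\n')
        print(repr(detect_newline(path)))
    finally:
        os.remove(path)
```

Without an encoding, effective_mode() leaves "rU" untouched, which matches the observation that newline detection works only when codecs.open() is called without the encoding argument.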
--
PointedEars