| Message-Id | <2715963.tiIbC2pHGD@PointedEars.de> |
|---|---|
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
| Organization | PointedEars Software (PES) |
| Date | 2011-04-24 15:35 +0200 |
| Subject | [SOLVED] detecting newline character |
| Newsgroups | comp.lang.python |
| References | <mailman.779.1303582172.9059.python-list@python.org> <2887713.8pfo1IzOAr@PointedEars.de> <mailman.789.1303631049.9059.python-list@python.org> |
| Followup-To | comp.lang.python |
Followups directed to: comp.lang.python
Daniel Geržo wrote:

> On 23.4.2011 21:18, Thomas 'PointedEars' Lahn wrote:
>> Daniel Geržo wrote:
>>> [f = codecs.open(…, mode='rU', encoding='ascii') and f.newlines]
>>
>> […]
>> The only reason I can think of for this not working ATM comes from the
>> documentation, where it says that 'U' requires Python to be built with
>> universal newline support; that it is *usually* so, but might not be so
>> in your case (but then the question remains: How could it be not None
>> without `encoding' argument?)
>
> Yes, this is what does not make sense. If I didn't have the universal
> newline support enabled, I wouldn't have the newlines attribute at all.

True.  But it is good to know that one can test for this with
hasattr(fileobj, 'newlines')!

>> <http://docs.python.org/library/codecs.html?highlight=codecs.open#codecs.open>
>> <http://docs.python.org/library/functions.html#open>
>>
>> WFM with and without `encoding' argument in python-2.7.1-8 (CPython),
>> Debian GNU/Linux 6.0.1, Linux 2.6.35.5-pe (custom) SMP i686.
>>
>> Which Python implementation and version are you using on which system?
>
> This is a standard python installation from MacPorts. System is OS X
> 10.6.7. I have now tried both python 2.7.1 and python 2.6.6 from
> MacPorts and also 2.6.6 on FreeBSD. All fail for me when I set encoding.

I think this discussion, in particular
<2838616.PzL39ZcT7Z@PointedEars.de>,
<news:5684911.Hjke4DdEvY@PointedEars.de> and finally
<http://bugs.python.org/issue691291>, provides a good explanation now.

To summarize:

1. From Python 2.6.5-rc1 and Python 2.7-alpha4 forward, codecs.open()
   does not support universal newlines and will ignore any 'U' in its
   `mode' argument when the `encoding' argument is different from None.

2. As a result, file.newlines will be None if it exists.

3. This is by design, fixing a bug dating back to Python 2.3a.

4. Use another approach. :)

>> On which system has the "ASCII" file been created and how?  Note that
>> both uploading the file with FTP in ASCII mode and downloading over HTTP
>> might have removed the problem Python has with it.
>
> Unfortunately I am not 100% sure where I created the file, it was quite
> some time ago, but it was either WinXP, or OS X Leopard. The source code
> can be found at https://bitbucket.org/danger/pysublib/src - I noticed
> the subtitle file tests (e.g. test/test_subripfile.py) are failing for
> me and I have identified the problem with newlines being None after
> calling read().

Well, you have two alternatives now (codecs.open() with
list(set(re.search(newlines, readlines()))) and io.open()), and you
appear to have decided on `io', so there should not be a problem anymore.

I wish you good luck with your project, it looks really interesting
(I remember having written a DVD subtitle script based on gocr in bash
a few years ago).

-- 
\\//, PointedEars
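
For readers landing on this thread, here is a minimal sketch of the two alternatives mentioned above. It is not code from the thread: the function names `newlines_via_io` and `newlines_via_regex` are made up for illustration, the default encoding 'ascii' is only taken from the quoted example, and the regex variant scans the raw bytes with re.findall() rather than using the re.search() shorthand from the post.

```python
import io
import re


def newlines_via_io(path, encoding="ascii"):
    """Return the newline convention(s) io.open() saw in universal-newlines mode.

    Hypothetical helper; the result is None, a string such as '\\r\\n',
    or a tuple of strings if the file mixes conventions.
    """
    # newline=None (the default) enables universal newlines in the io module,
    # also when an encoding is given.
    with io.open(path, mode="r", encoding=encoding, newline=None) as f:
        f.read()           # f.newlines is only filled in once data has been read
        return f.newlines


def newlines_via_regex(path):
    """Return the set of newline sequences found in the raw bytes of the file.

    Hypothetical helper; it bypasses decoding entirely, so it does not
    depend on any universal-newlines support in the file object.
    """
    with io.open(path, mode="rb") as f:
        data = f.read()
    # Match '\r\n' first so a CRLF pair is not counted as a separate CR and LF.
    return set(re.findall(br"\r\n|\r|\n", data))
```

For a file written with Windows line endings, the first helper should report '\r\n' and the second a set containing only b'\r\n'. The io variant is probably the more convenient one when the decoded text is needed anyway; the regex variant is a fallback for when only the raw bytes are of interest.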
detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-23 20:09 +0200
Re: detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-23 21:18 +0200
Re: detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-24 09:43 +0200
[SOLVED] detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-24 15:35 +0200