| Date | 2011-04-24 09:43 +0200 |
|---|---|
| From | Daniel Geržo <danger@rulez.sk> |
| Subject | Re: detecting newline character |
| References | <mailman.779.1303582172.9059.python-list@python.org> <2887713.8pfo1IzOAr@PointedEars.de> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.789.1303631049.9059.python-list@python.org> |
On 23.4.2011 21:18, Thomas 'PointedEars' Lahn wrote:
> Daniel Geržo wrote:
>
>> I need to detect the newline characters used in the file I am reading.
>> For this purpose I am using the following code:
>>
>> def _read_lines(self):
>>     with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
>>         fobj.readlines()
>>         if isinstance(fobj.newlines, tuple):
>>             self.newline = fobj.newlines[0]
>>         else:
>>             self.newline = fobj.newlines
>>
>> This works fine, if I call codecs.open() without encoding argument; I am
>> testing with an ASCII English text file, and in such case the
>> fobj.newlines is correctly detected being as '\r\n'. However, when I
>> call codecs.open() with encoding='ascii' argument, the fobj.newlines is
>> None and I can't figure out why that is the case. Reading the PEP at
>> http://www.python.org/dev/peps/pep-0278/ I don't see any reason why
>> would I end up with newlines being None after I call readlines().
>>
>> Anyone has an idea? You can fetch the file I am testing with from
>> http://danger.rulez.sk/subrip_ascii.srt
>
> I see nothing suspicious in your .srt *after* downloading it. file -i
> confirms that it only contains US-ASCII characters (but see below).

That is indeed the case in my environment too.

danger@[danger-mbp ~/devel/pysublib/pysublib/test/files]> file -i subrip_ascii.srt
subrip_ascii.srt: regular file
danger@[danger-mbp ~/devel/pysublib/pysublib/test/files]> file subrip_ascii.srt
subrip_ascii.srt: ASCII English text, with CRLF line terminators

> The only reason I can think of for this not working ATM comes from the
> documentation, where it says that 'U' requires Python to be built with
> universal newline support; that it is *usually* so, but might not be so in
> your case (but then the question remains: How could it be not None without
> `encoding' argument?)

Yes, this is what does not make sense. If I didn't have the universal
newline support enabled, I wouldn't have the newlines attribute at all.

> <http://docs.python.org/library/codecs.html?highlight=codecs.open#codecs.open>
> <http://docs.python.org/library/functions.html#open>
>
> WFM with and without `encoding' argument in python-2.7.1-8 (CPython), Debian
> GNU/Linux 6.0.1, Linux 2.6.35.5-pe (custom) SMP i686.
>
> Which Python implementation and version are you using on which system?

This is a standard python installation from MacPorts. System is OS X
10.6.7. I have now tried both python 2.7.1 and python 2.6.6 from
MacPorts and also 2.6.6 on FreeBSD. All fail for me when I set encoding.

> On which system has the "ASCII" file been created and how? Note that both
> uploading the file with FTP in ASCII mode and downloading over HTTP might
> have removed the problem Python has with it.

Unfortunately I am not 100% sure where I created the file, it was quite
some time ago, but it was either WinXP, or OS X Leopard.

The source code can be found at
https://bitbucket.org/danger/pysublib/src - I noticed the subtitle file
tests (e.g. test/test_subripfile.py) are failing for me and I have
identified the problem with newlines being None after calling read().

-- 
S pozdravom / Best regards
  Daniel Gerzo
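
Editor's note: the codecs documentation linked above says that files are opened in binary mode, so no automatic newline conversion is done. That would be consistent with fobj.newlines staying None once an encoding is passed, though the thread itself does not confirm this as the cause. The following is only a sketch of one possible workaround, not the fix adopted in the thread. It assumes io.open() is available (Python 2.6 or later; in Python 3 it is the built-in open()). The helper name detect_newline and the sample subtitle text are hypothetical.

```python
import io
import os
import tempfile


def detect_newline(path, encoding="ascii"):
    """Return the newline style recorded while reading the file.

    Hypothetical helper, not part of pysublib. Returns None if no line
    ending was read.
    """
    # newline=None turns on universal newlines mode, so the text wrapper
    # keeps track of the line endings it translates.
    with io.open(path, "r", encoding=encoding, newline=None) as fobj:
        fobj.read()
        newlines = fobj.newlines
    # Mixed line endings yield a tuple; take its first entry, as the
    # original _read_lines() does.
    if isinstance(newlines, tuple):
        return newlines[0]
    return newlines


# Small demonstration with a temporary CRLF file (made-up sample content).
fd, sample_path = tempfile.mkstemp(suffix=".srt")
os.write(fd, b"1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\n")
os.close(fd)
print(repr(detect_newline(sample_path)))
os.remove(sample_path)
```

Unlike codecs.open(), io.open() decodes and tracks newlines in the same object, so passing an encoding does not disable the newlines attribute.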
detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-23 20:09 +0200
Re: detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-23 21:18 +0200
Re: detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-24 09:43 +0200
[SOLVED] detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-24 15:35 +0200