| Header | Value |
|---|---|
| Path | csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!feeder.news-service.com!npeer.de.kpn-eurorings.net!npeer-ng0.de.kpn-eurorings.net!newsfeed.arcor.de!newsspool2.arcor-online.net!news.arcor.de.POSTED!not-for-mail |
| Content-Type | text/plain; charset="UTF-8" |
| Message-Id | <2887713.8pfo1IzOAr@PointedEars.de> |
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
| Reply-To | Thomas 'PointedEars' Lahn <usenet@PointedEars.de> |
| Organization | PointedEars Software (PES) |
| Date | Sat, 23 Apr 2011 21:18:53 +0200 |
| User-Agent | KNode/4.4.7 |
| Content-Transfer-Encoding | 8Bit |
| Subject | Re: detecting newline character |
| Newsgroups | comp.lang.python |
| References | <mailman.779.1303582172.9059.python-list@python.org> |
| Followup-To | comp.lang.python |
| MIME-Version | 1.0 |
| Lines | 47 |
| NNTP-Posting-Date | 23 Apr 2011 21:18:53 CEST |
| NNTP-Posting-Host | 0bc404b0.newsspool4.arcor-online.net |
| X-Trace | DXC=;M77fWX`F]J78PK[oJ2ng@4IUK<Cl32<A4Fo<]lROoRA8kF<OcfhCOKoY<<[OWJ>^KDZm8W4\YJNLT<8F<]0D<`InOJ3[SYnM<EMUZiLaS@B2H |
| X-Complaints-To | usenet-abuse@arcor.de |
| Xref | x330-a1.tempe.blueboxinc.net comp.lang.python:3920 |
Followups directed to: comp.lang.python
Daniel Geržo wrote:

> I need to detect the newline characters used in the file I am reading.
> For this purpose I am using the following code:
>
> def _read_lines(self):
>     with contextlib.closing(codecs.open(self.path, "rU")) as fobj:
>         fobj.readlines()
>         if isinstance(fobj.newlines, tuple):
>             self.newline = fobj.newlines[0]
>         else:
>             self.newline = fobj.newlines
>
> This works fine, if I call codecs.open() without encoding argument; I am
> testing with an ASCII English text file, and in such case the
> fobj.newlines is correctly detected as '\r\n'. However, when I
> call codecs.open() with encoding='ascii' argument, the fobj.newlines is
> None and I can't figure out why that is the case. Reading the PEP at
> http://www.python.org/dev/peps/pep-0278/ I don't see any reason why
> would I end up with newlines being None after I call readlines().
>
> Anyone has an idea? You can fetch the file I am testing with from
> http://danger.rulez.sk/subrip_ascii.srt

I see nothing suspicious in your .srt *after* downloading it. file -i
confirms that it only contains US-ASCII characters (but see below).

The only reason I can think of for this not working ATM comes from the
documentation, where it says that 'U' requires Python to be built with
universal newline support; that it is *usually* so, but might not be so in
your case (but then the question remains: How could it be not None without
`encoding' argument?)

<http://docs.python.org/library/codecs.html?highlight=codecs.open#codecs.open>
<http://docs.python.org/library/functions.html#open>

WFM with and without `encoding' argument in python-2.7.1-8 (CPython), Debian
GNU/Linux 6.0.1, Linux 2.6.35.5-pe (custom) SMP i686.

Which Python implementation and version are you using on which system?
On which system has the "ASCII" file been created and how? Note that both
uploading the file with FTP in ASCII mode and downloading over HTTP might
have removed the problem Python has with it.

-- 
PointedEars
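
For comparison, here is a minimal sketch of newline detection with an explicit encoding. It is not the code from the thread: it assumes the `io` module (Python 2.6+ and Python 3) instead of `codecs.open()`, because the `codecs.open()` documentation notes that files are opened in binary mode with no automatic newline conversion, which would be consistent with `newlines` staying None. The helper name `detect_newlines` and the sample file contents are made up for illustration.

```python
import io
import os
import tempfile


def detect_newlines(path, encoding="ascii"):
    """Return a tuple of the newline conventions seen in the file.

    Hypothetical helper: relies on the `newlines` attribute that
    io.TextIOWrapper maintains in universal newlines mode.
    """
    # io.open() is the built-in open() on Python 3; newline=None (the
    # default) enables universal newlines, so `newlines` is tracked.
    with io.open(path, "r", encoding=encoding) as fobj:
        fobj.read()  # the attribute only reflects what has been read so far
        seen = fobj.newlines
    if seen is None:
        return ()          # no line ending encountered
    if isinstance(seen, tuple):
        return seen        # several conventions mixed in one file
    return (seen,)         # exactly one convention


# Usage: write a small CRLF-terminated sample and inspect it.
fd, sample_path = tempfile.mkstemp(suffix=".srt")
try:
    os.write(fd, b"1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\n")
    os.close(fd)
    print(detect_newlines(sample_path))
finally:
    os.remove(sample_path)
```

Returning a tuple in every case avoids the isinstance() branch at the call site; a caller that needs a single value can take the first element, as the quoted code does.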
detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-23 20:09 +0200
Re: detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-23 21:18 +0200
Re: detecting newline character Daniel Geržo <danger@rulez.sk> - 2011-04-24 09:43 +0200
[SOLVED] detecting newline character Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-04-24 15:35 +0200