Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!weretis.net!feeder4.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!xlned.com!feeder7.xlned.com!newsfeed.xs4all.nl!newsfeed1a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <7xioof9li6.fsf@ruckus.brouhaha.com>
References: <mailman.10656.1401842403.18130.python-list@python.org> <lmq4mu$8nv$1@news.albasani.net> <7xr433z0g3.fsf@ruckus.brouhaha.com> <lmqdn8$scl$1@news.albasani.net> <mailman.10759.1401998071.18130.python-list@python.org> <7xioof9li6.fsf@ruckus.brouhaha.com>
From: Ian Kelly <ian.g.kelly@gmail.com>
Date: Thu, 5 Jun 2014 14:18:49 -0600
Subject: Re: Unicode and Python - how often do you index strings?
To: Python <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.10763.1401999570.18130.python-list@python.org>
Lines: 18
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:72756

On Thu, Jun 5, 2014 at 1:58 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Ryan Hiebert <ryan@ryanhiebert.com> writes:
>> How so? I was using line=line[:-1] for removing the trailing newline, and
>> just replaced it with rstrip('\n'). What are you doing differently?
>
> rstrip removes all the newlines off the end, whether there are zero or
> multiple.  In perl the difference is chomp vs chop.  line=line[:-1]
> removes one character, that might or might not be a newline.

Given the description that the input string is "a textfile line", if
it has multiple newlines then it's invalid.

Personally I tend toward rstrip('\r\n') so that I don't have to worry
about files with alternative line terminators.
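
To make the difference between the approaches in this thread concrete, here is a small sketch. The helper names (chop, strip_lf, strip_crlf) are made up for illustration and are not from anyone's actual code.

```python
# Hypothetical helper names, used only to compare the approaches
# discussed in this thread.

def chop(line):
    # Removes the last character, whatever it happens to be.
    return line[:-1]

def strip_lf(line):
    # Removes every trailing '\n' (zero, one, or many).
    return line.rstrip('\n')

def strip_crlf(line):
    # Removes every trailing '\r' or '\n', in any order and any number.
    return line.rstrip('\r\n')

if __name__ == "__main__":
    for sample in ["abc\n", "abc\r\n", "abc", "abc\n\n"]:
        print(repr(sample), repr(chop(sample)),
              repr(strip_lf(sample)), repr(strip_crlf(sample)))
```

Note that the argument to rstrip is a set of characters, not a suffix, so rstrip('\r\n') also strips a lone '\r' or a run of mixed terminators.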

If you want to be really picky about removing exactly one line
terminator, then this captures all the relatively modern variations:
re.sub('\r?\n$|\n?\r$', '', line, count=1)
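
A slightly more defensive sketch of the same idea, wrapped in a function. The name strip_one_terminator is hypothetical. It anchors with \Z instead of $, on the assumption that you only ever want to match at the very end of the string ($ in Python can also match just before a final newline).

```python
import re

# Matches exactly one line terminator at the very end of the string:
# CRLF, LFCR, bare CR, or bare LF. Two-character forms are listed first
# so they are removed as a unit.
_TERMINATOR = re.compile(r'(?:\r\n|\n\r|\r|\n)\Z')

def strip_one_terminator(line):
    # Hypothetical helper: remove at most one trailing line terminator.
    return _TERMINATOR.sub('', line, count=1)

if __name__ == "__main__":
    for sample in ["abc\n", "abc\r\n", "abc\n\r", "abc\r", "abc", "abc\n\n"]:
        print(repr(sample), '->', repr(strip_one_terminator(sample)))
```

Unlike rstrip, this leaves a second trailing newline alone, and unlike line[:-1] it leaves a line with no terminator untouched.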