MIME-Version: 1.0
In-Reply-To: <1402000445.62825.YahooMailNeo@web163806.mail.gq1.yahoo.com>
References: <mailman.10656.1401842403.18130.python-list@python.org> <lmq4mu$8nv$1@news.albasani.net> <7xr433z0g3.fsf@ruckus.brouhaha.com> <lmqdn8$scl$1@news.albasani.net> <mailman.10759.1401998071.18130.python-list@python.org> <7xioof9li6.fsf@ruckus.brouhaha.com> <CALwzidm1mg6HunTrBmpcoRgePd=5Aa4=QQ-yEJVB_CAH5ZY+Fw@mail.gmail.com> <1402000445.62825.YahooMailNeo@web163806.mail.gq1.yahoo.com>
From: Ian Kelly <ian.g.kelly@gmail.com>
Date: Thu, 5 Jun 2014 18:05:34 -0600
Subject: Re: Unicode and Python - how often do you index strings?
To: Python <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.10791.1402013180.18130.python-list@python.org>

On Thu, Jun 5, 2014 at 2:34 PM, Albert-Jan Roskam <fomcl@yahoo.com> wrote:
>> If you want to be really picky about removing exactly one line
>> terminator, then this captures all the relatively modern variations:
>> re.sub('\r?\n$|\n?\r$', '', line, count=1)
>
> or perhaps: re.sub("[^ \S]+$", "", line)

That will remove more than one terminator, plus tabs. Points for
including \f and \v though.
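
A quick check of that claim, as a minimal sketch (the sample string is
made up for illustration). The class [^ \S] matches any whitespace
character other than a plain space, so the + swallows the whole trailing
run:

```python
import re

# Made-up sample: a tab followed by three line-ending characters.
line = "spam\t\r\n\n"

# [^ \S] means "whitespace, but not a space"; + takes the entire trailing run.
stripped = re.sub(r"[^ \S]+$", "", line)

print(repr(stripped))  # the tab and every terminator are gone
```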

I suppose if we want to be absolutely correct, we should follow the
Unicode standard:
re.sub(r'\r?\n$|[\r\v\f\x85\u2028\u2029]$', '', line, count=1)
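
For anyone who wants to try it, here is a rough sketch wrapped in a
function. The name strip_one_terminator is made up, and it assumes
Python 3.3 or later, where the re module itself understands \uXXXX
escapes in a raw pattern. Note that re.sub takes (pattern, repl, string),
so the replacement goes before the line. One deliberate change from the
one-liner: it anchors with \Z rather than $, because without re.MULTILINE
$ also matches just before a final newline, and a line ending in "\v\n"
would then lose the \v instead of the \n.

```python
import re

# Hypothetical helper, not from the standard library.
# CRLF is tried first so it counts as a single terminator; the other
# characters are LF, CR, VT, FF, NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR.
_ONE_TERMINATOR = re.compile(r'\r\n\Z|[\n\r\v\f\x85\u2028\u2029]\Z')


def strip_one_terminator(line):
    """Remove exactly one trailing line terminator from line, if present."""
    return _ONE_TERMINATOR.sub('', line, count=1)


for sample in ["abc\r\n", "abc\n\n", "abc\u2028", "abc\t\n", "abc"]:
    print(repr(sample), '->', repr(strip_one_terminator(sample)))
```

Trailing tabs and any earlier terminators are left alone; only the last
terminator goes.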