Date: Thu, 5 Jun 2014 13:34:05 -0700 (PDT)
From: Albert-Jan Roskam
Subject: Re: Unicode and Python - how often do you index strings?
Newsgroups: comp.lang.python

----- Original Message -----
> From: Ian Kelly
> To: Python
> Cc:
> Sent: Thursday, June 5, 2014 10:18 PM
> Subject: Re: Unicode and Python - how often do you index strings?
>
> On Thu, Jun 5, 2014 at 1:58 PM, Paul Rubin wrote:
>> Ryan Hiebert writes:
>>> How so? I was using line=line[:-1] for removing the trailing newline,
>>> and just replaced it with rstrip('\n'). What are you doing differently?
>>
>> rstrip removes all the newlines off the end, whether there are zero or
>> multiple. In Perl the difference is chomp vs chop. line=line[:-1]
>> removes one character, which might or might not be a newline.
>
> Given the description that the input string is "a textfile line", if
> it has multiple newlines then it's invalid.
>
> Personally I tend toward rstrip('\r\n') so that I don't have to worry
> about files with alternative line terminators.

I tend to use: s.rstrip(os.linesep)

> If you want to be really picky about removing exactly one line
> terminator, then this captures all the relatively modern variations:
> re.sub('\r?\n$|\n?\r$', '', line, count=1)

Or perhaps: re.sub(r"[^ \S]+$", "", line)
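For comparison, here is a rough sketch (Python 3 assumed) of how the approaches in this thread behave side by side. The helper names chop, strip_all_terminators, chomp_one and strip_ws_except_space are hypothetical, made up just for this illustration; the regexes are the two from above, written as raw strings and passed as re.sub(pattern, repl, string).

```python
import re

# Hypothetical helper names, loosely mirroring Perl's chop/chomp.


def chop(line):
    # line[:-1]: drop the last character, whether or not it is a newline.
    return line[:-1]


def strip_all_terminators(line):
    # rstrip('\r\n') treats its argument as a set of characters, so it
    # removes every trailing CR and LF, however many there are.
    return line.rstrip('\r\n')


# Ian's pattern, used as re.sub(pattern, repl, string, count=1).
_ONE_TERMINATOR = re.compile(r'\r?\n$|\n?\r$')


def chomp_one(line):
    # Remove at most one line terminator (\n, \r\n, \r or \n\r) at the end.
    return _ONE_TERMINATOR.sub('', line, count=1)


# The second pattern: a trailing run of whitespace characters other than
# a plain space (so tabs are stripped too, but spaces are kept).
_TRAILING_WS_NOT_SPACE = re.compile(r'[^ \S]+$')


def strip_ws_except_space(line):
    return _TRAILING_WS_NOT_SPACE.sub('', line)


if __name__ == '__main__':
    for sample in ['abc\n', 'abc\r\n', 'abc\n\n', 'abc', 'abc \t\n']:
        print(repr(sample), repr(chop(sample)),
              repr(strip_all_terminators(sample)), repr(chomp_one(sample)),
              repr(strip_ws_except_space(sample)))
```

Note that $ also matches just before a final newline, so chomp_one('abc\n\n') gives 'abc\n': exactly one newline is removed, which is what "picky" asks for. And chop('abc') gives 'ab', which is Paul's point about line[:-1].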