
Unicode and Python - how often do you index strings?

Started by	Chris Angelico <rosuav@gmail.com>
First post	2014-06-04 10:39 +1000
Last post	2014-06-05 15:05 -0500
Articles	20 on this page of 40 — 21 participants


  Unicode and Python - how often do you index strings? Chris Angelico <rosuav@gmail.com> - 2014-06-04 10:39 +1000
    Re: Unicode and Python - how often do you index strings? Roy Smith <roy@panix.com> - 2014-06-03 21:18 -0400
      Re: Unicode and Python - how often do you index strings? Chris Angelico <rosuav@gmail.com> - 2014-06-04 12:13 +1000
        Re: Unicode and Python - how often do you index strings? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-04 18:48 +1200
          Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-04 10:57 +0000
      Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-04 10:50 +0000
        Re: Unicode and Python - how often do you index strings? Rustom Mody <rustompmody@gmail.com> - 2014-06-04 05:52 -0700
          Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-04 13:36 +0000
    Re: Unicode and Python - how often do you index strings? wxjmfauth@gmail.com - 2014-06-03 23:50 -0700
      Re: Unicode and Python - how often do you index strings? Michael Torrie <torriem@gmail.com> - 2014-06-04 08:50 -0600
        Re: Unicode and Python - how often do you index strings? wxjmfauth@gmail.com - 2014-06-05 00:06 -0700
          Re: Unicode and Python - how often do you index strings? Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 10:20 +0300
          Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-05 15:39 +0000
            Re: Unicode and Python - how often do you index strings? Mark H Harris <harrismh777@gmail.com> - 2014-06-05 10:57 -0500
              Re: Unicode and Python - how often do you index strings? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-06-05 18:15 +0100
                Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-05 17:33 +0000
      Re: Unicode and Python - how often do you index strings? Joshua Landau <joshua@landau.ws> - 2014-06-05 18:18 +0100
    Re: Unicode and Python Rustom Mody <rustompmody@gmail.com> - 2014-06-04 21:25 -0700
      Re: Unicode and Python wxjmfauth@gmail.com - 2014-06-05 00:23 -0700
    Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-05 18:09 +0200
      Re: Unicode and Python - how often do you index strings? Paul Rubin <no.email@nospam.invalid> - 2014-06-05 11:16 -0700
        Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-05 20:42 +0200
          Re: Unicode and Python - how often do you index strings? Ryan Hiebert <ryan@ryanhiebert.com> - 2014-06-05 13:52 -0500
            Re: Unicode and Python - how often do you index strings? Paul Rubin <no.email@nospam.invalid> - 2014-06-05 12:58 -0700
              Re: Unicode and Python - how often do you index strings? Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 14:18 -0600
                Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-06 10:47 +0200
                  Re: Unicode and Python - how often do you index strings? Tim Chase <python.list@tim.thechases.com> - 2014-06-06 05:37 -0500
                  Re: Unicode and Python - how often do you index strings? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 11:52 +0000
              Re: Unicode and Python - how often do you index strings? Albert-Jan Roskam <fomcl@yahoo.com> - 2014-06-05 13:34 -0700
                Re: Unicode and Python - how often do you index strings? Roy Smith <roy@panix.com> - 2014-06-05 17:00 -0400
                  Re: Unicode and Python - how often do you index strings? Rustom Mody <rustompmody@gmail.com> - 2014-06-05 15:24 -0700
                    Re: Unicode and Python - how often do you index strings? Ned Deily <nad@acm.org> - 2014-06-05 15:57 -0700
                      Re: Unicode and Python - how often do you index strings? Roy Smith <roy@panix.com> - 2014-06-05 20:10 -0400
                        Re: Unicode and Python - how often do you index strings? Ned Deily <nad@acm.org> - 2014-06-05 17:43 -0700
                        Re: Unicode and Python - how often do you index strings? Grant Edwards <invalid@invalid.invalid> - 2014-06-06 14:20 +0000
              Re: Unicode and Python - how often do you index strings? Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 18:05 -0600
            Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-06 10:42 +0200
              Re: Unicode and Python - how often do you index strings? Larry Hudson <orgnut@yahoo.com> - 2014-06-06 20:24 -0700
          Re: Unicode and Python - how often do you index strings? Chris Angelico <rosuav@gmail.com> - 2014-06-06 05:59 +1000
          Re: Unicode and Python - how often do you index strings? Ryan Hiebert <ryan@ryanhiebert.com> - 2014-06-05 15:05 -0500


#72743

From	Paul Rubin <no.email@nospam.invalid>
Date	2014-06-05 11:16 -0700
Message-ID	<7xr433z0g3.fsf@ruckus.brouhaha.com>
In reply to	#72721

Johannes Bauer <dfnsonfsduifb@gmx.de> writes:
> line = line[:-1]
> Which truncates the trailing "\n" of a textfile line.

use line.rstrip() for that.


#72746

From	Johannes Bauer <dfnsonfsduifb@gmx.de>
Date	2014-06-05 20:42 +0200
Message-ID	<lmqdn8$scl$1@news.albasani.net>
In reply to	#72743

On 05.06.2014 20:16, Paul Rubin wrote:
> Johannes Bauer <dfnsonfsduifb@gmx.de> writes:
>> line = line[:-1]
>> Which truncates the trailing "\n" of a textfile line.
> 
> use line.rstrip() for that.

rstrip has different functionality than what I'm doing.

Cheers,
Johannes

-- 
>> Where EXACTLY did you say you had predicted the quake?
> At least not publicly!
Ah, the latest and, to date, most brilliant stunt by our great
cosmologists: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>


#72750

From	Ryan Hiebert <ryan@ryanhiebert.com>
Date	2014-06-05 13:52 -0500
Message-ID	<mailman.10759.1401998071.18130.python-list@python.org>
In reply to	#72746


2014-06-05 13:42 GMT-05:00 Johannes Bauer <dfnsonfsduifb@gmx.de>:

> On 05.06.2014 20:16, Paul Rubin wrote:
> > Johannes Bauer <dfnsonfsduifb@gmx.de> writes:
> >> line = line[:-1]
> >> Which truncates the trailing "\n" of a textfile line.
> >
> > use line.rstrip() for that.
>
> rstrip has different functionality than what I'm doing.


How so? I was using line=line[:-1] for removing the trailing newline, and
just replaced it with rstrip('\n'). What are you doing differently?


#72751

From	Paul Rubin <no.email@nospam.invalid>
Date	2014-06-05 12:58 -0700
Message-ID	<7xioof9li6.fsf@ruckus.brouhaha.com>
In reply to	#72750

Ryan Hiebert <ryan@ryanhiebert.com> writes:
> How so? I was using line=line[:-1] for removing the trailing newline, and
> just replaced it with rstrip('\n'). What are you doing differently?

rstrip removes all the newlines off the end, whether there are zero or
multiple.  In perl the difference is chomp vs chop.  line=line[:-1]
removes one character, that might or might not be a newline.
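A minimal sketch of the three behaviours being compared here, for illustration only. The helper names chop, chomp and strip_newlines are hypothetical (borrowed loosely from the Perl functions mentioned above) and are not part of Python's standard library:

```python
def chop(line):
    """Drop the last character, whatever it is (like line[:-1])."""
    return line[:-1]


def chomp(line):
    """Drop at most one trailing newline, and nothing else."""
    return line[:-1] if line.endswith("\n") else line


def strip_newlines(line):
    """Drop every trailing newline (like line.rstrip('\\n'))."""
    return line.rstrip("\n")


if __name__ == "__main__":
    for sample in ("abc", "abc\n", "abc\n\n"):
        print(repr(sample), repr(chop(sample)),
              repr(chomp(sample)), repr(strip_newlines(sample)))
```

The three only agree when the input ends with exactly one newline; on a final line with no newline, chop() silently eats a real character.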


#72756

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2014-06-05 14:18 -0600
Message-ID	<mailman.10763.1401999570.18130.python-list@python.org>
In reply to	#72751

On Thu, Jun 5, 2014 at 1:58 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Ryan Hiebert <ryan@ryanhiebert.com> writes:
>> How so? I was using line=line[:-1] for removing the trailing newline, and
>> just replaced it with rstrip('\n'). What are you doing differently?
>
> rstrip removes all the newlines off the end, whether there are zero or
> multiple.  In perl the difference is chomp vs chop.  line=line[:-1]
> removes one character, that might or might not be a newline.

Given the description that the input string is "a textfile line", if
it has multiple newlines then it's invalid.

Personally I tend toward rstrip('\r\n') so that I don't have to worry
about files with alternative line terminators.

If you want to be really picky about removing exactly one line
terminator, then this captures all the relatively modern variations:
re.sub('\r?\n$|\n?\r$', '', line, count=1)
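The same "exactly one terminator" idea can also be sketched without a regular expression. This is only an illustration; the TERMINATORS tuple and the function name strip_one_terminator are made up for the example:

```python
# Two-character terminators come first so that "\r\n" is removed as a
# unit rather than as a lone "\n" leaving a stray "\r" behind.
TERMINATORS = ("\r\n", "\n\r", "\n", "\r")


def strip_one_terminator(line):
    """Remove a single trailing line terminator, if there is one."""
    for term in TERMINATORS:
        if line.endswith(term):
            return line[:-len(term)]
    return line


if __name__ == "__main__":
    for sample in ("foo\n", "foo\r\n", "foo\n\n", "foo"):
        print(repr(sample), "->", repr(strip_one_terminator(sample)))
```

Unlike rstrip('\r\n'), this leaves any additional trailing blank lines in place.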


#72824

From	Johannes Bauer <dfnsonfsduifb@gmx.de>
Date	2014-06-06 10:47 +0200
Message-ID	<lmrv7g$fdm$2@news.albasani.net>
In reply to	#72756

On 05.06.2014 22:18, Ian Kelly wrote:

> Personally I tend toward rstrip('\r\n') so that I don't have to worry
> about files with alternative line terminators.

Hm, I was under the impression that Python already took care of removing
the \r at a line ending. Checking that right now:

(DOS encoded file "y")
>>> for line in open("y", "r"): print(line.encode("utf-8"))
...
b'foo\n'
b'bar\n'
b'moo\n'
b'koo\n'

Yup, the \r was removed automatically. Are there cases when it isn't?

Cheers,
Johannes



#72829

From	Tim Chase <python.list@tim.thechases.com>
Date	2014-06-06 05:37 -0500
Message-ID	<mailman.10808.1402051070.18130.python-list@python.org>
In reply to	#72824

On 2014-06-06 10:47, Johannes Bauer wrote:
> > Personally I tend toward rstrip('\r\n') so that I don't have to
> > worry about files with alternative line terminators.
> 
> Hm, I was under the impression that Python already took care of
> removing the \r at a line ending. Checking that right now:
> 
> (DOS encoded file "y")
> >>> for line in open("y", "r"): print(line.encode("utf-8"))
> ...
> b'foo\n'
> b'bar\n'
> b'moo\n'
> b'koo\n'
> 
> Yup, the \r was removed automatically. Are there cases when it
> isn't?

It's possible if the file is opened as binary:

>>> f = file('delme.txt', 'wb')
>>> f.write('hello\r\nworld\r\n')
>>> f.close()
>>> f = file('delme.txt', 'rb')
>>> for row in f: print repr(row)
... 
'hello\r\n'
'world\r\n'
>>> f.close()
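
For readers on Python 3, where the file() builtin and the print statement no longer exist, roughly the same experiment might look like the sketch below. The function name read_binary_lines and the use of a temporary directory are assumptions made for the example:

```python
import os
import tempfile


def read_binary_lines(data=b"hello\r\nworld\r\n"):
    """Write data to a scratch file, then iterate over it in binary mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "delme.txt")
        with open(path, "wb") as f:
            f.write(data)
        # In binary mode no newline translation happens, so the
        # "\r" bytes survive in every row.
        with open(path, "rb") as f:
            return list(f)


if __name__ == "__main__":
    for row in read_binary_lines():
        print(repr(row))
```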


-tkc


#72837

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-06-06 11:52 +0000
Message-ID	<5391ab7f$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#72824

On Fri, 06 Jun 2014 10:47:44 +0200, Johannes Bauer wrote:

> Hm, I was under the impression that Python already took care of removing
> the \r at a line ending. Checking that right now:
[snip example]

This is called "Universal Newlines". Technically it is a build-time 
option which applies when you build the Python interpreter from source, 
so, yes, some Pythons may not implement it at all. But I think that it 
has been on by default for a long time, and the option to turn it off may 
have been removed in Python 3.3 or 3.4. In practical terms, you should 
normally expect it to be on.

Here's the PEP that introduced it: 
http://legacy.python.org/dev/peps/pep-0278/

The idea is that when universal newlines support is enabled, Python by
default converts any of \n, \r or \r\n into \n when reading from a file
in text mode, and converts back the other way when writing the file.

In binary mode, newlines are *never* changed.

In Python 3, you can return end-of-lines unchanged by passing newline='' 
to the open() function.

https://docs.python.org/2/library/functions.html#open
https://docs.python.org/3/library/functions.html#open
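
A small Python 3 sketch of the difference between the default text mode and newline='', assuming a scratch file with mixed line endings. The helper name text_lines and the sample data are invented for the illustration:

```python
import os
import tempfile


def text_lines(data, newline=None):
    """Write raw bytes to a scratch file and read them back in text mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "y")
        with open(path, "wb") as f:
            f.write(data)
        # newline=None (the default) translates \r and \r\n to \n;
        # newline='' still splits on all three but leaves them as-is.
        with open(path, "r", encoding="ascii", newline=newline) as f:
            return list(f)


if __name__ == "__main__":
    sample = b"foo\r\nbar\rbaz\n"
    print(text_lines(sample))              # endings normalised to \n
    print(text_lines(sample, newline=""))  # endings preserved
```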

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


#72762

From	Albert-Jan Roskam <fomcl@yahoo.com>
Date	2014-06-05 13:34 -0700
Message-ID	<mailman.10767.1402000635.18130.python-list@python.org>
In reply to	#72751






----- Original Message -----
> From: Ian Kelly <ian.g.kelly@gmail.com>
> To: Python <python-list@python.org>
> Cc: 
> Sent: Thursday, June 5, 2014 10:18 PM
> Subject: Re: Unicode and Python - how often do you index strings?
> 
> On Thu, Jun 5, 2014 at 1:58 PM, Paul Rubin <no.email@nospam.invalid> 
> wrote:
>>  Ryan Hiebert <ryan@ryanhiebert.com> writes:
>>>  How so? I was using line=line[:-1] for removing the trailing newline, 
> and
>>>  just replaced it with rstrip('\n'). What are you doing 
> differently?
>> 
>>  rstrip removes all the newlines off the end, whether there are zero or
>>  multiple.  In perl the difference is chomp vs chop.  line=line[:-1]
>>  removes one character, that might or might not be a newline.
> 
> Given the description that the input string is "a textfile line", if
> it has multiple newlines then it's invalid.
> 
> Personally I tend toward rstrip('\r\n') so that I don't have 
> to worry
> about files with alternative line terminators.

I tend to use: s.rstrip(os.linesep)

> If you want to be really picky about removing exactly one line
> terminator, then this captures all the relatively modern variations:
> re.sub('\r?\n$|\n?\r$', '', line, count=1)

or perhaps: re.sub("[^ \S]+$", "", line)


#72764

From	Roy Smith <roy@panix.com>
Date	2014-06-05 17:00 -0400
Message-ID	<roy-821955.17002605062014@news.panix.com>
In reply to	#72762

In article <mailman.10767.1402000635.18130.python-list@python.org>,
 Albert-Jan Roskam <fomcl@yahoo.com> wrote:

> 





----- Original Message -----
> From: Ian Kelly <ian.g.kelly@gmail.com>

> > To: Python <python-list@python.org>
> Cc: 
> Sent: Thursday, June 5, 2014 
> 10:18 PM
> Subject: Re: Unicode and Python - how often do you index strings?

> > 
> On Thu, Jun 5, 2014 at 1:58 PM, Paul Rubin <no.email@nospam.invalid> 
> 
> wrote:
>>  Ryan Hiebert <ryan@ryanhiebert.com> writes:
>>>  How so? I was 
> using line=line[:-1] for removing the trailing newline, 
> and
>>>  just 
> replaced it with rstrip('\n'). What are you doing 
> differently?
>> 
>>  
> rstrip removes all the newlines off the end, whether there are zero or
>>  
> multiple.? In perl the difference is chomp vs chop.? line=line[:-1]
>>  
> removes one character, that might or might not be a newline.
> 
> Given the 
> description that the input string is "a textfile line", if
> it has multiple 
> newlines then it's invalid.
> 
> Personally I tend toward rstrip('\r\n') so 
> that I don't have 
> to worry
> about files with alternative line 
> terminators.

I tend to use: s.rstrip(os.linesep)

> If you want to be really 
> picky about removing exactly one line
> terminator, then this captures all 
> the relatively modern variations:
> re.sub('\r?\n$|\n?\r$', line, '', 
> count=1)

or perhaps: re.sub("[^ \S]+$", "", line)

Just for fun, I took a screen-shot of what this looks like in my 
newsreader.  URL below.  Looks like something chomped on unicode pretty 
hard :-)

http://www.panix.com/~roy/unicode.pdf


#72778

From	Rustom Mody <rustompmody@gmail.com>
Date	2014-06-05 15:24 -0700
Message-ID	<8681edf0-7a1f-4110-9f87-a8cd0988cece@googlegroups.com>
In reply to	#72764

On Friday, June 6, 2014 2:30:26 AM UTC+5:30, Roy Smith wrote:
> Just for fun, I took a screen-shot of what this looks like in my 
> newsreader.  URL below.  Looks like something chomped on unicode pretty 
> hard :-)
>  
> http://www.panix.com/~roy/unicode.pdf

Yiiiiiiiiiiiiiiiiii!!!!!!!!!!!!


#72781

From	Ned Deily <nad@acm.org>
Date	2014-06-05 15:57 -0700
Message-ID	<mailman.10781.1402009056.18130.python-list@python.org>
In reply to	#72778

In article <8681edf0-7a1f-4110-9f87-a8cd0988cece@googlegroups.com>,
 Rustom Mody <rustompmody@gmail.com> wrote:

> On Friday, June 6, 2014 2:30:26 AM UTC+5:30, Roy Smith wrote:
> > Just for fun, I took a screen-shot of what this looks like in my 
> > newsreader.  URL below.  Looks like something chomped on unicode pretty 
> > hard :-)
> >  
> > http://www.panix.com/~roy/unicode.pdf
> 
> Yiiiiiiiiiiiiiiiiii!!!!!!!!!!!!

Roy is using MT-NewsWatcher as a client.  Because its codebase's origins 
are back in classic MacOS (<= 9), it has its own *interesting* ways to 
deal with encodings.  BTW, don't upgrade to OS X 10.9 Mavericks if 
you're dependent on MT-NW; it finally stops working there because what 
was left of Open Transport support in OS X has finally been ripped out 
of 10.9.

-- 
 Ned Deily,
 nad@acm.org


#72797

From	Roy Smith <roy@panix.com>
Date	2014-06-05 20:10 -0400
Message-ID	<roy-2A9D82.20100705062014@news.panix.com>
In reply to	#72781

In article <mailman.10781.1402009056.18130.python-list@python.org>,
 Ned Deily <nad@acm.org> wrote:

> In article <8681edf0-7a1f-4110-9f87-a8cd0988cece@googlegroups.com>,
>  Rustom Mody <rustompmody@gmail.com> wrote:
> 
> > On Friday, June 6, 2014 2:30:26 AM UTC+5:30, Roy Smith wrote:
> > > Just for fun, I took a screen-shot of what this looks like in my 
> > > newsreader.  URL below.  Looks like something chomped on unicode pretty 
> > > hard :-)
> > >  
> > > http://www.panix.com/~roy/unicode.pdf
> > 
> > Yiiiiiiiiiiiiiiiiii!!!!!!!!!!!!
> 
> Roy is using MT-NewsWatcher as a client.

Yes.  Except for the fact that it hasn't kept up with unicode, I find 
the U/I pretty much perfect.  I imagine at some point I'll be forced to 
look elsewhere, but then again, netnews is pretty much dead.

> BTW, don't upgrade to OS X 10.9 Mavericks if you're dependent on 
> MT-NW; it finally stops working there because what was left of Open 
> Transport support in OS X has finally been ripped out of 10.9.

Hmmm, good to know.  I'm still on 10.7, and don't see any reason to 
move.  But, then again, you'd expect that from somebody who's still on 
Python 2.x, wouldn't you?


#72802

From	Ned Deily <nad@acm.org>
Date	2014-06-05 17:43 -0700
Message-ID	<mailman.10794.1402015416.18130.python-list@python.org>
In reply to	#72797

In article <roy-2A9D82.20100705062014@news.panix.com>,
 Roy Smith <roy@panix.com> wrote:
> In article <mailman.10781.1402009056.18130.python-list@python.org>,
>  Ned Deily <nad@acm.org> wrote:
> > Roy is using MT-NewsWatcher as a client.
> Yes.  Except for the fact that it hasn't kept up with unicode, I find 
> the U/I pretty much perfect.  I imagine at some point I'll be forced to 
> look elsewhere, but then again, netnews is pretty much dead.

I agree about the U/I, although I'm sure a lot of that has to do with 
familiarity. However, netnews isn't dead, it has just morphed a bit.  A 
newsreader, like MT-NW, is great for following mailing lists like this 
(and most other Python-related lists) via gmane.org's bi-directional 
mailing list - NNTP gateways.  And for this list it's usually better to 
read the mailing list variant via gmane.org NNTP than the Usenet group 
variant via a traditional USENET NNTP server because there's less spam 
with the former.

> > BTW, don't upgrade to OS X 10.9 Mavericks if you're dependent on 
> > MT-NW; it finally stops working there because what was left of Open 
> > Transport support in OS X has finally been ripped out of 10.9.
> Hmmm, good to know.  I'm still on 10.7, and don't see any reason to 
> move.  But, then again, you'd expect that from somebody who's still on 
> Python 2.x, wouldn't you?

Heh. Well, both 10.8 and 10.9 provided various improvements, both feature 
and performance, over 10.7.  Alas, Apple won't likely be supporting 10.7 
with security updates for as long as the PSF will be supporting 2.7.x.  
But, by then, you'll have had a chance to re-implement MT-NW in Python.

-- 
 Ned Deily,
 nad@acm.org


#72847

From	Grant Edwards <invalid@invalid.invalid>
Date	2014-06-06 14:20 +0000
Message-ID	<lmsimr$4a7$1@reader1.panix.com>
In reply to	#72797

On 2014-06-06, Roy Smith <roy@panix.com> wrote:

>> Roy is using MT-NewsWatcher as a client.
>
> Yes.  Except for the fact that it hasn't kept up with unicode, I find 
> the U/I pretty much perfect.  I imagine at some point I'll be forced to 
> look elsewhere, but then again, netnews is pretty much dead.

There are still a few active groups, but reading e-mail lists via NNTP
(in my case using slrn) via gmane is a huge reason to have an
efficient, well-designed "news" client.

If usenet does really pack it in someday and I have to switch from
comp.lang.python to the mailing list, it will be done by pointing slrn
at news.gmane.org -- not by having all those e-mails sent to me so I
can try to sort through them...

-- 
Grant Edwards               grant.b.edwards        Yow! My NOSE is NUMB!
                                  at               
                              gmail.com


#72796

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2014-06-05 18:05 -0600
Message-ID	<mailman.10791.1402013180.18130.python-list@python.org>
In reply to	#72751

On Thu, Jun 5, 2014 at 2:34 PM, Albert-Jan Roskam <fomcl@yahoo.com> wrote:
>> If you want to be really picky about removing exactly one line
>> terminator, then this captures all the relatively modern variations:
>> re.sub('\r?\n$|\n?\r$', '', line, count=1)
>
> or perhaps: re.sub("[^ \S]+$", "", line)

That will remove more than one terminator, plus tabs. Points for
including \f and \v though.

I suppose if we want to be absolutely correct, we should follow the
Unicode standard:
re.sub(r'\r?\n$|[\r\v\f\x85\u2028\u2029]$', '', line, count=1)
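
A hedged sketch of how that pattern might be packaged for reuse. The names UNICODE_EOL and strip_one_unicode_terminator are hypothetical, and the sketch anchors with \Z rather than $ so that the match can only sit at the very end of the string:

```python
import re

# One trailing line terminator: CRLF as a unit, or any single character
# from the set used in the pattern above (plus a bare \n).
UNICODE_EOL = re.compile(r"(?:\r\n|[\n\r\v\f\x85\u2028\u2029])\Z")


def strip_one_unicode_terminator(line):
    """Remove exactly one trailing line terminator, if present."""
    return UNICODE_EOL.sub("", line, count=1)


if __name__ == "__main__":
    for sample in ("foo\n", "foo\r\n", "foo\u2028", "foo\n\n", "foo\t"):
        print(repr(sample), "->",
              repr(strip_one_unicode_terminator(sample)))
```

Tabs and spaces are left alone, which addresses the objection to the [^ \S]+$ variant.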


#72823

From	Johannes Bauer <dfnsonfsduifb@gmx.de>
Date	2014-06-06 10:42 +0200
Message-ID	<lmruud$fdm$1@news.albasani.net>
In reply to	#72750

On 05.06.2014 20:52, Ryan Hiebert wrote:
> 2014-06-05 13:42 GMT-05:00 Johannes Bauer <dfnsonfsduifb@gmx.de>:
> 
>> On 05.06.2014 20:16, Paul Rubin wrote:
>>> Johannes Bauer <dfnsonfsduifb@gmx.de> writes:
>>>> line = line[:-1]
>>>> Which truncates the trailing "\n" of a textfile line.
>>>
>>> use line.rstrip() for that.
>>
>> rstrip has different functionality than what I'm doing.
> 
> How so? I was using line=line[:-1] for removing the trailing newline, and
> just replaced it with rstrip('\n'). What are you doing differently?

Ah, I didn't know rstrip() accepted parameters and since you wrote
line.rstrip() this would also cut away whitespaces (which sadly are
relevant in odd cases).

Thanks for the clarification, I'll definitely introduce that.

Cheers,
Johannes



#72899

From	Larry Hudson <orgnut@yahoo.com>
Date	2014-06-06 20:24 -0700
Message-ID	<8LGdnYTuM9VjGA_OnZ2dnUVZ_q6dnZ2d@giganews.com>
In reply to	#72823

On 06/06/2014 01:42 AM, Johannes Bauer wrote:
<snip>
> Ah, I didn't know rstrip() accepted parameters and since you wrote
> line.rstrip() this would also cut away whitespaces (which sadly are
> relevant in odd cases).
>

No problem.  If a parameter is used in the strip() family, then _only_ those characters are 
stripped.  Example:

 >>> s = 'some text \n'
 >>> print('"{}"'.format(s.rstrip()))      #  No parameter, strip all whitespace
"some text"
 >>> print('"{}"'.format(s.rstrip('\n')))  #  Parameter is newline, only strip newlines
"some text "

      -=- Larry

BTW, the strip() parameter (which must be a string) is not limited to whitespace, it can be used 
with any set of characters.
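
To illustrate the "set of characters" point: the argument is not treated as a prefix or suffix string, and each character in it is stripped independently. A short sketch, with the function name strip_examples and the sample strings invented for the demonstration:

```python
def strip_examples():
    """Return a few results showing how the strip() argument behaves."""
    return {
        # Both \r and \n are stripped, in any order and any number.
        "crlf": "data\r\n".rstrip("\r\n"),
        "mixed": "data\n\r\n\r".rstrip("\r\n"),
        # Only newlines are named, so the trailing space survives.
        "keeps_space": "data \n".rstrip("\n"),
        # Non-whitespace characters work the same way.
        "letters": "xxhelloxyx".strip("xy"),
    }


if __name__ == "__main__":
    for name, value in strip_examples().items():
        print(name, repr(value))
```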


#72752

From	Chris Angelico <rosuav@gmail.com>
Date	2014-06-06 05:59 +1000
Message-ID	<mailman.10760.1401998360.18130.python-list@python.org>
In reply to	#72746

On Fri, Jun 6, 2014 at 4:52 AM, Ryan Hiebert <ryan@ryanhiebert.com> wrote:
> 2014-06-05 13:42 GMT-05:00 Johannes Bauer <dfnsonfsduifb@gmx.de>:
>
>> On 05.06.2014 20:16, Paul Rubin wrote:
>> > Johannes Bauer <dfnsonfsduifb@gmx.de> writes:
>> >> line = line[:-1]
>> >> Which truncates the trailing "\n" of a textfile line.
>> >
>> > use line.rstrip() for that.
>>
>> rstrip has different functionality than what I'm doing.
>
>
> How so? I was using line=line[:-1] for removing the trailing newline, and
> just replaced it with rstrip('\n'). What are you doing differently?

>>> line = "Hello,\nworld!\n\n"
>>> line[:-1]
'Hello,\nworld!\n'
>>> line.rstrip('\n')
'Hello,\nworld!'

If it's guaranteed to end with exactly one newline, then and only then
will they be identical.

ChrisA


#72754

From	Ryan Hiebert <ryan@ryanhiebert.com>
Date	2014-06-05 15:05 -0500
Message-ID	<mailman.10761.1401999232.18130.python-list@python.org>
In reply to	#72746


On Thu, Jun 5, 2014 at 2:59 PM, Chris Angelico <rosuav@gmail.com> wrote:

> On Fri, Jun 6, 2014 at 4:52 AM, Ryan Hiebert <ryan@ryanhiebert.com> wrote:
> > 2014-06-05 13:42 GMT-05:00 Johannes Bauer <dfnsonfsduifb@gmx.de>:
> >
> >> On 05.06.2014 20:16, Paul Rubin wrote:
> >> > Johannes Bauer <dfnsonfsduifb@gmx.de> writes:
> >> >> line = line[:-1]
> >> >> Which truncates the trailing "\n" of a textfile line.
> >> >
> >> > use line.rstrip() for that.
> >>
> >> rstrip has different functionality than what I'm doing.
> >
> >
> > How so? I was using line=line[:-1] for removing the trailing newline, and
> > just replaced it with rstrip('\n'). What are you doing differently?
>
> >>> line = "Hello,\nworld!\n\n"
> >>> line[:-1]
> 'Hello,\nworld!\n'
> >>> line.rstrip('\n')
> 'Hello,\nworld!'
>
> If it's guaranteed to end with exactly one newline, then and only then
> will they be identical.
>

OK, that's not an issue for my case, and additionally I'm using the
open(_, 'U') file iterable, so I shouldn't see multiple trailing newlines
anyway.
