Re: howto remove the thousand separator

Started by	Mark Janssen <dreamingforward@gmail.com>
First post	2013-04-14 12:06 -0700
Last post	2013-04-15 23:16 +0100
Articles	13 — 7 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: howto remove the thousand separator Mark Janssen <dreamingforward@gmail.com> - 2013-04-14 12:06 -0700
    Re: howto remove the thousand separator Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-15 00:29 +0000
      Re: howto remove the thousand separator Mark Janssen <dreamingforward@gmail.com> - 2013-04-14 17:44 -0700
        Re: howto remove the thousand separator Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-15 01:14 +0000
          Re: howto remove the thousand separator Chris Angelico <rosuav@gmail.com> - 2013-04-15 11:29 +1000
            Re: howto remove the thousand separator Walter Hurry <walterhurry@lavabit.com> - 2013-04-15 02:25 +0000
              Re: howto remove the thousand separator Roy Smith <roy@panix.com> - 2013-04-14 22:35 -0400
                Re: howto remove the thousand separator Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-15 07:04 +0000
          Re: howto remove the thousand separator Rotwang <sg552@hotmail.co.uk> - 2013-04-15 03:19 +0100
            Re: howto remove the thousand separator Ned Deily <nad@acm.org> - 2013-04-14 22:15 -0700
            Re: howto remove the thousand separator Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-15 07:03 +0000
              Re: howto remove the thousand separator Chris Angelico <rosuav@gmail.com> - 2013-04-15 17:39 +1000
              Re: howto remove the thousand separator Rotwang <sg552@hotmail.co.uk> - 2013-04-15 23:16 +0100

#43575 — Re: howto remove the thousand separator

From	Mark Janssen <dreamingforward@gmail.com>
Date	2013-04-14 12:06 -0700
Subject	Re: howto remove the thousand separator
Message-ID	<mailman.596.1365966380.3114.python-list@python.org>

On Sun, Apr 14, 2013 at 11:57 AM, pyth0n3r <pyth0n3r@gmail.com> wrote:
> I came across a problem that when i deal with int data with ',' as thousand
> separator, such as 12,916, i can not change it into int() or float().
> How can i remove the comma in int data?
> Any reply will be appreciated!!

cleaned=''
for c in myStringNumber:
   if c != ',':
     cleaned+=c
int(cleaned)

mark

#43587

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-04-15 00:29 +0000
Message-ID	<516b49dc$0$29977$c3e8da3$5496439d@news.astraweb.com>
In reply to	#43575

On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:

> cleaned=''
> for c in myStringNumber:
>    if c != ',':
>      cleaned+=c
> int(cleaned)

Please don't write code like that. First, it's long and bloated, and 
runs at the speed of Python, not C. Second, it runs at the speed of 
SLLLLOOOOOOOOOOWWWW Python, not fast Python, due to being an O(N**2) 
algorithm.

If you don't know what O(N**2) means, you should read this for an 
introduction:

http://www.joelonsoftware.com/articles/fog0000000319.html
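
For comparison, here is a minimal sketch of a shorter approach, assuming the input is a string such as '12,916' in which ',' appears only as a thousands separator. The name parse_thousands is made up for illustration; str.replace and int are standard built-ins, and the replace happens in a single pass rather than one concatenation per character.

```python
def parse_thousands(text):
    """Convert a string like '12,916' to an int by dropping the commas.

    Hypothetical helper for illustration; assumes ',' is only ever a
    thousands separator (not a decimal comma, as in some locales).
    """
    return int(text.replace(',', ''))


print(parse_thousands('12,916'))  # 12916
```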

-- 
Steven

#43589

From	Mark Janssen <dreamingforward@gmail.com>
Date	2013-04-14 17:44 -0700
Message-ID	<mailman.611.1365986675.3114.python-list@python.org>
In reply to	#43587

On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>
>> cleaned=''
>> for c in myStringNumber:
>>    if c != ',':
>>      cleaned+=c
>> int(cleaned)
>
> ....due to being an O(N**2)  algorithm.

What on earth makes you think that is an O(n**2) algorithm and not O(n)?

Mark

#43590

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-04-15 01:14 +0000
Message-ID	<516b5471$0$29977$c3e8da3$5496439d@news.astraweb.com>
In reply to	#43589

On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:

> On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>>
>>> cleaned=''
>>> for c in myStringNumber:
>>>    if c != ',':
>>>      cleaned+=c
>>> int(cleaned)
>>
>> ....due to being an O(N**2)  algorithm.
> 
> What on earth makes you think that is an O(n**2) algorithm and not O(n)?

Strings are immutable. Consider building up a single string from four 
substrings:

s = ''
s += 'fe'
s += 'fi'
s += 'fo'
s += 'fum'

Python *might* optimize the first concatenation, '' + 'fe', to just reuse 
'fe', (but it might not). Let's assume it does, so that no copying is 
needed. Then it gets to the second concatenation, and now it has to copy 
characters, because strings are immutable and cannot be modified in 
place. Showing the *running* total of characters copied:

'fe' + 'fi' => 'fefi'  # four characters copied
'fefi' + 'fo' => 'fefifo'  # 4 + 6 = ten characters copied
'fefifo' + 'fum' => 'fefifofum'  # 10 + 9 = nineteen characters copied

Notice how each intermediate substring gets copied repeatedly? In order 
to build up a string of length 9, we've had to copy at least 19 
characters. With only four substrings, it's not terribly obvious how 
badly this performs. So let's add some more substrings, and see how the 
running total increases:

'fefifofum' + 'foo' => 'fefifofumfoo'  # 19 + 12 = 31
'fefifofumfoo' + 'bar' => 'fefifofumfoobar'  # 31 + 15 = 46
'fefifofumfoobar' + 'baz' => 'fefifofumfoobarbaz'  # 46 + 18 = 64
'fefifofumfoobarbaz' + 'spam' => 'fefifofumfoobarbazspam'  # 64 + 22 = 86

To build up a string of length 22, we've had to copy, and re-copy, and 
re-re-copy, 86 characters in total. And as the string gets bigger, the 
inefficiency gets worse. Each substring (except the very last one) gets 
copied multiple times; the number of times it gets copied is proportional 
to the number of substrings.

If the substrings are individual characters, then each character is 
copied a number of times proportional to the number of characters N; 
since there are N characters, each being copied (proportional to) N 
times, that makes N*N or N**2.
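
The running totals above can be reproduced with a small sketch. It models the worst case only, assuming the first concatenation is free and every later one copies the whole result; the function name chars_copied is hypothetical, and it counts characters rather than timing real string operations.

```python
def chars_copied(pieces):
    """Return the total characters copied when `pieces` are joined one
    at a time with +=, assuming every concatenation after the first
    copies the whole result (i.e. no in-place optimization)."""
    total = 0
    length = 0
    for i, piece in enumerate(pieces):
        length += len(piece)
        if i > 0:  # the first concatenation, '' + piece, is assumed free
            total += length
    return total


words = ['fe', 'fi', 'fo', 'fum', 'foo', 'bar', 'baz', 'spam']
print(chars_copied(words[:4]))  # 19
print(chars_copied(words))      # 86

# With N single-character pieces the total is 2 + 3 + ... + N,
# which equals N*(N+1)/2 - 1 and so grows like N**2.
n = 1000
print(chars_copied('x' * n) == n * (n + 1) // 2 - 1)  # True
```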

-- 
Steven

#43592

From	Chris Angelico <rosuav@gmail.com>
Date	2013-04-15 11:29 +1000
Message-ID	<mailman.612.1365989366.3114.python-list@python.org>
In reply to	#43590

On Mon, Apr 15, 2013 at 11:14 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:
>> What on earth makes you think that is an O(n**2) algorithm and not O(n)?
>
> Python *might* optimize the first concatenation, '' + 'fe', to just reuse
> 'fe', (but it might not). Let's assume it does, so that no copying is
> needed. Then it gets to the second concatenation, and now it has to copy
> characters, because strings are immutable and cannot be modified in
> place.

There are actually a lot of optimizations done, so it might turn out
to be O(n) in practice. But strictly in the Python code, yes, this is
definitely O(n*n).

ChrisA

#43595

From	Walter Hurry <walterhurry@lavabit.com>
Date	2013-04-15 02:25 +0000
Message-ID	<kkfodv$f5m$1@news.albasani.net>
In reply to	#43592

On Mon, 15 Apr 2013 11:29:17 +1000, Chris Angelico wrote:

> There are actually a lot of optimizations done, so it might turn out to
> be O(n) in practice. But strictly in the Python code, yes, this is
> definitely O(n*n).

In any event, Janssen should cease and desist offering advice here if he 
can't do better than that.

#43597

From	Roy Smith <roy@panix.com>
Date	2013-04-14 22:35 -0400
Message-ID	<roy-BE1090.22354214042013@news.panix.com>
In reply to	#43595

In article <kkfodv$f5m$1@news.albasani.net>,
 Walter Hurry <walterhurry@lavabit.com> wrote:

> On Mon, 15 Apr 2013 11:29:17 +1000, Chris Angelico wrote:
> 
> > There are actually a lot of optimizations done, so it might turn out to
> > be O(n) in practice. But strictly in the Python code, yes, this is
> > definitely O(n*n).
> 
> In any event, Janssen should cease and desist offering advice here if he 
> can't do better than that.

That's a little harsh.  Sure, it was a "sub-optimal" way to write the 
code (for all the reasons people mentioned), but it engendered a good 
discussion.

#43607

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-04-15 07:04 +0000
Message-ID	<516ba676$0$29872$c3e8da3$5496439d@news.astraweb.com>
In reply to	#43597

On Sun, 14 Apr 2013 22:35:42 -0400, Roy Smith wrote:

> In article <kkfodv$f5m$1@news.albasani.net>,
>  Walter Hurry <walterhurry@lavabit.com> wrote:
> 
>> On Mon, 15 Apr 2013 11:29:17 +1000, Chris Angelico wrote:
>> 
>> > There are actually a lot of optimizations done, so it might turn out
>> > to be O(n) in practice. But strictly in the Python code, yes, this is
>> > definitely O(n*n).
>> 
>> In any event, Janssen should cease and desist offering advice here if
>> he can't do better than that.
> 
> That's a little harsh.  Sure, it was a "sub-optimal" way to write the
> code (for all the reasons people mentioned), but it engendered a good
> discussion.


Agreed. I'd rather people come out with poor code, and LEARN from the 
answers, than feel that they dare not reply until they're an expert.



-- 
Steven

#43594

From	Rotwang <sg552@hotmail.co.uk>
Date	2013-04-15 03:19 +0100
Message-ID	<kkfnun$kpj$1@dont-email.me>
In reply to	#43590

On 15/04/2013 02:14, Steven D'Aprano wrote:
> On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:
>
>> On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>>> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>>>
>>>> cleaned=''
>>>> for c in myStringNumber:
>>>>     if c != ',':
>>>>       cleaned+=c
>>>> int(cleaned)
>>>
>>> ....due to being an O(N**2)  algorithm.
>>
>> What on earth makes you think that is an O(n**2) algorithm and not O(n)?
>
> Strings are immutable. Consider building up a single string from four
> substrings:
>
> s = ''
> s += 'fe'
> s += 'fi'
> s += 'fo'
> s += 'fum'
>
> Python *might* optimize the first concatenation, '' + 'fe', to just reuse
> 'fe', (but it might not). Let's assume it does, so that no copying is
> needed. Then it gets to the second concatenation, and now it has to copy
> characters, because strings are immutable and cannot be modified in
> place.

Actually, I believe that CPython is optimised to modify strings in place 
where possible, so that the above would surprisingly turn out to be 
O(n). See the following thread where I asked about this:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/990a695fe2d85c52

(Sorry for linking to Google Groups. Does anyone know of a better c.l.p. 
web archive?)

#43603

From	Ned Deily <nad@acm.org>
Date	2013-04-14 22:15 -0700
Message-ID	<mailman.620.1366002932.3114.python-list@python.org>
In reply to	#43594

In article <kkfnun$kpj$1@dont-email.me>, Rotwang <sg552@hotmail.co.uk> 
wrote:
> (Sorry for linking to Google Groups. Does anyone know of a better c.l.p. 
> web archive?)

http://dir.gmane.org/gmane.comp.python.general

-- 
 Ned Deily,
 nad@acm.org

#43606

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-04-15 07:03 +0000
Message-ID	<516ba633$0$29872$c3e8da3$5496439d@news.astraweb.com>
In reply to	#43594

On Mon, 15 Apr 2013 03:19:43 +0100, Rotwang wrote:

> On 15/04/2013 02:14, Steven D'Aprano wrote:
>> On Sun, 14 Apr 2013 17:44:28 -0700, Mark Janssen wrote:
>>
>>> On Sun, Apr 14, 2013 at 5:29 PM, Steven D'Aprano
>>> <steve+comp.lang.python@pearwood.info> wrote:
>>>> On Sun, 14 Apr 2013 12:06:12 -0700, Mark Janssen wrote:
>>>>
>>>>> cleaned=''
>>>>> for c in myStringNumber:
>>>>>     if c != ',':
>>>>>       cleaned+=c
>>>>> int(cleaned)
>>>>
>>>> ....due to being an O(N**2)  algorithm.
>>>
>>> What on earth makes you think that is an O(n**2) algorithm and not
>>> O(n)?
>>
>> Strings are immutable. Consider building up a single string from four
>> substrings:
>>
>> s = ''
>> s += 'fe'
>> s += 'fi'
>> s += 'fo'
>> s += 'fum'
>>
>> Python *might* optimize the first concatenation, '' + 'fe', to just
>> reuse 'fe', (but it might not). Let's assume it does, so that no
>> copying is needed. Then it gets to the second concatenation, and now it
>> has to copy characters, because strings are immutable and cannot be
>> modified in place.
> 
> Actually, I believe that CPython is optimised to modify strings in place
> where possible, so that the above would surprisingly turn out to be
> O(n). See the following thread where I asked about this:

I deliberately didn't open that can of worms, mostly because I was in a 
hurry, but also because it's not an optimization you can rely on. It 
depends on the version, implementation, operating system, and the exact 
code running.

1) It only applies to code running under some, but not all, versions of 
CPython. It does not apply to PyPy, Jython, IronPython, and probably not 
other implementations.

2) Even under CPython, it can fail. It *will* fail if you have multiple 
references to the same strings. And it *may* fail depending on the 
vagaries of the memory management system in place, e.g. code that is 
optimized on Linux may fail to optimize under Windows, leading to slow 
code.

As far as I'm concerned, the best advice regarding this optimization is:

- always program as if it doesn't exist;

- but be glad it does when you're writing quick and dirty code in the 
interactive interpreter, where the convenience of string concatenation 
may be just too darn convenient to bother doing the right thing.
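
One common way to "program as if it doesn't exist" is to collect the pieces in a list and join them once at the end. The following is only a sketch under that assumption; strip_commas is a made-up name, and the input is assumed to be a plain string.

```python
def strip_commas(text):
    """Remove commas by collecting the kept characters and joining once.

    Hypothetical helper for illustration. str.join builds the result in
    a single step, so it does not rely on any in-place += optimization.
    """
    kept = [c for c in text if c != ',']
    return ''.join(kept)


print(int(strip_commas('12,916')))  # 12916
```

Because the join happens once, the work stays linear in the length of the input on any implementation, not just on CPython builds where += happens to be optimized.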

> http://groups.google.com/group/comp.lang.python/browse_thread/thread/990a695fe2d85c52
> 
> (Sorry for linking to Google Groups. Does anyone know of a better c.l.p.
> web archive?)

The canonical (although possibly not the best) archive for c.l.p. is the 
python-list mailing list archive:

http://mail.python.org/mailman/listinfo/python-list

-- 
Steven

#43609

From	Chris Angelico <rosuav@gmail.com>
Date	2013-04-15 17:39 +1000
Message-ID	<mailman.623.1366011573.3114.python-list@python.org>
In reply to	#43606

On Mon, Apr 15, 2013 at 5:03 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Mon, 15 Apr 2013 03:19:43 +0100, Rotwang wrote:
>
>> On 15/04/2013 02:14, Steven D'Aprano wrote:
>>> Strings are immutable. Consider building up a single string from four
>>> substrings:
>>
>> Actually, I believe that CPython is optimised to modify strings in place
>> where possible, so that the above would surprisingly turn out to be
>> O(n). See the following thread where I asked about this:
>
> I deliberately didn't open that can of worms, mostly because I was in a
> hurry, but also because it's not an optimization you can rely on. It
> depends on the version, implementation, operating system, and the exact
> code running.
>
> As far as I'm concerned, the best advice regarding this optimization is:
>
> - always program as if it doesn't exist;
>
> - but be glad it does when you're writing quick and dirty code in the
> interactive interpreter, where the convenience of string concatenation
> may be just too darn convenient to bother doing the right thing.

Agreed; that's why, in my reply, I emphasized that the pure Python
code IS quadratic, even though the actual implementation might turn
out linear. (I love that word "might". Covers myriad possibilities on
both sides.)

Same goes for all sorts of other possibilities. I wouldn't test string
equality with 'is' without explicit interning, even if I'm testing a
constant against another constant in the same module - but I might get
a big performance boost if the system's interned all its constants for
me.
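
A small sketch of the interning point, assuming Python 3, where the function lives at sys.intern (in Python 2 it was the built-in intern). The variable names are illustrative only. Whether two equal strings built at run time are the same object is an implementation detail, so the sketch compares with 'is' only after interning explicitly.

```python
import sys

# Build two equal strings at run time so they are not compile-time constants.
parts = ['thou', 'sand']
a = ''.join(parts)
b = ''.join(parts)

print(a == b)  # True: equality compares contents

# Whether `a is b` holds here is an implementation detail, so it is
# deliberately not relied on (or printed) in this sketch.

# After explicit interning, equal strings are the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```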

ChrisA

#43646

From	Rotwang <sg552@hotmail.co.uk>
Date	2013-04-15 23:16 +0100
Message-ID	<kkhu1q$15t$2@dont-email.me>
In reply to	#43606

On 15/04/2013 08:03, Steven D'Aprano wrote:
> On Mon, 15 Apr 2013 03:19:43 +0100, Rotwang wrote:
>> [...]
>>
>> (Sorry for linking to Google Groups. Does anyone know of a better c.l.p.
>> web archive?)
>
> The canonical (although possibly not the best) archive for c.l.p. is the
> python-list mailing list archive:
>
> http://mail.python.org/mailman/listinfo/python-list

Thanks to both you and Ned.
