comp.lang.python, thread #89530

Using + with strings considered bad

Started by	Cecil Westerhof <Cecil@decebal.nl>
First post	2015-04-29 10:29 +0200
Last post	2015-04-29 06:40 -0500
Articles	9 — 7 participants

  Using + with strings considered bad Cecil Westerhof <Cecil@decebal.nl> - 2015-04-29 10:29 +0200
    Re: Using + with strings considered bad Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-04-29 10:08 +0100
    Re: Using + with strings considered bad Peter Otten <__peter__@web.de> - 2015-04-29 11:24 +0200
      Re: Using + with strings considered bad Cecil Westerhof <Cecil@decebal.nl> - 2015-04-29 13:17 +0200
      Re: Using + with strings considered bad Cecil Westerhof <Cecil@decebal.nl> - 2015-04-29 14:23 +0200
        Re: Using + with strings considered bad Chris Angelico <rosuav@gmail.com> - 2015-04-29 22:55 +1000
    Re: Using + with strings considered bad Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-04-29 23:15 +1000
      Re: Using + with strings considered bad wxjmfauth@gmail.com - 2015-04-29 08:16 -0700
    Re: Using + with strings considered bad Andrew Berg <aberg010@my.hennepintech.edu> - 2015-04-29 06:40 -0500

#89530 — Using + with strings considered bad

From	Cecil Westerhof <Cecil@decebal.nl>
Date	2015-04-29 10:29 +0200
Subject	Using + with strings considered bad
Message-ID	<878udbxrpg.fsf@Equus.decebal.nl>

Because I try to keep my lines (well) below 80 characters, I use the
following:
    print('Calculating fibonacci and fibonacci_memoize once for ' +
          str(large_fibonacci) + ' to determine speed increase')

But I was told that using + with strings was bad practice. Is this
true? If so, what is the better way to do this?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#89533

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2015-04-29 10:08 +0100
Message-ID	<mailman.78.1430298608.3680.python-list@python.org>
In reply to	#89530

On 29/04/2015 09:29, Cecil Westerhof wrote:
> Because I try to keep my lines (well) below 80 characters, I use the
> following:
>      print('Calculating fibonacci and fibonacci_memoize once for ' +
>            str(large_fibonacci) + ' to determine speed increase')
>
> But I was told that using + with strings was bad practice. Is this
> true? If so, what is the better way to do this?
>

It's not bad practice as such; it's simply that performance takes a nosedive
if you're concatenating large numbers of strings. If performance is an
issue, the recommended way is to write:

' '.join(strings)
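A minimal sketch of the join idiom, assuming `strings` is any iterable of str objects (the example data is made up). Note that `' '.join` puts a space between the pieces; `''.join` is the exact equivalent of chaining `+`. In CPython, `str.join` gathers the pieces, adds up their lengths, and copies each piece once into the result.

```python
# Hypothetical list of pieces; any iterable of str objects works.
strings = ['Calculating', 'fibonacci', 'once']

spaced = ' '.join(strings)   # separator inserted between items
glued = ''.join(strings)     # no separator: same result as chaining +

print(spaced)  # Calculating fibonacci once
print(glued)   # Calculatingfibonaccionce

# join() only accepts str items, so convert other values explicitly first.
numbers = [1, 2, 3]
print(', '.join(str(n) for n in numbers))  # 1, 2, 3
```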

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

#89536

From	Peter Otten <__peter__@web.de>
Date	2015-04-29 11:24 +0200
Message-ID	<mailman.79.1430299516.3680.python-list@python.org>
In reply to	#89530

Cecil Westerhof wrote:

> Because I try to keep my lines (well) below 80 characters, I use the
> following:
>     print('Calculating fibonacci and fibonacci_memoize once for ' +
>           str(large_fibonacci) + ' to determine speed increase')
> 
> But I was told that using + with strings was bad practice. Is this
> true? 

No. What was meant was probably that str.join() is preferred when you need
to concatenate an arbitrary number of strings, i.e.

# wrong
s = ""
for item in items:
    s += " " + item.name # may be inefficient depending on implementation
s = s[1:]

# correct
s = " ".join(item.name for item in items)
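A runnable version of the two snippets above. The post leaves `items` unspecified, so a hypothetical `Item` namedtuple with a `name` field stands in for it. Both approaches build the same string; the join version needs no slicing and creates no intermediate results.

```python
from collections import namedtuple

# Hypothetical stand-in for whatever objects `items` holds.
Item = namedtuple('Item', 'name')
items = [Item('alpha'), Item('beta'), Item('gamma')]

# Repeated concatenation: each += may copy everything built so far.
s = ""
for item in items:
    s += " " + item.name
s = s[1:]  # drop the leading space added on the first iteration

# join: one pass, and the separator is only placed between items.
t = " ".join(item.name for item in items)

print(t)  # alpha beta gamma
```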

(For operations more complex than just getting an attribute, you may have
to write a helper generator:

def bogus(items):
    prev = ""
    for item in items:
        yield str(len(prev) - len(item))
        prev = item

s = "*".join(bogus(items))
)
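Another sketch of the helper-generator pattern, with a hypothetical `describe` generator and made-up settings: the per-item logic lives in the generator, and its output feeds `join` directly.

```python
def describe(pairs):
    """Yield one formatted fragment per (key, value) pair (hypothetical helper)."""
    for key, value in pairs:
        # Formatting happens here, because join() rejects non-str items.
        yield "{}={}".format(key, value)

settings = [("depth", 3), ("width", 80), ("verbose", True)]
line = "; ".join(describe(settings))
print(line)  # depth=3; width=80; verbose=True
```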

> If so, what is the better way to do this?
 
Python concatenates adjacent string literals implicitly:

>>> "one" "two"
'onetwo'

but even if you keep an explicit +, CPython's peephole optimiser folds the
two constants into one:

>>> def f(): return "one" + "two"
... 
>>> import dis
>>> dis.dis(f)
  1           0 LOAD_CONST               3 ('onetwo')
              3 RETURN_VALUE
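The same effect can be checked without reading bytecode by looking at a function's constants table. This relies on CPython implementation details (where the folding happens has moved between versions), so treat it as an illustration rather than a guarantee. The last lines show the classic pitfall of implicit concatenation.

```python
def implicit():
    return "one" "two"      # adjacent literals: joined by the parser

def explicit():
    return "one" + "two"    # + between two constants: folded at compile time

def runtime(suffix):
    return "one" + suffix   # one operand unknown until the call: not folded

# In CPython the first two functions each store a single 'onetwo' constant.
print("onetwo" in implicit.__code__.co_consts)
print("onetwo" in explicit.__code__.co_consts)
print("onetwo" in runtime.__code__.co_consts)   # False

# Pitfall: a missing comma in a list silently merges two items.
names = ['alpha' 'beta', 'gamma']
print(names)  # ['alphabeta', 'gamma']
```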


>     print('Calculating fibonacci and fibonacci_memoize once for ' +
>           str(large_fibonacci) + ' to determine speed increase')

You could write that as

print('Calculating fibonacci and fibonacci_memoize once for '
      '{} to determine speed increase'.format(large_fibonacci))

but in a simple case like yours I'd go with the obvious

print(
    'Calculating fibonacci and fibonacci_memoize once for',
    large_fibonacci,
    'to determine speed increase')
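To check that the three spellings print the same line, here is a sketch that captures standard output with `contextlib.redirect_stdout` (Python 3 only, since `print` is used inside `lambda`; the value 40 is an arbitrary example). `print` converts each argument with `str()` and separates the arguments with `sep`, which defaults to a single space; that is why the last form drops the spaces next to `for` and `to`.

```python
import contextlib
import io

large_fibonacci = 40  # example value

def captured(func):
    """Run func() and return whatever it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        func()
    return buf.getvalue()

plus = captured(lambda: print('Calculating fibonacci and fibonacci_memoize once for ' +
                              str(large_fibonacci) + ' to determine speed increase'))
fmt = captured(lambda: print('Calculating fibonacci and fibonacci_memoize once for '
                             '{} to determine speed increase'.format(large_fibonacci)))
args = captured(lambda: print(
    'Calculating fibonacci and fibonacci_memoize once for',
    large_fibonacci,
    'to determine speed increase'))

print(plus == fmt == args)  # True
```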

#89540

From	Cecil Westerhof <Cecil@decebal.nl>
Date	2015-04-29 13:17 +0200
Message-ID	<87y4lbchfm.fsf@Equus.decebal.nl>
In reply to	#89536

On Wednesday 29 Apr 2015 11:24 CEST Peter Otten wrote:

> Cecil Westerhof wrote:
>
>> Because I try to keep my lines (well) below 80 characters, I use
>> the following:
>>     print('Calculating fibonacci and fibonacci_memoize once for ' +
>>           str(large_fibonacci) + ' to determine speed increase')
>>
>> But I was told that using + with strings was bad practice. Is this
>> true? 

[...]

>> print('Calculating fibonacci and fibonacci_memoize once for ' +
>>       str(large_fibonacci) + ' to determine speed increase')
>
> You could write that as
>
> print('Calculating fibonacci and fibonacci_memoize once for '
>       '{} to determine speed increase'.format(large_fibonacci))
>
> but in a simple case like yours I'd go with the obvious
>
> print(
>     'Calculating fibonacci and fibonacci_memoize once for',
>     large_fibonacci,
>     'to determine speed increase')

I have gone for this option. Thanks.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#89547

From	Cecil Westerhof <Cecil@decebal.nl>
Date	2015-04-29 14:23 +0200
Message-ID	<87egn3ced3.fsf@Equus.decebal.nl>
In reply to	#89536

On Wednesday 29 Apr 2015 11:24 CEST Peter Otten wrote:

>> print('Calculating fibonacci and fibonacci_memoize once for ' +
>>       str(large_fibonacci) + ' to determine speed increase')
>
> You could write that as
>
> print('Calculating fibonacci and fibonacci_memoize once for '
>       '{} to determine speed increase'.format(large_fibonacci))
>
> but in a simple case like yours I'd go with the obvious
>
> print(
>     'Calculating fibonacci and fibonacci_memoize once for',
>     large_fibonacci,
>     'to determine speed increase')

In 2.7 that gives:
   ('Calculating fibonacci and fibonacci_memoize once for', 40, 'to determine speed increase')

So I am going to use the one above it.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

#89549

From	Chris Angelico <rosuav@gmail.com>
Date	2015-04-29 22:55 +1000
Message-ID	<mailman.85.1430312164.3680.python-list@python.org>
In reply to	#89547

On Wed, Apr 29, 2015 at 10:23 PM, Cecil Westerhof <Cecil@decebal.nl> wrote:
> In 2.7 that gives:
>    ('Calculating fibonacci and fibonacci_memoize once for', 40, 'to determine speed increase')
>
> So I am going to use the one above it.

Start your script with:

from __future__ import print_function

Problem solved! :)
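A sketch of that suggestion (the value 40 is an example). The `__future__` import must come before every other statement in the module; only a docstring, comments and blank lines may precede it. On Python 2.6 and 2.7 it replaces the print statement with the `print()` function, so the three-argument call prints one space-separated line instead of a tuple; on Python 3 it is accepted and has no effect.

```python
from __future__ import print_function  # must precede all other statements

large_fibonacci = 40  # example value

# With print as a function, the arguments are separated by sep (' ' by default)
# on both Python 2.7 and Python 3, instead of being printed as a tuple on 2.7.
print('Calculating fibonacci and fibonacci_memoize once for',
      large_fibonacci,
      'to determine speed increase')

# The function form also accepts keyword arguments such as sep.
print('fib', large_fibonacci, sep='=')  # fib=40
```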

ChrisA

#89550

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-04-29 23:15 +1000
Message-ID	<5540d95c$0$12982$c3e8da3$5496439d@news.astraweb.com>
In reply to	#89530

On Wed, 29 Apr 2015 06:29 pm, Cecil Westerhof wrote:

> Because I try to keep my lines (well) below 80 characters, I use the
> following:
>     print('Calculating fibonacci and fibonacci_memoize once for ' +
>           str(large_fibonacci) + ' to determine speed increase')

That's perfectly fine, but these two alternatives may be better:

    print('Calculating fibonacci and fibonacci_memoize once for'
           ' %s to determine speed increase' % large_fibonacci)

    print('Calculating fibonacci and fibonacci_memoize once for'
           ' {} to determine speed increase'.format(large_fibonacci))
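A quick check that both alternatives produce the same text, using 40 as an example value. The f-string variant needs Python 3.6 or later. One known wrinkle of `%`: a tuple on the right-hand side is taken as the whole argument list, so a value that might itself be a tuple should be wrapped in a one-element tuple.

```python
large_fibonacci = 40  # example value

old_style = ('Calculating fibonacci and fibonacci_memoize once for'
             ' %s to determine speed increase' % large_fibonacci)
new_style = ('Calculating fibonacci and fibonacci_memoize once for'
             ' {} to determine speed increase'.format(large_fibonacci))
# Python 3.6+ only: the f-string part interpolates the expression in place.
f_style = ('Calculating fibonacci and fibonacci_memoize once for'
           f' {large_fibonacci} to determine speed increase')

print(old_style == new_style == f_style)  # True

# % treats a tuple argument as the full argument list, so wrap it explicitly.
pair = (1, 2)
print('value: %s' % (pair,))  # value: (1, 2)
```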

> But I was told that using + with strings was bad practice. Is this
> true? If so, what is the better way to do this?

*Repeated* string concatenation is bad practice. Concatenating one or two
strings is fine. Doing it in a loop to build up a big string is bad mojo.

# Perfectly fine:
message = prefix + "something or other" + suffix

# Okay, but there are better alternatives:
for item in things:
    message = "something " + str(item)
    print(message)

# This is asking for trouble.
# Use ''.join(substrings) instead.
text = ''
for s in substrings:
    text = text + s
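When the pieces are produced inside a loop, the usual replacement for the third pattern is to collect them in a list and join once at the end. A sketch with example data:

```python
substrings = ["a", "bb", "ccc", "dddd", "eeeee"]  # example data

# Accumulate the pieces in a list: list.append is amortised O(1).
parts = []
for s in substrings:
    parts.append(s)      # real code would usually transform s here
text = ''.join(parts)    # every character is copied once

print(text)  # abbcccddddeeeee
```

When no per-item work is needed, `''.join(substrings)` alone does the job.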

The problem with the third one is that it has to make temporary strings
which get thrown away, and that gets very expensive if there are many
substrings. Suppose our substrings are "a", "bb", "ccc", "dddd", "eeeee",
then the temporary strings that are made end up being:

text = "a"  # copies one character (maybe?)
text = "abb"  # copies three characters
text = "abbccc"  # copies six characters
text = "abbcccdddd"  # copies ten characters
text = "abbcccddddeeeee"  # copies fifteen characters

So to build a string of length 15, Python ends up copying 34 or 35
characters. As the number of substrings increases, the amount of wasted
copying blows out: repeated string concatenation behaves quadratically,
which is very slow.
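The arithmetic above can be reproduced with a small model. It assumes, as the list of temporaries does, that every concatenation copies the whole new string (the worst case); the optimisation mentioned next can change the real cost.

```python
from itertools import accumulate

def copies_repeated_concat(substrings):
    """Characters copied if text = text + s copies the full result every time."""
    # Lengths of the intermediate strings: 1, 3, 6, 10, 15 for the example.
    return sum(accumulate(len(s) for s in substrings))

def copies_join(substrings):
    """join copies each character once into the final string."""
    return sum(len(s) for s in substrings)

example = ["a", "bb", "ccc", "dddd", "eeeee"]
print(copies_repeated_concat(example))  # 35
print(copies_join(example))             # 15

# n one-character pieces: n*(n+1)/2 copies versus n, quadratic versus linear.
n = 1000
print(copies_repeated_concat(["x"] * n))  # 500500
```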

The tricky part is that CPython 2.4 introduced an optimization that *may*
avoid all that extra copying under *some* circumstances. So with casual
testing you might see linear behaviour and never notice the quadratic
behaviour.

Until you rely on it being fast, and it isn't.
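A rough way to see this for yourself is to time both approaches with `timeit`. No figures are given here on purpose: whether the `+` loop looks linear depends on the interpreter (CPython's in-place optimisation is not part of the language) and on details such as whether another reference to the growing string exists.

```python
import timeit

def concat_loop(pieces):
    text = ''
    for s in pieces:
        text = text + s   # may or may not be optimised in place
    return text

def join_once(pieces):
    return ''.join(pieces)

for n in (1000, 10000, 40000):
    pieces = ['x'] * n
    # Take the best of 3 runs of 10 calls each to reduce noise.
    t_concat = min(timeit.repeat(lambda: concat_loop(pieces), number=10, repeat=3))
    t_join = min(timeit.repeat(lambda: join_once(pieces), number=10, repeat=3))
    print('{:>6} pieces: concat {:.4f}s  join {:.4f}s'.format(n, t_concat, t_join))
```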

-- 
Steven

#89554

From	wxjmfauth@gmail.com
Date	2015-04-29 08:16 -0700
Message-ID	<db3b6548-9f4c-472d-ab47-cf25653eefe6@googlegroups.com>
In reply to	#89550

On Wednesday, 29 April 2015 15:15:19 UTC+2, Steven D'Aprano wrote:
> [...]
> # This is asking for trouble.
> # Use ''.join(substrings) instead.
> text = ''
> for s in substrings:
>     text = text + s
> [...]
> Until you rely on it being fast, and it isn't.
>

------

>>> timeit.repeat("''.join(['a', 'a']*1000)")
[35.80853889412481, 36.228519745690505, 36.15569110606032]
>>> timeit.repeat("''.join(['a', '\u3456']*1000)")
[38.041631106320324, 38.031590190424936, 38.020335857707664]
>>> 

-----------

>>> timeit.repeat("''.join(['a', 'a']*1000)")
[38.94708620458107, 38.91509429875259, 38.914195952835485]
>>> timeit.repeat("''.join(['a', '\u3456']*1000)")
[58.284382998616366, 58.23354945884603, 58.27770782766095]
>>> 

with a memory gain = 0
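For context on the memory side of this comparison: since Python 3.3 (PEP 393), CPython stores each string with 1, 2 or 4 bytes per character depending on its widest character, so adding a character such as '\u3456' changes how much memory the result uses. A sketch to measure it with `sys.getsizeof` (exact byte counts vary by version and platform, so none are shown):

```python
import sys

ascii_only = ''.join(['a', 'a'] * 1000)     # 2000 chars, all below U+0080
mixed = ''.join(['a', '\u3456'] * 1000)     # 2000 chars, widest is U+3456

# CPython 3.3+ uses 1 byte per char for the first string and 2 for the second.
print(len(ascii_only), sys.getsizeof(ascii_only))
print(len(mixed), sys.getsizeof(mixed))
```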

#89562

From	Andrew Berg <aberg010@my.hennepintech.edu>
Date	2015-04-29 06:40 -0500
Message-ID	<mailman.89.1430331235.3680.python-list@python.org>
In reply to	#89530

On 2015.04.29 04:08, Mark Lawrence wrote:
> On 29/04/2015 09:29, Cecil Westerhof wrote:
>> Because I try to keep my lines (well) below 80 characters, I use the
>> following:
>>      print('Calculating fibonacci and fibonacci_memoize once for ' +
>>            str(large_fibonacci) + ' to determine speed increase')
>>
>> But I was told that using + with strings was bad practice. Is this
>> true? If so, what is the better way to do this?
>>
> 
> It's not bad practice as such; it's simply that performance takes a nosedive
> if you're concatenating large numbers of strings. If performance is an
> issue, the recommended way is to write:
> 
> ' '.join(strings)
> 
I thought it was frowned upon because it's less readable for anything non-trivial.

hero1 = 'Batman'
hero2 = 'Robin'
villain = 'The Joker'
place = 'Gotham City'

sentence = hero1 + " and " + hero2 + " fight " + villain + " in " + place + "."
# doesn't flow as well as:
sentence = "{hero1} and {hero2} fight {villain} in {place}.".format(
    hero1=hero1, hero2=hero2, villain=villain, place=place)
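The two spellings can be compared directly, together with two further variants: `str.format_map` (Python 3.2+) takes a mapping, so the keyword list need not be repeated, and f-strings (Python 3.6+) name the variables inline.

```python
hero1 = 'Batman'
hero2 = 'Robin'
villain = 'The Joker'
place = 'Gotham City'

with_plus = hero1 + " and " + hero2 + " fight " + villain + " in " + place + "."
with_format = "{hero1} and {hero2} fight {villain} in {place}.".format(
    hero1=hero1, hero2=hero2, villain=villain, place=place)

# format_map takes a mapping directly.
names = {'hero1': hero1, 'hero2': hero2, 'villain': villain, 'place': place}
with_map = "{hero1} and {hero2} fight {villain} in {place}.".format_map(names)

# Python 3.6+: the f-string reads like the finished sentence.
with_fstring = f"{hero1} and {hero2} fight {villain} in {place}."

print(with_plus == with_format == with_map == with_fstring)  # True
print(with_fstring)  # Batman and Robin fight The Joker in Gotham City.
```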
