Groups > comp.lang.python > #89530 > unrolled thread
| Started by | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| First post | 2015-04-29 10:29 +0200 |
| Last post | 2015-04-29 06:40 -0500 |
| Articles | 9 — 7 participants |
Using + with strings considered bad Cecil Westerhof <Cecil@decebal.nl> - 2015-04-29 10:29 +0200
Re: Using + with strings considered bad Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-04-29 10:08 +0100
Re: Using + with strings considered bad Peter Otten <__peter__@web.de> - 2015-04-29 11:24 +0200
Re: Using + with strings considered bad Cecil Westerhof <Cecil@decebal.nl> - 2015-04-29 13:17 +0200
Re: Using + with strings considered bad Cecil Westerhof <Cecil@decebal.nl> - 2015-04-29 14:23 +0200
Re: Using + with strings considered bad Chris Angelico <rosuav@gmail.com> - 2015-04-29 22:55 +1000
Re: Using + with strings considered bad Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-04-29 23:15 +1000
Re: Using + with strings considered bad wxjmfauth@gmail.com - 2015-04-29 08:16 -0700
Re: Using + with strings considered bad Andrew Berg <aberg010@my.hennepintech.edu> - 2015-04-29 06:40 -0500
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-04-29 10:29 +0200 |
| Subject | Using + with strings considered bad |
| Message-ID | <878udbxrpg.fsf@Equus.decebal.nl> |
Because I try to keep my lines (well) below 80 characters, I use the
following:
print('Calculating fibonacci and fibonacci_memoize once for ' +
str(large_fibonacci) + ' to determine speed increase')
But I was told that using + with strings was bad practice. Is this
true? If so, what is the better way to do this?
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-04-29 10:08 +0100 |
| Message-ID | <mailman.78.1430298608.3680.python-list@python.org> |
| In reply to | #89530 |
On 29/04/2015 09:29, Cecil Westerhof wrote:
> Because I try to keep my lines (well) below 80 characters, I use the
> following:
> print('Calculating fibonacci and fibonacci_memoize once for ' +
> str(large_fibonacci) + ' to determine speed increase')
>
> But I was told that using + with strings was bad practice. Is this
> true? If so, what is the better way to do this?
>
It's not bad practice as such, it's simply that performance takes a nose
dive if you're concatenating large numbers of strings. If performance
is an issue the recommended way is to write:
' '.join(strings)
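A minimal sketch of that idiom, using a made-up `strings` list (the names and values here are assumptions, not from the post). The string in front of `.join` is the separator, so `' '` puts a space between the pieces while `''` glues them together directly:

```python
# Hypothetical input, chosen only for illustration.
strings = ['Calculating', 'fibonacci', 'once']

spaced = ' '.join(strings)   # separator is a single space
glued = ''.join(strings)     # empty separator: plain concatenation

# join() only accepts strings, so other values must be converted first.
mixed = ' '.join(str(x) for x in ['n', '=', 40])

print(spaced)
print(glued)
print(mixed)
```

For the question as asked (joining three pieces once), either separator choice or a plain + would do; join starts to matter when the number of pieces is large or unknown.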
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-04-29 11:24 +0200 |
| Message-ID | <mailman.79.1430299516.3680.python-list@python.org> |
| In reply to | #89530 |
Cecil Westerhof wrote:
> Because I try to keep my lines (well) below 80 characters, I use the
> following:
> print('Calculating fibonacci and fibonacci_memoize once for ' +
> str(large_fibonacci) + ' to determine speed increase')
>
> But I was told that using + with strings was bad practice. Is this
> true?
No. What was meant was probably that str.join() is preferred when you are to
concat an arbitrary number of strings, i. e.
# wrong
s = ""
for item in items:
    s += " " + item.name  # may be inefficient depending on implementation
s = s[1:]
# correct
s = " ".join(item.name for item in items)
(For more complex operations than just getting an attribute you may have to
write a helper generator:
def bogus(items):
    prev = ""
    for item in items:
        yield str(len(prev) - len(item))
        prev = item
s = "*".join(bogus(items))
)
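A runnable sketch of both snippets above, for anyone who wants to try them. The `Item` type, the sample data and the function names are assumptions made up for this illustration; only the loop, the join call and the generator logic come from the post:

```python
from collections import namedtuple

# Hypothetical stand-in for whatever objects carry a .name attribute.
Item = namedtuple('Item', 'name')
items = [Item('spam'), Item('ham'), Item('eggs')]

def build_with_plus(items):
    """The loop version: grow the string piece by piece."""
    s = ""
    for item in items:
        s += " " + item.name
    return s[1:]  # drop the leading space

def build_with_join(items):
    """The join version: one pass, one result string."""
    return " ".join(item.name for item in items)

def length_deltas(words):
    """Helper generator: yield len(previous) - len(current) as a string."""
    prev = ""
    for word in words:
        yield str(len(prev) - len(word))
        prev = word

print(build_with_plus(items))
print(build_with_join(items))
print("*".join(length_deltas(['spam', 'ham', 'eggs'])))
```

Both builders return the same text; the generator shows that join is happy with any iterable of strings, not just a list.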
> If so, what is the better way to do this?
Python concats adjacent string constants implicitly
>>> "one" "two"
'onetwo'
but in CPython an extra + will be removed by the peephole optimiser:
>>> def f(): return "one" + "two"
...
>>> import dis
>>> dis.dis(f)
1 0 LOAD_CONST 3 ('onetwo')
3 RETURN_VALUE
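A small sketch for checking this on your own interpreter. The function names are made up here, and the exact bytecode (and the place where the folding happens) differs between CPython versions, so treat the printed constants as something to inspect rather than a guarantee:

```python
import dis

def implicit():
    # Adjacent literals are joined when the source is compiled.
    return "one" "two"

def explicit():
    # CPython normally folds this constant expression as well.
    return "one" + "two"

# Whatever the optimiser does, the results must agree.
assert implicit() == explicit() == "onetwo"

# Inspect the constants stored in each code object.
print(implicit.__code__.co_consts)
print(explicit.__code__.co_consts)

# Full disassembly, as in the session above.
dis.dis(explicit)
```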
> print('Calculating fibonacci and fibonacci_memoize once for ' +
> str(large_fibonacci) + ' to determine speed increase')
You could write that as
print('Calculating fibonacci and fibonacci_memoize once for '
'{} to determine speed increase'.format(large_fibonacci))
but in a simple case like yours I'd go with the obvious
print(
'Calculating fibonacci and fibonacci_memoize once for',
large_fibonacci,
'to determine speed increase')
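A quick check, on Python 3, that the two suggestions print the same line. The value 40 and the variable names `via_args` and `via_format` are assumptions for the sketch; the output is captured in a buffer only so that it can be compared:

```python
import io

large_fibonacci = 40  # assumed example value

# print() with several arguments converts each one with str() and
# joins them with sep, which defaults to a single space.
buf = io.StringIO()
print('Calculating fibonacci and fibonacci_memoize once for',
      large_fibonacci,
      'to determine speed increase', file=buf)
via_args = buf.getvalue()

# The format() version needs the spaces inside the literals instead.
via_format = ('Calculating fibonacci and fibonacci_memoize once for '
              '{} to determine speed increase'.format(large_fibonacci))

print(via_args == via_format + '\n')
```

Note that the multi-argument form needs no str() call and no explicit spaces, which is probably why it reads as the obvious choice for a simple message.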
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-04-29 13:17 +0200 |
| Message-ID | <87y4lbchfm.fsf@Equus.decebal.nl> |
| In reply to | #89536 |
On Wednesday 29 Apr 2015 11:24 CEST, Peter Otten wrote:
> Cecil Westerhof wrote:
>
>> Because I try to keep my lines (well) below 80 characters, I use
>> the following:
>> print('Calculating fibonacci and fibonacci_memoize once for ' +
>> str(large_fibonacci) + ' to determine speed increase')
>>
>> But I was told that using + with strings was bad practice. Is this
>> true?
[...]
>> print('Calculating fibonacci and fibonacci_memoize once for ' +
>> str(large_fibonacci) + ' to determine speed increase')
>
> You could write that as
>
> print('Calculating fibonacci and fibonacci_memoize once for '
> '{} to determine speed increase'.format(large_fibonacci))
>
> but in a simple case like yours I'd go with the obvious
>
> print(
> 'Calculating fibonacci and fibonacci_memoize once for',
> large_fibonacci,
> 'to determine speed increase')
I have gone for this option. Thanks.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| Date | 2015-04-29 14:23 +0200 |
| Message-ID | <87egn3ced3.fsf@Equus.decebal.nl> |
| In reply to | #89536 |
On Wednesday 29 Apr 2015 11:24 CEST, Peter Otten wrote:
>> print('Calculating fibonacci and fibonacci_memoize once for ' +
>> str(large_fibonacci) + ' to determine speed increase')
>
> You could write that as
>
> print('Calculating fibonacci and fibonacci_memoize once for '
> '{} to determine speed increase'.format(large_fibonacci))
>
> but in a simple case like yours I'd go with the obvious
>
> print(
> 'Calculating fibonacci and fibonacci_memoize once for',
> large_fibonacci,
> 'to determine speed increase')
In 2.7 that gives:
('Calculating fibonacci and fibonacci_memoize once for', 40, 'to determine speed increase')
So I am going to use the one above it.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-04-29 22:55 +1000 |
| Message-ID | <mailman.85.1430312164.3680.python-list@python.org> |
| In reply to | #89547 |
On Wed, Apr 29, 2015 at 10:23 PM, Cecil Westerhof <Cecil@decebal.nl> wrote:
> In 2.7 that gives:
> ('Calculating fibonacci and fibonacci_memoize once for', 40, 'to determine speed increase')
>
> So I am going to use the one above it.
Start your script with:
from __future__ import print_function
Problem solved! :)
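A sketch of why 2.7 printed parentheses, written for Python 3 so it runs on a current interpreter. In Python 2 without the import, print is a statement and the parenthesised, comma-separated values form a single tuple. The variable names are assumptions; the tuple contents are the ones quoted above:

```python
# The three values from the print call quoted above.
args = ('Calculating fibonacci and fibonacci_memoize once for',
        40,
        'to determine speed increase')

# Roughly what the Python 2 print statement showed: one tuple.
as_tuple = repr(args)

# What the print function shows: three arguments joined by spaces.
as_function = ' '.join(str(a) for a in args)

print(as_tuple)
print(*args)

# In a Python 2 script, the fix is to make this the first statement of
# the module (only a docstring and comments may come before it):
#     from __future__ import print_function
```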
ChrisA
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-04-29 23:15 +1000 |
| Message-ID | <5540d95c$0$12982$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #89530 |
On Wed, 29 Apr 2015 06:29 pm, Cecil Westerhof wrote:
> Because I try to keep my lines (well) below 80 characters, I use the
> following:
> print('Calculating fibonacci and fibonacci_memoize once for ' +
> str(large_fibonacci) + ' to determine speed increase')
That's perfectly fine, but these two alternatives may be better:
print('Calculating fibonacci and fibonacci_memoize once for'
' %s to determine speed increase' % large_fibonacci)
print('Calculating fibonacci and fibonacci_memoize once for'
' {} to determine speed increase'.format(large_fibonacci))
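One detail worth knowing about these two forms: adjacent literals are merged before `%` or `.format` is applied, whereas with an explicit + the formatting binds more tightly and only touches the last literal. A sketch with shortened strings (the value 40 and the variable names are assumptions):

```python
large_fibonacci = 40  # assumed example value

# Adjacent literals become one string first, so the placeholder
# may sit in either of them.
merged = ('once for %s'
          ' to determine speed increase' % large_fibonacci)

# With +, the % applies only to the second literal, which has no
# placeholder here, so Python raises TypeError.
try:
    broken = 'once for %s' + ' to determine speed increase' % large_fibonacci
except TypeError as exc:
    broken = None
    print('TypeError:', exc)

# .format() in the same position fails silently instead: the unused
# argument is ignored and the {} is left in the result.
silent = 'once for {}' + ' to determine speed increase'.format(large_fibonacci)

print(merged)
print(silent)
```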
> But I was told that using + with strings was bad practice. Is this
> true? If so, what is the better way to do this?
*Repeated* string concatenation is bad practice. Concatenating one or two
strings is fine. Doing it in a loop to build up a big string is bad mojo.
# Perfectly fine:
message = prefix + "something or other" + suffix
# Okay, but there are better alternatives:
for item in things:
    message = "something " + str(item)
    print(message)
# This is asking for trouble.
# Use ''.join(substrings) instead.
text = ''
for s in substrings:
    text = text + s
The problem with the third one is that it has to make temporary strings
which get thrown away, and that gets very expensive if there are many
substrings. Suppose our substrings are "a", "bb", "ccc", "dddd", "eeeee",
then the temporary strings that are made end up being:
text = "a" # copies one character (maybe?)
text = "abb" # copies three characters
text = "abbccc" # copies six characters
text = "abbcccdddd" # copies ten characters
text = "abbcccddddeeeee" # copies fifteen characters
So to build a string of length 15, Python ends up copying 34 or 35
characters. As the number of substrings increases, the amount of wasted
copying blows out: repeated string concatenation behaves quadratically,
which is very slow.
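The arithmetic above can be sketched as a small counter. It assumes the worst case described in the post, where every concatenation builds a brand-new string and copies all of it; `chars_copied` is a name made up for this illustration:

```python
def chars_copied(substrings):
    """Count characters copied by `text = text + s` in a loop,
    assuming every step allocates and fills a new string."""
    total = 0
    length = 0
    for s in substrings:
        length += len(s)   # size of the new temporary string
        total += length    # all of it gets copied
    return total

example = ["a", "bb", "ccc", "dddd", "eeeee"]
print(chars_copied(example))        # 1 + 3 + 6 + 10 + 15 = 35

# 1000 one-character pieces: a final string of length 1000,
# but 1000 * 1001 / 2 = 500500 characters copied along the way.
print(chars_copied(["x"] * 1000))
```

Doubling the number of pieces roughly quadruples the copying, which is what "quadratic" means here; join copies each character about once.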
The tricky part is that CPython, starting from version 2.4, has an
optimization that *may* avoid all that extra copying under *some*
circumstances. So with casual testing, you might not notice the quadratic
behaviour, and see linear behaviour instead.
Until you rely on it being fast, and it isn't.
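If you want to see what your own interpreter does, a rough timing sketch along these lines may help. The function names and the input size are assumptions, and the numbers depend heavily on the Python implementation, version and platform, so no particular result is claimed:

```python
import timeit

def concat_loop(substrings):
    """Repeated concatenation, as in the third example above."""
    text = ''
    for s in substrings:
        text = text + s
    return text

def concat_join(substrings):
    """The recommended alternative."""
    return ''.join(substrings)

substrings = ['ab'] * 10000

# Both approaches must build the same string.
assert concat_loop(substrings) == concat_join(substrings)

for func in (concat_loop, concat_join):
    seconds = timeit.timeit(lambda: func(substrings), number=20)
    print(func.__name__, round(seconds, 4))
```

On an interpreter where the optimization applies, the two timings may turn out close; that is exactly the trap described above, because the same loop can be far slower elsewhere.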
--
Steven
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2015-04-29 08:16 -0700 |
| Message-ID | <db3b6548-9f4c-472d-ab47-cf25653eefe6@googlegroups.com> |
| In reply to | #89550 |
On Wednesday, 29 April 2015 15:15:19 UTC+2, Steven D'Aprano wrote:
> On Wed, 29 Apr 2015 06:29 pm, Cecil Westerhof wrote:
>
> > Because I try to keep my lines (well) below 80 characters, I use the
> > following:
> > print('Calculating fibonacci and fibonacci_memoize once for ' +
> > str(large_fibonacci) + ' to determine speed increase')
>
> That's perfectly fine, but these two alternatives may be better:
>
> print('Calculating fibonacci and fibonacci_memoize once for'
> ' %s to determine speed increase' % large_fibonacci)
>
> print('Calculating fibonacci and fibonacci_memoize once for'
> ' {} to determine speed increase'.format(large_fibonacci))
>
>
>
> > But I was told that using + with strings was bad practice. Is this
> > true? If so, what is the better way to do this?
>
> *Repeated* string concatenation is bad practice. Concatenating one or two
> strings is fine. Doing it in a loop to build up a big string is bad mojo.
>
>
> # Perfectly fine:
> message = prefix + "something or other" + suffix
>
>
> # Okay, but there are better alternatives:
> for item in things:
>     message = "something " + str(item)
>     print(message)
>
>
> # This is asking for trouble.
> # Use ''.join(substrings) instead.
> text = ''
> for s in substrings:
>     text = text + s
>
>
> The problem with the third one is that it has to make temporary strings
> which get thrown away, and that gets very expensive if there are many
> substrings. Suppose our substrings are "a", "bb", "ccc", "dddd", "eeeee",
> then the temporary strings that are made end up being:
>
> text = "a" # copies one character (maybe?)
> text = "abb" # copies three characters
> text = "abbccc" # copies six characters
> text = "abbcccdddd" # copies ten characters
> text = "abbcccddddeeeee" # copies fifteen characters
>
> So to build a string of length 15, Python ends up copying 34 or 35
> characters. As the number of substrings increases, the amount of wasted
> copying blows out: repeated string concatenation behaves quadratically,
> which is very slow.
>
> The tricky part is that CPython, starting from version 2.4, has an
> optimization that *may* avoid all that extra copying under *some*
> circumstances. So with casual testing, you might not notice the quadratic
> behaviour, and see linear behaviour instead.
>
> Until you rely on it being fast, and it isn't.
>
------
>>> timeit.repeat("''.join(['a', 'a']*1000)")
[35.80853889412481, 36.228519745690505, 36.15569110606032]
>>> timeit.repeat("''.join(['a', '\u3456']*1000)")
[38.041631106320324, 38.031590190424936, 38.020335857707664]
>>>
-----------
>>> timeit.repeat("''.join(['a', 'a']*1000)")
[38.94708620458107, 38.91509429875259, 38.914195952835485]
>>> timeit.repeat("''.join(['a', '\u3456']*1000)")
[58.284382998616366, 58.23354945884603, 58.27770782766095]
>>>
with a memory gain = 0
| From | Andrew Berg <aberg010@my.hennepintech.edu> |
|---|---|
| Date | 2015-04-29 06:40 -0500 |
| Message-ID | <mailman.89.1430331235.3680.python-list@python.org> |
| In reply to | #89530 |
On 2015.04.29 04:08, Mark Lawrence wrote:
> On 29/04/2015 09:29, Cecil Westerhof wrote:
>> Because I try to keep my lines (well) below 80 characters, I use the
>> following:
>> print('Calculating fibonacci and fibonacci_memoize once for ' +
>> str(large_fibonacci) + ' to determine speed increase')
>>
>> But I was told that using + with strings was bad practice. Is this
>> true? If so, what is the better way to do this?
>>
>
> It's not bad practice as such, it's simply that performance takes a nose
> dive if you're concatenating large numbers of strings. If performance
> is an issue the recommended way is to write:
>
> ' '.join(strings)
>
I thought it was frowned upon because it's less readable for anything non-trivial.
hero1 = 'Batman'
hero2 = 'Robin'
villain = 'The Joker'
place = 'Gotham City'
sentence = hero1 + " and " + hero2 + " fight " + villain + " in " + place + "."
# doesn't flow as well as:
sentence = "{hero1} and {hero2} fight {villain} in {place}.".format(
    hero1=hero1, hero2=hero2, villain=villain, place=place)
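When the values already live in a mapping, the keyword list can be avoided. A sketch, assuming the names are collected in a dict called `facts` (a made-up name); `str.format_map` needs Python 3.2 or later:

```python
# Hypothetical mapping holding the same values as the variables above.
facts = {
    'hero1': 'Batman',
    'hero2': 'Robin',
    'villain': 'The Joker',
    'place': 'Gotham City',
}

template = "{hero1} and {hero2} fight {villain} in {place}."

# Unpack the dict into keyword arguments...
sentence = template.format(**facts)

# ...or hand the mapping over directly (Python 3.2+).
same = template.format_map(facts)

print(sentence)
```

Keeping the template separate from the data also makes it easy to reuse the sentence with different names.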