Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!xlned.com!feeder3.xlned.com!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'python.': 0.02; 'scripts': 0.03; 'interpreter': 0.05; 'say,': 0.05; '(python': 0.07; 'alignment': 0.07; 'cache': 0.07; 'debug': 0.07; 'debugging': 0.07; 'desirable.': 0.07; 'sys': 0.07; 'string': 0.09; 'branching': 0.09; 'formatting': 0.09; 'literal': 0.09; 'oh,': 0.09; 'optimizing': 0.09; 'statements': 0.09; 'variable,': 0.09; 'cc:addr:python-list': 0.11; 'python': 0.11; 'def': 0.12; 'bug': 0.12; 'language,': 0.12; '*you*': 0.16; 'adjusted': 0.16; 'btw:': 0.16; 'cc:name:python list': 0.16; 'debugging,': 0.16; 'eliminating': 0.16; 'example)': 0.16; 'fiddle': 0.16; 'fine.': 0.16; 'hmm.': 0.16; 'owner:': 0.16; 'sad': 0.16; 'substituted': 0.16; 'substitution': 0.16; 'time.time()': 0.16; '\xa0print': 0.16; 'wrote:': 0.18; 'module': 0.19; 'passing': 0.19; 'slightly': 0.19; 'seems': 0.21; '8bit%:5': 0.22; 'command': 0.22; 'machine': 0.22; '>>>': 0.22; 'memory': 0.22; 'import': 0.22; 'email addr:gmail.com>': 0.22; 'portion': 0.22; 'cc:addr:python.org': 0.22; 'print': 0.22; '>>>': 0.24; 'new,': 0.24; 'refers': 0.24; 'skip:\xa0 20': 0.24; 'subject:problem': 0.24; '\xa0if': 0.24; '---': 0.24; 'cc:2**0': 0.24; 'possibly': 0.26; 'shown': 0.26; 'second': 0.26; 'post': 0.26; 'header:In-Reply-To:1': 0.27; 'tried': 0.27; 'function': 0.29; 'testing': 0.29; 'chris': 0.29; 'said,': 0.30; 'statement': 0.30; 'message-id:@mail.gmail.com': 0.30; "i'm": 0.30; 'code': 0.31; 'comments': 0.31; 'lines': 0.31; "skip:' 10": 0.31; 'that.': 0.31; 'constant': 0.31; 'overhead': 0.31; 'prints': 0.31; 'skip:r 60': 0.31; 'yields': 0.31; 'another': 0.32; 'quite': 0.32; 'bus': 0.33; 'call.': 0.33; 'skip:t 40': 0.33; 'comment': 0.34; 'skip:d 20': 0.34; "i'd": 0.34; 'could': 0.34; 'subject:with': 0.35; 'agree': 0.35; 'something': 0.35; 'anybody': 0.35; 'test': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'add': 0.35; 'there': 0.35; 'really': 0.36; 'false': 0.36; 'instances': 0.36; 'doing': 0.36; 'half': 0.37; 'seconds': 0.37; 'application': 0.37; 'too': 0.37; 'step': 0.37; 'easiest': 0.38; 'generic': 0.38; 'jason': 0.38; 'pm,': 0.38; 'rather': 0.38; 'first': 0.61; 'you.': 0.62; 'times': 0.62; "you've": 0.63; 'subject:The': 0.64; 'become': 0.64; 'more': 0.64; 'different': 0.65; 'to:addr:gmail.com': 0.65; 'p.s.': 0.66; 'minutes': 0.67; 'believe': 0.68; 'atm': 0.68; 'optimized': 0.68; 'below:': 0.68; 'targeted': 0.69; 'reliance': 0.72; 'million': 0.74; 'hey,': 0.75; "'dead'": 0.84; 'calls,': 0.84; 'disappears': 0.84; 'experiment': 0.84; 'improvement': 0.84; 'misses': 0.84; 'timings': 0.84; 'upset': 0.84; 'doubling': 0.91; 'increases': 0.91; 'pipeline': 0.91; 'bandwidth': 0.96; '2013': 0.98 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=i8W4qEvTVYZy5+6RlQVYGA0898d6IP3ynYtV0EoXC1E=; b=WvpnWaORGImLHDqTroHI5XXUA8Rm0GokkpQyjIFaysC7Ii90q9hLQtIYv3YqeQn+7g 6414UCAATn7su6i8MYKdYstbb+nuchVPT6y6lTCmGWP2jy2p79mPdSz5EYJd/jimowkn L9Ebb3FWuyn5aqJpKsn6IACfX+F58gFUhPLOtnRKFg6cleaYj3kW4wG58SnGQ/VXDl1H 8LWs1qbpIeJcZFSLxTXslfVadT7Usq35ZHkDMWY8lFrDNVBSTj9ZX0oSYc54I+G9EUsk JPM8TSQsQNWoUC5+ehB26RJFOwkEqggGD6UZ3PlWE+yWkw71iY68fz+04NIaUUroPW8C Ojkg== MIME-Version: 1.0 X-Received: by 10.50.127.211 with SMTP id ni19mr6585280igb.93.1370218582051; Sun, 02 Jun 2013 17:16:22 -0700 (PDT) In-Reply-To: References: <687dea63-84da-4c45-9366-cb5a10665d1f@googlegroups.com> Date: Sun, 2 Jun 2013 20:16:21 -0400 Subject: Re: PyWart: The problem with "print" From: Jason Swails To: Chris Angelico Content-Type: multipart/alternative; boundary=089e013c5aeed9121f04de34df50 Cc: python list X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 213 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1370218591 news.xs4all.nl 15891 [2001:888:2000:d::a6]:53443 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:46741 --089e013c5aeed9121f04de34df50 Content-Type: text/plain; charset=ISO-8859-1 On Sun, Jun 2, 2013 at 1:20 PM, Chris Angelico wrote: > > Hmm. Could be costly. Hey, you know, Python has something for testing that. > > >>> timeit.timeit('debugprint("asdf")','def debugprint(*args):\n\tif not > DEBUG: return\n\tprint(*args)\nDEBUG=False',number=1000000) > 0.5838018519113444 > > That's roughly half a second for a million calls to debugprint(). > That's a 580ns cost per call. Rather than fiddle with the language, > I'd rather just take this cost. Oh, and there's another way, too: If > you make the DEBUG flag have effect only on startup, you could write > your module thus: > This is a slightly contrived demonstration... The time lost in a function call is not just the time it takes to execute that function. If it consistently increases the frequency of cache misses then the cost is much greater -- possibly by orders of magnitude if the application is truly bound by the bandwidth of the memory bus and the CPU pipeline is almost always saturated. I'm actually with RR in terms of eliminating the overhead involved with 'dead' function calls, since there are instances when optimizing in Python is desirable. I actually recently adjusted one of my own scripts to eliminate branching and improve data layout to achieve a 1000-fold improvement in efficiency (~45 minutes to 0.42 s. for one example) --- all in pure Python. The first approach was unacceptable, the second is fine. For comparison, if I add a 'deactivated' debugprint call into the inner loop (executed 243K times in this particular test), then the time of the double-loop step that I optimized takes 0.73 seconds (nearly doubling the duration of the whole step). The whole program is too large to post here, but the relevant code portion is shown below: i = 0 begin = time.time() for mol in owner: for atm in mol: blankfunc("Print i %d" % i) new_atoms[i] = self.atom_list[atm] i += 1 self.atom_list = new_atoms print "Done in %f seconds." % (time.time() - begin) from another module: DEBUG = False [snip] def blankfunc(instring): if DEBUG: print instring Also, you're often not passing a constant literal to the debug print -- you're doing some type of string formatting or substitution if you're really inspecting the value of a particular variable, and this also takes time. In the test I gave the timings for above, I passed a string the counter substituted to the 'dead' debug function. Copy-and-pasting your timeit experiment on my machine yields different timings (Python 2.7): >>> import sys >>> timeit.timeit('debugprint("asdf")','def debugprint(*args):\n\tif not DEBUG: return\n\tsys.stdout.write(*args)\nDEBUG=False',number=1000000) 0.15644001960754395 which is ~150 ns/function call, versus ~1300 ns/function call. And there may be even more extreme examples, this is just one I was able to cook up quickly. This is, I'm sad to say, where my alignment with RR ends. While I use prints in debugging all the time, it can also become a 'crutch', just like reliance on valgrind or gdb. If you don't believe me, you've never hit a bug that 'magically' disappears when you add a debugging print statement ;-). The easiest way to eliminate these 'dead' calls is to simply comment-out the print call, but I would be quite upset if the interpreter tried to outsmart me and do it automagically as RR seems to be suggesting. And if you're actually debugging, then you typically only add a few targeted print statements -- not too hard to comment-out. If it is, and you're really that lazy, then by all means add a new, greppable function call and use a sed command to comment those lines out for you. BTW: *you* in the above message refers to a generic person -- none of my comments were aimed at anybody in particular All the best, Jason P.S. All that said, I would agree with ChrisA's suggestion that the overhead is negligible is most cases... --089e013c5aeed9121f04de34df50 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable


On Su= n, Jun 2, 2013 at 1:20 PM, Chris Angelico <rosuav@gmail.com> = wrote:

Hmm. Could be costly. Hey, you= know, Python has something for testing that.

>>> timeit.timeit('debugprint("asdf")','def = debugprint(*args):\n\tif not DEBUG: return\n\tprint(*args)\nDEBUG=3DFalse&#= 39;,number=3D1000000)
0.5838018519113444

That's roughly half a second for a million calls to debugprint().
That's a 580ns cost per call. Rather than fiddle with the language,
I'd rather just take this cost. Oh, and there's another way, too: I= f
you make the DEBUG flag have effect only on startup, you could write
your module thus:

This is a slightly contrived demonstration...= The time lost in a function call is not just the time it takes to execute = that function. =A0If it consistently increases the frequency of cache misse= s then the cost is much greater -- possibly by orders of magnitude if the a= pplication is truly bound by the bandwidth of the memory bus and the CPU pi= peline is almost always saturated.

I'm actually with RR in = terms of eliminating the overhead involved with 'dead' function cal= ls, since there are instances when optimizing in Python is desirable. =A0I = actually recently adjusted one of my own scripts to eliminate branching and= improve data layout to achieve a 1000-fold improvement in efficiency (~45 = minutes to 0.42 s. for one example) --- all in pure Python. =A0The first ap= proach was unacceptable, the second is fine. =A0For comparison, if I add a = 'deactivated' debugprint call into the inner loop (executed 243K ti= mes in this particular test), then the time of the double-loop step that I = optimized takes 0.73 seconds (nearly doubling the duration of the whole ste= p). =A0The whole program is too large to post here, but the relevant code p= ortion is shown below:

= =A0 =A0 =A0 =A0 =A0i =3D 0
=A0 =A0 =A0 =A0 =A0begin =3D time.time()
=A0 =A0 =A0 =A0 =A0for= mol in owner:
=A0 =A0 =A0 =A0 =A0 =A0 for atm in mol:
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0blankfunc= ("Print i %d" % i)
=A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0new_atoms[i] =3D self.atom_list[atm]
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0i +=3D= 1
=A0 =A0 =A0 =A0 =A0self.atom_list =3D new_atoms
=A0 =A0 =A0 = =A0 =A0print "Done in %f seconds." % (time.time() - begin)=

from another module:

DEBUG =3D False

[snip]

def blankfunc(instring):
=A0 =A0if DEBUG:
=A0 =A0 =A0 pr= int instring

Also, you're often not passing a constant literal to the= debug print -- you're doing some type of string formatting or substitu= tion if you're really inspecting the value of a particular variable, an= d this also takes time. =A0In the test I gave the timings for above, I pass= ed a string the counter substituted to the 'dead' debug function. = =A0Copy-and-pasting your timeit experiment on my machine yields different t= imings (Python 2.7):

>>> import sy= s
>>> timeit.timeit('debugprint("as= df")','def debugprint(*args):\n\tif not DEBUG: return\n\tsys.s= tdout.write(*args)\nDEBUG=3DFalse',number=3D1000000)
0.15644001960754395

which is ~150 ns/function call, versus ~1300 ns/function call. =A0An= d there may be even more extreme examples, this is just one I was able to c= ook up quickly.

This is, I'm sad to say, where my align= ment with RR ends. =A0While I use prints in debugging all the time, it can = also become a 'crutch', just like reliance on valgrind or gdb. =A0I= f you don't believe me, you've never hit a bug that 'magically&= #39; disappears when you add a debugging print statement ;-).

The easiest way to eliminate these 'dea= d' calls is to simply comment-out the print call, but I would be quite = upset if the interpreter tried to outsmart me and do it automagically as RR= seems to be suggesting. =A0And if you're actually debugging, then you = typically only add a few targeted print statements -- not too hard to comme= nt-out. =A0If it is, and you're really that lazy, then by all means add= a new, greppable function call and use a sed command to comment those line= s out for you.

BTW: *you* in the above message refers to a= generic person -- none of my comments were aimed at anybody in particular<= /div>

All the best,
Jason

P.S. All that said, I would agree with Chri= sA's suggestion that the overhead is negligible is most cases...
<= /div>
--089e013c5aeed9121f04de34df50--