Documenting builtin methods

Started by: Joshua Landau <joshua@landau.ws>
First post: 2013-07-11 04:15 +0100
Last post: 2013-07-12 08:09 +0100
Articles: 9 (4 participants)

Contents

  Documenting builtin methods Joshua Landau <joshua@landau.ws> - 2013-07-11 04:15 +0100
    Re: Documenting builtin methods alex23 <wuwei23@gmail.com> - 2013-07-11 15:35 +1000
    Re: Documenting builtin methods Steven D'Aprano <steve@pearwood.info> - 2013-07-11 06:06 +0000
      Re: Documenting builtin methods Chris Angelico <rosuav@gmail.com> - 2013-07-11 17:06 +1000
        Re: Documenting builtin methods Steven D'Aprano <steve@pearwood.info> - 2013-07-11 07:15 +0000
          Re: Documenting builtin methods Chris Angelico <rosuav@gmail.com> - 2013-07-11 17:23 +1000
      Re: Documenting builtin methods Joshua Landau <joshua@landau.ws> - 2013-07-12 00:11 +0100
        Re: Documenting builtin methods alex23 <wuwei23@gmail.com> - 2013-07-12 13:43 +1000
          Re: Documenting builtin methods Joshua Landau <joshua@landau.ws> - 2013-07-12 08:09 +0100

#50405 — Documenting builtin methods

From: Joshua Landau <joshua@landau.ws>
Date: 2013-07-11 04:15 +0100
Subject: Documenting builtin methods
Message-ID: <mailman.4558.1373512579.3114.python-list@python.org>
I have this innocent and simple code:

from collections import deque
exhaust_iter = deque(maxlen=0).extend
exhaust_iter.__doc__ = ("Exhaust an iterator efficiently without "
                        "caching any of its yielded values.")

Obviously it does not work. Is there a way to get it to work simply
and without creating a new scope (which would be a rather inefficient
way to set documentation, and would hamper introspection)?

How about dropping the "simply" requirement?
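
For reference, a minimal sketch of what "does not work" means here, assuming CPython 3: the bound built-in method rejects the assignment with an AttributeError. The exact wording of the message may differ between versions.

```python
from collections import deque

# Bound built-in method of one specific, always-empty deque.
exhaust_iter = deque(maxlen=0).extend

try:
    exhaust_iter.__doc__ = "Exhaust an iterator efficiently."
except AttributeError as exc:
    # CPython refuses: __doc__ of a built-in method is read-only.
    error = exc
else:
    error = None

print(type(error).__name__, error)
```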



#50415

From: alex23 <wuwei23@gmail.com>
Date: 2013-07-11 15:35 +1000
Message-ID: <krlft3$c66$1@dont-email.me>
In reply to: #50405
On 11/07/2013 1:15 PM, Joshua Landau wrote:
> I have this innocent and simple code:
>
> from collections import deque
> exhaust_iter = deque(maxlen=0).extend
> exhaust_iter.__doc__ = "Exhaust an iterator efficiently without
> caching any of its yielded values."
>
> Obviously it does not work. Is there a way to get it to work simply
> and without creating a new scope (which would be a rather inefficient
> a way to set documentation, and would hamper introspection)?

I would just go with the most obvious approach:

     from collections import deque

     def exhaust_iter(iterable):
         """
         Exhaust an iterator efficiently without caching
         any of its yielded values.
         """
         # A zero-length deque discards every item it is fed.
         deque(maxlen=0).extend(iterable)

It's not going to be that inefficient unless you're calling it in a long 
inner loop.
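
A quick usage check of the plain-function approach, as a sketch only. The parameter name `iterable` and the variable names `numbers` and `leftover` are illustrative choices, not from the post.

```python
from collections import deque

def exhaust_iter(iterable):
    """Exhaust an iterator efficiently without caching any of its yielded values."""
    deque(maxlen=0).extend(iterable)

numbers = iter(range(5))
exhaust_iter(numbers)

# The iterator has been drained, and the docstring is an ordinary
# attribute of the Python-level function.
leftover = list(numbers)
print(leftover)
print(exhaust_iter.__doc__)
```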



#50417

From: Steven D'Aprano <steve@pearwood.info>
Date: 2013-07-11 06:06 +0000
Message-ID: <51de4b6c$0$11094$c3e8da3@news.astraweb.com>
In reply to: #50405
On Thu, 11 Jul 2013 04:15:37 +0100, Joshua Landau wrote:

> I have this innocent and simple code:
> 
> from collections import deque
> exhaust_iter = deque(maxlen=0).extend 

At this point, exhaust_iter is another name for the bound instance method 
"extend" of one specific deque instance.

Other implementations may do otherwise[1], but CPython optimizes built-in 
methods and functions. E.g. they have no __dict__ so you can't add 
attributes to them. When you look up exhaust_iter.__doc__, you are 
actually looking up (type(exhaust_iter)).__doc__, which is a descriptor:

py> type(exhaust_iter).__doc__
<attribute '__doc__' of 'builtin_function_or_method' objects>
py> type(type(exhaust_iter).__doc__)
<class 'getset_descriptor'>


Confused yet? Don't worry, you will be...

So, looking up exhaust_iter.__doc__:

1) looks up '__doc__' on the class "builtin_function_or_method", not the 
instance;

2) which looks up '__doc__' on the class __dict__:

py> type(exhaust_iter).__dict__['__doc__']
<attribute '__doc__' of 'builtin_function_or_method' objects>

3) This is a descriptor with __get__ and __set__ methods. Because the 
actual method is written in C, you can't access its internals except via 
the API: even the class __dict__ is not really a dict, it's a wrapper 
around a dict:

py> type(type(exhaust_iter).__dict__)
<class 'mappingproxy'>


Anyway, we have a descriptor that returns the doc string:

py> descriptor = type(exhaust_iter).__doc__
py> descriptor.__get__(exhaust_iter)
'Extend the right side of the deque with elements from the iterable'

My guess is that it is fetching this from some private C member, which 
you can't get to from Python except via the descriptor. And you can't set 
it:

py> descriptor.__set__(exhaust_iter, '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute '__doc__' of 'builtin_function_or_method' 
objects is not writable


which is probably because if you could write to it, it would change the 
docstring for *every* deque. And that would be bad.

If this were a pure-Python method, you could probably bypass the 
descriptor, but it's a C-level built-in. I think you're out of luck.
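
The lookup chain described above can be condensed into one script. This is a sketch that assumes CPython 3, where the type of a bound built-in method is builtin_function_or_method; other implementations may report different type names. The variable names are illustrative.

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend
method_type = type(exhaust_iter)

# Built-in methods carry no per-instance __dict__, so there is nowhere
# to store a new attribute.
has_dict = hasattr(exhaust_iter, "__dict__")

# __doc__ lives in the type's namespace as a C-level descriptor.
descriptor = method_type.__dict__["__doc__"]
doc = descriptor.__get__(exhaust_iter, method_type)

try:
    descriptor.__set__(exhaust_iter, "")
    writable = True
except AttributeError:
    writable = False

print(method_type.__name__, type(descriptor).__name__, has_dict, writable)
```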

I think the right solution here is the trivial:

def exhaust(it):
    """Doc string here."""
    deque(maxlen=0).extend(it)


which will be fast enough for all but the tightest inner loops. But if 
you really care about optimizing this:


def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory


which will be about as efficient as you can get while still having a 
custom docstring.

But really, I'm having trouble understanding what sort of application 
would have "run an iterator to exhaustion without doing anything with the 
values" as the performance bottleneck :-)



> exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"
> 
> Obviously it does not work. 

Even if it did work, it would not do what you hope. Because __doc__ is a 
dunder attribute (double leading and trailing underscores), help() 
currently looks it up on the class, not the instance:


class Spam:
    "Spam spam spam"

x = Spam()
help(x)
=> displays "Spam spam spam"

x.__doc__ = "Yummy spam"
help(x)
=> still displays "Spam spam spam"
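
A small sketch of where the two strings in the Spam example end up. It deliberately does not call help(), because what help(x) prints for an instance may differ between Python versions; it only shows that the assignment adds an instance attribute and leaves the class docstring alone.

```python
class Spam:
    "Spam spam spam"

x = Spam()
x.__doc__ = "Yummy spam"

# The assignment only adds an entry to the instance's own __dict__;
# the class docstring is untouched.
print(x.__dict__)
print(type(x).__doc__)
```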



> Is there a way to get it to work simply and
> without creating a new scope (which would be a rather inefficient a way
> to set documentation, and would hamper introspection)?
> 
> How about dropping the "simply" requirement?

I don't believe so.





[1] IronPython and Jython both currently do the same thing as CPython, so 
even if this is not explicitly language-defined behaviour, it looks like 
it may be de facto standard behaviour.


-- 
Steven



#50421

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-07-11 17:06 +1000
Message-ID: <mailman.4571.1373526402.3114.python-list@python.org>
In reply to: #50417
On Thu, Jul 11, 2013 at 4:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> I think the right solution here is the trivial:
>
> def exhaust(it):
>     """Doc string here."""
>     deque(maxlen=0).extend(it)
>
>
> which will be fast enough for all but the tightest inner loops. But if
> you really care about optimizing this:
>
>
> def factory():
>     eatit = deque(maxlen=0).extend
>     def exhaust_iter(it):
>         """Doc string goes here"""
>         eatit(it)
>     return exhaust_iter
>
> exhaust_it = factory()
> del factory
>
>
> which will be about as efficient as you can get while still having a
> custom docstring.

Surely no reason to go for the factory function:

def exhaust(it, eatit=deque(maxlen=0).extend):
    eatit(it)

ChrisA



#50422

From: Steven D'Aprano <steve@pearwood.info>
Date: 2013-07-11 07:15 +0000
Message-ID: <51de5b7b$0$29862$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50421
On Thu, 11 Jul 2013 17:06:39 +1000, Chris Angelico wrote:

> On Thu, Jul 11, 2013 at 4:06 PM, Steven D'Aprano <steve@pearwood.info>
> wrote:
>> I think the right solution here is the trivial:
>>
>> def exhaust(it):
>>     """Doc string here."""
>>     deque(maxlen=0).extend(it)
>>
>>
>> which will be fast enough for all but the tightest inner loops. But if
>> you really care about optimizing this:
>>
>>
>> def factory():
>>     eatit = deque(maxlen=0).extend
>>     def exhaust_iter(it):
>>         """Doc string goes here"""
>>         eatit(it)
>>     return exhaust_iter
>>
>> exhaust_it = factory()
>> del factory
>>
>>
>> which will be about as efficient as you can get while still having a
>> custom docstring.
> 
> Surely no reason to go for the factory function:
> 
> def exhaust(it,eatit=deque(maxlen=0).extend):
> 	eatit(it)

Now you have the function accept a second argument, which is public, just 
to hold a purely internal reference to something that you don't want the 
caller to replace.


-- 
Steven



#50423

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-07-11 17:23 +1000
Message-ID: <mailman.4572.1373527389.3114.python-list@python.org>
In reply to: #50422
On Thu, Jul 11, 2013 at 5:15 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 11 Jul 2013 17:06:39 +1000, Chris Angelico wrote:
>
>> On Thu, Jul 11, 2013 at 4:06 PM, Steven D'Aprano <steve@pearwood.info>
>> wrote:
>>> I think the right solution here is the trivial:
>>>
>>> def exhaust(it):
>>>     """Doc string here."""
>>>     deque(maxlen=0).extend(it)
>>>
>>>
>>> which will be fast enough for all but the tightest inner loops. But if
>>> you really care about optimizing this:
>>>
>>>
>>> def factory():
>>>     eatit = deque(maxlen=0).extend
>>>     def exhaust_iter(it):
>>>         """Doc string goes here"""
>>>         eatit(it)
>>>     return exhaust_iter
>>>
>>> exhaust_it = factory()
>>> del factory
>>>
>>>
>>> which will be about as efficient as you can get while still having a
>>> custom docstring.
>>
>> Surely no reason to go for the factory function:
>>
>> def exhaust(it,eatit=deque(maxlen=0).extend):
>>       eatit(it)
>
> Now you have the function accept a second argument, which is public, just
> to hold a purely internal reference to something that you don't want the
> caller to replace.

True, but doesn't that happen fairly often with default args? Usually
it's in the "int=int" notation to snapshot for performance.
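
For readers unfamiliar with the "int=int" idiom, here is a minimal sketch. The function `parse_all` is a made-up example, not something from the thread. It also shows the drawback Steven raises: the "internal" default is part of the public signature.

```python
def parse_all(strings, int=int):
    # "int" is now a local name, bound once when the def statement ran,
    # so the loop avoids repeated global and builtin lookups.
    result = []
    for s in strings:
        result.append(int(s))
    return result

print(parse_all(["1", "2", "3"]))

# Nothing stops a caller from replacing the supposedly internal helper.
print(parse_all(["1", "2", "3"], int=float))
```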

ChrisA



#50477

From: Joshua Landau <joshua@landau.ws>
Date: 2013-07-12 00:11 +0100
Message-ID: <mailman.4603.1373584321.3114.python-list@python.org>
In reply to: #50417
On 11 July 2013 07:06, Steven D'Aprano <steve@pearwood.info> wrote:
>
> But really, I'm having trouble understanding what sort of application
> would have "run an iterator to exhaustion without doing anything with the
> values" as the performance bottleneck :-)

Definitely not this one. Heck, there's not even any real reason something
as appropriately-named as "exhaust_iter" needs documentation.

Largely I was asking because I felt I had missed something more
obvious; it seems I had not. I'm also doing more functools stuff than
usual, and I had guessed that this approach would also apply to
functions generated with, say, functools.partial. Only it does not, as
you show below, and functools.partial objects allow you to set
.__doc__, however ineffectively, anyway.
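
A sketch of the functools.partial behaviour mentioned above. The name `parse_binary` is an invented example. Whether help() then shows the new string is a separate question that this snippet does not test.

```python
from functools import partial

parse_binary = partial(int, base=2)

# Unlike a built-in method, a partial object has an instance __dict__,
# so the assignment succeeds and is stored there.
parse_binary.__doc__ = "Parse a string of binary digits."

print(parse_binary("101"))
print(parse_binary.__dict__)
```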

I also feel that:

def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory

is a very unobvious way to change a docstring and hides what I'm doing
very effectively. Chris Angelico's method is a fair bit better in this
regard, but I'm not sure it's worth it in this case. One
recommendation with Chris's method is to make it keyword-only (with
"*") which should keep the interface a touch cleaner.

>> exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"
>>
>> Obviously it does not work.
>
> Even if it did work, it would not do what you hope. Because __doc__ is a
> dunder attribute (double leading and trailing underscores), help()
> currently looks it up on the class, not the instance:

I'd not considered that, and it seems to have doomed me from the start.
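
The keyword-only recommendation made earlier in this post for Chris's default-argument version might look like the sketch below. The leading underscore in `_eatit` is an added convention, not something proposed in the thread, and the docstring is reused from the original question.

```python
from collections import deque

def exhaust(it, *, _eatit=deque(maxlen=0).extend):
    """Exhaust an iterator efficiently without caching any of its yielded values."""
    _eatit(it)

numbers = iter(range(3))
exhaust(numbers)
remaining = list(numbers)

try:
    # A second positional argument is rejected, so the helper can only
    # be overridden deliberately, by keyword.
    exhaust(iter(range(3)), print)
    rejected = False
except TypeError:
    rejected = True

print(remaining, rejected)
```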



#50488

From: alex23 <wuwei23@gmail.com>
Date: 2013-07-12 13:43 +1000
Message-ID: <krntld$h9o$1@dont-email.me>
In reply to: #50477
On 12/07/2013 9:11 AM, Joshua Landau wrote:
> I also feel that:
>
> def factory():
>      eatit = deque(maxlen=0).extend
>      def exhaust_iter(it):
>          """Doc string goes here"""
>          eatit(it)
>      return exhaust_iter
>
> exhaust_it = factory()
> del factory
>
> is a very unobvious way to change a docstring and hides what I'm doing
> very effectively.

My last post seems to have been eaten by either Thunderbird or the 
EternalSeptember servers, but it contained an erroneous claim that the 
straight function version performed as well as the factory one. However, 
in the interim a co-worker has come up with a slightly faster variant:

from functools import partial
from collections import deque

class exhaust_it(partial):
     """custom doc string"""

exhaust_it = exhaust_it(deque(maxlen=0).extend)

Shadowing the class name with the partial instance will ensure it has 
the same name when accessed via help(), and it's a simple way to avoid 
needing to clean up the namespace, as well.
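
A short check of the partial-subclass variant, offered as a sketch. The class and docstring are taken from the post; the variables `numbers` and `leftover` are illustrative.

```python
from collections import deque
from functools import partial

class exhaust_it(partial):
    """custom doc string"""

# Rebind the name to an instance; the class stays reachable via type().
exhaust_it = exhaust_it(deque(maxlen=0).extend)

numbers = iter(range(4))
exhaust_it(numbers)
leftover = list(numbers)

print(exhaust_it.__doc__)         # looked up on the subclass
print(type(exhaust_it).__name__)  # class name survives the shadowing
print(leftover)
```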



#50499

From: Joshua Landau <joshua@landau.ws>
Date: 2013-07-12 08:09 +0100
Message-ID: <mailman.4615.1373613045.3114.python-list@python.org>
In reply to: #50488
On 12 July 2013 04:43, alex23 <wuwei23@gmail.com> wrote:
>
> My last post seems to have been eaten by either Thunderbird or the
> EternalSeptember servers, but it contained an erroneous claim that the
> straight function version performed as well as the factory one. However, in
> the interim a co-worker has come up with a slightly faster variant:
>
> from functools import partial
> from collections import deque
>
> class exhaust_it(partial):
>     """custom doc string"""
>
> exhaust_it = exhaust_it(deque(maxlen=0).extend)
>
> Shadowing the class name with the partial instance will ensure it has the
> same name when accessed via help(), and it's a simple way to avoid needing
> to clean up the namespace, as well.

That's beautiful. You could even trivially make a wrapper function:

def wrap_docstring(function, docstring, *, name=None):
    class Wrapper(partial): pass
    Wrapper.__name__ = function.__name__ if name is None else name
    Wrapper.__doc__ = docstring
    return Wrapper(function)

which is no slower. You get great introspection through the "func"
attribute, too :).
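
A usage sketch for wrap_docstring, with the imports it needs. The function body is copied from the post; the call, the docstring text and the `name="exhaust_iter"` argument are illustrative.

```python
from collections import deque
from functools import partial

def wrap_docstring(function, docstring, *, name=None):
    class Wrapper(partial): pass
    Wrapper.__name__ = function.__name__ if name is None else name
    Wrapper.__doc__ = docstring
    return Wrapper(function)

exhaust_iter = wrap_docstring(
    deque(maxlen=0).extend,
    "Exhaust an iterator efficiently without caching any of its yielded values.",
    name="exhaust_iter",
)

print(exhaust_iter.__doc__)
print(type(exhaust_iter).__name__)
print(exhaust_iter.func)  # the wrapped built-in method stays reachable
```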

Also:

>>> times = time_raw(), time_function(), time_factory(), time_argument_hack(), time_partial()
>>> [round(time/times[0], 1) for time in times]
[1.0, 16.8, 3.1, 3.0, 1.8]

This times almost purely the constant overhead by calling
exhaust_iter on an empty iterable. So your friend wins the
premature optimisation test, too.
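
The time_* helpers behind those figures are not shown in the thread. The following is only a guess at an equivalent benchmark using timeit; every name in it (raw, plain, closure, default_arg, partial_variant, relative_overheads) is hypothetical, and the ratios it prints will vary by machine and Python version rather than reproduce the numbers above.

```python
from collections import deque
from functools import partial
from timeit import timeit

# The bare bound method, used as the baseline.
raw = deque(maxlen=0).extend

def plain(it):
    deque(maxlen=0).extend(it)

def make_closure():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        eatit(it)
    return exhaust_iter

closure = make_closure()

def default_arg(it, eatit=deque(maxlen=0).extend):
    eatit(it)

class partial_variant(partial):
    """custom doc string"""

partial_variant = partial_variant(deque(maxlen=0).extend)

def relative_overheads(number=100_000):
    """Time each variant on an empty tuple and scale by the raw call."""
    variants = [raw, plain, closure, default_arg, partial_variant]
    times = [timeit("f(())", globals={"f": f}, number=number) for f in variants]
    return [round(t / times[0], 1) for t in times]

print(relative_overheads())
```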


