| Started by | Joshua Landau <joshua@landau.ws> |
|---|---|
| First post | 2013-07-11 04:15 +0100 |
| Last post | 2013-07-12 08:09 +0100 |
| Articles | 9 — 4 participants |
Documenting builtin methods Joshua Landau <joshua@landau.ws> - 2013-07-11 04:15 +0100
Re: Documenting builtin methods alex23 <wuwei23@gmail.com> - 2013-07-11 15:35 +1000
Re: Documenting builtin methods Steven D'Aprano <steve@pearwood.info> - 2013-07-11 06:06 +0000
Re: Documenting builtin methods Chris Angelico <rosuav@gmail.com> - 2013-07-11 17:06 +1000
Re: Documenting builtin methods Steven D'Aprano <steve@pearwood.info> - 2013-07-11 07:15 +0000
Re: Documenting builtin methods Chris Angelico <rosuav@gmail.com> - 2013-07-11 17:23 +1000
Re: Documenting builtin methods Joshua Landau <joshua@landau.ws> - 2013-07-12 00:11 +0100
Re: Documenting builtin methods alex23 <wuwei23@gmail.com> - 2013-07-12 13:43 +1000
Re: Documenting builtin methods Joshua Landau <joshua@landau.ws> - 2013-07-12 08:09 +0100
| From | Joshua Landau <joshua@landau.ws> |
|---|---|
| Date | 2013-07-11 04:15 +0100 |
| Subject | Documenting builtin methods |
| Message-ID | <mailman.4558.1373512579.3114.python-list@python.org> |
I have this innocent and simple code:

from collections import deque
exhaust_iter = deque(maxlen=0).extend
exhaust_iter.__doc__ = "Exhaust an iterator efficiently without caching any of its yielded values."

Obviously it does not work. Is there a way to get it to work simply and without creating a new scope (which would be a rather inefficient way to set documentation, and would hamper introspection)?

How about dropping the "simply" requirement?
| From | alex23 <wuwei23@gmail.com> |
|---|---|
| Date | 2013-07-11 15:35 +1000 |
| Message-ID | <krlft3$c66$1@dont-email.me> |
| In reply to | #50405 |
On 11/07/2013 1:15 PM, Joshua Landau wrote:
> I have this innocent and simple code:
>
> from collections import deque
> exhaust_iter = deque(maxlen=0).extend
> exhaust_iter.__doc__ = "Exhaust an iterator efficiently without
> caching any of its yielded values."
>
> Obviously it does not work. Is there a way to get it to work simply
> and without creating a new scope (which would be a rather inefficient
> way to set documentation, and would hamper introspection)?
I would just go with the most obvious approach:
def exhaust_iter(iter):
    """
    Exhaust an iterator efficiently without caching
    any of its yielded values
    """
    deque(maxlen=0).extend(iter)
It's not going to be that inefficient unless you're calling it in a long
inner loop.
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2013-07-11 06:06 +0000 |
| Message-ID | <51de4b6c$0$11094$c3e8da3@news.astraweb.com> |
| In reply to | #50405 |
On Thu, 11 Jul 2013 04:15:37 +0100, Joshua Landau wrote:
> I have this innocent and simple code:
>
> from collections import deque
> exhaust_iter = deque(maxlen=0).extend
At this point, exhaust_iter is another name for the bound instance method
"extend" of one specific deque instance.
Other implementations may do otherwise[1], but CPython optimizes built-in
methods and functions. E.g. they have no __dict__ so you can't add
attributes to them. When you look up exhaust_iter.__doc__, you are
actually looking up (type(exhaust_iter)).__doc__, which is a descriptor:
py> type(exhaust_iter).__doc__
<attribute '__doc__' of 'builtin_function_or_method' objects>
py> type(type(exhaust_iter).__doc__)
<class 'getset_descriptor'>
Confused yet? Don't worry, you will be...
So, looking up exhaust_iter.__doc__:
1) looks up '__doc__' on the class "builtin_function_or_method", not the
instance;
2) which looks up '__doc__' on the class __dict__:
py> type(exhaust_iter).__dict__['__doc__']
<attribute '__doc__' of 'builtin_function_or_method' objects>
3) This is a descriptor with __get__ and __set__ methods. Because the
actual method is written in C, you can't access its internals except via
the API: even the class __dict__ is not really a dict, it's a wrapper
around a dict:
py> type(type(exhaust_iter).__dict__)
<class 'mappingproxy'>
Anyway, we have a descriptor that returns the doc string:
py> descriptor = type(exhaust_iter).__doc__
py> descriptor.__get__(exhaust_iter)
'Extend the right side of the deque with elements from the iterable'
My guess is that it is fetching this from some private C member, which
you can't get to from Python except via the descriptor. And you can't set
it:
py> descriptor.__set__(exhaust_iter, '')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: attribute '__doc__' of 'builtin_function_or_method'
objects is not writable
which is probably because if you could write to it, it would change the
docstring for *every* deque. And that would be bad.
If this were a pure-Python method, you could probably bypass the
descriptor, but it's a C-level built-in. I think you're out of luck.
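A compact way to check the behaviour described above is to attempt the assignment and record what happens. This is only a sketch, assuming CPython 3; the names `has_dict` and `error` are illustrative and not part of the original post.

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend

# Built-in methods carry no per-object __dict__, so there is nowhere
# to store a custom attribute such as a replacement docstring.
has_dict = hasattr(exhaust_iter, "__dict__")

# Assigning to __doc__ goes through the descriptor on the type,
# which refuses the write.
try:
    exhaust_iter.__doc__ = "Exhaust an iterator efficiently."
    error = None
except AttributeError as exc:
    error = exc

print(has_dict)
print(type(error).__name__)
```

Other implementations may report the failure differently, so the exact error message is deliberately not checked here.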
I think the right solution here is the trivial:
def exhaust(it):
    """Doc string here."""
    deque(maxlen=0).extend(it)
which will be fast enough for all but the tightest inner loops. But if
you really care about optimizing this:
def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory
which will be about as efficient as you can get while still having a
custom docstring.
But really, I'm having trouble understanding what sort of application
would have "run an iterator to exhaustion without doing anything with the
values" as the performance bottleneck :-)
> exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"
>
> Obviously it does not work.
Even if it did work, it would not do what you hope. Because __doc__ is a
dunder attribute (double leading and trailing underscores), help()
currently looks it up on the class, not the instance:
class Spam:
    "Spam spam spam"

x = Spam()
help(x)
=> displays "Spam spam spam"

x.__doc__ = "Yummy spam"
help(x)
=> still displays "Spam spam spam"
> Is there a way to get it to work simply and
> without creating a new scope (which would be a rather inefficient way
> to set documentation, and would hamper introspection)?
>
> How about dropping the "simply" requirement?
I don't believe so.
[1] IronPython and Jython both currently do the same thing as CPython, so
even if this is not explicitly language-defined behaviour, it looks like
it may be de facto standard behaviour.
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-11 17:06 +1000 |
| Message-ID | <mailman.4571.1373526402.3114.python-list@python.org> |
| In reply to | #50417 |
On Thu, Jul 11, 2013 at 4:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> I think the right solution here is the trivial:
>
> def exhaust(it):
>     """Doc string here."""
>     deque(maxlen=0).extend(it)
>
>
> which will be fast enough for all but the tightest inner loops. But if
> you really care about optimizing this:
>
>
> def factory():
>     eatit = deque(maxlen=0).extend
>     def exhaust_iter(it):
>         """Doc string goes here"""
>         eatit(it)
>     return exhaust_iter
>
> exhaust_it = factory()
> del factory
>
>
> which will be about as efficient as you can get while still having a
> custom docstring.
Surely no reason to go for the factory function:
def exhaust(it,eatit=deque(maxlen=0).extend):
    eatit(it)
ChrisA
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2013-07-11 07:15 +0000 |
| Message-ID | <51de5b7b$0$29862$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #50421 |
On Thu, 11 Jul 2013 17:06:39 +1000, Chris Angelico wrote:
> On Thu, Jul 11, 2013 at 4:06 PM, Steven D'Aprano <steve@pearwood.info>
> wrote:
>> I think the right solution here is the trivial:
>>
>> def exhaust(it):
>>     """Doc string here."""
>>     deque(maxlen=0).extend(it)
>>
>>
>> which will be fast enough for all but the tightest inner loops. But if
>> you really care about optimizing this:
>>
>>
>> def factory():
>>     eatit = deque(maxlen=0).extend
>>     def exhaust_iter(it):
>>         """Doc string goes here"""
>>         eatit(it)
>>     return exhaust_iter
>>
>> exhaust_it = factory()
>> del factory
>>
>>
>> which will be about as efficient as you can get while still having a
>> custom docstring.
>
> Surely no reason to go for the factory function:
>
> def exhaust(it,eatit=deque(maxlen=0).extend):
>     eatit(it)
Now you have the function accept a second argument, which is public, just
to hold a purely internal reference to something that you don't want the
caller to replace.
--
Steven
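To make the objection concrete, here is a small sketch of what a public second argument permits. The docstring and the variable names `leftover` and `seen` are illustrative assumptions, not taken from the thread.

```python
from collections import deque

def exhaust(it, eatit=deque(maxlen=0).extend):
    """Exhaust an iterator without storing its values."""
    eatit(it)

# Intended use: the iterator is consumed and nothing is kept.
numbers = iter(range(3))
exhaust(numbers)
leftover = list(numbers)

# Unintended use: a caller supplies the "internal" helper, so the
# values end up stored after all.
seen = []
exhaust(iter(range(3)), seen.extend)
```

Nothing in the signature signals that `eatit` is an implementation detail, which is the interface cost being pointed out.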
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-11 17:23 +1000 |
| Message-ID | <mailman.4572.1373527389.3114.python-list@python.org> |
| In reply to | #50422 |
On Thu, Jul 11, 2013 at 5:15 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 11 Jul 2013 17:06:39 +1000, Chris Angelico wrote:
>
>> On Thu, Jul 11, 2013 at 4:06 PM, Steven D'Aprano <steve@pearwood.info>
>> wrote:
>>> I think the right solution here is the trivial:
>>>
>>> def exhaust(it):
>>>     """Doc string here."""
>>>     deque(maxlen=0).extend(it)
>>>
>>>
>>> which will be fast enough for all but the tightest inner loops. But if
>>> you really care about optimizing this:
>>>
>>>
>>> def factory():
>>>     eatit = deque(maxlen=0).extend
>>>     def exhaust_iter(it):
>>>         """Doc string goes here"""
>>>         eatit(it)
>>>     return exhaust_iter
>>>
>>> exhaust_it = factory()
>>> del factory
>>>
>>>
>>> which will be about as efficient as you can get while still having a
>>> custom docstring.
>>
>> Surely no reason to go for the factory function:
>>
>> def exhaust(it,eatit=deque(maxlen=0).extend):
>>     eatit(it)
>
> Now you have the function accept a second argument, which is public, just
> to hold a purely internal reference to something that you don't want the
> caller to replace.
True, but doesn't that happen fairly often with default args? Usually
it's in the "int=int" notation to snapshot for performance.
ChrisA
| From | Joshua Landau <joshua@landau.ws> |
|---|---|
| Date | 2013-07-12 00:11 +0100 |
| Message-ID | <mailman.4603.1373584321.3114.python-list@python.org> |
| In reply to | #50417 |
On 11 July 2013 07:06, Steven D'Aprano <steve@pearwood.info> wrote:
>
> But really, I'm having trouble understanding what sort of application
> would have "run an iterator to exhaustion without doing anything with the
> values" as the performance bottleneck :-)
Definitely not this one. Heck, there's even no real reason something
as appropriately-named as "exhaust_iter" needs documentation.
Largely I was asking because I'd felt I'd missed something more
obvious; it seems there was not. I'm also doing some more functools
stuff than usual -- this method also applies to functions generated
with, say, functools.partial I had guessed. Only it does not, as you
show below -- and functools.partial objects allow you to ineffectively
set .__doc__ anyway.
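For reference, a minimal sketch of the functools.partial behaviour mentioned here, assuming CPython 3. The assignment is accepted because partial objects have an instance __dict__; whether help() then displays the new text is a separate question and is not tested. The variable names are illustrative.

```python
from collections import deque
from functools import partial

exhaust_iter = partial(deque(maxlen=0).extend)

# Unlike a built-in method, a partial object accepts new attributes,
# so this assignment succeeds.
exhaust_iter.__doc__ = "Exhaust an iterator without caching its values."

# Only the instance attribute changed; the class docstring is untouched.
instance_doc = exhaust_iter.__doc__
class_doc = type(exhaust_iter).__doc__

it = iter(range(10))
exhaust_iter(it)
remaining = list(it)
```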
I also feel that:
def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory
is a very unobvious way to change a docstring and hides what I'm doing
very effectively. Chris Angelico's method is a fair bit better in this
regard, but I'm not sure it's worth it in this case. One
recommendation with Chris's method is to make it keyword-only (with
"*") which should keep the interface a touch cleaner.
>> exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"
>>
>> Obviously it does not work.
>
> Even if it did work, it would not do what you hope. Because __doc__ is a
> dunder attribute (double leading and trailing underscores), help()
> currently looks it up on the class, not the instance:
I'd not considered that, and it seems to have doomed me from the start.
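Returning to the keyword-only suggestion for Chris's method made earlier in this post, a minimal sketch might look like the following. The leading underscore in `_eatit` and the docstring are assumptions added for illustration; the thread only proposes the "*" marker.

```python
from collections import deque

def exhaust(it, *, _eatit=deque(maxlen=0).extend):
    """Exhaust an iterator without caching any of its values."""
    _eatit(it)

it = iter(range(5))
exhaust(it)
remaining = list(it)

# A stray positional argument is now rejected instead of silently
# replacing the internal helper.
try:
    exhaust(iter(range(5)), print)
    rejected = False
except TypeError:
    rejected = True
```

The helper can still be overridden by name, so this tidies the interface rather than making the reference private.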
| From | alex23 <wuwei23@gmail.com> |
|---|---|
| Date | 2013-07-12 13:43 +1000 |
| Message-ID | <krntld$h9o$1@dont-email.me> |
| In reply to | #50477 |
On 12/07/2013 9:11 AM, Joshua Landau wrote:
> I also feel that:
>
> def factory():
>     eatit = deque(maxlen=0).extend
>     def exhaust_iter(it):
>         """Doc string goes here"""
>         eatit(it)
>     return exhaust_iter
>
> exhaust_it = factory()
> del factory
>
> is a very unobvious way to change a docstring and hides what I'm doing
> very effectively.
My last post seems to have been eaten by either Thunderbird or the
EternalSeptember servers, but it contained an erroneous claim that the
straight function version performed as well as the factory one. However,
in the interim a co-worker has come up with a slightly faster variant:
from functools import partial
from collections import deque

class exhaust_it(partial):
    """custom doc string"""

exhaust_it = exhaust_it(deque(maxlen=0).extend)
Shadowing the class name with the partial instance will ensure it has
the same name when accessed via help(), and it's a simple way to avoid
needing to clean up the namespace, as well.
| From | Joshua Landau <joshua@landau.ws> |
|---|---|
| Date | 2013-07-12 08:09 +0100 |
| Message-ID | <mailman.4615.1373613045.3114.python-list@python.org> |
| In reply to | #50488 |
On 12 July 2013 04:43, alex23 <wuwei23@gmail.com> wrote:
>
> My last post seems to have been eaten by either Thunderbird or the
> EternalSeptember servers, but it contained an erroneous claim that the
> straight function version performed as well as the factory one. However, in
> the interim a co-worker has come up with a slightly faster variant:
>
> from functools import partial
> from collections import deque
>
> class exhaust_it(partial):
>     """custom doc string"""
>
> exhaust_it = exhaust_it(deque(maxlen=0).extend)
>
> Shadowing the class name with the partial instance will ensure it has the
> same name when accessed via help(), and it's a simple way to avoid needing
> to clean up the namespace, as well.
That's beautiful. You could even trivially make a wrapper function:
def wrap_docstring(function, docstring, *, name=None):
    class Wrapper(partial): pass
    Wrapper.__name__ = function.__name__ if name is None else name
    Wrapper.__doc__ = docstring
    return Wrapper(function)
which is no slower. You get great introspection through the "func"
attribute, too :).
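A short usage sketch of the wrapper, with the imports it needs, might look like this. The docstring text and the checks at the end are illustrative assumptions; only wrap_docstring itself comes from the post.

```python
from collections import deque
from functools import partial

def wrap_docstring(function, docstring, *, name=None):
    # A throwaway partial subclass carries the custom name and docstring.
    class Wrapper(partial):
        pass
    Wrapper.__name__ = function.__name__ if name is None else name
    Wrapper.__doc__ = docstring
    return Wrapper(function)

exhaust_iter = wrap_docstring(
    deque(maxlen=0).extend,
    "Exhaust an iterator without caching its values.",
    name="exhaust_iter",
)

it = iter(range(3))
exhaust_iter(it)
remaining = list(it)

wrapper_name = type(exhaust_iter).__name__   # name given to the wrapper class
wrapped_name = exhaust_iter.func.__name__    # the original built-in method
doc = exhaust_iter.__doc__                   # found on the wrapper class
```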
Also:
>>> times = time_raw(), time_function(), time_factory(), time_argument_hack(), time_partial()
>>> [round(time/times[0], 1) for time in times]
[1.0, 16.8, 3.1, 3.0, 1.8]
This times almost purely the constant overhead by calling
exhaust_iter on an empty iterable. So your friend wins the
premature optimisation test, too.
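The time_* helpers are not shown in the thread, so the following is not the original benchmark, only a rough sketch of how such a constant-overhead comparison could be set up with timeit. All names here are hypothetical, only three of the variants are covered, and the lambda used for timing adds the same small overhead to every variant.

```python
import timeit
from collections import deque
from functools import partial

# Variant 1: the raw bound method.
raw = deque(maxlen=0).extend

# Variant 2: a plain function that builds a new deque on every call.
def plain_function(it):
    deque(maxlen=0).extend(it)

# Variant 3: a partial subclass carrying the docstring.
class partial_variant(partial):
    """Exhaust an iterator without caching its values."""

partial_variant = partial_variant(deque(maxlen=0).extend)

empty = ()  # an empty iterable, so only the call overhead is measured
candidates = {"raw": raw, "function": plain_function, "partial": partial_variant}

timings = {
    label: min(timeit.repeat(lambda f=func: f(empty), number=20000, repeat=3))
    for label, func in candidates.items()
}
ratios = {label: round(t / timings["raw"], 1) for label, t in timings.items()}
print(ratios)
```

Absolute numbers will vary by machine and Python version, so no expected output is given.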