| Started by | Roy Smith <roy@panix.com> |
|---|---|
| First post | 2013-03-06 22:20 -0500 |
| Last post | 2013-03-10 19:50 -0400 |
| Articles | 20 on this page of 21 — 10 participants |
Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-06 22:20 -0500
Re: Interesting list() un-optimization Dave Angel <davea@davea.name> - 2013-03-06 22:38 -0500
Re: Interesting list() un-optimization Tim Chase <python.list@tim.thechases.com> - 2013-03-06 21:57 -0600
Re: Interesting list() un-optimization Kev Dwyer <kevin.p.dwyer@gmail.com> - 2013-03-07 07:31 +0000
Re: Interesting list() un-optimization Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2013-03-07 11:22 +0000
Re: Interesting list() un-optimization Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-07 09:00 -0700
Re: Interesting list() un-optimization Christian Heimes <christian@python.org> - 2013-03-07 17:20 +0100
Re: Interesting list() un-optimization Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-07 10:31 -0700
Re: Interesting list() un-optimization Stefan Behnel <stefan_ml@behnel.de> - 2013-03-07 20:19 +0100
Re: Interesting list() un-optimization Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-07 13:26 -0700
Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-07 15:29 -0500
Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-07 15:34 -0500
Re: Interesting list() un-optimization Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2013-03-07 20:41 +0000
Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-07 17:53 -0500
Re: Interesting list() un-optimization Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-08 17:59 +0000
Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-08 13:45 -0500
Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-10 09:05 -0400
Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-10 17:39 -0400
Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-10 18:34 -0400
Re: Interesting list() un-optimization Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-10 23:35 +0000
Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-10 19:50 -0400
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-03-06 22:20 -0500 |
| Subject | Interesting list() un-optimization |
| Message-ID | <roy-572C99.22201106032013@70-1-84-166.pools.spcsdns.net> |
I stumbled upon an interesting bit of trivia concerning lists and list
comprehensions today.
We use mongoengine as a database model layer. A mongoengine query
returns an iterable object called a QuerySet. The "obvious" way to
create a list of the query results would be:
my_objects = list(my_query_set)
and, indeed, that works. But, then I found this code:
my_objects = [obj for obj in my_query_set]
which seemed a bit silly. I called over the guy who wrote it and asked
him why he didn't just write it using list(). I was astounded when it
turned out there's a good reason!
Apparently, list() has an "optimization" where it calls len() on its
argument to try and discover the number of items it's going to put into
the list. Presumably, list() uses this information to pre-allocate the
right amount of memory the first time, without any resizing. If len()
fails, it falls back to just iterating and resizing as needed.
Normally, this would be a win.
The problem is, QuerySets have a __len__() method. Calling it is a lot
faster than iterating over the whole query set and counting the items,
but it does result in an additional database query, which is a lot
slower than the list resizing! Writing the code as a list comprehension
prevents list() from trying to optimize when it shouldn't!
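The difference described in this post can be reproduced without a database. The following is a minimal sketch for CPython 3; `FakeQuerySet` and its `len_calls` counter are hypothetical stand-ins and not mongoengine code, and the length lookup done by list() is a CPython implementation detail rather than documented behaviour:

```python
class FakeQuerySet:
    """Hypothetical stand-in for a QuerySet; not mongoengine's class."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.len_calls = 0  # counts how often __len__ runs

    def __len__(self):
        # In the real case this would be an extra database round trip.
        self.len_calls += 1
        return len(self._rows)

    def __iter__(self):
        return iter(self._rows)


qs = FakeQuerySet(range(5))
via_list = list(qs)                      # list() asks for the length first
calls_after_list = qs.len_calls

qs = FakeQuerySet(range(5))
via_comprehension = [obj for obj in qs]  # only iterates, never calls __len__
calls_after_comprehension = qs.len_calls
```

Both spellings build the same list; only the list() call touches `__len__`.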
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-03-06 22:38 -0500 |
| Message-ID | <mailman.2975.1362627544.2939.python-list@python.org> |
| In reply to | #40696 |
On 03/06/2013 10:20 PM, Roy Smith wrote:
> I stumbled upon an interesting bit of trivia concerning lists and list
> comprehensions today.
>
> We use mongoengine as a database model layer. A mongoengine query
> returns an iterable object called a QuerySet. The "obvious" way to
> create a list of the query results would be:
>
> my_objects = list(my_query_set)
>
> and, indeed, that works. But, then I found this code:
>
> my_objects = [obj for obj in my_query_set]
>
> which seemed a bit silly. I called over the guy who wrote it and asked
> him why he didn't just write it using list(). I was astounded when it
> turned out there's a good reason!
>
> Apparently, list() has an "optimization" where it calls len() on its
> argument to try and discover the number of items it's going to put into
> the list. Presumably, list() uses this information to pre-allocate the
> right amount of memory the first time, without any resizing. If len()
> fails, it falls back to just iterating and resizing as needed.
> Normally, this would be a win.
>
> The problem is, QuerySets have a __len__() method. Calling it is a lot
> faster than iterating over the whole query set and counting the items,
> but it does result in an additional database query, which is a lot
> slower than the list resizing! Writing the code as a list comprehension
> prevents list() from trying to optimize when it shouldn't!
>

That is very interesting. list() assumes the __len__() method would be
very quick.

Perhaps list() should take an optional second argument that specifies
the initial length to allocate. That way code that either doesn't want
__len__() to be used, or that already knows a reasonable number to use,
can supply the value to preallocate.

--
DaveA
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2013-03-06 21:57 -0600 |
| Message-ID | <mailman.2976.1362628560.2939.python-list@python.org> |
| In reply to | #40696 |
On 2013-03-06 22:20, Roy Smith wrote:
> I stumbled upon an interesting bit of trivia concerning lists and
> list comprehensions today.
I agree with Dave Angel that this is interesting. A little testing
shows that this can be rewritten as
my_objects = list(iter(my_query_set))
which seems to then skip the costly __len__ call. Performance geeks
are welcome to time it against the list-comprehension version :-)
-tkc
class Foo(object):
    def __init__(self):
        self.items = range(10)
    def __iter__(self):
        return iter(self.items)
    def __len__(self):
        print "Calling costly __len__"
        return len(self.items)

print "Ensuring we can iterate over it:"
for x in Foo():
    print x

print "\nJust call list():"
lst = list(Foo())
print lst

print "\nCall list(iter())"
lst = list(iter(Foo()))
print lst
| From | Kev Dwyer <kevin.p.dwyer@gmail.com> |
|---|---|
| Date | 2013-03-07 07:31 +0000 |
| Message-ID | <mailman.2985.1362641500.2939.python-list@python.org> |
| In reply to | #40696 |
Roy Smith wrote:
> I stumbled upon an interesting bit of trivia concerning lists and list
> comprehensions today.
>
> We use mongoengine as a database model layer. A mongoengine query
> returns an iterable object called a QuerySet. The "obvious" way to
> create a list of the query results would be:
>
> my_objects = list(my_query_set)
>
> and, indeed, that works. But, then I found this code:
>
> my_objects = [obj for obj in my_query_set]
>
> which seemed a bit silly. I called over the guy who wrote it and asked
> him why he didn't just write it using list(). I was astounded when it
> turned out there's a good reason!
>
> Apparently, list() has an "optimization" where it calls len() on its
> argument to try and discover the number of items it's going to put into
> the list. Presumably, list() uses this information to pre-allocate the
> right amount of memory the first time, without any resizing. If len()
> fails, it falls back to just iterating and resizing as needed.
> Normally, this would be a win.
>
> The problem is, QuerySets have a __len__() method. Calling it is a lot
> faster than iterating over the whole query set and counting the items,
> but it does result in an additional database query, which is a lot
> slower than the list resizing! Writing the code as a list comprehension
> prevents list() from trying to optimize when it shouldn't!

Interesting discovery. Yet isn't this as much an issue with the
mongoengine library as with list()? QuerySet.count() can be called if the
"length" of a result set needs to be retrieved, so the __len__() method
seems redundant. And given that it's not unheard of to call list() on
iterables, perhaps the library designers should either optimise the
__len__() method, or document the performance implications of calling
list() on the QuerySet?

Anyway, thanks for this thought-provoking post.

Cheers,
Kev
| From | Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> |
|---|---|
| Date | 2013-03-07 11:22 +0000 |
| Message-ID | <mailman.3008.1362655396.2939.python-list@python.org> |
| In reply to | #40696 |
Tim Chase <python.list <at> tim.thechases.com> writes:
> On 2013-03-06 22:20, Roy Smith wrote:
> > I stumbled upon an interesting bit of trivia concerning lists and
> > list comprehensions today.
>
> A little testing
> shows that this can be rewritten as
>
> my_objects = list(iter(my_query_set))
>
> which seems to then skip the costly __len__ call. Performance geeks
> are welcome to time it against the list-comprehension version
>
> class Foo(object):
>     def __init__(self):
>         self.items = range(10)
>     def __iter__(self):
>         return iter(self.items)
>     def __len__(self):
>         print "Calling costly __len__"
>         return len(self.items)
>

Well, it skips the costly len() call because your iter(Foo()) returns
iter(range()) under the hood and list() uses that object's __len__() method.
In most cases, such a workaround will not be feasible. Why should
iter(QuerySet()) have a faster __len__() method defined than QuerySet()
itself? Most likely, iter(QuerySet()) just returns self anyway?

Best,
Wolfgang
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-03-07 09:00 -0700 |
| Message-ID | <mailman.3035.1362672083.2939.python-list@python.org> |
| In reply to | #40696 |
On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier
<wolfgang.maier@biologie.uni-freiburg.de> wrote:
> Well, it skips the costly len() call because your iter(Foo()) returns
> iter(range()) under the hood and list() uses that object's __len__() method.
Iterators do not generally have __len__ methods.
>>> len(iter(range(10)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'range_iterator' has no len()
> In
> most cases, such a workaround will not be feasible. Why should iter(QuerySet())
> have a faster __len__() method defined than QuerySet() itself.
iter(QuerySet()) should not have any __len__ method defined at all,
which is why the optimization would be skipped.
> Most likely,
> iter(QuerySet()) just returns self anyway?
But on this point, you are correct. The mongoengine QuerySet.__iter__
method is defined as:
    def __iter__(self):
        self.rewind()
        return self
This is unfortunate design. Not only does it mean that the iterator's
__len__ method cannot be trusted (what should the __len__ of a
partially exhausted iterator return?), but it also means that requesting
an iterator over the QuerySet will also silently invalidate any
existing iterators.
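The invalidation problem can be seen with a toy class that follows the same pattern. This is only a sketch under assumptions: `RewindingCursor` is hypothetical and merely imitates the quoted `__iter__`, it is not mongoengine's implementation:

```python
class RewindingCursor:
    """Hypothetical class that copies the __iter__ pattern quoted above."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def rewind(self):
        self._pos = 0

    def __iter__(self):
        self.rewind()   # side effect: resets every outstanding iterator
        return self

    def __next__(self):
        if self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row


cursor = RewindingCursor("abc")
first = iter(cursor)
consumed = [next(first), next(first)]   # 'a', 'b'
second = iter(cursor)                   # same object, position reset to 0
after_rewind = next(first)              # 'a' again, not 'c'
```

Because `first` and `second` are the same object, asking for the second iterator silently restarts the first.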
| From | Christian Heimes <christian@python.org> |
|---|---|
| Date | 2013-03-07 17:20 +0100 |
| Message-ID | <mailman.3038.1362673271.2939.python-list@python.org> |
| In reply to | #40696 |
On 07.03.2013 17:00, Ian Kelly wrote:
> On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier
> <wolfgang.maier@biologie.uni-freiburg.de> wrote:
>> Well, it skips the costly len() call because your iter(Foo()) returns
>> iter(range()) under the hood and list() uses that object's __len__() method.
>
> Iterators do not generally have __len__ methods.
>
> >>> len(iter(range(10)))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: object of type 'range_iterator' has no len()

But iterators have a length hint method that is used for some
optimizations and preallocations, too.

>>> i = iter(range(10))
>>> i.__length_hint__()
10

See http://www.python.org/dev/peps/pep-0424/

Christian
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-03-07 10:31 -0700 |
| Message-ID | <mailman.3041.1362677547.2939.python-list@python.org> |
| In reply to | #40696 |
On Thu, Mar 7, 2013 at 9:20 AM, Christian Heimes <christian@python.org> wrote:
> On 07.03.2013 17:00, Ian Kelly wrote:
>> On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier
>> <wolfgang.maier@biologie.uni-freiburg.de> wrote:
>>> Well, it skips the costly len() call because your iter(Foo()) returns
>>> iter(range()) under the hood and list() uses that object's __len__() method.
>>
>> Iterators do not generally have __len__ methods.
>>
>> >>> len(iter(range(10)))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: object of type 'range_iterator' has no len()
>
> But iterators have a length hint method that are used for some
> optimizations and preallocations, too.
>
> >>> i = iter(range(10))
> >>> i.__length_hint__()
> 10
>
> See http://www.python.org/dev/peps/pep-0424/

Didn't know about that, thanks. Presumably a proper iter(QuerySet())
object could implement __length_hint__ in an efficient manner rather
than by just calling the __len__ of the underlying QuerySet.
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2013-03-07 20:19 +0100 |
| Message-ID | <mailman.3046.1362683962.2939.python-list@python.org> |
| In reply to | #40696 |
Ian Kelly, 07.03.2013 18:31:
> On Thu, Mar 7, 2013 at 9:20 AM, Christian Heimes wrote:
>> Am 07.03.2013 17:00, schrieb Ian Kelly:
>>> On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier wrote:
>>>> Well, it skips the costly len() call because your iter(Foo()) returns
>>>> iter(range()) under the hood and list() uses that object's __len__() method.
>>>
>>> Iterators do not generally have __len__ methods.
>>>
>>> >>> len(iter(range(10)))
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> TypeError: object of type 'range_iterator' has no len()
>>
>> But iterators have a length hint method that are used for some
>> optimizations and preallocations, too.
>>
>> >>> i = iter(range(10))
>> >>> i.__length_hint__()
>> 10
>>
>> See http://www.python.org/dev/peps/pep-0424/
>
> Didn't know about that, thanks. Presumably a proper iter(QuerySet())
> object could implement __length_hint__ in an efficient manner rather
> than by just calling the __len__ of the underlying QuerySet,

And how exactly would it do that, without either doing what __len__ does or
reading the whole result set into memory?

Stefan
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-03-07 13:26 -0700 |
| Message-ID | <mailman.3051.1362688062.2939.python-list@python.org> |
| In reply to | #40696 |
On Thu, Mar 7, 2013 at 12:19 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
>> Didn't know about that, thanks. Presumably a proper iter(QuerySet())
>> object could implement __length_hint__ in an efficient manner rather
>> than by just calling the __len__ of the underlying QuerySet,
>
> And how exactly would it do that, without either doing what __len__ does or
> reading the whole result set into memory?

If the underlying cursor provides its own efficient length hint, it
could return that. Or if the query is result-limited, use the limit
as a length hint, provided it's not absurdly large. And if you really
can't efficiently determine anything about the length of the result
set at all, you can always fall back on returning NotImplemented.
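The two fallbacks mentioned here (use the query limit, or return NotImplemented) could look roughly like the sketch below. It assumes Python 3.4 or later for `operator.length_hint`; `CursorIterator` and its `limit` argument are hypothetical names invented for the illustration, not part of any real driver:

```python
import operator


class CursorIterator:
    """Hypothetical iterator over query results; `limit` is an assumed attribute."""

    def __init__(self, rows, limit=None):
        self._it = iter(rows)
        self._limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __length_hint__(self):
        if self._limit is not None:
            return self._limit   # cheap upper bound, no extra query
        return NotImplemented    # "no idea": callers fall back to a default


limited_hint = operator.length_hint(CursorIterator(range(3), limit=50))
unknown_hint = operator.length_hint(CursorIterator(range(3)), 8)
rows = list(CursorIterator(range(3), limit=50))   # an over-estimate is harmless
```

The hint only influences preallocation; the resulting list still contains exactly the rows that were produced.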
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-07 15:29 -0500 |
| Message-ID | <mailman.3052.1362688168.2939.python-list@python.org> |
| In reply to | #40696 |
On 3/7/2013 11:00 AM, Ian Kelly wrote:
> But on this point, you are correct. The mongoengine QuerySet.__iter__
> method is defined as:
>
>     def __iter__(self):
>         self.rewind()
>         return self
>
> This is unfortunate design. Not only does it mean that the iterator's
> __len__ method cannot be trusted (what should the __len__ of a
> partially exhausted iterator return?), but it also means that requesting
> an iterator over the QuerySet will also silently invalidate any
> existing iterators.
I view that design as a violation of the iterator protocol and hence a
program bug. __iter__ should either *just* return self (if the self is
an iterator) or return a new object (if self is a non-iterator
iterable). File objects are iterators and .__iter__ does not rewind.
>>> f = open("f:/python/mypy/tem.py")
>>> next(f)
'class myit(list):\n'
>>> f2 = iter(f)
>>> f2 is f
True
>>> next(f2)
" def __bytes__(self): return b'hello'\n"
--
Terry Jan Reedy
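A protocol-conforming alternative keeps the container and its iterators separate. The sketch below is one possible shape under that assumption; `Rows` is a hypothetical class, and a generator method is used only because it is the shortest way to hand out a fresh iterator per call:

```python
class Rows:
    """Hypothetical non-iterator iterable: each iter() call is independent."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # A generator function returns a new iterator object on every call.
        for row in self._rows:
            yield row


rows = Rows("abc")
first = iter(rows)
head = [next(first), next(first)]     # 'a', 'b'
second = iter(rows)                   # does not disturb `first`
tail = next(first)                    # 'c'
restart = next(second)                # 'a'
same_object = iter(first) is first    # an iterator's __iter__ just returns self
```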
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-07 15:34 -0500 |
| Message-ID | <mailman.3053.1362688500.2939.python-list@python.org> |
| In reply to | #40696 |
On 3/7/2013 11:20 AM, Christian Heimes wrote:
> But iterators have a length hint method that are used for some
> optimizations and preallocations, too.

This is easy when the base iterable has a length method, as do range
objects.

> >>> i = iter(range(10))
> >>> i.__length_hint__()
> 10

And the length_hint can (should be) decremented with each next call.

>>> next(i); next(i)
0
1
>>> i.__length_hint__()
8

--
Terry Jan Reedy
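The same check can be written without calling the dunder method directly. This sketch assumes Python 3.4 or later, where `operator.length_hint(obj, default=0)` is available:

```python
import operator

it = iter(range(10))
before = operator.length_hint(it)   # hint for a fresh range iterator
next(it)
next(it)
after = operator.length_hint(it)    # hint shrinks as items are consumed

# A generator has neither __len__ nor __length_hint__, so the default is used.
gen = (x for x in range(10))
fallback = operator.length_hint(gen, 0)
```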
| From | Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> |
|---|---|
| Date | 2013-03-07 20:41 +0000 |
| Message-ID | <mailman.3055.1362688919.2939.python-list@python.org> |
| In reply to | #40696 |
> >>> Iterators do not generally have __len__ methods.
> >>>
> >>> >>> len(iter(range(10)))
> >>> Traceback (most recent call last):
> >>> File "<stdin>", line 1, in <module>
> >>> TypeError: object of type 'range_iterator' has no len()
> >>
> >> But iterators have a length hint method that are used for some
> >> optimizations and preallocations, too.
> >>
> >> >>> i = iter(range(10))
> >> >>> i.__length_hint__()
> >> 10
> >>
> >> See http://www.python.org/dev/peps/pep-0424/
> >
very interesting (hadn't heard of it)! Just checked the PEP,
then tested list()'s behavior, and it is just as described:
class stupid(list):
    def __len__(self):
        print ('len() called')
        return NotImplemented

    def __length_hint__(self):
        print ('hint requested')
        l=iter(self).__length_hint__()
        print (l)
        return l

d=stupid((1,2,3))
len(d)
======>
len() called
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    len(d)
TypeError: an integer is required

list(d)
======>
len() called
hint requested
3
[1, 2, 3]
so list() first tries to call the iterable's __len__ method. If that raises a
TypeError it falls back to __length_hint__ .
What I still don't know is how the listiterator object's __length_hint__ works.
Why, in this case, does it know that it has a length of 3 ? The PEP does not
provide any hint how a reasonable hint could be calculated.
> And how exactly would it do that, without either doing what __len__ does or
> reading the whole result set into memory?
>
> Stefan
>
a very good question.
Best,
Wolfgang
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-07 17:53 -0500 |
| Message-ID | <mailman.3060.1362696834.2939.python-list@python.org> |
| In reply to | #40696 |
On 3/7/2013 3:41 PM, Wolfgang Maier wrote:
>
>>>>> Iterators do not generally have __len__ methods.
>>>>>
>>>>>>>> len(iter(range(10)))
>>>>> Traceback (most recent call last):
>>>>> File "<stdin>", line 1, in <module>
>>>>> TypeError: object of type 'range_iterator' has no len()
>>>>
>>>> But iterators have a length hint method that are used for some
>>>> optimizations and preallocations, too.
>>>>
>>>>>>> i = iter(range(10))
>>>>>>> i.__length_hint__()
>>>> 10
>>>>
>>>> See http://www.python.org/dev/peps/pep-0424/
>>>
>
> very interesting (hadn't heard of it)! Just checked the PEP,
> then tested list()'s behavior, and it is just as described:
>
> class stupid(list):
>     def __len__(self):
>         print ('len() called')
>         return NotImplemented
>
>     def __length_hint__(self):
>         print ('hint requested')
>         l=iter(self).__length_hint__()
>         print (l)
>         return l
>
> d=stupid((1,2,3))
> len(d)
> ======>
> len() called
>
> Traceback (most recent call last):
>   File "<pyshell#79>", line 1, in <module>
>     len(d)
> TypeError: an integer is required
>
> list(d)
> ======>
> len() called
> hint requested
> 3
> [1, 2, 3]
>
> so list() first tries to call the iterable's __len__ method. If that raises a
> TypeError it falls back to __length_hint__ .
> What I still don't know is how the listiterator object's __length_hint__ works.
> Why, in this case, does it know that it has a length of 3 ? The PEP does not
> provide any hint how a reasonable hint could be calculated.
'How' depends on the iterator, but when it has an underlying concrete
iterable of known length, it should be rather trivial as .__next__ will
explicitly or implicitly use the count remaining in its operation. Part
of the justification of adding __length_hint__ is that in many cases it
is so easy. Iterators based on an iterator with a length_hint can just
pass it along.
The list iterator might work with a C pointer to the next item and a
countdown count initialized to the list length. The __next__ method
might be something like this mixed C and Python pseudocode:
    if (countdown--) return *(current++);
    else raise StopIteration
(For non-C coders, postfix -- and ++ change the value *after* retrieving it.)
Then __length_hint__ would just return countdown. Tuples would work the
same way. Sets and dicts would need current++ elaborated to skip over
empty hash table slots. Range iterators would add the step to current,
instead of 1.
--
Terry Jan Reedy
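The countdown idea translates directly into Python. The class below is an illustrative re-implementation only; CPython's real list iterator is written in C and may differ in detail, and `CountdownListIterator` is a hypothetical name:

```python
import operator


class CountdownListIterator:
    """Illustrative sketch of a list iterator that tracks a countdown."""

    def __init__(self, items):
        self._items = items
        self._index = 0                 # plays the role of the C pointer
        self._countdown = len(items)    # items still to be produced

    def __iter__(self):
        return self

    def __next__(self):
        if self._countdown == 0:
            raise StopIteration
        self._countdown -= 1
        value = self._items[self._index]
        self._index += 1
        return value

    def __length_hint__(self):
        # The hint is simply the number of items not yet handed out.
        return self._countdown


it = CountdownListIterator([1, 2, 3])
hint_at_start = operator.length_hint(it)
first_item = next(it)
hint_after_one = operator.length_hint(it)
remaining = list(it)
```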
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-03-08 17:59 +0000 |
| Message-ID | <513a26fa$0$30001$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #40696 |
On Wed, 06 Mar 2013 22:20:11 -0500, Roy Smith wrote:
> I stumbled upon an interesting bit of trivia concerning lists and list
> comprehensions today.
>
> We use mongoengine as a database model layer. A mongoengine query
> returns an iterable object called a QuerySet. The "obvious" way to
> create a list of the query results would be:
>
> my_objects = list(my_query_set)
>
> and, indeed, that works. But, then I found this code:
>
> my_objects = [obj for obj in my_query_set]
>
> which seemed a bit silly. I called over the guy who wrote it and asked
> him why he didn't just write it using list(). I was astounded when it
> turned out there's a good reason!

And why was that not documented in the code?

I hope you took this fellow out and gave him a damned good thrashing!

--
Steven
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-03-08 13:45 -0500 |
| Message-ID | <roy-0DC72C.13450208032013@70-1-84-166.pools.spcsdns.net> |
| In reply to | #40879 |
In article <513a26fa$0$30001$c3e8da3$5496439d@news.astraweb.com>,
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> On Wed, 06 Mar 2013 22:20:11 -0500, Roy Smith wrote:
>
> > I stumbled upon an interesting bit of trivia concerning lists and list
> > comprehensions today.
> >
> > We use mongoengine as a database model layer. A mongoengine query
> > returns an iterable object called a QuerySet. The "obvious" way to
> > create a list of the query results would be:
> >
> > my_objects = list(my_query_set)
> >
> > and, indeed, that works. But, then I found this code:
> >
> > my_objects = [obj for obj in my_query_set]
> >
> > which seemed a bit silly. I called over the guy who wrote it and asked
> > him why he didn't just write it using list(). I was astounded when it
> > turned out there's a good reason!
>
> And why was that not documented in the code?
>
> I hope you took this fellow out and gave him a damned good thrashing!

Nah. I'm a pacifist. Besides, he's my boss :-)
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-03-10 09:05 -0400 |
| Message-ID | <roy-87E0C0.09050910032013@70-1-84-166.pools.spcsdns.net> |
| In reply to | #40696 |
In article <roy-572C99.22201106032013@70-1-84-166.pools.spcsdns.net>,
Roy Smith <roy@panix.com> wrote:
> The problem is, QuerySets have a __len__() method. Calling it is a lot
> faster than iterating over the whole query set and counting the items,
> but it does result in an additional database query, which is a lot
> slower than the list resizing! Writing the code as a list comprehension
> prevents list() from trying to optimize when it shouldn't!
Hmmm, I think I've found a good solution.
It turns out, we don't actually use QuerySet in our models. We've
defined our own QuerySet subclass which adds a few convenience methods.
Adding
    def __len__(self):
        raise NotImplemented
to our subclass should do the job. It looks like list() respects that,
calls __iter__(), and does the right thing. I can't find any place
where that behavior for list() is documented, but it's logical and
experimentally, it seems to work.
Can anybody see any downside to this?
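One point worth checking before relying on this: `NotImplemented` is a singleton value intended to be returned from binary special methods, not an exception class (the exception is `NotImplementedError`). `raise NotImplemented` therefore fails with a TypeError of its own, and on CPython 3 a TypeError from `__len__` appears to be exactly what list() tolerates when it asks for a length. The sketch below assumes CPython 3, and `NoLen` is a hypothetical class reproducing the override:

```python
class NoLen:
    """Hypothetical class reproducing the __len__ override quoted above."""

    def __len__(self):
        raise NotImplemented   # not an exception class, so this raises TypeError

    def __iter__(self):
        return iter([1, 2, 3])


try:
    len(NoLen())
except TypeError as exc:
    len_error = type(exc).__name__
else:
    len_error = None

as_list = list(NoLen())   # the TypeError is treated as "length unknown"
```

Writing `raise TypeError("len() not supported")` instead would produce the same effect while stating the intent explicitly; either way this depends on an undocumented implementation detail.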
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-10 17:39 -0400 |
| Message-ID | <mailman.3167.1362951587.2939.python-list@python.org> |
| In reply to | #41014 |
On 3/10/2013 9:05 AM, Roy Smith wrote:
> In article <roy-572C99.22201106032013@70-1-84-166.pools.spcsdns.net>,
> Roy Smith <roy@panix.com> wrote:
>
>> The problem is, QuerySets have a __len__() method. Calling it is a lot
>> faster than iterating over the whole query set and counting the items,
>> but it does result in an additional database query, which is a lot
>> slower than the list resizing! Writing the code as a list comprehension
>> prevents list() from trying to optimize when it shouldn't!
>
> Hmmm, I think I've found a good solution.
>
> It turns out, we don't actually use QuerySet in our models. We've
> defined our own QuerySet subclass which adds a few convenience methods.
> Adding
>
>     def __len__(self):
>         raise NotImplemented
>
> to our subclass should do the job. It looks like list() respects that,
> calls __iter__(), and does the right thing. I can't find any place
> where that behavior for list() is documented,

It is a cpython implementation detail that probably has changed with the
versions.

> but it's logical and experimentally, it seems to work.
> Can anybody see any downside to this?

No. Give it a try.

--
Terry Jan Reedy
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-03-10 18:34 -0400 |
| Message-ID | <roy-9B5401.18345810032013@70-1-84-166.pools.spcsdns.net> |
| In reply to | #41038 |
In article <mailman.3167.1362951587.2939.python-list@python.org>,
Terry Reedy <tjreedy@udel.edu> wrote:
> > It turns out, we don't actually use QuerySet in our models. We've
> > defined our own QuerySet subclass which adds a few convenience methods.
> > Adding
> >
> >     def __len__(self):
> >         raise NotImplemented
> >
> > to our subclass should do the job. It looks like list() respects that,
> > calls __iter__(), and does the right thing. I can't find any place
> > where that behavior for list() is documented,
>
> It is a cpython implementation detail that probably has changed with the
> versions.
Yeah, that's what I was afraid of. The "obvious" solution of:
class QuerySet(mongoengine.queryset.QuerySet):
    def __init__(self, document, collection):
        super(QuerySet, self).__init__(document, collection)
        [...]
        del self.__len__
results in:
[rest of stack dump elided]
del self.__len__
AttributeError: __len__
which I don't understand.
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-03-10 23:35 +0000 |
| Message-ID | <513d18d6$0$6512$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #41044 |
On Sun, 10 Mar 2013 18:34:58 -0400, Roy Smith wrote:
> Yeah, that's what I was afraid of. The "obvious" solution of:
>
> class QuerySet(mongoengine.queryset.QuerySet):
>     def __init__(self, document, collection):
>         super(QuerySet, self).__init__(document, collection)
>         [...]
>         del self.__len__
>
> results in:
>
> [rest of stack dump elided]
> del self.__len__
> AttributeError: __len__
>
> which I don't understand.
You don't define a per-instance custom QuerySet attribute called __len__,
so you can't delete it from the instance.
The existing __len__ attribute is attached to the mongoengine
queryset.QuerySet class, not the instance. You could monkeypatch the
parent class, but that will probably break something else:
del mongoengine.queryset.QuerySet.__len__
or you could try over-riding __len__ with a fake that pretends it doesn't
exist:
    def __len__(self):
        raise AttributeError
Try that and see if it works. (It may not.)
--
Steven
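Both points can be checked with toy classes. The sketch below assumes CPython 3; `Base` and `Child` are hypothetical stand-ins for the mongoengine class and the local subclass. It shows that `__len__` lives in the class namespace, so there is nothing to delete on the instance, and that an AttributeError raised from `__len__` is not swallowed by list() on CPython 3 (only TypeError appears to be):

```python
class Base:
    def __len__(self):
        return 3

    def __iter__(self):
        return iter("abc")


class Child(Base):
    def __len__(self):
        raise AttributeError("pretending __len__ does not exist")


obj = Base()
in_instance_dict = "__len__" in vars(obj)   # nothing stored on the instance
in_class_dict = "__len__" in vars(Base)     # the method lives on the class

try:
    del obj.__len__
except AttributeError:
    delete_failed = True
else:
    delete_failed = False

try:
    list(Child())
except AttributeError:
    list_propagated = True   # the fake "missing" __len__ breaks list()
else:
    list_propagated = False
```

Older interpreters may have been more lenient here, so the outcome is worth verifying on the version actually in use.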