
Interesting list() un-optimization

Started by	Roy Smith <roy@panix.com>
First post	2013-03-06 22:20 -0500
Last post	2013-03-10 19:50 -0400
Articles	20 on this page of 21 — 10 participants


  Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-06 22:20 -0500
    Re: Interesting list() un-optimization Dave Angel <davea@davea.name> - 2013-03-06 22:38 -0500
    Re: Interesting list() un-optimization Tim Chase <python.list@tim.thechases.com> - 2013-03-06 21:57 -0600
    Re: Interesting list() un-optimization Kev Dwyer <kevin.p.dwyer@gmail.com> - 2013-03-07 07:31 +0000
    Re: Interesting list() un-optimization Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2013-03-07 11:22 +0000
    Re: Interesting list() un-optimization Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-07 09:00 -0700
    Re: Interesting list() un-optimization Christian Heimes <christian@python.org> - 2013-03-07 17:20 +0100
    Re: Interesting list() un-optimization Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-07 10:31 -0700
    Re: Interesting list() un-optimization Stefan Behnel <stefan_ml@behnel.de> - 2013-03-07 20:19 +0100
    Re: Interesting list() un-optimization Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-07 13:26 -0700
    Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-07 15:29 -0500
    Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-07 15:34 -0500
    Re: Interesting list() un-optimization Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2013-03-07 20:41 +0000
    Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-07 17:53 -0500
    Re: Interesting list() un-optimization Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-08 17:59 +0000
      Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-08 13:45 -0500
    Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-10 09:05 -0400
      Re: Interesting list() un-optimization Terry Reedy <tjreedy@udel.edu> - 2013-03-10 17:39 -0400
        Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-10 18:34 -0400
          Re: Interesting list() un-optimization Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-10 23:35 +0000
            Re: Interesting list() un-optimization Roy Smith <roy@panix.com> - 2013-03-10 19:50 -0400


#40696 — Interesting list() un-optimization

From	Roy Smith <roy@panix.com>
Date	2013-03-06 22:20 -0500
Subject	Interesting list() un-optimization
Message-ID	<roy-572C99.22201106032013@70-1-84-166.pools.spcsdns.net>

I stumbled upon an interesting bit of trivia concerning lists and list 
comprehensions today.

We use mongoengine as a database model layer.  A mongoengine query 
returns an iterable object called a QuerySet.  The "obvious" way to 
create a list of the query results would be:

    my_objects = list(my_query_set)

and, indeed, that works.  But, then I found this code:

   my_objects = [obj for obj in my_query_set]

which seemed a bit silly.  I called over the guy who wrote it and asked 
him why he didn't just write it using list().  I was astounded when it 
turned out there's a good reason!

Apparently, list() has an "optimization" where it calls len() on its 
argument to try and discover the number of items it's going to put into 
the list.  Presumably, list() uses this information to pre-allocate the 
right amount of memory the first time, without any resizing.  If len() 
fails, it falls back to just iterating and resizing as needed.  
Normally, this would be a win.

The problem is, QuerySets have a __len__() method.  Calling it is a lot 
faster than iterating over the whole query set and counting the items, 
but it does result in an additional database query, which is a lot 
slower than the list resizing!  Writing the code as a list comprehension 
prevents list() from trying to optimize when it shouldn't!
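
The difference is easy to observe with a stand-in class. This is only a sketch, assuming CPython: `CountingResults` is a hypothetical class (not mongoengine's QuerySet) that counts `__len__` calls in place of the extra database query.

```python
class CountingResults:
    """Hypothetical stand-in for a query result set; counts __len__ calls."""

    def __init__(self, items):
        self._items = list(items)
        self.len_calls = 0

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        # In the real case this is where the extra database query would run.
        self.len_calls += 1
        return len(self._items)


via_list = CountingResults(range(5))
from_list = list(via_list)              # list() may ask for the size up front

via_comp = CountingResults(range(5))
from_comp = [obj for obj in via_comp]   # a comprehension only iterates

print("list():        __len__ calls =", via_list.len_calls)
print("comprehension: __len__ calls =", via_comp.len_calls)
```

Both spellings build the same list; only the number of `__len__` calls differs.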


#40697

From	Dave Angel <davea@davea.name>
Date	2013-03-06 22:38 -0500
Message-ID	<mailman.2975.1362627544.2939.python-list@python.org>
In reply to	#40696

On 03/06/2013 10:20 PM, Roy Smith wrote:
> I stumbled upon an interesting bit of trivia concerning lists and list
> comprehensions today.
>
> We use mongoengine as a database model layer.  A mongoengine query
> returns an iterable object called a QuerySet.  The "obvious" way to
> create a list of the query results would be:
>
>      my_objects = list(my_query_set)
>
> and, indeed, that works.  But, then I found this code:
>
>     my_objects = [obj for obj in my_query_set]
>
> which seemed a bit silly.  I called over the guy who wrote it and asked
> him why he didn't just write it using list().  I was astounded when it
> turned out there's a good reason!
>
> Apparently, list() has an "optimization" where it calls len() on its
> argument to try and discover the number of items it's going to put into
> the list.  Presumably, list() uses this information to pre-allocate the
> right amount of memory the first time, without any resizing.  If len()
> fails, it falls back to just iterating and resizing as needed.
> Normally, this would be a win.
>
> The problem is, QuerySets have a __len__() method.  Calling it is a lot
> faster than iterating over the whole query set and counting the items,
> but it does result in an additional database query, which is a lot
> slower than the list resizing!  Writing the code as a list comprehension
> prevents list() from trying to optimize when it shouldn't!
>

That is very interesting.  list() assumes the __len__() method would be 
very quick.

Perhaps list() should take an optional second argument that specifies 
the initial length to allocate.  That way code that either doesn't want 
__len__() to be used, or that already knows a reasonable number to use, 
can supply the value to preallocate.

-- 
DaveA


#40699

From	Tim Chase <python.list@tim.thechases.com>
Date	2013-03-06 21:57 -0600
Message-ID	<mailman.2976.1362628560.2939.python-list@python.org>
In reply to	#40696

On 2013-03-06 22:20, Roy Smith wrote:
> I stumbled upon an interesting bit of trivia concerning lists and
> list comprehensions today.

I agree with Dave Angel that this is interesting.  A little testing
shows that this can be rewritten as

  my_objects = list(iter(my_query_set))

which seems to then skip the costly __len__ call.  Performance geeks
are welcome to time it against the list-comprehension version :-)

-tkc


# Written with print() calls so it runs on both Python 2 and Python 3.
class Foo(object):
    def __init__(self):
        self.items = range(10)
    def __iter__(self):
        return iter(self.items)
    def __len__(self):
        print("Calling costly __len__")
        return len(self.items)

print("Ensuring we can iterate over it:")
for x in Foo():
    print(x)

print("\nJust call list():")
lst = list(Foo())
print(lst)

print("\nCall list(iter())")
lst = list(iter(Foo()))
print(lst)


#40715

From	Kev Dwyer <kevin.p.dwyer@gmail.com>
Date	2013-03-07 07:31 +0000
Message-ID	<mailman.2985.1362641500.2939.python-list@python.org>
In reply to	#40696

Roy Smith wrote:

> I stumbled upon an interesting bit of trivia concerning lists and list
> comprehensions today.
> 
> We use mongoengine as a database model layer.  A mongoengine query
> returns an iterable object called a QuerySet.  The "obvious" way to
> create a list of the query results would be:
> 
>     my_objects = list(my_query_set)
> 
> and, indeed, that works.  But, then I found this code:
> 
>    my_objects = [obj for obj in my_query_set]
> 
> which seemed a bit silly.  I called over the guy who wrote it and asked
> him why he didn't just write it using list().  I was astounded when it
> turned out there's a good reason!
> 
> Apparently, list() has an "optimization" where it calls len() on its
> argument to try and discover the number of items it's going to put into
> the list.  Presumably, list() uses this information to pre-allocate the
> right amount of memory the first time, without any resizing.  If len()
> fails, it falls back to just iterating and resizing as needed.
> Normally, this would be a win.
> 
> The problem is, QuerySets have a __len__() method.  Calling it is a lot
> faster than iterating over the whole query set and counting the items,
> but it does result in an additional database query, which is a lot
> slower than the list resizing!  Writing the code as a list comprehension
> prevents list() from trying to optimize when it shouldn't!


Interesting discovery.  Yet isn't this as much an issue with the mongoengine 
library as with list()?  Queryset.count() can be called if the "length" of a 
resultset needs to be retrieved, so the __len__() method seems redundant.  
And given that it's not unheard of to call list() on iterables, perhaps the 
library designers should either optimise the __len__() method, or document 
the performance implications of calling list on the queryset? 

Anyway, thanks for this thought-provoking post.

Cheers,

Kev


#40748

From	Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de>
Date	2013-03-07 11:22 +0000
Message-ID	<mailman.3008.1362655396.2939.python-list@python.org>
In reply to	#40696

Tim Chase <python.list <at> tim.thechases.com> writes:

> On 2013-03-06 22:20, Roy Smith wrote:
> > I stumbled upon an interesting bit of trivia concerning lists and
> > list comprehensions today.
> 
> A little testing
> shows that this can be rewritten as
> 
>   my_objects = list(iter(my_query_set))
> 
> which seems to then skip the costly __len__ call.  Performance geeks
> are welcome to time it against the list-comprehension version 
> 
> class Foo(object):
>     def __init__(self):
>         self.items = range(10)
>     def __iter__(self):
>         return iter(self.items)
>     def __len__(self):
>         print "Calling costly __len__"
>         return len(self.items)
> 

Well, it skips the costly len() call because your iter(Foo()) returns
iter(range()) under the hood and list() uses that object's __len__() method. In
most cases, such a workaround will not be feasible. Why should iter(QuerySet())
have a faster __len__() method defined than QuerySet() itself? Most likely,
iter(QuerySet()) just returns self anyway?
Best,
Wolfgang


#40793

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-03-07 09:00 -0700
Message-ID	<mailman.3035.1362672083.2939.python-list@python.org>
In reply to	#40696

On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier
<wolfgang.maier@biologie.uni-freiburg.de> wrote:
> Well, it skips the costly len() call because your iter(Foo()) returns
> iter(range()) under the hood and list() uses that object's __len__() method.

Iterators do not generally have __len__ methods.

>>> len(iter(range(10)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'range_iterator' has no len()

> In
> most cases, such a workaround will not be feasible. Why should iter(QuerySet())
> have a faster __len__() method defined than QuerySet() itself.

iter(QuerySet()) should not have any __len__ method defined at all,
which is why the optimization would be skipped.

> Most likely,
> iter(QuerySet()) just returns self anyway?

But on this point, you are correct.  The mongoengine QuerySet.__iter__
method is defined as:

    def __iter__(self):
        self.rewind()
        return self

This is unfortunate design.  Not only does it mean that the iterator's
__len__ method cannot be trusted (what should the __len__ of a
partially exhausted iterator return?), but it also means that requesting
an iterator over the QuerySet will also silently invalidate any
existing iterators.
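
The invalidation problem can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical `RewindingResults` class that copies only the rewind-and-return-self pattern quoted above; it is not mongoengine code.

```python
class RewindingResults:
    """Hypothetical result set whose __iter__ rewinds and returns self."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def rewind(self):
        self._pos = 0

    def __iter__(self):
        self.rewind()
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item


results = RewindingResults("abc")

first = iter(results)
consumed = [next(first), next(first)]   # 'a', then 'b'

second = iter(results)                  # same object, cursor reset to the start
after_rewind = next(first)              # 'first' starts over instead of giving 'c'

print(consumed, second is first, after_rewind)
```

Because there is only one cursor, asking for a "second" iterator resets the position seen through the first one.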


#40799

From	Christian Heimes <christian@python.org>
Date	2013-03-07 17:20 +0100
Message-ID	<mailman.3038.1362673271.2939.python-list@python.org>
In reply to	#40696

Am 07.03.2013 17:00, schrieb Ian Kelly:
> On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier
> <wolfgang.maier@biologie.uni-freiburg.de> wrote:
>> Well, it skips the costly len() call because your iter(Foo()) returns
>> iter(range()) under the hood and list() uses that object's __len__() method.
> 
> Iterators do not generally have __len__ methods.
> 
>>>> len(iter(range(10)))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: object of type 'range_iterator' has no len()

But iterators have a length hint method that is used for some
optimizations and preallocations, too.

>>> i = iter(range(10))
>>> i.__length_hint__()
10

See http://www.python.org/dev/peps/pep-0424/

Christian
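
For code that wants the hint without calling the dunder method directly, the PEP also added `operator.length_hint()`. A short sketch, assuming Python 3.4 or later; the variable names are only illustrative.

```python
import operator

it = iter(range(10))
hint_start = operator.length_hint(it)

next(it)
next(it)
hint_later = operator.length_hint(it)    # the hint shrinks as items are consumed

# Generators provide neither __len__ nor __length_hint__, so the
# caller-supplied default (4 here) comes back instead.
gen = (n * n for n in range(10))
hint_gen = operator.length_hint(gen, 4)

print(hint_start, hint_later, hint_gen)  # prints: 10 8 4
```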


#40804

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-03-07 10:31 -0700
Message-ID	<mailman.3041.1362677547.2939.python-list@python.org>
In reply to	#40696

On Thu, Mar 7, 2013 at 9:20 AM, Christian Heimes <christian@python.org> wrote:
> Am 07.03.2013 17:00, schrieb Ian Kelly:
>> On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier
>> <wolfgang.maier@biologie.uni-freiburg.de> wrote:
>>> Well, it skips the costly len() call because your iter(Foo()) returns
>>> iter(range()) under the hood and list() uses that object's __len__() method.
>>
>> Iterators do not generally have __len__ methods.
>>
>>>>> len(iter(range(10)))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: object of type 'range_iterator' has no len()
>
> But iterators have a length hint method that are used for some
> optimizations and preallocations, too.
>
>>>> i = iter(range(10))
>>>> i.__length_hint__()
> 10
>
> See http://www.python.org/dev/peps/pep-0424/

Didn't know about that, thanks.  Presumably a proper iter(QuerySet())
object could implement __length_hint__ in an efficient manner rather
than by just calling the __len__ of the underlying QuerySet.


#40811

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2013-03-07 20:19 +0100
Message-ID	<mailman.3046.1362683962.2939.python-list@python.org>
In reply to	#40696

Ian Kelly, 07.03.2013 18:31:
> On Thu, Mar 7, 2013 at 9:20 AM, Christian Heimes wrote:
>> Am 07.03.2013 17:00, schrieb Ian Kelly:
>>> On Thu, Mar 7, 2013 at 4:22 AM, Wolfgang Maier wrote:
>>>> Well, it skips the costly len() call because your iter(Foo()) returns
>>>> iter(range()) under the hood and list() uses that object's __len__() method.
>>>
>>> Iterators do not generally have __len__ methods.
>>>
>>> >>> len(iter(range(10)))
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> TypeError: object of type 'range_iterator' has no len()
>>
>> But iterators have a length hint method that are used for some
>> optimizations and preallocations, too.
>>
>> >>> i = iter(range(10))
>> >>> i.__length_hint__()
>> 10
>>
>> See http://www.python.org/dev/peps/pep-0424/
> 
> Didn't know about that, thanks.  Presumably a proper iter(QuerySet())
> object could implement __length_hint__ in an efficient manner rather
> than by just calling the __len__ of the underlying QuerySet,

And how exactly would it do that, without either doing what __len__ does or
reading the whole result set into memory?

Stefan


#40819

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-03-07 13:26 -0700
Message-ID	<mailman.3051.1362688062.2939.python-list@python.org>
In reply to	#40696

On Thu, Mar 7, 2013 at 12:19 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
>> Didn't know about that, thanks.  Presumably a proper iter(QuerySet())
>> object could implement __length_hint__ in an efficient manner rather
>> than by just calling the __len__ of the underlying QuerySet,
>
> And how exactly would it do that, without either doing what __len__ does or
> reading the whole result set into memory?

If the underlying cursor provides its own efficient length hint, it
could return that.  Or if the query is result-limited, use the limit
as a length hint, provided it's not absurdly large.  And if you really
can't efficiently determine anything about the length of the result
set at all, you can always fall back on returning NotImplemented.
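
A rough sketch of that idea, under stated assumptions: `CursorIterator` and its `limit` parameter are hypothetical and do not correspond to any mongoengine or pymongo API. The hint is the query limit when one is set, and NotImplemented otherwise, which PEP 424 allows.

```python
import itertools
import operator


class CursorIterator:
    """Hypothetical iterator over query results with an optional limit."""

    def __init__(self, rows, limit=None):
        self._rows = itertools.islice(rows, limit)
        self._remaining = limit

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self._rows)            # StopIteration propagates when done
        if self._remaining is not None:
            self._remaining -= 1
        return row

    def __length_hint__(self):
        if self._remaining is None:
            return NotImplemented         # nothing cheap to report
        # Only an upper bound: the query may return fewer rows than the limit.
        return self._remaining


limited = CursorIterator(range(100), limit=3)
unlimited = CursorIterator(range(5))

print(operator.length_hint(limited))        # the limit serves as the hint
print(operator.length_hint(unlimited, 8))   # no hint, so the default is used
print(list(limited))
```

No extra query is needed in either case, since the hint comes from information the iterator already holds.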


#40820

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-07 15:29 -0500
Message-ID	<mailman.3052.1362688168.2939.python-list@python.org>
In reply to	#40696

On 3/7/2013 11:00 AM, Ian Kelly wrote:

> But on this point, you are correct.  The mongoengine QuerySet.__iter__
> method is defined as:
>
>      def __iter__(self):
>          self.rewind()
>          return self
>
> This is unfortunate design.  Not only does it mean that the iterator's
> __len__ method cannot be trusted (what should the __len__ of a
> partially exhausted iterator return?), but it also means that requesting
> an iterator over the QuerySet will also silently invalidate any
> existing iterators.

I view that design as a violation of the iterator protocol and hence a 
program bug. __iter__ should either *just* return self (if the self is 
an iterator) or return a new object (if self is a non-iterator 
iterable). File objects are iterators and .__iter__ does not rewind.

 >>> f = open("f:/python/mypy/tem.py")
 >>> next(f)
'class myit(list):\n'
 >>> f2 = iter(f)
 >>> f2 is f
True
 >>> next(f2)
"    def __bytes__(self): return b'hello'\n"

-- 
Terry Jan Reedy
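
The protocol described above can be sketched with a non-iterator iterable that hands out a fresh iterator on every call. `ResultSet` here is a hypothetical class, used only to illustrate the two rules; it is not how mongoengine is implemented.

```python
class ResultSet:
    """Hypothetical non-iterator iterable: __iter__ returns a new object."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        # A generator function: each call yields an independent iterator.
        for item in self._items:
            yield item


results = ResultSet("abc")

first = iter(results)
second = iter(results)

seen = [next(first), next(first), next(second)]

print(seen)                    # the second iterator starts at 'a' on its own
print(iter(first) is first)    # an iterator's __iter__ just returns itself
```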


#40821

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-07 15:34 -0500
Message-ID	<mailman.3053.1362688500.2939.python-list@python.org>
In reply to	#40696

On 3/7/2013 11:20 AM, Christian Heimes wrote:

> But iterators have a length hint method that are used for some
> optimizations and preallocations, too.

This is easy when the base iterable has a length method, as do range 
objects.

>>>> i = iter(range(10))
>>>> i.__length_hint__()
> 10

And the length_hint can (and should) be decremented with each next call.

 >>> next(i); next(i)
0
1
 >>> i.__length_hint__()
8

-- 
Terry Jan Reedy


#40823

From	Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de>
Date	2013-03-07 20:41 +0000
Message-ID	<mailman.3055.1362688919.2939.python-list@python.org>
In reply to	#40696

> >>> Iterators do not generally have __len__ methods.
> >>>
> >>> >>> len(iter(range(10)))
> >>> Traceback (most recent call last):
> >>>   File "<stdin>", line 1, in <module>
> >>> TypeError: object of type 'range_iterator' has no len()
> >>
> >> But iterators have a length hint method that are used for some
> >> optimizations and preallocations, too.
> >>
> >> >>> i = iter(range(10))
> >> >>> i.__length_hint__()
> >> 10
> >>
> >> See http://www.python.org/dev/peps/pep-0424/
> >

very interesting (hadn't heard of it)! Just checked the PEP,
then tested list()'s behavior, and it is just as described:

class stupid(list):
	def __len__(self):
		print ('len() called')
		return NotImplemented
	
	def __length_hint__(self):
		print ('hint requested')
		l=iter(self).__length_hint__()
		print (l)
		return l

d=stupid((1,2,3))
len(d)
======>
  len() called

  Traceback (most recent call last):
    File "<pyshell#79>", line 1, in <module>
      len(d)
  TypeError: an integer is required

list(d)
======>
  len() called
  hint requested
  3
  [1, 2, 3]

so list() first tries to call the iterable's __len__ method. If that raises a
TypeError, it falls back to __length_hint__.
What I still don't know is how the listiterator object's __length_hint__ works.
Why, in this case, does it know that it has a length of 3? The PEP does not
provide any hint how a reasonable hint could be calculated.

> And how exactly would it do that, without either doing what __len__ does or
> reading the whole result set into memory?
> 
> Stefan
> 

a very good question.

Best,
Wolfgang


#40832

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-07 17:53 -0500
Message-ID	<mailman.3060.1362696834.2939.python-list@python.org>
In reply to	#40696

On 3/7/2013 3:41 PM, Wolfgang Maier wrote:
>
>>>>> Iterators do not generally have __len__ methods.
>>>>>
>>>>>>>> len(iter(range(10)))
>>>>> Traceback (most recent call last):
>>>>>    File "<stdin>", line 1, in <module>
>>>>> TypeError: object of type 'range_iterator' has no len()
>>>>
>>>> But iterators have a length hint method that are used for some
>>>> optimizations and preallocations, too.
>>>>
>>>>>>> i = iter(range(10))
>>>>>>> i.__length_hint__()
>>>> 10
>>>>
>>>> See http://www.python.org/dev/peps/pep-0424/
>>>
>
> very interesting (hadn't heard of it)! Just checked the PEP,
> then tested list()'s behavior, and it is just as described:
>
> class stupid(list):
> 	def __len__(self):
> 		print ('len() called')
> 		return NotImplemented
> 	
> 	def __length_hint__(self):
> 		print ('hint requested')
> 		l=iter(self).__length_hint__()
> 		print (l)
> 		return l
>
> a=stupid((1,2,3))
> len(d)
> ======>
>    len() called
>
>    Traceback (most recent call last):
>      File "<pyshell#79>", line 1, in <module>
>        len(d)
>    TypeError: an integer is required
>
> list(d)
> ======>
>    len() called
>    hint requested
>    3
>    [1, 2, 3]
>
> so list() first tries to call the iterable's __len__ method. If that raises a
> TypeError it falls back to __length_hint__ .
> What I still don't know is how the listiterator object's __length_hint__ works.
> Why, in this case, does it know that it has a length of 3 ? The PEP does not
> provide any hint how a reasonable hint could be calculated.

'How' depends on the iterator, but when it has an underlying concrete 
iterable of known length, it should be rather trivial as .__next__ will 
explicitly or implicitly use the count remaining in its operation. Part 
of the justification of adding __length_hint__ is that in many cases it 
is so easy. Iterators based on an iterator with a length_hint can just 
pass it along.

The list iterator might work with a C pointer to the next item and a 
countdown count initialized to the list length. The __next__ method 
might be something like this mixed C and Python pseudocode:

   if (countdown--) return *(current++);
   else raise StopIteration

(For non-C coders, postfix -- and ++ change the value *after* retrieving it.)
Then __length_hint__ would just return countdown. Tuples would work the 
same way. Sets and dicts would need current++ elaborated to skip over 
empty hash table slots. Range iterators would add the step to current, 
instead of 1.

-- 
Terry Jan Reedy
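
The countdown idea translates to pure Python roughly as follows. This is a sketch only: `CountdownIterator` is a hypothetical class that mimics the described scheme, not CPython's actual list iterator.

```python
import operator


class CountdownIterator:
    """Hypothetical sequence iterator that tracks how many items remain."""

    def __init__(self, seq):
        self._seq = seq
        self._index = 0                  # plays the role of the C pointer
        self._remaining = len(seq)       # the countdown

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining == 0:
            raise StopIteration
        item = self._seq[self._index]
        self._index += 1
        self._remaining -= 1
        return item

    def __length_hint__(self):
        return self._remaining           # already maintained by __next__


it = CountdownIterator((1, 2, 3))
hint_before = operator.length_hint(it)
first_item = next(it)
hint_after = operator.length_hint(it)
rest = list(it)

print(hint_before, first_item, hint_after, rest)
```

The hint costs nothing extra because `__next__` has to track the remaining count anyway.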


#40879

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-03-08 17:59 +0000
Message-ID	<513a26fa$0$30001$c3e8da3$5496439d@news.astraweb.com>
In reply to	#40696

On Wed, 06 Mar 2013 22:20:11 -0500, Roy Smith wrote:

> I stumbled upon an interesting bit of trivia concerning lists and list
> comprehensions today.
> 
> We use mongoengine as a database model layer.  A mongoengine query
> returns an iterable object called a QuerySet.  The "obvious" way to
> create a list of the query results would be:
> 
>     my_objects = list(my_query_set)
> 
> and, indeed, that works.  But, then I found this code:
> 
>    my_objects = [obj for obj in my_query_set]
> 
> which seemed a bit silly.  I called over the guy who wrote it and asked
> him why he didn't just write it using list().  I was astounded when it
> turned out there's a good reason!

And why was that not documented in the code?

I hope you took this fellow out and gave him a damned good thrashing!



-- 
Steven


#40882

From	Roy Smith <roy@panix.com>
Date	2013-03-08 13:45 -0500
Message-ID	<roy-0DC72C.13450208032013@70-1-84-166.pools.spcsdns.net>
In reply to	#40879

In article <513a26fa$0$30001$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> On Wed, 06 Mar 2013 22:20:11 -0500, Roy Smith wrote:
> 
> > I stumbled upon an interesting bit of trivia concerning lists and list
> > comprehensions today.
> > 
> > We use mongoengine as a database model layer.  A mongoengine query
> > returns an iterable object called a QuerySet.  The "obvious" way to
> > create a list of the query results would be:
> > 
> >     my_objects = list(my_query_set)
> > 
> > and, indeed, that works.  But, then I found this code:
> > 
> >    my_objects = [obj for obj in my_query_set]
> > 
> > which seemed a bit silly.  I called over the guy who wrote it and asked
> > him why he didn't just write it using list().  I was astounded when it
> > turned out there's a good reason!
> 
> And why was that not documented in the code?
> 
> I hope you took this fellow out and gave him a damned good thrashing!

Nah.  I'm a pacifist.  Besides, he's my boss :-)


#41014

From	Roy Smith <roy@panix.com>
Date	2013-03-10 09:05 -0400
Message-ID	<roy-87E0C0.09050910032013@70-1-84-166.pools.spcsdns.net>
In reply to	#40696

In article <roy-572C99.22201106032013@70-1-84-166.pools.spcsdns.net>,
 Roy Smith <roy@panix.com> wrote:

> The problem is, QuerySets have a __len__() method.  Calling it is a lot 
> faster than iterating over the whole query set and counting the items, 
> but it does result in an additional database query, which is a lot 
> slower than the list resizing!  Writing the code as a list comprehension 
> prevents list() from trying to optimize when it shouldn't!

Hmmm, I think I've found a good solution.

It turns out, we don't actually use QuerySet in our models.  We've 
defined our own QuerySet subclass which adds a few convenience methods.  
Adding

    def __len__(self):
        raise NotImplemented

to our subclass should do the job.  It looks like list() respects that, 
calls __iter__(), and does the right thing.  I can't find any place 
where that behavior for list() is documented, but it's logical and 
experimentally, it seems to work.

Can anybody see any downside to this?
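
One detail worth checking: NotImplemented is a constant, not an exception class, so `raise NotImplemented` itself fails with a TypeError. The sketch below assumes Python 3 on CPython and uses a hypothetical `NoLenResults` class; it raises TypeError explicitly, which states the intent more directly.

```python
# Raising the NotImplemented constant is itself an error.
try:
    raise NotImplemented
except TypeError as exc:
    raised = type(exc).__name__
print("raise NotImplemented produced:", raised)


class NoLenResults:
    """Hypothetical QuerySet stand-in whose __len__ refuses to answer."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        raise TypeError("length is not cheaply available")


results = NoLenResults([1, 2, 3])

try:
    len(results)
    len_worked = True
except TypeError:
    len_worked = False

# On CPython, list() treats a TypeError from __len__ as "size unknown"
# and falls back to plain iteration. This is an implementation detail.
as_list = list(results)
print(len_worked, as_list)
```

The trade-off is that any other code calling len() on the object now gets a TypeError as well.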


#41038

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-10 17:39 -0400
Message-ID	<mailman.3167.1362951587.2939.python-list@python.org>
In reply to	#41014

On 3/10/2013 9:05 AM, Roy Smith wrote:
> In article <roy-572C99.22201106032013@70-1-84-166.pools.spcsdns.net>,
>   Roy Smith <roy@panix.com> wrote:
>
>> The problem is, QuerySets have a __len__() method.  Calling it is a lot
>> faster than iterating over the whole query set and counting the items,
>> but it does result in an additional database query, which is a lot
>> slower than the list resizing!  Writing the code as a list comprehension
>> prevents list() from trying to optimize when it shouldn't!
>
> Hmmm, I think I've found a good solution.
>
> It turns out, we don't actually use QuerySet in our models.  We've
> defined our own QuerySet subclass which adds a few convenience methods.
> Adding
>
>      def __len__(self):
>          raise NotImplemented
>
> to our subclass should do the job.  It looks like list() respects that,
> calls __iter__(), and does the right thing.  I can't find any place
> where that behavior for list() is documented,

It is a cpython implementation detail that probably has changed with the 
versions.

> but it's logical and experimentally, it seems to work.

> Can anybody see any downside to this?

No. Give it a try.


-- 
Terry Jan Reedy


#41044

From	Roy Smith <roy@panix.com>
Date	2013-03-10 18:34 -0400
Message-ID	<roy-9B5401.18345810032013@70-1-84-166.pools.spcsdns.net>
In reply to	#41038

In article <mailman.3167.1362951587.2939.python-list@python.org>,
 Terry Reedy <tjreedy@udel.edu> wrote:

> > It turns out, we don't actually use QuerySet in our models.  We've
> > defined our own QuerySet subclass which adds a few convenience methods.
> > Adding
> >
> >      def __len__(self):
> >          raise NotImplemented
> >
> > to our subclass should do the job.  It looks like list() respects that,
> > calls __iter__(), and does the right thing.  I can't find any place
> > where that behavior for list() is documented,
> 
> It is a cpython implementation detail that probably has changed with the 
> versions.

Yeah, that's what I was afraid of.  The "obvious" solution of:

class QuerySet(mongoengine.queryset.QuerySet):
    def __init__(self, document, collection):
        super(QuerySet, self).__init__(document, collection)
        [...]
        del self.__len__

results in:

[rest of stack dump elided]
    del self.__len__
AttributeError: __len__

which I don't understand.


#41046

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-03-10 23:35 +0000
Message-ID	<513d18d6$0$6512$c3e8da3$5496439d@news.astraweb.com>
In reply to	#41044

On Sun, 10 Mar 2013 18:34:58 -0400, Roy Smith wrote:

> Yeah, that's what I was afraid of.  The "obvious" solution of:
> 
> class QuerySet(mongoengine.queryset.QuerySet):
>     def __init__(self, document, collection):
>         super(QuerySet, self).__init__(document, collection) [...]
>         del self.__len__
> 
> results in:
> 
> [rest of stack dump elided]
>     del self.__len__
> AttributeError: __len__
> 
> which I don't understand.

You don't define a per-instance custom QuerySet attribute called __len__, 
so you can't delete it from the instance.

The existing __len__ attribute is attached to the mongoengine 
queryset.QuerySet class, not the instance. You could monkeypatch the 
parent class, but that will probably break something else:

    del mongoengine.queryset.QuerySet.__len__

or you could try over-riding __len__ with a fake that pretends it doesn't 
exist:

def __len__(self):
    raise AttributeError

Try that and see if it works. (It may not.)

-- 
Steven
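
The AttributeError can be reproduced without mongoengine. A minimal sketch, assuming Python 3 and two hypothetical classes, `Base` and `Child`, standing in for the library class and the local subclass:

```python
class Base:
    def __len__(self):
        return 3


class Child(Base):
    pass


obj = Child()

try:
    del obj.__len__          # nothing named __len__ is stored on the instance
    deleted = True
except AttributeError:
    deleted = False

on_instance = "__len__" in vars(obj)
on_class = "__len__" in vars(Base)

obj.__len__ = lambda: 99     # an instance attribute with that name...
length = len(obj)            # ...is ignored: len() looks the method up on the type

print(deleted, on_instance, on_class, length)
```

Since special methods are looked up on the type, changing what len() does requires changing a class, not an instance.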

