Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #61672 > unrolled thread

min max from tuples in list

Started byRobert Voigtländer <r.voigtlaender@gmail.com>
First post2013-12-11 23:25 -0800
Last post2013-12-13 01:33 +0000
Articles 20 on this page of 24 — 15 participants

Back to article view | Back to comp.lang.python


Contents

  min max from tuples in list Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-11 23:25 -0800
    Re: min max from tuples in list Chris Angelico <rosuav@gmail.com> - 2013-12-12 19:18 +1100
    Re: min max from tuples in list Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-12 00:34 -0800
      Re: min max from tuples in list Chris Angelico <rosuav@gmail.com> - 2013-12-12 19:43 +1100
    Re: min max from tuples in list Peter Otten <__peter__@web.de> - 2013-12-12 09:35 +0100
    Re: min max from tuples in list Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-12-12 10:52 +0200
    Re: min max from tuples in list Peter Otten <__peter__@web.de> - 2013-12-12 10:03 +0100
    Re: min max from tuples in list Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 11:44 +0000
      Re: min max from tuples in list Tim Chase <python.list@tim.thechases.com> - 2013-12-12 06:04 -0600
      Re: min max from tuples in list MRAB <python@mrabarnett.plus.com> - 2013-12-12 12:36 +0000
        Re: min max from tuples in list Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 23:25 +0000
      Re: min max from tuples in list Peter Otten <__peter__@web.de> - 2013-12-12 13:54 +0100
        Re: min max from tuples in list Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-13 02:36 +0000
      Re: min max from tuples in list Roy Smith <roy@panix.com> - 2013-12-12 10:02 -0500
        Re: min max from tuples in list Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-12 15:13 +0000
          Re: min max from tuples in list Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-12 22:28 -0800
            Re: min max from tuples in list rusi <rustompmody@gmail.com> - 2013-12-13 10:06 -0800
              Re: min max from tuples in list Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-12-13 19:30 -0500
                Re: min max from tuples in list Tim Roberts <timr@probo.com> - 2013-12-14 19:41 -0800
                  Re: min max from tuples in list Chris Angelico <rosuav@gmail.com> - 2013-12-16 23:08 +1100
                  Re: min max from tuples in list rusi <rustompmody@gmail.com> - 2013-12-16 07:49 -0800
                    Re: min max from tuples in list Ned Batchelder <ned@nedbatchelder.com> - 2013-12-16 10:59 -0500
                      Re: min max from tuples in list Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-12-17 11:49 +1300
    Re: min max from tuples in list Denis McMahon <denismfmcmahon@gmail.com> - 2013-12-13 01:33 +0000

Page 1 of 2  [1] 2  Next page →


#61672 — min max from tuples in list

FromRobert Voigtländer <r.voigtlaender@gmail.com>
Date2013-12-11 23:25 -0800
Subjectmin max from tuples in list
Message-ID<f78a11dc-efdd-4108-8d1f-59386f020fd0@googlegroups.com>
Hi,

I have a list like this:

a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190), (51, 189), (51, 188), (50, 194), (50, 187), (50, 186), (50, 185), (50, 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194), (48, 180), (48, 179), (48, 178), (48, 177), (47, 194), (47, 176), (47, 175), (47, 174), (47, 173), (46, 195), (46, 172), (46, 171), (46, 170), (46, 169), (45, 195), (45, 168), (45, 167), (45, 166), (44, 195), (44, 165), (44, 164), (44, 163), (44, 162), (43, 195), (43, 161), (43, 160), (43, 159), (43, 158), (42, 196), (42, 157), (42, 156), (42, 155), (41, 196), (41, 154), (41, 153), (41, 152), (41, 151), (40, 196), (40, 150), (40, 149), (40, 148), (40, 147), (39, 196), (39, 146), (39, 145), (39, 144), (39, 143), (38, 196), (38, 142), (38, 141), (38, 140), (37, 197), (37, 139), (37, 138), (37, 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)]


I need to find a -performant- way to transform this into a list with tuples (a[0],[a[0][1]min],[a[0][1]max]).

Hard to explaint what I mean .. [0] of the first three tuples is 52. [1] is 193,193 and 192.
What I need as result for these three tuples is: (52,192,193).

For the next five tuples it is (51,188,193).


Extra challenges:
- This list is sorted. For performance reasons I would like to keep it unsorted.
- There may be tuples where min=max.
- There my be tupples where [0] only exists once. So mix is automatically max


I hope I was able to explain what I mean.

Thanks
Robert

[toc] | [next] | [standalone]


#61678

FromChris Angelico <rosuav@gmail.com>
Date2013-12-12 19:18 +1100
Message-ID<mailman.3969.1386836325.18130.python-list@python.org>
In reply to#61672
On Thu, Dec 12, 2013 at 6:25 PM, Robert Voigtländer
<r.voigtlaender@gmail.com> wrote:
> I need to find a -performant- way to transform this into a list with tuples (a[0],[a[0][1]min],[a[0][1]max]).
>
> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1] is 193,193 and 192.
> What I need as result for these three tuples is: (52,192,193).
>
> For the next five tuples it is (51,188,193).
>
>
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it unsorted.
> - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is automatically max

Yep, I see what you mean! Apart from the first of the challenges,
which is ambiguous: do you mean you'd rather be able to work with it
unsorted, or is that a typo, "keep it sorted"?

This is a common task of aggregation. Your list is of (key, value)
tuples, and you want to do some per-key statistics. Here are three
variants on the code:

# Fastest version, depends on the keys being already grouped
# and the values sorted within each group. It actually returns
# the last and first, not the smallest and largest.
def min_max_1(lst):
    prev_key = None
    for key, value in lst:
        if key != prev_key:
            if prev_key is not None: yield prev_key, value, key_max
            key_max = value
    if prev_key is not None: yield prev_key, value, key_max

# This version depends on the keys being grouped, but
# not on them being sorted within the groups.
def min_max_2(lst):
    prev_key = None
    for key, value in lst:
        if key != prev_key:
            if prev_key is not None: yield prev_key, key_min, key_max
            key_min = key_max = value
        else:
            key_min = min(key_min, value)
            key_max = min(key_max, value)
    if prev_key is not None: yield prev_key, key_min, key_max

# Slowest version, does not depend on either the keys
# or the values being sorted. Will iterate over the entire
# list before producing any results. Returns tuples in
# arbitrary order, unlike the others (which will retain).
def min_max_3(lst):
    data = {}
    for key, value in lst:
        if key not in data:
            data[key]=(value, value)
        else:
            data[key][0] = min(data[key][0], value)
            data[key][1] = min(data[key][1], value)
    for key, minmax in data.items():
        yield key, minmax[0], minmax[1]

Each of these is a generator that yields (key, min, max) tuples. The
third one needs the most memory and execution time; the others simply
take the input as it comes. None of them actually requires that the
input be a list - any iterable will do.

ChrisA

[toc] | [prev] | [next] | [standalone]


#61682

FromRobert Voigtländer <r.voigtlaender@gmail.com>
Date2013-12-12 00:34 -0800
Message-ID<5aadb848-80e9-460a-b4bb-a8406c46730d@googlegroups.com>
In reply to#61672
Wow, thanks for the educating answer. I'll work through all the varaints.
And yes, I meant keep it unsorted.

As I read it, sorting may be required then if I don't want to use the slowest variant. I'll test them all.

Thanks
Robert

[toc] | [prev] | [next] | [standalone]


#61686

FromChris Angelico <rosuav@gmail.com>
Date2013-12-12 19:43 +1100
Message-ID<mailman.3976.1386837789.18130.python-list@python.org>
In reply to#61682
On Thu, Dec 12, 2013 at 7:34 PM, Robert Voigtländer
<r.voigtlaender@gmail.com> wrote:
> Wow, thanks for the educating answer. I'll work through all the varaints.
> And yes, I meant keep it unsorted.
>
> As I read it, sorting may be required then if I don't want to use the slowest variant. I'll test them all.

Sorting would be slower than using the dictionary method. Much slower.
If you don't need to sort for any other reason, don't sort!

By the way, it's usually courteous to quote at least some of the text
you're responding to, to give context. Otherwise nobody knows what
you're talking about. This isn't a PHPBB forum; each post actually is
quite separate. You may find that Google Groups makes your life
unnecessarily difficult in this, so it's probably easier to use the
mailing list:

https://mail.python.org/mailman/listinfo/python-list

ChrisA

[toc] | [prev] | [next] | [standalone]


#61683

FromPeter Otten <__peter__@web.de>
Date2013-12-12 09:35 +0100
Message-ID<mailman.3973.1386837348.18130.python-list@python.org>
In reply to#61672
Robert Voigtländer wrote:

> Hi,
> 
> I have a list like this:
> 
> a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190),
> (51, 189), (51, 188), (50, 194), (50, 187), (50, 186), (50, 185), (50,
> 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194), (48, 180),
> (48, 179), (48, 178), (48, 177), (47, 194), (47, 176), (47, 175), (47,
> 174), (47, 173), (46, 195), (46, 172), (46, 171), (46, 170), (46, 169),
> (45, 195), (45, 168), (45, 167), (45, 166), (44, 195), (44, 165), (44,
> 164), (44, 163), (44, 162), (43, 195), (43, 161), (43, 160), (43, 159),
> (43, 158), (42, 196), (42, 157), (42, 156), (42, 155), (41, 196), (41,
> 154), (41, 153), (41, 152), (41, 151), (40, 196), (40, 150), (40, 149),
> (40, 148), (40, 147), (39, 196), (39, 146), (39, 145), (39, 144), (39,
> 143), (38, 196), (38, 142), (38, 141), (38, 140), (37, 197), (37, 139),
> (37, 138), (37
>  , 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)]
> 
> 
> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).
> 
> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1]
> is 193,193 and 192. What I need as result for these three tuples is:
> (52,192,193).
> 
> For the next five tuples it is (51,188,193).
> 
> 
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it
> unsorted. - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is automatically
> max
> 
> 
> I hope I was able to explain what I mean.

I have a hunch that sorting the list might be quite efficient. You should at 
least try

import operator
import itertools

a = ...
a.sort()
result = []
for key, group in itertools.groupby(a, key=operator.itemgetter(0)):
    minpair = maxpair = next(group)
    for maxpair in group:
        pass
    result.append((key, minpair[1], maxpair[1]))

for item in result:
    print(item)

to see whether it is good enough.

[toc] | [prev] | [next] | [standalone]


#61687

FromJussi Piitulainen <jpiitula@ling.helsinki.fi>
Date2013-12-12 10:52 +0200
Message-ID<qotiouuzc1v.fsf@ruuvi.it.helsinki.fi>
In reply to#61672
Robert Voigtländer writes:

> Hi,
> 
> I have a list like this:

# shortened:
a = [(52, 193), (52, 193), (52, 192),
     (51, 193), (51, 191), (51, 190), (51, 189), (51, 188),
     (50, 194)]

> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).
> 
> Hard to explaint what I mean .. [0] of the first three tuples is
> 52. [1] is 193,193 and 192.  What I need as result for these three
> tuples is: (52,192,193).
> 
> For the next five tuples it is (51,188,193).

I think you want [(50, 194, 194), (51, 188, 193), (52, 192, 193)] for
the shortened list, in some order. Then you might be happy with a dict
instead. If I misunderstand, the following will be irrelevant.

> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep
>   it unsorted.
> - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is
>   automatically max

Scan the list and keep the running min and max for each key in a dict.

mix = dict()
for k,v in a:
   mix[k] = ( min(mix.get(k, (v,v))[0], v),
              max(mix.get(k, (v,v))[1], v) )

# mix == {50: (194, 194), 51: (188, 193), 52: (192, 193)}
# As list: [ (k,v[0],v[1]) for k,v in mix.items() ]

[toc] | [prev] | [next] | [standalone]


#61688

FromPeter Otten <__peter__@web.de>
Date2013-12-12 10:03 +0100
Message-ID<mailman.3977.1386839016.18130.python-list@python.org>
In reply to#61672
Peter Otten wrote:

> Robert Voigtländer wrote:
> 
>> Hi,
>> 
>> I have a list like this:
>> 
>> a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190),
>> (51, 189), (51, 188), (50, 194), (50, 187), (50, 186), (50, 185), (50,
>> 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194), (48, 180),
>> (48, 179), (48, 178), (48, 177), (47, 194), (47, 176), (47, 175), (47,
>> 174), (47, 173), (46, 195), (46, 172), (46, 171), (46, 170), (46, 169),
>> (45, 195), (45, 168), (45, 167), (45, 166), (44, 195), (44, 165), (44,
>> 164), (44, 163), (44, 162), (43, 195), (43, 161), (43, 160), (43, 159),
>> (43, 158), (42, 196), (42, 157), (42, 156), (42, 155), (41, 196), (41,
>> 154), (41, 153), (41, 152), (41, 151), (40, 196), (40, 150), (40, 149),
>> (40, 148), (40, 147), (39, 196), (39, 146), (39, 145), (39, 144), (39,
>> 143), (38, 196), (38, 142), (38, 141), (38, 140), (37, 197), (37, 139),
>> (37, 138), (37
>>  , 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)]
>> 
>> 
>> I need to find a -performant- way to transform this into a list with
>> tuples (a[0],[a[0][1]min],[a[0][1]max]).
>> 
>> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1]
>> is 193,193 and 192. What I need as result for these three tuples is:
>> (52,192,193).
>> 
>> For the next five tuples it is (51,188,193).
>> 
>> 
>> Extra challenges:
>> - This list is sorted. For performance reasons I would like to keep it
>> unsorted. - There may be tuples where min=max.
>> - There my be tupples where [0] only exists once. So mix is automatically
>> max
>> 
>> 
>> I hope I was able to explain what I mean.
> 
> I have a hunch that sorting the list might be quite efficient. You should
> at least try
> 
> import operator
> import itertools
> 
> a = ...
> a.sort()
> result = []
> for key, group in itertools.groupby(a, key=operator.itemgetter(0)):
>     minpair = maxpair = next(group)
>     for maxpair in group:
>         pass
>     result.append((key, minpair[1], maxpair[1]))
> 
> for item in result:
>     print(item)
> 
> to see whether it is good enough.

On second thought -- Chris Angelico may be right ;)
So here's my dict-based variant: 

def keyminmax(items):
    d = collections.defaultdict(list)
    for key, value in items:
        d[key].append(value)
    for key, values in d.items(): # d.iteritems() in python 2
        yield key, min(values), max(values)

for item in keyminmax(a):
    print(item)

It uses a bit more memory than Chris' code, but has fewer min()/max() calls.

[toc] | [prev] | [next] | [standalone]


#61699

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2013-12-12 11:44 +0000
Message-ID<52a9a1a0$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to#61672
On Wed, 11 Dec 2013 23:25:53 -0800, Robert Voigtländer wrote:

> Hi,
> 
> I have a list like this:
> 
> a = [(52, 193), (52, 193), (52, 192), ...
> 
> 
> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).

I'm afraid I don't know what you mean by "performant". It doesn't appear 
to be an English word, so far as I can tell. Do you mean efficient?

 
> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1]
> is 193,193 and 192. What I need as result for these three tuples is:
> (52,192,193).
> 
> For the next five tuples it is (51,188,193).
> 
> 
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it
> unsorted. 

Normally when people talk about "performance", they mean to *maximise* 
it, not minimise it. If you can guarantee that your data is already pre-
sorted, then the resulting algorithm is likely to be much more efficient 
than one which doesn't require sorted data.

In any case, sorting in Python is amazingly fast. You may be pleasantly 
surprised that a version that sorts your data, while nominally 
O(N log N), may be much faster than an O(N) solution that doesn't require 
sorted data. If I were a betting man, I'd be willing to wager a shiny new 
dollar[1] that sorting works out faster for reasonable sized sets of data.

One general piece of advice for speeding up Python code: the more work 
you can delegate to Python built-ins, like sort, min, max and so forth, 
the faster your code is likely to be.


> - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is
> automatically max

Neither of these should be a problem.


The first approach I'd try would be:

- sort the list, which will group the tuples by their first item;

- conveniently, it will also ensure that the first tuple in each group 
will have the minimum second item, and the last tuple of that group the 
maximum value;

- use itertools.groupby to extract each group;

- grab the first tuple from the group, and take both its items;

- grab the last tuple from the group (if any), and take its second item;

- yield those three items.


Something like this should work, I expect:

from collections import deque
from itertools import groupby
from operator import itemgetter

def collect(data):
    data = sorted(data)
    groups = groupby(data, itemgetter(0))
    d = deque([], maxlen=1)
    for key, subiter in groups:
        smallest = largest = next(subiter)[1]
        d.extend(subiter)
        try:
            largest = d.pop()[1]
        except IndexError:
            pass
        yield (key, smallest, largest)
    


And in use:

py> data = [(1, 4), (5, 3), (2, 7), (2, 3), (1, 3), (2, 5), 
...         (5, 3), (4, 9)]
py> list(collect(data))
[(1, 3, 4), (2, 3, 7), (4, 9, 9), (5, 3, 3)]




[1] Betting man or not, nobody ever accused me of being a spend-thrift.



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#61700

FromTim Chase <python.list@tim.thechases.com>
Date2013-12-12 06:04 -0600
Message-ID<mailman.3982.1386849829.18130.python-list@python.org>
In reply to#61699
On 2013-12-12 11:44, Steven D'Aprano wrote:
> In any case, sorting in Python is amazingly fast. You may be
> pleasantly surprised that a version that sorts your data, while
> nominally O(N log N), may be much faster than an O(N) solution that
> doesn't require sorted data. If I were a betting man, I'd be
> willing to wager a shiny new dollar[1] that sorting works out
> faster for reasonable sized sets of data.

An interesting observation given the "Optimizing list processing"
thread you recently opened about algorithms for processing large
volumes of data and finding that elbow where two algorithms cross on
the performance graph. :-)

-tkc


[toc] | [prev] | [next] | [standalone]


#61702

FromMRAB <python@mrabarnett.plus.com>
Date2013-12-12 12:36 +0000
Message-ID<mailman.3983.1386851815.18130.python-list@python.org>
In reply to#61699
On 12/12/2013 11:44, Steven D'Aprano wrote:
> On Wed, 11 Dec 2013 23:25:53 -0800, Robert Voigtländer wrote:
>
>> Hi,
>>
>> I have a list like this:
>>
>> a = [(52, 193), (52, 193), (52, 192), ...
>>
>>
>> I need to find a -performant- way to transform this into a list with
>> tuples (a[0],[a[0][1]min],[a[0][1]max]).
>
> I'm afraid I don't know what you mean by "performant". It doesn't appear
> to be an English word, so far as I can tell. Do you mean efficient?
>
[snip]

There's some debate over whether it's English or not:

http://english.stackexchange.com/questions/38945/what-is-wrong-with-the-word-performant

[toc] | [prev] | [next] | [standalone]


#61774

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2013-12-12 23:25 +0000
Message-ID<52aa45fd$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to#61702
On Thu, 12 Dec 2013 12:36:51 +0000, MRAB wrote:

> On 12/12/2013 11:44, Steven D'Aprano wrote:
>> On Wed, 11 Dec 2013 23:25:53 -0800, Robert Voigtländer wrote:
>>
>>> Hi,
>>>
>>> I have a list like this:
>>>
>>> a = [(52, 193), (52, 193), (52, 192), ...
>>>
>>>
>>> I need to find a -performant- way to transform this into a list with
>>> tuples (a[0],[a[0][1]min],[a[0][1]max]).
>>
>> I'm afraid I don't know what you mean by "performant". It doesn't
>> appear to be an English word, so far as I can tell. Do you mean
>> efficient?
>>
> [snip]
> 
> There's some debate over whether it's English or not:
> 
> http://english.stackexchange.com/questions/38945/what-is-wrong-with-the-
word-performant


Ah! I've never come across it before, and it wasn't in any of the 
electronic dictionaries I tried, nor the dead-tree Shorter Oxford.

I think I don't dislike it.


-- 
Steven

[toc] | [prev] | [next] | [standalone]


#61704

FromPeter Otten <__peter__@web.de>
Date2013-12-12 13:54 +0100
Message-ID<mailman.3985.1386852841.18130.python-list@python.org>
In reply to#61699
Steven D'Aprano wrote:

> In any case, sorting in Python is amazingly fast. You may be pleasantly
> surprised that a version that sorts your data, while nominally
> O(N log N), may be much faster than an O(N) solution that doesn't require
> sorted data. If I were a betting man, I'd be willing to wager a shiny new
> dollar[1] that sorting works out faster for reasonable sized sets of data.

Well, that was my first reaction, too. But then

 
$ cat keyminmax.py
import operator
import itertools
import collections

def minmax_groupby(items):
    for key, group in itertools.groupby(sorted(items), 
key=operator.itemgetter(0)):
        minpair = maxpair = next(group)
        for maxpair in group:
            pass
        yield key, minpair[1], maxpair[1]

def minmax_dict(items):
    d = collections.defaultdict(list)
    for key, value in items:
        d[key].append(value)
    for key, values in d.items():
        yield key, min(values), max(values)

a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190),
     (51, 189), (51, 188), (50, 194), (50, 187),(50, 186), (50, 185),
     (50, 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194),
     (48, 180), (48, 179), (48, 178), (48, 177), (47, 194), (47, 176),
     (47, 175), (47, 174), (47, 173), (46, 195), (46, 172), (46, 171),
     (46, 170), (46, 169), (45, 195), (45, 168), (45, 167), (45, 166),
     (44, 195), (44, 165), (44, 164), (44, 163), (44, 162), (43, 195),
     (43, 161), (43, 160), (43, 159), (43, 158), (42, 196), (42, 157),
     (42, 156), (42, 155), (41, 196), (41, 154), (41, 153), (41, 152),
     (41, 151), (40, 196), (40, 150), (40, 149), (40, 148), (40, 147),
     (39, 196), (39, 146), (39, 145), (39, 144), (39, 143), (38, 196),
     (38, 142), (38, 141), (38, 140), (37, 197), (37, 139), (37, 138),
     (37, 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)]

from collections import deque
from itertools import groupby
from operator import itemgetter

def collect(data):
    data = sorted(data)
    groups = groupby(data, itemgetter(0))
    d = deque([], maxlen=1)
    for key, subiter in groups:
        smallest = largest = next(subiter)[1]
        d.extend(subiter)
        try:
            largest = d.pop()[1]
        except IndexError:
            pass
        yield (key, smallest, largest)


def time_dict():
    for item in minmax_dict(a):
        pass

def time_groupby():
    for item in minmax_groupby(a):
        pass

def time_daprano():
    for item in collect(a):
        pass

$ python -m timeit -s 'from keyminmax import time_groupby as t' 't()'
10000 loops, best of 3: 68.6 usec per loop
$ python -m timeit -s 'from keyminmax import time_dict as t' 't()'
10000 loops, best of 3: 53.3 usec per loop
$ python -m timeit -s 'from keyminmax import time_daprano as t' 't()'
10000 loops, best of 3: 75.7 usec per loop

So yes, sorting seems to be slower even for small datasets.

[toc] | [prev] | [next] | [standalone]


#61783

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2013-12-13 02:36 +0000
Message-ID<52aa72c8$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to#61704
On Thu, 12 Dec 2013 13:54:10 +0100, Peter Otten wrote:

> Steven D'Aprano wrote:
> 
>> In any case, sorting in Python is amazingly fast. You may be pleasantly
>> surprised that a version that sorts your data, while nominally O(N log
>> N), may be much faster than an O(N) solution that doesn't require
>> sorted data. If I were a betting man, I'd be willing to wager a shiny
>> new dollar[1] that sorting works out faster for reasonable sized sets
>> of data.
> 
> Well, that was my first reaction, too. But then

[snip timeit results]

On my system, I don't get quite the same results as you. I find that the 
groupby solution is slightly faster than the dict solution, although both 
are slightly faster than the version I came up with. All three are plenty 
fast enough though. Using the same function definitions as you, I get:

py> from timeit import Timer
py> setup = "from __main__ import time_dict, time_groupby, time_daprano"
py> t1 = Timer("_ = time_dict()", setup)
py> t2 = Timer("_ = time_groupby()", setup)
py> t3 = Timer("_ = time_daprano()", setup)

and results:

py> min(t1.repeat(number=100000, repeat=6))
6.7522243447601795
py> min(t2.repeat(number=100000, repeat=6))
6.7246054373681545
py> min(t3.repeat(number=100000, repeat=6))
7.116707382723689

But, there's a flaw in the above analysis. The list that you give is pre-
sorted, and Python's sort routine is *very* efficient at sorting already 
sorted lists. So I randomly shuffled it and redid the tests. Now the 
advantage of the dict solution is clear:

py> from random import shuffle
py> shuffle(a)
py> min(t1.repeat(number=100000, repeat=6))
6.611802507191896
py> shuffle(a)
py> min(t2.repeat(number=100000, repeat=6))
8.499449279159307
py> shuffle(a)
py> min(t3.repeat(number=100000, repeat=6))
9.61335832811892


The dict solution is hardly changed (slightly faster, although I guess 
that is just due to random timing fluctuations and not meaningful). The 
other two, on the other hand, are quite a bit slower with unsorted data.

Nevertheless, it should be pointed out that even in the worst case, it's 
still pretty damn fast: less than 0.0001s to process a list with 78 
items, or about 1.3 microseconds per item.

A couple of other observations:

- The collect() function is the same algorithm as the minmax_groupby() 
function, the only difference being the implementation. I would expect 
(but haven't tested) that the collect() version using a deque instead of 
manually iterating over the items in each group would be faster if there 
were a lot of items per group.

- The dict version has an advantage that hashing integers is fast. If 
hashing the keys is slower, it may no longer be quite so good.

- The dict version keeps a list of all the values it sees, including 
duplicates. Instead of a list, a set may be sufficient:

def minmax_dict(items):
    d = collections.defaultdict(set)
    for key, value in items:
        d[key].add(value)
    for key, values in d.items():
        yield key, min(values), max(values)

It might even be worthwhile experimenting with something which simply 
stores the minimum and maximum values seen, rather than all the values. 
If there are a lot of items in each group, or if the comparisons are 
expensive, calculating the min and max in a single pass may be worthwhile:

http://code.activestate.com/recipes/577916-fast-minmax-function/




-- 
Steven

[toc] | [prev] | [next] | [standalone]


#61724

FromRoy Smith <roy@panix.com>
Date2013-12-12 10:02 -0500
Message-ID<roy-346C6D.10023312122013@news.panix.com>
In reply to#61699
In article <52a9a1a0$0$29992$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> I'm afraid I don't know what you mean by "performant".

I've heard the term used often.  It means something like, "performs 
well" or "runs fast".  It may or may not be an English word, but that 
doesn't stop people from using it :-)

[toc] | [prev] | [next] | [standalone]


#61728

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2013-12-12 15:13 +0000
Message-ID<mailman.4001.1386861307.18130.python-list@python.org>
In reply to#61724
On 12/12/2013 15:02, Roy Smith wrote:
> In article <52a9a1a0$0$29992$c3e8da3$5496439d@news.astraweb.com>,
>   Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>
>> I'm afraid I don't know what you mean by "performant".
>
> I've heard the term used often.  It means something like, "performs
> well" or "runs fast".  It may or may not be an English word, but that
> doesn't stop people from using it :-)
>

If "google" can be used to mean "make huge amouts of money with a 
product that is inherently flawed" then I'll happily accept "performant" 
as an English word, regardless of whether the English variant is UK, US, 
Australian, New Zealand, Soth African, Geordie, Glaswegian or any other :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]


#61797

FromRobert Voigtländer <r.voigtlaender@gmail.com>
Date2013-12-12 22:28 -0800
Message-ID<c1a6930f-13ba-4e84-a35d-e50f1c00f50f@googlegroups.com>
In reply to#61728
>I've heard the term used often.  It means something like, "performs 
>well" or "runs fast".  It may or may not be an English word, but that 
>doesn't stop people from using it :-) 

> If "google" can be used to mean "make huge amouts of money with a 
> product that is inherently flawed" then I'll happily accept "performant" 
> as an English word, regardless of whether the English variant is UK, US, 
> Australian, New Zealand, Soth African, Geordie, Glaswegian or any other :)

Indeed it's not an english word. I have to stop using it. In German it's used with the meaning of "runs fast". Even though it's already not that clearly defined there.

Thanks for the help on the topic of data aggregation. It helped a lot and I again learned somthing.
I have a performant  .. well .. fast running solution now.

[toc] | [prev] | [next] | [standalone]


#61855

Fromrusi <rustompmody@gmail.com>
Date2013-12-13 10:06 -0800
Message-ID<60cce71c-dad2-4a7b-a5d0-daa45a6b0e28@googlegroups.com>
In reply to#61797
On Friday, December 13, 2013 11:58:51 AM UTC+5:30, Robert Voigtländer wrote:
> >I've heard the term used often.  It means something like, "performs 
> >well" or "runs fast".  It may or may not be an English word, but that 
> >doesn't stop people from using it :-) 

> > If "google" can be used to mean "make huge amouts of money with a 
> > product that is inherently flawed" then I'll happily accept "performant" 
> > as an English word, regardless of whether the English variant is UK, US, 
> > Australian, New Zealand, Soth African, Geordie, Glaswegian or any other :)

> Indeed it's not an english word. I have to stop using it. In German
> it's used with the meaning of "runs fast". Even though it's already
> not that clearly defined there.

> Thanks for the help on the topic of data aggregation. It helped a
> lot and I again learned somthing.  I have a performant .. well
> .. fast running solution now.

Well "performant" is performant enough for the purposes of communicating
on the python list I think :D

[toc] | [prev] | [next] | [standalone]


#61873

FromDennis Lee Bieber <wlfraed@ix.netcom.com>
Date2013-12-13 19:30 -0500
Message-ID<mailman.4102.1386981019.18130.python-list@python.org>
In reply to#61855
On Fri, 13 Dec 2013 10:06:30 -0800 (PST), rusi <rustompmody@gmail.com>
declaimed the following:

>
>Well "performant" is performant enough for the purposes of communicating
>on the python list I think :D

	Most probably could figure it out as being stylistically similar to
"conformant", which I believe IS used in English <G>

conformant => something that conforms
performant => something that performs

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

[toc] | [prev] | [next] | [standalone]


#61930

FromTim Roberts <timr@probo.com>
Date2013-12-14 19:41 -0800
Message-ID<i59qa91d3dj5eqj743nfkombkged69lvo4@4ax.com>
In reply to#61873
Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
>
>>
>>Well "performant" is performant enough for the purposes of communicating
>>on the python list I think :D
>
>	Most probably could figure it out as being stylistically similar to
>"conformant", which I believe IS used in English <G>
>
>conformant => something that conforms
>performant => something that performs

Yes, I suspect it comes from people expecting too much consistency.  If
something that has "conformance" is "conformant", then something that has
good "performance" must be "performant".
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

[toc] | [prev] | [next] | [standalone]


#62052

FromChris Angelico <rosuav@gmail.com>
Date2013-12-16 23:08 +1100
Message-ID<mailman.4210.1387195732.18130.python-list@python.org>
In reply to#61930
On Sun, Dec 15, 2013 at 2:41 PM, Tim Roberts <timr@probo.com> wrote:
> Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
>>
>>>
>>>Well "performant" is performant enough for the purposes of communicating
>>>on the python list I think :D
>>
>>       Most probably could figure it out as being stylistically similar to
>>"conformant", which I believe IS used in English <G>
>>
>>conformant => something that conforms
>>performant => something that performs
>
> Yes, I suspect it comes from people expecting too much consistency.  If
> something that has "conformance" is "conformant", then something that has
> good "performance" must be "performant".

In a surprising coincidence, the word "performant" came up in a Daily
MTG article today:

http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg/daily/feature/278

Though it is used in double quotes to indicate that it's not exactly
standard English :)

Funny to see two nerdy interests meet!

ChrisA

[toc] | [prev] | [next] | [standalone]


Page 1 of 2  [1] 2  Next page →

Back to top | Article view | comp.lang.python


csiph-web