Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #61672 > unrolled thread
| Started by | Robert Voigtländer <r.voigtlaender@gmail.com> |
|---|---|
| First post | 2013-12-11 23:25 -0800 |
| Last post | 2013-12-13 01:33 +0000 |
| Articles | 20 on this page of 24 — 15 participants |
Back to article view | Back to comp.lang.python
min max from tuples in list Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-11 23:25 -0800
Re: min max from tuples in list Chris Angelico <rosuav@gmail.com> - 2013-12-12 19:18 +1100
Re: min max from tuples in list Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-12 00:34 -0800
Re: min max from tuples in list Chris Angelico <rosuav@gmail.com> - 2013-12-12 19:43 +1100
Re: min max from tuples in list Peter Otten <__peter__@web.de> - 2013-12-12 09:35 +0100
Re: min max from tuples in list Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-12-12 10:52 +0200
Re: min max from tuples in list Peter Otten <__peter__@web.de> - 2013-12-12 10:03 +0100
Re: min max from tuples in list Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 11:44 +0000
Re: min max from tuples in list Tim Chase <python.list@tim.thechases.com> - 2013-12-12 06:04 -0600
Re: min max from tuples in list MRAB <python@mrabarnett.plus.com> - 2013-12-12 12:36 +0000
Re: min max from tuples in list Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-12 23:25 +0000
Re: min max from tuples in list Peter Otten <__peter__@web.de> - 2013-12-12 13:54 +0100
Re: min max from tuples in list Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-13 02:36 +0000
Re: min max from tuples in list Roy Smith <roy@panix.com> - 2013-12-12 10:02 -0500
Re: min max from tuples in list Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-12 15:13 +0000
Re: min max from tuples in list Robert Voigtländer <r.voigtlaender@gmail.com> - 2013-12-12 22:28 -0800
Re: min max from tuples in list rusi <rustompmody@gmail.com> - 2013-12-13 10:06 -0800
Re: min max from tuples in list Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-12-13 19:30 -0500
Re: min max from tuples in list Tim Roberts <timr@probo.com> - 2013-12-14 19:41 -0800
Re: min max from tuples in list Chris Angelico <rosuav@gmail.com> - 2013-12-16 23:08 +1100
Re: min max from tuples in list rusi <rustompmody@gmail.com> - 2013-12-16 07:49 -0800
Re: min max from tuples in list Ned Batchelder <ned@nedbatchelder.com> - 2013-12-16 10:59 -0500
Re: min max from tuples in list Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-12-17 11:49 +1300
Re: min max from tuples in list Denis McMahon <denismfmcmahon@gmail.com> - 2013-12-13 01:33 +0000
Page 1 of 2 [1] 2 Next page →
| From | Robert Voigtländer <r.voigtlaender@gmail.com> |
|---|---|
| Date | 2013-12-11 23:25 -0800 |
| Subject | min max from tuples in list |
| Message-ID | <f78a11dc-efdd-4108-8d1f-59386f020fd0@googlegroups.com> |
Hi, I have a list like this: a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190), (51, 189), (51, 188), (50, 194), (50, 187), (50, 186), (50, 185), (50, 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194), (48, 180), (48, 179), (48, 178), (48, 177), (47, 194), (47, 176), (47, 175), (47, 174), (47, 173), (46, 195), (46, 172), (46, 171), (46, 170), (46, 169), (45, 195), (45, 168), (45, 167), (45, 166), (44, 195), (44, 165), (44, 164), (44, 163), (44, 162), (43, 195), (43, 161), (43, 160), (43, 159), (43, 158), (42, 196), (42, 157), (42, 156), (42, 155), (41, 196), (41, 154), (41, 153), (41, 152), (41, 151), (40, 196), (40, 150), (40, 149), (40, 148), (40, 147), (39, 196), (39, 146), (39, 145), (39, 144), (39, 143), (38, 196), (38, 142), (38, 141), (38, 140), (37, 197), (37, 139), (37, 138), (37, 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)] I need to find a -performant- way to transform this into a list with tuples (a[0],[a[0][1]min],[a[0][1]max]). Hard to explaint what I mean .. [0] of the first three tuples is 52. [1] is 193,193 and 192. What I need as result for these three tuples is: (52,192,193). For the next five tuples it is (51,188,193). Extra challenges: - This list is sorted. For performance reasons I would like to keep it unsorted. - There may be tuples where min=max. - There my be tupples where [0] only exists once. So mix is automatically max I hope I was able to explain what I mean. Thanks Robert
[toc] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-12-12 19:18 +1100 |
| Message-ID | <mailman.3969.1386836325.18130.python-list@python.org> |
| In reply to | #61672 |
On Thu, Dec 12, 2013 at 6:25 PM, Robert Voigtländer
<r.voigtlaender@gmail.com> wrote:
> I need to find a -performant- way to transform this into a list with tuples (a[0],[a[0][1]min],[a[0][1]max]).
>
> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1] is 193,193 and 192.
> What I need as result for these three tuples is: (52,192,193).
>
> For the next five tuples it is (51,188,193).
>
>
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it unsorted.
> - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is automatically max
Yep, I see what you mean! Apart from the first of the challenges,
which is ambiguous: do you mean you'd rather be able to work with it
unsorted, or is that a typo, "keep it sorted"?
This is a common task of aggregation. Your list is of (key, value)
tuples, and you want to do some per-key statistics. Here are three
variants on the code:
# Fastest version, depends on the keys being already grouped
# and the values sorted within each group. It actually returns
# the last and first, not the smallest and largest.
def min_max_1(lst):
prev_key = None
for key, value in lst:
if key != prev_key:
if prev_key is not None: yield prev_key, value, key_max
key_max = value
if prev_key is not None: yield prev_key, value, key_max
# This version depends on the keys being grouped, but
# not on them being sorted within the groups.
def min_max_2(lst):
prev_key = None
for key, value in lst:
if key != prev_key:
if prev_key is not None: yield prev_key, key_min, key_max
key_min = key_max = value
else:
key_min = min(key_min, value)
key_max = min(key_max, value)
if prev_key is not None: yield prev_key, key_min, key_max
# Slowest version, does not depend on either the keys
# or the values being sorted. Will iterate over the entire
# list before producing any results. Returns tuples in
# arbitrary order, unlike the others (which will retain).
def min_max_3(lst):
data = {}
for key, value in lst:
if key not in data:
data[key]=(value, value)
else:
data[key][0] = min(data[key][0], value)
data[key][1] = min(data[key][1], value)
for key, minmax in data.items():
yield key, minmax[0], minmax[1]
Each of these is a generator that yields (key, min, max) tuples. The
third one needs the most memory and execution time; the others simply
take the input as it comes. None of them actually requires that the
input be a list - any iterable will do.
ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Robert Voigtländer <r.voigtlaender@gmail.com> |
|---|---|
| Date | 2013-12-12 00:34 -0800 |
| Message-ID | <5aadb848-80e9-460a-b4bb-a8406c46730d@googlegroups.com> |
| In reply to | #61672 |
Wow, thanks for the educating answer. I'll work through all the varaints. And yes, I meant keep it unsorted. As I read it, sorting may be required then if I don't want to use the slowest variant. I'll test them all. Thanks Robert
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-12-12 19:43 +1100 |
| Message-ID | <mailman.3976.1386837789.18130.python-list@python.org> |
| In reply to | #61682 |
On Thu, Dec 12, 2013 at 7:34 PM, Robert Voigtländer <r.voigtlaender@gmail.com> wrote: > Wow, thanks for the educating answer. I'll work through all the varaints. > And yes, I meant keep it unsorted. > > As I read it, sorting may be required then if I don't want to use the slowest variant. I'll test them all. Sorting would be slower than using the dictionary method. Much slower. If you don't need to sort for any other reason, don't sort! By the way, it's usually courteous to quote at least some of the text you're responding to, to give context. Otherwise nobody knows what you're talking about. This isn't a PHPBB forum; each post actually is quite separate. You may find that Google Groups makes your life unnecessarily difficult in this, so it's probably easier to use the mailing list: https://mail.python.org/mailman/listinfo/python-list ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-12-12 09:35 +0100 |
| Message-ID | <mailman.3973.1386837348.18130.python-list@python.org> |
| In reply to | #61672 |
Robert Voigtländer wrote:
> Hi,
>
> I have a list like this:
>
> a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190),
> (51, 189), (51, 188), (50, 194), (50, 187), (50, 186), (50, 185), (50,
> 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194), (48, 180),
> (48, 179), (48, 178), (48, 177), (47, 194), (47, 176), (47, 175), (47,
> 174), (47, 173), (46, 195), (46, 172), (46, 171), (46, 170), (46, 169),
> (45, 195), (45, 168), (45, 167), (45, 166), (44, 195), (44, 165), (44,
> 164), (44, 163), (44, 162), (43, 195), (43, 161), (43, 160), (43, 159),
> (43, 158), (42, 196), (42, 157), (42, 156), (42, 155), (41, 196), (41,
> 154), (41, 153), (41, 152), (41, 151), (40, 196), (40, 150), (40, 149),
> (40, 148), (40, 147), (39, 196), (39, 146), (39, 145), (39, 144), (39,
> 143), (38, 196), (38, 142), (38, 141), (38, 140), (37, 197), (37, 139),
> (37, 138), (37
> , 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)]
>
>
> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).
>
> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1]
> is 193,193 and 192. What I need as result for these three tuples is:
> (52,192,193).
>
> For the next five tuples it is (51,188,193).
>
>
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it
> unsorted. - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is automatically
> max
>
>
> I hope I was able to explain what I mean.
I have a hunch that sorting the list might be quite efficient. You should at
least try
import operator
import itertools
a = ...
a.sort()
result = []
for key, group in itertools.groupby(a, key=operator.itemgetter(0)):
minpair = maxpair = next(group)
for maxpair in group:
pass
result.append((key, minpair[1], maxpair[1]))
for item in result:
print(item)
to see whether it is good enough.
[toc] | [prev] | [next] | [standalone]
| From | Jussi Piitulainen <jpiitula@ling.helsinki.fi> |
|---|---|
| Date | 2013-12-12 10:52 +0200 |
| Message-ID | <qotiouuzc1v.fsf@ruuvi.it.helsinki.fi> |
| In reply to | #61672 |
Robert Voigtländer writes:
> Hi,
>
> I have a list like this:
# shortened:
a = [(52, 193), (52, 193), (52, 192),
(51, 193), (51, 191), (51, 190), (51, 189), (51, 188),
(50, 194)]
> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).
>
> Hard to explaint what I mean .. [0] of the first three tuples is
> 52. [1] is 193,193 and 192. What I need as result for these three
> tuples is: (52,192,193).
>
> For the next five tuples it is (51,188,193).
I think you want [(50, 194, 194), (51, 188, 193), (52, 192, 193)] for
the shortened list, in some order. Then you might be happy with a dict
instead. If I misunderstand, the following will be irrelevant.
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep
> it unsorted.
> - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is
> automatically max
Scan the list and keep the running min and max for each key in a dict.
mix = dict()
for k,v in a:
mix[k] = ( min(mix.get(k, (v,v))[0], v),
max(mix.get(k, (v,v))[1], v) )
# mix == {50: (194, 194), 51: (188, 193), 52: (192, 193)}
# As list: [ (k,v[0],v[1]) for k,v in mix.items() ]
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-12-12 10:03 +0100 |
| Message-ID | <mailman.3977.1386839016.18130.python-list@python.org> |
| In reply to | #61672 |
Peter Otten wrote:
> Robert Voigtländer wrote:
>
>> Hi,
>>
>> I have a list like this:
>>
>> a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190),
>> (51, 189), (51, 188), (50, 194), (50, 187), (50, 186), (50, 185), (50,
>> 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194), (48, 180),
>> (48, 179), (48, 178), (48, 177), (47, 194), (47, 176), (47, 175), (47,
>> 174), (47, 173), (46, 195), (46, 172), (46, 171), (46, 170), (46, 169),
>> (45, 195), (45, 168), (45, 167), (45, 166), (44, 195), (44, 165), (44,
>> 164), (44, 163), (44, 162), (43, 195), (43, 161), (43, 160), (43, 159),
>> (43, 158), (42, 196), (42, 157), (42, 156), (42, 155), (41, 196), (41,
>> 154), (41, 153), (41, 152), (41, 151), (40, 196), (40, 150), (40, 149),
>> (40, 148), (40, 147), (39, 196), (39, 146), (39, 145), (39, 144), (39,
>> 143), (38, 196), (38, 142), (38, 141), (38, 140), (37, 197), (37, 139),
>> (37, 138), (37
>> , 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)]
>>
>>
>> I need to find a -performant- way to transform this into a list with
>> tuples (a[0],[a[0][1]min],[a[0][1]max]).
>>
>> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1]
>> is 193,193 and 192. What I need as result for these three tuples is:
>> (52,192,193).
>>
>> For the next five tuples it is (51,188,193).
>>
>>
>> Extra challenges:
>> - This list is sorted. For performance reasons I would like to keep it
>> unsorted. - There may be tuples where min=max.
>> - There my be tupples where [0] only exists once. So mix is automatically
>> max
>>
>>
>> I hope I was able to explain what I mean.
>
> I have a hunch that sorting the list might be quite efficient. You should
> at least try
>
> import operator
> import itertools
>
> a = ...
> a.sort()
> result = []
> for key, group in itertools.groupby(a, key=operator.itemgetter(0)):
> minpair = maxpair = next(group)
> for maxpair in group:
> pass
> result.append((key, minpair[1], maxpair[1]))
>
> for item in result:
> print(item)
>
> to see whether it is good enough.
On second thought -- Chris Angelico may be right ;)
So here's my dict-based variant:
def keyminmax(items):
d = collections.defaultdict(list)
for key, value in items:
d[key].append(value)
for key, values in d.items(): # d.iteritems() in python 2
yield key, min(values), max(values)
for item in keyminmax(a):
print(item)
It uses a bit more memory than Chris' code, but has fewer min()/max() calls.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-12 11:44 +0000 |
| Message-ID | <52a9a1a0$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #61672 |
On Wed, 11 Dec 2013 23:25:53 -0800, Robert Voigtländer wrote:
> Hi,
>
> I have a list like this:
>
> a = [(52, 193), (52, 193), (52, 192), ...
>
>
> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).
I'm afraid I don't know what you mean by "performant". It doesn't appear
to be an English word, so far as I can tell. Do you mean efficient?
> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1]
> is 193,193 and 192. What I need as result for these three tuples is:
> (52,192,193).
>
> For the next five tuples it is (51,188,193).
>
>
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it
> unsorted.
Normally when people talk about "performance", they mean to *maximise*
it, not minimise it. If you can guarantee that your data is already pre-
sorted, then the resulting algorithm is likely to be much more efficient
than one which doesn't require sorted data.
In any case, sorting in Python is amazingly fast. You may be pleasantly
surprised that a version that sorts your data, while nominally
O(N log N), may be much faster than an O(N) solution that doesn't require
sorted data. If I were a betting man, I'd be willing to wager a shiny new
dollar[1] that sorting works out faster for reasonable sized sets of data.
One general piece of advice for speeding up Python code: the more work
you can delegate to Python built-ins, like sort, min, max and so forth,
the faster your code is likely to be.
> - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is
> automatically max
Neither of these should be a problem.
The first approach I'd try would be:
- sort the list, which will group the tuples by their first item;
- conveniently, it will also ensure that the first tuple in each group
will have the minimum second item, and the last tuple of that group the
maximum value;
- use itertools.groupby to extract each group;
- grab the first tuple from the group, and take both its items;
- grab the last tuple from the group (if any), and take its second item;
- yield those three items.
Something like this should work, I expect:
from collections import deque
from itertools import groupby
from operator import itemgetter
def collect(data):
data = sorted(data)
groups = groupby(data, itemgetter(0))
d = deque([], maxlen=1)
for key, subiter in groups:
smallest = largest = next(subiter)[1]
d.extend(subiter)
try:
largest = d.pop()[1]
except IndexError:
pass
yield (key, smallest, largest)
And in use:
py> data = [(1, 4), (5, 3), (2, 7), (2, 3), (1, 3), (2, 5),
... (5, 3), (4, 9)]
py> list(collect(data))
[(1, 3, 4), (2, 3, 7), (4, 9, 9), (5, 3, 3)]
[1] Betting man or not, nobody ever accused me of being a spend-thrift.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2013-12-12 06:04 -0600 |
| Message-ID | <mailman.3982.1386849829.18130.python-list@python.org> |
| In reply to | #61699 |
On 2013-12-12 11:44, Steven D'Aprano wrote: > In any case, sorting in Python is amazingly fast. You may be > pleasantly surprised that a version that sorts your data, while > nominally O(N log N), may be much faster than an O(N) solution that > doesn't require sorted data. If I were a betting man, I'd be > willing to wager a shiny new dollar[1] that sorting works out > faster for reasonable sized sets of data. An interesting observation given the "Optimizing list processing" thread you recently opened about algorithms for processing large volumes of data and finding that elbow where two algorithms cross on the performance graph. :-) -tkc
[toc] | [prev] | [next] | [standalone]
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-12-12 12:36 +0000 |
| Message-ID | <mailman.3983.1386851815.18130.python-list@python.org> |
| In reply to | #61699 |
On 12/12/2013 11:44, Steven D'Aprano wrote: > On Wed, 11 Dec 2013 23:25:53 -0800, Robert Voigtländer wrote: > >> Hi, >> >> I have a list like this: >> >> a = [(52, 193), (52, 193), (52, 192), ... >> >> >> I need to find a -performant- way to transform this into a list with >> tuples (a[0],[a[0][1]min],[a[0][1]max]). > > I'm afraid I don't know what you mean by "performant". It doesn't appear > to be an English word, so far as I can tell. Do you mean efficient? > [snip] There's some debate over whether it's English or not: http://english.stackexchange.com/questions/38945/what-is-wrong-with-the-word-performant
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-12 23:25 +0000 |
| Message-ID | <52aa45fd$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #61702 |
On Thu, 12 Dec 2013 12:36:51 +0000, MRAB wrote: > On 12/12/2013 11:44, Steven D'Aprano wrote: >> On Wed, 11 Dec 2013 23:25:53 -0800, Robert Voigtländer wrote: >> >>> Hi, >>> >>> I have a list like this: >>> >>> a = [(52, 193), (52, 193), (52, 192), ... >>> >>> >>> I need to find a -performant- way to transform this into a list with >>> tuples (a[0],[a[0][1]min],[a[0][1]max]). >> >> I'm afraid I don't know what you mean by "performant". It doesn't >> appear to be an English word, so far as I can tell. Do you mean >> efficient? >> > [snip] > > There's some debate over whether it's English or not: > > http://english.stackexchange.com/questions/38945/what-is-wrong-with-the- word-performant Ah! I've never come across it before, and it wasn't in any of the electronic dictionaries I tried, nor the dead-tree Shorter Oxford. I think I don't dislike it. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-12-12 13:54 +0100 |
| Message-ID | <mailman.3985.1386852841.18130.python-list@python.org> |
| In reply to | #61699 |
Steven D'Aprano wrote:
> In any case, sorting in Python is amazingly fast. You may be pleasantly
> surprised that a version that sorts your data, while nominally
> O(N log N), may be much faster than an O(N) solution that doesn't require
> sorted data. If I were a betting man, I'd be willing to wager a shiny new
> dollar[1] that sorting works out faster for reasonable sized sets of data.
Well, that was my first reaction, too. But then
$ cat keyminmax.py
import operator
import itertools
import collections
def minmax_groupby(items):
for key, group in itertools.groupby(sorted(items),
key=operator.itemgetter(0)):
minpair = maxpair = next(group)
for maxpair in group:
pass
yield key, minpair[1], maxpair[1]
def minmax_dict(items):
d = collections.defaultdict(list)
for key, value in items:
d[key].append(value)
for key, values in d.items():
yield key, min(values), max(values)
a = [(52, 193), (52, 193), (52, 192), (51, 193), (51, 191), (51, 190),
(51, 189), (51, 188), (50, 194), (50, 187),(50, 186), (50, 185),
(50, 184), (49, 194), (49, 183), (49, 182), (49, 181), (48, 194),
(48, 180), (48, 179), (48, 178), (48, 177), (47, 194), (47, 176),
(47, 175), (47, 174), (47, 173), (46, 195), (46, 172), (46, 171),
(46, 170), (46, 169), (45, 195), (45, 168), (45, 167), (45, 166),
(44, 195), (44, 165), (44, 164), (44, 163), (44, 162), (43, 195),
(43, 161), (43, 160), (43, 159), (43, 158), (42, 196), (42, 157),
(42, 156), (42, 155), (41, 196), (41, 154), (41, 153), (41, 152),
(41, 151), (40, 196), (40, 150), (40, 149), (40, 148), (40, 147),
(39, 196), (39, 146), (39, 145), (39, 144), (39, 143), (38, 196),
(38, 142), (38, 141), (38, 140), (37, 197), (37, 139), (37, 138),
(37, 137), (37, 136), (36, 197), (36, 135), (36, 134), (36, 133)]
from collections import deque
from itertools import groupby
from operator import itemgetter
def collect(data):
data = sorted(data)
groups = groupby(data, itemgetter(0))
d = deque([], maxlen=1)
for key, subiter in groups:
smallest = largest = next(subiter)[1]
d.extend(subiter)
try:
largest = d.pop()[1]
except IndexError:
pass
yield (key, smallest, largest)
def time_dict():
for item in minmax_dict(a):
pass
def time_groupby():
for item in minmax_groupby(a):
pass
def time_daprano():
for item in collect(a):
pass
$ python -m timeit -s 'from keyminmax import time_groupby as t' 't()'
10000 loops, best of 3: 68.6 usec per loop
$ python -m timeit -s 'from keyminmax import time_dict as t' 't()'
10000 loops, best of 3: 53.3 usec per loop
$ python -m timeit -s 'from keyminmax import time_daprano as t' 't()'
10000 loops, best of 3: 75.7 usec per loop
So yes, sorting seems to be slower even for small datasets.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-12-13 02:36 +0000 |
| Message-ID | <52aa72c8$0$29992$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #61704 |
On Thu, 12 Dec 2013 13:54:10 +0100, Peter Otten wrote:
> Steven D'Aprano wrote:
>
>> In any case, sorting in Python is amazingly fast. You may be pleasantly
>> surprised that a version that sorts your data, while nominally O(N log
>> N), may be much faster than an O(N) solution that doesn't require
>> sorted data. If I were a betting man, I'd be willing to wager a shiny
>> new dollar[1] that sorting works out faster for reasonable sized sets
>> of data.
>
> Well, that was my first reaction, too. But then
[snip timeit results]
On my system, I don't get quite the same results as you. I find that the
groupby solution is slightly faster than the dict solution, although both
are slightly faster than the version I came up with. All three are plenty
fast enough though. Using the same function definitions as you, I get:
py> from timeit import Timer
py> setup = "from __main__ import time_dict, time_groupby, time_daprano"
py> t1 = Timer("_ = time_dict()", setup)
py> t2 = Timer("_ = time_groupby()", setup)
py> t3 = Timer("_ = time_daprano()", setup)
and results:
py> min(t1.repeat(number=100000, repeat=6))
6.7522243447601795
py> min(t2.repeat(number=100000, repeat=6))
6.7246054373681545
py> min(t3.repeat(number=100000, repeat=6))
7.116707382723689
But, there's a flaw in the above analysis. The list that you give is pre-
sorted, and Python's sort routine is *very* efficient at sorting already
sorted lists. So I randomly shuffled it and redid the tests. Now the
advantage of the dict solution is clear:
py> from random import shuffle
py> shuffle(a)
py> min(t1.repeat(number=100000, repeat=6))
6.611802507191896
py> shuffle(a)
py> min(t2.repeat(number=100000, repeat=6))
8.499449279159307
py> shuffle(a)
py> min(t3.repeat(number=100000, repeat=6))
9.61335832811892
The dict solution is hardly changed (slightly faster, although I guess
that is just due to random timing fluctuations and not meaningful). The
other two, on the other hand, are quite a bit slower with unsorted data.
Nevertheless, it should be pointed out that even in the worst case, it's
still pretty damn fast: less than 0.0001s to process a list with 78
items, or about 1.3 microseconds per item.
A couple of other observations:
- The collect() function is the same algorithm as the minmax_groupby()
function, the only difference being the implementation. I would expect
(but haven't tested) that the collect() version using a deque instead of
manually iterating over the items in each group would be faster if there
were a lot of items per group.
- The dict version has an advantage that hashing integers is fast. If
hashing the keys is slower, it may no longer be quite so good.
- The dict version keeps a list of all the values it sees, including
duplicates. Instead of a list, a set may be sufficient:
def minmax_dict(items):
d = collections.defaultdict(set)
for key, value in items:
d[key].add(value)
for key, values in d.items():
yield key, min(values), max(values)
It might even be worthwhile experimenting with something which simply
stores the minimum and maximum values seen, rather than all the values.
If there are a lot of items in each group, or if the comparisons are
expensive, calculating the min and max in a single pass may be worthwhile:
http://code.activestate.com/recipes/577916-fast-minmax-function/
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-12-12 10:02 -0500 |
| Message-ID | <roy-346C6D.10023312122013@news.panix.com> |
| In reply to | #61699 |
In article <52a9a1a0$0$29992$c3e8da3$5496439d@news.astraweb.com>, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > I'm afraid I don't know what you mean by "performant". I've heard the term used often. It means something like, "performs well" or "runs fast". It may or may not be an English word, but that doesn't stop people from using it :-)
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-12-12 15:13 +0000 |
| Message-ID | <mailman.4001.1386861307.18130.python-list@python.org> |
| In reply to | #61724 |
On 12/12/2013 15:02, Roy Smith wrote: > In article <52a9a1a0$0$29992$c3e8da3$5496439d@news.astraweb.com>, > Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > >> I'm afraid I don't know what you mean by "performant". > > I've heard the term used often. It means something like, "performs > well" or "runs fast". It may or may not be an English word, but that > doesn't stop people from using it :-) > If "google" can be used to mean "make huge amouts of money with a product that is inherently flawed" then I'll happily accept "performant" as an English word, regardless of whether the English variant is UK, US, Australian, New Zealand, Soth African, Geordie, Glaswegian or any other :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
[toc] | [prev] | [next] | [standalone]
| From | Robert Voigtländer <r.voigtlaender@gmail.com> |
|---|---|
| Date | 2013-12-12 22:28 -0800 |
| Message-ID | <c1a6930f-13ba-4e84-a35d-e50f1c00f50f@googlegroups.com> |
| In reply to | #61728 |
>I've heard the term used often. It means something like, "performs >well" or "runs fast". It may or may not be an English word, but that >doesn't stop people from using it :-) > If "google" can be used to mean "make huge amouts of money with a > product that is inherently flawed" then I'll happily accept "performant" > as an English word, regardless of whether the English variant is UK, US, > Australian, New Zealand, Soth African, Geordie, Glaswegian or any other :) Indeed it's not an english word. I have to stop using it. In German it's used with the meaning of "runs fast". Even though it's already not that clearly defined there. Thanks for the help on the topic of data aggregation. It helped a lot and I again learned somthing. I have a performant .. well .. fast running solution now.
[toc] | [prev] | [next] | [standalone]
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-12-13 10:06 -0800 |
| Message-ID | <60cce71c-dad2-4a7b-a5d0-daa45a6b0e28@googlegroups.com> |
| In reply to | #61797 |
On Friday, December 13, 2013 11:58:51 AM UTC+5:30, Robert Voigtländer wrote: > >I've heard the term used often. It means something like, "performs > >well" or "runs fast". It may or may not be an English word, but that > >doesn't stop people from using it :-) > > If "google" can be used to mean "make huge amouts of money with a > > product that is inherently flawed" then I'll happily accept "performant" > > as an English word, regardless of whether the English variant is UK, US, > > Australian, New Zealand, Soth African, Geordie, Glaswegian or any other :) > Indeed it's not an english word. I have to stop using it. In German > it's used with the meaning of "runs fast". Even though it's already > not that clearly defined there. > Thanks for the help on the topic of data aggregation. It helped a > lot and I again learned somthing. I have a performant .. well > .. fast running solution now. Well "performant" is performant enough for the purposes of communicating on the python list I think :D
[toc] | [prev] | [next] | [standalone]
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2013-12-13 19:30 -0500 |
| Message-ID | <mailman.4102.1386981019.18130.python-list@python.org> |
| In reply to | #61855 |
On Fri, 13 Dec 2013 10:06:30 -0800 (PST), rusi <rustompmody@gmail.com>
declaimed the following:
>
>Well "performant" is performant enough for the purposes of communicating
>on the python list I think :D
Most probably could figure it out as being stylistically similar to
"conformant", which I believe IS used in English <G>
conformant => something that conforms
performant => something that performs
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
[toc] | [prev] | [next] | [standalone]
| From | Tim Roberts <timr@probo.com> |
|---|---|
| Date | 2013-12-14 19:41 -0800 |
| Message-ID | <i59qa91d3dj5eqj743nfkombkged69lvo4@4ax.com> |
| In reply to | #61873 |
Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote: > >> >>Well "performant" is performant enough for the purposes of communicating >>on the python list I think :D > > Most probably could figure it out as being stylistically similar to >"conformant", which I believe IS used in English <G> > >conformant => something that conforms >performant => something that performs Yes, I suspect it comes from people expecting too much consistency. If something that has "conformance" is "conformant", then something that has good "performance" must be "performant". -- Tim Roberts, timr@probo.com Providenza & Boekelheide, Inc.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-12-16 23:08 +1100 |
| Message-ID | <mailman.4210.1387195732.18130.python-list@python.org> |
| In reply to | #61930 |
On Sun, Dec 15, 2013 at 2:41 PM, Tim Roberts <timr@probo.com> wrote: > Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote: >> >>> >>>Well "performant" is performant enough for the purposes of communicating >>>on the python list I think :D >> >> Most probably could figure it out as being stylistically similar to >>"conformant", which I believe IS used in English <G> >> >>conformant => something that conforms >>performant => something that performs > > Yes, I suspect it comes from people expecting too much consistency. If > something that has "conformance" is "conformant", then something that has > good "performance" must be "performant". In a surprising coincidence, the word "performant" came up in a Daily MTG article today: http://www.wizards.com/Magic/Magazine/Article.aspx?x=mtg/daily/feature/278 Though it is used in double quotes to indicate that it's not exactly standard English :) Funny to see two nerdy interests meet! ChrisA
[toc] | [prev] | [next] | [standalone]
Page 1 of 2 [1] 2 Next page →
Back to top | Article view | comp.lang.python
csiph-web