In-Reply-To: <f78a11dc-efdd-4108-8d1f-59386f020fd0@googlegroups.com>
References: <f78a11dc-efdd-4108-8d1f-59386f020fd0@googlegroups.com>
Date: Thu, 12 Dec 2013 19:18:41 +1100
Subject: Re: min max from tuples in list
From: Chris Angelico <rosuav@gmail.com>
Cc: "python-list@python.org" <python-list@python.org>
Newsgroups: comp.lang.python
Message-ID: <mailman.3969.1386836325.18130.python-list@python.org>

On Thu, Dec 12, 2013 at 6:25 PM, Robert Voigtländer
<r.voigtlaender@gmail.com> wrote:
> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).
>
> Hard to explain what I mean .. [0] of the first three tuples is 52. [1]
> is 193,193 and 192.
> What I need as result for these three tuples is: (52,192,193).
>
> For the next five tuples it is (51,188,193).
>
>
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it
> unsorted.
> - There may be tuples where min=max.
> - There may be tuples where [0] only exists once. So min is automatically
> max.

Yep, I see what you mean! Apart from the first of the challenges,
which is ambiguous: do you mean you'd rather be able to work with it
unsorted, or is that a typo, "keep it sorted"?

This is a common task of aggregation. Your list is of (key, value)
tuples, and you want to do some per-key statistics. Here are three
variants on the code:

# Fastest version, depends on the keys being already grouped
# and the values sorted in descending order within each group.
# It actually returns the last and first values of each group,
# not the smallest and largest.
def min_max_1(lst):
    prev_key = None
    for key, value in lst:
        if key != prev_key:
            # New group: emit the previous one (last value, first value).
            if prev_key is not None: yield prev_key, prev_value, key_max
            prev_key = key
            key_max = value
        prev_value = value
    if prev_key is not None: yield prev_key, prev_value, key_max

# This version depends on the keys being grouped, but
# not on the values being sorted within the groups.
def min_max_2(lst):
    prev_key = None
    for key, value in lst:
        if key != prev_key:
            # New group: emit the previous one, then start fresh.
            if prev_key is not None: yield prev_key, key_min, key_max
            prev_key = key
            key_min = key_max = value
        else:
            key_min = min(key_min, value)
            key_max = max(key_max, value)
    if prev_key is not None: yield prev_key, key_min, key_max

# Slowest version, does not depend on either the keys
# or the values being sorted. Will iterate over the entire
# list before producing any results. Returns tuples in dict
# iteration order (arbitrary on older Pythons, first-seen order
# on 3.7+), unlike the others (which retain input order).
def min_max_3(lst):
    data = {}
    for key, value in lst:
        if key not in data:
            # Use a list, not a tuple, so it can be updated in place.
            data[key] = [value, value]
        else:
            data[key][0] = min(data[key][0], value)
            data[key][1] = max(data[key][1], value)
    for key, minmax in data.items():
        yield key, minmax[0], minmax[1]

Each of these is a generator that yields (key, min, max) tuples. The
third one needs the most memory and execution time; the others simply
take the input as it comes. None of them actually requires that the
input be a list - any iterable will do.
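
For comparison, here is a minimal sketch of the grouped-keys case built on
itertools.groupby from the standard library. It is an alternative, not one
of the three variants above. The name min_max_groupby and the sample data
are made up for illustration; the sample is only shaped like the data
described in the question (three tuples for 52, five for 51). Like the
second variant, it assumes equal keys are adjacent but not that the values
are sorted within a group.

```python
from itertools import groupby
from operator import itemgetter

def min_max_groupby(pairs):
    """Yield (key, min, max) for each run of adjacent equal keys."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        # Collect one group's values; only one group is held in memory.
        values = [value for _, value in group]
        yield key, min(values), max(values)

# Hypothetical sample, modelled on the description in the question.
sample = [(52, 193), (52, 193), (52, 192),
          (51, 193), (51, 191), (51, 190), (51, 189), (51, 188)]

result = list(min_max_groupby(sample))
print(result)  # [(52, 192, 193), (51, 188, 193)]

# Any iterable works as input, for example a generator expression.
lazy = list(min_max_groupby((k, v) for k, v in sample))
```

If the keys are not adjacent, groupby will emit the same key more than once,
so ungrouped input still needs the dictionary approach.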

ChrisA