Path: csiph.com!usenet.pasdenom.info!news.albasani.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.001 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'else:': 0.03; 'value,': 0.04; 'none:': 0.07; 'smallest': 0.07; 'transform': 0.07; '[0]': 0.09; 'data:': 0.09; 'iterate': 0.09; 'cc:addr:python-list': 0.11; 'def': 0.12; '"keep': 0.16; '(key,': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'grouped': 0.16; 'iterable': 0.16; 'once.': 0.16; 'statistics.': 0.16; 'tuples,': 0.16; 'variants': 0.16; 'wrote:': 0.18; 'do.': 0.18; 'producing': 0.19; 'thu,': 0.19; 'unlike': 0.19; 'input': 0.22; 'memory': 0.22; 'cc:addr:python.org': 0.22; 'exists': 0.24; 'cc:2**0': 0.24; 'first,': 0.26; 'task': 0.26; 'code:': 0.26; 'values': 0.27; 'header:In-Reply-To:1': 0.27; '[1]': 0.29; 'dec': 0.30; 'fastest': 0.30; 'mix': 0.30; 'robert': 0.30; 'subject:list': 0.30; 'message- id:@mail.gmail.com': 0.30; '(which': 0.31; 'keys': 0.31; 'time;': 0.31; 'tuples': 0.31; 'yields': 0.31; 'third': 0.33; 'subject:from': 0.34; 'common': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'version': 0.36; 'yield': 0.36; 'next': 0.36; 'list': 0.37; 'performance': 0.37; 'being': 0.38; 'depends': 0.38; 'version,': 0.38; 'pm,': 0.38; 'rather': 0.38; 'does': 0.39; '12,': 0.39; 'either': 0.39; 'results.': 0.60; 'most': 0.60; 'entire': 0.61; 'simply': 0.61; 'first': 0.61; 'within': 0.65; 'here': 0.66; 'apart': 0.72; 'groups.': 0.74; 'to:none': 0.92; '2013': 0.98 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type:content-transfer-encoding; bh=pVrHmn36FzDlQbatL7pVvVr7zHjxJ3AbMoPzLwzyaaI=; b=DAuqkcNJe9s8dnlePJ3OeWcvOCLTJRyUoStc9q6JkofAkbzL6bSB0lZ1EmWdOPOLx2 dmw0H4WYF8n8R4oMp+TE+lpZIIDZQaYRF1Hy4fz/fcbuUo0h51ikKc16slzoMKd4WnqA C9mdyFrt6mx+lGWn8SWijrg4wnt0RKUbHDtFq/AZIkluftiaDG3ZMkfMsYzCrvJEg0aT lWy8PE4h8cneqtXkG7KAigIy7FvpHogEOETW+x+aSce8jI2RRxfIhyiPVrUiKMhD5nHb Kkx1FYIJNXh4WwRp25RboLsP9pdBwUvnCOp6ad6yipXyGHSm0ndvhHZ0ZM3zmEkUpXFz vbxA== MIME-Version: 1.0 X-Received: by 10.68.248.33 with SMTP id yj1mr9528966pbc.45.1386836322004; Thu, 12 Dec 2013 00:18:42 -0800 (PST) In-Reply-To: References: Date: Thu, 12 Dec 2013 19:18:41 +1100 Subject: Re: min max from tuples in list From: Chris Angelico Cc: "python-list@python.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 72 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1386836325 news.xs4all.nl 2844 [2001:888:2000:d::a6]:36760 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:61678 On Thu, Dec 12, 2013 at 6:25 PM, Robert Voigtl=C3=A4nder wrote: > I need to find a -performant- way to transform this into a list with tupl= es (a[0],[a[0][1]min],[a[0][1]max]). > > Hard to explaint what I mean .. [0] of the first three tuples is 52. [1] = is 193,193 and 192. > What I need as result for these three tuples is: (52,192,193). > > For the next five tuples it is (51,188,193). > > > Extra challenges: > - This list is sorted. For performance reasons I would like to keep it un= sorted. > - There may be tuples where min=3Dmax. > - There my be tupples where [0] only exists once. So mix is automatically= max Yep, I see what you mean! Apart from the first of the challenges, which is ambiguous: do you mean you'd rather be able to work with it unsorted, or is that a typo, "keep it sorted"? This is a common task of aggregation. Your list is of (key, value) tuples, and you want to do some per-key statistics. Here are three variants on the code: # Fastest version, depends on the keys being already grouped # and the values sorted within each group. It actually returns # the last and first, not the smallest and largest. def min_max_1(lst): prev_key =3D None for key, value in lst: if key !=3D prev_key: if prev_key is not None: yield prev_key, value, key_max key_max =3D value if prev_key is not None: yield prev_key, value, key_max # This version depends on the keys being grouped, but # not on them being sorted within the groups. def min_max_2(lst): prev_key =3D None for key, value in lst: if key !=3D prev_key: if prev_key is not None: yield prev_key, key_min, key_max key_min =3D key_max =3D value else: key_min =3D min(key_min, value) key_max =3D min(key_max, value) if prev_key is not None: yield prev_key, key_min, key_max # Slowest version, does not depend on either the keys # or the values being sorted. Will iterate over the entire # list before producing any results. Returns tuples in # arbitrary order, unlike the others (which will retain). def min_max_3(lst): data =3D {} for key, value in lst: if key not in data: data[key]=3D(value, value) else: data[key][0] =3D min(data[key][0], value) data[key][1] =3D min(data[key][1], value) for key, minmax in data.items(): yield key, minmax[0], minmax[1] Each of these is a generator that yields (key, min, max) tuples. The third one needs the most memory and execution time; the others simply take the input as it comes. None of them actually requires that the input be a list - any iterable will do. ChrisA