Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.tele.dk!feed118.news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.002 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'python.': 0.02; 'else:': 0.03; 'aggregate': 0.07; 'postgresql': 0.07; '[];': 0.09; 'indexes': 0.09; 'linear': 0.09; 'lst': 0.09; 'subject:process': 0.09; 'cc:addr:python-list': 0.11; 'python': 0.11; 'jan': 0.12; '-999': 0.16; 'algorithmic': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'merged': 0.16; 'messy': 0.16; 'such,': 0.16; 'ignore': 0.16; 'wrote:': 0.18; 'code,': 0.22; 'cc:addr:python.org': 0.22; 'filtering': 0.24; 'mon,': 0.24; 'cc:2**0': 0.24; 'switch': 0.26; 'header:In-Reply-To:1': 0.27; 'idea': 0.28; "doesn't": 0.30; 'database,': 0.30; 'message- id:@mail.gmail.com': 0.30; "i'm": 0.30; 'reply.': 0.31; 'too.': 0.31; '13,': 0.31; 'larry': 0.31; 'option.': 0.31; 'remotely': 0.31; "they'll": 0.31; 'lists': 0.32; "can't": 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'subject:data': 0.36; 'doing': 0.36; 'thanks': 0.36; 'should': 0.36; 'too': 0.37; 'list': 0.37; 'list.': 0.37; 'being': 0.38; 'handle': 0.38; 'whatever': 0.38; 'pm,': 0.38; 'anything': 0.39; 'aspects': 0.39; 'though,': 0.39; 'sure': 0.39; 'how': 0.40; 'even': 0.60; 'removing': 0.60; "you're": 0.61; 'such': 0.63; 'group,': 0.63; 'complexity': 0.84; 'overall,': 0.84; 'partial': 0.84; 'start.': 0.84; 'x):': 0.84; 'to:none': 0.92 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type; bh=7/s/mE2ojA1rU41GOmtwH0dvXxuvSxC6gq4HSZD1ENU=; b=cczk91SsO4d53OPJRKZV+VALkLqryeHOag2MsP5kyaxep/iSNFK0Jjeaj8jY2oEzz4 Y6aUnj91wLxmCi4vgnc4nkGKdsfL77lpEzzWTqOs3oOjp+1zl40tGZBJcS9pSIfw68TD /VSJ3ZRXayEoxunlHlB4ihvfOhGATDEQSpl0hPKCfxvpiShwHKJGk+dDJHTHKYl4H4V1 yW440VbkRat9oRMBO9K76gxj458MtFCVAUmAUrX3uPHj8x9rfeyhp47Iq5ARbIZjiSpz 9q/wxtfz3EHg2PXpNGdZOBtab2lSeNMv81FQrKvzL/WgTrMWZ1TSvt0lKXcz4CBsYB20 HYpQ== MIME-Version: 1.0 X-Received: by 10.68.247.6 with SMTP id ya6mr27404027pbc.45.1389593399741; Sun, 12 Jan 2014 22:09:59 -0800 (PST) In-Reply-To: References: Date: Mon, 13 Jan 2014 17:09:59 +1100 Subject: Re: efficient way to process data From: Chris Angelico Cc: "python-list@python.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 56 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1389593409 news.xs4all.nl 2858 [2001:888:2000:d::a6]:53526 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:63812 On Mon, Jan 13, 2014 at 2:35 PM, Larry Martell wrote: > Thanks for the reply. I'm going to take a stab at removing the group > by and doing it all in python. It doesn't look too hard, but I don't > know how it will perform. Well, if you can't switch to PostgreSQL or such, then doing it in Python is your only option. There are such things as GiST and GIN indexes that might be able to do some of this magic, but I don't think MySQL has anything even remotely like what you're looking for. So ultimately, you're going to have to do your filtering on the database, and then all the aggregation in Python. And it's going to be somewhat complicated code, too. Best I can think of is this, as partial pseudo-code: last_x = -999 x_map = []; y_map = {} merge_me = [] for x,y,e in (SELECT x,y,e FROM t WHERE whatever ORDER BY x): if x