Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!novia!news-out.readnews.com!transit4.readnews.com!panix!roy
From: Roy Smith <roy@panix.com>
Newsgroups: comp.lang.python
Subject: Re: Best way to structure data for efficient searching
Date: Tue, 03 Apr 2012 21:45:18 -0400
Organization: PANIX Public Access Internet and UNIX, NYC
Lines: 20
Message-ID: <roy-E75C32.21451803042012@news.panix.com>
References: <e1863f1f-f4f6-4821-8ffc-ee1ffea10416@k4g2000yqa.googlegroups.com> <jlf92p$8dv$1@dont-email.me>
NNTP-Posting-Host: localhost
X-Trace: reader1.panix.com 1333503920 4904 127.0.0.1 (4 Apr 2012 01:45:20 GMT)
X-Complaints-To: abuse@panix.com
NNTP-Posting-Date: Wed, 4 Apr 2012 01:45:20 +0000 (UTC)
User-Agent: MT-NewsWatcher/3.5.3b3 (Intel Mac OS X)
Xref: csiph.com comp.lang.python:22647

> On 3/28/2012 11:39 AM, Larry.Martell@gmail.com wrote:
> > I have a set of data that is contains 3 fields, K1, K2 and a
> > timestamp. There are duplicates in the data set, and they all have to
> > processed.
> >
> > Then I have another set of data with 4 fields: K3, K4, K5, and a
> > timestamp. There are also duplicates in that data set, and they also
> > all have to be processed.
> >
> > I need to find all the items in the second data set where K1==K3 and
> > K2==K4 and the 2 timestamps are within 20 seconds of each other.

In article <jlf92p$8dv$1@dont-email.me>, John Nagle <nagle@animats.com> 
wrote:
> [some good ideas]
>    All of these approaches are roughly O(N log N), which
> beats the O(N^2) approach you have now.

If the timestamps are sparse enough, I can think of a way that's O(N), 
or pretty close to it.