Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #7260
| Path | csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!feeder.news-service.com!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <debatem1@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.001 |
| X-Spam-Evidence | '*H*': 1.00; '*S*': 0.00; 'wed,': 0.03; 'subject:Python': 0.06; 'computed': 0.07; 'constructor': 0.07; 'hettinger': 0.07; 'be:': 0.09; 'bytearray': 0.09; 'collections': 0.09; 'etc).': 0.09; 'overridden': 0.09; 'recipe': 0.09; 'substitute': 0.09; 'url:activestate': 0.09; 'pm,': 0.10; 'api': 0.11; 'meaningful': 0.14; 'wrote:': 0.14; 'halfway': 0.16; 'mmap': 0.16; 'set)': 0.16; 'sizes,': 0.16; 'subject:Filter': 0.16; 'subject:lines': 0.16; 'cache': 0.16; 'entries': 0.16; 'input': 0.17; 'cc:addr:python-list': 0.17; 'url:code': 0.17; 'functions,': 0.19; 'simpler': 0.19; 'header:In-Reply-To:1': 0.21; 'seems': 0.21; 'filter': 0.22; 'raymond': 0.22; 'cc:2**0': 0.22; 'memory': 0.22; 'cc:no real name:2**0': 0.23; 'keys': 0.23; 'objects,': 0.23; 'optional': 0.23; 'fri,': 0.23; 'objects': 0.23; 'worked': 0.24; 'code': 0.24; 'posted': 0.25; "i'm": 0.27; 'message- id:@mail.gmail.com': 0.28; '(the': 0.28; 'thanks': 0.28; 'code,': 0.29; 'updated': 0.29; 'cc:addr:python.org': 0.30; 'this.': 0.31; 'operations': 0.33; 'error': 0.33; "i've": 0.33; 'received:209.85.212': 0.34; 'there': 0.35; 'test': 0.35; 'uses': 0.36; 'difference': 0.37; 'feedback': 0.37; 'takes': 0.37; 'received:google.com': 0.37; 'something': 0.37; 'received:209.85': 0.37; 'convenient': 0.37; 'case': 0.37; 'could': 0.38; 'domain': 0.38; 'positive': 0.38; 'but': 0.38; 'earlier': 0.38; 'rate.': 0.38; 'somewhat': 0.38; 'subject:: ': 0.38; 'addresses': 0.39; 'subject: (': 0.39; 'user': 0.39; 'should': 0.39; 'received:209': 0.39; 'add': 0.39; 'needed.': 0.40; 'easily': 0.60; 'more': 0.60; 'maximum': 0.62; 'unique': 0.63; 'design': 0.63; 'states': 0.67; 'informed': 0.84; 'art,': 0.84; 'bloom': 0.84; 'expects': 0.84; 'subject:updated': 0.84; 'penalty': 0.91; 'probe': 0.91; 'received:209.85.212.178': 0.91; 'received:mail- px0-f178.google.com': 0.91; 'union,': 0.91; '150': 0.93 |
| DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=RkUdIjjV4xbODYOuwHCpTFyFfCymm5KOAqFyoOOgqsQ=; b=tDPaTO/lbW1dUWpmhmeS12qyjGYRP6EJ2Re0je2ZSOpdVHZ2oXGP7zzVJQX7rkWto9 xbrvsK+ojWP9eYPhSYTEjZPzEn9grR0u+nbeYv4DSETRspTIA9xZfqTpZ4Uev5LOdyR+ rT+8vRq6qnf+VhtBOYp0qxSku088++lFy2NhU= |
| DomainKey-Signature | a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=nK1wfAME9cP+w7barzAiAryj8rsnATFzbHjuS2Bdr5hMLm2iRYNfZiaWihicT/G3nt 3m0szax8KuRNzgzg2Rc87FVRaiDlVH1VINbACQcO23pO4eL62PBZ8H9y0WBN/vhjT4nr vL7cJYBwzgN/WDF6OupWR5H7/HucReZqbwFQE= |
| MIME-Version | 1.0 |
| In-Reply-To | <c5671628-1464-47aa-ac26-9715bdbe94e7@s41g2000prb.googlegroups.com> |
| References | <5a03279e-993e-4a35-9220-da5014d68430@34g2000pru.googlegroups.com> <mailman.2499.1307382457.9059.python-list@python.org> <c5671628-1464-47aa-ac26-9715bdbe94e7@s41g2000prb.googlegroups.com> |
| Date | Wed, 8 Jun 2011 13:48:53 -0700 |
| Subject | Re: Bloom Filter in 22 lines of Python (updated) |
| From | geremy condra <debatem1@gmail.com> |
| To | Raymond Hettinger <python@rcn.com> |
| Content-Type | text/plain; charset=ISO-8859-1 |
| Content-Transfer-Encoding | quoted-printable |
| Cc | python-list@python.org |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.12 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.35.1307566137.11593.python-list@python.org> (permalink) |
| Lines | 53 |
| NNTP-Posting-Host | 82.94.164.166 |
| X-Trace | 1307566137 news.xs4all.nl 49042 [::ffff:82.94.164.166]:47064 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | x330-a1.tempe.blueboxinc.net comp.lang.python:7260 |
Show key headers only | View raw
On Wed, Jun 8, 2011 at 1:05 PM, Raymond Hettinger <python@rcn.com> wrote: > On Jun 6, 10:47 am, geremy condra <debat...@gmail.com> wrote: >> On Fri, Jun 3, 2011 at 1:17 PM, Raymond Hettinger <pyt...@rcn.com> wrote: >> > Thanks for all the feedback on the earlier post. >> >> > I've updated the recipe to use a cleaner API, simpler code, >> > more easily subclassable, and with optional optimizations >> > for better cache utilization and speed: >> >> > http://code.activestate.com/recipes/577684-bloom-filter/ >> >> Any chance of this landing in collections? > > I would be happy to add something like this to collections > once the API matures. > > Right now, the constructor expects the user to know the desired > memory size and number of probes. Those could be computed > from the maximum number of unique keys and the tolerable > error false positive rate. > > The most convenient constructor could be: > b = BloomFilter(myseq) > where len(myseq) is presumed indicate the maximum number of > unique entries and there is a default tolerable false positive > rate of 1 in 10,000. > > The API already lets the user substitute different methods > for get_probes(). It should also allow the bytearray to be > easily replaced by other objects with indexable access > (array objects, mmap objects, etc). > > We could also provide union, intersection, and difference > operations but these would be a little awkward for general > purpose use because the result is only meaningful when > the sizes, probe functions, and input domain are all the same. > > Fortunately, there is a lot of prior art, so the API design > can be informed by what has worked well for others. Seems like the code that Dan Stromberg posted addresses a lot of this. Thoughts on his code? Also, if you're interested I'm about halfway through writing a C implementation, although it would have to be somewhat less flexible to get a real speed advantage. Right now there's a hefty enough speed penalty to the filter (the states test takes about 150 times as long with a bloom filter as with a normal set) that it may make sense to have a base case that uses an optimized C implementation but that can be overridden as needed. Geremy Condra
Back to comp.lang.python | Previous | Next — Previous in thread | Find similar | Unroll thread
Bloom Filter in 22 lines of Python (updated) Raymond Hettinger <python@rcn.com> - 2011-06-03 13:17 -0700
Re: Bloom Filter in 22 lines of Python (updated) geremy condra <debatem1@gmail.com> - 2011-06-06 10:47 -0700
Re: Bloom Filter in 22 lines of Python (updated) Raymond Hettinger <python@rcn.com> - 2011-06-08 13:05 -0700
Re: Bloom Filter in 22 lines of Python (updated) geremy condra <debatem1@gmail.com> - 2011-06-08 13:48 -0700
csiph-web