Path: csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.013 X-Spam-Evidence: '*H*': 0.97; '*S*': 0.00; 'allowed.': 0.07; 'binary': 0.07; 'element': 0.07; 'remaining': 0.07; 'tries': 0.07; '32-bit': 0.09; 'exp': 0.09; 'linear': 0.09; 'variant': 0.09; 'worse': 0.09; 'def': 0.12; 'random': 0.14; '10000000': 0.16; 'fiction.': 0.16; 'max:': 0.16; 'nearest': 0.16; 'range,': 0.16; 'splits': 0.16; 'subject:search': 0.16; 'ternary': 0.16; 'threshold': 0.16; 'two,': 0.16; 'two.': 0.16; 'worst': 0.16; 'wrote:': 0.18; 'split': 0.19; '>>>': 0.22; 'versions': 0.24; 'performing': 0.26; 'header:In-Reply-To:1': 0.27; 'idea': 0.28; 'point': 0.28; 'function': 0.29; 'am,': 0.29; 'see,': 0.30; 'message- id:@mail.gmail.com': 0.30; "i'm": 0.30; '100000': 0.31; 'factor': 0.31; 'reduced': 0.31; 'search.': 0.31; 'up.': 0.33; 'third': 0.33; 'but': 0.35; 'received:google.com': 0.35; 'add': 0.35; 'there': 0.35; 'version': 0.36; 'oracle': 0.36; 'picking': 0.36; 'doing': 0.36; 'half': 0.37; 'searching': 0.37; 'too': 0.37; 'sometimes': 0.38; 'to:addr:python-list': 0.38; 'previous': 0.38; 'does': 0.39; 'to:addr:python.org': 0.39; 'either': 0.39; 'space': 0.40; 'even': 0.60; 'remove': 0.60; 'numbers': 0.61; 'range': 0.61; 'course': 0.61; "you're": 0.61; 'more': 0.64; 'total': 0.65; 'talking': 0.65; 'within': 0.65; 'number:': 0.66; 'biggest': 0.67; 'search,': 0.74; 'goal': 0.75; '2015': 0.84; 'max,': 0.84; 'min': 0.84; 'subject:find': 0.84; '(max': 0.91; 'subject:Best': 0.91; 'average': 0.93; 'reducing': 0.93 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; bh=uS4MEQGQxaRzY3HMxKyeWu3oiV6fAno7LEKX2uyqBZs=; b=jXAjQ3oOCwu8sjWRtJAHQvV7AwSpFDiybCuycTxE7PIQgz7JIuP1FSnwtVjaw5WW6S 40DIZbKcgMn+bQqt2R9gREMbqvhaxkY9mLr+FN0cjgwZ4nAm1VpJBP59++d/ZCoyqjbf WTZINhRJME2mn4ES629ajjay9r8BmY0ybQSq/JZFFCMuPUrzFlhpBzZFh8gCeWH2aOi+ mtGC+BYfG2j189S4wH5o7HDISwtfZbx7AUQVtHhLMQyNVx+otLOYnQZmrDuVEM8tCWyS O+HIrhtrgFvkr5ILWaNWrA93vugu7aQBCc0aQ9jrpaoRAp+E1gsRCbuW7XCbF299XMNa 7BjA== X-Received: by 10.42.64.76 with SMTP id f12mr27517408ici.93.1428437776026; Tue, 07 Apr 2015 13:16:16 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <2e3a3c01-20b3-4948-9b32-bd80ed46822b@googlegroups.com> From: Ian Kelly Date: Tue, 7 Apr 2015 14:15:35 -0600 Subject: Re: Best search algorithm to find condition within a range To: Python Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 102 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1428437779 news.xs4all.nl 2962 [2001:888:2000:d::a6]:33431 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:88616 On Tue, Apr 7, 2015 at 9:40 AM, wrote: > Well of course you use same principles like a binary search setting min a= nd max, closing in on the digit. In this case the searched numbers > base^= exp and number< base^exp+1. > > But since the search is within large bases upto 32-bit space, so base 429= 4967295 is the biggest allowed. I need to find the nearest less exp in base= for each (lets call them pseudo digits). But as you see there it will take= time to add them up. So better doing a binary search, you know min-max hal= f (iteration). You can do the same for a random oracle min max within range= , and if the number of tries in general over 22 i think a random oracle do = it better than a binary search. Okay, so you're talking about doing a binary search but picking the partition point randomly instead of evenly? > It was a long time since i did this, but i do know there is a threshold w= here searching min max with the oracle will be faster than the binary searc= h. I don't know where you got this idea from, but it's fiction. If you split the space in half, then you are always reducing the amount of work remaining by a factor of 2, and the total work is O(log n). If you do an uneven split, then while you will sometimes make a good partition and reduce the amount of work by more than a factor of two, more often the target will be in the larger subspace and you will have reduced the amount of work by less than a factor of two. In the worst case of this, you only remove one element from the search space at a time and end up with a linear search. As a demonstration of this, here's a function which does a binary search for a number in a range and returns the number of partitions it had to perform to find the number: >>> def binsearch(n, min, max): ... if min =3D=3D max: return 0 ... med =3D (max + min) // 2 ... if n <=3D med: ... return 1 + binsearch(n, min, med) ... else: ... return 1 + binsearch(n, med+1, max) ... And here's the average number of partitions needed to find a number in various ranges: >>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n 16.68928 >>> n =3D 1000000 >>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n 19.951424 >>> n =3D 10000000 >>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n 23.3222784 Now let's try a variant function that splits the range at one third instead of evenly: >>> def trisearch(n, min, max): ... if min =3D=3D max: return 0 ... tern =3D (min + min + max) // 3 ... if n <=3D tern: ... return 1 + trisearch(n, min, tern) ... else: ... return 1 + trisearch(n, tern+1, max) ... >>> n =3D 100000 >>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n 17.88387 >>> n =3D 1000000 >>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n 21.502276 >>> n =3D 10000000 >>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n 25.1157592 The ternary version invariably takes more partitions on average to find the goal number. Now let's try a third variant that partitions the range randomly: >>> def randsearch(n, min, max): ... if min =3D=3D max: return 0 ... part =3D randrange(min, max) ... if n <=3D part: ... return 1 + randsearch(n, min, part) ... else: ... return 1 + randsearch(n, part+1, max) ... >>> n =3D 100000 >>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n 22.17863 >>> n =3D 1000000 >>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n 26.78371 >>> n =3D 10000000 >>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n 31.3916557 As you can see, this one does a lot worse on average, and at n=3D1e6 it's already performing worse than either of the previous versions did at n=3D1e7. I'm not even going to try this for n=3D1e8 because it would take too long to run.