Path: csiph.com!usenet.pasdenom.info!news.albasani.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.090 X-Spam-Evidence: '*H*': 0.82; '*S*': 0.00; 'indicating': 0.07; 'indices': 0.07; 'plenty': 0.07; 'counting': 0.09; 'integers': 0.09; 'cc:addr:python-list': 0.11; 'python': 0.11; '(and,': 0.16; 'algorithmic': 0.16; 'benjamin': 0.16; 'interval.': 0.16; 'storing': 0.16; 'wrote:': 0.18; 'trying': 0.19; '>>>': 0.22; 'memory': 0.22; 'cc:addr:python.org': 0.22; 'question': 0.24; 'cc:2**0': 0.24; 'cc:no real name:2**0': 0.24; 'values': 0.27; 'header:In-Reply-To:1': 0.27; 'function': 0.29; 'correct': 0.29; 'message-id:@mail.gmail.com': 0.30; 'constant': 0.31; 'class': 0.32; 'probably': 0.32; 'themselves': 0.32; 'implemented': 0.33; 'not.': 0.33; 'sense': 0.34; 'received:209.85': 0.35; 'received:209.85.220': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'subject:List': 0.36; 'application': 0.37; 'two': 0.37; 'list': 0.37; 'list.': 0.37; 'received:209': 0.37; 'fact': 0.38; 'realize': 0.39; 'how': 0.40; 'numbers': 0.61; 'simple': 0.61; 'first': 0.61; 'save': 0.62; 'sum': 0.64; 'more': 0.64; 'different': 0.65; 'within': 0.65; 'here': 0.66; 'between': 0.67; 'obvious': 0.74; 'prime': 0.74; 'potentially': 0.81; 'discover': 0.82; 'density': 0.84; 'oscar': 0.84; 'quicker': 0.84; 'approach.': 0.91; 'indicator': 0.91; '2013': 0.98 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=ZpZ4gzj/NJCpi2q4xE/H2KHr7cnWPR/9f6rMfuUOA24=; b=xMuLttmzuNn3J25Szv9Cy/T6m9IWZddsZRE42cEslZBeuan6qHWFZ9ZyF+tOQX0p48 smaHia/M6j59ZUx7kWikZzcXyX3XxJNVIjC+O2fjSt+GBNWd+mG09oypnR+W/gA0VQwW xmfugubX9yXpOYen22nsMHKOU8o3JwDRWX8AFo/hKULWMYl17wWFujX9MyiFB7YCsOmq cLHaAWrlg4E2PDWEqmW31PkVVrjsmjJCSGQRAJkG5wCzmCJZbJwYRqve3A4jAv5UuIiS kDwpKZbZ77f0JZn+NTQ5/sXqycEnkGvUOEiRCbYgcOm/2jkC/RtwnGTtfvLlG4IEbwne rFdA== X-Received: by 10.58.144.133 with SMTP id sm5mr20431170veb.23.1366664639959; Mon, 22 Apr 2013 14:03:59 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <5175377f$0$29977$c3e8da3$5496439d@news.astraweb.com> <517545F7.5090209@nowhere.org> <51755C38.4000204@nowhere.org> <51756769.20206@nowhere.org> From: Oscar Benjamin Date: Mon, 22 Apr 2013 22:03:39 +0100 Subject: Re: List Count To: Blind Anagram Content-Type: text/plain; charset=ISO-8859-1 Cc: python-list@python.org X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 32 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1366664643 news.xs4all.nl 2194 [2001:888:2000:d::a6]:38237 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:44113 On 22 April 2013 21:18, Oscar Benjamin wrote: > On 22 April 2013 17:38, Blind Anagram wrote: >> On 22/04/2013 17:06, Oscar Benjamin wrote: >> >>> I don't know what your application is but I would say that my first >>> port of call here would be to consider a different algorithmic >>> approach. An obvious question would be about the sparsity of this data >>> structure. How frequent are the values that you are trying to count? >>> Would it make more sense to store a list of their indices? >> >> Actually it is no more than a simple prime sieve implemented as a Python >> class (and, yes, I realize that there are plenty of these around). > > If I understand correctly, you have a list of roughly a billion > True/False values indicating which integers are prime and which are > not. You would like to discover how many prime numbers there are > between two numbers a and b. You currently do this by counting the > number of True values in your list between the indices a and b. > > If my description is correct then I would definitely consider using a > different algorithmic approach. The density of primes from 1 to 1 > billlion is about 5%. Storing the prime numbers themselves in a sorted > list would save memory and allow a potentially more efficient way of > counting the number of primes within some interval. In fact it is probably quicker if you don't mind using all that memory to just store the cumulative sum of your prime True/False indicator list. This would be the prime counting function pi(n). You can then count the primes between a and b in constant time with pi[b] - pi[a]. Oscar