Groups > comp.lang.python > #8119 > unrolled thread

How can I speed up a script that iterates over a large range (600 billion)?

Started by	John Salerno <johnjsal@gmail.com>
First post	2011-06-21 12:48 -0700
Last post	2011-06-22 16:33 +0100
Articles	14 on this page of 34 — 15 participants

Back to article view | Back to comp.lang.python

  How can I speed up a script that iterates over a large range (600 billion)? John Salerno <johnjsal@gmail.com> - 2011-06-21 12:48 -0700
    Re: How can I speed up a script that iterates over a large range (600 billion)? Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-21 14:02 -0600
    Re: How can I speed up a script that iterates over a large range (600 billion)? Irmen de Jong <irmen@-NOSPAM-xs4all.nl> - 2011-06-21 22:10 +0200
      sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Irmen de Jong <irmen@-NOSPAM-xs4all.nl> - 2011-06-21 22:22 +0200
        Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? John Salerno <johnjsal@gmail.com> - 2011-06-21 14:09 -0700
          Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2011-06-21 23:39 +0200
          Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-21 15:41 -0600
            Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? John Salerno <johnjsal@gmail.com> - 2011-06-21 14:48 -0700
          Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Vlastimil Brom <vlastimil.brom@gmail.com> - 2011-06-21 23:33 +0200
          Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Paul Rubin <no.email@nospam.invalid> - 2011-06-21 17:05 -0700
            Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? John Salerno <johnjsal@gmail.com> - 2011-06-21 18:21 -0700
              Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Paul Rubin <no.email@nospam.invalid> - 2011-06-21 19:09 -0700
                Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? John Salerno <johnjsal@gmail.com> - 2011-06-21 20:02 -0700
                  Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2011-06-22 19:46 +0200
              Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? MRAB <python@mrabarnett.plus.com> - 2011-06-22 03:10 +0100
              Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? Mel <mwilson@the-wire.com> - 2011-06-21 23:02 -0400
                Re: sorry, possibly too much info. was: Re: How can I speed up a script that iterates over a large range (600 billion)? John Salerno <johnjsal@gmail.com> - 2011-06-21 20:41 -0700
    Re: How can I speed up a script that iterates over a large range (600 billion)? Mel <mwilson@the-wire.com> - 2011-06-21 16:19 -0400
      Re: How can I speed up a script that iterates over a large range (600 billion)? Tim Roberts <timr@probo.com> - 2011-06-22 23:21 -0700
    Re: How can I speed up a script that iterates over a large range (600 billion)? MRAB <python@mrabarnett.plus.com> - 2011-06-21 21:26 +0100
    Re: How can I speed up a script that iterates over a large range (600 billion)? Terry Reedy <tjreedy@udel.edu> - 2011-06-21 19:30 -0400
      Re: How can I speed up a script that iterates over a large range (600 billion)? Paul Rubin <no.email@nospam.invalid> - 2011-06-21 17:00 -0700
        Re: How can I speed up a script that iterates over a large range (600 billion)? Terry Reedy <tjreedy@udel.edu> - 2011-06-22 00:18 -0400
          Re: How can I speed up a script that iterates over a large range (600 billion)? Paul Rubin <no.email@nospam.invalid> - 2011-06-21 22:32 -0700
            Re: How can I speed up a script that iterates over a large range (600 billion)? Terry Reedy <tjreedy@udel.edu> - 2011-06-22 18:46 -0400
    Re: How can I speed up a script that iterates over a large range (600 billion)? Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-06-21 20:07 -0700
    Re: How can I speed up a script that iterates over a large range (600 billion)? Chris Torek <nospam@torek.net> - 2011-06-22 05:58 +0000
      Re: How can I speed up a script that iterates over a large range (600 billion)? Paul Rubin <no.email@nospam.invalid> - 2011-06-21 23:23 -0700
        Re: How can I speed up a script that iterates over a large range (600 billion)? Slaunger <slaunger@gmail.com> - 2011-06-23 02:52 -0700
      Re: How can I speed up a script that iterates over a large range (600 billion)? Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-22 01:16 -0600
      Re: How can I speed up a script that iterates over a large range (600 billion)? Anny Mous <b1540457@tyldd.com> - 2011-06-22 22:01 +1000
        Re: How can I speed up a script that iterates over a large range (600 billion)? Chris Angelico <rosuav@gmail.com> - 2011-06-22 22:28 +1000
        Re: How can I speed up a script that iterates over a large range (600 billion)? Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-22 09:52 -0600
      Re: How can I speed up a script that iterates over a large range (600 billion)? MRAB <python@mrabarnett.plus.com> - 2011-06-22 16:33 +0100

Page 2 of 2 — ← Prev page 1 [2]

#8143

From	Terry Reedy <tjreedy@udel.edu>
Date	2011-06-21 19:30 -0400
Message-ID	<mailman.251.1308699015.1164.python-list@python.org>
In reply to	#8119

On 6/21/2011 3:48 PM, John Salerno wrote:

> Absolutely not! Each problem has been designed according to a "one-
> minute rule", which means that although it may take several hours to
> design a successful algorithm with more difficult problems, an
> efficient implementation will allow a solution to be obtained on a
> modestly powered computer in less than one minute."

That statement is for C, not Python. Python is efficient with human 
time, but not machine time. If something really takes a minute in C, 
allow yourself at least 10 minutes or even more with plain CPython.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#8147

From	Paul Rubin <no.email@nospam.invalid>
Date	2011-06-21 17:00 -0700
Message-ID	<7x7h8eivkf.fsf@ruckus.brouhaha.com>
In reply to	#8143

Terry Reedy <tjreedy@udel.edu> writes:
>> efficient implementation will allow a solution to be obtained on a
>> modestly powered computer in less than one minute."
> If something really takes a minute in C,
> allow yourself at least 10 minutes or even more with plain CPython.

No.  The idea of the Euler problems is to think up sane algorithms to
solve them, not micro-optimize or use low level languages on crappy
algorithms.

     n=600851475143
     for d in xrange(2,n):
       if d*d > n: break
          while n%d == 0: n //= d
     print n

finishes on my laptop with no noticable pause.  The trick is to stop
testing once you hit the square root of the number.  There is at least
one extremely obvious optimization I didn't bother with (above 2, only
test odd divisors), that would have doubled the speed on top of that.

[toc] | [prev] | [next] | [standalone]

#8180

From	Terry Reedy <tjreedy@udel.edu>
Date	2011-06-22 00:18 -0400
Message-ID	<mailman.268.1308716334.1164.python-list@python.org>
In reply to	#8147

On 6/21/2011 8:00 PM, Paul Rubin wrote:
> Terry Reedy<tjreedy@udel.edu>  writes:
>>> efficient implementation will allow a solution to be obtained on a
>>> modestly powered computer in less than one minute."
>> If something really takes a minute in C,
>> allow yourself at least 10 minutes or even more with plain CPython.
>
> No.

No what? My conditional statement is correct. It turns out not to apply 
here. Great.

> The idea of the Euler problems is to think up sane algorithms to
> solve them, not micro-optimize or use low level languages on crappy
> algorithms.
>
>       n=600851475143
>       for d in xrange(2,n):
>         if d*d>  n: break
>            while n%d == 0: n //= d
>       print n
>
> finishes on my laptop with no noticable pause.

If the best C program for a problem takes 10 seconds or more, then 
applying the same 1 minute limit to Python is insane, and contrary to 
the promotion of good algorithm thinking.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#8185

From	Paul Rubin <no.email@nospam.invalid>
Date	2011-06-21 22:32 -0700
Message-ID	<7xmxhabfcv.fsf@ruckus.brouhaha.com>
In reply to	#8180

Terry Reedy <tjreedy@udel.edu> writes:
> If the best C program for a problem takes 10 seconds or more, then
> applying the same 1 minute limit to Python is insane, and contrary to
> the promotion of good algorithm thinking.

The Euler problems are all designed to be doable in 1 minute or less and
the Euler project started in 2001, when personal computers were probably
10x slower than they are today.  So they shouldn't take more than 6
seconds on a modern computer if you're thoughtful.  On a reasonable
modern computer (maybe even a 2001 computer), they should all be doable
in < 1 minute in python, probably well under.  They can probably all be
done in under 1 second in C.

The "largest prime factor of 600851475143" problem we're discussing took
0.017 user cpu seconds in completely unoptimized python on my laptop
using the second-most-naive algoritm possible, including loading the
interpreter from the command line.  Executing an empty file takes the
same amount of time.  The algorithm would probably be >10x faster in C
with a little bit of tweaking.  The problems are about math cleverness,
not CPU resources.  Some of them are pretty hard, but if your solution
is taking more than a minute in Python, it means you should be looking
for a better algorithm, not a faster language.

[toc] | [prev] | [next] | [standalone]

#8247

From	Terry Reedy <tjreedy@udel.edu>
Date	2011-06-22 18:46 -0400
Message-ID	<mailman.303.1308783688.1164.python-list@python.org>
In reply to	#8185

On 6/22/2011 1:32 AM, Paul Rubin wrote:
> Terry Reedy<tjreedy@udel.edu>  writes:
>> If the best C program for a problem takes 10 seconds or more, then
>> applying the same 1 minute limit to Python is insane, and contrary to
>> the promotion of good algorithm thinking.

> The Euler problems

are not the only algorithm problems posted on the web.

  > the Euler project started in 2001, when personal computers were probably
> 10x slower than they are today.

Yes, and I am still using a 2003 XP/Athlon machine, which is adequate 
for running my Python programs, though not recent games.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#8170

From	Benjamin Kaplan <benjamin.kaplan@case.edu>
Date	2011-06-21 20:07 -0700
Message-ID	<mailman.264.1308712038.1164.python-list@python.org>
In reply to	#8119

On Tue, Jun 21, 2011 at 4:30 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 6/21/2011 3:48 PM, John Salerno wrote:
>
>> Absolutely not! Each problem has been designed according to a "one-
>> minute rule", which means that although it may take several hours to
>> design a successful algorithm with more difficult problems, an
>> efficient implementation will allow a solution to be obtained on a
>> modestly powered computer in less than one minute."
>
> That statement is for C, not Python. Python is efficient with human time,
> but not machine time. If something really takes a minute in C, allow
> yourself at least 10 minutes or even more with plain CPython.
>
> --
> Terry Jan Reedy

Python is the second most popular language on Project Euler, at 14358
users compared to 15897 who use C/C++. I'm pretty sure they don't
assume you use C. Although Python's longs do make some of the early
problems really really easy.

[toc] | [prev] | [next] | [standalone]

#8187

From	Chris Torek <nospam@torek.net>
Date	2011-06-22 05:58 +0000
Message-ID	<its0ar02sia@news2.newsguy.com>
In reply to	#8119

Now that the exercise has been solved...

Instead of "really short code to solve the problem", how about 
some "really long code"? :-)

I was curious about implementing prime factorization as a generator,
using a prime-number generator to come up with the factors, and
doing memoization of the generated primes to produce a program that
does what "factor" does, e.g.:

    $ factor 99999999999999999
    99999999999999999: 3 3 2071723 5363222357

The python code is rather too slow for this particular number (I
gave up on it finding 5363222357) but works quite well on 600851475143,
or even, e.g., 12186606004219:

    $ python factor.py 600851475143 12186606004219
    600851475143: 71 839 1471 6857
    12186606004219: 2071723 5882353

While searching for speed hacks I came across a few interesting
tricks, particularly TZ<omega>TZIOY's mod-30 scheme (erat3) at
stackoverflow.com (see URLs in the comments), which only really
works well in Python 2.7 and later (using itertools.compress).
Here is the end result (run with 2.5 and 2.7, I have no 3.x installed
anywhere convenient, and of course the print calls would change):

import itertools
import sys

def count(start = 0, step = 1):
    """
    Yield count starting from <start> and counting up by <step>.
    Same as itertools.count() except for the <step> argument, and
    allowing non-"int" arguments.
    
    Python 2.7 and later provides this directly, via itertools.

    Note: it's usually faster to use itertools.islice(itertools.count(...))
    than to run the "while True" loop below, so we do that if we can.
    """
    if (sys.version_info[0] > 2 or
            (sys.version_info[0] == 2 and sys.version_info[1] >= 7)):
        return itertools.count(start, step)
    if isinstance(start, int) and isinstance(step, int):
        if step == 1:
            return itertools.count(start)
        if 1 < step < 5: # 5 upper bound is a guess
            return itertools.islice(itertools.count(start), 0, None, step)
    def f(start, step):
        while True:
            yield start
            start += step
    return f(start, step)

# Sieve of Eratosthenes-based prime-number generator.
#
# Based on David Eppstein's for python 2.3(?) and subsequent
# discussion -- see
# http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/
#
# See also:
# http://oreilly.com/pub/a/python/excerpt/pythonckbk_chap1/index1.html?page=last
#
# http://stackoverflow.com/questions/2211990/how-to-implement-an-efficient-infinite-generator-of-prime-numbers-in-python/3796442#3796442
def primes():
    """
    Yields sequence of prime numbers via Sieve of Eratosthenes.
    """
    # Keep the state from the last time we abandoned the
    # generator, in primes.primes and primes.D.  We can then
    # very quickly re-yield previously-saved primes.  We're
    # going to skip even numbers below, we so prime the
    # "primes so far" list with 2.
    #
    # For the fancy (erat3) version, we need to pre-provide
    # 2, 3, and 5, and pre-load D.  Having done that we can
    # start either version at 7.
    try:
        primes.primes
    except AttributeError:
        primes.primes = [2, 3, 5]
    for p in primes.primes:
        yield p
    # OK, now we need a mapping holding known-composite values
    # (numbers "struck from the sieve").
    try:
        D = primes.D
    except AttributeError:
        D = primes.D = { 9: 3, 25: 5 }
    # D[q] exists if q is composite; its value is the first prime
    # number that proves that q is composite.  (However, only odd
    # numbers appear in D.)
    q = p + 2 # where we start sieve-ing, below
    if sys.version_info[0] == 2 and sys.version_info[1] < 7:
        for q in count(q, 2):
            p = D.pop(q, None)
            if p is None:
                # q was not marked composite, so it's prime; moreover,
                # q-squared is composite (and odd) and its first prime
                # factor is q.
                primes.primes.append(q)
                D[q * q] = q
                yield q
            else:
                # q is composite, p is its first prime factor -- e.g.,
                # q=9 and p=3, or q=15 and p=3.  Extend the sieve:
                # find first natural number k where q + 2kp is not already
                # already known as composite.  (e.g., if q=9 and p=3, k=1
                # so that we mark D[15] as composite, with 3 as its first
                # prime factor.)  Note that we are skipping even numbers,
                # so p and q are both odd, so q+p is even, q+2p is odd,
                # q+3p is even, q+4p is odd, etc.  We don't need to mark
                # even-numbered composites in D, which is why we only look
                # at q + 2kp.
                twop = p + p
                x = q + twop # next odd multiple of p
                while x in D: # skip already-known composites
                    x += twop
                D[x] = p
    else:
        #       7  9 11 13 15 17 19 21 23 25 27 29 31 33 35
        MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
        MODULOS = frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )

        # If we started counting from 7, we'd want:
        #   itertools.compress(itertools.count(7,2), itertools.cycle(MASK))
        # But we start counting from q which means we need to clip off
        # the first ((q - 7) % 30) // 2 items:
        offset = ((q - 7) % 30) // 2
        for q in itertools.compress(itertools.count(q, 2),
                itertools.islice(itertools.cycle(MASK), offset, None, 1)):
            p = D.pop(q, None)
            if p is None:
                D[q * q] = q
                primes.primes.append(q)
                yield q
            else:
                twop = p + p
                x = q + twop
                while x in D or (x % 30) not in MODULOS:
                    x += twop
                D[x] = p

def factors(num):
    """
    Return all the prime factors of the given number.
    """
    if num < 0:
        num = -num
    if num < 2:
        return
    for p in primes():
        q, r = divmod(num, p)
        while r == 0:
            yield p
            if q == 1:
                return
            num = q
            q, r = divmod(num, p)

if __name__ == '__main__':
    for arg in (sys.argv[1:] if len(sys.argv) > 1 else ['600851475143']):
        try:
            arg = int(arg)
        except ValueError, error:
            print error
        else:
            print '%d:' % arg,
            for fac in factors(arg):
                print fac,
                sys.stdout.flush()
            print
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html

[toc] | [prev] | [next] | [standalone]

#8188

From	Paul Rubin <no.email@nospam.invalid>
Date	2011-06-21 23:23 -0700
Message-ID	<7xiprybd0v.fsf@ruckus.brouhaha.com>
In reply to	#8187

Chris Torek <nospam@torek.net> writes:
> def primes():
>     """
>     Yields sequence of prime numbers via Sieve of Eratosthenes.
>     """

I think this has the same order-complexity but is enormously slower in
practice, and runs out of recursion stack after a while.  Exercise: spot
the recursion.

    from itertools import islice, ifilter, count

    def primes():
        a = count(2)
        while True:
            p = a.next()
            yield p
            a = ifilter(lambda t,p=p: t%p, a)

    # print first 100 primes
    print list(islice(primes(), 100))

[toc] | [prev] | [next] | [standalone]

#8287

From	Slaunger <slaunger@gmail.com>
Date	2011-06-23 02:52 -0700
Message-ID	<30108a27-f1a2-450a-a241-9c2dcaf3c8e4@eb1g2000vbb.googlegroups.com>
In reply to	#8188

As a general note concerning the use of Python on Project Euler, and
the one minute guideline.

For problems 1-100, each problem is easily solved in less than 1
minute processing time *if* the algorithms and math is done "right"
and with thought.

My project Euler scripts solves the first 100 problems with an average
of 0.91 secs/problem on a 4 y old std business Laptop running 32 bit
Win XP. Of these, one problem takes 18 secs.

For some of the later problems it certainly becomes very difficult to
do all problems within 1 minute if you use Python on an ordinary
processing platform. There you need to resort to a compiled language
like C, C++, or dedicated mathematical software packages, which
implement complex mathematical functions using highly efficient native
libraries.

Kim

[toc] | [prev] | [next] | [standalone]

#8189

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2011-06-22 01:16 -0600
Message-ID	<mailman.273.1308727028.1164.python-list@python.org>
In reply to	#8187

On Tue, Jun 21, 2011 at 11:58 PM, Chris Torek <nospam@torek.net> wrote:
> I was curious about implementing prime factorization as a generator,
> using a prime-number generator to come up with the factors, and
> doing memoization of the generated primes to produce a program that
> does what "factor" does, e.g.:

This is a generator-based sieve I wrote a while back to solve the
PRIME1 problem at SPOJ.  The problem is to generate all the prime
numbers within specified ranges, where the numbers are great enough
that a full sieve would run out of memory, and the ranges are wide
enough that a O(sqrt(n)) test on each number would simply take too
long:

https://www.spoj.pl/problems/PRIME1/

The script is not terribly impressive from a technical standpoint, but
what tickles me about it is its bootstrappiness; the set that the
"primes" generator checks to determine whether each number is prime is
actually built from the output of the generator, which itself contains
no actual primality-testing logic.  Hope you like it:

8<--------------------------------------------------------------------
import math

def primes(m, n):
    # Yield all the primes in the range [m, n), using the nonprimes set
    # as a reference.  Except for 2, only odd integers are considered.
    if m <= 2:
        yield 2
        m = 3
    elif m % 2 == 0:
        m += 1  # Force m to be odd.
    for p in xrange(m, n, 2):
        if p not in nonprimes:
            yield p

# Read all the bounds to figure out what we need to store.
bounds = [map(int, raw_input().split(' ')) for t in xrange(input())]
limit = max(n for (m, n) in bounds)
sqrt_limit = int(math.sqrt(limit))

# Mark odd multiples of primes as not prime.  Even multiples
# do not need to be marked since primes() won't try them.
nonprimes = set()
for p in primes(3, sqrt_limit+1):
    # Mark odd nonprimes within the base range.  p*3 is the first
    # odd multiple of p; p+p is the increment to get to the next
    # odd multiple.
    nonprimes.update(xrange(p*3, sqrt_limit+1, p+p))

    # Mark odd nonprimes within each of the requested ranges.
    for (m, n) in bounds:
        # Align m to the first odd multiple of p in the range
        # (or the last odd multiple before the range).
        m -= (m % (p + p) - p)
        m = max(m, p*3)
        nonprimes.update(xrange(m, n+1, p+p))

# Generate and write the primes over each input range.
first = True
for (m, n) in bounds:
    if not first:
        print
    first = False
    for p in primes(m, n+1):
        print p
8<--------------------------------------------------------------------

[toc] | [prev] | [next] | [standalone]

#8201

From	Anny Mous <b1540457@tyldd.com>
Date	2011-06-22 22:01 +1000
Message-ID	<4e01d983$0$29977$c3e8da3$5496439d@news.astraweb.com>
In reply to	#8187

Chris Torek wrote:

> Now that the exercise has been solved...
> 
> Instead of "really short code to solve the problem", how about
> some "really long code"? :-)
> 
> I was curious about implementing prime factorization as a generator,
> using a prime-number generator to come up with the factors, and
> doing memoization of the generated primes to produce a program that
> does what "factor" does, e.g.:
> 
>     $ factor 99999999999999999
>     99999999999999999: 3 3 2071723 5363222357
> 
> The python code is rather too slow for this particular number (I
> gave up on it finding 5363222357) 

It shouldn't take more than a few seconds to factorise 10**17-1 even in pure
Python. On my not-very-powerful PC, using a home-brew pure-Python function
(code to follow), I get all four factors in under four seconds.

In comparison, I ran your script for over five minutes before giving up, it
still hadn't found the fourth factor.

Don't be disappointed though, you're in illustrious company. Getting the
Sieve of Eratosthenes *badly* wrong is one of the most common mistakes,
second only to getting involved in land wars in Asia. See this paper by
Melissa O'Neill:

http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

What people often describe as the Sieve of Eratosthenes is frequently the
Sieve of Euler, which while mathematically elegant, is computationally
crap. As O'Neill calls it, the Unfaithful Sieve.

In my home-brew primes module, I have this to say about three common
algorithms, including the classic Unfaithful Sieve, Turner's algorithm:

# === Prime number generators to avoid ===

# The following three prime generators are supplied for educational
# purposes, specifically as a terrible warning on how NOT to do it.
#
# Their performance starts at bad in the case of trial_division(), falls to
# terrible with turner(), and ends up with the utterly awful
# naive_division(). None of these three have acceptable performance; they
# are barely tolerable even for the first 100 primes. Their only advantage
# is that they are somewhat easy to understand.

Here's my factors() function. Adapting it to be a generator is left as an
exercise.

def factors(n):
    """factors(integer) -> [list of factors]

    >>> factors(231)
    [3, 7, 11]

    Returns the (mostly) prime factors of integer n. For negative integers,
    -1 is included as a factor. If n is 0 or 1, [n] is returned as the only
    factor. Otherwise all the factors will be prime.
    """
    if n != int(n):
        raise ValueError('argument must be an integer')
    elif n in (0, 1, -1):
        return [n]
    elif n < 0:
        return [-1] + factors(-n)
    assert n >= 2
    result = []
    for p in sieve():
        if p*p > n: break
        while n % p == 0:
            result.append(p)
            n //= p
    if n != 1:
        result.append(n)
    return result

def sieve():
    """Yield prime integers efficiently.

    This uses the Sieve of Eratosthenes, modified to generate the primes
    lazily rather than the traditional version which operates on a fixed
    size array of integers.
    """
    # This implementation based on a version by Melissa E. O'Neill,
    # as reported by Gerald Britton:
    # http://mail.python.org/pipermail/python-list/2009-January/1188529.html
    innersieve = sieve()
    prevsq = 1
    table  = {}
    i = 2
    while True:
        if i in table:
            prime = table[i]
            del table[i]
            nxt = i + prime
            while nxt in table:
                nxt += prime
            table[nxt] = prime
        else:
            yield i
            if i > prevsq:
                j = next(innersieve)
                prevsq = j**2
                table[prevsq] = j
        i += 1

I don't claim that my version of the sieve above will break any
computational records, but it performs quite well for my needs.

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#8204

From	Chris Angelico <rosuav@gmail.com>
Date	2011-06-22 22:28 +1000
Message-ID	<mailman.277.1308745697.1164.python-list@python.org>
In reply to	#8201

On Wed, Jun 22, 2011 at 10:01 PM, Anny Mous <b1540457@tyldd.com> wrote:
>            prime = table[i]
>            del table[i]
>

I don't fully understand your algorithm, but I think these two lines
can be rewritten as:
prime=table.pop(i)

Interesting algo. A recursive generator, not sure I've seen one of those before.

ChrisA

[toc] | [prev] | [next] | [standalone]

#8219

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2011-06-22 09:52 -0600
Message-ID	<mailman.285.1308757993.1164.python-list@python.org>
In reply to	#8201

On Wed, Jun 22, 2011 at 6:01 AM, Anny Mous <b1540457@tyldd.com> wrote:
> def sieve():
>    """Yield prime integers efficiently.
>
>    This uses the Sieve of Eratosthenes, modified to generate the primes
>    lazily rather than the traditional version which operates on a fixed
>    size array of integers.
>    """
>    # This implementation based on a version by Melissa E. O'Neill,
>    # as reported by Gerald Britton:
>    # http://mail.python.org/pipermail/python-list/2009-January/1188529.html
>    innersieve = sieve()
>    prevsq = 1
>    table  = {}
>    i = 2
>    while True:
>        if i in table:
>            prime = table[i]
>            del table[i]
>            nxt = i + prime
>            while nxt in table:
>                nxt += prime
>            table[nxt] = prime
>        else:
>            yield i
>            if i > prevsq:
>                j = next(innersieve)
>                prevsq = j**2
>                table[prevsq] = j
>        i += 1

This appears to have a small bug in it, but perhaps it doesn't matter.
 At the "yield i" statement, it is possible that i > prevsq.  It could
be that i == next(innersieve)**2, in which case i is yielded as prime
despite being composite.  This would then cause additional errors
further along.

The only way this could happen would be if there were two consecutive
primes such that all the numbers between their squares are composite,
thereby failing to add the next prime to the table until after its
square has been reached.  This seems a rather unlikely scenario, but I
can't say for certain that it never happens.

The Haskell version does not contain this flaw, as far as I can tell.

Cheers,
Ian

[toc] | [prev] | [next] | [standalone]

#8216

From	MRAB <python@mrabarnett.plus.com>
Date	2011-06-22 16:33 +0100
Message-ID	<mailman.282.1308756829.1164.python-list@python.org>
In reply to	#8187

On 22/06/2011 06:58, Chris Torek wrote:
> Now that the exercise has been solved...
>
> Instead of "really short code to solve the problem", how about
> some "really long code"? :-)
>
> I was curious about implementing prime factorization as a generator,
> using a prime-number generator to come up with the factors, and
> doing memoization of the generated primes to produce a program that
> does what "factor" does, e.g.:
>
>      $ factor 99999999999999999
>      99999999999999999: 3 3 2071723 5363222357
>
> The python code is rather too slow for this particular number (I
> gave up on it finding 5363222357) but works quite well on 600851475143,
> or even, e.g., 12186606004219:
>
>      $ python factor.py 600851475143 12186606004219
>      600851475143: 71 839 1471 6857
>      12186606004219: 2071723 5882353
>
[snip]
This code isn't particularly efficient, but it's fast enough:

import math

n = 99999999999999999
limit = math.sqrt(n)
test = 2
factors = []
while test <= limit:
     if n % test == 0:
         factors.append(test)
         n //= test
         limit = math.sqrt(n)
     else:
         test += 1

factors.append(n)

print(factors)

[toc] | [prev] | [standalone]

Page 2 of 2 — ← Prev page 1 [2]

csiph-web

How can I speed up a script that iterates over a large range (600 billion)?

Contents

#8143

#8147

#8180

#8185

#8247

#8170

#8187

#8188

#8287

#8189

#8201

#8204

#8219

#8216