## Prime Time Programming, Part 1

From time to time, I find myself caught up in the heady world of Project Euler. It’s almost like playing Professor Layton or some other puzzle-solving game – mathematical or programming brain teasers, served in bite-sized pieces.

If you look at the Project Euler problems for any length of time, you’ll notice common themes. One theme is prime numbers – many problems can’t be solved without generating varying quantities of primes. To that end, I’d written a prime generator to be shared between problem solutions. Over time, I added functionality and “optimized” the generator to improve the running time of my solutions. Everything was great, until I tried Problem 315, whose solution required a list of primes between 10^{7} and 2×10^{7}. My solution worked, but it ran really slowly – something like 12 minutes. Now, I’m not doing myself any favours by writing in Python and running on a 7-year-old laptop, but that’s still too long.

## My Original Generator

This is the slow-performing generator that I replaced when working on Problem 315. The squeamish reader may want to avert his eyes.

class PrimeGenerator: __primes_so_far = [5] __increment = 2 def __init__(self): self.__next_index = -1 def is_prime(self, n): if n == 2 or n == 3: return True elif n % 2 == 0 or n %3 == 0: return False elif n <= PrimeGenerator.__primes_so_far[-1]: # bisecting didn't really help return n in PrimeGenerator.__primes_so_far else: return self.__is_prime(n) def __is_prime(self, n): limit = math.sqrt(n) g = PrimeGenerator() g.next() # skip 2 g.next() # skip 3 p = g.next() while p <= limit: if n % p == 0: return False p = g.next() return True def next(self): self.next = self.__next3 return 2 def __next3(self): self.next = self.__next5 return 3 def __next5(self): self.__next_index += 1 if self.__next_index < len(PrimeGenerator.__primes_so_far): return PrimeGenerator.__primes_so_far[self.__next_index] candidate = PrimeGenerator.__primes_so_far[-1] while True: candidate += PrimeGenerator.__increment PrimeGenerator.__increment = 6 - PrimeGenerator.__increment if self.__is_prime(candidate): PrimeGenerator.__primes_so_far.append(candidate) return candidate def __iter__(self): return self

### My eyes!

When I first went back to this code, I exclaimed, “What was I thinking?”. I can think of two things:

- The
`is_prime`

member was intended to help out for problems where I didn’t have to create too many primes, but just had to check a few number for primality. This doesn’t really belong here, and clutters the class. I’d be better off focusing on prime generation. - I was optimizing at least partly for the case where I’d want to get lists of primes a couple times—hence the indexing into an already-generated list of primes. If the generator were fast enough, this wouldn’t be necessary.

Things I can’t understand, even today. I’ll have to blame them on then-ignorance, foolishness, and hasty modifications to the class:

- Why the mucking about with
`__next3`

and`__next5`

? What did I have against yield? - Why is
`is_prime`

fussing with a whole new`PrimeGenerator`

and skipping 2 and 3? Why not just go straight for the saved list of primes starting with 5?

In fact, just about the only defensible things (as we’ll see below) in the whole class are:

- ending the modulus checks once the candidate divisor is greater than the square root of the candidate number, and
- the bit where the next
`__increment`

is formed by subtracting the current one from 6 – I was exploiting the fact that, past 3, for any prime*p*, p ≡ 1 (mod 6) or p ≡ 5 (mod 6).

### How slow was it?

The generator took **551.763 seconds** to generate primes less than **10 ^{7}**, as measured by the following:

def run(f): global highest start = datetime.datetime.now() count = 1 for p in f: count += 1 if p > highest: break end = datetime.datetime.now() elapsed = end-start return highest, count, elapsed.seconds + (elapsed.microseconds/1000000.0)

Where `f`

is an instance of `PrimeGenerator`

passed into the `run`

method, and `highest`

is a global that’s been set to 10^{7}.

## Moving forward

Based on the extreme slowness and general horribleness of the code, I figured I’d be better off starting over. So, I chucked the generator and started afresh with the simplest generator I could write. I resolved to make incremental changes, ensuring that at each step, the code was:

- correct (otherwise, why bother)
- faster than the previous version
- simple enough for me to understand

Let’s see what happened.

## Attempt 1: Trial Division

Trial division is one of the simplest methods for generating primes—you start counting at 2, and if no smaller positive integers (other than 1) divide the current number, it’s prime. The **naive implementation** is very simple.

def naive_trial(): n = 2 while True: may_be_prime = True for k in xrange(2, n): if n % k == 0: may_be_prime = False break if may_be_prime: yield n n += 1

This method takes **113.804 seconds** to generate primes below **100000**—I couldn’t wait for the full 10^{7} – it would probably be over 3 hours.

### Trial until root

Fortunately, there are some pretty obvious optimizations one can make to the algorithm. The first comes from the observation that if there is a number k, 1 < k < n, that divides n, then there is a number j that divides n with 1 < j ≤ √n, so we can stop our trial once we’ve hit that point.

def trial_until_root(): n = 2 while True: may_be_prime = True for k in xrange(2, int(n**0.5)+1): if n % k == 0: may_be_prime = False break if may_be_prime: yield n n += 1

This method takes **468 seconds** to generate primes up to 10^{7}. A definite improvement (and already faster my original generator), but there’s still room for more.

### Trial by primes

Here’s another observation about divisors of n: if there’s a number k that divides n, then there’s a prime number p ≤ k that divides n, since either k is prime or has prime factors. So if we keep a list of the primes found so far, we only need to check prime divisors that are less than √n.

def trial_by_primes(): primes_so_far = [] n = 2 while True: may_be_prime = True for p in primes_so_far: if n % p == 0: may_be_prime = False break if p * p > n: # it's prime break if may_be_prime: primes_so_far.append(n) yield n n += 1

Now we’re down to **136 seconds** to generate primes below 10^{7}. That was a worthwhile change, but we have to balance it against the fact that the generator now requires additional storage – the list of primes encountered so far. In this case, we’re storing over 660,000 prime numbers in a list. Even an older laptop can handle this burden, but it’s something to keep in mind.

### That’s odd

And by “that”, I mean “all the prime numbers except for 2”. There’s no point checking the even numbers to see if they’re prime. Let’s see what happens when we skip them. The only tricky part (and it’s not that tricky) is to make sure we still return 2 as our first prime.

def odd_trial_by_primes(): primes_so_far = [] yield 2 n = 3 while True: may_be_prime = True for p in primes_so_far: if n % p == 0: may_be_prime = False break if p * p > n: # it's prime break if may_be_prime: primes_so_far.append(n) yield n n += 2

This method takes **127 seconds** to generate primes less than 10^{7}. Not a huge improvement, but better than nothing, and it doesn’t really complicate the code that much. I’ll keep it. The reason we don’t get a huge improvement here is that checking the even numbers for primeness doesn’t take that much effort – they were eliminated as soon as we modded them by the first prime in `primes_so_far`

. Still, it’s a little quicker to jump right over them than to perform the division.

### What about 3?

If skipping the numbers that are divisible by 2 paid off, will skipping those divisible by 3? As I noted above, every prime *p* greater than 3 satisfies `p ≡ 1 (mod 6) or p ≡ 5 (mod 6)`

. Let’s use that. We’ll take advantage of these facts:

- if p ≡ 1 (mod 6) , then p+4 ≡ 5 (mod 6)
- if p ≡ 5 (mod 6) , then p+2 ≡ 1 (mod 6)

So we want to alternate our `step`

between 2 and 4. Fortunately 6 – 4 = 2 and 6 – 2 = 4, so we can use 6 – step as our next step.

def sixish_trial_by_primes(): primes_so_far = [] yield 2 yield 3 step = 2 n = 5 while True: may_be_prime = True for p in primes_so_far: if n % p == 0: may_be_prime = False break if p * p > n: # it's prime break if may_be_prime: primes_so_far.append(n) yield n n += step step = 6 - step

Now the time drops to **123** seconds to generate primes less than 10^{7}. Clearly we’ve hit diminishing returns – we’re saving two modulus operations on numbers that are divisible by 3 (but not 2), at the cost of a more complicated “step” calculation. We could continue on in this vein, but the gains are not likely to be large, and the complexity increases rapidly. Consider the next step: we’d make sure we don’t test numbers divisible by 2, 3, or 5. That means (after 5) we only consider numbers whose remainders when divided by 30 are one of 1, 7, 11, 13, 17, 19, 23, or 29. The steps between numbers are 6, 4, 2, 4, 2, 4, 6, and 2. Who has the energy?

## The problem with Trial Division

Trial division has a few things going for it:

- it’s simple to understand
- there are some obvious optimizations that can make its performance tolerable

Ultimately, though, its downfall is that it takes a lot of work to verify that a large number is prime. Consider the largest prime number below 10^{7}: 9999991. In order to verify that this is prime, we have to consider all prime factors less than √9999991. There are 446 of these. That’s 446 divisions, just to verify that one number is prime.

We’re unlikely to radically improve performance by continuing to tinker with trial division. It’s time to throw the whole thing away again and try a new approach. Next time we’ll do just that.