Project Euler in Python

Sunday, March 13, 2016

Problem 4 - Largest Palindrome Product

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99. Find the largest palindrome made from the product of two 3-digit numbers.

My first implementation:

top = 0
for x in xrange(100,1000):
    for y in xrange(x,1000):
        blarg = str(y * x)
        if blarg == blarg[::-1]:
            if y * x > top:
                top = y * x
print "Greatest ",top
# 0.334316968918 Typical execution time

There is relatively little that can be done to improve here, as no easy criteria exist to substantially improve what is already a fairly straightforward problem. However, one improvement can be made that improves the search time by a small margin:

top = 0
for x in xrange(999,100,-1):
    for y in xrange(999,x,-1):
        blarg = str(y * x)
        if blarg == blarg[::-1]:
            if y * x > top:
                top = y * x
print "Greatest ",top
# 0.310236930847 Typical execution time

Just by making our search start from the high end of our search range (That is, searching from 999 down to 100, rather than 100 up to 999), we can come across the solution a lot faster (fourth iteration, to be precise). However, this doesn't stop our program from continuing to search after already finding the answer. The only way we can make this execute faster from here is to narrow our search space.

Consider the smallest possible 6-digit (which we know our answer must be, since the smallest 7-digit number is the product of the first two four-digit numbers) palindromic number that is the multiple of two 3-digit numbers, which is 111111 (143 * 777). Since our product must be a palindrome, we can say that the digit in positions 1 and 6 must be the same, as well as the pairs in positions 2 and 5, and 3 and 4, respectively. To put this into polynomial form:

N = 100001x + 10010y + 1100z

For example, if we replace (x, y, z) with (5, 1, 4), we get N = 514415.
To simplify the equation, we can divide out a common factor of the three coefficients, the largest of which is 11:

N = 11(9091x + 910y + 100z)

This tells us something interesting about the space we're searching in: that if we take the prime factors of the 6-digit numbers that are palindromes, that one of the factors of one of the numbers is 11, since 11 is prime. Let's check this with some code:

for x in palis:
    if x % 11 == 0:
        print x,len(str(x)),"Factor of 11!"
    else:
        print x,len(str(x)),"Not factor of 11!"

In this code, the variable palis is a list of all palindromes generated by multiples of three-digit numbers. I'll leave it up to any experimental readers to generate their own copy of said list. Running this code with the proper input demonstrates that for the 6-digit entries in the list, all have a factor of 11, while the 5-digit entries do not.

Now that we know that our answer must have a factor of 11, we can use a condition to switch our search step from 1 to 11 when our first factor is not a multiple of 11.

top = 0
for x in range(999,100,-1):
    if x % 11 == 0:  
        step = -1
        roof = 999
    else:
        step = -11
        roof = 990
    for y in range(roof,x,step):
        prod = x * y
        if prod <= top:
            break
        prodstr = str(prod)
        if prod == prod[::-1]:
            top = prod
print "Greatest ",top
# 0.00324416160583 Typical execution time

Friday, March 11, 2016

Problem 3 - Prime Factorization

The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?

Prime Factorization - Now we're getting into it.
Here we have a problem that not only is the first significant problem in the Project Euler series, but is also a wonderful starting point to explore the application of the difficulty of prime factorization to insure cryptographic security for modern computer systems.

The problem itself is hard because on the surface it seems to need a brute-force approach; What way is there to find the factors of a large integer besides checking all of the integers beneath it for factors? My initial implementation stuck with this approach of dully checking as many prime numbers as possible to see if they are factors:

import math
target = 600851475143L
primes = [1,2,3,5]
for x in range(7,10000):
    half = int(math.ceil(x/2))
    for y in range(2,half+1):
        if float(x) % float(y) == 0:
            break
        if (y == half):
            primes.append(x)
factors = []
for x in primes:
    if target % x == 0:
        factors.append(x)
print factors

While this solution completes in only a few seconds on my laptop, It is nonetheless rather crude. One immediate optimization is possible in the first for loop by making it not iterate over the even numbers beyond two, so that it doesn't test 8, 10, 12 and so on, for being divisible by 2 - A trivial elimination when searching for primes. Let's change that.

for x in range(7,10000,2):
    half = int(math.ceil(x/2))
    for y in range(2,half+1):
        if float(x) % float(y) == 0:
            break
        if (y == half):
            primes.append(x)

This shaves approximately 12% of the run time off on my computer - A small but significant improvement. But let's try another approach. Instead of just stepping up the integers modulus-ing as we go and storing our hits, let's find factors as we go, and divide our goal by that factor, leaving a quotient that can also be factored. To do this, we just need a method that will take a number and return the lowest factor it can find of that number as well as the quotient. Now we have a new number (the quotient) that can be factorized by the same method.

def get_first_factor(innum):
    if innum == 2:
        return 2
    if innum % 2 == 0:
        return 2, innum / 2
    a = 3
    while a < int(math.ceil(innum/2)):
        if innum % a == 0:
            return a, innum / a
        a += 2
    return [innum]

With this, we can then check the returned values for non-primes, and factorize again if needed. I'll use a class to calculate and save prime numbers (more details on this down the road in problem 7), to be able to quickly identify prime factors.

class primes():
    primelist = [2,3,5,7,11,13,17,19,23,29,31,37]
    def is_prime(self,x):
        if x in self.primelist:
            return True
        if x < max(self.primelist) or x == max(self.primelist) + 1:
            return False
        if x % 2 == 0:
            return False
        z = 3
        while z < int(math.ceil(x/2)):
            if x % z == 0:
                return False
            z += 2
        self.primelist.append(x)
        return True

Similarly, if we use a dictionary to store the factor lists, we can later retrieve those lists when working on new ones we're calculating that have the same factors, rather than having to find them again.

class factor_store:
    factors = {} # Dictionary for storage
    plist = primes()
    def factor_list(self,innum):
        if innum in self.factors.keys():
            return self.factors[innum]
        temp = [innum]
        if self.plist.is_prime(innum):
            return temp
        all_primes = False
        x = 0
        while not self.plist.is_prime(temp[x]):
            if x == len(temp) - 1 and self.plist.is_prime(temp[x]):
                all_primes = True
                break
            temp[x:x+1] = get_first_factor(temp[x])
            x += 1
        return temp

This is one of the advantages of using an object-oriented approach for algorithms. By saving our functions' work, we don't have to duplicate it; This is helpful when the algorithms available to us for this problem have high O() magnitudes. This method of finding the same answer completes in almost negligible time compared to my earlier implementations because of the factor and prime storage:

#2.61310791969 - initial factors search
#2.30295395851 - space-narrowed factors search
#0.00332307815552 - prime and factors storage, single factor removal

Tuesday, March 8, 2016

Problem 2 - Every other fib()

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

Calculating the nth term of the Fibonacci sequence is a classic means of exploring recursion as a tool in function design, and can be applied here. We can also store Fibonacci results in a list and then use them after generation. We can also use a generator to calculate the nth term. To start, however, I'll present my original, and cringe-worthy implementation of the solution.

last1 = 1
last2 = 1
current = 0
runsum = 0
for x in xrange(1,200):
    if current > 4000000:
        break
    current = last1 + last2
    if current % 2 == 0:
        runsum += current
    last2 = last1
    last1 = current
print runsum

Here, the code loops until the arbitrary limit of 200 (non-inclusive) Fibonacci numbers, generating the next number on each iteration. Every time, it checks to see if the current iteration number is even, if so, it adds the new number to a running total.
Another approach is to use recursion, possibly the least effective method of solving the problem when it comes to scalability, in which a new Fibonacci term is generated with a recursive call, necessitating a new recursion path for each term.

def recur_fib(x):
    if x == 0 or x == 1:
        return 1
    return recur_fib(x - 1) + recur_fib(x - 2)

This method will return the nth Fibonacci number, however, the larger the target term, the longer it will take. One way we can improve on this is to store all of the terms that we generate. We can write the results to a file, or just keep them in a list encapsulated in a class.

class fib_list:
    terms = [1,2]
    def __getitem__(self, n):
        if len(self.terms) < n:
            return self.terms[n]
        while len(self.terms) < n:
            self.add_one_term()
        return self.terms[-1]
    def add_one_term(self):
        self.terms.append(self.terms[-2] + self.terms[-1])
    def add_until_target(self, target):
        while self[-1] < target:
            self.add_one_term()
        self.terms = self.terms[:-1]
temp = fib_list
temp.add_until_target(4000000) 
total = sum([x for x in temp.terms if x % 2 == 0]) # Answer generation

With this class, we can save the Fibonacci numbers in a list and get our answer with a single line of code that executes in trivial time. Generating the Fibonacci sequence remains the most intensive part of the problem. Is there a way to generate only the numbers we need without having to follow the sequence to its target, generating twice as many numbers as necessary?

Thankfully, there is a closed-form solution for generating the nth Fibonacci term known as Binet's formula. With this, we can generate an individual Fibonacci term, and by extension only the ones we need in the sequence (Thanks to Zeychin on StackOverflow for the Python Binet's).

def fib_at_n(n):
    return 0.4472135954999579392818347337462552470881236719223051448541 * (pow(1.6180339887498948482045868343656381177203091798057628621354,n) - pow(-0.618033988749894848204586834365638117720309179805762862135,n))
templist = []
for x in range(0,32,2):
    templist.append(fib_at_n(x))

With this method, our execution time is comparable to the time used in constructing the full Fibonacci sequence above using the list encapsulated in a class. However, this last method is by far the best scaling method. If the target term were much farther along in the sequence, this method would remain the fastest. For demonstration purposes, I'll pit each of these methods against one another in generating the 1000th Fibonacci term.

0.000510931015015 # Original Implementation
0.000808000564575 # List construction
1.90734863281e-06 # Stored list recall
4.05311584473e-06 # Binet's

As you can see, the execution time for the closed solution executes within the same time magnitude as recalling the list from a saved record, making this the best method for generating individual Fibonacci terms, but not the best for generating the whole sequence.

Monday, March 7, 2016

Problem 1 - Fizz Buzz expanded

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

This is an expansion on a classic Computer Science 1 problem. The goal of the exercise is to use text output to print all multiples of 3 and 5 (traditionally identified with the words "fizz" and "buzz") between 1 and an arbitrary goal number. The challenge is to use conditions to print the appropriate words without printing the accompanying number multiple times. This identifies whether or not the student can properly implement non-exclusive conditional statements.

Here, however, we're not looking to print corresponding words, but simply to sum the numbers meeting the classic criterion (having factors of 3 or 5) in order to find the "secret" number that is the puzzle solution.

Several approaches are available to us using Python, so I'll start with a procedural addition implementation.

runtotal = 0
for x in xrange(1000):
    if x % 3 == 0 or x % 5 == 0:
        runtotal += x
print runtotal

This rather straightforward implementation runs straight through, and sums the total of all multiples of 3 or 5 as it goes. Let's make a more "Python-esque" (in the sense of compact) method, using set theory and Python list comprehension.

threes = [x for x in range(3,1000,3)] # make list of multiples of three
fives = [x for x in range(5,1000,5)] # make list of multiples of five
print sum(list(set(threes) | set(fives))) # union previous lists, and sum

Here, we can construct lists of all multiples of three and five, respectively, under the goal number. That done, we union the two lists into a new list. Then we simply print the sum of the list. Can we compact this any more?
This time, we'll reduce the construction of the union of the two lists of multiples to a single line and list, instead of creating them separately and then combining. This way, we can reduce all of our solution to a single line:

print sum([x for x in range(3,1000) if x % 3 == 0 or x % 5 == 0])

As we can see, several means of arriving at the same solution exist within Python. The execution times in seconds for these on my laptop are:

0.000421047210693 # procedural addition
0.000199079513550 # sets, separate list comprehensions
0.000399827957153 # sets, single list comprehension

This presents us with a curiosity: how is the implementation that has to iterate to 1000 twice faster than the one that only has to do it once? The answer is that the total loop count of the second is 531, not 1000. This is because of the step parameter that I included in the list comprehensions. These made the loops that added numbers to the lists increment by 3 or 5, rather than every number and check to see if the result is a multiple of 3 or 5. This way, the loops creating theses lists loop fewer times total. We can't use this optimization in the first or last implementations, since doing so would skip some of the numbers that need to be included in the sum.
Edit: I want to thank Alex Gorbatchev for the SyntaxHighlighter javascript plugin, which I'll be using to present formatted Python code.