Project Euler in Python: March 2016

Monday, March 28, 2016

Problem 8 - Largest Digit Sequence Product

The four adjacent digits in the provided 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.
Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?

Note: my implementation was for the original version of the problem, which asked for the greatest product of a five-digit sequence. Thankfully, almost nothing had to change in the code to adjust for the new 13-term product.

Once again, my initial implementation leaves little to improve upon in terms of efficiency.

ind = 0
greatest = 0
while ind < len(d) - 12:
    sliced = d[ind:ind+13]
    temp = 1
    for x in sliced:
        temp *= x
    if temp > greatest:
        greatest = temp
    ind += 1

# 0.00393509864807 Seconds

One change we can make immediately is to check for products that we know would result in zero. Since we are calculating products (multiplication), we know that if a zero is in the factors, that our result will be zero. Thus, we can just skip digit sequences that contain one or more zeroes.

ind = 0
greatest = 0
while ind < len(d) - 12:
    sliced = d[ind:ind+13]
    if 0 in sliced:
        ind += 1
        continue
    temp = 1
    for x in sliced:
        temp *= x
    if temp > greatest:
        greatest = temp
    ind += 1

# 0.00193285942078 seconds

This seems to reduce our computing time by a factor of two. With a larger data set, we could verify this, but that's not what we want. This is Python; We want a one-liner. List comprehensions ahoy!

First, we need to generate the slices of the input digits d:

slices = [d[x:x+13] for x in range(len(d) - 12)]

But how do we calculate the product of each slice easily? There's no product(iterable) function in the default libraries, but we can create one easily if we take advantage of the associative property of multiplication by just multiplying the first number into the second, and the result into the third, etc. until we are down to one. For this, we can use the core library's reduce() function, which performs a passed function on each pair of a list until there is only one. We only need to pass it a function that multiplies the two parameters, and we can use a simple lambda function for that:

reduce(lambda x,y:x*y, slice)

These will give us a list of products when combined thus:

products = [reduce(lambda x,y:x*y, d[x:x+13]) for x in range(len(d) - 12)]

And, since Python lists conveniently have a maximum function, we just need to apply it.

answer = max([reduce(lambda x,y:x*y, d[x:x+13]) for x in range(len(d) - 12)])

# 0.00341510772705 seconds

This is on the same time order as my original implementation (that didn't skip zero products), and we can fix that with a quick condition in our comprehension:

answer = max([reduce(lambda x,y:x*y, d[x:x+13]) for x in range(len(d) - 12) if 0 not in d[x:x+13]])

# 0.0014820098877 seconds

Thursday, March 24, 2016

Problem 7 - Nth Prime

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.
What is the 10001st prime number?

Considering the run time of my original implementation:

# 23.6239449978 seconds

There is a lot to improve on in this:

import math
primes = [2,3,5,7]
itera = 8
while len(primes) < 10001:
    primelength = len(primes)
    for x in xrange(primelength):
        if itera % primes[x] == 0:
            break
        elif primes[x] > int(math.ceil(itera/2)):
            primes.append(itera)
            break
    itera += 1
print primes[-1]

Well, one thing to notice immediately is the iterator that only increments by one. Let's cut it in half.

primes = [2,3,5]
itera = 7
while len(primes) < 10001:
    primelength = len(primes)
    for x in xrange(primelength):
        if itera % primes[x] == 0:
            break
        elif primes[x] > int(math.ceil(itera/2)):
            primes.append(itera)
            break
    itera += 2
print primes[-1]

So, how much did that help?

# 23.306112051 seconds

Hmph. That doesn't help much at all.
Well, let's try removing our need to keep a massive list of integers and constantly read and write to it. That should reduce our average iteration time.

lastprime = 0
check = 3
pcount = 2
while pcount <= 10001:
    primeflag = True
        for x in range(3, int(math.ceil(math.floor(check))), 2):
            if check % x == 0:
                primeflag = False
                break
    if primeflag:
        lastprime = check
        pcount += 1
    check += 2
print lastprime

Our running time is:

# 94.41918993

Hmm . . .

Well, we have a relatively inefficient method of checking for primes with a flag. Also, we're checking all of the odd numbers, rather than deliberately stepping over some. Let's implement a more modular means of checking for primes. We can eliminate the two easiest prime factors straight away, as well as the entire category of non-primes below 2.

if x < 2:
    return False
if x == 2:
    return True
if x % 2 == 0:
    return False
if x % 3 == 0:
    return False

With these out of the way, we can put an upper limit on the factors that we check for a specific prime. Since we've already eliminated 2 and 3 as factors, we can say that the largest possible factor of any checked number will be equal to the floor of the square root of that number; One half and one third have already been checked, and any root factors will be larger than these two.

root = math.floor(math.sqrt(x))
factor = 5 # Next prime

Now, we just iterate on a counter, returning false if we find a factor.

while factor <= root:
if x % factor == 0:
    return False
factor += 2

Adding a default return value of True, we finally have a complete function:

def is_prime(x):
    if x < 2:
        return False
    if x == 2:
        return True
    if x % 2 == 0:
        return False
    if x % 3 == 0:
        return False
    root = math.floor(math.sqrt(x))
    factor = 5
    while factor <= root:
        if x % factor == 0:
            return False
        factor += 2
    return True

From here, we need only iterate over the integers, checking for primes. We can check from the last prime so we don't have to start from the bottom of the integers.

last = 5
diff = 2
primecount = 3
while primecount <= 10000:
 x = last + diff
 if is_prime(x):
  last = x
  diff = 2
  primecount += 1
  continue
 diff += 2
print last

Much better:

# 0.235863924026 seconds

Wednesday, March 16, 2016

Problem 6 - Sum of squares and square of sum

The sum of the squares of the first ten natural numbers is
12 + 22 + ... + 102 = 385
The square of the sum of the first ten natural numbers is
(1 + 2 + ... + 10)2 = 552 = 3025
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640. Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

It is a rare programming problem that is so easy to provide a solution for, but difficult to take the extra step to optimize fully.

runsum = 0
runsquaresum = 0
for x in range(1,101):
    runsum += x
    runsquaresum += x**2
answer = runsum**2 - runsquaresum
# 6.103515625e-05 Run time in seconds

In fact, this solution is so straightforward to implement that it is possible to reduce it to a single line of code:

answer = sum(range(1,101)) ** 2 - sum([x ** 2 for x in range(1,101)])
# 2.90870666504e-05 Run time in seconds

Both of these implementations run on the same big-O magnitude of n, or linear, time. Thus, as our target number grows, our processing time grows in direct proportion. To clarify, if we make our target number 1000 instead of 100, we increase our processing time by a factor of 10. Take our target to 10000 and we again multiply our computation time by 10.

This is one approach to the problem that is a direct generation of the solution. However, suppose we use a different, slightly more search-y approach. Allow me to introduce you to Pascal's triangle:

Here, we have a mathematical construct with some very interesting properties, the most relevant of which to our problem is its containment of several summation series.

With these series, it becomes similarly complex to our previous solution to "look up" our answer in this structure. But, how do we find them?

The first part of our answer is in identifying what we need and finding a corresponding pattern in the pyramid. Since one half of our answer is the square of the sum of a number of consecutive integers, we need look no farther than the third row of the pyramid. The first row being the series of one along the pyramid, the second the summation of the ones in the previous row to that point. For ease of illustration, here's the same structure arranged in a horizontal-vertical orientation.

From here, the summation of previous rows by succeeding ones is slightly more plain. The reason we need to look at the third row, however, is that by looking here we can immediately find the sum of the second row to any given iteration, which is what we need to eventually square to produce the first part (the square of the sum) of our solution, and we need only look one row down from the second row to the corresponding entry to find the sum of the series to that point. For example:

1 + 2 + 3 + 4 + 5 = 15, 15^2 = 225

The second half of our solution is slightly more difficult to find, but can nonetheless be observed in the triangle. If we want to find the sum of squares to a certain point, we need to look at rows 2 and 4.

For example, to find the sum of squares to 5 (1^2 + 2^2 + 3^2 + 4^2 + 5^2), we look at row two to see how far we count over (5), then drop two rows to row 4 (landing at 35). Here, we simply take the number we landed on and add the previous entry on the same row (20), producing 55.

(1^2 + 2^2 + 3^2 + 4^2 + 5^2) = 1 + 4 + 9 + 16 + 25 = 55

This rule repeats throughout the pyramid, and will hold for when our target is 100 or 10000000, so we can say that:

(1 + 2 + 3 + 4 + 5)^2 - (1^2 + 2^2 + 3^2 + 4^2 + 5^2) = 225 - 55 = 170

With this method, our computation time takes a similar O(n) time magnitude, but with an additional time of magnitude n tacked on to generate the list we need to search, thus making the time magnitude remain O(n), but with a greater coefficient attached.

rows = []
rows.append([1 for x in range(100)])
x = 0
while x < 4:
    rows.append([sum(rows[-1][:a + 1]) for a in range(100)])
    x += 1
answer = (rows[2][-1] ** 2) - sum(rows[3][-2:])

While this solution works, when scaling our target up by a factor of 10, our solution time scales by a factor of 100, and this is attributable to the sum calculations that we have to do over and over to generate the sum lists. Thankfully, we can optimize this a bit. We don't need to generate the first row of all ones, since we don't reference it in the future. We can eliminate generation of the fourth row (third sum), since all we need from that row can be generated by summation of certain parts of the third row. Thus, we can optimize our code to once again have a O(n) time scale:

sum_one = [x for x in range(1,101)]
last = 0
sum_two = []
for x in range(100):
 sum_two.append(sum_one[x] + last)
 last = sum_two[-1]
square_of_sum = sum_two[-1] ** 2
sum_of_squares = (sum(sum_two[:-1]) * 2) + sum_two[-1]
answer = square_of_sum - sum_of_squares

After all this, can we improve our solution any more? Yes we can.

There is, thankfully, a deterministic, O(1) (constant time) formula for the sum of a series of integers, which we can use in our algorithm.

1 + 2 + ... + (n-1) + n = (n+1) + (n+1) + ... + (n+1) + (n+1) = (n+1) * n/2 = (n(n+1))/2
1 + 2 + ... + 99 + 100 = 101 + 101 + ... + 101 + 101 = 101 * 50 = 5050

Once we use this formula to find the sum of our series to number n, we square the result to get the first half of our answer. Is there a similar formula for the sum of squares? Yep.

(n(n+1)(2n+1))/6

This, in combination with the first, allows us to calculate the difference incredibly quickly with the complete formula:

((n(n+1))/2)^2 - (n(n+1)(2n+1))/6

Or:

square_of_sum = ((n * (n + 1)) / 2) ** 2
sum_of_squares = (((2 * n) + 1) * (n + 1) * n) / 6
answer = square_of_sum - sum_of_squares

Or, slightly more compact and fast:

answer = (((n * (n + 1)) / 2) ** 2) - (n * (n + 1) * ((2 * n) + 1)) / 6

When n equals 10000000, my original brute force implementation takes over 4 seconds, while this last one takes

# 7.86781311035e-06 seconds

Sunday, March 13, 2016

Problem 5 - Smallest Multiple

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder. What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?

Well, this is awkward . . .

solution = 0
for x in range(180,1000000000,10):
    if solution != 0:
        break
    for y in range(2,21):
        if x % y != 0:
            break
        if x % y == 0:
            if y == 20:
                solution = x
            continue
print solution
# 40.1628811359 Execution time

In defense of my former self, there is one optimization in here that is helpful. Notice how the step in the for loop is 10. As our target number must have 10 as a factor, we only need search multiples of 10. However, this thinking doesn't immediately take the next logical step and double the step to 20, as this is the largest factor of our target. Amendment:

solution = 0
for x in range(180,1000000000,20):
    for y in range(2,20):
        if x % y != 0:
            break
        if x % y == 0:
            if y == 19:
                solution = x
            continue
    if solution != 0:
        break
print solution
# 22.1047170162 Execution time

And this halves our running time; Lovely. Now to actually address the big O.

To actually have a decent time scale, we need to do better. Stepping back from 20 and checking to see what factors we have to check at a minimum seems like a good starting point. First conclusion: we definitely check the larger numbers that have smaller numbers as a factor, thus checking several numbers at once by proxy. This reduces our search space from [2 ... 20] to [11, 13, 14, 16, 17, 18, 19] if we step by twenty. Let's try this

solution = 0
for x in range(2520,1000000000,20):
    for y in []:
        if x % y != 0:
            break
        if x % y == 0:
            if y == 19:
                solution = x
            continue
    if solution != 0:
        break
print solution
# 8.65195894241 Execution time

That's better, let's keep going. Realizing that a step of 20 doesn't have a factor of 3 in it, we can up our step to 60 to account. However, 60 doesn't have a factor of 7 in it, so we set it to 420.

solution = 0
for x in range(2520,1000000000,420):
    for y in [11, 13, 14, 16, 17, 18, 19]:
        if x % y != 0:
            break
        if x % y == 0:
            if y == 19:
                solution = x
            continue
    if solution != 0:
        break
print solution
# 0.421896934509 Execution time

Now that's respectable for a search approach. Now, let's attack the problem from the other direction. Rather than find our answer among thousands of possibilities, let's just make our answer using some maths.

Since we know that our answer is a product of all of the numbers from 2 to 20 (1 eliminated as tautoligical), we immediately know that we need to multiply all of the primes in here together immediately. What are these?

[2,3,5,7,11,13,17,19]

What's the product so far?

And well below our answer. Why? Because these are just the primes under twenty. We still don't have factors of 4, 8, 9, 12, 16, 18, or 20, which are all necessary. We have some of the factors of these numbers, so let's just put in some more prime factors so we know we can construct these higher factors.

Adding another factor of 2 to the list allows us to construct 4 as well as 12 and 20 when multiplied with 3 and 5 respectively. Another 2 allows us to make 8, another 16. That's all of the factors of two that we need. Including another 3 allows us to make 9 and 18 with one of the 2's, so that's all of our factors from 2 to 20. Our final list of factors is:

[2,2,2,2,3,3,5,7,11,13,17,19]

So how can we generate this list of prime factors? Well, we did some work on that idea in problem 3, but we can't really use that approach since we don't have a target number to work from. Well, let's just recap what our steps to generate the list long hand were:

1 - Include all primes below factor run ceiling
2 - Add more copies of small primes to account for larger non-primes

The only question needed to answer is "How many times do I need to include a copy of a small prime?" Well, the largest number that one can generate under a given target (20, in this case) using just the number 2 is 2^4, which is 16. To include another factor of 2 beyond this would be mandating that our answer be divisible by 32, which is unnecessary for our solution and therefore redundant.

A hypothesis: Include X copies of factor n, so that n^X is as large as possible without passing our ceiling.

Let's try it. We'll need a prime number generator, which we already do from problem 3.

solution = 1
ceiling = 20
plist = primes()
plist.populate_until(ceiling)
factors = []
for x in [z for z in plist.primelist if z <= ceiling]:
    power = 1
    while (x ** power) <= ceiling:
        factors.append(x)
        power += 1
for x in factors:
    solution *= x

This algorithm will generate a smallest multiple of consecutive integers up to the variable ceiling. The populate member method of the primes class is one that I added to prepopulate the list of primes up to a target number (the ceiling, in this case). To generate the correct answer for ceiling = 20, this took:

# 2.59876251221e-05 seconds

Much less awkward . . . phew . . .

P.S. For ceiling = 1000, it took:

# 0.0191540718079 seconds

Problem 4 - Largest Palindrome Product

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99. Find the largest palindrome made from the product of two 3-digit numbers.

My first implementation:

top = 0
for x in xrange(100,1000):
    for y in xrange(x,1000):
        blarg = str(y * x)
        if blarg == blarg[::-1]:
            if y * x > top:
                top = y * x
print "Greatest ",top
# 0.334316968918 Typical execution time

There is relatively little that can be done to improve here, as no easy criteria exist to substantially improve what is already a fairly straightforward problem. However, one improvement can be made that improves the search time by a small margin:

top = 0
for x in xrange(999,100,-1):
    for y in xrange(999,x,-1):
        blarg = str(y * x)
        if blarg == blarg[::-1]:
            if y * x > top:
                top = y * x
print "Greatest ",top
# 0.310236930847 Typical execution time

Just by making our search start from the high end of our search range (That is, searching from 999 down to 100, rather than 100 up to 999), we can come across the solution a lot faster (fourth iteration, to be precise). However, this doesn't stop our program from continuing to search after already finding the answer. The only way we can make this execute faster from here is to narrow our search space.

Consider the smallest possible 6-digit (which we know our answer must be, since the smallest 7-digit number is the product of the first two four-digit numbers) palindromic number that is the multiple of two 3-digit numbers, which is 111111 (143 * 777). Since our product must be a palindrome, we can say that the digit in positions 1 and 6 must be the same, as well as the pairs in positions 2 and 5, and 3 and 4, respectively. To put this into polynomial form:

N = 100001x + 10010y + 1100z

For example, if we replace (x, y, z) with (5, 1, 4), we get N = 514415.
To simplify the equation, we can divide out a common factor of the three coefficients, the largest of which is 11:

N = 11(9091x + 910y + 100z)

This tells us something interesting about the space we're searching in: that if we take the prime factors of the 6-digit numbers that are palindromes, that one of the factors of one of the numbers is 11, since 11 is prime. Let's check this with some code:

for x in palis:
    if x % 11 == 0:
        print x,len(str(x)),"Factor of 11!"
    else:
        print x,len(str(x)),"Not factor of 11!"

In this code, the variable palis is a list of all palindromes generated by multiples of three-digit numbers. I'll leave it up to any experimental readers to generate their own copy of said list. Running this code with the proper input demonstrates that for the 6-digit entries in the list, all have a factor of 11, while the 5-digit entries do not.

Now that we know that our answer must have a factor of 11, we can use a condition to switch our search step from 1 to 11 when our first factor is not a multiple of 11.

top = 0
for x in range(999,100,-1):
    if x % 11 == 0:  
        step = -1
        roof = 999
    else:
        step = -11
        roof = 990
    for y in range(roof,x,step):
        prod = x * y
        if prod <= top:
            break
        prodstr = str(prod)
        if prod == prod[::-1]:
            top = prod
print "Greatest ",top
# 0.00324416160583 Typical execution time

Friday, March 11, 2016

Problem 3 - Prime Factorization

The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?

Prime Factorization - Now we're getting into it.
Here we have a problem that not only is the first significant problem in the Project Euler series, but is also a wonderful starting point to explore the application of the difficulty of prime factorization to insure cryptographic security for modern computer systems.

The problem itself is hard because on the surface it seems to need a brute-force approach; What way is there to find the factors of a large integer besides checking all of the integers beneath it for factors? My initial implementation stuck with this approach of dully checking as many prime numbers as possible to see if they are factors:

import math
target = 600851475143L
primes = [1,2,3,5]
for x in range(7,10000):
    half = int(math.ceil(x/2))
    for y in range(2,half+1):
        if float(x) % float(y) == 0:
            break
        if (y == half):
            primes.append(x)
factors = []
for x in primes:
    if target % x == 0:
        factors.append(x)
print factors

While this solution completes in only a few seconds on my laptop, It is nonetheless rather crude. One immediate optimization is possible in the first for loop by making it not iterate over the even numbers beyond two, so that it doesn't test 8, 10, 12 and so on, for being divisible by 2 - A trivial elimination when searching for primes. Let's change that.

for x in range(7,10000,2):
    half = int(math.ceil(x/2))
    for y in range(2,half+1):
        if float(x) % float(y) == 0:
            break
        if (y == half):
            primes.append(x)

This shaves approximately 12% of the run time off on my computer - A small but significant improvement. But let's try another approach. Instead of just stepping up the integers modulus-ing as we go and storing our hits, let's find factors as we go, and divide our goal by that factor, leaving a quotient that can also be factored. To do this, we just need a method that will take a number and return the lowest factor it can find of that number as well as the quotient. Now we have a new number (the quotient) that can be factorized by the same method.

def get_first_factor(innum):
    if innum == 2:
        return 2
    if innum % 2 == 0:
        return 2, innum / 2
    a = 3
    while a < int(math.ceil(innum/2)):
        if innum % a == 0:
            return a, innum / a
        a += 2
    return [innum]

With this, we can then check the returned values for non-primes, and factorize again if needed. I'll use a class to calculate and save prime numbers (more details on this down the road in problem 7), to be able to quickly identify prime factors.

class primes():
    primelist = [2,3,5,7,11,13,17,19,23,29,31,37]
    def is_prime(self,x):
        if x in self.primelist:
            return True
        if x < max(self.primelist) or x == max(self.primelist) + 1:
            return False
        if x % 2 == 0:
            return False
        z = 3
        while z < int(math.ceil(x/2)):
            if x % z == 0:
                return False
            z += 2
        self.primelist.append(x)
        return True

Similarly, if we use a dictionary to store the factor lists, we can later retrieve those lists when working on new ones we're calculating that have the same factors, rather than having to find them again.

class factor_store:
    factors = {} # Dictionary for storage
    plist = primes()
    def factor_list(self,innum):
        if innum in self.factors.keys():
            return self.factors[innum]
        temp = [innum]
        if self.plist.is_prime(innum):
            return temp
        all_primes = False
        x = 0
        while not self.plist.is_prime(temp[x]):
            if x == len(temp) - 1 and self.plist.is_prime(temp[x]):
                all_primes = True
                break
            temp[x:x+1] = get_first_factor(temp[x])
            x += 1
        return temp

This is one of the advantages of using an object-oriented approach for algorithms. By saving our functions' work, we don't have to duplicate it; This is helpful when the algorithms available to us for this problem have high O() magnitudes. This method of finding the same answer completes in almost negligible time compared to my earlier implementations because of the factor and prime storage:

#2.61310791969 - initial factors search
#2.30295395851 - space-narrowed factors search
#0.00332307815552 - prime and factors storage, single factor removal

Tuesday, March 8, 2016

Problem 2 - Every other fib()

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

Calculating the nth term of the Fibonacci sequence is a classic means of exploring recursion as a tool in function design, and can be applied here. We can also store Fibonacci results in a list and then use them after generation. We can also use a generator to calculate the nth term. To start, however, I'll present my original, and cringe-worthy implementation of the solution.

last1 = 1
last2 = 1
current = 0
runsum = 0
for x in xrange(1,200):
    if current > 4000000:
        break
    current = last1 + last2
    if current % 2 == 0:
        runsum += current
    last2 = last1
    last1 = current
print runsum

Here, the code loops until the arbitrary limit of 200 (non-inclusive) Fibonacci numbers, generating the next number on each iteration. Every time, it checks to see if the current iteration number is even, if so, it adds the new number to a running total.
Another approach is to use recursion, possibly the least effective method of solving the problem when it comes to scalability, in which a new Fibonacci term is generated with a recursive call, necessitating a new recursion path for each term.

def recur_fib(x):
    if x == 0 or x == 1:
        return 1
    return recur_fib(x - 1) + recur_fib(x - 2)

This method will return the nth Fibonacci number, however, the larger the target term, the longer it will take. One way we can improve on this is to store all of the terms that we generate. We can write the results to a file, or just keep them in a list encapsulated in a class.

class fib_list:
    terms = [1,2]
    def __getitem__(self, n):
        if len(self.terms) < n:
            return self.terms[n]
        while len(self.terms) < n:
            self.add_one_term()
        return self.terms[-1]
    def add_one_term(self):
        self.terms.append(self.terms[-2] + self.terms[-1])
    def add_until_target(self, target):
        while self[-1] < target:
            self.add_one_term()
        self.terms = self.terms[:-1]
temp = fib_list
temp.add_until_target(4000000) 
total = sum([x for x in temp.terms if x % 2 == 0]) # Answer generation

With this class, we can save the Fibonacci numbers in a list and get our answer with a single line of code that executes in trivial time. Generating the Fibonacci sequence remains the most intensive part of the problem. Is there a way to generate only the numbers we need without having to follow the sequence to its target, generating twice as many numbers as necessary?

Thankfully, there is a closed-form solution for generating the nth Fibonacci term known as Binet's formula. With this, we can generate an individual Fibonacci term, and by extension only the ones we need in the sequence (Thanks to Zeychin on StackOverflow for the Python Binet's).

def fib_at_n(n):
    return 0.4472135954999579392818347337462552470881236719223051448541 * (pow(1.6180339887498948482045868343656381177203091798057628621354,n) - pow(-0.618033988749894848204586834365638117720309179805762862135,n))
templist = []
for x in range(0,32,2):
    templist.append(fib_at_n(x))

With this method, our execution time is comparable to the time used in constructing the full Fibonacci sequence above using the list encapsulated in a class. However, this last method is by far the best scaling method. If the target term were much farther along in the sequence, this method would remain the fastest. For demonstration purposes, I'll pit each of these methods against one another in generating the 1000th Fibonacci term.

0.000510931015015 # Original Implementation
0.000808000564575 # List construction
1.90734863281e-06 # Stored list recall
4.05311584473e-06 # Binet's

As you can see, the execution time for the closed solution executes within the same time magnitude as recalling the list from a saved record, making this the best method for generating individual Fibonacci terms, but not the best for generating the whole sequence.

Monday, March 7, 2016

Problem 1 - Fizz Buzz expanded

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

This is an expansion on a classic Computer Science 1 problem. The goal of the exercise is to use text output to print all multiples of 3 and 5 (traditionally identified with the words "fizz" and "buzz") between 1 and an arbitrary goal number. The challenge is to use conditions to print the appropriate words without printing the accompanying number multiple times. This identifies whether or not the student can properly implement non-exclusive conditional statements.

Here, however, we're not looking to print corresponding words, but simply to sum the numbers meeting the classic criterion (having factors of 3 or 5) in order to find the "secret" number that is the puzzle solution.

Several approaches are available to us using Python, so I'll start with a procedural addition implementation.

runtotal = 0
for x in xrange(1000):
    if x % 3 == 0 or x % 5 == 0:
        runtotal += x
print runtotal

This rather straightforward implementation runs straight through, and sums the total of all multiples of 3 or 5 as it goes. Let's make a more "Python-esque" (in the sense of compact) method, using set theory and Python list comprehension.

threes = [x for x in range(3,1000,3)] # make list of multiples of three
fives = [x for x in range(5,1000,5)] # make list of multiples of five
print sum(list(set(threes) | set(fives))) # union previous lists, and sum

Here, we can construct lists of all multiples of three and five, respectively, under the goal number. That done, we union the two lists into a new list. Then we simply print the sum of the list. Can we compact this any more?
This time, we'll reduce the construction of the union of the two lists of multiples to a single line and list, instead of creating them separately and then combining. This way, we can reduce all of our solution to a single line:

print sum([x for x in range(3,1000) if x % 3 == 0 or x % 5 == 0])

As we can see, several means of arriving at the same solution exist within Python. The execution times in seconds for these on my laptop are:

0.000421047210693 # procedural addition
0.000199079513550 # sets, separate list comprehensions
0.000399827957153 # sets, single list comprehension

This presents us with a curiosity: how is the implementation that has to iterate to 1000 twice faster than the one that only has to do it once? The answer is that the total loop count of the second is 531, not 1000. This is because of the step parameter that I included in the list comprehensions. These made the loops that added numbers to the lists increment by 3 or 5, rather than every number and check to see if the result is a multiple of 3 or 5. This way, the loops creating theses lists loop fewer times total. We can't use this optimization in the first or last implementations, since doing so would skip some of the numbers that need to be included in the sum.
Edit: I want to thank Alex Gorbatchev for the SyntaxHighlighter javascript plugin, which I'll be using to present formatted Python code.