Project Euler in Python

Monday, April 4, 2016

Problem 9 - Special Pythagorean Triplets

A Pythagorean triplet is a set of three natural numbers, a < b < c, for which
a^2 + b^2 = c^2

For example, 3^2 + 4^2 = 9 + 16 = 25 = 5^2.
There exists exactly one Pythagorean triplet for which a + b + c = 1000. Find the product a*b*c.

This is a fairly straightforward problem, for which we can use optimizations that we've previously used: searching high-to-low; and limiting our search space.

answer = 0
for c in range(1000,3,-1):
    for b in range(c-1,2,-1):
        a = 1000 - (c + b)
        if c**2 == (a**2 + b**2) and c:
            answer = a*b*c

# 0.368048191071 seconds

From this implementation, we can improve a fair bit. It already searches high-to-low, but we can limit the size of the number space that it searches with a little math. Since our initial search space is across the value of c (the hypotenuse in our Pythagorean triple), we can reduce its size from 1000 to 500, since the smallest factor that can be part of a Pythagorean triple is 2 (Maths here)

answer = 0
for c in range(500,2,-1):
    if answer != 0:
        break
    for b in range(c-1,2,-1):
        a = 1000 - (c + b)
        if c**2 == (a**2 + b**2):
            answer = a*b*c
            break

Here, we also add break conditions for when the answer is found, so that we don't keep searching once our answer is found, dropping our time to:

# 0.0235221385956 seconds

For this method, this is about as optimal as possible without diving deeper into the mathematical properties of Pythagorean triangles.

Monday, March 28, 2016

Problem 8 - Largest Digit Sequence Product

The four adjacent digits in the provided 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.
Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?

Note: my implementation was for the original version of the problem, which asked for the greatest product of a five-digit sequence. Thankfully, almost nothing had to change in the code to adjust for the new 13-term product.

Once again, my initial implementation leaves little to improve upon in terms of efficiency.

ind = 0
greatest = 0
while ind < len(d) - 12:
    sliced = d[ind:ind+13]
    temp = 1
    for x in sliced:
        temp *= x
    if temp > greatest:
        greatest = temp
    ind += 1

# 0.00393509864807 Seconds

One change we can make immediately is to check for products that we know would result in zero. Since we are calculating products (multiplication), we know that if a zero is in the factors, that our result will be zero. Thus, we can just skip digit sequences that contain one or more zeroes.

ind = 0
greatest = 0
while ind < len(d) - 12:
    sliced = d[ind:ind+13]
    if 0 in sliced:
        ind += 1
        continue
    temp = 1
    for x in sliced:
        temp *= x
    if temp > greatest:
        greatest = temp
    ind += 1

# 0.00193285942078 seconds

This seems to reduce our computing time by a factor of two. With a larger data set, we could verify this, but that's not what we want. This is Python; We want a one-liner. List comprehensions ahoy!

First, we need to generate the slices of the input digits d:

slices = [d[x:x+13] for x in range(len(d) - 12)]

But how do we calculate the product of each slice easily? There's no product(iterable) function in the default libraries, but we can create one easily if we take advantage of the associative property of multiplication by just multiplying the first number into the second, and the result into the third, etc. until we are down to one. For this, we can use the core library's reduce() function, which performs a passed function on each pair of a list until there is only one. We only need to pass it a function that multiplies the two parameters, and we can use a simple lambda function for that:

reduce(lambda x,y:x*y, slice)

These will give us a list of products when combined thus:

products = [reduce(lambda x,y:x*y, d[x:x+13]) for x in range(len(d) - 12)]

And, since Python lists conveniently have a maximum function, we just need to apply it.

answer = max([reduce(lambda x,y:x*y, d[x:x+13]) for x in range(len(d) - 12)])

# 0.00341510772705 seconds

This is on the same time order as my original implementation (that didn't skip zero products), and we can fix that with a quick condition in our comprehension:

answer = max([reduce(lambda x,y:x*y, d[x:x+13]) for x in range(len(d) - 12) if 0 not in d[x:x+13]])

# 0.0014820098877 seconds

Thursday, March 24, 2016

Problem 7 - Nth Prime

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.
What is the 10001st prime number?

Considering the run time of my original implementation:

# 23.6239449978 seconds

There is a lot to improve on in this:

import math
primes = [2,3,5,7]
itera = 8
while len(primes) < 10001:
    primelength = len(primes)
    for x in xrange(primelength):
        if itera % primes[x] == 0:
            break
        elif primes[x] > int(math.ceil(itera/2)):
            primes.append(itera)
            break
    itera += 1
print primes[-1]

Well, one thing to notice immediately is the iterator that only increments by one. Let's cut it in half.

primes = [2,3,5]
itera = 7
while len(primes) < 10001:
    primelength = len(primes)
    for x in xrange(primelength):
        if itera % primes[x] == 0:
            break
        elif primes[x] > int(math.ceil(itera/2)):
            primes.append(itera)
            break
    itera += 2
print primes[-1]

So, how much did that help?

# 23.306112051 seconds

Hmph. That doesn't help much at all.
Well, let's try removing our need to keep a massive list of integers and constantly read and write to it. That should reduce our average iteration time.

lastprime = 0
check = 3
pcount = 2
while pcount <= 10001:
    primeflag = True
        for x in range(3, int(math.ceil(math.floor(check))), 2):
            if check % x == 0:
                primeflag = False
                break
    if primeflag:
        lastprime = check
        pcount += 1
    check += 2
print lastprime

Our running time is:

# 94.41918993

Hmm . . .

Well, we have a relatively inefficient method of checking for primes with a flag. Also, we're checking all of the odd numbers, rather than deliberately stepping over some. Let's implement a more modular means of checking for primes. We can eliminate the two easiest prime factors straight away, as well as the entire category of non-primes below 2.

if x < 2:
    return False
if x == 2:
    return True
if x % 2 == 0:
    return False
if x % 3 == 0:
    return False

With these out of the way, we can put an upper limit on the factors that we check for a specific prime. Since we've already eliminated 2 and 3 as factors, we can say that the largest possible factor of any checked number will be equal to the floor of the square root of that number; One half and one third have already been checked, and any root factors will be larger than these two.

root = math.floor(math.sqrt(x))
factor = 5 # Next prime

Now, we just iterate on a counter, returning false if we find a factor.

while factor <= root:
if x % factor == 0:
    return False
factor += 2

Adding a default return value of True, we finally have a complete function:

def is_prime(x):
    if x < 2:
        return False
    if x == 2:
        return True
    if x % 2 == 0:
        return False
    if x % 3 == 0:
        return False
    root = math.floor(math.sqrt(x))
    factor = 5
    while factor <= root:
        if x % factor == 0:
            return False
        factor += 2
    return True

From here, we need only iterate over the integers, checking for primes. We can check from the last prime so we don't have to start from the bottom of the integers.

last = 5
diff = 2
primecount = 3
while primecount <= 10000:
 x = last + diff
 if is_prime(x):
  last = x
  diff = 2
  primecount += 1
  continue
 diff += 2
print last

Much better:

# 0.235863924026 seconds

Wednesday, March 16, 2016

Problem 6 - Sum of squares and square of sum

The sum of the squares of the first ten natural numbers is
12 + 22 + ... + 102 = 385
The square of the sum of the first ten natural numbers is
(1 + 2 + ... + 10)2 = 552 = 3025
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640. Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

It is a rare programming problem that is so easy to provide a solution for, but difficult to take the extra step to optimize fully.

runsum = 0
runsquaresum = 0
for x in range(1,101):
    runsum += x
    runsquaresum += x**2
answer = runsum**2 - runsquaresum
# 6.103515625e-05 Run time in seconds

In fact, this solution is so straightforward to implement that it is possible to reduce it to a single line of code:

answer = sum(range(1,101)) ** 2 - sum([x ** 2 for x in range(1,101)])
# 2.90870666504e-05 Run time in seconds

Both of these implementations run on the same big-O magnitude of n, or linear, time. Thus, as our target number grows, our processing time grows in direct proportion. To clarify, if we make our target number 1000 instead of 100, we increase our processing time by a factor of 10. Take our target to 10000 and we again multiply our computation time by 10.

This is one approach to the problem that is a direct generation of the solution. However, suppose we use a different, slightly more search-y approach. Allow me to introduce you to Pascal's triangle:

Here, we have a mathematical construct with some very interesting properties, the most relevant of which to our problem is its containment of several summation series.

With these series, it becomes similarly complex to our previous solution to "look up" our answer in this structure. But, how do we find them?

The first part of our answer is in identifying what we need and finding a corresponding pattern in the pyramid. Since one half of our answer is the square of the sum of a number of consecutive integers, we need look no farther than the third row of the pyramid. The first row being the series of one along the pyramid, the second the summation of the ones in the previous row to that point. For ease of illustration, here's the same structure arranged in a horizontal-vertical orientation.

From here, the summation of previous rows by succeeding ones is slightly more plain. The reason we need to look at the third row, however, is that by looking here we can immediately find the sum of the second row to any given iteration, which is what we need to eventually square to produce the first part (the square of the sum) of our solution, and we need only look one row down from the second row to the corresponding entry to find the sum of the series to that point. For example:

1 + 2 + 3 + 4 + 5 = 15, 15^2 = 225

The second half of our solution is slightly more difficult to find, but can nonetheless be observed in the triangle. If we want to find the sum of squares to a certain point, we need to look at rows 2 and 4.

For example, to find the sum of squares to 5 (1^2 + 2^2 + 3^2 + 4^2 + 5^2), we look at row two to see how far we count over (5), then drop two rows to row 4 (landing at 35). Here, we simply take the number we landed on and add the previous entry on the same row (20), producing 55.

(1^2 + 2^2 + 3^2 + 4^2 + 5^2) = 1 + 4 + 9 + 16 + 25 = 55

This rule repeats throughout the pyramid, and will hold for when our target is 100 or 10000000, so we can say that:

(1 + 2 + 3 + 4 + 5)^2 - (1^2 + 2^2 + 3^2 + 4^2 + 5^2) = 225 - 55 = 170

With this method, our computation time takes a similar O(n) time magnitude, but with an additional time of magnitude n tacked on to generate the list we need to search, thus making the time magnitude remain O(n), but with a greater coefficient attached.

rows = []
rows.append([1 for x in range(100)])
x = 0
while x < 4:
    rows.append([sum(rows[-1][:a + 1]) for a in range(100)])
    x += 1
answer = (rows[2][-1] ** 2) - sum(rows[3][-2:])

While this solution works, when scaling our target up by a factor of 10, our solution time scales by a factor of 100, and this is attributable to the sum calculations that we have to do over and over to generate the sum lists. Thankfully, we can optimize this a bit. We don't need to generate the first row of all ones, since we don't reference it in the future. We can eliminate generation of the fourth row (third sum), since all we need from that row can be generated by summation of certain parts of the third row. Thus, we can optimize our code to once again have a O(n) time scale:

sum_one = [x for x in range(1,101)]
last = 0
sum_two = []
for x in range(100):
 sum_two.append(sum_one[x] + last)
 last = sum_two[-1]
square_of_sum = sum_two[-1] ** 2
sum_of_squares = (sum(sum_two[:-1]) * 2) + sum_two[-1]
answer = square_of_sum - sum_of_squares

After all this, can we improve our solution any more? Yes we can.

There is, thankfully, a deterministic, O(1) (constant time) formula for the sum of a series of integers, which we can use in our algorithm.

1 + 2 + ... + (n-1) + n = (n+1) + (n+1) + ... + (n+1) + (n+1) = (n+1) * n/2 = (n(n+1))/2
1 + 2 + ... + 99 + 100 = 101 + 101 + ... + 101 + 101 = 101 * 50 = 5050

Once we use this formula to find the sum of our series to number n, we square the result to get the first half of our answer. Is there a similar formula for the sum of squares? Yep.

(n(n+1)(2n+1))/6

This, in combination with the first, allows us to calculate the difference incredibly quickly with the complete formula:

((n(n+1))/2)^2 - (n(n+1)(2n+1))/6

Or:

square_of_sum = ((n * (n + 1)) / 2) ** 2
sum_of_squares = (((2 * n) + 1) * (n + 1) * n) / 6
answer = square_of_sum - sum_of_squares

Or, slightly more compact and fast:

answer = (((n * (n + 1)) / 2) ** 2) - (n * (n + 1) * ((2 * n) + 1)) / 6

When n equals 10000000, my original brute force implementation takes over 4 seconds, while this last one takes

# 7.86781311035e-06 seconds

Sunday, March 13, 2016

Problem 5 - Smallest Multiple

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder. What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?

Well, this is awkward . . .

solution = 0
for x in range(180,1000000000,10):
    if solution != 0:
        break
    for y in range(2,21):
        if x % y != 0:
            break
        if x % y == 0:
            if y == 20:
                solution = x
            continue
print solution
# 40.1628811359 Execution time

In defense of my former self, there is one optimization in here that is helpful. Notice how the step in the for loop is 10. As our target number must have 10 as a factor, we only need search multiples of 10. However, this thinking doesn't immediately take the next logical step and double the step to 20, as this is the largest factor of our target. Amendment:

solution = 0
for x in range(180,1000000000,20):
    for y in range(2,20):
        if x % y != 0:
            break
        if x % y == 0:
            if y == 19:
                solution = x
            continue
    if solution != 0:
        break
print solution
# 22.1047170162 Execution time

And this halves our running time; Lovely. Now to actually address the big O.

To actually have a decent time scale, we need to do better. Stepping back from 20 and checking to see what factors we have to check at a minimum seems like a good starting point. First conclusion: we definitely check the larger numbers that have smaller numbers as a factor, thus checking several numbers at once by proxy. This reduces our search space from [2 ... 20] to [11, 13, 14, 16, 17, 18, 19] if we step by twenty. Let's try this

solution = 0
for x in range(2520,1000000000,20):
    for y in []:
        if x % y != 0:
            break
        if x % y == 0:
            if y == 19:
                solution = x
            continue
    if solution != 0:
        break
print solution
# 8.65195894241 Execution time

That's better, let's keep going. Realizing that a step of 20 doesn't have a factor of 3 in it, we can up our step to 60 to account. However, 60 doesn't have a factor of 7 in it, so we set it to 420.

solution = 0
for x in range(2520,1000000000,420):
    for y in [11, 13, 14, 16, 17, 18, 19]:
        if x % y != 0:
            break
        if x % y == 0:
            if y == 19:
                solution = x
            continue
    if solution != 0:
        break
print solution
# 0.421896934509 Execution time

Now that's respectable for a search approach. Now, let's attack the problem from the other direction. Rather than find our answer among thousands of possibilities, let's just make our answer using some maths.

Since we know that our answer is a product of all of the numbers from 2 to 20 (1 eliminated as tautoligical), we immediately know that we need to multiply all of the primes in here together immediately. What are these?

[2,3,5,7,11,13,17,19]

What's the product so far?

And well below our answer. Why? Because these are just the primes under twenty. We still don't have factors of 4, 8, 9, 12, 16, 18, or 20, which are all necessary. We have some of the factors of these numbers, so let's just put in some more prime factors so we know we can construct these higher factors.

Adding another factor of 2 to the list allows us to construct 4 as well as 12 and 20 when multiplied with 3 and 5 respectively. Another 2 allows us to make 8, another 16. That's all of the factors of two that we need. Including another 3 allows us to make 9 and 18 with one of the 2's, so that's all of our factors from 2 to 20. Our final list of factors is:

[2,2,2,2,3,3,5,7,11,13,17,19]

So how can we generate this list of prime factors? Well, we did some work on that idea in problem 3, but we can't really use that approach since we don't have a target number to work from. Well, let's just recap what our steps to generate the list long hand were:

1 - Include all primes below factor run ceiling
2 - Add more copies of small primes to account for larger non-primes

The only question needed to answer is "How many times do I need to include a copy of a small prime?" Well, the largest number that one can generate under a given target (20, in this case) using just the number 2 is 2^4, which is 16. To include another factor of 2 beyond this would be mandating that our answer be divisible by 32, which is unnecessary for our solution and therefore redundant.

A hypothesis: Include X copies of factor n, so that n^X is as large as possible without passing our ceiling.

Let's try it. We'll need a prime number generator, which we already do from problem 3.

solution = 1
ceiling = 20
plist = primes()
plist.populate_until(ceiling)
factors = []
for x in [z for z in plist.primelist if z <= ceiling]:
    power = 1
    while (x ** power) <= ceiling:
        factors.append(x)
        power += 1
for x in factors:
    solution *= x

This algorithm will generate a smallest multiple of consecutive integers up to the variable ceiling. The populate member method of the primes class is one that I added to prepopulate the list of primes up to a target number (the ceiling, in this case). To generate the correct answer for ceiling = 20, this took:

# 2.59876251221e-05 seconds

Much less awkward . . . phew . . .

P.S. For ceiling = 1000, it took:

# 0.0191540718079 seconds