Friday, March 11, 2016

Problem 3 - Prime Factorization

The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?


Prime Factorization - Now we're getting into it.
Here we have a problem that not only is the first significant problem in the Project Euler series, but is also a wonderful starting point to explore the application of the difficulty of prime factorization to insure cryptographic security for modern computer systems.

The problem itself is hard because on the surface it seems to need a brute-force approach; What way is there to find the factors of a large integer besides checking all of the integers beneath it for factors? My initial implementation stuck with this approach of dully checking as many prime numbers as possible to see if they are factors:
import math
target = 600851475143L
primes = [1,2,3,5]
for x in range(7,10000):
    half = int(math.ceil(x/2))
    for y in range(2,half+1):
        if float(x) % float(y) == 0:
            break
        if (y == half):
            primes.append(x)
factors = []
for x in primes:
    if target % x == 0:
        factors.append(x)
print factors
While this solution completes in only a few seconds on my laptop, It is nonetheless rather crude. One immediate optimization is possible in the first for loop by making it not iterate over the even numbers beyond two, so that it doesn't test 8, 10, 12 and so on, for being divisible by 2 - A trivial elimination when searching for primes. Let's change that.
for x in range(7,10000,2):
    half = int(math.ceil(x/2))
    for y in range(2,half+1):
        if float(x) % float(y) == 0:
            break
        if (y == half):
            primes.append(x)
This shaves approximately 12% of the run time off on my computer - A small but significant improvement. But let's try another approach. Instead of just stepping up the integers modulus-ing as we go and storing our hits, let's find factors as we go, and divide our goal by that factor, leaving a quotient that can also be factored. To do this, we just need a method that will take a number and return the lowest factor it can find of that number as well as the quotient. Now we have a new number (the quotient) that can be factorized by the same method.
def get_first_factor(innum):
    if innum == 2:
        return 2
    if innum % 2 == 0:
        return 2, innum / 2
    a = 3
    while a < int(math.ceil(innum/2)):
        if innum % a == 0:
            return a, innum / a
        a += 2
    return [innum]
With this, we can then check the returned values for non-primes, and factorize again if needed. I'll use a class to calculate and save prime numbers (more details on this down the road in problem 7), to be able to quickly identify prime factors.
class primes():
    primelist = [2,3,5,7,11,13,17,19,23,29,31,37]
    def is_prime(self,x):
        if x in self.primelist:
            return True
        if x < max(self.primelist) or x == max(self.primelist) + 1:
            return False
        if x % 2 == 0:
            return False
        z = 3
        while z < int(math.ceil(x/2)):
            if x % z == 0:
                return False
            z += 2
        self.primelist.append(x)
        return True
Similarly, if we use a dictionary to store the factor lists, we can later retrieve those lists when working on new ones we're calculating that have the same factors, rather than having to find them again.
class factor_store:
    factors = {} # Dictionary for storage
    plist = primes()
    def factor_list(self,innum):
        if innum in self.factors.keys():
            return self.factors[innum]
        temp = [innum]
        if self.plist.is_prime(innum):
            return temp
        all_primes = False
        x = 0
        while not self.plist.is_prime(temp[x]):
            if x == len(temp) - 1 and self.plist.is_prime(temp[x]):
                all_primes = True
                break
            temp[x:x+1] = get_first_factor(temp[x])
            x += 1
        return temp
This is one of the advantages of using an object-oriented approach for algorithms. By saving our functions' work, we don't have to duplicate it; This is helpful when the algorithms available to us for this problem have high O() magnitudes. This method of finding the same answer completes in almost negligible time compared to my earlier implementations because of the factor and prime storage:
#2.61310791969 - initial factors search
#2.30295395851 - space-narrowed factors search
#0.00332307815552 - prime and factors storage, single factor removal

No comments:

Post a Comment