1 Premium Part - YMonnier/AdvAlgoLP GitHub Wiki
Premium Part - "Jewish-style carp"
Devise a version of Rabin-Karp algorithm which will check if for a given picture M by N pixels large, its top-right K by K corner is duplicated somewhere in the picture.
Your algorithm should replace slow modulo prime operations with faster bitwise mask &'s (as described in the class). Do make sure that the RT is at most linear in the number of pixels in the text (in the same sense as in the case of classical RK).
Maximum space allowed to use except for the space to hold the picture and the set structure is O(M+N).
References:
- Teacher!
Summary
Execution
To execute the algorithm:
git clone [email protected]:YMonnier/AdvAlgoLP.git
cd AdvAlgoLP/Assignment1/Premium
- change image path into the
main.py
python main.py
Idea
Rabin-Karp algorithm uses a hash function to speed up the search.
Instead of using a string, we will use a two-dimensional array which contains pixel: tuple(Red,Green,Blue,Alpha). This array represent the image given in input. The module Image from Python Imaging Library allows to parse/manipulate an image.
We can access the picture with an object behaves like a 2-dimensional array:
pix = im.load() #parse image
print pix[x, y] #get value
pix[x, y] = value # set value
An example of pattern:
To follow the Rabin-Karp idea, we will apply a double hash from the image. Each column will be hash, then hash column's sum:
Implementation
####Python
def rabinKarp(image, K):
imgSize = image.size
width = imgSize[0]
height = imgSize[1]
cornerX = (width - K)
#Check given corner
if K > imgSize[0] or K > imgSize[1]:
print "Error: given corner is too high"
return
# Get 2-dimensional array of image's pixel
# imgPix[x,y] -> pixel
imgPix = image.load()
# Preprocessing
# top-rigth corner Hash with size K
hash_k = 0
#occurence's index found
shifts = []
spiriousHits = 0
for x in range(K): # Hash corner
hash_k += hash(imgPix, cornerX + x, 0, K, K - x)
for y in range(height - K + 1): # for each row
hash_p = 0 # Current hash (picture hash)
for x in range(K): # Hash current image 0 to K size
hash_p += hash(imgPix, x, y, K, K - x)
for x in range(width - K + 1): # for each column
if hash_p == hash_k: # Same hash code
if bruteForce(imgPix, x, y, K, cornerX) == K**2: # apply brute-force
shifts.append((x, y))
else:
spiriousHits += 1
if x < (width - K) and y < (height - K):
hash_p = hash_p - hash(imgPix, x, y, K, K) # remove first column
hash_p = hash_p + hash(imgPix, x + K, y, K, 1) # add next column
print len(shifts)
print shifts
print spiriousHits