1 Premium Part - YMonnier/AdvAlgoLP GitHub Wiki

Premium Part - "Jewish-style carp"

Devise a version of Rabin-Karp algorithm which will check if for a given picture M by N pixels large, its top-right K by K corner is duplicated somewhere in the picture.

Your algorithm should replace slow modulo prime operations with faster bitwise mask &'s (as described in the class). Do make sure that the RT is at most linear in the number of pixels in the text (in the same sense as in the case of classical RK).

Maximum space allowed to use except for the space to hold the picture and the set structure is O(M+N).

References:

Teacher!

Execution

To execute the algorithm:

git clone [email protected]:YMonnier/AdvAlgoLP.git
cd AdvAlgoLP/Assignment1/Premium
change image path into the main.py
python main.py

Idea

Grid1 Grid2

Rabin-Karp algorithm uses a hash function to speed up the search.

Instead of using a string, we will use a two-dimensional array which contains pixel: tuple(Red,Green,Blue,Alpha). This array represent the image given in input. The module Image from Python Imaging Library allows to parse/manipulate an image.

We can access the picture with an object behaves like a 2-dimensional array:

pix = im.load() #parse image
print pix[x, y] #get value
pix[x, y] = value # set value

An example of pattern:

Grid3

To follow the Rabin-Karp idea, we will apply a double hash from the image. Each column will be hash, then hash column's sum:

Implementation

####Python


def rabinKarp(image, K):
	imgSize = image.size
	width = imgSize[0]
	height =  imgSize[1]
	cornerX = (width - K)

	#Check given corner
	if K > imgSize[0] or K > imgSize[1]:
		print "Error: given corner is too high"
		return

	# Get 2-dimensional array of image's pixel
	# imgPix[x,y] -> pixel
	imgPix = image.load()

	# Preprocessing	
	# top-rigth corner Hash with size K
	hash_k = 0

	#occurence's index found
	shifts = []

	spiriousHits = 0

	for x in range(K): # Hash corner
		hash_k += hash(imgPix, cornerX + x, 0, K, K - x)

	for y in range(height - K + 1): # for each row
		hash_p = 0 # Current hash (picture hash)
		for x in range(K): # Hash current image 0 to K size
			hash_p += hash(imgPix, x, y, K, K - x)

		for x in range(width - K + 1): # for each column
			if hash_p == hash_k: # Same hash code
				if bruteForce(imgPix, x, y, K, cornerX) == K**2: # apply brute-force
					shifts.append((x, y))
				else:
					spiriousHits += 1
			if x < (width - K) and y < (height - K):
				hash_p = hash_p - hash(imgPix, x, y, K, K) # remove first column
				hash_p = hash_p + hash(imgPix, x + K, y, K, 1) # add next column

	print len(shifts)
	print shifts
	print spiriousHits