Project Proposal
A System Implementation for Fast Large Scale Image Retrieval
# Summary and Background
We plan to implement a faster alternative for large-scale image search based on a vocabulary tree retrieval scheme. If time permits, we would also like to compare / benchmark hashing-based search schemes (e.g. LSH) against our vocabulary tree search and BoW index search.
The standard BoW scheme, while effective for small to medium sized databases, scales poorly as the number of images per visual word becomes very large, since the index look-up must scan the inverted list of every word contained in the query image. This quickly becomes a problem for highly repetitive data, where many images look similar. One solution is to increase the number of visual words (for example from 100k to 1 million), but this also scales poorly: each bag-of-words feature is now ten times as large, not to mention the extra training time.
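As a rough illustration of where the cost comes from (a minimal sketch, not our final design, with placeholder names), a BoW inverted-index query scans the posting list of every query word, so the query time is the sum of those list lengths, which is exactly what grows on large, repetitive databases:

```cpp
// Minimal sketch of a BoW inverted-index lookup (illustration only).
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Posting {
    uint32_t image_id;
    float    tf_idf;   // weighted term frequency for this image
};

// visual word id -> list of images containing that word
using InvertedIndex = std::unordered_map<uint32_t, std::vector<Posting>>;

// Score every candidate image that shares at least one visual word with the
// query. The cost is the sum of the posting-list lengths of the query's words,
// so long posting lists (many images per word) make each query slow.
std::unordered_map<uint32_t, float>
score_query(const InvertedIndex& index, const std::vector<uint32_t>& query_words) {
    std::unordered_map<uint32_t, float> scores;
    for (uint32_t word : query_words) {
        auto it = index.find(word);
        if (it == index.end()) continue;
        for (const Posting& p : it->second)   // linear scan over the posting list
            scores[p.image_id] += p.tf_idf;
    }
    return scores;
}
```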
Large-scale image retrieval is a difficult but important problem. There are two main issues:

- **Scaling** Images are a very data-rich form of information, and it is difficult to scale algorithms to the amount of data contained in millions of images. It is often impossible to hold all of the needed data in memory, so choosing what to cache versus what to read off of disk is important for speed.
- **Semantic meaning** The distance function and search approximations used for retrieval affect what semantically similar information a query image and its results share. For example, one might want to search for images based on color, edges, etc. Defining what a good search result is is not always clear.
# Challenge
We hope to run this search on a very large database of images. The reference paper went as high as one million images and we plan to go higher. The paper found that an increase in database size led to a decrease in scores, simply because there were more images to confuse the query with. This is unfortunate, since the point of the vocabulary tree is to scale to larger data sets without a cost to result quality; depending on the severity of this effect, it could be an issue.
Part of the core idea of the vocabulary tree is searching down a tree structure, which is not always trivial to parallelize. A clearer idea of how to use parallelism requires a further look into the algorithm itself.
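To make the parallelism question concrete, here is a minimal sketch of the descent we would need to parallelize, assuming a hierarchical k-means tree over 128-dimensional SIFT descriptors as in the reference paper; the names and types below are our own placeholders, not taken from any existing implementation.

```cpp
// Sketch of quantizing one descriptor by descending a hierarchical k-means tree.
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <vector>

constexpr std::size_t kDim = 128;  // SIFT descriptor dimension (assumption)
using Descriptor = std::array<float, kDim>;

struct TreeNode {
    Descriptor centroid{};                            // cluster center from k-means
    std::vector<std::unique_ptr<TreeNode>> children;  // empty at the leaves
    uint32_t word_id = 0;                             // leaf id = visual word
};

static float squared_distance(const Descriptor& a, const Descriptor& b) {
    float d = 0.f;
    for (std::size_t i = 0; i < kDim; ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// At each level, pick the child with the nearest centroid. A single descent is
// inherently sequential, but each descriptor's descent is independent, so
// descriptors (and query images) can be processed in parallel.
uint32_t quantize(const TreeNode& root, const Descriptor& desc) {
    const TreeNode* node = &root;
    while (!node->children.empty()) {
        const TreeNode* best = nullptr;
        float best_dist = std::numeric_limits<float>::max();
        for (const auto& child : node->children) {
            float d = squared_distance(child->centroid, desc);
            if (d < best_dist) { best_dist = d; best = child.get(); }
        }
        node = best;
    }
    return node->word_id;
}
```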
# Side question
The paper mentioned that database creation took 2.5 days, but that was mainly feature extraction, which wouldn't have to be recomputed every time. Since I assume you're going to continue adding data to your data set and will want to rebuild the database, could this be an issue?
# Resources
We have the code and data necessary for baseline bag-of-words retrieval, but for faster search methods such as vocabulary trees we will be writing our own scalable code. We will be referencing related work on vocabulary trees, in particular Scalable Recognition with a Vocabulary Tree (Nistér and Stewénius, 2006). Noah Snavely also has an existing implementation of vocabulary trees, but it is not optimized (especially for multiple threads / nodes). For compute, access to clusters / powerful nodes with a large amount of storage would be useful.
# Goals/Deliverables
We plan to implement a working vocabulary tree retrieval scheme. Its speed should scale well with the number of images, and the quality of the retrieved images should be comparable to slower methods, i.e. matching the same pictures and not missing correct matches. We would like to directly compare run times and match results against other search methods. The vocabulary tree also has many parameters, and we should find the ideal settings for our data set.
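As an assumed starting point, these are the kinds of knobs we expect to sweep; the defaults below are placeholders rather than tuned values, roughly following the branching factor and depth parameters described in the Nistér and Stewénius paper.

```cpp
// Hypothetical parameter struct for our implementation (defaults are placeholders).
struct VocabTreeParams {
    int branching_factor  = 10;   // k: children per node
    int levels            = 6;    // L: tree depth; ~k^L visual words at the leaves
                                  //    (10^6 ~= 1M words, as in the paper's largest tree)
    int kmeans_iterations = 25;   // iterations per k-means split (assumption)
    bool use_l1_norm      = true; // the paper reports L1 scoring working well
};
```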
We hope to further add to the functionality of the vocabulary tree by considering a series of requests instead of a single image comparison. The idea is to match frames from a video against the database: based on the match of the first frame, we may be able to improve the search time for subsequent frames.
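One possible (untested) way this could work is sketched below: re-score only the previous frame's top candidates and fall back to a full query when that shortlist stops scoring well. The functions `full_query` and `rescore_subset` are hypothetical stand-ins for our planned API.

```cpp
// Sketch of exploiting temporal coherence between consecutive video frames.
#include <cstdint>
#include <vector>

struct Match { uint32_t image_id; float score; };

// Hypothetical query functions (declared only; part of our planned API).
std::vector<Match> full_query(const std::vector<uint32_t>& query_words);
std::vector<Match> rescore_subset(const std::vector<uint32_t>& query_words,
                                  const std::vector<Match>& shortlist);

std::vector<Match> query_frame(const std::vector<uint32_t>& query_words,
                               const std::vector<Match>& previous_results,
                               float min_acceptable_score) {
    if (!previous_results.empty()) {
        std::vector<Match> rescored = rescore_subset(query_words, previous_results);
        if (!rescored.empty() && rescored.front().score >= min_acceptable_score)
            return rescored;  // shortlist still looks good; skip the full search
    }
    return full_query(query_words);  // first frame or scene change: full query
}
```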
# Evaluation
To evaluate our implementation, we plan to measure speed (the time a single query takes) and accuracy (hit rate, or relevance of results) while varying the parameters of the vocabulary tree. We plan to compare against the baseline bag-of-words (BoW) + inverted index algorithm (and possibly other variants if time permits). To compare speed, we will measure both the time it takes to build the tree and the time it takes to query an image while varying the number of images and nodes. To compare accuracy, we plan to use two methods. First, we will compute the "hit rate" of our algorithm (relevance of returned results) against one or more data sets of hand-labeled images and relevant queries (e.g. the INRIA Holidays dataset). Second, we will compute how many of the matches returned by our algorithm geometrically verify (i.e. with a homography) and compare against the baseline(s).
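For the geometric-verification check, we expect to use something along the lines of the following sketch, assuming OpenCV is available; the inlier threshold and the feature matching front end are placeholders.

```cpp
// A query/result pair "verifies" if enough matched keypoints fit a homography.
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Returns true if a RANSAC homography between matched keypoint locations
// has at least `min_inliers` inliers.
bool geometrically_verifies(const std::vector<cv::Point2f>& query_pts,
                            const std::vector<cv::Point2f>& result_pts,
                            int min_inliers = 15) {
    if (query_pts.size() < 4 || query_pts.size() != result_pts.size())
        return false;  // need at least 4 correspondences for a homography
    std::vector<unsigned char> inlier_mask;
    cv::Mat H = cv::findHomography(query_pts, result_pts, cv::RANSAC, 3.0, inlier_mask);
    if (H.empty()) return false;
    int inliers = 0;
    for (unsigned char m : inlier_mask) inliers += (m != 0);
    return inliers >= min_inliers;
}
```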
# Draft Schedule
Week 1 - Work on vocab tree implementation (single node, single core) + get data ready
Week 2 - Work on vocab tree implementation (single node, single core)
Week 3 - Test initial vocab tree on a subset of the data + debug; start adapting it for single node, multicore.
Week 4 - Test single node + multicore, start working on multinode + multicore.
Week 5 - Test multinode + multicore. If time permits and we actually have a full implementation, work on optimizing + implementing other speedup schemes (e.g. feature compression)
Week 6 - See week 5 (overrun week in case things take longer than they are planned).
Week 7 - Add benchmarking tools / final code changes + misc things that should have been done earlier. Run tests on larger data sets. Make some visualization tools (e.g. what does the tree look like?)
Week 8 - Final benchmarking and runs + report (benchmarking could take several days given data size).