# Checkpoint 2 - psastras/vocabtree GitHub Wiki
## Current Progress
We have a working vocabulary tree running on one node with multiple cores. On data sets of 500 and 5,000 images it returns similar images (verified by visual inspection). We can build a valid vocabulary tree and save/load it. We also have multinode bag-of-words and benchmarking code, so we can compare our vocabulary tree against it and (hopefully) see a speedup on larger data sets.
## Next
Our next immediate step is to get a multinode vocabulary tree working and to prepare the larger datasets (this mainly involves finding a machine to hold everything and run all of the precomputation). This will allow us to test with more data, get a better comparison against bag-of-words, and see where performance is lacking. After that, we will look into methods of improving search speed over multiple similar queries, possibly using some sort of weighting strategy on the tree.
## Current Benchmark Results
Initial benchmarks were run on the smaller Oxford dataset (~500 images) for both the vocabulary tree and the inverted index. While the inverted index code has been fairly well optimized, the vocabulary tree still needs some optimization work (in particular, parallel code was not enabled for BoW clustering or for vocabulary tree search at the time of the benchmark, although code for both exists; this is reflected in the tables below). The benchmark was run on a single machine with multithreading enabled (#threads = 8, Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz).
Timings for the flat inverted index and the vocabulary tree are in the tables below. Timing was measured for both training and testing. For both BoW and the vocabulary tree, the number of training images was held constant at 128. The reported search time is the cumulative time of 128 different image queries (so the time of a single query is the reported time / 128).
The training time for the flat index involves clustering descriptors (bow_train), computing features by assigning SIFT descriptors to the clusters (bow_features), and computing the inverted index (index_train). The training time for the vocabulary tree consists of just one step, which recursively clusters the features and stores the inverted files at the leaf nodes (tree_train).
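The recursive clustering step of tree_train can be sketched as follows. This is a minimal standalone illustration, not the project's actual code: descriptors are plain float tuples, and `kmeans` is a simplified Lloyd's iteration standing in for whatever clustering the real implementation uses.

```python
import random

def kmeans(points, k, iters=10):
    """Simplified Lloyd's k-means: returns (centers, per-center groups)."""
    random.seed(0)  # deterministic seeding for the sketch only
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared L2)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # recompute centers as group means; keep old center if a group is empty
        centers = [tuple(sum(d) / len(g) for d in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

def build_tree(points, splits, depth):
    """Recursively cluster descriptors; leaves keep their descriptor lists
    (standing in for the inverted files stored at the real tree's leaves)."""
    if depth == 0 or len(points) < splits:
        return {"leaf": True, "points": points}
    centers, groups = kmeans(points, splits)
    return {"leaf": False,
            "centers": centers,
            "children": [build_tree(g, splits, depth - 1) for g in groups]}
```

With `splits=4, depth=4` this mirrors the 4,4 configuration benchmarked below, just without parallelism or real SIFT descriptors.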
### Flat Inverted Index
| clusters | operation | time (seconds) | threads used |
|---|---|---|---|
| k=256 | bow_train | 49.8829 | 1 |
| | bow_features | 2.59891 | 8 |
| | index_train | 0.00511888 | 1 |
| | index_search | 0.289858 | 8 |
| k=3125 | bow_train | 545.952 | 1 |
| | bow_features | 3.73074 | 8 |
| | index_train | 0.0140396 | 1 |
| | index_search | 0.370966 | 8 |
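The index_train and index_search steps can be sketched as below. This is a hedged illustration of the general BoW + inverted index technique, not the benchmarked implementation: histograms are plain integer lists, and scoring uses histogram intersection as described later in this page.

```python
from collections import Counter, defaultdict

def bow_histogram(descriptor_cluster_ids, k):
    """Bag-of-words histogram: count of descriptors per visual word."""
    counts = Counter(descriptor_cluster_ids)
    return [counts[i] for i in range(k)]

def build_inverted_index(db_histograms):
    """index_train: map each visual word to the images containing it."""
    index = defaultdict(list)
    for img_id, hist in enumerate(db_histograms):
        for word, count in enumerate(hist):
            if count > 0:
                index[word].append(img_id)
    return index

def search(query_hist, index, db_histograms, top=10):
    """index_search: score only images that share a word with the query,
    ranked by histogram intersection (larger means more similar)."""
    candidates = set()
    for word, count in enumerate(query_hist):
        if count > 0:
            candidates.update(index[word])
    scores = {img: sum(min(q, d) for q, d in zip(query_hist, db_histograms[img]))
              for img in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

The index keeps index_train cheap (one pass over the histograms), which matches the very small index_train times in the table above.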
### Vocabulary Tree
| depth, splits | operation | time (seconds) | threads used |
|---|---|---|---|
| 4,4 | tree_train | 2.94587 | 8 |
| | tree_search | 0.840413 | 1 |
| 5,5 | tree_train | 4.39264 | 8 |
| | tree_search | 1.57413 | 1 |
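The core of tree_search is descending from the root to a leaf, comparing a descriptor against only the `splits` child centers at each level (depth × splits comparisons, versus k for a flat codebook). A minimal sketch, assuming a hypothetical node layout of dicts with `centers`/`children` keys, which is not necessarily how the project stores its tree:

```python
def nearest(centers, p):
    """Index of the center closest to descriptor p (squared L2)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))

def quantize(tree, descriptor):
    """Walk from the root to a leaf, picking the nearest child center at
    each level; returns the leaf (holding its inverted file) and the path."""
    node, path = tree, []
    while not node["leaf"]:
        i = nearest(node["centers"], descriptor)
        path.append(i)
        node = node["children"][i]
    return node, path
```

At the leaf, the stored inverted file gives the candidate database images, just as in the flat index case.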
Accuracy was measured as the number of images in the top 10 results returned for a given image query that were geometrically verified (using a fundamental matrix), averaged over all 128 queries. The distance metric used at the time of the benchmark was histogram intersection (the sum of per-bin minima) for BoW + inverted index, and the L2 norm for the vocabulary tree.
| search | parameter | accuracy |
|---|---|---|
| inverted index | k=256 | 0.203125 |
| inverted index | k=3125 | 0.190625 |
| vocabulary tree | depth=4, splits=4 | 0.07109 |
| vocabulary tree | depth=5, splits=5 | 0.05625 |
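The two distance metrics mentioned above are standard and can be written in a few lines (a plain sketch over integer/float histograms, independent of the benchmarked code):

```python
import math

def histogram_intersection(h1, h2):
    """Similarity used for BoW + inverted index: sum of per-bin minima
    (higher means more similar)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def l2_distance(h1, h2):
    """Distance used for the vocabulary tree: Euclidean norm of the
    difference (lower means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))
```

Note the two run in opposite directions: intersection is maximized, the L2 norm minimized, so rankings from the two systems are not directly comparable score-to-score.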
Based on these results, the vocabulary tree has a large training time advantage, for the time being at the expense of accuracy. While its accuracy is not yet on par with BoW + inverted index, there is definitely some parameter tuning left to do (the BoW + inverted index parameters have already been tuned quite well; the vocabulary tree's have not). Since the vocabulary tree's training time is so short relative to the flat index, it is probably worth investing more of it in training (e.g., adding more images or increasing the split count / depth).
## Sample Search Retrievals
Each row represents a query (the leftmost image) and its top matches, ordered left to right. An image has a green border if it was successfully geometrically verified; if verification failed, the border is red.