ICP 6 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki

PROBLEM STATEMENT

Execute five queries on each of the following two datasets. The bonus question involves performing fuzzy search, proximity search, and partial matching.
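As a hedged illustration of the bonus queries, the sketch below builds Solr `/select` URLs using the standard query parser's fuzzy (`term~N`), proximity (`"phrase"~N`) and wildcard (`term*`) syntax. The base URL, the core name `films`, the `name` field and the search terms are assumptions chosen only for illustration.

```python
from urllib.parse import urlencode

# Assumed local Solr install and core name; adjust to your setup.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "films"

def select_url(query: str, rows: int = 10) -> str:
    """Build a /select URL for a standard-query-parser query string."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"

# Fuzzy search: terms within edit distance 1 of "batmen" (e.g. "batman").
fuzzy = "name:batmen~1"
# Proximity search: phrase match allowing a slop of up to 2 positions.
proximity = 'name:"dark knight"~2'
# Partial match: wildcard (prefix) query on the hypothetical name field.
partial = "name:bat*"

for q in (fuzzy, proximity, partial):
    print(select_url(q))
```

Each URL can be opened in a browser or fetched with any HTTP client against a running Solr instance.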

FEATURES

Solr is used for full-text search and real-time indexing. Cores and collections are created, and queries are then run against the given datasets. The datasets are in CSV format and are loaded into Solr after the schema.xml file is modified.
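One way to load a CSV file is to POST it to the core's `/update` handler with a CSV content type. The sketch below only builds the request (it does not send it); the base URL, the core name `films` and the sample row are placeholders, not values from the original assignment.

```python
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr install

def csv_update_request(core: str, csv_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST that indexes CSV rows into `core`.

    Solr's /update handler accepts CSV when Content-Type is application/csv;
    commit=true makes the new documents searchable right away.
    """
    url = f"{SOLR_BASE}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )

# Placeholder row; the header line names the Solr fields to populate.
sample = b"id,name,directed_by\n1,Placeholder Film,Placeholder Director\n"
req = csv_update_request("films", sample)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```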

CONFIGURATIONS

The following is the schema.xml file (for the Films dataset) that was modified. Fields were added for directed_by and the remaining attributes, as follows:
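For reference, a schema.xml field declaration is a single `<field/>` element. The sketch below generates one with Python's ElementTree; treating `directed_by` as a multi-valued `string` field is an assumption, since the page does not state the type used.

```python
import xml.etree.ElementTree as ET

def field_element(name: str, field_type: str, multi_valued: bool = False) -> str:
    """Return a schema.xml <field/> declaration as a string."""
    attrs = {
        "name": name,
        "type": field_type,
        "indexed": "true",
        "stored": "true",
        "multiValued": "true" if multi_valued else "false",
    }
    return ET.tostring(ET.Element("field", attrs), encoding="unicode")

# Assumption: directed_by holds one or more director names as exact strings.
print(field_element("directed_by", "string", multi_valued=True))
```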