ICP 6 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki
PROBLEM STATEMENT
Execute five queries on each of the following two datasets. The bonus question covers fuzzy search, proximity search, and partial matching.
FEATURES
Solr is used for full-text search and real-time indexing. Cores and collections are created, and queries are then run on the given datasets. The datasets are in CSV format and are loaded into Solr after modifying the schema.xml file.
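As a rough sketch of the CSV loading step, the snippet below builds (but does not send) an HTTP POST to Solr's update handler with a CSV body. The base URL, the core name "books", and the sample row are assumptions made for illustration, not values taken from this exercise.

```python
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def build_csv_update_request(core: str, csv_bytes: bytes) -> Request:
    """Build (without sending) a POST that indexes CSV rows into `core`."""
    url = f"{SOLR_BASE}/{core}/update?commit=true"
    return Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


# Made-up sample data; a real run would read the dataset file instead.
sample = b"id,name,price\nb1,Example Book,6.99\n"
req = build_csv_update_request("books", sample)  # "books" is a hypothetical core name
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr instance; the sketch stops short of that so it can be read without a server.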
CONFIGURATIONS
The following is the schema.xml file (for the Films dataset) to which modifications have been made. Fields have been added for directed_by and for the remaining columns, as follows.
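As a loose illustration of what adding a field such as directed_by involves, the sketch below builds the JSON body for an `add-field` command in Solr's Schema API. This is a different route from hand-editing schema.xml as described above, and it only applies to cores that use a managed schema. The field type `text_general` and the `multiValued` setting are assumptions, not the values used in this exercise.

```python
import json


def add_field_command(name: str, field_type: str = "text_general",
                      multi_valued: bool = False) -> str:
    """Return the JSON body for a Schema API `add-field` command."""
    return json.dumps({
        "add-field": {
            "name": name,
            "type": field_type,          # assumption: a tokenized text type
            "stored": True,
            "indexed": True,
            "multiValued": multi_valued,
        }
    })


# Assumption: a film may list several directors, hence multiValued.
body = add_field_command("directed_by", multi_valued=True)
# The body would be POSTed as application/json to
# http://localhost:8983/solr/<core>/schema on a running instance.
```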
QUESTIONS (QUERIES ON BOOKS DATASET)
1. List all the rows whose genre is 'fantasy'.
2. List all the rows whose price ranges from 5.99 to 7.99 and whose author is "Steven Brust".
3. List the rows whose author name includes Alexander.
4. Run a proximity search with a distance value of 2.
5. List the rows for books by author George R.R. Martin that are in the fantasy genre but not the scifi genre.
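The book queries can be sketched as Solr `q` strings wrapped in select URLs. This is one possible formulation, not necessarily the queries that were actually run: the field names (`genre`, `price`, `author`), the core name "books", the base URL, and the phrase used in the proximity query are all assumptions to adjust to the real schema.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def select_url(core: str, q: str, rows: int = 10) -> str:
    """Build a /select URL for query string `q` against `core`."""
    params = {"q": q, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


# Field names are assumed; keys follow the question numbers above.
BOOK_QUERIES = {
    1: "genre:fantasy",
    2: 'price:[5.99 TO 7.99] AND author:"Steven Brust"',
    3: "author:Alexander",
    # Illustrative phrase only; the exercise does not state the terms used.
    4: 'author:"George Martin"~2',
    5: 'author:"George R.R. Martin" AND genre:fantasy AND NOT genre:scifi',
}

# "books" is a hypothetical core name.
book_urls = {n: select_url("books", q) for n, q in BOOK_QUERIES.items()}
```

Square brackets in query 2 make the range inclusive at both ends, and `NOT` in query 5 excludes documents whose genre matches scifi.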
QUERIES ON FILM DATASET
6. List all the rows with the Romance genre.
7. List all the rows (films) that were released between 2004-12-11 and 2005-12-11.
8. List all the movies directed by Lennon.
9. List all the movies that were released on one particular date (2002-02-13).
10. Apply a proximity search with a value of 1 on the directed_by field.
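A comparable sketch for the film queries follows. It assumes the release date lives in a Solr date field named `initial_release_date` (a hypothetical name here) whose values are stored at midnight UTC, that `genre` and `directed_by` are tokenized text fields, and that the core is called "films". The proximity query uses placeholder terms because the exercise does not state which words were searched.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def select_url(core: str, q: str, rows: int = 10) -> str:
    """Build a /select URL for query string `q` against `core`."""
    params = {"q": q, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


def solr_day(date: str) -> str:
    """Expand YYYY-MM-DD to the full UTC timestamp form Solr date fields expect."""
    return f"{date}T00:00:00Z"


DATE_FIELD = "initial_release_date"  # assumption: name of the release date field

# Keys follow the question numbers above.
FILM_QUERIES = {
    6: "genre:Romance",
    7: f"{DATE_FIELD}:[{solr_day('2004-12-11')} TO {solr_day('2005-12-11')}]",
    8: "directed_by:Lennon",
    # Quoted so the colons inside the timestamp are not parsed as field separators.
    9: f'{DATE_FIELD}:"{solr_day("2002-02-13")}"',
    # TERM_A and TERM_B are placeholders for the two words being searched.
    10: 'directed_by:"TERM_A TERM_B"~1',
}

# "films" is a hypothetical core name.
film_urls = {n: select_url("films", q) for n, q in FILM_QUERIES.items()}
```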
BONUS
- For proximity search
- For fuzzy search, a similarity of 0.7 (a 70% match) is used in the query below
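The three bonus query styles differ only in the operator appended to the term, which the small helpers below try to make explicit. The helper names, field names, and search terms are illustrative assumptions. Note also that the fractional fuzzy value is the older syntax; recent Solr versions document the number after `~` as an edit distance instead.

```python
def proximity(field: str, phrase: str, slop: int) -> str:
    """Phrase query whose words may be up to `slop` positions apart."""
    return f'{field}:"{phrase}"~{slop}'


def fuzzy(field: str, term: str, similarity: float) -> str:
    """Fuzzy query on a single term, using the fractional similarity form."""
    return f"{field}:{term}~{similarity}"


def partial(field: str, prefix: str) -> str:
    """Partial (prefix) match using a trailing wildcard."""
    return f"{field}:{prefix}*"


# Illustrative terms only; the exercise does not list the words that were searched.
bonus_queries = [
    proximity("author", "George Martin", 2),
    fuzzy("author", "Alexandr", 0.7),
    partial("author", "Alex"),
]
```

Each resulting string can be passed as the `q` parameter of a select request.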