ICP 6 - PallaviArikatla/Big-Data-Programming GitHub Wiki
OBJECTIVE: TO PERFORM QUERIES USING APACHE SOLR
QUESTION 1:
For the dataset Music.
Steps:
- Generate a template configuration for the music1 instance with the following command:
solrctl instancedir --generate /tmp/music1
- Go through the dataset and change the field definitions in the schema so that they match the field names (IDs) used in the dataset.
Open the schema file with the following command:
gedit /tmp/music1/conf/schema.xml
This opens schema.xml in the editor. Edit the fields to match the dataset, then save the file.
- Create the instance directory, then create the collection, using the following commands:
solrctl instancedir --create music1 /tmp/music1
solrctl collection --create music1
- Once these commands finish, the collection is available in Solr. Select the dataset and upload it through the Solr interface.
- Now run the queries:
- Keyword matching: Searches a field for one specific keyword and returns only the documents that contain it.
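- Wildcard matching: Matches terms against a pattern, where an asterisk (*) stands for any sequence of characters and a question mark (?) for a single character, and returns every document with a matching term.
- Range: Returns the documents whose field value lies within the specified lower and upper bounds.
- Boost: Raises the relevance score of matches on a chosen term by the given boost factor, so those documents rank higher in the results.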
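The four query types above can be sketched as request URLs for Solr's /select handler. This is only an illustration under stated assumptions: the host and port (localhost:8983) and the field names artist, title and year are hypothetical, so replace them with the fields defined in your own schema.xml. The query strings themselves use the standard Solr query syntax.

```python
from urllib.parse import urlencode

# Assumed values: adjust to your own Solr host and collection.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "music1"


def select_url(q, **params):
    """Build a /select URL for the query string q plus any extra parameters."""
    query = {"q": q, "wt": "json"}
    query.update(params)
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(query)}"


# Field names (artist, title, year) are hypothetical placeholders.
queries = {
    # Keyword: documents whose artist field contains the term "Beatles".
    "keyword": select_url("artist:Beatles"),
    # Wildcard: titles with a term starting with "lov" (love, lover, loving).
    "wildcard": select_url("title:lov*"),
    # Range: year between 1990 and 2000, both ends included.
    "range": select_url("year:[1990 TO 2000]"),
    # Boost: a match in title counts twice as much as a match in artist.
    "boost": select_url("title:rock^2 OR artist:rock"),
}

for name, url in queries.items():
    print(f"{name}: {url}")
```

Each printed URL can be pasted into a browser, or the q value alone can be typed into the query box of the Solr interface.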
QUESTION 2:
For the dataset Books.
- Follow the same procedure as above to create the second instance.
Use the following commands:
solrctl instancedir --generate /tmp/books
gedit /tmp/books/conf/schema.xml
solrctl instancedir --create books /tmp/books
solrctl collection --create books
- When schema.xml opens in gedit, edit its fields to match the dataset.
- Run the following queries in Solr:
- Keyword matching: Searches a field for one specific keyword and returns only the documents that contain it.
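- Proximity: Returns the documents in which the query terms occur within the given number of positions (the proximity factor) of each other.
- Sort: Orders the results by the field given in the sort parameter, in ascending or descending order.
- Fuzzy Logic: Matches terms that are similar to the query term within a small edit distance, so near-matches and misspellings are still found.
- AND logic: Combines two conditions and returns only the documents that satisfy both.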
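As a rough sketch, the five query types above could be written as parameter sets for the /select handler of the books collection. The host and port, and the field names author, name and price, are assumptions made for illustration and must be replaced with the fields from your own schema. The run_query helper is hypothetical and only works against a running Solr instance, so it is defined here but not called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: adjust host, port and collection to your setup.
SOLR_SELECT = "http://localhost:8983/solr/books/select"

# Field names (author, name, price) are hypothetical placeholders.
BOOK_QUERIES = {
    # Keyword: documents whose author field contains "Tolkien".
    "keyword": {"q": "author:Tolkien"},
    # Proximity: "lord" and "rings" within 2 positions of each other.
    "proximity": {"q": 'name:"lord rings"~2'},
    # Sort: all documents, cheapest first.
    "sort": {"q": "*:*", "sort": "price asc"},
    # Fuzzy: terms within edit distance 1 of "hobit", such as "hobbit".
    "fuzzy": {"q": "name:hobit~1"},
    # AND: both conditions must hold for a document to match.
    "and": {"q": "author:Tolkien AND name:hobbit"},
}


def build_url(params):
    """Return the full /select URL for one parameter set, asking for JSON."""
    return SOLR_SELECT + "?" + urlencode({**params, "wt": "json"})


def run_query(params):
    """Send the query to Solr and return the list of matching documents."""
    with urlopen(build_url(params)) as response:
        return json.load(response)["response"]["docs"]


if __name__ == "__main__":
    for name, params in BOOK_QUERIES.items():
        print(name, "->", build_url(params))
```

Keeping the parameters in a dictionary separates the query syntax from the HTTP call, so the same q values can also be typed directly into the Solr interface.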
VIDEO LINK: http://youtu.be/Ln1pxWMorLI?hd=1