ICP 6 - PallaviArikatla/Big-Data-Programming GitHub Wiki

OBJECTIVE: TO PERFORM QUERIES USING APACHE SOLR

QUESTION 1:

For the Music dataset.

Steps:

  • Generate a template configuration directory for music1 with the following command:

solrctl instancedir --generate /tmp/music1

  • Inspect the dataset and edit the field definitions in the schema so that they match the field names in the dataset.

Open the schema file in an editor with the following command:

gedit /tmp/music1/conf/schema.xml

This opens schema.xml, where the field definitions are edited to match the dataset.

  • Create the instance directory, then create the collection, using the following commands:

solrctl instancedir --create music1 /tmp/music1

solrctl collection --create music1

  • The collection is then available in the Solr interface. Select the dataset file and upload it through the interface.

  • Now run the following queries:

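  • Keyword matching: Returns the documents in which the given field contains the specified keyword.

  • Wildcard matching: Returns the documents whose terms match a pattern, where * stands for any sequence of characters and ? for a single character.

  • Range: Returns the documents whose field value lies within the specified range.

  • Boost: Raises the relevance of a chosen term by a boost factor, so that documents matching that term rank higher in the results.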
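The four query types above can be sketched as Solr query strings. This is only an illustration: the field names (artist, title, year), the search terms, and the host and port in the URL are assumptions, not taken from the actual Music dataset or schema, so they need to be replaced with the fields defined in schema.xml. The helper select_url is a hypothetical name used here to build the request URL without sending it.

```python
from urllib.parse import urlencode

# Assumed address of the select handler for the music1 collection.
SOLR_SELECT = "http://localhost:8983/solr/music1/select"


def select_url(q, **extra):
    """Build a select URL for query string q plus optional Solr parameters."""
    params = {"q": q, "wt": "json"}
    params.update(extra)
    return SOLR_SELECT + "?" + urlencode(params)


# Hypothetical fields: artist, title, year.
queries = {
    # Keyword matching: the field must contain this term.
    "keyword": "artist:beatles",
    # Wildcard matching: * matches any sequence of characters.
    "wildcard": "title:lov*",
    # Range: inclusive bounds written as [low TO high].
    "range": "year:[1990 TO 2000]",
    # Boost: a match in title counts twice as much as a match in artist.
    "boost": "title:love^2 artist:love",
}

for name, q in queries.items():
    print(name, select_url(q))
```

Each printed URL can be pasted into a browser, or the bare query string can be typed into the q box of the Solr query page.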

QUESTION 2:

For the Books dataset.

  • Follow the same procedure as above to create the second instance.

Use the following commands:

solrctl instancedir --generate /tmp/books

gedit /tmp/books/conf/schema.xml

solrctl instancedir --create books /tmp/books

solrctl collection --create books

  • Edit the schema so that its fields match the dataset.

  • Run the following queries in Solr:

  • Keyword matching: Returns the documents in which the given field contains the specified keyword.

  • Proximity: Returns the documents in which the words of a phrase occur within the given number of positions of each other.

  • Sort: Orders the results by the field given in the sort parameter, in ascending or descending order.

  • Fuzzy search: Returns the documents containing terms similar to the search term, so that approximate spellings also match.

  • AND logic: Returns only the documents that satisfy both conditions joined by AND.
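As a rough sketch, the five query types above map to the following Solr request parameters. The field names (title, author, price) and the search terms are assumptions made for illustration and are not taken from the actual Books dataset; books_params is a hypothetical helper that only assembles the parameters and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumed address of the select handler for the books collection.
SOLR_SELECT = "http://localhost:8983/solr/books/select"


def books_params(q, sort=None):
    """Return the request parameters for query q, with an optional sort clause."""
    params = {"q": q, "wt": "json"}
    if sort:
        params["sort"] = sort
    return params


# Hypothetical fields: title, author, price.
examples = {
    # Keyword matching: the field must contain this term.
    "keyword": books_params("title:solr"),
    # Proximity: both words within 2 positions of each other.
    "proximity": books_params('title:"lord rings"~2'),
    # Sort: all documents, cheapest first.
    "sort": books_params("*:*", sort="price asc"),
    # Fuzzy search: also matches terms with a similar spelling.
    "fuzzy": books_params("author:tolkein~"),
    # AND logic: both conditions must hold.
    "and": books_params("title:solr AND author:smith"),
}

for name, params in examples.items():
    print(name, SOLR_SELECT + "?" + urlencode(params))
```

Sorting is passed as a separate sort parameter rather than inside q, which is why the helper takes it as its own argument.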

VIDEO LINK: http://youtu.be/Ln1pxWMorLI?hd=1