ICP 6 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki
- Keyword matching
- Wildcard matching
- Proximity matching
- Range searches
- Fuzzy logic
Creation/generation of instance & Collection
1. Generate an instance directory for films:
"solrctl instancedir --generate /tmp/films"
2. Edit the schema.xml that was generated in the instance's conf folder so that its fields match the given dataset. The data types follow the usual conventions of any query-based system; in this case the overall attribute is declared as a float. Set required to true only for the unique key (reviewerID in this case).
"gedit /tmp/films/conf/schema.xml"
3. Set the unique key to the required attribute.
4. Upload the contents of the instance directory to ZooKeeper:
"solrctl instancedir --create films /tmp/films"
5. Create a new collection:
"solrctl collection --create films"
6. Open the Solr web interface in a browser, select the created collection from the dropdown on the left, and go to the Documents section of the collection.
- Set the document type to CSV, then paste the full contents of the dataset into the documents field and submit the document.
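Step 6 can also be scripted instead of pasting the data into the browser. The following is a minimal sketch, assuming Solr is reachable at localhost:8983 (an assumption; use the host and port of your own cluster). The two-line CSV is made up for illustration. The code only builds the request to the collection's update handler; sending it is left as a commented-out line.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL, not taken from the wiki: adjust to your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def build_csv_upload(collection: str, csv_bytes: bytes) -> Request:
    """Build a POST request that sends CSV rows to a collection's update handler."""
    url = f"{SOLR_BASE}/{collection}/update?" + urlencode({"commit": "true"})
    return Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


# Hypothetical sample; the real dataset has more columns and rows.
sample = b"reviewerID,overall\nA1,4.0\n"
request = build_csv_upload("films", sample)
# To actually send it: urllib.request.urlopen(request)
```

The first CSV line must contain the field names defined in schema.xml, including the unique key.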
Keyword Matching
It returns a document only if the query term exactly matches the field data.
1. Searching for the keyword “Michael” in the dataset.
2. Searching for a phrase in the dataset.
3. Searching for two different values at once using the AND operator.
4. Combining the AND and OR operators.
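The four searches above can be written as values for the q parameter. This is only a sketch: the field names name, reviewText and overall appear elsewhere on this page, but the phrase, the rating value and the second name are hypothetical and not the queries that were actually run.

```python
def term(field: str, value: str) -> str:
    """Single keyword on one field, e.g. name:Michael."""
    return f"{field}:{value}"


def phrase(field: str, text: str) -> str:
    """Phrase query: the quoted words must appear together and in order."""
    return f'{field}:"{text}"'


q1 = term("name", "Michael")                        # 1. keyword
q2 = phrase("reviewText", "easy to use")            # 2. phrase (hypothetical text)
q3 = f"{q1} AND {term('overall', '5')}"             # 3. both conditions must hold
q4 = f"({q3}) OR {term('name', 'David')}"           # 4. AND combined with OR
```

The operators AND and OR must be written in upper case; parentheses make the grouping explicit.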
Wildcard matching
It returns every document whose queried field contains the query pattern.
1. Searching for all instances of the keyword Michael in the name attribute.
2. Returning all instances where the reviewText string starts with “Not” and ends with “vibration”.
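A possible form of the two wildcard queries is sketched below. It assumes that each pattern is matched against a whole indexed value; on a tokenized text field the wildcard applies to individual words instead, so the second query would behave differently there.

```python
def contains(field: str, fragment: str) -> str:
    """Match values that contain the fragment anywhere."""
    return f"{field}:*{fragment}*"


def starts_and_ends(field: str, prefix: str, suffix: str) -> str:
    """Match values that begin with prefix and end with suffix."""
    return f"{field}:{prefix}*{suffix}"


q1 = contains("name", "Michael")
q2 = starts_and_ends("reviewText", "Not", "vibration")
```

Here * stands for any number of characters; a single character can be matched with ? instead.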
Inference
Here it can be observed that keyword matching is a subset of wildcard matching.
Proximity matching
It checks whether the attribute named in the query contains the keywords within the given proximity (distance) of each other.
1. Returns all records whose summary contains “good” within a proximity of 4.
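A proximity query is a quoted group of words followed by ~ and the maximum distance. The page only names the word “good”, so the second word in this sketch is a hypothetical placeholder.

```python
def proximity(field: str, words: str, distance: int) -> str:
    """Words must occur within `distance` positions of each other."""
    return f'{field}:"{words}"~{distance}'


# "quality" is an assumed second word; only "good" comes from the page.
q = proximity("summary", "good quality", 4)
```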
Boost
1. It gives more importance to the attributes that carry a boosting factor. The boosting factor used in this case is 1.5, but it is a variable factor and can be 2, 3, 5, etc.
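In query syntax the boost is written as ^ followed by the factor, directly after the clause it applies to. The clauses below are hypothetical; only the factor 1.5 comes from the page.

```python
def boosted(clause: str, factor: float) -> str:
    """Append a boost so that matches on this clause score higher."""
    return f"{clause}^{factor}"


# Assumed example: a hit in summary counts more than a hit in reviewText.
q = boosted("summary:good", 1.5) + " OR reviewText:good"
```

Boosting changes the order of the results (their relevance score), not which documents match.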
Range searches
It returns the results whose attribute value falls within the given range.
1. Finds all the data whose overall rating is between 3 and 4, inclusive.
2. Finds all the data whose overall rating is less than or equal to 4.
3. Finds all the data whose overall rating is greater than or equal to 4.
4. Finds all the data whose overall rating is not equal to 5.
5. Returns all the data that contains the overall field.
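The five range searches can be expressed roughly as follows. This is a sketch of one possible way to write them, not necessarily the exact queries used for the screenshots; square brackets make both ends inclusive and * leaves an end open.

```python
def range_query(field: str, low="*", high="*") -> str:
    """Inclusive range; an omitted bound becomes the open-ended '*'."""
    return f"{field}:[{low} TO {high}]"


queries = {
    "1. between 3 and 4": range_query("overall", 3, 4),
    "2. at most 4": range_query("overall", high=4),
    "3. at least 4": range_query("overall", low=4),
    "4. not equal to 5": "*:* -overall:5",   # everything minus rating 5
    "5. field is present": range_query("overall"),
}
```

Curly braces instead of square brackets, as in overall:{3 TO 4}, would exclude the end values.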
Fuzzy logic
It returns results whose attribute value approximately matches the value given in the query.
1. Returns all the results with a reviewer name like “daze”.
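A fuzzy query is a single term followed by ~, optionally with the maximum number of character edits allowed. The sketch below assumes the reviewer name is stored in the name field mentioned earlier on this page.

```python
from typing import Optional


def fuzzy(field: str, value: str, max_edits: Optional[int] = None) -> str:
    """Fuzzy term; without max_edits the server default edit distance applies."""
    suffix = "~" if max_edits is None else f"~{max_edits}"
    return f"{field}:{value}{suffix}"


q_default = fuzzy("name", "daze")      # default edit distance
q_strict = fuzzy("name", "daze", 1)    # at most one changed character
```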
- Execute any 5 queries on the given dataset
1. Generate an instance directory for books:
"solrctl instancedir --generate /tmp/books"
2. Edit the schema.xml that was generated in the instance's conf folder so that its fields match the given dataset. The data types follow the usual conventions of any query-based system; in this case the overall attribute is declared as a float. Set required to true only for the unique key (id in this case).
"gedit /tmp/books/conf/schema.xml"
- Here pricebook is declared as a float and bookinStock as a Boolean.
3. Set the unique key to the required attribute.
4. Upload the contents of the instance directory to ZooKeeper:
"solrctl instancedir --create books /tmp/books"
5. Create a new collection:
"solrctl collection --create books"
6. Open the Solr web interface in a browser, select the created collection from the dropdown on the left, and go to the Documents section of the collection.
- Set the document type to CSV, then paste the full contents of the dataset into the documents field and submit the document.
Sort
It sorts the results by the attribute named in the sort parameter, in the given order (asc or desc).
1. Returns all the records with category book, ordered by pricebook descending: q => catbook:"book"; sort => pricebook desc
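The q and sort values above are separate request parameters. As a small sketch, this is how they could be encoded into the query string of a select request; the host and handler path are not shown because they depend on the installation.

```python
from urllib.parse import urlencode

# Parameters as entered in the query form of the Solr web interface.
params = {"q": "catbook:book", "sort": "pricebook desc"}

# urlencode escapes ':' and turns the space in the sort value into '+'.
query_string = urlencode(params)
```

Several sort keys can be given as a comma-separated list, each with its own asc or desc.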
Filter
It narrows the main query with an additional condition, which makes it possible to combine queries.
1. Returns the data with catbook like “book” and pricebook equal to 7.99.
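In Solr the filter goes into the fq parameter, next to the main query in q. A minimal sketch of how the example above might be encoded, assuming the category is the main query and the price is the filter:

```python
from urllib.parse import urlencode


def with_filters(q: str, *filters: str) -> str:
    """Encode a main query plus any number of fq filter queries."""
    pairs = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(pairs)


query_string = with_filters("catbook:book", "pricebook:7.99")
```

Filters restrict the result set but do not influence the relevance score, and fq may be repeated to apply several filters at once.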
Fuzzy Logic
1. Returns the data that matches genre_s:fantasy.
2. Identifies whether any books are in stock.
3. Books with the author name george.
4. series_t exactly matches "A Song of Ice and Fire".
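The four queries above might look as follows. The fields genre_s, bookinStock and series_t are named on this page; the author field name is an assumption, since the page does not give it.

```python
def field_query(field: str, value) -> str:
    """Build field:value, quoting values with spaces and lowercasing booleans."""
    text = str(value).lower() if isinstance(value, bool) else str(value)
    if " " in text:
        text = f'"{text}"'
    return f"{field}:{text}"


q1 = field_query("genre_s", "fantasy")
q2 = field_query("bookinStock", True)
q3 = field_query("author", "george")   # "author" is a hypothetical field name
q4 = field_query("series_t", "A Song of Ice and Fire")
```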
Boost
1. Books whose genre is “scifi” or “fantasy”, with a boosting factor, and with a pricebook of 7 or 8 overall.
2. Books with price 5.99 and genre fantasy.
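One way these two queries could be written is sketched below. Which genre receives the boost, the factor 2, and the form of the price condition are guesses, because the page does not show the original queries.

```python
def any_of(field: str, values, boosts=None) -> str:
    """field:(a OR b ...), with an optional ^boost on selected values."""
    boosts = boosts or {}
    parts = [f"{v}^{boosts[v]}" if v in boosts else str(v) for v in values]
    return f"{field}:(" + " OR ".join(parts) + ")"


# Assumed: "scifi" is the boosted genre and 2 is the factor.
genre = any_of("genre_s", ["scifi", "fantasy"], {"scifi": 2})
price = any_of("pricebook", [7, 8])

q1 = f"{genre} AND {price}"
q2 = "pricebook:5.99 AND genre_s:fantasy"
```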
Range
1. Books with a price from 3 to 8.
The schemas and commands can be found in file: https://github.com/Murarishetti-Shiva-Kumar/Big-Data-Programming/blob/main/ICP%206/Commands.txt