Lesson Plan 6: Parallel Indexing: Solr and Lucene

1. Film Dataset

Keyword matching
Wildcard matching
Proximity matching
Range searches
Fuzzy logic

Creation/generation of instance & Collection

1. Generate an instance directory for films.

"solrctl instancedir --generate /tmp/films"

2. Edit the schema.xml that was generated with the instance directory (inside the conf folder) so that its fields match the attributes of the given dataset. The data types follow the same conventions as in a normal query-based system; in this case the overall attribute is declared as a float. Set the required flag to true only for the unique key (reviewerID in this case).

"gedit /tmp/films/conf/schema.xml"

3. Set the unique key (the uniqueKey element in schema.xml) to the required attribute.

4. Now upload the contents of the instance directory to ZooKeeper.

"solrctl instancedir --create films /tmp/films"

5. Create a new collection.

"solrctl collection --create films"

6. Open the Solr Admin UI in a web browser, select the created collection from the dropdown on the left side, and go to the Documents page of the collection.

Set the document type to CSV, then copy and paste all the data from the dataset into the document field and submit the document.
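
Steps 2, 3, and 6 above can also be sketched in Python. This is a minimal sketch, not the lesson's own tooling: it renders the field declarations that step 2 asks for and builds (without sending) an HTTP request equivalent to submitting CSV through the Documents page. The field names come from this lesson; the type names string, text_general, and float, the host localhost:8983, and the helper names field_declarations and csv_upload_request are assumptions that should be checked against the generated schema.xml and the actual cluster.

```python
import urllib.request
import xml.etree.ElementTree as ET

# (field name, field type, required). Names are from the lesson; the type
# names are ASSUMED to be defined in the generated schema.xml.
FIELDS = [
    ("reviewerID", "string", True),   # unique key, the only required field
    ("name", "text_general", False),
    ("reviewText", "text_general", False),
    ("summary", "text_general", False),
    ("overall", "float", False),
]


def field_declarations(fields=FIELDS, unique_key="reviewerID"):
    """Return schema.xml lines: one <field> per attribute plus <uniqueKey>."""
    lines = []
    for name, type_name, required in fields:
        attrs = {"name": name, "type": type_name,
                 "indexed": "true", "stored": "true"}
        if required:
            attrs["required"] = "true"
        lines.append(ET.tostring(ET.Element("field", attrs), encoding="unicode"))
    key = ET.Element("uniqueKey")
    key.text = unique_key
    lines.append(ET.tostring(key, encoding="unicode"))
    return lines


def csv_upload_request(csv_text, base_url="http://localhost:8983/solr",
                       collection="films"):
    """Build (but do not send) a POST that indexes CSV text and commits."""
    return urllib.request.Request(
        f"{base_url}/{collection}/update?commit=true",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "application/csv; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Print the declarations so they can be compared with schema.xml.
    print("\n".join(field_declarations()))
```

Passing the returned request to urllib.request.urlopen would send it to Solr. The first line of the CSV text is expected to be a header row naming the fields.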

Keyword matching

It returns a record only if the query parameter exactly matches the field data.

1. Searching for the keyword “Michael” in the dataset.

2. Searching for a phrase in the dataset.

3. Searching for two different values at the same time using the AND operator.

4. Combining the AND and OR operators.
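
The four searches above could be expressed as Lucene query strings along the following lines. This is a hedged sketch: only the keyword Michael and the field names come from this lesson, while the phrase, the second name, the rating value, the endpoint URL, and the helper select_url are hypothetical placeholders.

```python
from urllib.parse import urlencode

# ASSUMED endpoint: default Solr port and the "films" collection.
SELECT_URL = "http://localhost:8983/solr/films/select"


def select_url(q, rows=10):
    """Return the /select URL that runs the Lucene query string q."""
    return SELECT_URL + "?" + urlencode({"q": q, "wt": "json", "rows": rows})


# One query string per item; values other than Michael are made up.
KEYWORD = "name:Michael"                               # 1. single keyword
PHRASE = 'summary:"good movie"'                        # 2. quoted phrase
BOTH = "name:Michael AND overall:5.0"                  # 3. two values, AND
MIXED = "(name:Michael OR name:John) AND overall:5.0"  # 4. AND with OR

if __name__ == "__main__":
    for q in (KEYWORD, PHRASE, BOTH, MIXED):
        print(select_url(q))
```

The same query strings can be typed directly into the q box of the Query page in the Solr Admin UI; the URL form is only needed when querying from a script.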

Wildcard matching

It returns every record in which the field named in the query contains the query parameter.

1. Searching for all instances of the keyword Michael in the name attribute.

2. Returning all instances where the reviewText string starts with “Not” and ends with “vibration”.
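
As a rough sketch, the two wildcard searches above might be written as follows. The helper names contains, starts_and_ends, and as_param are hypothetical. Note that Lucene applies a wildcard pattern to individual indexed terms, so the second query spans the whole reviewText value only if that field is indexed as a single untokenized string.

```python
from urllib.parse import urlencode


def contains(field, text):
    """Query for terms in `field` that contain `text` anywhere."""
    return f"{field}:*{text}*"


def starts_and_ends(field, prefix, suffix):
    """Query for terms in `field` that start with `prefix` and end with `suffix`."""
    return f"{field}:{prefix}*{suffix}"


def as_param(q):
    """URL-encode a query string as the q parameter of a /select request."""
    return urlencode({"q": q})


if __name__ == "__main__":
    print(contains("name", "Michael"))
    print(starts_and_ends("reviewText", "Not", "vibration"))
```

The asterisk matches zero or more characters, which is why a term equal to Michael is also matched by the first pattern.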

Inference

Here it can be observed that keyword matching is a subset of wildcard matching: every record returned by a keyword query is also returned by the corresponding wildcard query.

Proximity matching

It checks whether the attribute named in the query contains the given keywords within the proximity factor provided, that is, within that many word positions of each other.

1. Returns all records whose summary contains “good” within a proximity of 4.
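
In Lucene syntax a proximity search is a quoted phrase followed by a tilde and the distance. A minimal sketch, assuming a second keyword: the lesson names only “good”, so the word movie and the helper proximity below are hypothetical.

```python
def proximity(field, words, distance):
    """Build a proximity query: the words must occur within `distance` positions."""
    if len(words) < 2:
        # With a single word there is nothing to measure a distance between.
        raise ValueError("a proximity query needs at least two words")
    if distance < 0:
        raise ValueError("distance must not be negative")
    phrase = " ".join(words)
    return f'{field}:"{phrase}"~{distance}'


if __name__ == "__main__":
    # "movie" is a placeholder for whatever word is paired with "good".
    print(proximity("summary", ["good", "movie"], 4))
```

The sketch rejects a single word on purpose, because a tilde after a single unquoted term means a fuzzy search in Lucene syntax rather than a proximity search.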

Boost

1. Boosting gives more importance to the attributes that carry a boosting factor. The boosting factor used in this case is 1.5, but it is variable: it can be 2, 3, 5, etc.
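
In Lucene syntax a boost is written as a caret and the factor after a query clause. The following is a small sketch under assumptions: the clauses name:Michael and summary:good and the helpers boost and any_of are illustrative, not the queries used in the lesson.

```python
def boost(clause, factor=1.5):
    """Append a boost factor to one query clause, e.g. name:Michael^1.5."""
    if factor <= 0:
        # A guard chosen for this sketch: only positive factors are accepted.
        raise ValueError("boost factor must be positive")
    return f"{clause}^{factor:g}"


def any_of(*clauses):
    """Join clauses with OR so that a match on any of them returns the record."""
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Matches on name count 1.5 times as much as matches on summary.
    print(any_of(boost("name:Michael"), "summary:good"))
```

Changing the factor does not change which records match, only the order in which they are ranked.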