Solr&Lucene - praveenpoluri/Big-Data-Programing GitHub Wiki
SOLR & Lucene
Introduction:
Lucene: Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting, and advanced analysis/tokenization capabilities. The PyLucene subproject provides Python bindings for Lucene Core.
Solr: Solr is a high-performance search server built on Lucene Core. It is highly scalable and provides fault-tolerant distributed indexing, search, and analytics. It exposes Lucene's features through easy-to-use JSON/HTTP interfaces and native clients for Java and other languages.
Aim:
Task1:
Generate an instance directory, create a collection for the films dataset, load the given films CSV dataset into Solr, and run any five queries on the dataset.
Task2:
Generate an instance directory, create a collection for the books dataset, load the given books CSV dataset into Solr, and run any five queries on the dataset.
Tools:
- Solr
- Oracle Virtual Machine
- Cloudera VM
- HDFS shell
- Apache Solr Dashboard
Tasks:
Task1:
Generate an instance directory, create a collection for the films dataset, load the given films CSV dataset into Solr, and run any five queries on the dataset.
Generate instance directory:
- Generated the instance directory films9.
- Listed the films9 folder to show its conf directory.
- Listed the conf directory to show the default schema file, opened it in gedit, and added a field definition for each column in the films.csv file.
- Created the instance directory and the collection for the films dataset.
- The created collection for the films dataset is visible in the Solr UI in the browser.
- Loaded the data from the films CSV dataset (https://github.com/apache/lucene-solr/blob/master/solr/example/films/films.csv) into the created collection by setting the file type to CSV in the drop-down.
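The instance directory and collection steps above can be sketched as a command sequence. This is only a sketch under assumptions: it presumes the Cloudera VM's solrctl tool was used, that the collection is also named films9, and that the working directory is /home/cloudera (a hypothetical path). The helper only builds and prints the commands; it does not run them.

```python
import posixpath

def solrctl_commands(name, base_dir, shards=1):
    """Return the solrctl invocations for one collection as argv lists."""
    instance_dir = posixpath.join(base_dir, name)
    return [
        # Generate a template instance directory that contains conf/ with the default schema.
        ["solrctl", "instancedir", "--generate", instance_dir],
        # After editing the schema, upload the instance directory under the given name.
        ["solrctl", "instancedir", "--create", name, instance_dir],
        # Create a collection that uses the uploaded configuration.
        ["solrctl", "collection", "--create", name, "-s", str(shards)],
    ]

# "/home/cloudera" is an assumed location, not taken from the original steps.
commands = solrctl_commands("films9", "/home/cloudera")
for argv in commands:
    print(" ".join(argv))
```

The schema edit has to happen between the first and second command, because the second command uploads whatever is in the conf directory at that moment.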
Queries:
Query 1:
This is the first query. It returns all documents whose genre matches Thriller.
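As a rough illustration, the genre query could be sent to Solr's select handler as a URL. The host and port (localhost:8983), the collection name films9, and the field name genre are assumptions based on a default Solr setup and the films.csv columns; adjust them to the actual environment.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed default Solr address

def select_url(collection, q, **params):
    """Build a /select URL for the given collection and query string."""
    query = {"q": q, "wt": "json"}
    query.update(params)
    return f"{SOLR_BASE}/{collection}/select?{urlencode(query)}"

# Field query: documents whose genre field matches the term Thriller.
url = select_url("films9", "genre:Thriller")
print(url)
```

Pasting the printed URL into a browser on the VM, or entering genre:Thriller in the q box of the Solr UI query screen, should give the same result set.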
Query 2:
This is a fuzzy search query that returns documents containing words similar to "dumb".
Query 3:
This is a fuzzy search query that returns documents containing words similar to "comedy".
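In Solr's standard query syntax, a fuzzy search is written by appending a tilde to a single term, optionally followed by a maximum edit distance of 0, 1, or 2. The sketch below builds such query strings; the field names name and genre are assumptions, since the original does not say which fields were searched.

```python
def fuzzy_query(field, term, max_edits=None):
    """Return a Solr fuzzy query such as name:dumb~ or genre:comedy~1."""
    if max_edits is not None and max_edits not in (0, 1, 2):
        raise ValueError("Lucene fuzzy queries support an edit distance of 0, 1, or 2")
    suffix = "~" if max_edits is None else f"~{max_edits}"
    return f"{field}:{term}{suffix}"

# Assumed fields: film titles in "name", genres in "genre".
queries = [fuzzy_query("name", "dumb"), fuzzy_query("genre", "comedy", 1)]
for q in queries:
    print(q)
```

Omitting the number leaves the edit distance at the query parser's default, which matches more loosely than an explicit ~1.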
Query 4:
This query returns documents with a date of 2002-02-01 or later.
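An open-ended date range like this is usually written with Solr's range syntax, using * for the missing upper bound. The sketch below assumes the date column is named initial_release_date (as in films.csv) and that it was declared as a date field in the schema; if it was indexed as a plain string, the query would behave differently.

```python
from datetime import date

def on_or_after(field, start):
    """Return a Solr range query matching dates from `start` (inclusive) onward."""
    # Solr date fields expect ISO 8601 timestamps in UTC, ending in Z.
    stamp = start.strftime("%Y-%m-%dT00:00:00Z")
    return f"{field}:[{stamp} TO *]"

# Assumed field name; the original text only says "date".
date_query = on_or_after("initial_release_date", date(2002, 2, 1))
print(date_query)
```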
Query 5:
This query returns every document in the films collection.
Task 2:
Generate an instance directory, create a collection for the books dataset, load the given books CSV dataset into Solr, and run any five queries on the dataset.
Generate instance directory:
- Generated the instance directory books.
- Listed the books folder to show its conf directory.
- Listed the conf directory to show the default schema file, opened it in gedit, and added a field definition for each column in the books.csv file.
- Created the instance directory and the collection for the books dataset.
- The created collection for the books dataset is visible in the Solr UI in the browser.
- Loaded the data from the books CSV dataset (https://github.com/apache/lucene-solr/blob/master/solr/example/books/books.csv) into the created collection by setting the file type to CSV in the drop-down.
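The CSV load was done through the Solr UI, but the same upload can be sketched as an HTTP POST to the collection's update handler. Everything specific here is an assumption: the address localhost:8983, the collection name books, and the two sample rows, which are made up for illustration and are not taken from books.csv. The code only builds the request; sending it requires a running Solr.

```python
import urllib.request

def csv_update_request(collection, csv_bytes, base="http://localhost:8983/solr"):
    """Build (but do not send) a POST that indexes CSV rows and commits them."""
    url = f"{base}/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )

# Hypothetical sample rows; the first line is the header naming the fields.
sample = b"id,name,inStock\nbook-1,Example Title,true\n"
request = csv_update_request("books", sample)
print(request.get_method(), request.full_url)
# To send it against a running Solr: urllib.request.urlopen(request)
```

The header row must use field names that exist in the edited schema, otherwise Solr rejects the upload with an unknown field error.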
Queries:
Query 1:
This is a select query that returns documents where genre_s is scifi and inStock is true.
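Two conditions can be combined in Solr either with AND inside the main query or by moving one condition into a filter query (the fq parameter). The sketch below shows both forms for genre_s and inStock; which form was used originally is not stated, so treat this as one possible way to express it.

```python
from urllib.parse import urlencode

def and_all(*clauses):
    """Join Solr query clauses so that every one of them must match."""
    return " AND ".join(clauses)

# Form 1: a single query string with both conditions.
combined = and_all("genre_s:scifi", "inStock:true")

# Form 2: main query plus a filter query, which restricts results without affecting scoring.
filtered = {"q": "genre_s:scifi", "fq": "inStock:true"}

print(combined)
print(urlencode(filtered))
```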
Query 2:
Select query to show documents where the author matches Isaac Asimov.
Query 3:
Select query to show documents where the author matches George R.R. Martin and inStock is true.
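Author names contain spaces, so they are normally wrapped in double quotes to form a phrase query; without quotes, only the first word would be tied to the author field. A minimal sketch, assuming the field is named author as in books.csv:

```python
def phrase(field, text):
    """Return a quoted Solr phrase query, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

asimov = phrase("author", "Isaac Asimov")
martin_in_stock = phrase("author", "George R.R. Martin") + " AND inStock:true"

print(asimov)
print(martin_in_stock)
```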
Query 4:
Select query to show documents where inStock is true.
Query 5:
Select query to show documents where the sequence number is 2 or greater.
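A numeric range with no upper limit can be written with the same bracket syntax that Solr uses for other ranges. The sketch below assumes the sequence column is the sequence_i field from books.csv; the helper also accepts an upper bound, which the original query did not use.

```python
def numeric_range(field, low=None, high=None):
    """Return an inclusive Solr range query; None means the bound is open (*)."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:[{lo} TO {hi}]"

# Assumed field name for the sequence column.
sequence_query = numeric_range("sequence_i", low=2)
print(sequence_query)
```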
Limitations of Solr and Lucene:
- No formal support contracts
- No assured availability of training or other professional services to fulfill specific software needs or assist with building an application
- No formalized release testing program, release schedule, or assurance of upgrade compatibility, although Lucene/Solr contributions must have unit tests before they are committed to the code, and releases receive integration testing
References:
- https://umkc.app.box.com/s/s6uma9ygb9qgb3l2y0jep6z4xi3mhmeo
- https://umkc.app.box.com/s/1jti66iilgt1xoawp4jzjvb4ks0k2p91
- https://lucene.apache.org/solr/