Solr Elastic - sgml/signature GitHub Wiki
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337356/
- https://www.elastic.co/guide/en/cloud-enterprise/current/ece-hardware-prereq.html
- https://opster.com/guides/elasticsearch/capacity-planning/elasticsearch-minimum-requirements/
- https://reintech.io/blog/elasticsearch-capacity-planning-hardware-guide
Build the search index as a JSON structure: one object per page, with a key for each field you have and need to match against using a string method. For example:
- URL path
- Title
- Keywords
var searchengine = [
  {
    "path": "api",
    "titleWords": "API Reference",
    "keywords": "$animate $aria $ariaprovider $compile $cookie $cookies $cookiestore $http $httpbackend $interval $location $log $resource $route $routeparams $routeprovider $sanitize $swipe $timeout access accessed accessibility accidental alert alert-info an and angular angularjs animation animations api apis application applying are aria as attached attributes available aware be become before behavior being both browser browsers build by callbacks can class clean code collection collisions common communicate complex components configuring contain contains convenient cookie cookies copy core css css-based currency current currently dangerous data date default define defined definition-table dependency details developing di directive directives disabilities display do docs documentation dom dump element emulate enable equals etc events examples experience expressions extend factories features file filter filters follow following for function functions global guide handle hashbang helper hooks html html5 improve in include included index inject into is it javascript js js-based keyframe level links linky low lowercase manage manageable management manipulate manner materials methods mobile mock mocks module modules more name names naming ng ng-bind nganimate ngaria ngclick ngcookies nginclude ngmock ngrepeat ngresource ngroute ngsanitize ngtouch ngview not object objects of once operations or organized overview page pages parse partials please posting prefix prefixes present prevent private provide provided providers public pushstate querying querystring quick reference referencing register registered rendered rest restful route routes routing runner securely serialization service services set simple some spaced store string structure supports synchronous template templates test testing tests that the these this to transform transitions trigger triggered turn unit up uppercase url urls use used useful users using values various version via way welcome when which will with within work would wrapper you your",
    "members": ""
  }
];
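The matching step itself might look something like the following. This is only a sketch in Python, assuming a simple weighting (title over path over keyword) that the notes above do not specify; the `SEARCH_INDEX` entries, the weights, and the `score` and `search` function names are all hypothetical.

```python
# Hypothetical index entries shaped like the `searchengine` structure above
# (keyword lists shortened for illustration).
SEARCH_INDEX = [
    {
        "path": "api",
        "titleWords": "API Reference",
        "keywords": "$http $location angular angularjs api directive filter module service",
        "members": "",
    },
    {
        "path": "guide/forms",
        "titleWords": "Forms",
        "keywords": "form input ngmodel validation",
        "members": "",
    },
]


def score(entry, term):
    """Score one lowercase term against one entry using plain string methods."""
    points = 0
    if term in entry["titleWords"].lower().split():
        points += 3  # assumed weight: exact title word
    if term in entry["path"].lower():
        points += 2  # assumed weight: substring of the URL path
    if term in entry["keywords"].split():
        points += 1  # assumed weight: exact keyword
    return points


def search(index, query):
    """Return the paths of matching entries, best match first."""
    terms = query.lower().split()
    scored = []
    for entry in index:
        total = sum(score(entry, term) for term in terms)
        if total > 0:
            scored.append((total, entry["path"]))
    # Sort by descending score, then by path so ties have a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [path for _, path in scored]


print(search(SEARCH_INDEX, "api form"))
```

Keeping the keywords as one space-separated string, as in the structure above, means a single `split()` is enough to get exact-word matching without any search server.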
Here's a comparison of the Python clients for Druid and Elasticsearch:
- Library: elasticsearch-py
- Installation: pip install elasticsearch
- Features:
  - Low-level client: Provides a thin wrapper around Elasticsearch's REST API, allowing for maximum flexibility.
  - High-level client: elasticsearch-dsl offers a more Pythonic way to interact with Elasticsearch, mirroring its JSON DSL.
  - Async support: Can be used with asyncio for asynchronous operations.
  - Extensive documentation: Well-documented with numerous examples and a large community.
- Example Usage:

    from datetime import datetime
    from elasticsearch import Elasticsearch

    client = Elasticsearch("http://localhost:9200")

    doc = {
        "author": "kimchy",
        "text": "Elasticsearch: cool. bonsai cool.",
        "timestamp": datetime.now(),
    }
    resp = client.index(index="test-index", id=1, document=doc)
    print(resp["result"])
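Since the clients mirror Elasticsearch's JSON query DSL, it may help to see what such a query body looks like as a plain Python dict. The sketch below uses only the standard library and does not contact a server; the `build_search_body` helper is hypothetical, and the `text` and `timestamp` field names are taken from the example document above.

```python
import json
from datetime import datetime, timezone


def build_search_body(text, since):
    """Build a bool query: full-text match on `text`, filtered by `timestamp`."""
    return {
        "query": {
            "bool": {
                # `must` clauses contribute to the relevance score.
                "must": [{"match": {"text": text}}],
                # `filter` clauses only include or exclude documents.
                "filter": [
                    {"range": {"timestamp": {"gte": since.isoformat()}}}
                ],
            }
        },
        "size": 10,
    }


body = build_search_body("bonsai", datetime(2024, 9, 12, tzinfo=timezone.utc))
print(json.dumps(body, indent=2))
```

With elasticsearch-py 8.x, a body like this could be passed along as `client.search(index="test-index", query=body["query"], size=body["size"])`.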
- Library: pydruid
- Installation: pip install pydruid
- Features:
  - SQLAlchemy support: Ships a SQLAlchemy dialect and a DB-API 2.0 driver for querying Druid SQL.
  - Pandas integration: Can convert query results directly into Pandas DataFrames.
  - Real-time and batch data: Druid supports both real-time and batch ingestion; pydruid itself is a query client and reads the data either way.
  - Documentation: Adequate documentation, though not as extensive as Elasticsearch's.
- Example Usage:

    from pydruid.db import connect

    # Connect to Druid's SQL endpoint through pydruid's DB-API driver.
    conn = connect(host="localhost", port=8888, path="/druid/v2/sql/", scheme="http")
    curs = conn.cursor()
    curs.execute(
        "SELECT * FROM my_table "
        "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY"
    )
    for row in curs:
        print(row)
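For comparison, the same SQL can be sent to Druid without any client library, because the SQL endpoint accepts a JSON payload over HTTP POST. The sketch below only builds the request with the standard library and does not send it; the `build_druid_sql_request` helper is hypothetical, and the host, port, and `my_table` name are the assumptions already used in the example above.

```python
import json
import urllib.request


def build_druid_sql_request(base_url, sql):
    """Build (but do not send) a POST request for Druid's SQL endpoint."""
    payload = {
        "query": sql,
        # Ask for each row as a JSON object keyed by column name.
        "resultFormat": "object",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_druid_sql_request(
    "http://localhost:8888",
    "SELECT * FROM my_table WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY",
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a running Druid router.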
- Elasticsearch: The elasticsearch-py client is highly flexible and well-documented, suitable for a wide range of search and analytics use cases. A high-level client (elasticsearch-dsl) is also available for more convenient usage.
- Druid: The pydruid client is tailored for real-time analytics and integrates well with data science tools like Pandas and SQLAlchemy. It is ideal for time-series and OLAP workloads.

Both clients have their strengths, so the best choice depends on whether you need search-oriented capabilities or real-time analytics.
Source: Conversation with Copilot, 9/12/2024
- Python Elasticsearch Client (v8.15.1). https://elasticsearch-py.readthedocs.io/en/v8.15.1/
- Apache Druid vs Elasticsearch. https://druid.apache.org/docs/latest/comparisons/druid-vs-elasticsearch/
- Elasticsearch and Druid - Imply. https://imply.io/blog/elasticsearch-and-druid/
https://wiki.apache.org/solr/UnicodeCollation
https://opensourceconnections.com/blog/2017/02/20/solr-utf8/
https://projects.apache.org/project.html?lucene-pylucene
https://lucene.apache.org/solr/guide/6_6/tokenizers.html
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html
https://home.apache.org/~hossman/apachecon2008us/ootb/apache-solr-out-of-the-box.pdf
https://archive.apachecon.com/eu2007/materials/solr-talk.pdf