search - NCIEVS/nci-protege5 GitHub Wiki

  • Lucene should be the default in preferences

  • full-string indexing for all property values. Currently values are tokenized and only the tokens are indexed. We also need exact-match string searching. We do some exact matching on label properties using the built-in Protege lookup, but this is a workaround and it is case-sensitive.

  • enhanced basic query search

  • all non-alphanumeric chars should be stripped
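The exact-match and character-stripping items above can be sketched together as a normalized full-string index. This is only an illustration in Python, not Protege or Lucene code: `normalize_exact`, `build_exact_index` and `exact_lookup` are hypothetical names, and the sketch assumes that "stripped" means each run of non-alphanumeric characters is replaced by a single space so that word boundaries survive.

```python
from collections import defaultdict

def normalize_exact(value):
    """Case-fold, turn non-alphanumeric characters into spaces, collapse runs of spaces."""
    lowered = value.casefold()
    spaced = "".join(ch if ch.isalnum() else " " for ch in lowered)
    return " ".join(spaced.split())

def build_exact_index(property_values):
    """Index whole property values (not tokens) under their normalized form.

    property_values: iterable of (entity_id, property_name, value) tuples
    (an assumed input shape, chosen for illustration).
    """
    index = defaultdict(set)
    for entity_id, prop, value in property_values:
        index[(prop, normalize_exact(value))].add(entity_id)
    return index

def exact_lookup(index, prop, query):
    """Normalize the query the same way as the indexed values, then look it up."""
    return index.get((prop, normalize_exact(query)), set())

rows = [
    ("C1", "label", "Beta-Carotene"),
    ("C2", "label", "Carotene"),
    ("C3", "synonym", "beta carotene"),
]
idx = build_exact_index(rows)
print(exact_lookup(idx, "label", "BETA CAROTENE"))  # {'C1'}
```

Because the same normalization runs on both the stored value and the query, the match is case-insensitive without relying on the built-in lookup.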

musings

{
full string search
	- case insensitive
	- variations:
		starts with
		ends with
		contains?  e.g. "token1 token2 ... tokenX" adjacent and in sequence?
	- limited to specific properties?
	- do not exclude stop words from the indexes

multi-token
	- implicit conjunction:  token1 & token2
	- is sequence important?  e.g. should "token2 & token1" fail to return a target containing "token1 token2"?
	- exclude stop words

tokenization
	- split on multiple delimiter characters in addition to whitespace

on query:
	process the query string as per the indexing, i.e. if the terms are indexed in one manner, the query 
	string should be processed in the same manner, e.g. whitespace tokenization vs "full" character 
	tokenization (" \?!.=\\/ and so on)
}
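
One possible reading of the musings above, sketched in Python rather than against the Lucene API. Everything here is an assumption made for illustration: the delimiter set (taken loosely from the characters listed in the notes), the tiny stop-word list, the function names, matching at token granularity, and the choice to ignore token order in the multi-token case (the notes leave that question open). The point it tries to show is that one `tokenize` function serves both the indexed value and the query string.

```python
import re

# Assumed delimiters: whitespace plus \ ? ! . = /
DELIMITERS = re.compile(r"[\s\\?!.=/]+")
# Illustrative stop words only; a real list would be longer.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})

def tokenize(text, drop_stop_words):
    """Shared analysis step used for both indexed values and queries."""
    tokens = [t for t in DELIMITERS.split(text.casefold()) if t]
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

def full_string_match(value, query, mode):
    """Case-insensitive full-string search; stop words are kept.

    mode is 'starts_with', 'ends_with' or 'contains'; 'contains' requires
    the query tokens to be adjacent and in sequence.
    """
    v = tokenize(value, drop_stop_words=False)
    q = tokenize(query, drop_stop_words=False)
    if not q or len(q) > len(v):
        return False
    if mode == "starts_with":
        return v[:len(q)] == q
    if mode == "ends_with":
        return v[-len(q):] == q
    if mode == "contains":
        return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))
    raise ValueError("unknown mode: " + mode)

def multi_token_match(value, query):
    """Implicit conjunction: every query token must occur; stop words excluded.

    Token order is ignored here, which is only one answer to the open question.
    """
    q = set(tokenize(query, drop_stop_words=True))
    return bool(q) and q <= set(tokenize(value, drop_stop_words=True))

label = "Malignant Neoplasm of the Breast"
print(full_string_match(label, "neoplasm of the", "contains"))  # True
print(multi_token_match(label, "breast neoplasm"))              # True
```

If sequence turns out to matter for multi-token queries, the `contains` branch already shows the stricter adjacent-and-in-order check.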