Operator property specification
This document describes the properties of each operator in Texera. It serves as the communication API for operators and query plans exchanged between Texera-GUI and Texera-Web.
Author: Zuozhi Wang, Kishore Narendran
Unless noted otherwise, all operators below share one required property, "attributes", and two optional properties, "limit" and "offset":
{
"attributes" : "attr1_name, attr2_name, attr3_name",
"limit" : "10 (this property is optional)",
"offset" : "5 (this property is optional)"
}
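As a rough illustration, the sketch below shows how a consumer might read these common properties. It assumes that "attributes" is one comma-separated string and that "limit" and "offset", when present, hold plain integer strings such as "10". The helper name `parse_common_properties` is hypothetical and not part of Texera.

```python
# Hypothetical helper (not Texera's code): read the properties shared by all operators.
from typing import Optional


def parse_common_properties(props: dict) -> tuple[list[str], Optional[int], Optional[int]]:
    """Return (attribute names, limit, offset); limit/offset are None when absent."""
    if "attributes" not in props:
        raise ValueError('missing required property "attributes"')
    # "attributes" is a single comma-separated string; trim spaces around each name.
    attributes = [name.strip() for name in props["attributes"].split(",") if name.strip()]

    def optional_int(key: str) -> Optional[int]:
        # Assumption: optional numeric properties arrive as integer strings.
        value = props.get(key)
        return None if value is None else int(value)

    return attributes, optional_int("limit"), optional_int("offset")


attrs, limit, offset = parse_common_properties(
    {"attributes": "attr1_name, attr2_name, attr3_name", "limit": "10"}
)
print(attrs, limit, offset)  # ['attr1_name', 'attr2_name', 'attr3_name'] 10 None
```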
Matcher operators:
{
"operator_type" : "KeywordMatcher",
"keyword" : "a_keyword",
"matching_type" : "one of: [conjunction, phrase, substring]"
}
{
"operator_type" : "DictionaryMatcher",
"dictionary" : "dict_entry_1, dict_entry_2, dict_entry_3",
"matching_type" : "one of: [conjunction, phrase, substring]"
}
{
"operator_type" : "RegexMatcher",
"regex" : "a_regex"
}
{
"operator_type" : "FuzzyTokenMatcher",
"query" : "a query of fuzzy token matcher",
"threshold_ratio" : "0.8"
}
{
"operator_type" : "NlpExtractor",
"nlp_type" : "one of: [Noun, Verb, Adjective, Adverb, NE_ALL, Number, Location, Person, Organization, Money, Percent, Date, Time] (case insensitive)"
}
{
"operator_type" : "Join",
"inner_attribute" : "inner_attr_name",
"outer_attribute" : "outer_attr_name",
"predicate_type" : "one of: [CharacterDistance, SimilarityJoin]",
"threshold" : "10"
}
Note that Join does not have "attributes"; instead, it has "inner_attribute" and "outer_attribute".
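A minimal check of a Join operator's properties might look like the sketch below. It assumes that "threshold" is required and numeric and that "predicate_type" must match one of the two listed names exactly; `validate_join` is a hypothetical name. This sketch treats a stray "attributes" key as an error, although a more lenient consumer could simply ignore it.

```python
# Hypothetical validator for the Join operator, based only on the example above.
JOIN_PREDICATE_TYPES = {"CharacterDistance", "SimilarityJoin"}


def validate_join(props: dict) -> None:
    """Raise ValueError if the Join properties do not follow the specification."""
    if "attributes" in props:
        raise ValueError('Join uses "inner_attribute"/"outer_attribute", not "attributes"')
    # Assumption: every key shown in the Join example is required.
    for key in ("inner_attribute", "outer_attribute", "predicate_type", "threshold"):
        if key not in props:
            raise ValueError(f'Join is missing "{key}"')
    if props["predicate_type"] not in JOIN_PREDICATE_TYPES:
        raise ValueError(f'unknown predicate_type: {props["predicate_type"]}')
    float(props["threshold"])  # raises ValueError if the threshold is not numeric


validate_join({
    "operator_type": "Join",
    "inner_attribute": "inner_attr_name",
    "outer_attribute": "outer_attr_name",
    "predicate_type": "CharacterDistance",
    "threshold": "10",
})  # passes silently
```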
{
"operator_type" : "Projection",
"attributes" : "attr_1_name, attr_2_name"
}
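The matcher and Projection examples above can be read as a table of required keys per "operator_type". The sketch below encodes that table and checks the enumerated values. It assumes "matching_type" is compared exactly (only "nlp_type" is documented as case insensitive) and that "threshold_ratio" lies between 0 and 1. `validate_matcher` and the constant names are hypothetical.

```python
# Hypothetical validator for the matcher operators and Projection, built only
# from the JSON examples above; it is a sketch, not Texera's own validation.
MATCHING_TYPES = {"conjunction", "phrase", "substring"}
NLP_TYPES = {
    "noun", "verb", "adjective", "adverb", "ne_all", "number", "location",
    "person", "organization", "money", "percent", "date", "time",
}
# Operator-specific required keys; "attributes" is added for every type below.
REQUIRED_KEYS = {
    "KeywordMatcher": {"keyword", "matching_type"},
    "DictionaryMatcher": {"dictionary", "matching_type"},
    "RegexMatcher": {"regex"},
    "FuzzyTokenMatcher": {"query", "threshold_ratio"},
    "NlpExtractor": {"nlp_type"},
    "Projection": set(),
}


def validate_matcher(props: dict) -> None:
    """Raise ValueError if props do not match the specification above."""
    op_type = props.get("operator_type")
    if op_type not in REQUIRED_KEYS:
        raise ValueError(f"unsupported operator_type: {op_type}")
    missing = (REQUIRED_KEYS[op_type] | {"attributes"}) - set(props)
    if missing:
        raise ValueError(f"{op_type} is missing {sorted(missing)}")
    # matching_type is compared exactly; only nlp_type is documented as case insensitive.
    if "matching_type" in props and props["matching_type"] not in MATCHING_TYPES:
        raise ValueError(f"bad matching_type: {props['matching_type']}")
    if "nlp_type" in props and props["nlp_type"].lower() not in NLP_TYPES:
        raise ValueError(f"bad nlp_type: {props['nlp_type']}")
    # Assumption: a "ratio" threshold lies in [0, 1].
    if "threshold_ratio" in props and not 0.0 <= float(props["threshold_ratio"]) <= 1.0:
        raise ValueError("threshold_ratio must be between 0 and 1")


validate_matcher({"operator_type": "NlpExtractor", "attributes": "content", "nlp_type": "NE_ALL"})
```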
Source Operators:
The Keyword, Regex, FuzzyToken, and Dictionary matchers each have a corresponding source operator, which adds one more property, "data_source".
{
"operator_type" : "KeywordSource",
"data_source" : "data_source_name",
"keyword" : "a_keyword",
"matching_type" : "one of: [conjunction, phrase, substring]"
}
{
"operator_type" : "DictionarySource",
"data_source" : "data_source_name",
"dictionary" : "dict_entry_1, dict_entry_2, dict_entry_3",
"matching_type" : "one of: [conjunction, phrase, substring]"
}
{
"operator_type" : "RegexSource",
"data_source" : "data_source_name",
"regex" : "a_regex"
}
{
"operator_type" : "FuzzyTokenSource",
"data_source" : "data_source_name",
"query" : "a query of fuzzy token matcher",
"threshold_ratio" : "0.8"
}
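Because each source operator is its matcher plus a data source, a consumer could split a source operator back into those two parts, as in the sketch below. It assumes the key is spelled "data_source", as in most of the source examples above; the mapping and the function `to_matcher_properties` are hypothetical illustrations, not Texera's API.

```python
# Hypothetical: a source operator's properties are its matcher's properties
# plus "data_source"; this mapping pairs the operator names listed above.
SOURCE_TO_MATCHER = {
    "KeywordSource": "KeywordMatcher",
    "DictionarySource": "DictionaryMatcher",
    "RegexSource": "RegexMatcher",
    "FuzzyTokenSource": "FuzzyTokenMatcher",
}


def to_matcher_properties(source_props: dict) -> tuple[str, dict]:
    """Split a source operator into (data source name, equivalent matcher properties)."""
    source_type = source_props["operator_type"]
    if source_type not in SOURCE_TO_MATCHER:
        raise ValueError(f"not a source operator: {source_type}")
    # Copy everything except the data source, then rename the operator type.
    matcher_props = {k: v for k, v in source_props.items() if k != "data_source"}
    matcher_props["operator_type"] = SOURCE_TO_MATCHER[source_type]
    return source_props["data_source"], matcher_props


name, matcher = to_matcher_properties({
    "operator_type": "KeywordSource",
    "data_source": "data_source_name",
    "keyword": "a_keyword",
    "matching_type": "phrase",
})
print(name)     # data_source_name
print(matcher)  # {'operator_type': 'KeywordMatcher', 'keyword': 'a_keyword', 'matching_type': 'phrase'}
```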
Sink Operators:
{
"operator_type" : "FileSink",
"file_path" : "file_path"
}
{
"operator_type" : "IndexSink",
"index_path" : "index_path",
"index_name" : "name_of_index"
}
{
"operator_type" : "TupleStreamSink"
}
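The sink operators differ only in which keys they require. A minimal sketch of checking them, using the keys from the three examples above and a hypothetical `validate_sink` function, could be:

```python
# Hypothetical validator for the sink operators; required keys come from the examples above.
SINK_REQUIRED_KEYS = {
    "FileSink": {"file_path"},
    "IndexSink": {"index_path", "index_name"},
    "TupleStreamSink": set(),  # takes no extra properties
}


def validate_sink(props: dict) -> None:
    """Raise ValueError if a sink operator is unknown or missing a required key."""
    sink_type = props.get("operator_type")
    if sink_type not in SINK_REQUIRED_KEYS:
        raise ValueError(f"unsupported sink operator: {sink_type}")
    missing = SINK_REQUIRED_KEYS[sink_type] - set(props)
    if missing:
        raise ValueError(f"{sink_type} is missing {sorted(missing)}")


validate_sink({"operator_type": "IndexSink", "index_path": "index_path", "index_name": "name_of_index"})
```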
The JSON format representing the operator graph is:
{
"operators" : [
{
"operator_id" : "operator_1_id",
"operator properties as mentioned above" : "some properties"
},
{
"operator_id" : "operator_2_id",
"operator properties as mentioned above" : "some properties"
}
],
"links" : [
{
"from" : "operator_1_id",
"to" : "operator_2_id"
}
]
}
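One way a consumer such as Texera-Web might use this graph format is sketched below: it parses the plan with Python's standard `json` module, checks that every link refers to a declared "operator_id", and orders the operators with Kahn's topological sort so that each operator comes after the operators feeding into it. The function name `topological_order` and the requirement that the graph be acyclic are assumptions of this sketch, not statements about Texera's implementation.

```python
# Hypothetical sketch: validate links and order operators from sources to sinks.
import json
from collections import deque


def topological_order(plan_json: str) -> list[str]:
    """Return operator ids in an order where every link's "from" precedes its "to"."""
    plan = json.loads(plan_json)
    ids = [op["operator_id"] for op in plan["operators"]]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate operator_id")
    successors = {op_id: [] for op_id in ids}
    in_degree = {op_id: 0 for op_id in ids}
    for link in plan["links"]:
        src, dst = link["from"], link["to"]
        if src not in successors or dst not in successors:
            raise ValueError(f"link refers to unknown operator: {src} -> {dst}")
        successors[src].append(dst)
        in_degree[dst] += 1
    # Kahn's algorithm: repeatedly take an operator with no unprocessed inputs.
    ready = deque(op_id for op_id in ids if in_degree[op_id] == 0)
    order = []
    while ready:
        op_id = ready.popleft()
        order.append(op_id)
        for nxt in successors[op_id]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(ids):
        raise ValueError("operator graph contains a cycle")
    return order


plan = """
{
  "operators": [
    {"operator_id": "sink", "operator_type": "TupleStreamSink"},
    {"operator_id": "source", "operator_type": "KeywordSource",
     "data_source": "data_source_name", "keyword": "a_keyword",
     "matching_type": "phrase", "attributes": "content"}
  ],
  "links": [{"from": "source", "to": "sink"}]
}
"""
print(topological_order(plan))  # ['source', 'sink']
```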