Sample Zika Extraction - Texera/texera GitHub Wiki
For all the operators, leave limit and offset empty
- create KeywordSource with properties:
  - keyword: zika
  - data source: promed
  - matching type: conjunction (default)
  - attribute: content
- create Projection
  - attributes: _id, webpage, content
- connect KeywordSource with Projection
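As a rough illustration of these two steps, the sketch below mimics a conjunction keyword match followed by a projection over plain Python dictionaries. It is not Texera code: the sample records, the `example.org` URLs, the tokenizer, and the function names `keyword_source` and `project` are assumptions made for the example, and "conjunction" is assumed to mean that every query token must appear in the attribute.

```python
import re

# Hypothetical stand-in records for the promed data source.
RECORDS = [
    {"_id": 1, "webpage": "http://example.org/1", "title": "Report A",
     "content": "A pregnant woman in Brazil tested positive for Zika virus."},
    {"_id": 2, "webpage": "http://example.org/2", "title": "Report B",
     "content": "Dengue cases were reported in Thailand."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_source(records, keyword, attribute):
    """Assumed conjunction semantics: keep records containing every query token."""
    query = set(tokenize(keyword))
    return [r for r in records if query <= set(tokenize(r[attribute]))]


def project(records, attributes):
    """Keep only the listed attributes of each record."""
    return [{a: r[a] for a in attributes} for r in records]


matched = keyword_source(RECORDS, "zika", "content")
projected = project(matched, ["_id", "webpage", "content"])
print(projected)
```

Only the first record survives the keyword match, and the projection drops its `title` attribute.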
- create Regex_Person
  - regex: (A|a|(an)|(An)) .{1,40} ((woman)|(man))
  - attribute: content
- connect Projection with Regex_Person
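To see what the person regex picks up, it can be tried outside Texera. The sketch below runs the same pattern with Python's `re` module on made-up sentences; the helper name `person_spans` and the sample text are assumptions, and Texera's own regex engine may differ in details.

```python
import re

# Same pattern as the Regex_Person operator above.
PERSON = re.compile(r"(A|a|(an)|(An)) .{1,40} ((woman)|(man))")


def person_spans(text):
    """Return (start, end, matched_text) for each non-overlapping match."""
    return [(m.start(), m.end(), m.group(0)) for m in PERSON.finditer(text)]


sample = "A 34-year-old pregnant woman tested positive. An elderly man was also ill."
spans = person_spans(sample)
for span in spans:
    print(span)

# The lowercase "a" alternative is not tied to a word boundary,
# so a match can begin at the last letter of a word such as "Zika".
inside_word = PERSON.search("Zika infection in a young woman").group(0)
print(inside_word)
```

The second call shows that the pattern can start in the middle of a word, which is worth keeping in mind when reading the results.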
- create NLP_Location
  - type: location
  - attribute: content
- connect Projection with NLP_Location
- create Regex_Date
  - regex: (((0?[1-9])|(1[0-2]))(\s|-|.|/)((0?[1-9])|([12][0-9])|(3[01]))(\s|-|.|/)([0-9]{4}|[0-9]{2}))|((0?[1-9])|([12][0-9])|(3[01])) ((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)|(july?)|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|(dec(ember)?))
  - attribute: content
- connect Projection with Regex_Date
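The date regex is long enough that it helps to probe it on a few strings. The sketch below compiles the pattern unchanged with Python's `re` module (split across three string literals only for readability); the sample strings are invented, and it is an assumption that Texera's regex engine behaves the same way on them.

```python
import re

# Same pattern as the Regex_Date operator, concatenated from three raw strings.
DATE = re.compile(
    r"(((0?[1-9])|(1[0-2]))(\s|-|.|/)((0?[1-9])|([12][0-9])|(3[01]))(\s|-|.|/)([0-9]{4}|[0-9]{2}))"
    r"|((0?[1-9])|([12][0-9])|(3[01])) ((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)|(july?)"
    r"|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|(dec(ember)?))"
)


def first_date(text):
    """Return the first matched date string, or None if nothing matches."""
    m = DATE.search(text)
    return m.group(0) if m else None


numeric = first_date("reported on 02/15/2016 in Brazil")   # month/day/year form
named = first_date("onset on 15 february 2016")            # day + month-name form
loose = first_date("12a25b2016")                           # separator "." is unescaped
capitalized = first_date("15 February")                    # month names are lowercase only
print(numeric, named, loose, capitalized)
```

Two things stand out in this check. The `.` in the separator group is unescaped, so any character is accepted as a separator. The month names are lowercase, so with case-sensitive matching a capitalized month is not found. The day-plus-month-name branch also does not include the year in the match.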
- create Join1
  - Join attribute: content
  - id attribute: _id (default)
  - PredicateType: CharacterDistance (default)
  - distance: 100
- connect Regex_Person and NLP_Location with Join1
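The idea behind a character-distance join can be sketched in a few lines. The code below is a simplified model, not Texera's implementation: the `Span` class, the function `join_by_distance`, and the sample spans are hypothetical. It also assumes that two spans join when they belong to the same `_id` and both their start offsets and their end offsets differ by at most the configured distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    doc_id: int   # value of the id attribute (_id)
    start: int    # character offset where the match begins
    end: int      # character offset where the match ends
    value: str    # matched text


def join_by_distance(left, right, distance):
    """Pair spans of the same document whose offsets lie within `distance` characters."""
    joined = []
    for a in left:
        for b in right:
            if a.doc_id != b.doc_id:
                continue  # only spans from the same document can join
            if abs(a.start - b.start) <= distance and abs(a.end - b.end) <= distance:
                # The joined span covers both inputs.
                joined.append(Span(a.doc_id, min(a.start, b.start),
                                   max(a.end, b.end), f"{a.value} | {b.value}"))
    return joined


persons = [Span(1, 0, 28, "A 34-year-old pregnant woman")]
locations = [Span(1, 32, 38, "Brazil"), Span(1, 400, 406, "Panama"), Span(2, 10, 16, "Brazil")]
result = join_by_distance(persons, locations, 100)
print(result)
```

With a distance of 100, only the nearby "Brazil" span in document 1 joins with the person span; the distant span and the span from another document are dropped.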
- create Join2
  - (same properties as Join1)
- connect Join1 and Regex_Date with Join2
- create TupleStreamSink (view results)
- connect Join2 with TupleStreamSink
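The connections listed in the steps above form a small directed graph. As a sanity check on the wiring, the sketch below writes the links down as plain data and derives one valid execution order with Python's standard `graphlib` module (Python 3.9+). The `LINKS` list is only a transcription of the steps; it is not a Texera plan format.

```python
from graphlib import TopologicalSorter

# (source operator, target operator) pairs, transcribed from the "connect" steps.
LINKS = [
    ("KeywordSource", "Projection"),
    ("Projection", "Regex_Person"),
    ("Projection", "NLP_Location"),
    ("Projection", "Regex_Date"),
    ("Regex_Person", "Join1"),
    ("NLP_Location", "Join1"),
    ("Join1", "Join2"),
    ("Regex_Date", "Join2"),
    ("Join2", "TupleStreamSink"),
]

# Map each operator to the set of operators it reads from.
predecessors = {}
for src, dst in LINKS:
    predecessors.setdefault(src, set())
    predecessors.setdefault(dst, set()).add(src)

# static_order() raises CycleError if the links accidentally form a cycle.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

The order starts at KeywordSource and ends at TupleStreamSink; the three extraction operators in between may appear in any relative order.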
Here's a screenshot of the query plan: