Sample Zika Extraction - Texera/texera GitHub Wiki

For all operators, leave limit and offset empty.

  1. create KeywordSource with properties:
    keyword: zika
    data source: promed
    matching type: conjunction (default)
    attribute: content

  2. create Projection
    attributes: _id, webpage, content

  3. connect KeywordSource with Projection

  4. create Regex_Person
    regex:
    (A|a|(an)|(An)) .{1,40} ((woman)|(man))
    attribute: content

  5. connect Projection with Regex_Person

  6. create NLP_Location
    type: location
    attribute: content

  7. connect Projection with NLP_Location

  8. create Regex_Date
    regex:
    (((0?[1-9])|(1[0-2]))(\s|-|\.|/)((0?[1-9])|([12][0-9])|(3[01]))(\s|-|\.|/)([0-9]{4}|[0-9]{2}))|((0?[1-9])|([12][0-9])|(3[01])) ((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)|(july?)|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|(dec(ember)?))
    attribute: content

  9. connect Projection with Regex_Date

  10. create Join1
    Join attribute: content
    id attribute: _id (default)
    PredicateType: CharacterDistance (default)
    distance: 100

  11. connect Regex_Person and NLP_Location with Join1

  12. create Join2
    (same properties as Join1)

  13. connect Join1 and Regex_Date with Join2

  14. create TupleStreamSink (view results)

  15. connect Join2 with TupleStreamSink
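
The regex and join steps above can be tried outside Texera on a single string. The following is a minimal Python sketch, not Texera code. The sample sentence is made up, only the day-plus-month-name half of the date pattern is used, `re.IGNORECASE` is added so that "January" matches, and `join_by_distance` is a hypothetical helper that assumes CharacterDistance keeps a pair of spans when both their start offsets and their end offsets differ by at most the configured distance.

```python
import re

# Person pattern from step 4, copied as-is.
PERSON = re.compile(r"(A|a|(an)|(An)) .{1,40} ((woman)|(man))")

# Day-plus-month-name half of the step 8 pattern.
# re.IGNORECASE is an assumption added for this sketch.
DATE = re.compile(
    r"((0?[1-9])|([12][0-9])|(3[01])) "
    r"((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)"
    r"|(july?)|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|(dec(ember)?))",
    re.IGNORECASE,
)


def spans(pattern, text):
    """Return the (start, end) character offsets of every match."""
    return [m.span() for m in pattern.finditer(text)]


def join_by_distance(left, right, distance):
    """Pair spans whose starts and ends are both within `distance` characters.

    Hypothetical helper: this is one reading of CharacterDistance,
    not Texera's implementation. Each joined span covers both inputs.
    """
    joined = []
    for l_start, l_end in left:
        for r_start, r_end in right:
            close_starts = abs(l_start - r_start) <= distance
            close_ends = abs(l_end - r_end) <= distance
            if close_starts and close_ends:
                joined.append((min(l_start, r_start), max(l_end, r_end)))
    return joined


# Made-up sample text standing in for the `content` attribute.
text = "A 34-year-old woman returned from Brazil on 12 January with a fever."

people = spans(PERSON, text)
dates = spans(DATE, text)
pairs = join_by_distance(people, dates, 100)

for start, end in pairs:
    print(text[start:end])
```

With distance 100 the person and date spans are joined into one span; with a much smaller distance, such as 10, the same call returns no pairs.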

Here's a screenshot of the query plan:
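
The same plan can also be written down as data, which makes the wiring easy to check. The sketch below is a hypothetical plain-Python description of the operators and links from steps 1 to 15. It is not Texera's own plan format, and `execution_order` is an illustrative helper built on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Operators and their properties, as listed in the steps.
# The dictionary layout is an assumption made for this sketch.
JOIN_PROPERTIES = {
    "join attribute": "content",
    "id attribute": "_id",
    "predicate type": "CharacterDistance",
    "distance": 100,
}

OPERATORS = {
    "KeywordSource": {
        "keyword": "zika",
        "data source": "promed",
        "matching type": "conjunction",
        "attribute": "content",
    },
    "Projection": {"attributes": ["_id", "webpage", "content"]},
    "Regex_Person": {"attribute": "content"},
    "NLP_Location": {"type": "location", "attribute": "content"},
    "Regex_Date": {"attribute": "content"},
    "Join1": dict(JOIN_PROPERTIES),
    "Join2": dict(JOIN_PROPERTIES),
    "TupleStreamSink": {},
}

# One (source, destination) pair per "connect" step.
LINKS = [
    ("KeywordSource", "Projection"),
    ("Projection", "Regex_Person"),
    ("Projection", "NLP_Location"),
    ("Projection", "Regex_Date"),
    ("Regex_Person", "Join1"),
    ("NLP_Location", "Join1"),
    ("Join1", "Join2"),
    ("Regex_Date", "Join2"),
    ("Join2", "TupleStreamSink"),
]


def execution_order(operators, links):
    """Return the operators in an order where every input precedes its consumer.

    Raises ValueError for a link that names an unknown operator and
    graphlib.CycleError if the links form a cycle.
    """
    predecessors = {name: set() for name in operators}
    for source, destination in links:
        if source not in operators or destination not in operators:
            raise ValueError(f"unknown operator in link: {source} -> {destination}")
        predecessors[destination].add(source)
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(OPERATORS, LINKS)
print(" -> ".join(order))
```

Any valid order starts at KeywordSource and ends at TupleStreamSink. The relative order of the three extraction operators may vary, since they all read from Projection independently.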