MatchesListRegexDecideRule vs NotMatchesListRegexDecideRule - internetarchive/heritrix3 GitHub Wiki

class NotMatchesListRegexDecideRule and MatchesListRegexDecideRule are used by DecideRuleSequence for finding candidates for crawling, I found them a little confusing in the beginning , but this is what I have figured out.

class NotMatchesListRegexDecideRule inherits  MatchesListRegexDecideRule

and returns the opposite(in terms of regex evaluation):

protected boolean evaluate(CrawlURI object) {
*       return ! super.evaluate(object);*
}

Again, the opposite in terms of "regex evaluation" not "decision(ACCEPT,REJECT,NONE)" the decision by default for both is "ACCEPT", so  you have the set the decision manually.

so:

if you want to accept a positive pattern match: add

MatchesListRegexDecideRule with <property name="decision" value="ACCEPT"/>

if you want to reject a positive positive pattern match: add

MatchesListRegexDecideRule with <property name="decision" value="REJECT"/>

and:

if you want to accept a negative pattern match: add

NotMatchesListRegexDecideRule with <property name="decision" value="ACCEPT"/>

if you want to reject a negative pattern match: add

NotMatchesListRegexDecideRule with <property name="decision" value="REJECT"/>

To be honest I think the existing of NotMatchesListRegexDecideRule is a bit disturbing, as you can use MatchesListRegexDecideRule for almost every thing.

⚠️ **GitHub.com Fallback** ⚠️