Using Features in ProPPR
Adding Arbitrary Variable Features to a Rule
Because features are goals, they can take arguments (variables) when you declare them in a rules file. However, a feature must be fully grounded when it is used -- whatever variables appear in the feature must already have values assigned to them via the head (left-hand side) of the rule. Thus, the best strategy is to write programs that force "lazy evaluation" of features.
For example, this fails:
per_cities_of_residence(X,Y) :- entityInDocSent(X,A),hasEntityInDocSent(A,Y) #f(Y) .
because when the rule is used, spawning the new goals entityInDocSent(X,A) and hasEntityInDocSent(A,Y), ProPPR must attach the feature f(Y). Since Y is still an unbound variable at that point, ProPPR cannot create a grounded feature, and it crashes.
Instead, do this:
per_cities_of_residence(X,Y) :- entityInDocSent(X,A),hasEntityInDocSent(A,Y),constraints(A,X,Y) .
constraints(A,X,Y) :- #f(Y) .
By chaining two rules together, we have written a program that waits until Y is bound to a value before assigning a feature associated with the value of Y. The goal constraints(A,X,Y) will be expanded last (because it is at the end of the goal list for per_cities_of_residence(X,Y)), after a value for Y has been assigned by hasEntityInDocSent(A,Y).
Moreover, since "constraints(A,X,Y)" has no premises (the right hand side is empty), it is always true. In this way, this pattern serves as a general method for generic feature-producing, which allows one to add arbitrary variable features to a rule.
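The same pattern extends to features over several variables at once. Below is a minimal sketch of this generic feature-producing idiom; the predicates related and emitFeatures are hypothetical names invented for this example:
answer(X,Y) :- related(X,Y),emitFeatures(X,Y) .
emitFeatures(X,Y) :- #f(X),f(Y),pair(X,Y) .
Because emitFeatures(X,Y) is expanded only after related(X,Y) has bound both X and Y, all three features are fully grounded at the moment ProPPR creates them.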
Using Arbitrary Goal Predicates as Features
If you declare multiple definitions of a rule and give each definition a different feature, ProPPR can then learn which definitions are most effective.
Suppose we want to use NER tags in our answer. As a first pass, we might impose a hard constraint that every answer Y must have a LOCATION NER tag:
per_cities_of_residence(X,Y) :- entityInDocSent(X,A),hasEntityInDocSent(A,Y),hasNER(A,Y,n_LOCATION) .
What if we want to learn whether or not this LOCATION tag is useful for answering per_cities_of_residence? As written above, hasNER is a predicate that is looked up in a *.facts file (i.e. a database lookup). This means that only the proving system accesses this data, and the proving system is all or nothing: if Y doesn't have the n_LOCATION NER tag, Y won't be considered as a solution. With the rule written this way, the learning system has no way to go back and look at the values of Y that didn't satisfy this predicate. However, if we add some indirection and branching, we can achieve the effect we want. Consider the following rewrite:
per_cities_of_residence(X,Y) :- entityInDocSent(X,A),hasEntityInDocSent(A,Y),constraint(A,Y) .
constraint(A,Y) :- hasNER(A,Y,n_LOCATION) .
constraint(A,Y) :- .
The goal constraint(A,Y) can be satisfied in more than one way: ProPPR may use the hasNER(A,Y,n_LOCATION) premise, or the empty premise. Since ProPPR creates a distinct feature for each rule definition, there will be one feature for the hasNER definition and one for the trivial definition. During training, ProPPR will learn which definition of constraint was more effective in answering queries. We can expand on this idea to do arbitrarily complex feature-rule weighting:
per_cities_of_residence(X,Y) :- entityInDocSent(X,A), hasEntityInDocSent(A,Y),
constraintNER(A,Y),constraintPOS(A,Y),constraintWords(X,A,Y),
features(X,A,Y) .
constraintNER(A,Y) :- hasNER(A,Y,n_LOCATION) .
constraintNER(A,Y) :- .
constraintPOS(A,Y) :- hasPOS(A,Y,p_NN) .
constraintPOS(A,Y) :- hasPOS(A,Y,p_NNS) .
constraintPOS(A,Y) :- hasPOS(A,Y,p_NNP) .
constraintPOS(A,Y) :- hasPOS(A,Y,p_NNPS) .
constraintPOS(A,Y) :- .
constraintWords(X,A,Y) :- words_text_classified-person(A,X) .
constraintWords(X,A,Y) :- words_text_classified-city(A,Y) .
constraintWords(X,A,Y) :- words_text_classified-per_cities_of_residence(X,A,Y) .
constraintWords(X,A,Y) :- .
features(X,A,Y) :- #f(X),f(A),f(Y) .
Here hasNER and hasPOS are database lookups, and the words_text_classified-* predicates look up pre-computed values produced by trained category and relation text classifiers.
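To make the database side concrete, here is a hypothetical fragment of the *.facts file backing hasNER and hasPOS. ProPPR facts files are tab-separated with the functor first; the identifiers doc12 and austin are invented for this sketch:
hasNER	doc12	austin	n_LOCATION
hasPOS	doc12	austin	p_NNP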
This can get complex fast. The branches in the proof graph mimic the branches in the ruleset, so adding these alternate definitions of rules will impact the running time of the cooking (grounding) procedure. Some of the provers have a maximum depth parameter (TracingDfsProver, PPRProver); each time a predicate is "consumed" during proving, the current depth increases by 1. Branching, by contrast, multiplies: if the branching factor of the i-th goal-predicate g_i is branch(g_i), then the total branching factor of a rule r with N goals is:
branch(r) = \prod_{i=1}^{N} branch(g_i)
Each goal-predicate is decomposed recursively if its cases themselves contain further goals; when a predicate bottoms out with a fixed number of cases, that number is the branching factor for that specific instance of the goal-predicate.
For example, take the last (complex) definition of per_cities_of_residence:
branch(per_cities_of_residence) = 1 * 1 * 2 * 5 * 4 * 1 = 40
(one case each for entityInDocSent, hasEntityInDocSent, and features; two for constraintNER; five for constraintPOS; and four for constraintWords).
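To see the recursive decomposition concretely, consider a hypothetical variant in which one case of constraintNER itself branches; nerSource and hasGazetteer are invented for this sketch:
constraintNER(A,Y) :- nerSource(A,Y) .
constraintNER(A,Y) :- .
nerSource(A,Y) :- hasNER(A,Y,n_LOCATION) .
nerSource(A,Y) :- hasGazetteer(Y) .
Here branch(nerSource) = 2, so branch(constraintNER) = 2 + 1 = 3, and the complex rule above would instead have branch(per_cities_of_residence) = 1 * 1 * 3 * 5 * 4 * 1 = 60.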