Demo 1
Now supporting ProPPR 2.0!
This demo starts from a freshly built (ant clean build) copy of ProPPR, and assumes basic familiarity with the bash shell.
First, we'll create our dataset.
Dataset creation
Make a place for the dataset to live:
ProPPR$ mkdir demos
ProPPR$ cd demos/
ProPPR/demos$ mkdir textcattoy
ProPPR/demos$ cd textcattoy/
ProPPR/demos/textcattoy$
Now we'll write our rules file. This file will be used in the inference step, and tells ProPPR how to generate a list of candidate answers to your queries, and how to apply features to different steps of the proof. These features will, in turn, be used in the learning step.
ProPPR/demos/textcattoy$ cat > textcattoy.ppr
predict(X,Y) :- isLabel(Y),abduce_features(X,Y) {r}.
abduce_features(X,Y) :- { w(W,Y): hasWord(X,W) }.
^D
For this demo, we want to answer queries of the form predict(someDocument,?) to predict the correct label for a document. Let's look at the first rule.
predict(X,Y) :- isLabel(Y),abduce_features(X,Y) {r}.
ProPPR rules look a lot like rules in Prolog: they are clauses. This one says, in English, "Predict label Y for a document X if Y is in our list of labels, then add features relevant to document X and label Y." The order of the goals in the body of the rule is important, because we want to bind the variables in order from left to right. When the rule is used, X will be bound to a document. We then use isLabel to look up a label and bind Y to that label. Then we use abduce_features to add some more information to the graph that we can then use for learning. ProPPR doesn't stop at the first valid label, but proceeds through all possible bindings for Y, adding features for each label assignment.
We'll store the lookup table for isLabel in a database, which we'll create in a moment. abduce_features is defined by a rule later in the file.
The last part of the rule comes within the '{}' brackets, and is a list of features. Whenever ProPPR invokes this rule, it will mark that transition with the feature r, which in this case is just a placeholder.
Now let's look at the second rule.
abduce_features(X,Y) :- { w(W,Y): hasWord(X,W) }.
This rule says, in English, "Always abduce features for document X and label Y." Because there is no body for this rule, it always succeeds. The smart part is in the feature.
The feature expression for this rule creates a list of features, one for each word W in document X. Each feature will measure the affinity between that word and the label Y. During training, ProPPR will learn which words are most closely associated with which labels, across all the words, labels, and documents in the training set.
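For example, when we ask predict(dh,Y) and document dh contains the words "a pricy doll house" (as it will in the dataset we build below), the proof branch where Y is bound to pos is annotated with the features
w(a,pos) w(pricy,pos) w(doll,pos) w(house,pos)
while the Y=neg branch gets the corresponding w(...,neg) features, so training can reward or penalize each word/label pair independently.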
Now let's go back and create the databases for hasWord and isLabel.
ProPPR supports a few different types of databases, but the most basic is a cfacts file, which is tab-separated:
ProPPR/demos/textcattoy$ cat > labels.cfacts
isLabel neg
isLabel pos
^D
Here we've said that neg is a label and pos is a label. The predicate we're storing, isLabel, is 1-ary -- it takes one argument. The cfacts format supports predicates of any number of arguments, so that's fine.
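If you ever need more arguments, you just add more tab-separated columns. For instance, a purely hypothetical 3-ary predicate recording word counts (not used anywhere in this demo) would be tab-separated lines like:
hasWordCount  dh  pricy  1
hasWordCount  dh  a  1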
The hasWord database is a little bigger, so we'll start with an easy-to-read format and then use a shell script to convert it.
ProPPR/demos/textcattoy$ cat > documents.txt
dh a pricy doll house
ft a little red fire truck
rw a red wagon
sc a pricy red sports car
bk punk queen barbie and ken
rb a little red bike
mv a big 7-seater minivan with an automatic transmission
hs a big house in the suburbs with a crushing mortgage
ji a job for life at IBM
tf a huge pile of tax forms due yesterday
jm a huge pile of junk mail bills and catalogs
pb a pricy barbie doll
yc a little yellow toy car
rb2 a red 10 speed bike
rp a red convertible porshe
bp a big pile of paperwork
he a huge backlog of email
wt a life of woe and trouble
^D
Here we've listed a bunch of documents, with the document identifier first, followed by a tab, followed by the space-delimited words in the document. Now we need to put each word in a separate entry in the database, and label it with the functor hasWord. I like awk for stuff like this. We'll test the script first to see that it looks good, then write the whole thing to a cfacts file.
ProPPR/demos/textcattoy$ awk 'BEGIN{FS=OFS="\t"}{nwords = split($2,words," "); \
for (i=1;i<=nwords;i++) { print "hasWord",$1,words[i] }}' documents.txt | head
hasWord dh a
hasWord dh pricy
hasWord dh doll
hasWord dh house
hasWord ft a
hasWord ft little
hasWord ft red
hasWord ft fire
hasWord ft truck
hasWord rw a
ProPPR/demos/textcattoy$ awk 'BEGIN{FS=OFS="\t"}{nwords = split($2,words," "); \
for (i=1;i<=nwords;i++) { print "hasWord",$1,words[i] }}' documents.txt > hasWord.cfacts
There are awk tutorials all over the internet if you want details, but the general idea of this script is: before reading any lines, set the field separator for input and output to the tab character; then, on each line of the file, split the second field into words and, for each word, print one fact line: the functor hasWord, followed by the document identifier, followed by the word. Store the output in the file "hasWord.cfacts".
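If you'd rather not use awk, here's a minimal Python sketch that does the same conversion (it assumes the documents.txt layout above: a document id, a tab, then space-delimited words; the script name is just an example):

# to_cfacts.py -- same conversion as the awk one-liner above, written in Python.
# Reads documents.txt and writes hasWord.cfacts; duplicate words in a document
# produce duplicate facts, just as with awk (ProPPR warns about and skips them).
with open("documents.txt") as docs, open("hasWord.cfacts", "w") as facts:
    for line in docs:
        if not line.strip():
            continue
        doc_id, text = line.rstrip("\n").split("\t", 1)
        for word in text.split():
            facts.write("hasWord\t%s\t%s\n" % (doc_id, word))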
Now that we have our rules file and database files, we have everything we need for inference.
Next we have to prepare our labelled data for the learning phase. Labelled data for ProPPR is in a tab-separated format with one example per line: first the query, then the solutions, each labelled with + (correct/positive) or - (incorrect/negative). For ProPPR to work properly, these solutions must be reachable by the logic program as defined in the rules file. If you only have + solutions available, you can have QueryAnswerer give you other reachable solutions from which you can sample - labels. For this problem, though, we have both + and - labels. We'll use the first 11 documents for training, and the rest for testing:
ProPPR/demos/textcattoy$ cat > train.examples
predict(dh,Y) -predict(dh,neg) +predict(dh,pos)
predict(ft,Y) -predict(ft,neg) +predict(ft,pos)
predict(rw,Y) -predict(rw,neg) +predict(rw,pos)
predict(sc,Y) -predict(sc,neg) +predict(sc,pos)
predict(bk,Y) -predict(bk,neg) +predict(bk,pos)
predict(rb,Y) -predict(rb,neg) +predict(rb,pos)
predict(mv,Y) +predict(mv,neg) -predict(mv,pos)
predict(hs,Y) +predict(hs,neg) -predict(hs,pos)
predict(ji,Y) +predict(ji,neg) -predict(ji,pos)
predict(tf,Y) +predict(tf,neg) -predict(tf,pos)
predict(jm,Y) +predict(jm,neg) -predict(jm,pos)
^D
ProPPR/demos/textcattoy$ cat > test.examples
predict(pb,Y) -predict(pb,neg) +predict(pb,pos)
predict(yc,Y) -predict(yc,neg) +predict(yc,pos)
predict(rb2,Y) -predict(rb2,neg) +predict(rb2,pos)
predict(rp,Y) -predict(rp,neg) +predict(rp,pos)
predict(bp,Y) +predict(bp,neg) -predict(bp,pos)
predict(he,Y) +predict(he,neg) -predict(he,pos)
predict(wt,Y) +predict(wt,neg) -predict(wt,pos)
^D
Now we have all the raw data we need, and we can start running ProPPR tools.
Compiling the dataset
Because we used tab-separated formats for our database, the only thing we need to compile is the rules file.
First set up some environment variables. You only need to do this once per shell session:
ProPPR/demos/textcattoy$ export PROPPR=/path/to/ProPPR
ProPPR/demos/textcattoy$ export PATH=$PATH:$PROPPR/scripts
These environment variables let you use a helper script called 'proppr' that keeps track of settings and saves you from typing out long command lines. The syntax for the proppr utility is:
proppr command args
As you go through the rest of this tutorial you'll see examples of some of the commands that proppr can handle.
For now, we're going to use the 'compile' command:
ProPPR/demos/textcattoy$ proppr compile textcattoy
INFO:root:ProPPR v2
INFO:root:subprocess call options: {'stdout': <open file 'textcattoy.wam', mode 'w' at 0x7fd47268ca50>}
INFO:root:calling: python $PROPPR/src/scripts/compiler.py serialize textcattoy.ppr
INFO:root:compiled textcattoy.ppr to textcattoy.wam
You'll notice that we didn't specify our .ppr file directly, but just used its stem, textcattoy. This is because the proppr script is designed to make reasonable guesses at filenames based on the stem, which helps keep everything organized.
The resulting .wam format isn't binary; it's a sort of logic-programming assembly code. It is a bit hard for humans to read, though, which is why we usually make edits in the plain .ppr format.
ProPPR/demos/textcattoy$ cat textcattoy.wam
0 comment abduce_features(-1,-2) :- {w(-3,-2) : hasWord(-1,-3)} #v:['X', 'Y', 'W'].
1 abduce_features/2 allocate 3 ['W', 'Y', 'X']
2 initfreevar -1 -2
3 initfreevar -2 -1
4 fclear
5 ffindall 9
6 freport
7 returnp
8 comment features w(-3,-2) : hasWord(-1,-3)
9 pushboundvar -1
10 pushfreevar -3
11 callp hasWord/2
12 fpushstart w 2
13 fpushboundvar -3
14 fpushboundvar -2
15 returnp
16 comment predict(-1,-2) :- isLabel(-2), abduce_features(-1,-2) {r} #v:['X', 'Y'].
17 predict/2 allocate 2 ['Y', 'X']
18 initfreevar -1 -2
19 initfreevar -2 -1
20 fclear
21 fpushstart r 0
22 freport
23 pushboundvar -2
24 callp isLabel/1
25 pushboundvar -1
26 pushboundvar -2
27 callp abduce_features/2
28 returnp
On to inference!
Inference: QueryAnswerer
Let's see how ProPPR does without any training. The internal ranking system can be quite sophisticated, depending on how the rules are written, so it's usually a good idea to collect untrained results.
First we'll tell proppr about the program we want to run in this directory. You only have to do this once when you set up a new dataset:
ProPPR/demos/textcattoy$ proppr set --programFiles textcattoy.wam:labels.cfacts:hasWord.cfacts
INFO:root:ProPPR v2
saved 1 option(s) into proppr.settings
Next we'll have ProPPR answer our test queries:
ProPPR/demos/textcattoy$ proppr answer test
INFO:root:ProPPR v2
INFO:root:calling: java -cp .:$PROPPR/conf/:$PROPPR/bin:$PROPPR/lib/* \
edu.cmu.ml.proppr.QueryAnswerer --queries test.examples \
--solutions test.solutions.txt --programFiles textcattoy.wam:labels.cfacts:hasWord.cfacts
INFO:root:answers in test.solutions.txt
WARN [FactsPlugin] Skipping duplicate fact at hasWord.cfacts:35: hasWord hs a
edu.cmu.ml.proppr.QueryAnswerer.QueryAnswererConfiguration
queries file: test.examples
solutions file: test.solutions.txt
Duplicate checking: up to 1000000
Prover: edu.cmu.ml.proppr.prove.DprProver
Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme
APR Alpha: 0.1
APR Epsilon: 1.0E-4
APR Depth: 20
INFO [QueryAnswerer] Running queries from test.examples; saving results to test.solutions.txt
INFO [QueryAnswerer] Querying: predict(pb,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(yc,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rb2,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rp,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(bp,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(he,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(wt,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
Query-answering time: 147
Breaking down that command line:
ProPPR/demos/textcattoy$ proppr answer test
Again, we're using file stems, not whole filenames, to help keep things organized. So in this case, ProPPR will look for test.examples as the input file and write the output to test.solutions.txt.
Once we have the untrained solutions to the test.examples file, we can evaluate the results using MAP:
ProPPR/demos/textcattoy$ proppr eval test
INFO:root:ProPPR v2
INFO:root:calling: python $PROPPR/scripts/answermetrics.py \
--data test.examples --answers test.solutions.txt --metric map
queries 0 answers 0 labeled answers 14
==============================================================================
metric map (MAP): The average precision after each relevant solution is retrieved
. micro: 0.5
. macro: 0.5
In this command the proppr utility used the prefix test to guess that we want to evaluate the file "test.solutions.txt" using the labels from the "test.examples" file.
While the default ranking can be quite sophisticated, in this case it's not. We can do lots better than a MAP of 0.5.
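If it helps to make the metric concrete, here is a minimal Python sketch of average precision for a single query, matching the one-line definition printed above; it's for intuition only, and answermetrics.py remains the real implementation (which also reports both a micro and a macro average).

# ap_sketch.py -- average precision for one ranked list of solutions
# ("the average precision after each relevant solution is retrieved").
def average_precision(ranked_relevance):
    """ranked_relevance: booleans, one per ranked solution, True if it's a + answer."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / float(rank))
    return sum(precisions) / len(precisions) if precisions else 0.0

# With two candidate labels per query, ranking the correct one second scores 0.5,
# and ranking it first scores 1.0:
print(average_precision([False, True]))   # 0.5
print(average_precision([True, False]))   # 1.0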
There are loads of other query-answering options we could include to specify which proof engine we want it to use, how accurate we want our approximations to be, and to activate multithreading for faster computation on machines with multiple cores. We'll talk about some of those later.
In the meantime, we clearly need to train.
Learning: Trainer
Currently ProPPR can only train on grounded queries, so we'll ground our training examples first:
ProPPR/demos/textcattoy$ proppr ground train
INFO:root:ProPPR v2
INFO:root:calling: java -cp .:$PROPPR/conf/:$PROPPR/bin:$PROPPR/lib/* \
edu.cmu.ml.proppr.Grounder --queries train.examples \
--grounded train.examples.grounded --programFiles textcattoy.wam:labels.cfacts:hasWord.cfacts
WARN [FactsPlugin] Skipping duplicate fact at hasWord.cfacts:35: hasWord hs a
INFO [Grounder] Resetting grounding statistics...
edu.cmu.ml.proppr.Grounder.ExampleGrounderConfiguration
queries file: train.examples
grounded file: train.examples.grounded
Duplicate checking: up to 1000000
Prover: edu.cmu.ml.proppr.prove.DprProver
Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme
APR Alpha: 0.1
APR Epsilon: 1.0E-4
APR Depth: 20
INFO [Grounder] Resetting grounding statistics...
INFO [Grounder] Grounded 1 examples...
INFO [Grounder] Grounded all 11 examples
INFO [Grounder] totalPos: 11 totalNeg: 11 coveredPos: 11 coveredNeg: 11
INFO [Grounder] For positive examples 11/11 proveable [100.0%]
INFO [Grounder] For negative examples 11/11 proveable [100.0%]
Grounding time: 203
Done.
INFO:root:grounded to train.examples.grounded
"Grounding a query" means doing inference on it, but instead of just keeping the solutions at the end, we keep the whole proof graph and save this graph to a file. The learning algorithms then use the graph to help determine what the best weight for each feature is to maximize the scores of the positive-labeled solutions and minimize the scores of the negative-labeled solutions.
Now that we've grounded our training examples, we can train on them:
ProPPR/demos/textcattoy$ proppr train train
INFO:root:ProPPR v2
INFO:root:calling: java -cp .:$PROPPR/conf/:$PROPPR/bin:$PROPPR/lib/* \
edu.cmu.ml.proppr.Trainer --train train.examples.grounded \
--params train.params --programFiles textcattoy.wam:labels.cfacts:hasWord.cfacts
WARNING: unrecognized options detected:
--programFiles
INFO [Trainer]
edu.cmu.ml.proppr.util.ModuleConfiguration
queries file: train.examples.grounded
params file: train.params
Trainer: edu.cmu.ml.proppr.CachingTrainer
Walker: edu.cmu.ml.proppr.learn.L2SRW
Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme
APR Alpha: 0.1
APR Epsilon: 1.0E-4
APR Depth: 20
INFO [Trainer] Training model parameters on train.examples.grounded...
INFO [CachingTrainer] epoch 1 ...
INFO [CachingTrainer] epoch 2 ...
INFO [CachingTrainer] epoch 3 ...
INFO [CachingTrainer] epoch 4 ...
INFO [CachingTrainer] epoch 5 ...
INFO [CachingTrainer] Reading: 0 Parsing: 24 Training: 184
Training time: 227
INFO [Trainer] Saving parameters to train.params...
There are loads of options we could include at training time, including alternate learning scaffolds and algorithms, and parameters controlling accuracy, the mixture of log loss and regularization loss, and how feature weights are combined. We won't talk about those in this example, but we're currently collecting data that will help us write guidelines on how to achieve optimum performance for different kinds of datasets.
In the meantime, we can peek at the weights Trainer came up with:
ProPPR/demos/textcattoy$ head train.params
#! weightingScheme=ReLU
w(fire,pos) 1.04324
w(with,neg) 1.02656
w(due,pos) 0.973330
w(tax,pos) 0.979976
w(and,neg) 0.977144
id(trueLoopRestart) 1.00000
w(crushing,neg) 1.00467
w(junk,neg) 1.00271
w(automatic,pos) 0.983177
Here we can see how the w(W,Y) feature specification in the rules file created a different feature for each word+label combination, and how each of those features received a different weight. When we re-run inference using these weights in place of the defaults, ProPPR will generate a different ranking of the candidate solutions. If training has worked effectively, the correct solutions will be at the top of the list.
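For a quick look at what was learned per label, a short Python sketch like the one below works. It assumes the train.params layout shown above (one feature and one weight per line, separated by whitespace, plus '#!' header lines); the script itself is not part of the ProPPR toolchain.

# top_weights.py -- list the highest-weighted word features for each label.
# Assumes train.params lines look like "w(fire,pos)  1.04324".
weights = {}
with open("train.params") as f:
    for line in f:
        if not line.strip() or line.startswith("#"):   # e.g. "#! weightingScheme=ReLU"
            continue
        feature, value = line.strip().rsplit(None, 1)
        weights[feature] = float(value)

for label in ("pos", "neg"):
    word_feats = [ft for ft in weights
                  if ft.startswith("w(") and ft.endswith(",%s)" % label)]
    word_feats.sort(key=lambda ft: weights[ft], reverse=True)
    print(label, [(ft, round(weights[ft], 3)) for ft in word_feats[:5]])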
Post-Training Inference: Tester
Now we'll run inference on our test set using the trained weights:
ProPPR/demos/textcattoy$ proppr answer test --params train.params
INFO:root:ProPPR v2
INFO:root:calling: java -cp .:$PROPPR/conf/:$PROPPR/bin:$PROPPR/lib/* \
edu.cmu.ml.proppr.QueryAnswerer --queries test.examples \
--solutions test.solutions.txt --params train.params \
--programFiles textcattoy.wam:labels.cfacts:hasWord.cfacts
WARN [FactsPlugin] Skipping duplicate fact at hasWord.cfacts:35: hasWord hs a
edu.cmu.ml.proppr.QueryAnswerer.QueryAnswererConfiguration
queries file: test.examples
params file: train.params
solutions file: test.solutions.txt
Duplicate checking: up to 1000000
Prover: edu.cmu.ml.proppr.prove.DprProver
Weighting Scheme: edu.cmu.ml.proppr.learn.tools.ReLUWeightingScheme
APR Alpha: 0.1
APR Epsilon: 1.0E-4
APR Depth: 20
INFO [QueryAnswerer] Running queries from test.examples; saving results to test.solutions.txt
INFO [QueryAnswerer] Querying: predict(pb,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(yc,X1) #v:[?].
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(yellow,neg) (this message only prints once per feature)
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(toy,neg) (this message only prints once per feature)
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(yellow,pos) (this message only prints once per feature)
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(toy,pos) (this message only prints once per feature)
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rb2,X1) #v:[?].
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(10,neg) (this message only prints once per feature)
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(speed,neg) (this message only prints once per feature)
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(speed,pos) (this message only prints once per feature)
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(10,pos) (this message only prints once per feature)
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(rp,X1) #v:[?].
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(convertible,neg) (this message only prints once per feature)
WARN [InnerProductWeighter] Using default weight 1.0 for unknown feature w(convertible,pos) (this message only prints once per feature)
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(bp,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(he,X1) #v:[?].
WARN [InnerProductWeighter] You won't get warnings about other unknown features
INFO [QueryAnswerer] Writing 2 solutions...
INFO [QueryAnswerer] Querying: predict(wt,X1) #v:[?].
INFO [QueryAnswerer] Writing 2 solutions...
Query-answering time: 172
INFO:root:answers in test.solutions.txt
There's a new command-line option here, --params, which specifies the trained weights file.
The warnings about using a default weight are because our test set includes some words that weren't in our training set. A few of these non-overlapping features are okay, but as the number of unknown features approaches the number of trained features, that's a sign your training and testing data are too different and you're not going to get good transfer.
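If you want to quantify that overlap, a rough diagnostic like the following counts the test-document words that never received a trained weight. This is only a sketch, not part of ProPPR, and it hard-codes the test document ids used in this demo.

# vocab_overlap.py -- count test-set words with no trained w(word,label) weight.
trained_words = set()
with open("train.params") as f:
    for line in f:
        if line.startswith("w("):                       # features look like w(word,label)
            trained_words.add(line[2:].split(",", 1)[0])

test_ids = {"pb", "yc", "rb2", "rp", "bp", "he", "wt"}  # documents in test.examples
test_words = set()
with open("hasWord.cfacts") as f:
    for line in f:
        if not line.strip():
            continue
        functor, doc, word = line.rstrip("\n").split("\t")
        if doc in test_ids:
            test_words.add(word)

unknown = sorted(test_words - trained_words)
print("%d of %d test words have no trained weight: %s"
      % (len(unknown), len(test_words), ", ".join(unknown)))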
Now that we have a set of solutions for the test set using the trained model, we can see how we did:
ProPPR/demos/textcattoy$ proppr eval test
INFO:root:ProPPR v2
INFO:root:calling: python $PROPPR/scripts/answermetrics.py \
--data test.examples --answers test.solutions.txt --metric map
queries 0 answers 0 labeled answers 14
==============================================================================
metric map (MAP): The average precision after each relevant solution is retrieved
. micro: 0.785714285714
. macro: 0.855952380952
Before training, we had a MAP of 50%, or no better than guessing at random. After training, our precision is much better. It worked!
We could get even better results on this dataset if we paid closer attention to the training parameters. For more information, see the Tuning Guide.