Journal

Week 16

Todo:

  • Run evaluation on all combinations of the following setups: {Unigram, Bigram, Bigram with deprel, Interpolation model, Interpolation model with deprel} × {All languages, Only English, No English} × {GF dictionary, Wordnet dictionary, Krasimir's dictionary, Clustered dictionary}
  • Marginalize out lemmas to create a zero-order model consisting only of POS and deprel (see the sketch after this list)
  • Send shared task WSD data in one sentence per line format to Prasanth
  • Write on report
  • Add coordination ambiguity examples to qualitative testing
  • Explore the problem of cases where "the middle disappears"
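
A minimal sketch of the marginalization step, assuming counts are keyed by (lemma, POS, deprel) tuples (the key layout is our assumption here, not a fixed format):

 from collections import Counter

 # Sum (lemma, pos, deprel) counts over lemmas to get a zero-order
 # model over (pos, deprel) alone. The key layout is assumed.
 def marginalize(counts):
     marginal = Counter()
     for (lemma, pos, deprel), count in counts.items():
         marginal[(pos, deprel)] += count
     return marginal

 print(marginalize({("dog", "NOUN", "nsubj"): 3, ("cat", "NOUN", "nsubj"): 2}))
 # Counter({('NOUN', 'nsubj'): 5})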

Week 14

Todo

Week 13

This week we had a presentation for the research group at Chalmers. They came with some good suggestions and we are moving forward with the project. Maximilian is focusing for the rest of the week on getting the evaluation methods running smoothly, and Oscar is investigating smoothing tools.

Todo

  • Generate possdicts for the Krasimir dictionary
  • Rewrite some evaluation code and make a script for running it
  • Investigate methods for using hypernym information

Bug

Our reranker fails on the sentence "football is hard to play" since it can't find the head in the expression:

 SentAP (PositA hard_1_A) (EmbedVP (UseV play_1_V))

This is because SentAP has the following entry in Lang.labels, which does not mark any argument as the head:

 SentAP amod acl

We should ask Prasanth if this is the right label information.
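
For reference, head-finding over a gf2ud-style labels table could look like the following minimal sketch. The LABELS entries other than SentAP, the tree encoding, and find_head are hypothetical stand-ins for illustration, not the reranker's actual code:

 LABELS = {
     "SentAP": ["amod", "acl"],   # no argument labeled 'head' -- triggers the bug
     "PositA": ["head"],          # assumed entries for illustration
     "EmbedVP": ["head"],
     "UseV": ["head"],
 }

 def find_head(tree):
     """Return the lexical head of a GF tree given as (fun, [children])."""
     fun, children = tree
     if not children:                       # leaf: a lexical function
         return fun
     for child, label in zip(children, LABELS.get(fun, [])):
         if label == "head":
             return find_head(child)
     return None                            # no head among the arguments

 tree = ("SentAP",
         [("PositA", [("hard_1_A", [])]),
          ("EmbedVP", [("UseV", [("play_1_V", [])])])])
 print(find_head(tree))  # None, since SentAP's entry has no 'head'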

Todo

  • Run all evaluation scripts from one file
    • Rewrite some evaluation to make it more flexible
    • Investigate why we don't find the 'head' in some trees
  • Investigate smoothing tools
    • Find a good tool for doing smoothing
    • Recount our post-EM counts

Week 12

This week we are writing our results in the report and preparing two presentations. We will also run the evaluation scripts using the dependency relations.

Todo

  • Report
    • Probabilistic parsing
    • Dependency trees
  • Evaluation
    • WN on server
    • Deprel
    • Sort using rerank
  • Meeting with Prasad
    • Finish slides
    • Print slides
  • Slides
    • Introduction
    • Tree probabilities
    • Toy problem
    • Algorithm
    • Wordnet
    • Results
    • Outlook

Week 11

This week we are working with more evaluation experiments. We want to compare the results when we define our model on different contexts.

Todo

  • Run qualitative evaluation on Swedish sentences
  • Create evaluation tables for these models:
    • Unigram
    • Unigram + deplabel
    • Bigram
    • Bigram + deplabel
  • Report
    • Flesh out theory
    • Add evaluation results

Week 10

This week has been focused on getting some quantitative results for our algorithm and running tests to see how stable the model is across languages; that is, can we still disambiguate the trees for languages we haven't trained our model on?

Some early results from the trainomatic and the wordnet examples:

[Image: Results from trainomatic and wordnet examples]

Todo

  • Run quantitative evaluation on the trainomatic dataset and wordnet examples
  • Make EM run for wordnet and generate .probs-files
  • Write about evaluation in the report
  • Write more examples in the report

Week 9

The evaluation takes longer than expected, but we managed to get some results. We have compiled a list of example sentences in English which we use for qualitative evaluation. For English, our model performs better on most of these example sentences, but the ones it fails on are worth looking into more closely.

[Image: Results from reranking example sentences]

Todo

  • Finishing evaluation of the model
    • Run reranking for test sentences in other languages than English
    • For wordnet possibility dictionaries, do synset disambiguation on trainomatic data
  • Writing the half time report

Points for the future from meeting

  • Prasanth will look at doing GF category inference in context during parsing, which will help when putting together wordnet and GF models

Week 8

Prasanth has helped us with the autoparsed data and we are starting to train the model using the texts found online. There is some trouble with processing this large amount of data, as our computers keep running out of memory. We solve this by moving some of the data processing from Python into Bash scripts.

Points for the meeting

  • EM for wordnet
    • Changes in the algorithm
    • How should we go from wordnet to GF?
  • Evaluation
    • Using the counts from Prasanth
    • Evaluation using the whole tree
    • How do we present results best?
    • Filter away nummod and punct
    • Remove POS punct, aux, num
    • Isolate word classes
    • Probably don't throw away OOV words; instead have a fallback
  • Report
    • What should go in the report

Week 7

We are continuing the work on using wordnet as a dictionary. There are several problems that have to be considered. We also have to rewrite the EM algorithm to take into account that one wordnet synset can linearize into several lemmas; this is different from the GF dictionary, where every abstract function has only one linearization.
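
As a sketch of what the change might look like in the E-step (the names and data structures are hypothetical, not our actual implementation), each candidate synset for an observed lemma is now weighted both by the current synset probability and by P(lemma | synset):

 from collections import defaultdict

 # Hypothetical sketch of the modified E-step: a candidate synset is
 # weighted by the synset probability *and* by P(lemma | synset), since
 # one synset can linearize into several lemmas.
 def e_step(observed_lemmas, candidates, p_synset, p_lemma_given_synset):
     expected_counts = defaultdict(float)
     for lemma in observed_lemmas:
         synsets = candidates.get(lemma, [])
         weights = [p_synset[s] * p_lemma_given_synset.get((lemma, s), 0.0)
                    for s in synsets]
         total = sum(weights)
         if total == 0:
             continue  # OOV or zero-probability lemma; needs a fallback
         for s, w in zip(synsets, weights):
             expected_counts[(lemma, s)] += w / total
     return expected_counts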

Todo

  • Investigate what has to be changed in order to move from the GF dict to Wordnet.
    • Read in the wordnet data and generate wordnet <-> lemma files for the different languages
    • Investigate how we have to change our algorithm to take P( lemma | wordnet_id ) into account.
    • Rewrite the EM algo
    • Rewrite tree_probs.py to take wordnet probabilities
  • Database for parameters?
  • Parse data
    • Parse the data from CoNLL 2017 shared task.
    • Generate counts.
    • Transfer the counts to a format that can be used by the EM algorithm.
  • Evaluation
    • Rewrite, remove fluff.
    • Rerun experiments with more estimated parameters.

Points from the meeting

  • We shouldn't focus on the GF dictionary's shortcomings; instead we should focus on moving to wordnet. This is the biggest challenge for the near future.
  • When it comes to the shortcomings of lookupMorpho we will wait for the ud2gf tool to be ready.
  • Think about disambiguating the trees.
    • Should we take children into account when counting inside the EM algorithm?
  • We need to parse more data.
  • For evaluation, check the trees from trainomatic. Gold trees and wordnet examples are also good sources, but trainomatic makes it easier to test word sense disambiguation.

Week 6

This week we have been writing a summary of the current statistical model. There is now a PDF with our assumptions and with a detailed description of our EM implementation.

We have also been investigating different evaluation methods. For word sense disambiguation we have been trying to use the trainomatic dataset. It shows promise, but we have still parsed too little data to do any real experiments. The same is true for the wordnet example sentences.

Todo

  • Do a write-up of the current statistical model
  • Add resources to the bibliography
  • Send specification of the information we need from UD-trees to Prasanth
  • Send wordnet sentences to Prasanth
  • Evaluation methods
    • Group sentences according to lemma
    • Use the trainomatic data for evaluation
  • Investigate how we can reduce the number of parameters
    • Look for clusters of abstract functions
  • Generate GF trees from the GF treebank

Points from the meeting

  • We will continue work with the evaluation methods.
  • The wordnet example-sentences:
    • Let Prasanth re-parse the sentences since spaCy can be unreliable.
    • Group the sentences from wordnet together.
  • GF gold-standard trees:
    • Generate variations at leaves and see how our model handles it.
  • Assume independence when dealing with grandparents.
  • We need to address the problem with the parameter space increasing.
    • Dimension reduction
  • We talked about how the model worked.
  • We may continue with variational inference in the future.
  • Prasanth will help us with pre-processing some UD-trees.

Week 5

This week we will focus on analysing the current model and continuing the work on calculating probabilities from GF trees.

While looking at the problem of evaluating our model, we found that wordnet has a list of example sentences for some synsets. We can use these sentences to check that our bigram model chooses the right synset.
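
The example sentences can be pulled with NLTK's wordnet interface (requires the NLTK wordnet data; how we then parse and rerank each sentence is only sketched in prose above):

 from nltk.corpus import wordnet as wn

 # Collect (synset, example sentence) pairs: the target synset is known,
 # so we can check whether the reranker recovers it for each sentence.
 pairs = [(synset.name(), example)
          for synset in wn.all_synsets()
          for example in synset.examples()]
 print(len(pairs), pairs[:3])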

We have also been working on the mathematical model we are using, since we want to give our statistical model a more rigorous treatment. There are some difficulties in translating what we are using the EM algorithm for into standard mathematical language.

We have also tried out a trigram model, but found that the parameter space grew so much that it was infeasible to run any experiments for now.

Todo

  • Update the journal online
  • Add resources to the bibliography
  • Generate a sequence of head-child pairs from a GF tree
  • Extend the model to use the grandparent
  • Do a write-up of the current statistical model
  • Proof-of-concept variational inference
  • Run some experiments using the current model
    • Find a data set for evaluation
    • Estimate tree-probs on a few example sentences

Points from the meeting

  • We will use a more updated snapshot of the ud2gf repository
  • Check out lib/src/Lang.labels
  • First order decomposition / second order decomposition
  • Check LiLT gf2ud first part
  • Look at libs/treebanks in GF for gold standard trees in GF

Week 4

The focus this week has been on implementing tools for analysing the models from last week. We have also spent a few days cleaning up the code, since we now have a better understanding of how the model works.

  • How does the model react when we add different languages? Check the KL divergences (see the sketch after this list).
  • How can we actually rank GF trees, and what should the smoothing for out-of-vocabulary words look like?
  • Use the UD tag in the EM algorithm.
  • Check if there is a point to use parts from ud2gf.
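
A minimal sketch of the KL check, assuming two .probs files that map abstract functions to probabilities, one estimated with and one without a given language (the file names and the whitespace-separated format are assumptions):

 import math

 def read_probs(path):
     """Read a whitespace-separated 'function probability' table."""
     with open(path) as f:
         return {fun: float(p) for fun, p in (line.split() for line in f)}

 def kl_divergence(p, q, epsilon=1e-12):
     """D(P || Q) in nats; epsilon guards against missing/zero entries in Q."""
     return sum(pv * math.log(pv / q.get(fun, epsilon))
                for fun, pv in p.items() if pv > 0)

 p = read_probs("all_languages.probs")     # hypothetical file names
 q = read_probs("without_english.probs")
 print(kl_divergence(p, q))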

Week 3

We decided to start by trying the expectation maximisation (EM) algorithm, which is used in unsupervised learning to estimate parameters that can't be observed directly. In order to understand the problem and to make a working prototype from which we can iterate, we have been reimplementing an experiment previously made by the group to get unigram and bigram probabilities.

We can see that the algorithm can tell apart word senses whose translations differ across languages.
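
As a toy illustration of that effect (made-up senses and counts, not our actual data): English "bank" can express two abstract functions, while Swedish distinguishes them ("bank" vs "strand"), so the Swedish observations pull the shared sense probabilities apart:

 from collections import defaultdict

 # Toy EM for unigram sense probabilities. CANDIDATES says which abstract
 # functions (senses) a lemma in a given language can express; the English
 # lemma is ambiguous, the Swedish ones are not.
 CANDIDATES = {
     ("bank", "eng"): ["bank_money_N", "bank_river_N"],
     ("bank", "swe"): ["bank_money_N"],
     ("strand", "swe"): ["bank_river_N"],
 }
 observations = ([("bank", "eng")] * 10
                 + [("bank", "swe")] * 8
                 + [("strand", "swe")] * 2)

 probs = {"bank_money_N": 0.5, "bank_river_N": 0.5}
 for _ in range(20):
     counts = defaultdict(float)
     for obs in observations:                  # E-step
         funs = CANDIDATES[obs]
         total = sum(probs[f] for f in funs)
         for f in funs:
             counts[f] += probs[f] / total
     total = sum(counts.values())              # M-step
     probs = {f: c / total for f, c in counts.items()}

 print(probs)  # bank_money_N converges to the larger share (0.8)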