Testing dictionaries against corpus - petermr/CEVOpen GitHub Wiki
Testing dictionaries against corpus
- I created an "invasive plant" minicorpus of 500 research articles using:
getpapers -q "invasive plant" -o corpora1 -x -p -k 500 -f corpora1/log.txt
- I ran the following commands:
a.
ami -p "corpora1" section
b.
ami -p "corpora1" search --dictionary country.xml invasive_plant.xml
c.
ami -p "corpora1" search --dictionary plant_compound.xml invasive_plant.xml
- I obtained the following data tables.
- The invasive_plant, country, plant_material_history, plant_genus and plant_compound dictionaries are working well. On analyzing the results, I found that some values also come from the "Reference" section, so I will analyze the results from ami_gui.
- Tested the city.xml dictionary against "corpora1". It didn't work because of an XML formatting problem.
I made several attempts at creating the dictionary (from a text list and from SPARQL). However, it gives the following error:
- Tested the eo_Gene.xml dictionary against "corpora1". It didn't work because of an XML formatting problem.
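Before running a dictionary through ami, it may help to confirm that the file is at least well-formed XML. The following is a minimal sketch using Python's standard library; it is not part of ami, the helper name `check_well_formed` is hypothetical, and the two sample strings are only illustrative guesses at a dictionary layout, not the actual content of city.xml or eo_Gene.xml.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Illustrative samples only (assumed layout, not the real dictionaries).
good = '<dictionary title="city"><entry term="Delhi"/></dictionary>'
bad = '<dictionary title="city"><entry term="Delhi"></dictionary>'  # <entry> never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

A check like this only catches well-formedness errors (unclosed tags, unescaped `&`, and so on). It does not verify that the file follows whatever structure ami expects from a dictionary.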
- Testing and documenting the plant_compound dictionary against the "oil186" corpus.
Currently documenting a CSV file containing section-wise term counts along with false positives, false negatives, true positives and true negatives.
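Once the true and false positives and negatives are tabulated, they can be summarized as precision and recall. The sketch below shows one way to do that; the function name and the counts passed to it are made up for illustration and are not results from the "oil186" corpus.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one dictionary term in one section.
precision, recall = precision_recall(tp=8, fp=2, fn=2)
print(precision, recall)
```

Precision shows how many of the reported hits were correct, and recall shows how many of the real mentions were found.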
- Difficulties in search:
- Italicized terms are not counted.
- Formatting differs from paper to paper.
- Isomeric and non-isomeric compounds are counted differently.
- Counts include terms from the "Reference" section.
- Chemically modified forms of a compound are counted too (e.g. both 1-decanol and 1-decanol acetate are counted).
- Capitalized terms are not detected (e.g. acetone is detected but not Acetone).
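Two of the difficulties above, capitalization and modified forms of a compound, can be illustrated with a small standalone matcher. This is a sketch under assumptions, not how ami searches: `count_terms` is a hypothetical helper that matches case-insensitively and tries longer terms first, so that "1-decanol acetate" is not also counted as "1-decanol". It does not address italics or the "Reference" section.

```python
import re


def count_terms(text, terms):
    """Count case-insensitive whole-term matches, preferring the longest term at each position."""
    # Longest first, so the alternation tries "1-decanol acetate" before "1-decanol".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {t.lower(): t for t in terms}
    counts = {t: 0 for t in terms}
    for match in pattern.finditer(text):
        counts[lookup[match.group(0).lower()]] += 1
    return counts


# Made-up sentence for illustration.
text = "Acetone and acetone were used; 1-decanol acetate exceeded 1-decanol."
counts = count_terms(text, ["acetone", "1-decanol", "1-decanol acetate"])
print(counts)
```

Whether a modified compound should count toward its parent is a design choice. Here each mention is assigned to exactly one term, which keeps the two counts separate.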