Tutorial on running mmtest - ecologylab/BigSemantics GitHub Wiki
MmTest is a command-line program that accepts a list of URLs, and outputs the extracted metadata from those URLs in XML (in the same order as the input). It also saves the extracted metadata to a temporary file so that you can open it in an XML formatter. Developers and wrapper authors use MmTest to validate the output of a wrapper and inspect possible problems.
Important: Make sure that you have compiled the repository containing your new wrappers; otherwise, MmTest will not work correctly.
The simplest way to run MmTest is to use the Ant target "mmtest" in BigSemanticsWrappers/build.xml. To do that, change to the BigSemanticsWrappers folder and run "ant mmtest" in the console. In Eclipse, right-click the build file, choose Run As -> Ant Build... (with the ellipsis), and select the "mmtest" target.
MmTest reads the list of URLs from the file MmTestURLs.lst in the same folder. Each URL should occupy its own line. MmTest processes the file line by line, in order, until it reaches the end of the file or a special line containing "//" (double slashes). For example, if the file contains two URLs on consecutive lines, MmTest will process both. If a "//" line sits between them, MmTest will stop after processing the first URL.
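The stopping rule described above can be sketched in a few lines of Java. This is only an illustration of the documented behavior, not MmTest's actual code: the class and method names are hypothetical, the URLs are placeholders, and trimming lines and skipping blank ones are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UrlListSketch {
    /**
     * Collects URLs in order, stopping at the end of the input
     * or at the first line that contains only "//".
     */
    public static List<String> urlsToProcess(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim(); // assumption: surrounding whitespace is ignored
            if (trimmed.equals("//")) {
                break; // terminator line: everything after it is ignored
            }
            if (!trimmed.isEmpty()) { // assumption: blank lines are skipped
                urls.add(trimmed);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder contents of MmTestURLs.lst, one entry per line.
        List<String> lines = Arrays.asList(
                "http://example.com/first",
                "//",
                "http://example.com/second");
        System.out.println(urlsToProcess(lines)); // prints [http://example.com/first]
    }
}
```

Placing the "//" line in the list file is a convenient way to keep a long list of test URLs around while running only the first few.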
MmTest writes verbose logs to the console. Near the end of the output, it prints the extracted metadata as XML, in the same order as the input URLs. It also tries to write the XML to a temporary file, typically located at $TMP/mmTest.xml or %TEMP%/mmTest.xml, which you can open in an XML formatter later. Check the console output for the exact path of this file.
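The console output remains the authoritative source for the path. As a rough sketch, the Java snippet below computes where the file would be if MmTest wrote it to the JVM's default temporary directory (the java.io.tmpdir system property). That is an assumption, not something this page confirms, and the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TempOutputPathSketch {
    /** Returns <JVM temp directory>/mmTest.xml (assumes MmTest uses java.io.tmpdir). */
    public static Path likelyOutputPath() {
        return Paths.get(System.getProperty("java.io.tmpdir"), "mmTest.xml");
    }

    public static void main(String[] args) {
        Path path = likelyOutputPath();
        // Report whether a file is actually present at the guessed location.
        String status = Files.exists(path) ? "exists" : "not found";
        System.out.println(path + " (" + status + ")");
    }
}
```

If the file is reported as not found after a run, fall back to the path printed in the MmTest console output.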
MmTest is implemented as a Java class, ecologylab.bigsemantics.tools.MmTest, in the BigSemanticsSDK project. URLs are passed in the argument list, separated by whitespace (spaces, tabs, newlines, etc.). As in the URL list file, double slashes ("//") terminate the input URL list.
Eclipse launch files can specify the entry class to launch and the argument list. The MmTest.launch file in BigSemanticsSDK can be used to run MmTest: right-click it and select Run As -> MmTest. You can make copies of the launch file and change the argument list for your convenience.
- I'm not getting the semantic data I want, but a <compound_document> element.
- The most common reason for this is that your new wrapper lacks a selector, or its selector does not match the input URL, so BigSemantics fails to find the right wrapper to extract information. Read about wrapper authoring to learn how to use selectors, and use the selector testing tool to test your selector against a given URL.
- I'm getting several messages starting with Memory.reclaim(DownloadMonitor... and the program hangs.
- Make sure that you are using JVM parameters to assign enough heap memory to the program, since the default heap size may be insufficient. We recommend at least 256 MB of heap (-Xms256m).
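To check whether the heap settings actually reached the JVM, a small diagnostic like the one below can help. It is a sketch that is not part of BigSemantics, and the class name is hypothetical. Note that -Xms sets the initial heap size, while the standard -Xmx option sets the maximum, so both values are printed.

```java
class HeapCheckSketch {
    // The 256 MB heap size recommended above, expressed in bytes.
    static final long RECOMMENDED_BYTES = 256L * 1024 * 1024;

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    public static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // totalMemory() is the heap currently reserved; maxMemory() is the upper limit.
        System.out.println("Current heap: " + toMegabytes(runtime.totalMemory()) + " MB");
        System.out.println("Maximum heap: " + toMegabytes(runtime.maxMemory()) + " MB");
        if (runtime.maxMemory() < RECOMMENDED_BYTES) {
            System.out.println("Maximum heap is below the recommended 256 MB.");
        }
    }
}
```

Running it with the same JVM parameters used for MmTest (for example, adding -Xms256m in the VM arguments of the Eclipse launch configuration) shows the heap sizes that MmTest would start with.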