myDistiller's search client - KurtEnglmeier/myDistiller GitHub Wiki
myDistiller’s search client is an additional tool that helps you to automatically locate and extract particular information from the XML files myDistiller produced for you.
You define the set of elements you want the search client to extract and store it in a file under a name of your choice, with the file extension “rule”.
Example:
authors."Franz Kafka"
lifetime
The first line focuses the extraction on the data concerning the author “Franz Kafka”. The second line tells the client to extract Kafka’s birth and death dates, which you previously summarized in myDistiller under the pattern “lifetime”.
The client then returns the respective section from the XML file.
Franz Kafka:authors."Franz Kafka"
lifetime:
birthdate:
date:1883-7-3
death date:
date:1924-6-3
Each of the XML files produced from the data collection on authors has the root element “authors”. This element name corresponds to the entry in identifiers.config.
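To make the structure of a rule line more concrete, the following is a minimal sketch of how a line such as authors."Franz Kafka" could be broken into its parts: the root element followed by a quoted value. It assumes that segments are separated by dots and that double quotes protect a segment containing spaces or dots; this is only inferred from the example above. The class RulePath and the method segments are hypothetical names and are not part of SearchClient.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: RulePath is a hypothetical helper, not SearchClient code.
public class RulePath {

    // Splits one rule line into segments. Assumption: segments are separated
    // by dots, and double quotes enclose a segment that may contain spaces or dots.
    static List<String> segments(String line) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char ch : line.trim().toCharArray()) {
            if (ch == '"') {
                quoted = !quoted;            // quotes toggle quoting and are not kept
            } else if (ch == '.' && !quoted) {
                out.add(current.toString()); // an unquoted dot ends the segment
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        out.add(current.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segments("authors.\"Franz Kafka\"")); // [authors, Franz Kafka]
        System.out.println(segments("lifetime"));                  // [lifetime]
    }
}
```

In this reading, the first segment names the root element of the XML files to search and the second narrows the search to one author.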
The strength of SearchClient lies in its ability to extract data from different sources at the same time. For example, you can extract items such as car type, vehicle identification number, and date of purchase from all invoices concerning car purchases. With this information you scan all certificates from the car registry authority to get the registration date. Then you retrieve, from all the complaints (emails) you received, those that refer to this car type and fall within six months after the car’s registration.
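The last step of this scenario, selecting the complaints that fall within six months after registration, can be sketched in an application that integrates the client. The sketch assumes the extracted data has already been mapped to plain Java objects; the record types Registration and Complaint, the class ComplaintWindow, and the method withinSixMonths are hypothetical and not part of SearchClient. It also assumes that “six months” means calendar months and that both boundary dates are included.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: these types are hypothetical, not part of myDistiller.
public class ComplaintWindow {

    record Registration(String vin, String carType, LocalDate registeredOn) {}
    record Complaint(String carType, LocalDate receivedOn, String subject) {}

    // Returns the complaints about the registered car type that were received
    // from the registration date up to six calendar months later (inclusive).
    static List<Complaint> withinSixMonths(Registration reg, List<Complaint> complaints) {
        LocalDate end = reg.registeredOn().plusMonths(6);
        return complaints.stream()
                .filter(c -> c.carType().equals(reg.carType()))
                .filter(c -> !c.receivedOn().isBefore(reg.registeredOn()))
                .filter(c -> !c.receivedOn().isAfter(end))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Registration reg = new Registration("VIN123", "Model A", LocalDate.of(2023, 1, 15));
        List<Complaint> all = List.of(
                new Complaint("Model A", LocalDate.of(2023, 3, 1), "Brake noise"),
                new Complaint("Model A", LocalDate.of(2023, 8, 1), "Paint defect"),
                new Complaint("Model B", LocalDate.of(2023, 2, 1), "Radio fault"));
        // Only the first complaint matches both the car type and the time window.
        withinSixMonths(reg, all).forEach(c -> System.out.println(c.subject()));
    }
}
```

Keeping the date logic in one small method makes it easy to change the window or the boundary handling without touching the extraction rules.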
Install the package myDistiller_search_client.jar, preferably at myDistiller’s location. You can best exploit the power of this client if you integrate it into your application. The program search.java can serve as a blueprint.
Please make sure that SearchClient and myDistiller can share the same configuration file, Distiller4.config, and the directories listed in this file.