Engine - GateNLP/gateplugin-LearningFramework GitHub Wiki

Implementation: Engine

The Engine class hierarchy is used to represent learning algorithms.

Main functions:

creating: a concrete instance of an Engine is created using the static Engine factory method createEngine(algorithm,algParmString,featureInfo,targetType,dataDirectory). This method takes the algorithm and the algorithm parameters and returns an instance of the matching Engine. The Engine instance usually also contains an initialised instance of CorpusRepresentation.
saving: an Engine is saved using the saveEngine(File) instance method. This saves the info file and any metadata needed to re-create the corpus representation and the trained model stored in that Engine.
loading: a saved Engine is restored using the static Engine method loadEngine(File,String). The static method uses the information in the info file to create an instance of the proper Engine subclass, then runs the instance-specific initAfterLoad(File,String) method to complete initialisation.
getting the CorpusRepresentation: the getCorpusRepresentation() method returns the corpus representation the Engine uses for creating features. In most cases this is a MalletCorpusRepresentation. The Engine may additionally convert this representation to an algorithm-specific representation before running the training or application step (e.g. EngineLibSVM).
training a model: the trainModel(File,String,String) method trains a model from the corpus representation stored in the Engine. This method may already save the model file itself, in which case the internal method for saving a model, as used when saving the Engine, may do nothing.
applying a model: the applyModel(AnnotationSet, AnnotationSet, AnnotationSet, String) method applies the model to instance annotations and returns a list of ModelApplication instances, which describe how to change annotations in a GATE document based on the model predictions.
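
The create/save/load cycle described above can be illustrated with a small, self-contained sketch. This is not the plugin's code: the signatures are deliberately simplified (the real methods take more arguments), and the subclass names, the `engine.info` file name and the one-value-per-line info format are assumptions made purely for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, simplified model of the Engine lifecycle; not the plugin's code.
final class EngineLifecycleSketch {

    abstract static class Engine {
        String algParms = "";
        boolean initialisedAfterLoad = false;

        abstract String algorithmName();

        // Maps an algorithm name to a concrete subclass (assumed mapping).
        private static Engine instantiate(String algorithm) {
            switch (algorithm) {
                case "libsvm": return new LibSvmLikeEngine();
                case "mallet": return new MalletLikeEngine();
                default: throw new IllegalArgumentException("unknown algorithm: " + algorithm);
            }
        }

        // Static factory: the caller never names the subclass directly.
        static Engine createEngine(String algorithm, String algParms) {
            Engine engine = instantiate(algorithm);
            engine.algParms = algParms;
            return engine;
        }

        // Writes an info file that records which subclass to restore later.
        // The file name and format are assumptions for this sketch.
        void saveEngine(Path dir) {
            try {
                Files.createDirectories(dir);
                String info = algorithmName() + "\n" + algParms + "\n";
                Files.write(dir.resolve("engine.info"), info.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Reads the info file, creates the proper subclass, then lets the
        // instance finish its own initialisation.
        static Engine loadEngine(Path dir) {
            try {
                List<String> lines =
                    Files.readAllLines(dir.resolve("engine.info"), StandardCharsets.UTF_8);
                Engine engine = instantiate(lines.get(0));
                engine.algParms = lines.size() > 1 ? lines.get(1) : "";
                engine.initAfterLoad(dir);
                return engine;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Instance-specific completion step; subclasses could restore a model here.
        void initAfterLoad(Path dir) {
            initialisedAfterLoad = true;
        }
    }

    static final class LibSvmLikeEngine extends Engine {
        String algorithmName() { return "libsvm"; }
    }

    static final class MalletLikeEngine extends Engine {
        String algorithmName() { return "mallet"; }
    }

    // Creates an engine, saves it to a temporary directory and loads it back.
    static String roundTrip(String algorithm, String algParms) {
        try {
            Path dir = Files.createTempDirectory("engine-sketch");
            Engine.createEngine(algorithm, algParms).saveEngine(dir);
            Engine restored = Engine.loadEngine(dir);
            return restored.algorithmName() + "|" + restored.algParms
                + "|" + restored.initialisedAfterLoad;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("libsvm", "-c 1.0"));
    }
}
```

The point of the sketch is that the info file, rather than the caller, decides which Engine subclass is instantiated on load, which is why loading is a static method while saving is an instance method.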

Internal working:

creating an engine using the createEngine method passes on control to the following engine-specific methods:
- initWhenCreating is run when an engine is created from scratch (for training)
- initializeAlgorithm is responsible for preparing and instantiating the ML algorithm, if necessary
creating an engine using the loadEngine method passes on control to the following engine-specific methods:
- initWhenLoading is run when an engine is restored from the data directory
- loadAndSetCorpusRepresentation takes care of properly restoring/initialising the corpus representation as part of the initWhenLoading method. This may be a no-op if the engine's classify method does not use a corpus representation
- loadModel takes care of properly restoring/loading the model. This may be a no-op if the engine's classify method does not actually rely on a loaded model (e.g. because a server is used)
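
The hand-off to engine-specific methods listed above follows the template-method pattern, which the following minimal sketch tries to make concrete. The hook names come from the list; everything else is an assumption: the simplified signatures, the two example subclasses, the order of the two creation hooks, and the choice to call loadModel from inside initWhenLoading (the text only states this for loadAndSetCorpusRepresentation).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the hook sequence; not the plugin's code.
final class EngineHooksSketch {

    abstract static class Engine {
        // Records which hooks ran, in order, so the flow can be inspected.
        final List<String> calls = new ArrayList<>();

        // Creation path: engine built from scratch for training.
        static Engine createEngine(Engine fresh) {
            fresh.initWhenCreating();
            fresh.initializeAlgorithm();
            return fresh;
        }

        // Loading path: engine restored from the data directory.
        static Engine loadEngine(Engine fresh) {
            fresh.initWhenLoading();
            return fresh;
        }

        protected void initWhenCreating() { calls.add("initWhenCreating"); }

        protected void initializeAlgorithm() { calls.add("initializeAlgorithm"); }

        // Assumed default: restore the corpus representation, then the model.
        protected void initWhenLoading() {
            calls.add("initWhenLoading");
            loadAndSetCorpusRepresentation();
            loadModel();
        }

        protected abstract void loadAndSetCorpusRepresentation();

        protected abstract void loadModel();
    }

    // An engine that keeps both the corpus representation and the model locally.
    static final class LocalEngine extends Engine {
        @Override protected void loadAndSetCorpusRepresentation() {
            calls.add("loadAndSetCorpusRepresentation");
        }
        @Override protected void loadModel() { calls.add("loadModel"); }
    }

    // An engine that delegates classification to a server: both hooks are no-ops.
    static final class ServerEngine extends Engine {
        @Override protected void loadAndSetCorpusRepresentation() { /* nothing to restore */ }
        @Override protected void loadModel() { /* model lives on the server */ }
    }

    static List<String> createCalls() {
        return Engine.createEngine(new LocalEngine()).calls;
    }

    static List<String> loadCalls(boolean serverBacked) {
        Engine fresh = serverBacked ? new ServerEngine() : new LocalEngine();
        return Engine.loadEngine(fresh).calls;
    }

    public static void main(String[] args) {
        System.out.println(createCalls());
        System.out.println(loadCalls(false));
        System.out.println(loadCalls(true));
    }
}
```

Keeping the two static entry points fixed and pushing variation into overridable hooks means a new engine only has to implement the hooks it needs and can leave the rest as no-ops.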