Adding user defined features - Horsmann/FlexTag GitHub Wiki

In this article we discuss how to define new feature extractors and how to pass parameters and external resources to them.

All feature extractors must extend the superclass FeatureExtractorResource_ImplBase and implement the interface FeatureExtractor. FlexTag calls the method extract(JCas aView, TextClassificationTarget aTarget) for each target. The TextClassificationTarget parameter refers to the text segment for which the feature extractor is called, i.e. the current token.
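To make this calling convention concrete, here is a framework-free sketch. It does not use DKPro TC: the interface TokenFeatureExtractor, the class ExtractionLoopSketch and its driver loop are hypothetical stand-ins that only mimic how the framework invokes extract once for every classification target (here simply every token string), while the extractor itself only looks at the target it is given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ExtractionLoopSketch {

    // Hypothetical stand-in for FeatureExtractor: called once per target, returns feature name -> value
    interface TokenFeatureExtractor {
        Map<String, Integer> extract(String target);
    }

    // Hypothetical driver mimicking the framework: calls extract for every token in order
    static List<Map<String, Integer>> runOverTokens(List<String> tokens, TokenFeatureExtractor extractor) {
        List<Map<String, Integer>> perToken = new ArrayList<>();
        for (String token : tokens) {
            perToken.add(extractor.extract(token));
        }
        return perToken;
    }

    public static void main(String[] args) {
        TokenFeatureExtractor length = t -> Map.of("tokenLength", t.length());
        System.out.println(runOverTokens(List.of("I", "love", "#NLP"), length));
        // prints [{tokenLength=1}, {tokenLength=4}, {tokenLength=4}]
    }
}
```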

A simple feature extractor

This is one of the simplest feature extractors. It sets a boolean feature value if a word starts with the # character, which social media such as Twitter use to mark a word as a hashtag. FlexTag already defines this extractor, so you can use it directly without writing it yourself.

Every feature extractor uses a unique feature name; no other feature extractor is supposed to create a feature with the same name. The extract method returns a set of features, so a feature extractor can define more than one feature for each text classification target. The example below sets only one feature.

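import java.util.HashSet;
import java.util.Set;

import org.apache.uima.jcas.JCas;
// DKPro TC imports; package names assume DKPro TC 0.9 or later
import org.dkpro.tc.api.exception.TextClassificationException;
import org.dkpro.tc.api.features.Feature;
import org.dkpro.tc.api.features.FeatureExtractor;
import org.dkpro.tc.api.features.FeatureExtractorResource_ImplBase;
import org.dkpro.tc.api.type.TextClassificationTarget;

public class IsHashtag
    extends FeatureExtractorResource_ImplBase
    implements FeatureExtractor
{
    /* Unique feature name; no other extractor may produce a feature called "isHashtag" */
    public static final String FEATURE_NAME = "isHashtag";

    @Override
    public Set<Feature> extract(JCas aView, TextClassificationTarget aTarget)
        throws TextClassificationException
    {
        String tokenText = aTarget.getCoveredText();
        boolean isHashtag = tokenText.startsWith("#");
        // Boolean feature encoded as 1 (hashtag) or 0 (no hashtag)
        Feature feature = new Feature(FEATURE_NAME, isHashtag ? 1 : 0);

        Set<Feature> features = new HashSet<Feature>();
        features.add(feature);
        return features;
    }
}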
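Since extract returns a set, a single extractor may emit several features per target, as long as every name is unique. The plain-Java sketch below illustrates this; the class SocialMediaMarkers, the mention feature and the use of a Map keyed by feature name (instead of DKPro TC Feature objects) are assumptions made for illustration. The names carry a common prefix so they cannot clash with the isHashtag feature of IsHashtag if both extractors are used together.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SocialMediaMarkers {

    // Hypothetical feature names, prefixed so they stay unique across all extractors in use
    static final String HASHTAG = "socialMedia_hashtag";
    static final String MENTION = "socialMedia_mention";

    // Returns two features for one token, encoded as 1/0 like the IsHashtag example
    static Map<String, Integer> extract(String tokenText) {
        Map<String, Integer> features = new LinkedHashMap<>();
        features.put(HASHTAG, tokenText.startsWith("#") ? 1 : 0);
        features.put(MENTION, tokenText.startsWith("@") ? 1 : 0);
        return features;
    }

    public static void main(String[] args) {
        System.out.println(extract("#nlp"));  // {socialMedia_hashtag=1, socialMedia_mention=0}
        System.out.println(extract("@user")); // {socialMedia_hashtag=0, socialMedia_mention=1}
    }
}
```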

Adding a parameter

If you want to pass resources to your feature extractor or make it configurable with (optional) parameters, declare the following lines for each parameter you wish to add:

public static final String PARAM_LOWER_CASE = "listUseLowCase";
@ConfigurationParameter(name = PARAM_LOWER_CASE, mandatory = false, defaultValue = "false")
Boolean lowerCase;

Set mandatory to true if the user must always provide the parameter. For an optional parameter you can also define a defaultValue, which is assigned automatically when the user does not provide the parameter; use it whenever the field must be initialised with a non-null value. The name attribute must refer to the parameter's own constant (here PARAM_LOWER_CASE). The field lowerCase receives the parameter value and can then be used anywhere in the feature extractor.
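uimaFIT injects the value into the annotated field for you. Purely to illustrate what mandatory and defaultValue mean, the sketch below resolves a parameter from a plain map of user-supplied values. The class ParameterResolutionSketch and its resolve method are hypothetical; this is not how uimaFIT is implemented internally.

```java
import java.util.Map;

public class ParameterResolutionSketch {

    // Hypothetical resolver: the user value wins, else the default, else fail if mandatory
    static String resolve(Map<String, String> userValues, String name,
                          boolean mandatory, String defaultValue) {
        String value = userValues.get(name);
        if (value != null) {
            return value;
        }
        if (defaultValue != null) {
            return defaultValue;
        }
        if (mandatory) {
            throw new IllegalStateException("Missing mandatory parameter: " + name);
        }
        return null; // optional and no default: the field stays null
    }

    public static void main(String[] args) {
        Map<String, String> user = Map.of("listLocation", "/tmp/keywords.txt");
        // Optional parameter not given by the user: falls back to the default "false"
        boolean lowerCase = Boolean.parseBoolean(resolve(user, "listUseLowCase", false, "false"));
        System.out.println(lowerCase); // false
        System.out.println(resolve(user, "listLocation", true, null)); // /tmp/keywords.txt
    }
}
```

This is why a Boolean field such as lowerCase should get a default: without one, reading it while it is still null would fail when Java unboxes it.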

Loading resources

A feature extractor can override the optional initialisation method initialize(), which is called when the feature extractor resource is initialised. Override it when you need to load resources from your file system.

A feature extractor that receives a file location as a parameter and loads a list from that file could look like this:

/* Parameter that receives the file location */
public static final String PARAM_LIST_LOCATION = "listLocation";
@ConfigurationParameter(name = PARAM_LIST_LOCATION, mandatory = true)
File inputFile;

@Override
public boolean initialize(ResourceSpecifier aSpecifier, Map<String, Object> aAdditionalParams)
    throws ResourceInitializationException
{
    if (!super.initialize(aSpecifier, aAdditionalParams)) {
        return false;
    }
    // TODO: Read data from the provided input file into list
    return true;
}
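The TODO above is left to you. One stdlib-only way to fill it is sketched below: it reads a UTF-8 file line by line into a Set, skipping blank lines and optionally lower-casing the entries. The class KeywordListLoader and its load method are hypothetical; inside a real extractor you would call similar code from initialize and wrap the IOException in a ResourceInitializationException.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordListLoader {

    // Reads one keyword per line; surrounding whitespace is trimmed and blank lines are skipped
    static Set<String> load(Path file, boolean lowerCase) throws IOException {
        Set<String> keywords = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue;
            }
            keywords.add(lowerCase ? entry.toLowerCase(Locale.ROOT) : entry);
        }
        return keywords;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary keyword file, load it, then clean up
        Path tmp = Files.createTempFile("keywords", ".txt");
        Files.write(tmp, List.of("Berlin", "", "  Paris "), StandardCharsets.UTF_8);
        Set<String> keywords = load(tmp, true);
        System.out.println(keywords.contains("paris")); // true
        Files.delete(tmp);
    }
}
```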

Full example with resources

The example below uses both parameters introduced above: one that switches the lookup to lower case and one that provides the resource file to load.

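import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

import org.apache.commons.io.FileUtils;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.ResourceSpecifier;
// DKPro TC imports; package names assume DKPro TC 0.9 or later
import org.dkpro.tc.api.exception.TextClassificationException;
import org.dkpro.tc.api.features.Feature;
import org.dkpro.tc.api.features.FeatureExtractor;
import org.dkpro.tc.api.features.FeatureExtractorResource_ImplBase;
import org.dkpro.tc.api.type.TextClassificationTarget;

public class IsKeyWordList
    extends FeatureExtractorResource_ImplBase
    implements FeatureExtractor
{
    static final String FEATURE_NAME = "resourceDemo_";

    /* Location of the keyword file, one keyword per line */
    public static final String PARAM_LIST_LOCATION = "listLocation";
    @ConfigurationParameter(name = PARAM_LIST_LOCATION, mandatory = true)
    File inputFile;

    /* If true, both the keywords and the token are lower-cased before the lookup */
    public static final String PARAM_LOWER_CASE = "listUseLowCase";
    @ConfigurationParameter(name = PARAM_LOWER_CASE, mandatory = false, defaultValue = "false")
    Boolean lowerCase;

    /* A set gives constant-time lookups, unlike List.contains */
    Set<String> keywords = new HashSet<>();

    @Override
    public boolean initialize(ResourceSpecifier aSpecifier, Map<String, Object> aAdditionalParams)
        throws ResourceInitializationException
    {
        if (!super.initialize(aSpecifier, aAdditionalParams)) {
            return false;
        }
        try {
            for (String line : FileUtils.readLines(inputFile, "UTF-8")) {
                keywords.add(lowerCase ? line.toLowerCase(Locale.ROOT) : line);
            }
        }
        catch (IOException e) {
            // Keep the cause so the user can see which file failed and why
            throw new ResourceInitializationException(e);
        }
        return true;
    }

    @Override
    public Set<Feature> extract(JCas view, TextClassificationTarget aTarget)
        throws TextClassificationException
    {
        String text = aTarget.getCoveredText();
        if (lowerCase) {
            text = text.toLowerCase(Locale.ROOT);
        }

        boolean isKeyword = keywords.contains(text);
        Set<Feature> features = new HashSet<Feature>();
        features.add(new Feature(FEATURE_NAME, isKeyword ? 1 : 0));
        return features;
    }
}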
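A design point worth noting in IsKeyWordList is that lower-casing must be applied to both sides of the comparison: if only the token is lower-cased, a list entry such as Berlin can never match. The sketch below isolates that lookup in plain Java so it can be unit-tested without a JCas; the class KeywordMatcher and its featureValue method are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordMatcher {

    private final Set<String> keywords = new HashSet<>();
    private final boolean lowerCase;

    // Normalises the list once, up front, so each lookup is a single hash probe
    KeywordMatcher(List<String> entries, boolean lowerCase) {
        this.lowerCase = lowerCase;
        for (String entry : entries) {
            keywords.add(normalise(entry));
        }
    }

    // Applies the same normalisation to list entries and to tokens
    private String normalise(String s) {
        return lowerCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    // Same 1/0 encoding as the feature value in IsKeyWordList
    int featureValue(String tokenText) {
        return keywords.contains(normalise(tokenText)) ? 1 : 0;
    }

    public static void main(String[] args) {
        KeywordMatcher caseInsensitive = new KeywordMatcher(List.of("Berlin"), true);
        KeywordMatcher caseSensitive = new KeywordMatcher(List.of("Berlin"), false);
        System.out.println(caseInsensitive.featureValue("BERLIN")); // 1
        System.out.println(caseSensitive.featureValue("BERLIN"));   // 0
    }
}
```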