ctakes lvg - apache/ctakes GitHub Wiki
Adds canonical form of words.
Source class: LvgAnnotator
Source package: org.apache.ctakes.lvg.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section, Base Token
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
CmdCacheFileLocation | File with stored cache of canonical forms | String | No | org/apache/ctakes/lvg/2005_norm.voc |
CmdCacheFrequencyCutoff | Minimum frequency required for loading from cache | int | No | 20 |
ExclusionSet | Words to exclude when doing LVG normalization | String[] | No | |
LemmaCacheFileLocation | Path to lemma cache file (used only if UseLemmaCache and PostLemmas are both true) | String | No | org/apache/ctakes/lvg/2005_lemma.voc |
LemmaCacheFrequencyCutoff | Threshold for the frequency of a lemma to be loaded into the cache | int | No | 20 |
PostLemmas | Whether to extract the lexical variants and write them to the CAS (creates large files) | boolean | No | false |
SegmentsToSkip | Segment IDs to skip during processing | String[] | No | |
UseCmdCache | Use cache to track canonical forms | boolean | No | false |
UseLemmaCache | Whether to use a cache for lemmas | boolean | No | false |
UseSegments | Whether to use segments found in upstream cTAKES components | boolean | No | false |
XeroxTreebankMap | Mapping from Xerox parts of speech to Treebank equivalents | String[] | No | |
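
As a rough illustration of how a frequency cutoff and an exclusion set could interact with a cache of canonical forms, the Java sketch below loads cache entries and looks words up. It is not the LvgAnnotator implementation: the class name `CanonicalFormCacheSketch`, the `frequency|word|canonicalForm` line layout, the case-insensitive matching, and the at-or-above cutoff comparison are all assumptions made for illustration. The constructor arguments only play the roles of CmdCacheFrequencyCutoff and ExclusionSet from the table above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; NOT the actual LvgAnnotator code. */
final class CanonicalFormCacheSketch {

    private final Map<String, String> canonicalByWord = new HashMap<>();
    private final Set<String> excludedWords = new HashSet<>();

    /**
     * @param cacheLines      lines in a HYPOTHETICAL "frequency|word|canonicalForm" layout
     * @param frequencyCutoff plays the role of CmdCacheFrequencyCutoff
     * @param exclusionSet    plays the role of ExclusionSet
     */
    CanonicalFormCacheSketch(List<String> cacheLines, int frequencyCutoff, Set<String> exclusionSet) {
        for (String excluded : exclusionSet) {
            excludedWords.add(excluded.toLowerCase(Locale.ROOT));
        }
        for (String line : cacheLines) {
            String[] fields = line.split("\\|");
            if (fields.length != 3) {
                continue; // skip lines that do not match the assumed layout
            }
            int frequency;
            try {
                frequency = Integer.parseInt(fields[0].trim());
            } catch (NumberFormatException e) {
                continue; // skip lines whose frequency is not a number
            }
            // Assumption: entries at or above the cutoff are loaded.
            if (frequency >= frequencyCutoff) {
                canonicalByWord.put(fields[1].trim().toLowerCase(Locale.ROOT), fields[2].trim());
            }
        }
    }

    /** Returns the cached canonical form, or null if the word is excluded or not cached. */
    String lookup(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        if (excludedWords.contains(key)) {
            return null;
        }
        return canonicalByWord.get(key);
    }

    /** Number of entries that passed the frequency cutoff. */
    int size() {
        return canonicalByWord.size();
    }

    public static void main(String[] args) {
        CanonicalFormCacheSketch cache = new CanonicalFormCacheSketch(
                List.of("25|running|run", "3|lungs|lung"), 20, Set.of("and"));
        System.out.println(cache.lookup("running")); // cached, above the cutoff
        System.out.println(cache.lookup("lungs"));   // below the cutoff, so not loaded
    }
}
```

In this sketch a lookup that returns null would be the point at which a real component falls back to full normalization; how LvgAnnotator actually handles a cache miss is not covered here.
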
Adds canonical form of Base Tokens.
Source class: LvgBaseTokenAnnotator
Source package: org.apache.ctakes.lvg.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Section, Base Token
No available configuration parameters.
Annotates Lexical Variants for terms with attempted thread safety.
Source class: ThreadSafeLvg
Source package: org.apache.ctakes.lvg.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Base Token
No available configuration parameters.