ctakes lvg - apache/ctakes GitHub Wiki
Adds the canonical form of words.
Source class: LvgAnnotator
Source package: org.apache.ctakes.lvg.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section, Base Token
| Parameter | Description | Class | Required | Default |
|---|---|---|---|---|
| CmdCacheFileLocation | File containing the stored cache of canonical forms | String | No | org/apache/ctakes/lvg/2005_norm.voc |
| CmdCacheFrequencyCutoff | Minimum frequency an entry needs to be loaded from the cache | int | No | 20 |
| ExclusionSet | Words to exclude from LVG normalization | String[] | No | |
| LemmaCacheFileLocation | Path to the lemma cache file; used only when UseLemmaCache and PostLemmas are both true | String | No | org/apache/ctakes/lvg/2005_lemma.voc |
| LemmaCacheFrequencyCutoff | Minimum frequency a lemma needs to be loaded into the cache | int | No | 20 |
| PostLemmas | Whether to extract lexical variants and write them to the CAS (creates large files) | boolean | No | false |
| SegmentsToSkip | Segment IDs to skip during processing | String[] | No | |
| UseCmdCache | Whether to use the cache of canonical forms | boolean | No | false |
| UseLemmaCache | Whether to use a cache for lemmas | boolean | No | false |
| UseSegments | Whether to use segments found by upstream cTAKES components | boolean | No | false |
| XeroxTreebankMap | Mapping from Xerox parts of speech to their Treebank equivalents | String[] | No | |
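The cache and exclusion parameters above can be pictured with a small, stdlib-only sketch. This is a hypothetical illustration, not the LvgAnnotator source: it assumes cache lines have already been parsed into (word, canonical form, frequency) triples, loads only entries whose frequency reaches the cutoff (in the spirit of CmdCacheFrequencyCutoff), returns words from an ExclusionSet-style set unchanged, and defers cache misses to a stand-in function in place of a real LVG call. The names `CanonicalFormCache`, `CacheEntry`, and `lvgFallback` are invented for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical frequency-filtered canonical-form cache; NOT the cTAKES implementation. */
class CanonicalFormCache {

    /** One pre-parsed cache line (assumed format): word, canonical form, frequency. */
    static final class CacheEntry {
        final String word;
        final String canonical;
        final int frequency;

        CacheEntry(String word, String canonical, int frequency) {
            this.word = word;
            this.canonical = canonical;
            this.frequency = frequency;
        }
    }

    private final Map<String, String> cache = new HashMap<>();
    private final Set<String> exclusionSet;
    private final Function<String, String> lvgFallback; // stand-in for a real LVG lookup

    CanonicalFormCache(List<CacheEntry> entries, int frequencyCutoff,
                       Set<String> exclusionSet, Function<String, String> lvgFallback) {
        // Load only entries whose frequency is at or above the cutoff.
        for (CacheEntry e : entries) {
            if (e.frequency >= frequencyCutoff) {
                cache.put(e.word, e.canonical);
            }
        }
        this.exclusionSet = exclusionSet;
        this.lvgFallback = lvgFallback;
    }

    /** Canonical form of a word; excluded words are returned unchanged (assumption). */
    String canonicalForm(String word) {
        if (exclusionSet.contains(word)) {
            return word;
        }
        String cached = cache.get(word);
        // Cache hit: use the stored form. Cache miss: ask the stand-in normalizer.
        return cached != null ? cached : lvgFallback.apply(word);
    }

    /** Number of entries that survived the frequency cutoff. */
    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        List<CacheEntry> entries = List.of(
                new CacheEntry("Kidneys", "kidney", 150),
                new CacheEntry("rarer", "rare", 3));
        CanonicalFormCache c = new CanonicalFormCache(
                entries, 20, Set.of("was"), String::toLowerCase);
        System.out.println(c.canonicalForm("Kidneys")); // kidney (cache hit)
        System.out.println(c.canonicalForm("rarer"));   // rarer (below cutoff, so fallback)
        System.out.println(c.canonicalForm("was"));     // was (excluded)
    }
}
```

Raising the cutoff shrinks the in-memory cache at the cost of more fallback lookups; how the real annotator balances this is not described on this page.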
Adds the canonical form of Base Tokens.
Source class: LvgBaseTokenAnnotator
Source package: org.apache.ctakes.lvg.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Section, Base Token
No available configuration parameters.
Annotates lexical variants of terms, with attempted (not guaranteed) thread safety.
Source class: ThreadSafeLvg
Source package: org.apache.ctakes.lvg.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Base Token
No available configuration parameters.
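This page does not say how ThreadSafeLvg achieves its thread safety. One common pattern, shown here purely as an assumption, is to share a single non-thread-safe delegate and serialize every call through a lock. `UnsafeNormalizer` and `LockedNormalizer` are hypothetical stand-ins, not cTAKES or LVG classes, and `toLowerCase` stands in for real LVG normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical normalizer that is NOT safe for concurrent use (shared scratch buffer). */
class UnsafeNormalizer {
    private final StringBuilder scratch = new StringBuilder();

    String normalize(String word) {
        scratch.setLength(0);
        scratch.append(word.toLowerCase()); // stand-in for real normalization
        return scratch.toString();
    }
}

/** Hypothetical wrapper: one shared delegate, every call serialized by a single lock. */
final class LockedNormalizer {
    private static final UnsafeNormalizer DELEGATE = new UnsafeNormalizer();
    private static final ReentrantLock LOCK = new ReentrantLock();

    static String normalize(String word) {
        LOCK.lock();
        try {
            return DELEGATE.normalize(word);
        } finally {
            LOCK.unlock(); // always release, even if normalize throws
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String word = "Word" + i;
            // Each task checks that it got back its own word, not another thread's.
            results.add(pool.submit(() -> normalize(word).equals(word.toLowerCase())));
        }
        int ok = 0;
        for (Future<Boolean> f : results) {
            if (f.get()) {
                ok++;
            }
        }
        pool.shutdown();
        System.out.println(ok + " of " + results.size() + " calls returned the expected form");
    }
}
```

Serializing calls keeps a single copy of a possibly expensive resource at the cost of throughput; giving each thread its own delegate (for example via `ThreadLocal`) is the usual alternative when memory allows.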