ctakes relation extractor - apache/ctakes GitHub Wiki
The relation extractor is designed to annotate relations between certain Event, Entity and Modifier annotations.
Models are currently available for detecting body site and severity relations. They were trained with machine learning on manually annotated clinical data.
Collection Readers
Annotation Engines
Utilities
Piper Files
Reads document texts and annotations from XMI files specified in a provided list.
Source class: XMIReader
Source package: org.apache.ctakes.relationextractor.eval
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
files | The XMI files to be loaded | List | Yes |
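As a rough illustration of preparing a value for the `files` parameter, the sketch below collects the XMI files in one directory into a list. The class `XmiFileLister` and its method are hypothetical helpers, not part of cTAKES, and the element type `File` is an assumption, since the table only states that the parameter is a `List`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of cTAKES: gathers XMI files into a list.
class XmiFileLister {

    // Returns the regular files directly inside `dir` whose names end in ".xmi"
    // (case-insensitive), sorted by path. Subdirectories are not searched.
    static List<File> listXmiFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".xmi"))
                    .sorted()
                    .map(Path::toFile)
                    .collect(Collectors.toList());
        }
    }
}
```

The resulting list could then be handed to whatever code configures the reader; how the reader is constructed is not shown here because it depends on the cTAKES and uimaFIT versions in use.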
Annotates Causal relations in sentences.
Source class: CausesBringsAboutRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Generic Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
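The `ProbabilityOfKeepingANegativeExample` parameter describes down-sampling of negative training examples. The sketch below shows the general technique under stated assumptions: `NegativeExampleSampler` is a hypothetical class, the fixed seed is only there for reproducibility, and the actual cTAKES implementation may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of negative-example down-sampling; not cTAKES code.
class NegativeExampleSampler {
    private final double probabilityOfKeepingANegativeExample;
    private final Random random;

    NegativeExampleSampler(double probability, long seed) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probabilityOfKeepingANegativeExample = probability;
        this.random = new Random(seed);
    }

    // Positive examples are always kept; a negative example is kept
    // only when a random draw falls below the configured probability.
    boolean keep(boolean isPositive) {
        return isPositive || random.nextDouble() < probabilityOfKeepingANegativeExample;
    }

    // Filters a list of labels (true = relation present, false = no relation).
    List<Boolean> sample(List<Boolean> labels) {
        List<Boolean> kept = new ArrayList<>();
        for (boolean label : labels) {
            if (keep(label)) {
                kept.add(label);
            }
        }
        return kept;
    }
}
```

With a probability of 1.0 every example is retained, and with 0.0 only positive examples survive. Values in between trade training-set size against class balance.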
Annotates Degree Of relations.
Source class: DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Degree Of relations in sentences containing a single entity mention of a valid degree_of type and a single modifier.
Source class: Baseline1DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.DegreeOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
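The first degree-of baseline described above is a simple rule: link the entity and the modifier only when a sentence contains exactly one of each. A minimal sketch of that rule follows. `Span`, `Relation` and `SingleArgumentBaseline` are hypothetical stand-ins for the real annotation types, and the check that the entity has a valid degree_of type is assumed to have happened before the lists are built.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of the "one entity, one modifier" rule; not cTAKES code.
class SingleArgumentBaseline {

    // Minimal stand-in for a text annotation with character offsets.
    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Minimal stand-in for a degree-of relation between an entity and a modifier.
    static final class Relation {
        final Span entity;
        final Span modifier;

        Relation(Span entity, Span modifier) {
            this.entity = entity;
            this.modifier = modifier;
        }
    }

    // Returns a relation only when the sentence holds exactly one candidate
    // entity and exactly one modifier; otherwise no relation is proposed.
    static Optional<Relation> link(List<Span> entitiesInSentence, List<Span> modifiersInSentence) {
        if (entitiesInSentence.size() == 1 && modifiersInSentence.size() == 1) {
            return Optional.of(new Relation(entitiesInSentence.get(0), modifiersInSentence.get(0)));
        }
        return Optional.empty();
    }
}
```

For the sentence "severe pain", the single entity "pain" and the single modifier "severe" would be linked. A sentence with two candidate entities yields no relation under this rule.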
Annotates Degree Of relations between two shortest-distance entities in sentences with multiple modifiers.
Source class: Baseline2DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Degree Of relations between two shortest-distance entities in sentences as long as there is no intervening modifier.
Source class: Baseline3DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Degree Of relations between two entities whenever they are enclosed within the same noun phrase.
Source class: Baseline4DegreeOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.DegreeOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Location Of relations.
Source class: LocationOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Location Of relations.
Source class: ThreadSafeLocationExtractor
Source package: org.apache.ctakes.relationextractor.concurrent
Parent class: org.apache.ctakes.relationextractor.ae.LocationOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Location Of relations in sentences containing exactly two entities (where the entities are of the correct types).
Source class: Baseline1EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Location Of relations in sentences containing multiple anatomical sites.
Source class: Baseline2EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Links each anatomical site with the closest entity of a type that's suitable for location_of, as long as there is no intervening anatomical site.
Source class: Baseline3EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
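The third location-of baseline links each anatomical site to its closest suitable entity unless another anatomical site sits between them. The sketch below is one way to express that rule with character offsets. `Span` and `ClosestEntityBaseline` are hypothetical names, distance is measured here as the character gap between spans (an assumption; the real annotator may measure it differently), and the entity list is assumed to already contain only types suitable for location_of.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "closest entity, no intervening site" rule; not cTAKES code.
class ClosestEntityBaseline {

    // Minimal stand-in for a text annotation with character offsets.
    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Character gap between two spans; 0 if they touch or overlap.
    static int distance(Span a, Span b) {
        if (a.end <= b.begin) return b.begin - a.end;
        if (b.end <= a.begin) return a.begin - b.end;
        return 0;
    }

    // True if `candidate` lies entirely in the gap between `a` and `b`.
    static boolean isBetween(Span candidate, Span a, Span b) {
        Span left = a.begin <= b.begin ? a : b;
        Span right = left == a ? b : a;
        return candidate.begin >= left.end && candidate.end <= right.begin;
    }

    // Returns {entity, site} pairs for every site whose closest entity
    // is not separated from it by another anatomical site.
    static List<Span[]> link(List<Span> sites, List<Span> entities) {
        List<Span[]> pairs = new ArrayList<>();
        for (Span site : sites) {
            Span closest = null;
            for (Span entity : entities) {
                if (closest == null || distance(site, entity) < distance(site, closest)) {
                    closest = entity;
                }
            }
            if (closest == null) continue; // no candidate entity in the sentence
            boolean blocked = false;
            for (Span other : sites) {
                if (other != site && isBetween(other, site, closest)) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) pairs.add(new Span[] { closest, site });
        }
        return pairs;
    }
}
```

In "pain in the left arm and leg", the site "arm" is linked to "pain", while "leg" is not, because "arm" lies between "leg" and its closest entity.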
Annotates Location Of relations between two entities whenever they are enclosed within the same noun phrase.
Source class: Baseline4EntityMentionPairRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae.baselines
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Location Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
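The fourth baseline depends on a containment test: two entities are linked when a single noun phrase covers both of them. A minimal sketch of that test follows. `Span` and `SameNounPhraseBaseline` are hypothetical names, and how the noun phrase spans are obtained is left open because the page does not say.

```java
import java.util.List;

// Hypothetical illustration of the "same noun phrase" test; not cTAKES code.
class SameNounPhraseBaseline {

    // Minimal stand-in for a text annotation with character offsets.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        // True if this span fully encloses the other span.
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    // True if at least one noun phrase encloses both arguments.
    static boolean inSameNounPhrase(Span first, Span second, List<Span> nounPhrases) {
        for (Span nounPhrase : nounPhrases) {
            if (nounPhrase.covers(first) && nounPhrase.covers(second)) {
                return true;
            }
        }
        return false;
    }
}
```

A relation would be proposed only for argument pairs where `inSameNounPhrase` returns true.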
Annotates Manages / Treats relations.
Source class: ManagesTreatsRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Generic Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Manifestation Of relations.
Source class: ManifestationOfRelationExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Generic Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Modifiers and Chunks.
Source class: ModifierExtractorAnnotator
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.cleartk.ml.CleartkAnnotator
Dependencies: Base Token, Sentence
Products: Identified Annotation, Chunk
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No |
Annotates Degree Of relations.
Source class: ThreadSafeDegreeExtractor
Source package: org.apache.ctakes.relationextractor.concurrent
Parent class: org.apache.ctakes.relationextractor.ae.DegreeOfRelationExtractorAnnotator
Dependencies: Sentence, Identified Annotation
Products: Degree Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No |
Annotates Modifiers and Chunks.
Source class: ThreadSafeModifierExtractor
Source package: org.apache.ctakes.relationextractor.concurrent
Parent class: org.apache.ctakes.relationextractor.ae.ModifierExtractorAnnotator
Dependencies: Base Token, Sentence
Products: Identified Annotation, Chunk
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No |
Reads annotations from DeepPhe schema Anafora XML files in a directory.
Source class: MetastasisAnaforaXMLReader
Source package: org.apache.ctakes.relationextractor.metastasis
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Products: Identified Annotation, Location Relation
No available configuration parameters.
Copies an annotation type from the Gold view to the System view.
Source class: CopyFromGold
Source package: org.apache.ctakes.relationextractor.eval
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
AnnotationClasses | | Class[] | Yes | |
GoldViewName | | String | Yes | |
Counts various statistics, such as token and relation counts, based on the gold standard data.
Source class: GoldAnnotationStatsCalculator
Source package: org.apache.ctakes.relationextractor.data
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence, Base Token, Identified Annotation, Generic Relation, Location Relation, Degree Relation
No available configuration parameters.
Enlarges the text span of an identified annotation based upon part of speech.
Source class: IdentifiedAnnotationExpander
Source package: org.apache.ctakes.relationextractor.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Identified Annotation
No available configuration parameters.
Clinical Pipeline with degree-of and location-of relations.
$\textcolor{gray}{\textsf{// Clinical Pipeline with degree-of and location-of relations. }}$
$\textcolor{gray}{\textsf{// Default Clinical Pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ DefaultFastPipeline
$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ RelationSubPipe
Commands and parameters to create a default relation extraction sub-pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters to create a default relation extraction sub-pipeline. }}$
$\textcolor{gray}{\textsf{// This is not a full pipeline. }}$
$\textcolor{gray}{\textsf{// Modifiers. Use addLogged to log start and finish of processing. There aren't default models, so set specifically }}$
$\textcolor{green}{\textbf{add}}$ ModifierExtractorAnnotator $\textcolor{purple}{\textbf{classifierJarPath}}$ =$\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/modifier\_extractor/model.jar}}$
$\textcolor{gray}{\textsf{// Degree of severity, etc. }}$
$\textcolor{green}{\textbf{add}}$ DegreeOfRelationExtractorAnnotator $\textcolor{purple}{\textbf{classifierJarPath}}$ =$\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/degree\_of/model.jar}}$
$\textcolor{gray}{\textsf{// Location. }}$
$\textcolor{green}{\textbf{add}}$ LocationOfRelationExtractorAnnotator $\textcolor{purple}{\textbf{classifierJarPath}}$ =$\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/location\_of/model.jar}}$
Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations
$\textcolor{gray}{\textsf{// Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations }}$
$\textcolor{gray}{\textsf{// Default Clinical Pipeline with section, paragraph and list detection }}$
$\textcolor{magenta}{\textbf{load}}$ SectionedFastPipeline
$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ RelationSubPipe
Thread Safe Default Clinical Pipeline with degree-of and location-of relations
$\textcolor{gray}{\textsf{// Thread Safe Default Clinical Pipeline with degree-of and location-of relations }}$
$\textcolor{gray}{\textsf{// Default Clinical Pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ TsDefaultFastPipeline
$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ TsRelationSubPipe
Commands and parameters to create a relation extraction sub-pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters to create a relation extraction sub-pipeline. }}$
$\textcolor{gray}{\textsf{// This is not a full pipeline. }}$
$\textcolor{gray}{\textsf{// Modifiers. Use addLogged to log start and finish of processing. There aren't default models, so set specifically }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeModifierExtractor}}$ $\textcolor{purple}{\textbf{classifierJarPath}}$ =$\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/modifier\_extractor/model.jar}}$
$\textcolor{gray}{\textsf{// Degree of severity, etc. }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeDegreeExtractor}}$ $\textcolor{purple}{\textbf{classifierJarPath}}$ =$\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/degree\_of/model.jar}}$
$\textcolor{gray}{\textsf{// Location. }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeLocationExtractor}}$ $\textcolor{purple}{\textbf{classifierJarPath}}$ =$\textcolor{violet}{\textsf{/org/apache/ctakes/relation/extractor/models/location\_of/model.jar}}$
Thread Safe Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations.
Ts Sectioned Relation Pipeline
$\textcolor{gray}{\textsf{// Thread Safe Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations. }}$
$\textcolor{gray}{\textsf{// Default Clinical Pipeline with section, paragraph and list detection }}$
$\textcolor{magenta}{\textbf{load}}$ TsSectionedFastPipeline
$\textcolor{gray}{\textsf{// degree-of, relation-of }}$
$\textcolor{magenta}{\textbf{load}}$ TsRelationSubPipe