ctakes coreference - apache/ctakes GitHub Wiki
Performs coreference resolution for several types of coreference, excluding person mentions and some rare pronouns.
The module's output is a set of data types added to the CAS. These types are as follows:
Markable - Subtyped into NEMarkable (named entities), PronounMarkable (pronouns), and DemMarkable (certain demonstrative and relative pronouns). Markables are discovered automatically and taken as input to the coreference resolution algorithm. They are needed in addition to the SHARP entity types because of special considerations around span differences and differing type inheritance.
CoreferenceRelation - A type containing two Markables that are believed to co-refer. A CoreferenceRelation has two arguments of type RelationArgument, with a role field containing a value of either "anaphor" or "antecedent." There is also an "argument" field which contains the Markable fulfilling the role.
CollectionTextRelation - A linked list holding a chain of Annotations that the classifier says refer to the same entity. Chains are derived from the set of CoreferenceRelation elements described above. Each CollectionTextRelation contains a list of UIMA type NonEmptyFSList, as well as a size field. Singletons are lists of length 1; actual chains have a size greater than 1. Each node in the list is of type NonEmptyFSList, which has a head field and a tail field. The head points to the data for the node, which is a Markable, and the tail points to the next element in the list, or to a node of type EmptyFSList when the chain is complete.
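
The chain structure described above can be sketched with plain Python stand-ins. This is only an illustration, not the UIMA Java API: the class names `Markable`, `NonEmptyList` and `EmptyList` and the helper functions are hypothetical, and grouping pairs with union-find is one simple way to derive chains from pairwise relations, not necessarily the way cTAKES does it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Markable:
    """Hypothetical stand-in for a Markable: a span and its covered text."""
    begin: int
    end: int
    text: str


class EmptyList:
    """Stand-in for EmptyFSList: marks the end of a chain."""


@dataclass
class NonEmptyList:
    """Stand-in for NonEmptyFSList: head is the Markable, tail is the rest."""
    head: Markable
    tail: Union["NonEmptyList", EmptyList]


def build_chains(markables, relations):
    """Group markables into chains from (anaphor, antecedent) pairs."""
    parent = {m: m for m in markables}

    def find(m):
        # Follow parent links to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for anaphor, antecedent in relations:
        parent[find(anaphor)] = find(antecedent)

    groups = {}
    for m in markables:
        groups.setdefault(find(m), []).append(m)
    # Markables with no relation end up as singleton chains of length 1.
    return [sorted(g, key=lambda m: (m.begin, m.end)) for g in groups.values()]


def to_linked_list(chain):
    """Build the head/tail list back to front, ending in EmptyList."""
    node = EmptyList()
    for m in reversed(chain):
        node = NonEmptyList(head=m, tail=node)
    return node


def walk(node):
    """Collect Markables by following tail until the EmptyList node."""
    found = []
    while isinstance(node, NonEmptyList):
        found.append(node.head)
        node = node.tail
    return found
```

Walking the list until the empty node is reached is the same pattern a consumer would follow on the real head and tail fields.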
Annotation Engines
Output Writers
Utilities
Piper Files
Abstract Engine to take action on a patient level instead of document level.
Source class: PatientMentionClusterCoreferencer
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.ctakes.core.patient.AbstractPatientConsumer
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
RemovePatient | The Patient Consumer should remove the patient from the cache when finished. | boolean | Yes | true |
EngineName | The Name to use for this Patient Consumer. Must be unique in the pipeline. | String | No | |
Abstract Engine to take action on a patient level instead of document level.
Source class: PatientScoringWriter
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.ctakes.core.patient.AbstractPatientConsumer
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
Config | Descriptive string representing configuration of this run | String | Yes | |
OutputDirectory | Name of chain file in CoNLL format | String | Yes | |
RemovePatient | The Patient Consumer should remove the patient from the cache when finished. | boolean | Yes | true |
EngineName | The Name to use for this Patient Consumer. Must be unique in the pipeline. | String | No | |
Abstract Engine to take action on a patient level instead of document level.
Source class: ThymeAnaforaCrossDocCorefXmlReader
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.ctakes.core.patient.AbstractPatientConsumer
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
IsTraining | Whether this reader is being called at training or test time, and thus whether gold annotations should be put in document or gold CAS | boolean | Yes | |
RemovePatient | The Patient Consumer should remove the patient from the cache when finished. | boolean | Yes | true |
XmlDirectory | Directory containing cross-document coreference annotations | String | Yes | |
EngineName | The Name to use for this Patient Consumer. Must be unique in the pipeline. | String | No | |
Coreference annotator using mention-synchronous paradigm.
Source class: MentionClusterRankingCoreferenceAnnotator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.cleartk.ml.CleartkAnnotator
Dependencies: Base Token, Sentence, Section, Paragraph, Identified Annotation, Markable
Products: Coreference Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
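
The effect of a parameter such as ProbabilityOfKeepingANegativeExample can be pictured as random downsampling of negative training examples. The sketch below is a generic illustration under that assumption, not the annotator's actual code; the function name and the `(features, is_coreferent)` tuple layout are hypothetical.

```python
import random


def downsample_negatives(examples, keep_probability, rng):
    """Keep every positive example; keep each negative with the given probability.

    examples: iterable of (features, is_coreferent) pairs (hypothetical layout).
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    kept = []
    for features, is_coreferent in examples:
        if is_coreferent or rng.random() < keep_probability:
            kept.append((features, is_coreferent))
    return kept


if __name__ == "__main__":
    data = [({"dist": 1}, True), ({"dist": 7}, False), ({"dist": 9}, False)]
    # With probability 0.5 roughly half of the negatives survive on average.
    print(downsample_negatives(data, 0.5, random.Random(0)))
```

Because most candidate pairs in a document are not coreferent, dropping a share of the negatives is a common way to keep the training set balanced and small.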
Coreference annotator using mention-synchronous paradigm.
Source class: MentionClusterCoreferenceAnnotator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.cleartk.ml.CleartkAnnotator
Dependencies: Base Token, Sentence, Section, Identified Annotation, Markable
Products: Coreference Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
SingleDocument | Specify that coreferences should be sought for a single document. | boolean | No | true |
UseExistingEncoders | Whether to use the encoders already in the output directory during data writing, for cases where multiple calls are made | boolean | No | |
Annotates Event Coreferences.
Source class: EventCoreferenceAnnotator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator
Dependencies: Section, Dependency Node, Identified Annotation, Markable
Products: Coreference Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
PararaphSimilarity | Similarity required to pair paragraphs for coreference | double | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
ScoreAllPairs | Whether to score all pairs (as in a feature detector) | boolean | No | |
SentenceDistance | Number of sentences allowed between coreferent mentions | int | No | |
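
A limit like SentenceDistance is typically used to prune candidate mention pairs before classification. The following sketch assumes the distance is the difference between the sentence indexes of the two mentions; the source does not say exactly how the annotator measures it, and the function name and input layout are hypothetical.

```python
from itertools import combinations


def candidate_pairs(mention_sentences, max_sentence_distance):
    """Return (antecedent, anaphor) pairs whose sentences are close enough.

    mention_sentences: list of (mention_text, sentence_index) in document order.
    """
    pairs = []
    for (ante, ante_sent), (ana, ana_sent) in combinations(mention_sentences, 2):
        # Assumption: distance is the difference of the sentence indexes.
        if ana_sent - ante_sent <= max_sentence_distance:
            pairs.append((ante, ana))
    return pairs


if __name__ == "__main__":
    mentions = [("the mass", 0), ("it", 1), ("the lesion", 5)]
    print(candidate_pairs(mentions, 2))
```

A small limit keeps the number of scored pairs roughly linear in document length, at the cost of missing long-distance links.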
Annotates Markables for use by Coreference Annotators.
Source class: DeterministicMarkableAnnotator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section, Sentence, Identified Annotation, Dependency Node, Tree Node, Timex
Products: Markable
No available configuration parameters.
Annotates Markables using a word list.
Source class: MipacqMarkableCreator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Base Token, Chunk
Products: Markable
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
cogVeds | | File | No | org/apache/ctakes/coreference/cogVeds.txt |
modalAdj | | File | No | org/apache/ctakes/coreference/modalAdjs.txt |
otherVerbs | | File | No | org/apache/ctakes/coreference/otherVerbs.txt |
Annotates Markable Salience.
Source class: MarkableSalienceAnnotator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.cleartk.ml.CleartkAnnotator
Dependencies: Paragraph, Sentence, Markable, Dependency Node
Usables: Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
Annotates coreferences between person mentions.
Source class: PersonChainAnnotator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Base Token
Products: Markable, Coreference Relation
No available configuration parameters.
Creates Coreferences using a Support Vector Machine.
Source class: MipacqSvmChainCreator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Document Id, Markable
Products: Coreference Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
FragsFile | | File | No | org/apache/ctakes/coreference/models/frags.txt |
ModelFile | | File | No | org/apache/ctakes/coreference/models/ne.mayo.rbf.model |
StopWords | | File | No | org/apache/ctakes/coreference/models/stop.txt |
Coreference annotator using mention-synchronous paradigm.
Source class: ThreadSafeMentionClusterCoreferencer
Source package: org.apache.ctakes.coreference.concurrent
Parent class: org.apache.ctakes.coreference.ae.MentionClusterCoreferenceAnnotator
Dependencies: Base Token, Sentence, Section, Identified Annotation, Markable
Products: Coreference Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
ProbabilityOfKeepingANegativeExample | probability that a negative example should be retained for training | double | No | |
SingleDocument | Specify that coreferences should be sought for a single document. | boolean | No | true |
UseExistingEncoders | Whether to use the encoders already in the output directory during data writing, for cases where multiple calls are made | boolean | No | |
Annotates Markable Salience.
Source class: ThreadSafeMarkableSalienceAnnotator
Source package: org.apache.ctakes.coreference.concurrent
Parent class: org.apache.ctakes.coreference.ae.MarkableSalienceAnnotator
Dependencies: Paragraph, Sentence, Markable, Dependency Node
Usables: Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
Write ODIE Vector File.
Source class: ODIEVectorFileWriter
Source package: org.apache.ctakes.coreference.cc
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Document Id, Markable
No available configuration parameters.
Copies relations from the gold view CAS to the current one.
Source class: CopyCoreferenceRelations
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Markable, Coreference Relation, Dependency Node
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
Dropout | | boolean | No | |
GoldViewName | View containing gold standard annotations | String | No | |
Writes scores of system coreference chains compared to chains in a Gold View.
Source class: CoreferenceChainScoringOutput
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Markable, Coreference Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
Append | Whether output should be appended or newly created | boolean | Yes | |
Config | Descriptive string representing configuration of this run | String | Yes | |
OutputDirectory | Name of chain file in CoNLL format | String | Yes | |
GoldViewName | Name of gold view in jcas | String | No | |
Expands Markable text spans to cover a noun phrase.
Source class: MipacqMarkableExpander
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Markable, Chunk
No available configuration parameters.
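
As a rough picture of what expanding a Markable to a noun phrase can mean, the sketch below widens a span to the smallest noun-phrase chunk that covers it. It is an assumption-based illustration only: the source does not describe the expander's actual rules, and the tuple layouts, the `"NP"` label and the function name are hypothetical.

```python
def expand_to_noun_phrase(markable, chunks):
    """Widen a (begin, end) span to the smallest covering noun-phrase chunk.

    chunks: list of (begin, end, chunk_type) tuples; "NP" marks noun phrases.
    Returns the original span when no noun phrase covers it.
    """
    begin, end = markable
    covering = [
        (b, e) for b, e, chunk_type in chunks
        if chunk_type == "NP" and b <= begin and end <= e
    ]
    if not covering:
        return markable
    # Prefer the tightest covering chunk so the span grows as little as possible.
    return min(covering, key=lambda span: span[1] - span[0])


if __name__ == "__main__":
    # "the large mass": the markable covers only "mass" (offsets 10-14).
    print(expand_to_noun_phrase((10, 14), [(0, 14, "NP"), (15, 18, "VP")]))
```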
Pairs Markables using a stop word list.
Source class: MipacqMarkablePairGenerator
Source package: org.apache.ctakes.coreference.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence, Markable, Chunk
Usables: Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
StopFile | | File | No | org/apache/ctakes/coreference/models/stop.txt |
Commands and parameters to create a default coreference processing sub-pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters to create a default coreference processing sub-pipeline. }}$
$\textcolor{gray}{\textsf{// This is not a full pipeline. }}$
$\textcolor{gray}{\textsf{// A Dependency Parser is necessary, but is usually added for assertion so don't add one here }}$
$\textcolor{gray}{\textsf{// Constituency Parser adds Terminal Treebank Nodes, needed to create Markables }}$
$\textcolor{green}{\textbf{add}}$ ConstituencyParser
$\textcolor{green}{\textbf{add}}$ DeterministicMarkableAnnotator
$\textcolor{green}{\textbf{addDescription}}$ MarkableSalienceAnnotator /org/apache/ctakes/temporal/models/salience/model.jar
$\textcolor{green}{\textbf{addDescription}}$ MentionClusterCoreferenceAnnotator /org/apache/ctakes/coreference/models/mention-cluster/model.jar
Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences.
$\textcolor{gray}{\textsf{// Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences. }}$
$\textcolor{gray}{\textsf{// Default Relation and Temporal pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ DefaultRelationTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Pipeline with coreference resolution.
$\textcolor{gray}{\textsf{// Pipeline with coreference resolution. }}$
$\textcolor{gray}{\textsf{// Pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ DefaultFastPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Pipeline with degree-of and location-of relations and coreference resolution.
Default Relation Coref Pipeline
$\textcolor{gray}{\textsf{// Pipeline with degree-of and location-of relations and coreference resolution. }}$
$\textcolor{gray}{\textsf{// Pipeline with degree-of and location-of relations }}$
$\textcolor{magenta}{\textbf{load}}$ DefaultRelationPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Pipeline with events, times, temporal relations, document creation time relations and coreferences.
Default Temporal Coref Pipeline
$\textcolor{gray}{\textsf{// Pipeline with events, times, temporal relations, document creation time relations and coreferences. }}$
$\textcolor{gray}{\textsf{// Default Temporal pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ DefaultTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Pipeline with section, paragraph and list detection, degree-of and location-of relations ...
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection, degree-of and location-of relations ... }}$
$\textcolor{gray}{\textsf{// events, times, temporal relations, document creation time relations and coreference resolution. }}$
$\textcolor{gray}{\textsf{// Sectioned Relation and Temporal pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ SectionedRelationTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Pipeline with section, paragraph and list detection and coreference resolution.
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection and coreference resolution. }}$
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection }}$
$\textcolor{magenta}{\textbf{load}}$ SectionedFastPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Pipeline with section, paragraph and list detection, degree-of and location-of relations and coreferences.
Sectioned Relation Coref Pipeline
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection, degree-of and location-of relations and coreferences. }}$
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection with degree-of and location-of relations }}$
$\textcolor{magenta}{\textbf{load}}$ SectionedRelationPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Pipeline with section, paragraph and list detection, events, times, temporal relations, document creation time relations and coreferences.
Sectioned Temporal Coref Pipeline
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection, events, times, temporal relations and document creation time relations. }}$
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection and temporal }}$
$\textcolor{magenta}{\textbf{load}}$ SectionedTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ CorefSubPipe
Commands and parameters to create a thread-safe default coreference processing sub-pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters to create a thread-safe default coreference processing sub-pipeline. }}$
$\textcolor{gray}{\textsf{// This is not a full pipeline. }}$
$\textcolor{gray}{\textsf{// A Dependency Parser is necessary, but is usually added for assertion so don't add one here }}$
$\textcolor{gray}{\textsf{// Constituency Parser adds Terminal Treebank Nodes, needed to create Markables }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeConstituencyParser}}$
$\textcolor{green}{\textbf{add}}$ DeterministicMarkableAnnotator
$\textcolor{green}{\textbf{addDescription}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeMarkableSalienceAnnotator}}$ /org/apache/ctakes/temporal/models/salience/model.jar
$\textcolor{green}{\textbf{addDescription}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeMentionClusterCoreferencer}}$ /org/apache/ctakes/coreference/models/mention-cluster/model.jar
Thread-safe Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences.
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences. }}$
$\textcolor{gray}{\textsf{// Default Relation and Temporal pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ TsDefaultRelationTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe
Thread-safe Pipeline with coreference resolution.
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with coreference resolution. }}$
$\textcolor{gray}{\textsf{// Pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ TsDefaultFastPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe
Thread-safe Pipeline with degree-of and location-of relations and coreference resolution.
Ts Default Relation Coref Pipeline
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with degree-of and location-of relations and coreference resolution. }}$
$\textcolor{gray}{\textsf{// Pipeline with degree-of and location-of relations }}$
$\textcolor{magenta}{\textbf{load}}$ TsDefaultRelationPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe
Thread-safe Pipeline with events, times, temporal relations, document creation time relations and coreferences.
Ts Default Temporal Coref Pipeline
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with events, times, temporal relations, document creation time relations and coreferences. }}$
$\textcolor{gray}{\textsf{// Default Temporal pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ TsDefaultTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe
Thread-safe Pipeline with section, paragraph and list detection, degree-of and location-of relations ...
Ts Sectioned Advanced Pipeline
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with section, paragraph and list detection, degree-of and location-of relations ... }}$
$\textcolor{gray}{\textsf{// events, times, temporal relations, document creation time relations and coreference resolution. }}$
$\textcolor{gray}{\textsf{// Sectioned Relation and Temporal pipeline }}$
$\textcolor{magenta}{\textbf{load}}$ TsSectionedRelationTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe
Thread-safe Pipeline with section, paragraph and list detection and coreference resolution.
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with section, paragraph and list detection and coreference resolution. }}$
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection }}$
$\textcolor{magenta}{\textbf{load}}$ TsSectionedFastPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe
Thread-safe Pipeline with section, paragraph and list detection, degree-of and location-of relations and coreferences.
Ts Sectioned Relation Coref Pipeline
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with section, paragraph and list detection, degree-of and location-of relations and coreferences. }}$
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection with degree-of and location-of relations }}$
$\textcolor{magenta}{\textbf{load}}$ TsSectionedRelationPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe
Thread-safe Pipeline with section, paragraph and list detection, events, times, temporal relations, document creation time relations and coreferences.
Ts Sectioned Temporal Coref Pipeline
$\textcolor{gray}{\textsf{// Thread-safe Pipeline with section, paragraph and list detection, events, times, temporal relations and document creation time relations. }}$
$\textcolor{gray}{\textsf{// Pipeline with section, paragraph and list detection and temporal }}$
$\textcolor{magenta}{\textbf{load}}$ TsSectionedTemporalPipeline
$\textcolor{gray}{\textsf{// Coreference resolution }}$
$\textcolor{magenta}{\textbf{load}}$ TsCorefSubPipe