ctakes core - apache/ctakes GitHub Wiki
Contains code and resources required by all or most other cTAKES modules.
Collection Readers
Annotation Engines
Output Writers
Utilities
Piper Files
Reads document texts from text files in a directory tree.
Source class: FileTreeReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.ctakes.core.cr.AbstractFileTreeReader
Products: Document Id, Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
InputDirectory | Directory for all input files. | String | Yes | |
CRtoSpace | Change Windows-format CR + LF character sequences to LF + space. | boolean | No | |
Encoding | The character encoding used by the input files. | String | No | |
Extensions | The extensions of the files that the collection reader will read. | String[] | No | * |
KeepCR | Keep Windows-format carriage return characters at line endings. This only keeps existing characters; it does not add them. | boolean | No | |
PatientLevel | The level in the directory hierarchy at which patient identifiers exist. The default value is 1, directly under the root input directory. | int | No | |
StripQuotes | Replace document-enclosing quote characters with space characters. | boolean | No | |
WriteBanner | Write a large banner at each major step of the pipeline. | String | No | no |
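How KeepCR and CRtoSpace might interact can be pictured with a small, library-free sketch. The class and method names are hypothetical, and the logic (including the precedence of KeepCR over CRtoSpace) is an assumption drawn only from the parameter descriptions above, not from the reader's source code.

```java
// Hypothetical sketch; not the cTAKES implementation.
final class LineEndingSketch {
    // Applies an assumed line-ending policy to raw file text.
    static String normalize(String text, boolean keepCr, boolean crToSpace) {
        if (keepCr) {
            // Existing CR characters are kept as they are; none are added.
            return text;
        }
        if (crToSpace) {
            // CR + LF becomes LF + space, so the text length is unchanged.
            return text.replace("\r\n", "\n ");
        }
        // Otherwise drop the CR of each CR + LF pair.
        return text.replace("\r\n", "\n");
    }
}
```

Replacing two characters with two characters, as in the CRtoSpace branch, would keep character offsets aligned with the original file, which may be why such an option exists.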
Reads document texts from text files in a directory, repeating for a number of iterations.
Source class: FilesInDirectoryCollectionCyclicalReads
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader
Products: Document Id
No available configuration parameters.
Reads document texts from text files in a directory.
Source class: FilesInDirectoryCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.collection.CollectionReader_ImplBase
Products: Document Id
No available configuration parameters.
Reads document texts from the fields of a database table.
Source class: JdbcNotesReader
Source package: org.apache.ctakes.core.cr.jdbc
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
DbDriver | JDBC driver ClassName. | String | Yes | |
DbPass | Password for database authentication. | String | Yes | |
DbUrl | JDBC URL that specifies database network location and name. | String | Yes | |
DbUser | Username for database authentication. | String | Yes | |
DocColumn | Name of column that contains the document text. | String | Yes | |
SqlStatement | SQL statement to retrieve the document. | String | Yes | |
BirthColumn | Name of column that contains the patient birth date. | String | No | |
DateColumn | Name of column that contains the document original date. | String | No | |
DbDecryptor | JDBC decryptor ClassName. | String | No | |
DeathColumn | Name of column that contains the patient death date. | String | No | |
DecryptPass | Password for text decryption. | String | No | |
EncounterIdColumn | Name of column that contains the encounter id. | String | No | |
FirstNameColumn | Name of column that contains the patient first name. | String | No | |
FirstSoundexColumn | Name of column that contains the patient first name soundex. | String | No | |
GenderColumn | Name of column that contains the patient gender. | String | No | |
IdColumns | Specifies column names that will be used to form a document ID. | String[] | No | |
IdDelimiter | Specifies delimiter used when document ID is built. | String | No | |
InstanceIdColumn | Name of column that contains the document instance id. | String | No | |
InstituteColumn | Name of column that contains the source institution. | String | No | |
KeepAlive | Flag that determines whether to keep JDBC connection open no matter what. | String | No | |
LastNameColumn | Name of column that contains the patient last name. | String | No | |
LastSoundexColumn | Name of column that contains the patient last name soundex. | String | No | |
MiddleNameColumn | Name of column that contains the patient middle name. | String | No | |
NoteSubtypeColumn | Name of column that contains the note subtype. | String | No | |
NoteTypeColumn | Name of column that contains the note type. | String | No | |
PatientColumn | Name of column that contains the patient identifier. | String | No | |
PatientIdColumn | Name of column that contains the patient id. | String | No | |
RevisionColumn | Name of column that contains the document revision number. | String | No | |
RevisionDateColumn | Name of column that contains the document revision date. | String | No | |
SpecialtyColumn | Name of column that contains the author specialty. | String | No | |
StandardColumn | Name of column that contains the document standard. | String | No |
Reads document texts from database text fields.
Source class: JdbcCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
DbConnResrcName | Name of external resource for database connection. | String | Yes | |
DocTextColName | Name of column from resultset that contains the document text. | String | Yes | |
SqlStatement | SQL statement to retrieve the document. | String | Yes | |
DocIdColNames | Specifies column names that will be used to form a document ID. | String[] | No | |
DocIdDelimiter | Specifies delimiter used when document ID is built. | String | No | |
ValueFileResrcName | Name of external resource for prepared statement value file. | String | No |
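As an illustration of how DocIdColNames and DocIdDelimiter could combine, the sketch below joins the values of the chosen columns into one document ID. It is a minimal stand-in that uses a plain map instead of a JDBC result set; the class name, method name, and column names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; a Map stands in for one row of a JDBC result set.
final class DocIdSketch {
    // Joins the values of the ID columns, in order, with the delimiter.
    static String buildDocId(Map<String, String> row, List<String> idColumns, String delimiter) {
        return idColumns.stream()
                .map(row::get)
                .collect(Collectors.joining(delimiter));
    }
}
```

With a row holding `patient_id = p7` and `note_id = n42`, the columns `patient_id, note_id` and the delimiter `_`, this sketch yields `p7_n42`.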
Reads document texts from a single text file, treating each line as a document.
Source class: LinesFromFileCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.collection.CollectionReader_ImplBase
Products: Document Id
No available configuration parameters.
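The line-per-document idea can be sketched without any cTAKES classes. The example below is an assumption-laden illustration: it takes the file content as a string, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the cTAKES implementation.
final class LinesAsDocumentsSketch {
    // Splits file content on any line break; each line becomes one document text.
    static List<String> splitIntoDocuments(String fileText) {
        return Arrays.asList(fileText.split("\\R"));
    }
}
```

Whether the real reader keeps or skips blank lines is not stated above; this sketch keeps blank lines between documents and drops trailing ones.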
Reads document texts from Lucene text fields.
Source class: LuceneCollectionReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.CasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
IndexDirectory | Location of the Lucene index | String | Yes | |
FieldName | Field to look in for document text | String | No | |
MaxWords | Maximum number of words to process (approximate -- actually based on characters) | int | No |
Reads document texts from text files specified in a provided list.
Source class: TextReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
files | The text files to be loaded | List | Yes |
Reads document texts and annotations from XMI files specified in a provided list.
Source class: XMIReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.fit.component.JCasCollectionReader_ImplBase
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
files | The XMI files to be loaded | List | Yes |
Reads document texts and annotations from XMI files in a directory tree.
Source class: XmiTreeReader
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.ctakes.core.cr.AbstractFileTreeReader
Products: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
InputDirectory | Directory for all input files. | String | Yes | |
CRtoSpace | Change Windows-format CR + LF character sequences to LF + space. | boolean | No | |
Encoding | The character encoding used by the input files. | String | No | |
Extensions | The extensions of the files that the collection reader will read. | String[] | No | * |
KeepCR | Keep Windows-format carriage return characters at line endings. This only keeps existing characters; it does not add them. | boolean | No | |
PatientLevel | The level in the directory hierarchy at which patient identifiers exist. The default value is 1, directly under the root input directory. | int | No | |
StripQuotes | Replace document-enclosing quote characters with space characters. | boolean | No | |
WriteBanner | Write a large banner at each major step of the pipeline. | String | No | no |
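One way to read PatientLevel is as an index into the directories between the input root and each file. The sketch below follows that reading; it is an assumption based on the parameter description, and the class name, method name, and example paths are hypothetical.

```java
import java.nio.file.Path;

// Hypothetical sketch; not the cTAKES implementation.
final class PatientLevelSketch {
    // Returns the directory name found patientLevel steps below the root,
    // or null when the file does not sit that deep in the tree.
    static String patientId(Path root, Path file, int patientLevel) {
        Path relative = root.relativize(file);
        // The last name element is the file itself, so only directories count.
        if (patientLevel < 1 || patientLevel >= relative.getNameCount()) {
            return null;
        }
        return relative.getName(patientLevel - 1).toString();
    }
}
```

For a file at `input/patientA/notes/doc1.txt` under the root `input`, level 1 selects `patientA` and level 2 selects `notes`.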
Reads document texts and annotations from XMI files in a directory.
Source class: XmiCollectionReaderCtakes
Source package: org.apache.ctakes.core.cr
Parent class: org.apache.uima.collection.CollectionReader_ImplBase
Products: Document Id
No available configuration parameters.
Annotates Document Sections by detecting Section Headers using Regular Expressions provided in a File.
Source class: CDASegmentAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Document Id
Products: Section
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
sections_file | Path to File that contains the section header mappings | String | No | src/user/resources/org/apache/ctakes/core/sections/ccda_sections.txt |
Re-annotates Sentences based upon short lines, preventing a Sentence from spanning over an intentional line break.
Source class: EolSentenceFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence
No available configuration parameters.
Associates Lab Mentions with values.
Source class: LabValueFinder
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section, Base Token, Identified Annotation
Products: Generic Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
labTUIs | TUIs indicating lab measurements | String[] | Yes | |
allSections | Use all Annotatable sections. This ignores the value of sections | String | No | true |
excludeCUIs | CUIs not indicating specific lab measurements | String[] | No | |
maxLineCount | Maximum newlines between lab and value | int | No | |
sections | Annotatable sections | String[] | No | |
useDrugs | Use Medications in addition to Labs. | String | No | false |
valueWords | Words indicating values | String[] | No |
Annotates formatted List Sections by detecting them using Regular Expressions provided in an input File.
Source class: ListAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section
Products: List
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
LIST_TYPES_PATH | path to a file containing a list of regular expressions and corresponding list types. | String | Yes | org/apache/ctakes/core/list/DefaultListRegex.bsv |
Checks List Entries for negation, which may be exhibited differently from unstructured negation.
Source class: ListEntryNegator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: List, Identified Annotation
No available configuration parameters.
Re-annotates Paragraphs based upon existing Lists, preventing a Paragraph from spanning more than one List.
Source class: ListParagraphFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: List, Sentence
No available configuration parameters.
Re-annotates Sentences based upon existing List Entries, preventing a Sentence from spanning more than one List Entry.
Source class: ListSentenceFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: List, Sentence
No available configuration parameters.
Annotates Document Penn TreeBank Tokens.
Source class: TokenizerAnnotatorPTB
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section, Sentence
Products: Base Token
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SegmentsToSkip | Set of segments that can be skipped | String[] | No |
Annotates Paragraphs by detecting them using Regular Expressions provided in an input File or by empty text lines.
Source class: ParagraphAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section
Products: Paragraph
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
PARAGRAPH_TYPES_PATH | path to a file containing a list of regular expressions and corresponding paragraph types. | String | No |
Re-annotates Sentences based upon existing Paragraphs, preventing a Sentence from spanning more than one Paragraph.
Source class: ParagraphSentenceFixer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Paragraph, Sentence
No available configuration parameters.
Sentence detector that uses B I O (Begin, Inside, Outside) tagging to determine sentence boundaries. Useful for documents in which newlines may not indicate sentence boundaries.
Source class: SentenceDetectorAnnotatorBIO
Source package: org.apache.ctakes.core.ae
Parent class: org.cleartk.ml.CleartkAnnotator
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
FeatureConfiguration | | FEAT_CONFIG | No | |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
TokenFilename | | String | No |
Annotates Document Sections by detecting Section Headers using Regular Expressions provided in a Bar-Separated-Value (BSV) File.
Source class: BsvRegexSectionizer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.RegexSectionizer
Products: Section
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SectionsBsv | path to a BSV file containing a list of regular expressions and corresponding section types. | String | Yes | org/apache/ctakes/core/sections/DefaultSectionRegex.bsv |
TagDividers | True if lines of divider characters ____ , ---- , === should divide sections | boolean | No | true |
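The two parameters above suggest a regex lookup loaded from a BSV file plus a check for divider lines. The sketch below illustrates that idea under stated assumptions: a hypothetical two-column layout `sectionType|headerRegex` (the real file's column layout may differ), case-insensitive matching, and a minimum divider length of three characters. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; not the cTAKES implementation.
final class BsvSectionSketch {
    // A line made only of underscores, hyphens, or equals signs.
    private static final Pattern DIVIDER = Pattern.compile("^\\s*(_{3,}|-{3,}|={3,})\\s*$");

    // Parses lines of the assumed form "sectionType|headerRegex".
    static Map<String, Pattern> parse(List<String> bsvLines) {
        Map<String, Pattern> sections = new LinkedHashMap<>();
        for (String line : bsvLines) {
            // Split on the first bar only, so the regex may itself contain bars.
            String[] columns = line.split("\\|", 2);
            if (columns.length == 2) {
                sections.put(columns[0].trim(),
                        Pattern.compile(columns[1].trim(), Pattern.CASE_INSENSITIVE));
            }
        }
        return sections;
    }

    // Returns the first section type whose header regex matches, or null.
    static String sectionTypeOf(String textLine, Map<String, Pattern> sections) {
        for (Map.Entry<String, Pattern> entry : sections.entrySet()) {
            if (entry.getValue().matcher(textLine).find()) {
                return entry.getKey();
            }
        }
        return null;
    }

    // True when the line could act as a section divider.
    static boolean isDivider(String textLine) {
        return DIVIDER.matcher(textLine).matches();
    }
}
```

Keeping the parsed entries in insertion order means that earlier lines in the file win when several expressions match the same header.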
Annotates Document Sections by detecting Section Headers in template.
Source class: SectionSegmentAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Products: Section
No available configuration parameters.
Annotates Sentences based upon an OpenNLP model.
Source class: SentenceDetector
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SentenceModelFile | Path to sentence detector model file | String | Yes | org/apache/ctakes/core/models/sentdetect/sd-med-model.zip |
SegmentsToSkip | Set of segments that can be skipped | String[] | No |
Annotates Document as a single Section.
Source class: SimpleSegmentAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Products: Section
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SegmentID | Name to give to all segments | String | No | SIMPLE_SEGMENT |
Annotates Document Sections by detecting start and end Section Tags.
Source class: SimpleSegmentWithTagsAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Products: Section
No available configuration parameters.
Annotates Sentences based upon an OpenNLP model.
Source class: ThreadSafeSentenceDetector
Source package: org.apache.ctakes.core.concurrent
Parent class: org.apache.ctakes.core.ae.SentenceDetector
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SentenceModelFile | Path to sentence detector model file | String | Yes | org/apache/ctakes/core/models/sentdetect/sd-med-model.zip |
SegmentsToSkip | Set of segments that can be skipped | String[] | No |
Thread-safe sentence detector that uses B I O (Begin, Inside, Outside) tagging to determine sentence boundaries. Useful for documents in which newlines may not indicate sentence boundaries.
Source class: ThreadSafeSentenceDetectorBio
Source package: org.apache.ctakes.core.concurrent
Parent class: org.apache.ctakes.core.ae.SentenceDetectorAnnotatorBIO
Dependencies: Section
Products: Sentence
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
classifierFactoryClassName | provides the full name of the ClassifierFactory class to be used. | String | No | org.cleartk.ml.jar.JarClassifierFactory |
dataWriterFactoryClassName | provides the full name of the DataWriterFactory class to be used. | String | No | org.cleartk.ml.jar.DefaultDataWriterFactory |
FeatureConfiguration | | FEAT_CONFIG | No | |
isTraining | determines whether this annotator is writing training data or using a classifier to annotate. Normally inferred automatically based on whether or not a DataWriterFactory class has been set. | Boolean | No | |
TokenFilename | | String | No |
Annotates Document Tokens.
Source class: TokenizerAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Section
Products: Base Token
No available configuration parameters.
Writes a two-column BSV file containing CUIs and their total counts in a document.
Source class: CuiCountFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Document Id, Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | String | No |
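The two-column output described above amounts to counting each CUI and writing one `CUI|count` line per distinct value. The sketch below shows that step on a plain list of strings; the class and method names are hypothetical, the alphabetical ordering is an assumption, and the CUI strings in the usage note are only example input.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch; not the cTAKES implementation.
final class CuiCountSketch {
    // Counts each CUI and renders one bar-separated line per distinct CUI,
    // sorted alphabetically by CUI.
    static List<String> toBsvLines(List<String> cuis) {
        Map<String, Long> counts = cuis.stream()
                .collect(Collectors.groupingBy(cui -> cui, TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .map(entry -> entry.getKey() + "|" + entry.getValue())
                .collect(Collectors.toList());
    }
}
```

Given the input `C0004057, C0011849, C0004057`, the sketch produces the lines `C0004057|2` and `C0011849|1`.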
Writes a list of CUIs, covered text and preferred text to files.
Source class: CuiListFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes Text files with original text from the document.
Source class: FilesInDirectoryCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id
No available configuration parameters.
Writes Text files with original text from the document in a specified directory.
Source class: NormalizedFilesInDirectoryCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Base Token
No available configuration parameters.
Writes HTML files with a Table representation of extracted information.
Source class: HtmlTableCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Base Token
No available configuration parameters.
Writes HTML files with document text and simple markups (Semantic Group, CUI, Negation).
Source class: HtmlTextWriter
Source package: org.apache.ctakes.core.cc.html
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes HTML files with document text and simple markups (Semantic Group, CUI, Negation).
Source class: HtmlTextWriter
Source package: org.apache.ctakes.core.cc.pretty.html
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes UMLS Concepts to a standard I2B2 Observation_Fact table.
Source class: I2b2JdbcWriter
Source package: org.apache.ctakes.core.cc.jdbc.i2b2
Parent class: org.apache.ctakes.core.cc.jdbc.AbstractJCasJdbcWriter
Dependencies: Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
DbDriver | JDBC driver ClassName. | String | Yes | |
DbPass | Password for database authentication. | String | Yes | |
DbUrl | JDBC URL that specifies database network location and name. | String | Yes | |
DbUser | Username for database authentication. | String | Yes | |
FactOutputTable | Name of the Observation_Fact table for writing output. | String | Yes | |
BatchSize | Number of statements to use in a batch. 0 or 1 denotes that batches should not be used. | String | No | |
KeepAlive | Flag that determines whether to keep JDBC connection open no matter what. | String | No | |
RepeatCuis | Repeat Concepts with the same Cui but possibly different Semantic Type or Preferred Text. | boolean | No |
Stores extracted information and document metadata in a database.
Source class: JdbcWriterTemplate
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractJdbcWriter
Dependencies: Document Id, Identified Annotation
No available configuration parameters.
Writes a table of Medication information to file, sorted by character index.
Source class: MedicationTableFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractTableFileWriter
Dependencies: Document Id, Identified Annotation
Usables: Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No | |
TableType | Type of Table to write to File. Possible values are: BSV, CSV, HTML, TAB | String | No |
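The four TableType values differ mainly in how the cells of a row are separated. The sketch below renders one row in each style; it is a simplified illustration with hypothetical names, and it does no escaping of separators or HTML characters inside cells.

```java
import java.util.List;

// Hypothetical sketch; not the cTAKES implementation.
final class TableRowSketch {
    enum TableType { BSV, CSV, HTML, TAB }

    // Renders one table row in the requested style (no escaping of cell content).
    static String formatRow(List<String> cells, TableType type) {
        switch (type) {
            case BSV:
                return String.join("|", cells);
            case CSV:
                return String.join(",", cells);
            case TAB:
                return String.join("\t", cells);
            case HTML:
                return "<tr><td>" + String.join("</td><td>", cells) + "</td></tr>";
            default:
                throw new IllegalArgumentException("Unknown table type: " + type);
        }
    }
}
```

For the cells `aspirin, 12, 19`, the BSV style gives `aspirin|12|19` and the CSV style gives `aspirin,12,19`.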
Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
Source class: PrettyTextWriterFit
Source package: org.apache.ctakes.core.cc.pretty.plaintext
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id, Sentence, Base Token
Usables: Document Id Prefix, Identified Annotation, Event, Timex, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
Source class: PrettyTextWriterUima
Source package: org.apache.ctakes.core.cc.pretty.plaintext
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Base Token
Usables: Identified Annotation, Event, Timex, Temporal Relation
No available configuration parameters.
Writes text files with lists of annotations and properties (POS, Semantic Group, CUI, Negation).
Source class: PropertyTextWriterFit
Source package: org.apache.ctakes.core.cc.property.plaintext
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | String | No |
Writes text files with lists of annotations and properties (POS, Semantic Group, CUI, Negation).
Source class: PropertyTextWriterUima
Source package: org.apache.ctakes.core.cc.property.plaintext
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Identified Annotation
No available configuration parameters.
Writes a table of Annotation information to file, grouped by Semantic Type.
Source class: SemanticTableFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractTableFileWriter
Dependencies: Document Id, Identified Annotation
Usables: Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No | |
TableType | Type of Table to write to File. Possible values are: BSV, CSV, HTML, TAB | String | No |
Writes Text files with original text from the document, sentence by sentence.
Source class: SentenceTokensPrinter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Sentence, Base Token
No available configuration parameters.
Writes BSV files with original text for extracted annotations and their span offsets.
Source class: TextSpanWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Identified Annotation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | String | No |
Writes a two-column BSV file containing Begin and End offsets of tokens in a document.
Source class: TokenOffsetsCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id, Base Token
No available configuration parameters.
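A begin and end offset file of this kind can be sketched with whitespace-delimited tokens standing in for Base Tokens. That substitution is an assumption made to keep the example library-free, and the class and method names are hypothetical. End offsets are exclusive, as with UIMA annotation offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; whitespace-delimited tokens stand in for Base Tokens.
final class TokenOffsetSketch {
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    // Renders one "begin|end" line per token; end offsets are exclusive.
    static List<String> offsets(String text) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            lines.add(matcher.start() + "|" + matcher.end());
        }
        return lines;
    }
}
```

For the text `No fever today`, the sketch produces `0|2`, `3|8`, and `9|14`.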
Writes a table of base tokens and their spans in a directory tree.
Source class: TokenTableFileWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractTableFileWriter
Usables: Document Id Prefix, Base Token
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No | |
TableType | Type of Table to write to File. Possible values are: BSV, CSV, HTML, TAB | String | No |
Writes a two-column BSV file containing Words and their total counts in a document.
Source class: TokenFreqCasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Base Token
No available configuration parameters.
Writes XMI files with full representation of input text and all extracted information.
Source class: XmiWriterCasConsumerCtakes
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.fit.component.CasConsumer_ImplBase
Dependencies: Document Id
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Output directory to which XMI files are written | File | Yes |
Writes XMI files with full representation of input text and all extracted information.
Source class: FileTreeXmiWriter
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.ctakes.core.cc.AbstractJCasFileWriter
Dependencies: Document Id
Usables: Document Id Prefix
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
SubDirectory | SubDirectory for files. | String | No |
Writes XMI files with full representation of input text and all extracted information.
Source class: CasConsumer
Source package: org.apache.ctakes.core.cc
Parent class: org.apache.uima.collection.CasConsumer_ImplBase
Dependencies: Document Id
No available configuration parameters.
Removes annotations of a given type from the JCas.
Source class: FilterAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Base Token
No available configuration parameters.
Runs an external process.
Source class: CommandRunner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.AbstractCommandRunner
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
Command | A full command line to be executed. Make sure to quote. | String | No | |
CommandDir | The Command Executable's directory. | String | No | |
Log | A name for the streaming logger. Default is the Command. | String | No | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for some seconds. Default is 0 | int | No | |
PerDoc | yes to run the command once per document. Default is no. | String | No | no |
SetJavaHome | Set JAVA_HOME to the Java running cTAKES. Default is yes. | String | No | yes |
Wait | Wait for the process to finish. Default is no. | String | No | no |
WorkingDir | The working directory. | String | No |
Starts a new instance of cTAKES with the given piper parameters.
Source class: CtakesRunner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.PausableFileLoggerAE
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
Pipeline | Piper parameters. Make sure to quote. | String | Yes | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for some seconds. Default is 0 | int | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
Use the FinishedLogger in the log sub-package (org.apache.ctakes.core.util.log) instead of this class.
Source class: FinishedLogger
Source package: org.apache.ctakes.core.util
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
No available configuration parameters.
Logs the Document ID to Log4j and Standard Output.
Source class: DocumentIdPrinterAnalysisEngine
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Document Id
No available configuration parameters.
Forcibly Exits cTAKES. Use only at the end of a pipeline.
Source class: ExitForcer
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.inert.PausableAE
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
ForceExit | Forcibly exits the system when the value is yes. Yes by default. | String | No | yes |
Pause | Pause for some seconds. Default is 0 | int | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
Writes a banner message COMPLETE to the log when all processing is finished.
Source class: FinishedLogger
Source package: org.apache.ctakes.core.util.log
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
No available configuration parameters.
Copies document text and all annotations into a new JCas.
Source class: CopyAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
dataBindMap | Mapping between source methods and destination methods in a bar-separated ("\|") format | String[] | Yes | |
destObjClass | Name of destination class | String | Yes | |
srcObjClass | Name of source class | String | Yes |
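The dataBindMap entries can be pictured as pairs split on the bar character. The sketch below parses them into a map; it assumes that the source method name comes first and the destination method name second, which the description above does not state explicitly. The class name, method name, and the example method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the cTAKES implementation.
final class DataBindMapSketch {
    // Parses entries of the assumed form "sourceMethod|destinationMethod".
    static Map<String, String> parse(String[] dataBindMap) {
        Map<String, String> bindings = new LinkedHashMap<>();
        for (String entry : dataBindMap) {
            String[] parts = entry.split("\\|");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected source|destination but got: " + entry);
            }
            bindings.put(parts[0].trim(), parts[1].trim());
        }
        return bindings;
    }
}
```

An entry such as `getBegin|setBegin` would then map the source getter `getBegin` to the destination setter `setBegin`.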
Reads annotations from SHARP schema Knowtator XML files in a directory.
Source class: SHARPKnowtatorXMLReader
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Products: Identified Annotation, Event, Timex, Location Relation, Degree Relation, Temporal Relation
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
SetDefaults | Whether to set default attribute values if no annotation is present. | boolean | Yes | |
TextDirectory | Directory containing the text files (if DocumentIDs are just filenames). By default, DocumentIDs are assumed to be full file paths. | File | No | |
Joins sentences that SentenceDetectorBIO has split at person titles such as Mr., Mrs. and Dr.
Source class: MrsDrSentenceJoiner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Dependencies: Sentence
No available configuration parameters.
Does absolutely nothing.
Source class: NullAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
No available configuration parameters.
Removes or modifies annotations that overlap.
Source class: OverlapAnnotator
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Dependencies: Base Token
No available configuration parameters.
Caches each Document JCas in a Patient JCas as a View.
Source class: PatientNoteCollector
Source package: org.apache.ctakes.core.patient
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
No available configuration parameters.
Analysis Engine that executes the PiperFileRunner. Kludge for desc files (CPE).
Source class: PiperFileRunEngine
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
PiperParams | Command line parameters normally used to run a piper file. | String | Yes | |
Installs a specified Python package using pip.
Source class: PythonPipper
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.PythonRunner
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
PipPackage | Path of the Python package to install with pip. | String | Yes | |
Command | A full command line to be executed. Make sure to quote. | String | No | |
CommandDir | The command executable's directory. | String | No | |
Log | A name for the streaming logger. Default is the Command. | String | No | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for the given number of seconds. Default is 0. | int | No | 0 |
PerDoc | Set to yes to run the command once per document. Default is no. | String | No | no |
VirtualEnv | Path to the Python virtual environment. | String | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
WorkingDir | The working directory. | String | No | |
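
A hedged sketch of how PythonPipper might appear in a piper file, using only parameters from the table above. Both paths are hypothetical placeholders, and `Wait=yes` is an assumed choice so that the pip step can finish before later pipeline steps run.

```
// Illustrative only: install a Python package before the rest of the pipeline runs.
// path/to/python_package and path/to/output are hypothetical placeholders.
add PythonPipper PipPackage=path/to/python_package OutputDirectory=path/to/output Wait=yes
```
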
Starts a Python process with the given parameters.
Source class: PythonRunner
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.ctakes.core.ae.AbstractCommandRunner
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
OutputDirectory | Directory for all output files. | File | Yes | |
Command | A full command line to be executed. Make sure to quote. | String | No | |
CommandDir | The command executable's directory. | String | No | |
Log | A name for the streaming logger. Default is the Command. | String | No | |
LogFile | File to which cTAKES output should be sent. | String | No | |
Pause | Pause for the given number of seconds. Default is 0. | int | No | 0 |
PerDoc | Set to yes to run the command once per document. Default is no. | String | No | no |
VirtualEnv | Path to the Python virtual environment. | String | No | |
Wait | Wait for the process to finish. Default is no. | String | No | no |
WorkingDir | The working directory. | String | No | |
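
A hedged sketch of a PythonRunner entry in a piper file. The module name `my_module`, its argument, and both paths are hypothetical. The table does not say whether the interpreter name belongs inside `Command`, so this sketch assumes it does not. The value is quoted because it contains spaces.

```
// Illustrative only: start a Python process alongside the pipeline.
// my_module, its argument, and the paths are hypothetical placeholders.
// Assumption: Command holds the arguments passed to the Python interpreter.
add PythonRunner Command="-m my_module --verbose" OutputDirectory=path/to/output VirtualEnv=path/to/venv
```
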
Simple annotator to place before and after other annotators that do not log their own start and finish.
Source class: StartFinishLogger
Source package: org.apache.ctakes.core.ae
Parent class: org.apache.uima.fit.component.JCasAnnotator_ImplBase
Parameter | Description | Class | Required | Default |
---|---|---|---|---|
LOGGER_NAME | Full name of the Annotator Engine for which start / end logging should be done. | String | Yes | StartEndProgressLogger |
IS_START | Whether this instance should log a start. | Boolean | No | |
LOGGER_TASK | Descriptive purpose of the Annotator Engine for which start / end logging should be done. | String | No | Processing ... |
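
The before-and-after placement described above could be sketched in a piper file as follows. `SomeQuietAnnotator` is a hypothetical annotator that does no logging of its own. The `true` / `false` literals and the reading of `IS_START=false` as "log the finish" are assumptions based on the parameter descriptions.

```
// Illustrative only: wrap an annotator that does not log its own start and finish.
// SomeQuietAnnotator is a hypothetical class name.
add StartFinishLogger LOGGER_NAME=SomeQuietAnnotator LOGGER_TASK="Finding things" IS_START=true
add SomeQuietAnnotator
add StartFinishLogger LOGGER_NAME=SomeQuietAnnotator LOGGER_TASK="Finding things" IS_START=false
```
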
Commands and parameters for a small tokenization pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters for a small tokenization pipeline. }}$
$\textcolor{green}{\textbf{add}}$ SimpleSegmentAnnotator
$\textcolor{green}{\textbf{add}}$ SentenceDetector
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB
Commands and parameters for a small tokenization pipeline with sections, paragraphs and lists.
$\textcolor{gray}{\textsf{// Commands and parameters for a small tokenization pipeline with sections, paragraphs and lists. }}$
$\textcolor{gray}{\textsf{// Annotate sections by known regex }}$
$\textcolor{green}{\textbf{add}}$ BsvRegexSectionizer
$\textcolor{gray}{\textsf{// The sentence detector needs our custom model path, otherwise default values are used. }}$
$\textcolor{gray}{\textsf{//add SentenceDetectorAnnotatorBIO classifierJarPath=/org/apache/ctakes/core/models/sentdetect/model.jar }}$
$\textcolor{gray}{\textsf{// The SentenceDetectorAnnotatorBIO is a "lumper" that works well for notes in which end of line does not indicate a sentence. }}$
$\textcolor{gray}{\textsf{// If that is not your case, then you may get better results using the more standard SentenceDetector }}$
$\textcolor{green}{\textbf{add}}$ SentenceDetector
$\textcolor{gray}{\textsf{// By default, paragraphs are parsed using empty lines as separators and Part \#: }}$
$\textcolor{green}{\textbf{add}}$ ParagraphAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more paragraphs. }}$
$\textcolor{green}{\textbf{add}}$ ParagraphSentenceFixer
$\textcolor{gray}{\textsf{// Use regular expressions created for the Pitt notes to discover formatted lists and tables. }}$
$\textcolor{green}{\textbf{add}}$ ListAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more list entries. }}$
$\textcolor{green}{\textbf{add}}$ ListSentenceFixer
$\textcolor{gray}{\textsf{// Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences. }}$
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB
Commands and parameters for a small thread-safe tokenization pipeline.
$\textcolor{gray}{\textsf{// Commands and parameters for a small thread-safe tokenization pipeline. }}$
$\textcolor{green}{\textbf{add}}$ SimpleSegmentAnnotator
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeSentenceDetector}}$
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB
Commands and parameters for a small thread-safe tokenization pipeline with sections, paragraphs and lists.
$\textcolor{gray}{\textsf{// Commands and parameters for a small thread-safe tokenization pipeline with sections, paragraphs and lists. }}$
$\textcolor{gray}{\textsf{// Annotate sections by known regex }}$
$\textcolor{green}{\textbf{add}}$ BsvRegexSectionizer
$\textcolor{gray}{\textsf{// The sentence detector needs our custom model path, otherwise default values are used. }}$
$\textcolor{gray}{\textsf{//add concurrent.ThreadSafeSentenceDetectorBio classifierJarPath=/org/apache/ctakes/core/models/sentdetect/model.jar }}$
$\textcolor{gray}{\textsf{// The SentenceDetectorAnnotatorBIO is a "lumper" that works well for notes in which end of line does not indicate a sentence. }}$
$\textcolor{gray}{\textsf{// If that is not your case, then you may get better results using the more standard SentenceDetector }}$
$\textcolor{green}{\textbf{add}}$ $\textcolor{blue}{\textsf{concurrent.ThreadSafeSentenceDetector}}$
$\textcolor{gray}{\textsf{// By default, paragraphs are parsed using empty lines as separators and Part \#: }}$
$\textcolor{green}{\textbf{add}}$ ParagraphAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more paragraphs. }}$
$\textcolor{green}{\textbf{add}}$ ParagraphSentenceFixer
$\textcolor{gray}{\textsf{// Use regular expressions created for the Pitt notes to discover formatted lists and tables. }}$
$\textcolor{green}{\textbf{add}}$ ListAnnotator
$\textcolor{gray}{\textsf{// Fix sentences so that no sentence spans across two or more list entries. }}$
$\textcolor{green}{\textbf{add}}$ ListSentenceFixer
$\textcolor{gray}{\textsf{// Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences. }}$
$\textcolor{green}{\textbf{add}}$ TokenizerAnnotatorPTB