Section Headings - DeepPhe/DeepPhe-Release GitHub Wiki

Why Section Headings are Important

DeepPhe uses a list of section headings to identify the parts of notes and reports that should be considered more important than others. This helps reduce noise (false positives) in the system's output.

Customizing the Section Headings

The default list of section headings is contained in the sections.txt file.

The first few lines are comments about the file.

Note that by default the section headings in the list are case-sensitive.

You can modify the existing file or you can create a copy and modify that copy.

To make the system use a modified copy, update the file data/pipeline/DeepPhe.document.piper so that this line

add Sectionizer

looks like the following, with OUR.sections.txt replaced by your file's name (and directory).

# Use customized list of section headers
add Sectionizer sections_file=/org/apache/ctakes/cancer/sections/OUR.sections.txt
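If you would rather script this edit, the minimal Python sketch below rewrites the plain add Sectionizer line in the text of a piper file. It is an illustration, not part of DeepPhe: the function name use_custom_sections is hypothetical, and the sketch assumes the line to replace reads exactly "add Sectionizer" apart from surrounding whitespace.

```python
def use_custom_sections(piper_text: str, sections_file: str) -> str:
    """Return piper_text with the plain 'add Sectionizer' line pointed at sections_file."""
    out = []
    for line in piper_text.splitlines():
        # ASSUMPTION: the default line is exactly "add Sectionizer" (whitespace ignored).
        if line.strip() == "add Sectionizer":
            out.append("# Use customized list of section headers")
            out.append(f"add Sectionizer sections_file={sections_file}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

For example, you could read data/pipeline/DeepPhe.document.piper with pathlib.Path.read_text, pass the text through use_custom_sections(text, "/org/apache/ctakes/cancer/sections/OUR.sections.txt"), and write the result back, keeping a backup of the original piper file.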

Adding a variation to an existing section

You can add variations of an existing section heading to the end of its line, separating them with commas. For example, if your institution uses **FINAL REPORT as a heading, you can update the line

Final Report,Final Report

to be

Final Report,Final Report,\*\*FINAL REPORT

Note that each asterisk is escaped with a backslash because the asterisk is one of the characters that must be escaped, as described in the comments at the top of the sections.txt file.
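The escaping can be automated with a small helper. This is only a sketch: the authoritative list of characters that must be escaped is in the comments at the top of sections.txt, and the SPECIAL_CHARS set below is an assumption (common regular-expression metacharacters) that you should replace with that list. The name escape_heading is hypothetical.

```python
# ASSUMPTION: placeholder set of characters to escape; replace it with the
# list given in the comments at the top of sections.txt.
SPECIAL_CHARS = set("\\*+?.()[]{}|^$")


def escape_heading(heading: str) -> str:
    """Prefix each special character in a heading variation with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in heading)
```

Applied to the example above, escape_heading("**FINAL REPORT") produces \*\*FINAL REPORT, which can then be appended to the existing line after a comma.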

Adding a New Section

You can add new section headings too. To do so, add a new line to the sections.txt file in the following format:

  • the line starts with a human-readable name for the section
  • after that, add a comma
  • after that, add a comma-separated list of the heading variations that should be recognized as that section
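The line format described above can be sketched as a pair of Python helpers. They are illustrative only and are not DeepPhe's own parser: format_section_line and parse_section_line are hypothetical names, and the sketch assumes that no variation contains a literal comma.

```python
from typing import List, Tuple


def format_section_line(name: str, variations: List[str]) -> str:
    """Build one sections.txt line: the readable name, then each variation, comma-separated."""
    return ",".join([name, *variations])


def parse_section_line(line: str) -> Tuple[str, List[str]]:
    """Split one sections.txt line into its readable name and its list of variations."""
    # ASSUMPTION: variations never contain a literal comma.
    name, *variations = line.rstrip("\n").split(",")
    return name, variations
```

For instance, format_section_line("Genomic Results", ["Genomic Results", "GENOMIC RESULTS:"]) yields the line Genomic Results,Genomic Results,GENOMIC RESULTS:, and parse_section_line splits such a line back into the name and its variations.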

For example, if your reports contain a section with the heading 'GENOMIC RESULTS:', you might add the following line

Genomic Results,GENOMIC RESULTS:

Note that this line would cause only the upper-case heading to be recognized.

If you also want a heading of 'Genomic Results' to be recognized, your line would be

Genomic Results,Genomic Results,GENOMIC RESULTS:

If you also want the heading GENOMIC RESULTS (without the trailing colon) you could add that as well.

Genomic Results,Genomic Results,GENOMIC RESULTS:,GENOMIC RESULTS

However, DeepPhe automatically appends an optional colon to every pattern, so a heading is recognized with or without a trailing colon and this would suffice:

Genomic Results,Genomic Results,GENOMIC RESULTS

When to add a colon and when not to

This example line

Genomic Results,Genomic,GENOMIC RESULTS:

would identify the following lines as the start of a Genomic Results section

Genomic 
Genomic: 
GENOMIC RESULTS:

The following would not be recognized as the start of a Genomic Results section

Genomic Result (additional word on the line)
GENOMIC RESULTS (missing trailing colon)
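The matching rules illustrated above can be approximated in Python. This is a sketch of the documented behavior, not DeepPhe's implementation: it assumes each variation is treated as a regular expression that must cover the whole line, that the automatically appended colon is optional and may be followed by trailing whitespace, and that matching is case-sensitive unless ignore_case is set. The function name heading_for is hypothetical.

```python
import re
from typing import List, Optional


def heading_for(line: str, name: str, variations: List[str], ignore_case: bool = False) -> Optional[str]:
    """Return the section name if the whole line matches one of its variations, else None."""
    flags = re.IGNORECASE if ignore_case else 0
    for variation in variations:
        # ASSUMPTION: DeepPhe's appended colon is modelled as ":?",
        # followed by optional trailing whitespace.
        if re.fullmatch(variation + r":?\s*", line, flags):
            return name
    return None
```

With the variations ["Genomic", "GENOMIC RESULTS:"], this returns "Genomic Results" for Genomic, Genomic: and GENOMIC RESULTS:, and None for Genomic Result and GENOMIC RESULTS, matching the lists above.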

Seeing which sections are being recognized

You can use the SectionWriter component to see which sections of a document are being recognized.

Adding the following line to DeepPhe.document.piper after the add Sectionizer line creates a subdirectory containing files that list the section headings the system found in your reports and notes.

add org.healthnlp.deepphe.uima.cc.SectionWriter SubDirectory=SECTIONS

Seeing which sections are given more importance

You can add SectionWriter more than once to see how the sections change as the pipeline runs: once after Sectionizer and again after SectionRemover.

//  Discover sections.
add Sectionizer

// Write out the sections recognized
add org.healthnlp.deepphe.uima.cc.SectionWriter SubDirectory=SECTIONS

//  Remove sections that should not be used by the rest of the pipeline.
add SectionRemover

// Using SectionWriter after SectionRemover writes out the headers of 
// the sections that will be annotated
add org.healthnlp.deepphe.uima.cc.SectionWriter SubDirectory=SECTIONS_IMPORTANT
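To compare the two SectionWriter outputs, a short script can list, for each document, the lines that appear in SECTIONS but not in SECTIONS_IMPORTANT, that is, the headings SectionRemover dropped. This is a sketch under assumptions: it assumes both subdirectories hold plain-text files with matching file names and one heading per line, so check the actual SectionWriter output format before relying on it. The function name removed_headings is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def removed_headings(all_dir: Path, important_dir: Path) -> Dict[str, List[str]]:
    """Map each file name to the non-blank lines present in all_dir but absent from important_dir."""
    removed: Dict[str, List[str]] = {}
    for before in sorted(all_dir.iterdir()):
        if not before.is_file():
            continue
        after = important_dir / before.name
        # ASSUMPTION: one heading per line; a missing counterpart file means nothing was kept.
        kept = set(after.read_text().splitlines()) if after.exists() else set()
        dropped = [line for line in before.read_text().splitlines() if line.strip() and line not in kept]
        if dropped:
            removed[before.name] = dropped
    return removed
```

For example, removed_headings(Path("output/SECTIONS"), Path("output/SECTIONS_IMPORTANT")), where output stands for your DeepPhe output directory, returns only the documents in which at least one recognized heading was removed.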