Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches

Conference Paper

Ievgen Iosifov, Olena Iosifova, Volodymyr Sokolov

Abstract

This research addresses the natural language processing problem of segmenting raw text into sentences. The focus is on auto-generated video transcripts, which have no punctuation or sentence boundaries. Two general approaches to sentence segmentation are proposed, and experiments compare the results of pre-trained transformer-based models under each. The study examines how the choice of approach affects the results. The sequence labeling approach turned out to be the most suitable.
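As a rough illustration of the sequence labeling framing mentioned in the abstract, the sketch below shows how punctuated sentences could be turned into an unpunctuated token stream with one label per token, and how predicted labels could be turned back into sentences. This is not the authors' implementation: the label names `EOS` and `O`, the helper names `to_labels` and `segment`, and the whitespace tokenization are all assumptions made for illustration. In a full pipeline, a fine-tuned transformer token classifier would predict the labels.

```python
from typing import List, Tuple

# Hypothetical label names; the abstract does not specify the paper's tag set.
END, INSIDE = "EOS", "O"


def to_labels(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten punctuated sentences into lowercased tokens plus one label per token.

    The last token of each sentence is labeled END; every other token is INSIDE.
    This mimics building training data that looks like a raw transcript.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        # Strip surrounding punctuation and lowercase (assumed preprocessing).
        words = [w.strip(".,!?;:").lower() for w in sentence.split()]
        words = [w for w in words if w]
        for i, word in enumerate(words):
            tokens.append(word)
            labels.append(END if i == len(words) - 1 else INSIDE)
    return tokens, labels


def segment(tokens: List[str], labels: List[str]) -> List[str]:
    """Rebuild sentences by cutting the token stream after each END label."""
    sentences: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == END:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing tokens without a closing END label form a final sentence.
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    toks, labs = to_labels(["Hello world.", "How are you?"])
    print(list(zip(toks, labs)))
    print(segment(toks, labs))
```

Framing the task this way reduces segmentation to a per-token classification problem, so the sentence boundaries can be read directly from the predicted labels.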

https://ieeexplore.ieee.org/document/9468084 | 10.1109/PICST51311.2020.9468084

Keywords

fine-tuning; natural language processing; NLP; sentence segmentation component; transformer

SciVal Topics

Computational Linguistics; Natural Language Processing Systems; Language Modeling


Publisher

2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&T)

6–9 October 2020 Kharkiv, Ukraine

First Online: 2 July 2021


Indices

  • INSPEC: 20778104

  • KUBG: 37097


Cite

CEUR-WS

I. Iosifov, O. Iosifova, V. Sokolov, Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches, in: IEEE 7th International Scientific and Practical Conference Problems of Infocommunications. Science and Technology (2020) 335–337. doi:10.1109/PICST51311.2020.9468084.
