experiment_emr - wuxuehong214/nPTAS GitHub Wiki

Experiment: Entity extraction from electronic medical records

Experimental introduction

  • Data set: 500 Chinese electronic medical records (EMRs), each containing a chief complaint and a history of present illness, were collected from the Internet. All of them relate to nephrosis. Download

  • Targets: extract "symptom" and "sign" entities from the medical records on nPTAS, and distinguish "yes" (the patient has the symptom or sign) from "no" (the patient does not).

  • Experiment process: the first model is trained once 70 EMRs have been annotated, and training is repeated each time another 20 annotated EMRs are added.

  • An example EMR is shown below:

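主 诉:左侧腰部痛10年,加重7日。
现病史:患者于起病时间:10年前无明显诱因起病缓急:突然出现症状左侧腰部痛,疼痛呈持续性胀痛,疼痛不向它处放射,伴随症状无尿频、尿急、尿痛、畏寒、发热、血尿、排尿困难,诊治情况发病后在当地医院诊治,诊断为左肾结石,经抗炎输液病情缓解。以后疼痛反复发作,每次经对症治疗好转。近7天来症状明显加重。在当地医院治疗经抗炎、解痉、镇痛等对症处理后无明显效果,为进一步诊断治疗,来我院就诊。门诊以左肾结石收住我科。患者自发病以来,精神状态较差,感畏寒,饮食一般,大便正常。

English translation:
Chief complaint: left flank pain for 10 years, worse over the past 7 days.
History of present illness: Time of onset: 10 years ago, without obvious cause. Mode of onset: sudden. Presenting symptom: left flank pain, a persistent distending pain that does not radiate elsewhere. Associated symptoms: no urinary frequency, urinary urgency, painful urination, chills, fever, hematuria, or difficulty urinating. Diagnosis and treatment: after onset the patient was seen at a local hospital and diagnosed with a left kidney stone; the condition eased after anti-inflammatory infusion. The pain later recurred repeatedly and improved each time with symptomatic treatment. The symptoms have worsened markedly over the past 7 days. Symptomatic treatment at the local hospital (anti-inflammatory, antispasmodic, analgesic) had no obvious effect, so the patient came to our hospital for further diagnosis and treatment and was admitted to our department from the outpatient clinic with a left kidney stone. Since onset, the patient has been in poor spirits and has felt chills; appetite is fair and bowel movements are normal.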
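
To make the annotation target concrete, the sketch below shows one way the four labels could be encoded as character-level BIO tags. The BIO scheme, the `find_span` and `spans_to_bio` helpers, and the choice of mentions to label are illustrative assumptions; this page does not describe the internal corpus format used by nPTAS.

```python
# Hypothetical sketch: character-level BIO encoding for the four labels.
LABELS = ("symptom_yes", "symptom_no", "sign_yes", "sign_no")


def find_span(text, mention, label):
    """Return (start, end, label) for the first occurrence of mention in text."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    start = text.index(mention)  # raises ValueError if the mention is absent
    return (start, start + len(mention), label)


def spans_to_bio(text, spans):
    """Convert non-overlapping (start, end, label) spans to per-character BIO tags."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Two fragments of the example record, joined for brevity:
# "left flank pain" (present) and "no urinary frequency, urgency" (absent).
text = "左侧腰部痛,伴随症状无尿频、尿急"
spans = [
    find_span(text, "左侧腰部痛", "symptom_yes"),
    find_span(text, "尿频", "symptom_no"),
    find_span(text, "尿急", "symptom_no"),
]
tags = spans_to_bio(text, spans)
for char, tag in zip(text, tags):
    print(char, tag)
```

Here the negation word 无 ("no") is left outside the entity span and the absence is carried by the `symptom_no` label, which is one plausible reading of the "yes"/"no" distinction described above.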

Experimental Results

Statistics recorded during the experiment

| No | Annotated EMRs | Annotated entities | symptom_yes | symptom_no | sign_yes | sign_no | Average annotation time (s) | IDCNN-CRF train duration (min) | IDCNN-CRF precision (%) | IDCNN-CRF recall (%) | IDCNN-CRF F1 | BiLSTM-CRF train duration (min) | BiLSTM-CRF precision (%) | BiLSTM-CRF recall (%) | BiLSTM-CRF F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 70 | 814 | 356 | 380 | 64 | 14 | 43 | 3.82 | 43.53 | 58.73 | 0.5 | 3.67 | 48.82 | 61.39 | 0.5439 |
| 2 | 90 | 1077 | 475 | 498 | 89 | 15 | 42 | 3.83 | 55.96 | 62.89 | 0.5922 | 4 | 59.55 | 71.62 | 0.6503 |
| 3 | 110 | 1323 | 577 | 610 | 117 | 19 | 43 | 5.91 | 40.86 | 57.14 | 0.4765 | 5.5 | 60.83 | 66.97 | 0.6376 |
| 4 | 130 | 1550 | 671 | 726 | 133 | 20 | 39 | 5.11 | 53.11 | 61.04 | 0.568 | 5.28 | 59.56 | 72.57 | 0.6547 |
| 5 | 150 | 1834 | 818 | 841 | 153 | 22 | 41 | 4.9 | 46.9 | 59.55 | 0.5248 | 5.6 | 59.89 | 76.26 | 0.6709 |
| 6 | 170 | 2063 | 932 | 941 | 168 | 22 | 40 | 6.23 | 53.49 | 63.19 | 0.5793 | 6.9 | 56.3 | 65.3 | 0.6047 |
| 7 | 190 | 2314 | 1069 | 1037 | 182 | 26 | 38 | 5.99 | 61.35 | 70 | 0.6539 | 6.4 | 61.13 | 74.02 | 0.6696 |
| 8 | 210 | 2566 | 1175 | 1155 | 208 | 28 | 37 | 5.93 | 63.84 | 71.19 | 0.6732 | 6.03 | 66.4 | 74.77 | 0.7034 |
| 9 | 230 | 2794 | 1272 | 1262 | 230 | 30 | 36 | 6.8 | 63.31 | 74.43 | 0.6842 | 6.11 | 62.7 | 76.17 | 0.6878 |
| 10 | 250 | 3041 | 1391 | 1366 | 253 | 31 | 37 | 7.11 | 58.59 | 65.82 | 0.62 | 7.25 | 56.65 | 64.39 | 0.6027 |
| 11 | 270 | 3300 | 1497 | 1488 | 282 | 33 | 35 | 6.82 | 64.42 | 72.43 | 0.6819 | 6.73 | 62.29 | 72.19 | 0.6687 |
| 12 | 290 | 3529 | 1597 | 1600 | 294 | 38 | 33 | 6.4 | 69.65 | 74.49 | 0.7199 | 6.71 | 72.8 | 78.27 | 0.7544 |
| 13 | 310 | 3776 | 1716 | 1704 | 315 | 41 | 36 | 6.63 | 62.72 | 73.75 | 0.6779 | 6.56 | 56.57 | 70.75 | 0.6287 |
| 14 | 330 | 4018 | 1852 | 1790 | 333 | 43 | 33 | 7.11 | 63.16 | 75 | 0.6857 | 7.2 | 68.6 | 79.18 | 0.7351 |
| 15 | 350 | 4239 | 1994 | 1898 | 353 | 44 | 34 | 7.18 | 69.18 | 73.46 | 0.7125 | 7.15 | 67.21 | 75.06 | 0.7092 |
| 16 | 370 | 4487 | 2061 | 2006 | 375 | 45 | 31 | 9.21 | 62.92 | 68.8 | 0.6573 | 8.33 | 64.22 | 74.64 | 0.6904 |
| 17 | 390 | 4713 | 2157 | 2122 | 388 | 46 | 35 | 7.8 | 67.04 | 73.96 | 0.7033 | 7.3 | 66.67 | 75.92 | 0.7099 |
| 18 | 410 | 4953 | 2269 | 2228 | 410 | 46 | 32 | 7.18 | 73.61 | 80.16 | 0.7674 | 7.77 | 69.73 | 76.45 | 0.7293 |
| 19 | 430 | 5188 | 2379 | 2360 | 432 | 51 | 29 | 8.38 | 71.89 | 80.06 | 0.7576 | 7.82 | 69.74 | 77.62 | 0.7347 |
| 20 | 450 | 5417 | 2476 | 2438 | 449 | 54 | 31 | 7.8 | 72.11 | 79.89 | 0.758 | 7.73 | 70.07 | 77.73 | 0.737 |
| 21 | 470 | 5677 | 2592 | 2559 | 468 | 58 | 27 | 7.67 | 72.04 | 77.76 | 0.7479 | 7.78 | 74.56 | 78.74 | 0.7659 |
| 22 | 500 | 6005 | 2743 | 2722 | 481 | 59 | 28 | 9.55 | 72.88 | 80.7 | 0.7659 | 9.63 | 73.34 | 80.31 | 0.7667 |
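
For the rows checked, the F1 column agrees with the harmonic mean of the precision and recall columns. The sketch below recomputes F1 for the first and last rounds; the function name `f1_score` is illustrative and is not part of nPTAS.

```python
def f1_score(precision_pct, recall_pct):
    """Harmonic mean of precision and recall (given in percent), as a 0-1 fraction."""
    if precision_pct + recall_pct == 0:
        return 0.0
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct) / 100


# (precision %, recall %, reported F1) copied from rounds 1 and 22 of the table.
rows = {
    "round 1, IDCNN-CRF": (43.53, 58.73, 0.5),
    "round 1, BiLSTM-CRF": (48.82, 61.39, 0.5439),
    "round 22, IDCNN-CRF": (72.88, 80.7, 0.7659),
    "round 22, BiLSTM-CRF": (73.34, 80.31, 0.7667),
}
for name, (precision, recall, reported) in rows.items():
    print(f"{name}: computed {f1_score(precision, recall):.4f}, reported {reported}")
```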

Training tasks created on nPTAS during the experiment

tasks

Detailed experimental procedure

1 Create a project

1.1 Input project basic information

In the "project management" module, click "new project" and fill in the basic project information, as shown in the following figure:

create project

1.2 Project initialization configuration

Configure the project's data set, participants, annotation objectives, annotation specifications, and so on. This project has 500 records to annotate, 2 annotators, 2 reviewers, and 4 annotation tags:

Configuration
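
As a rough illustration of what this step sets up, the sketch below captures the same settings in a small Python structure. The `ProjectConfig` class and its field names are hypothetical and do not reflect the actual nPTAS data model; only the counts and the four tag names come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectConfig:
    """Hypothetical container for the settings chosen in step 1.2."""
    num_documents: int
    num_annotators: int
    num_reviewers: int
    tags: tuple

    def validate(self):
        """Run basic sanity checks and return self so calls can be chained."""
        if self.num_documents <= 0:
            raise ValueError("a project needs at least one document")
        if self.num_annotators < 1 or self.num_reviewers < 1:
            raise ValueError("a project needs at least one annotator and one reviewer")
        if len(set(self.tags)) != len(self.tags):
            raise ValueError("annotation tags must be unique")
        return self


emr_project = ProjectConfig(
    num_documents=500,
    num_annotators=2,
    num_reviewers=2,
    tags=("symptom_yes", "symptom_no", "sign_yes", "sign_no"),
).validate()
print(emr_project)
```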

1.3 Publish the project

Once the project has been configured, it must be published before the annotators can view it.

2 Corpus annotation

2.1 Visual annotation

The annotator logs in to the system, chooses the project, collects a task, and annotates it, as shown in the following figure:

annotation

2.2 Review the corpus

The corpus reviewer logs in to the system, chooses the project, collects a task, and reviews it, as shown in the following figure:

Review

2.3 Query the corpus

Once the corpus has been approved, its details can be queried and checked:

Query

3 Train the model

To demonstrate the effectiveness of the system, the first model is trained once 70 EMRs have been annotated, and training is repeated each time another 20 annotated EMRs are added. Two algorithms are used for model training in this experiment: BiLSTM-CRF and IDCNN-CRF. For each model iteration, the recorded data include the number of annotated EMRs, the number of annotated entities, the average annotation time per record (calculated from the start and completion times of each task), the model training time, and the model's precision, recall, and F1 score.
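
A minimal sketch of the retraining schedule and of the average-annotation-time calculation described above. The function names are illustrative, and folding the last 30 records into a single final round is an assumption made to match the 22 rounds in the results table (..., 450, 470, 500). The timestamps in the usage example are made up.

```python
from datetime import datetime


def training_checkpoints(total=500, first=70, step=20):
    """Annotated-EMR counts at which a model is trained.

    Assumption: if fewer than `step` records would remain after the next
    checkpoint, they are folded into one final round at `total`.
    """
    points = []
    n = first
    while n < total:
        points.append(n)
        n = n + step if total - (n + step) >= step else total
    points.append(total)
    return points


def average_annotation_seconds(tasks):
    """Mean of (completion time - start time) in seconds over (start, end) pairs."""
    durations = [(end - start).total_seconds() for start, end in tasks]
    return sum(durations) / len(durations)


print(training_checkpoints())

# Made-up timestamps, for illustration only.
tasks = [
    (datetime(2020, 1, 1, 9, 0, 0), datetime(2020, 1, 1, 9, 0, 40)),
    (datetime(2020, 1, 1, 9, 1, 0), datetime(2020, 1, 1, 9, 1, 46)),
]
print(average_annotation_seconds(tasks))
```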

The detailed steps of each training round are as follows.

3.1 【70】 EMRs are annotated

  • Number of annotated entities: 814
  • symptom_yes: 356
  • symptom_no: 380
  • sign_yes: 64
  • sign_no: 14
  • Average annotation time (seconds): 43
  • Corpus from 70 EMRs: Download
  • Create the training task

task

  • Choose the algorithm alg

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.2 【90】 EMRs are annotated

  • Number of annotated entities: 1077
  • symptom_yes: 475
  • symptom_no: 498
  • sign_yes: 89
  • sign_no: 15
  • Average annotation time (seconds): 42
  • Corpus from 90 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.3 【110】 EMRs are annotated

  • Number of annotated entities: 1323
  • symptom_yes: 577
  • symptom_no: 610
  • sign_yes: 117
  • sign_no: 19
  • Average annotation time (seconds): 42
  • Corpus from 110 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.4 【130】 EMRs are annotated

  • Number of annotated entities: 1550
  • symptom_yes: 671
  • symptom_no: 726
  • sign_yes: 133
  • sign_no: 20
  • Average annotation time (seconds): 39
  • Corpus from 130 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.5 【150】 EMRs are annotated

  • Number of annotated entities: 1834
  • symptom_yes: 818
  • symptom_no: 841
  • sign_yes: 153
  • sign_no: 22
  • Average annotation time (seconds): 41
  • Corpus from 150 EMRs: Download
  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.6 【170】 EMRs are annotated

  • Number of annotated entities: 2063
  • symptom_yes: 932
  • symptom_no: 941
  • sign_yes: 168
  • sign_no: 22
  • Average annotation time (seconds): 40
  • Corpus from 170 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.7 【190】 EMRs are annotated

  • Number of annotated entities: 2314
  • symptom_yes: 1069
  • symptom_no: 1037
  • sign_yes: 182
  • sign_no: 26
  • Average annotation time (seconds): 38
  • Corpus from 190 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.8 【210】 EMRs are annotated

  • Number of annotated entities: 2566
  • symptom_yes: 1175
  • symptom_no: 1155
  • sign_yes: 208
  • sign_no: 28
  • Average annotation time (seconds): 37
  • Corpus from 210 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.9 【230】 EMRs are annotated

  • Number of annotated entities: 2794
  • symptom_yes: 1272
  • symptom_no: 1262
  • sign_yes: 230
  • sign_no: 30
  • Average annotation time (seconds): 36
  • Corpus from 230 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.10 【250】 EMRs are annotated

  • Number of annotated entities: 3041
  • symptom_yes: 1391
  • symptom_no: 1366
  • sign_yes: 253
  • sign_no: 31
  • Average annotation time (seconds): 37
  • Corpus from 250 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.11 【270】 EMRs are annotated

  • Number of annotated entities: 3300
  • symptom_yes: 1497
  • symptom_no: 1488
  • sign_yes: 282
  • sign_no: 33
  • Average annotation time (seconds): 35
  • Corpus from 270 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.12 【290】 EMRs are annotated

  • Number of annotated entities: 3529
  • symptom_yes: 1597
  • symptom_no: 1600
  • sign_yes: 294
  • sign_no: 38
  • Average annotation time (seconds): 33
  • Corpus from 290 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.13 【310】 EMRs are annotated

  • Number of annotated entities: 3776
  • symptom_yes: 1716
  • symptom_no: 1704
  • sign_yes: 315
  • sign_no: 41
  • Average annotation time (seconds): 36
  • Corpus from 310 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.14 【330】 EMRs are annotated

  • Number of annotated entities: 4018
  • symptom_yes: 1852
  • symptom_no: 1790
  • sign_yes: 333
  • sign_no: 43
  • Average annotation time (seconds): 33
  • Corpus from 330 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.15 【350】 EMRs are annotated

  • Number of annotated entities: 4239
  • symptom_yes: 1994
  • symptom_no: 1898
  • sign_yes: 353
  • sign_no: 44
  • Average annotation time (seconds): 34
  • Corpus from 350 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.16 【370】 EMRs are annotated

  • Number of annotated entities: 4487
  • symptom_yes: 2061
  • symptom_no: 2006
  • sign_yes: 375
  • sign_no: 45
  • Average annotation time (seconds): 31
  • Corpus from 370 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.17 【390】 EMRs are annotated

  • Number of annotated entities: 4713
  • symptom_yes: 2157
  • symptom_no: 2122
  • sign_yes: 388
  • sign_no: 46
  • Average annotation time (seconds): 32
  • Corpus from 390 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.18 【410】 EMRs are annotated

  • Number of annotated entities: 4953
  • symptom_yes: 2269
  • symptom_no: 2228
  • sign_yes: 410
  • sign_no: 46
  • Average annotation time (seconds): 32
  • Corpus from 410 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.19 【430】 EMRs are annotated

  • Number of annotated entities: 5188
  • symptom_yes: 2379
  • symptom_no: 2360
  • sign_yes: 432
  • sign_no: 51
  • Average annotation time (seconds): 29
  • Corpus from 430 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.20 【450】 EMRs are annotated

  • Number of annotated entities: 5417
  • symptom_yes: 2476
  • symptom_no: 2438
  • sign_yes: 449
  • sign_no: 54
  • Average annotation time (seconds): 31
  • Corpus from 450 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.21 【470】 EMRs are annotated

  • Number of annotated entities: 5677
  • symptom_yes: 2592
  • symptom_no: 2559
  • sign_yes: 468
  • sign_no: 58
  • Average annotation time (seconds): 27
  • Corpus from 470 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

3.22 【500】 EMRs are annotated

  • Number of annotated entities: 6005
  • symptom_yes: 2743
  • symptom_no: 2722
  • sign_yes: 481
  • sign_no: 59
  • Average annotation time (seconds): 28
  • Corpus from 500 EMRs: Download
  • Create the training task

task

  • Train the model train

  • Report of the model(BiLSTM-CRF & IDCNN-CRF)

bilstm idcnn

Overall experimental results

results
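
The overall trend can also be read from the F1 columns of the results table. The sketch below uses only the F1 values reported on this page to find the best round for each model and to count how often BiLSTM-CRF scored higher than IDCNN-CRF; the variable and function names are illustrative.

```python
# Number of annotated EMRs at each of the 22 training rounds.
EMRS = list(range(70, 471, 20)) + [500]

# F1 values copied from the results table, in round order.
IDCNN_F1 = [
    0.5, 0.5922, 0.4765, 0.568, 0.5248, 0.5793, 0.6539, 0.6732,
    0.6842, 0.62, 0.6819, 0.7199, 0.6779, 0.6857, 0.7125, 0.6573,
    0.7033, 0.7674, 0.7576, 0.758, 0.7479, 0.7659,
]
BILSTM_F1 = [
    0.5439, 0.6503, 0.6376, 0.6547, 0.6709, 0.6047, 0.6696, 0.7034,
    0.6878, 0.6027, 0.6687, 0.7544, 0.6287, 0.7351, 0.7092, 0.6904,
    0.7099, 0.7293, 0.7347, 0.737, 0.7659, 0.7667,
]


def best_round(f1_values):
    """Return (number of annotated EMRs, F1) for the highest-scoring round."""
    i = max(range(len(f1_values)), key=f1_values.__getitem__)
    return EMRS[i], f1_values[i]


# Rounds in which BiLSTM-CRF reports a strictly higher F1 than IDCNN-CRF.
bilstm_wins = sum(b > i for i, b in zip(IDCNN_F1, BILSTM_F1))

print("IDCNN-CRF best:", best_round(IDCNN_F1))
print("BiLSTM-CRF best:", best_round(BILSTM_F1))
print(f"BiLSTM-CRF higher in {bilstm_wins} of {len(EMRS)} rounds")
```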
