experiment_emr - wuxuehong214/nPTAS GitHub Wiki
- Data set: 500 Chinese electronic medical records (EMRs), each including the chief complaint and the history of present illness, were collected from the Internet. All of them are related to nephrosis. Download
- Target: on nPTAS, extract the "symptom" and "sign" entities from the medical records and label each one as "yes" (the patient has the symptom or sign) or "no" (the patient does not have it).
- Experiment process: the first model is trained once 70 EMRs have been annotated, and training is repeated each time 20 more annotated EMRs are added.
- An example EMR is shown below:
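主 诉:左侧腰部痛10年,加重7日。
(Chief complaint: left flank pain for 10 years, worse over the past 7 days.)
现病史:患者于起病时间:10年前无明显诱因起病缓急:突然出现症状左侧腰部痛,疼痛呈持续性胀痛,疼痛不向它处放射,伴随症状无尿频、尿急、尿痛、畏寒、发热、血尿、排尿困难,诊治情况发病后在当地医院诊治,诊断为左肾结石,经抗炎输液病情缓解。以后疼痛反复发作,每次经对症治疗好转。近7天来症状明显加重。在当地医院治疗经抗炎、解痉、镇痛等对症处理后无明显效果,为进一步诊断治疗,来我院就诊。门诊以左肾结石收住我科。患者自发病以来,精神状态较差,感畏寒,饮食一般,大便正常。
(History of present illness: onset 10 years ago with no obvious cause; the onset was sudden. The symptom was left flank pain, a persistent distending pain that did not radiate elsewhere. Associated symptoms: no urinary frequency, urgency, painful urination, chills, fever, hematuria, or difficulty urinating. After onset the patient was treated at a local hospital, was diagnosed with a left kidney stone, and improved after anti-inflammatory infusion therapy. The pain later recurred repeatedly and improved each time with symptomatic treatment. Over the past 7 days the symptoms worsened markedly. Anti-inflammatory, antispasmodic, and analgesic treatment at the local hospital had no obvious effect, so the patient came to our hospital for further diagnosis and treatment and was admitted to our department from the outpatient clinic with a left kidney stone. Since onset the patient has been in poor spirits and has felt chills; appetite is average and bowel movements are normal.)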
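To make the annotation target concrete, the sketch below turns labelled character spans into per-character tags of the kind a CRF-based tagger is usually trained on. It is only an illustration: the BIO scheme, the `spans_to_bio` helper, the shortened sample text, and the span offsets are all assumptions, not the format nPTAS actually exports.

```python
# The four annotation tags used in this experiment.
LABELS = ("symptom_yes", "symptom_no", "sign_yes", "sign_no")


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into BIO tags.

    `end` is exclusive. The BIO scheme is an assumption made for
    illustration; the source does not state the tagging scheme.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical annotation of a shortened phrase built from the sample EMR.
text = "左侧腰部痛,无尿频"
spans = [(0, 5, "symptom_yes"), (7, 9, "symptom_no")]

for char, tag in zip(text, spans_to_bio(text, spans)):
    print(char, tag)
```

Here "左侧腰部痛" (left flank pain) is tagged as a symptom the patient has, while "尿频" (urinary frequency) is tagged as absent because it is negated by "无".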
No | Number of annotated EMRs | Number of annotated entities | symptom_yes | symptom_no | sign_yes | sign_no | Average annotation time (s) | IDCNN-CRF train duration (min) | IDCNN-CRF precision (%) | IDCNN-CRF recall (%) | IDCNN-CRF F1 | BiLSTM-CRF train duration (min) | BiLSTM-CRF precision (%) | BiLSTM-CRF recall (%) | BiLSTM-CRF F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 70 | 814 | 356 | 380 | 64 | 14 | 43 | 3.82 | 43.53 | 58.73 | 0.5 | 3.67 | 48.82 | 61.39 | 0.5439 |
2 | 90 | 1077 | 475 | 498 | 89 | 15 | 42 | 3.83 | 55.96 | 62.89 | 0.5922 | 4 | 59.55 | 71.62 | 0.6503 |
3 | 110 | 1323 | 577 | 610 | 117 | 19 | 43 | 5.91 | 40.86 | 57.14 | 0.4765 | 5.5 | 60.83 | 66.97 | 0.6376 |
4 | 130 | 1550 | 671 | 726 | 133 | 20 | 39 | 5.11 | 53.11 | 61.04 | 0.568 | 5.28 | 59.56 | 72.57 | 0.6547 |
5 | 150 | 1834 | 818 | 841 | 153 | 22 | 41 | 4.9 | 46.9 | 59.55 | 0.5248 | 5.6 | 59.89 | 76.26 | 0.6709 |
6 | 170 | 2063 | 932 | 941 | 168 | 22 | 40 | 6.23 | 53.49 | 63.19 | 0.5793 | 6.9 | 56.3 | 65.3 | 0.6047 |
7 | 190 | 2314 | 1069 | 1037 | 182 | 26 | 38 | 5.99 | 61.35 | 70 | 0.6539 | 6.4 | 61.13 | 74.02 | 0.6696 |
8 | 210 | 2566 | 1175 | 1155 | 208 | 28 | 37 | 5.93 | 63.84 | 71.19 | 0.6732 | 6.03 | 66.4 | 74.77 | 0.7034 |
9 | 230 | 2794 | 1272 | 1262 | 230 | 30 | 36 | 6.8 | 63.31 | 74.43 | 0.6842 | 6.11 | 62.7 | 76.17 | 0.6878 |
10 | 250 | 3041 | 1391 | 1366 | 253 | 31 | 37 | 7.11 | 58.59 | 65.82 | 0.62 | 7.25 | 56.65 | 64.39 | 0.6027 |
11 | 270 | 3300 | 1497 | 1488 | 282 | 33 | 35 | 6.82 | 64.42 | 72.43 | 0.6819 | 6.73 | 62.29 | 72.19 | 0.6687 |
12 | 290 | 3529 | 1597 | 1600 | 294 | 38 | 33 | 6.4 | 69.65 | 74.49 | 0.7199 | 6.71 | 72.8 | 78.27 | 0.7544 |
13 | 310 | 3776 | 1716 | 1704 | 315 | 41 | 36 | 6.63 | 62.72 | 73.75 | 0.6779 | 6.56 | 56.57 | 70.75 | 0.6287 |
14 | 330 | 4018 | 1852 | 1790 | 333 | 43 | 33 | 7.11 | 63.16 | 75 | 0.6857 | 7.2 | 68.6 | 79.18 | 0.7351 |
15 | 350 | 4239 | 1994 | 1898 | 353 | 44 | 34 | 7.18 | 69.18 | 73.46 | 0.7125 | 7.15 | 67.21 | 75.06 | 0.7092 |
16 | 370 | 4487 | 2061 | 2006 | 375 | 45 | 31 | 9.21 | 62.92 | 68.8 | 0.6573 | 8.33 | 64.22 | 74.64 | 0.6904 |
17 | 390 | 4713 | 2157 | 2122 | 388 | 46 | 35 | 7.8 | 67.04 | 73.96 | 0.7033 | 7.3 | 66.67 | 75.92 | 0.7099 |
18 | 410 | 4953 | 2269 | 2228 | 410 | 46 | 32 | 7.18 | 73.61 | 80.16 | 0.7674 | 7.77 | 69.73 | 76.45 | 0.7293 |
19 | 430 | 5188 | 2379 | 2360 | 432 | 51 | 29 | 8.38 | 71.89 | 80.06 | 0.7576 | 7.82 | 69.74 | 77.62 | 0.7347 |
20 | 450 | 5417 | 2476 | 2438 | 449 | 54 | 31 | 7.8 | 72.11 | 79.89 | 0.758 | 7.73 | 70.07 | 77.73 | 0.737 |
21 | 470 | 5677 | 2592 | 2559 | 468 | 58 | 27 | 7.67 | 72.04 | 77.76 | 0.7479 | 7.78 | 74.56 | 78.74 | 0.7659 |
22 | 500 | 6005 | 2743 | 2722 | 481 | 59 | 28 | 9.55 | 72.88 | 80.7 | 0.7659 | 9.63 | 73.34 | 80.31 | 0.7667 |
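The F1 values in the table appear to be the harmonic mean of the precision and recall columns, expressed as a fraction rather than a percentage. The following minimal sketch (the `f1_score` helper is a name chosen here, not part of nPTAS) reproduces the BiLSTM-CRF F1 of the first row from its precision and recall.

```python
def f1_score(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given in percent.

    Returns F1 as a fraction in [0, 1], matching the table's F1 columns.
    """
    p = precision_pct / 100.0
    r = recall_pct / 100.0
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Row 1, BiLSTM-CRF: precision 48.82 %, recall 61.39 %.
print(round(f1_score(48.82, 61.39), 4))  # 0.5439
```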
In the "project management" module, click "new project" and fill in the basic information of the project, as shown in the following figure:
Configure the project's data sets, participants, annotation objectives, annotation specifications, and so on. This project has 500 records to annotate, 2 annotators, 2 reviewers, and 4 annotation tags:
After the project configuration is completed, annotators cannot view the project until it is published.
Each annotator logs in to the system, chooses the project, picks up a task, and annotates it, as shown in the following figure:
Each corpus reviewer logs in to the system, chooses the project, picks up a task, and reviews it, as shown in the following figure:
Once the corpus is approved, its details can be queried and checked:
To demonstrate the effectiveness of the system, the first model is trained once 70 EMRs have been annotated, and training is repeated each time 20 more annotated EMRs are added. Two algorithms are used for model training in this experiment: BiLSTM-CRF and IDCNN-CRF. Each time a new model iteration is generated, the following data are recorded: the number of annotated EMRs, the number of annotated entities, the average annotation time per record (calculated from the start time and completion time of each task), the time spent on model training, and the model's precision, recall, and F1 score.
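The retraining schedule can be sketched as a list of checkpoints. The `training_checkpoints` function below is a hypothetical helper, and the rule for the final round is an assumption inferred from the results table, where the last two rows are 470 and 500 EMRs rather than 470 and 490.

```python
def training_checkpoints(total=500, first=70, step=20):
    """Return the annotated-EMR counts at which a model is trained.

    Assumption: when the regular schedule does not land exactly on
    `total`, the last checkpoint is moved to `total` so that the
    leftover records are folded into the final round.
    """
    points = list(range(first, total + 1, step))
    if not points:
        return [total]
    if points[-1] != total:
        points[-1] = total
    return points


schedule = training_checkpoints()
print(len(schedule))   # 22
print(schedule[:3])    # [70, 90, 110]
print(schedule[-2:])   # [470, 500]
```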
The detailed process of the experiment is as follows:
- Number of annotated entities: 814
- symptom_yes: 356
- symptom_no: 380
- sign_yes: 64
- sign_no: 14
- Average annotation time (seconds): 43
- Corpus from 70 EMRs: Download
- Create the training task
- Choose the algorithm
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 1077
- symptom_yes: 475
- symptom_no: 498
- sign_yes: 89
- sign_no: 15
- Average annotation time (seconds): 42
- Corpus from 90 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 1323
- symptom_yes: 577
- symptom_no: 610
- sign_yes: 117
- sign_no: 19
- Average annotation time (seconds): 42
- Corpus from 110 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 1550
- symptom_yes: 671
- symptom_no: 726
- sign_yes: 133
- sign_no: 20
- Average annotation time (seconds): 39
- Corpus from 130 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 1834
- symptom_yes: 818
- symptom_no: 841
- sign_yes: 153
- sign_no: 22
- Average annotation time (seconds): 41
- Corpus from 150 EMRs: Download
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 2063
- symptom_yes: 932
- symptom_no: 941
- sign_yes: 168
- sign_no: 22
- Average annotation time (seconds): 40
- Corpus from 170 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 2314
- symptom_yes: 1069
- symptom_no: 1037
- sign_yes: 182
- sign_no: 26
- Average annotation time (seconds): 38
- Corpus from 190 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 2566
- symptom_yes: 1175
- symptom_no: 1155
- sign_yes: 208
- sign_no: 28
- Average annotation time (seconds): 37
- Corpus from 210 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 2794
- symptom_yes: 1272
- symptom_no: 1262
- sign_yes: 230
- sign_no: 30
- Average annotation time (seconds): 36
- Corpus from 230 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 3041
- symptom_yes: 1391
- symptom_no: 1366
- sign_yes: 253
- sign_no: 31
- Average annotation time (seconds): 37
- Corpus from 250 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 3300
- symptom_yes: 1497
- symptom_no: 1488
- sign_yes: 282
- sign_no: 33
- Average annotation time (seconds): 35
- Corpus from 270 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 3529
- symptom_yes: 1597
- symptom_no: 1600
- sign_yes: 294
- sign_no: 38
- Average annotation time (seconds): 33
- Corpus from 290 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 3776
- symptom_yes: 1716
- symptom_no: 1704
- sign_yes: 315
- sign_no: 41
- Average annotation time (seconds): 36
- Corpus from 310 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 4018
- symptom_yes: 1852
- symptom_no: 1790
- sign_yes: 333
- sign_no: 43
- Average annotation time (seconds): 33
- Corpus from 330 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 4239
- symptom_yes: 1994
- symptom_no: 1898
- sign_yes: 353
- sign_no: 44
- Average annotation time (seconds): 34
- Corpus from 350 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 4487
- symptom_yes: 2061
- symptom_no: 2006
- sign_yes: 375
- sign_no: 45
- Average annotation time (seconds): 31
- Corpus from 370 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 4713
- symptom_yes: 2157
- symptom_no: 2122
- sign_yes: 388
- sign_no: 46
- Average annotation time (seconds): 32
- Corpus from 390 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 4953
- symptom_yes: 2269
- symptom_no: 2228
- sign_yes: 410
- sign_no: 46
- Average annotation time (seconds): 32
- Corpus from 410 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 5188
- symptom_yes: 2379
- symptom_no: 2360
- sign_yes: 432
- sign_no: 51
- Average annotation time (seconds): 29
- Corpus from 430 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 5417
- symptom_yes: 2476
- symptom_no: 2438
- sign_yes: 449
- sign_no: 54
- Average annotation time (seconds): 31
- Corpus from 450 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 5677
- symptom_yes: 2592
- symptom_no: 2559
- sign_yes: 468
- sign_no: 58
- Average annotation time (seconds): 27
- Corpus from 470 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
- Number of annotated entities: 6005
- symptom_yes: 2743
- symptom_no: 2722
- sign_yes: 481
- sign_no: 59
- Average annotation time (seconds): 28
- Corpus from 500 EMRs: Download
- Create the training task
- Train the model
- Report of the model (BiLSTM-CRF & IDCNN-CRF)
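Each round above reports an average annotation time, which the experiment derives from the start and completion time of every task. A minimal sketch of that calculation follows; the `average_annotation_seconds` helper and the timestamps are invented for illustration and do not come from nPTAS.

```python
from datetime import datetime


def average_annotation_seconds(tasks):
    """Average task duration in seconds.

    `tasks` is a list of (start, end) datetime pairs, one per annotated
    record. The input shape is an assumption made for this sketch.
    """
    if not tasks:
        raise ValueError("no tasks to average")
    durations = [(end - start).total_seconds() for start, end in tasks]
    return sum(durations) / len(durations)


# Hypothetical timestamps for two annotation tasks (40 s and 46 s).
tasks = [
    (datetime(2020, 1, 1, 9, 0, 0), datetime(2020, 1, 1, 9, 0, 40)),
    (datetime(2020, 1, 1, 9, 5, 0), datetime(2020, 1, 1, 9, 5, 46)),
]
print(average_annotation_seconds(tasks))  # 43.0
```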