cnn dailymail stats info - lngvietthang/das GitHub Wiki
# CNN/Daily Mail Dataset Statistical Information
| Statistic | CNN | Daily Mail |
|---|---|---|
| Number of documents | 92454 | 219484 |
| Content vocabulary size | 281460 | 495313 |
| Highlights vocabulary size | 87696 | 169412 |
| Average number of words in content | 672 | 717 |
| Average number of words in highlights | 45 | 54 |
| Average number of sentences in content | 34 | 35 |
| Average number of sentences in highlights | 4 | 4 |
| Average number of new words in highlights | 9 | 8 |

Further statistical information about the dataset was presented in seven figures (images not available).
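
As a rough illustration of how the statistics in the table above could be computed, here is a minimal Python sketch. It is not the script used to produce the numbers on this page. The function names (`tokenize`, `split_sentences`, `corpus_stats`), the lowercase regex tokenizer, the naive sentence splitter, and the input layout (each document as a `(content, highlights)` pair, with highlights given as a list of strings) are all assumptions. In particular, "new words" is assumed to mean distinct highlight words that do not occur in the content of the same document, and each highlight is assumed to count as one sentence.

```python
import re


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the original tokenizer is unknown.
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def corpus_stats(pairs):
    """pairs: non-empty list of (content: str, highlights: list[str])."""
    content_vocab, highlight_vocab = set(), set()
    rows = []
    for content, highlights in pairs:
        c_tok = tokenize(content)
        h_tok = tokenize(" ".join(highlights))
        content_vocab.update(c_tok)
        highlight_vocab.update(h_tok)
        # Assumption: "new words" = distinct highlight words absent from the content.
        new_words = set(h_tok) - set(c_tok)
        rows.append((
            len(c_tok),
            len(h_tok),
            len(split_sentences(content)),
            len(highlights),  # assumption: one sentence per highlight
            len(new_words),
        ))

    def avg(i):
        return sum(r[i] for r in rows) / len(rows)

    return {
        "documents": len(rows),
        "content_vocab_size": len(content_vocab),
        "highlight_vocab_size": len(highlight_vocab),
        "avg_content_words": avg(0),
        "avg_highlight_words": avg(1),
        "avg_content_sentences": avg(2),
        "avg_highlight_sentences": avg(3),
        "avg_new_words": avg(4),
    }


if __name__ == "__main__":
    toy = [
        ("The cat sat. The dog ran!", ["A cat sat", "Dogs ran"]),
        ("Hello world.", ["Hello there"]),
    ]
    for name, value in corpus_stats(toy).items():
        print(f"{name}: {value}")
```

A different tokenizer or sentence splitter will shift the vocabulary sizes and averages, so numbers from this sketch should not be expected to match the table exactly.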