Data Sets - Texera/texera GitHub Wiki
Authors: Chen Li
Medline Abstracts
- Uploader: Zuozhi Wang and Chen Li
- Data size: 100K docs - 47MB (zipped), 1M docs - 531MB (zipped)
- Number of records: 100K docs, 1M docs
- Download URL: https://drive.google.com/drive/u/0/folders/0Bxp0qxtbSGxYd0s3NXZPQTJtUkE
- Sample records:
Each line of the data file is one separate record in JSON format. A sample record, shown here across multiple lines for readability, is as follows:
{
"pmid":"19866847",
"affiliation":"Surgeon, U. S. A.",
"article_title":"ON THE APPEARANCE ......",
"authors":"W Reed",
"journal_issue":"2-5 Sep 1, 1897",
"journal_title":"The Journal of experimental medicine",
"keywords":"",
"mesh_headings":"",
"abstract":"1. The claim of L. Pfeiffer that ........",
"zipf_score":0.019866847
}
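Because each line holds one JSON object, the file can be processed line by line without loading it all into memory. The following is a minimal sketch in Python, not part of the data set itself; the helper name `iter_records` is hypothetical, and the sample line reuses a subset of the fields from the record above.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse one JSON object per line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# One record on a single line, using values from the sample above
# (several fields are omitted to keep the example short).
sample_line = (
    '{"pmid":"19866847","authors":"W Reed",'
    '"journal_title":"The Journal of experimental medicine",'
    '"keywords":"","zipf_score":0.019866847}'
)

records = list(iter_records([sample_line, ""]))
first = records[0]

# "pmid" is stored as a string, while "zipf_score" is a number.
print(first["pmid"], first["authors"], first["zipf_score"])
```

To read a downloaded file, pass an open file handle instead of a list, for example `iter_records(open(path, encoding="utf-8"))`, where `path` points to the unzipped file.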
Twitter Data
- Uploader: Zuozhi Wang and Jianfeng Jia
- Data size: 10K tweets - 4MB (zipped), 30MB (unzipped); 200K tweets - 100MB (zipped), 700MB (unzipped)
- Number of records: 10K tweets, 200K tweets
- Download URL: https://drive.google.com/drive/folders/0Bxp0qxtbSGxYVWlENVVOTzA3QUE?usp=sharing
- Sample records:
Each line of the data file is one tweet and its metadata in JSON format. A sample record, shown here across multiple lines for readability, is as follows:
{
"created_at": "Tue Nov 17 21:33:18 +0000 2015",
"id": 666730857898508288,
"id_str": "666730857898508288",
"text": "Get a major thrill out of ditching something\/someone way before they even get the opportunity to pull some shady activity.",
"source": "\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 329833893,
"id_str": "329833893",
"name": "Jade Castillo",
"screen_name": "RealJadeMarie",
"location": "Los Angeles, CA",
"url": "http:\/\/Instagram.com\/realjademarie",
"description": null,
"protected": false,
"verified": false,
"followers_count": 1423,
"friends_count": 557,
"listed_count": 11,
"favourites_count": 35345,
"statuses_count": 40776,
"created_at": "Tue Jul 05 17:56:10 +0000 2011",
"utc_offset": -21600,
"time_zone": "Central Time (US & Canada)",
"geo_enabled": true,
"lang": "en",
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "FF6699",
"profile_background_image_url": "http:\/\/abs.twimg.com\/images\/themes\/theme11\/bg.gif",
"profile_background_image_url_https": "https:\/\/abs.twimg.com\/images\/themes\/theme11\/bg.gif",
"profile_background_tile": true,
"profile_link_color": "B40B43",
"profile_sidebar_border_color": "CC3366",
"profile_sidebar_fill_color": "E5507E",
"profile_text_color": "362720",
"profile_use_background_image": true,
"profile_image_url": "http:\/\/pbs.twimg.com\/profile_images\/665437037613457408\/8CCCd9iG_normal.jpg",
"profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/665437037613457408\/8CCCd9iG_normal.jpg",
"profile_banner_url": "https:\/\/pbs.twimg.com\/profile_banners\/329833893\/1445677148",
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": {
"id": "fbd6d2f5a4e4a15e",
"url": "https:\/\/api.twitter.com\/1.1\/geo\/id\/fbd6d2f5a4e4a15e.json",
"place_type": "admin",
"name": "California",
"full_name": "California, USA",
"country_code": "US",
"country": "United States",
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[-124.482003, 32.528832],
[-124.482003, 42.009519],
[-114.131212, 42.009519],
[-114.131212, 32.528832]
]
]
},
"attributes": {}
},
"contributors": null,
"is_quote_status": false,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [],
"urls": [],
"user_mentions": [],
"symbols": []
},
"favorited": false,
"retweeted": false,
"filter_level": "low",
"lang": "en",
"timestamp_ms": "1447795998440"
}
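A tweet record nests the author under `user` and the location under `place`, and several fields (such as `geo` in the sample) can be `null`. The sketch below, which is an illustration rather than part of the data set, pulls a few fields out of one record and parses `created_at` into a timezone-aware datetime. The helper name `summarize_tweet` is hypothetical, and treating `place` as possibly `null` is an assumption that the sample does not itself confirm.

```python
import json
from datetime import datetime

# A condensed version of the sample tweet above (most fields omitted).
# The raw string keeps the JSON escape "\/" exactly as it appears in the file.
tweet_line = r'''{"created_at": "Tue Nov 17 21:33:18 +0000 2015", "id_str": "666730857898508288", "text": "Get a major thrill out of ditching something\/someone way before they even get the opportunity to pull some shady activity.", "user": {"screen_name": "RealJadeMarie", "followers_count": 1423}, "geo": null, "place": {"full_name": "California, USA", "country_code": "US"}, "timestamp_ms": "1447795998440"}'''


def summarize_tweet(tweet: dict) -> dict:
    """Extract a few commonly used fields from one tweet record."""
    # Format of created_at in the sample: "Tue Nov 17 21:33:18 +0000 2015".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    # Assumption: "place" may be null, so fall back to an empty dict.
    place = tweet.get("place") or {}
    return {
        "id": tweet["id_str"],
        "screen_name": tweet["user"]["screen_name"],
        "created_at": created,
        "place": place.get("full_name"),
        "text": tweet["text"],
    }


summary = summarize_tweet(json.loads(tweet_line))
print(summary["screen_name"], summary["created_at"].isoformat(), summary["place"])
```

Note that the JSON parser decodes escapes such as `\/` and `\u003c`, so the parsed `text` and `source` values contain plain `/` and `<` characters.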
Proposal Data (in Chinese)
- Uploader: Qinhua Huang
- Data size: 150KB
- Number of records: N/A
- Download URL: https://drive.google.com/drive/folders/0B-xzsxV4BxGeQTdNQkRRek4xQ00?usp=sharing
- Sample records: Records are loosely structured. Each record contains roughly five sections: project title, project source, project period, project leaders, and project description. A sample record is as follows:
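(六)河南黄淮海平原中低产地区农业投资问题研究
项目来源:中国农科院农业经济研究所
起止年限:1988-1990年
主持人:卑圣模、王蕴娴
该项目系中国农科院农经所“七五”重点科研项目“黄淮海平原中低产地区综合治理开发与农业投资问题研究”的子课题......
English translation of the sample record:
(6) Research on Agricultural Investment in the Medium- and Low-Yield Areas of the Huang-Huai-Hai Plain in Henan
Project source: Institute of Agricultural Economics, Chinese Academy of Agricultural Sciences
Project period: 1988-1990
Project leaders: 卑圣模, 王蕴娴
This project is a sub-topic of "Research on Comprehensive Management, Development, and Agricultural Investment in the Medium- and Low-Yield Areas of the Huang-Huai-Hai Plain", a key research project of the Institute of Agricultural Economics, Chinese Academy of Agricultural Sciences, under the Seventh Five-Year Plan......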
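Since the records are only loosely structured, any parser has to rely on heuristics. The sketch below is one possible approach and not a documented format: it assumes the first non-empty line is the title, that the source, period, and leader lines start with the labels seen in the sample followed by a full-width colon, and that everything else is description. The function name `parse_proposal` and the output field names are hypothetical, and other records in the data set may not follow this layout.

```python
from typing import Dict, List

# Labels as they appear in the sample record; other records may differ.
LABELS = {"项目来源": "source", "起止年限": "period", "主持人": "leaders"}


def parse_proposal(lines: List[str]) -> Dict[str, str]:
    """Split one loosely structured proposal record into named sections."""
    record = {"title": "", "source": "", "period": "", "leaders": "", "description": ""}
    description = []
    for line in (raw.strip() for raw in lines):
        if not line:
            continue
        if not record["title"]:
            # Assumption: the first non-empty line is the project title.
            record["title"] = line
            continue
        # The sample separates label and value with a full-width colon.
        label, sep, value = line.partition(":")
        if sep and label in LABELS:
            record[LABELS[label]] = value
        else:
            description.append(line)
    record["description"] = "\n".join(description)
    return record


sample = [
    "(六)河南黄淮海平原中低产地区农业投资问题研究",
    "项目来源:中国农科院农业经济研究所",
    "起止年限:1988-1990年",
    "主持人:卑圣模、王蕴娴",
    "该项目系中国农科院农经所“七五”重点科研项目“黄淮海平原中低产地区综合治理开发与农业投资问题研究”的子课题......",
]

record = parse_proposal(sample)
print(record["period"], record["leaders"])
```

Lines that do not match a known label fall through to the description, so an unexpected layout degrades gracefully instead of raising an error.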