Dataset Yelp's Academic Dataset - Rostlab/DM_CS_WS_2016-17 GitHub Wiki
Dataset Yelp's Academic Dataset
- Proposer: Abhishek Kapoor - @kapoorabhishek24 - [email protected]
- Votes 🗳:
Summary
Yelp's Academic Dataset is a treasure trove of local business data, covering categories such as Restaurants, Hotels, Education, Travel, and Local Services. The data is collated from 10 cities across 4 countries: the UK, Canada, Germany, and the US. It would be interesting to learn about users' opinions in these countries, identify cultural trends, predict seasonal effects on a particular business, and find out which kinds of business are in demand right now, in case you ever want to start one of your own.
Prediction Goals
- Figuring out the trendsetters: which businesses made things popular?
- Seasonal effects: is winter preferred for certain businesses?
- Does location have an impact on a business?
- Finding the expert users
- Business trends and recommendations
- Finding n-grams in customer reviews: which words were frequently used for well-reputed restaurants?
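The n-gram goal could be approached roughly as follows. This is a minimal sketch, not a prescribed method: it assumes each review is a dict with a `stars` rating and a `text` field (the review schema is not shown in this page, so both field names are assumptions), and the helper names `ngrams` and `top_ngrams` as well as the 4-star threshold are made up for illustration.

```python
import re
from collections import Counter


def ngrams(text, n=2):
    """Return the word n-grams of `text` as space-joined strings."""
    # Simple tokenizer: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(reviews, n=2, min_stars=4, k=10):
    """Count n-grams over reviews rated at least `min_stars` (assumed threshold)."""
    counts = Counter()
    for review in reviews:
        # 'stars' and 'text' are assumed field names for review objects.
        if review["stars"] >= min_stars:
            counts.update(ngrams(review["text"], n))
    return counts.most_common(k)


# Toy reviews for illustration only; not taken from the dataset.
sample_reviews = [
    {"stars": 5, "text": "Great food and great service"},
    {"stars": 4, "text": "Great food, friendly staff"},
    {"stars": 1, "text": "Cold food and slow service"},
]
top = top_ngrams(sample_reviews, n=2, k=3)
```

On real data, stop-word removal and restricting the reviews to restaurant businesses would probably be needed before the counts become informative.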
Long Description
Some facts about Yelp's Academic Dataset:
* Size: ~2.6 GB
* Format: JSON
* User Reviews: 2.7M
* Businesses: 86K
* Users: 687K
The dataset can be used to derive plenty of interesting insights. It is diverse and contains a huge collection of real-world business reviews and customer expectations. Because it covers a diverse set of cities, we can compare and contrast what makes a particular city different. What cuisines are people raving about in these different countries? Does location play a role in business success, and which cities favor which kinds of business? Which business type is trending most, that is, which one do people need or rely on in their daily lives? We could also detect changepoints for a business, such as which event, review, or point in time led to its failure.
All these predictions and findings will tell us about current trends in business success and failure, user expectations, opinions, and so on. We can also submit this project to Yelp (there is no deadline for academic research).
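As a first step toward the seasonal question, reviews could be aggregated by calendar month. The sketch below is only one possible starting point; it assumes review objects carry a `date` string formatted as `YYYY-MM-DD` and a numeric `stars` field (neither is documented in this page), and `monthly_profile` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime


def monthly_profile(reviews):
    """Return {month: (review_count, mean_stars)}, pooled across all years."""
    totals = defaultdict(lambda: [0, 0.0])  # month -> [count, sum of stars]
    for review in reviews:
        # Assumed date format 'YYYY-MM-DD'; adjust if the files differ.
        month = datetime.strptime(review["date"], "%Y-%m-%d").month
        totals[month][0] += 1
        totals[month][1] += review["stars"]
    return {m: (n, s / n) for m, (n, s) in sorted(totals.items())}


# Toy reviews for illustration only; not taken from the dataset.
sample = [
    {"date": "2014-01-15", "stars": 4},
    {"date": "2015-01-03", "stars": 2},
    {"date": "2014-07-20", "stars": 5},
]
profile = monthly_profile(sample)
```

Comparing such profiles per business category or per city would show whether, for example, winter months attract more reviews for a given kind of business.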
Notes on the Dataset
Each file is composed of a single object type, with one JSON object per line.
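Because each line holds one complete JSON object, a file can be streamed record by record instead of being loaded whole, which matters at roughly 2.6 GB. A minimal sketch using only the standard library follows; the function name `read_json_lines` is illustrative, and it assumes the files are UTF-8 encoded.

```python
import json


def read_json_lines(path):
    """Yield one dict per non-empty line of a line-delimited JSON file."""
    # UTF-8 is an assumption about the file encoding.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

A caller would iterate over the generator, for example `for business in read_json_lines(path): ...`, where `path` points to whichever file holds the business objects.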
Business: {
'type': 'business',
'business_id': (encrypted business id),
'name': (business name),
'neighborhoods': [(hood names)],
'full_address': (localized address),
'city': (city),
'state': (state),
'latitude': latitude,
'longitude': longitude,
...
Users: {
'type': 'user',
'user_id': (encrypted user id),
'name': (first name),
'review_count': (review count),
'average_stars': (floating point average, like 4.31),
'votes': {(vote type): (count)},
'friends': [(friend user_ids)],
'elite': [(years_elite)],
'yelping_since': (date, formatted like '2012-03'),
'compliments': {
(compliment_type): (num_compliments_of_this_type),
...
},
'fans': (num_fans),
}
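The user fields listed above could feed a simple ranking for the "expert users" goal. The sketch below is a toy heuristic, not a validated measure of expertise: the weights on `elite` years and `fans` are arbitrary assumptions, and `expertise_score` and `top_experts` are hypothetical names.

```python
def expertise_score(user):
    """Toy heuristic: the weights 10 and 5 are arbitrary assumptions."""
    return (
        user.get("review_count", 0)
        + 10 * len(user.get("elite", []))  # one entry per elite year
        + 5 * user.get("fans", 0)
    )


def top_experts(users, k=3):
    """Return the k users with the highest heuristic score."""
    return sorted(users, key=expertise_score, reverse=True)[:k]


# Toy users for illustration only; not taken from the dataset.
sample_users = [
    {"user_id": "u1", "review_count": 120, "elite": [2012, 2013], "fans": 4},
    {"user_id": "u2", "review_count": 300, "elite": [], "fans": 1},
    {"user_id": "u3", "review_count": 15, "elite": [], "fans": 0},
]
ranked = [u["user_id"] for u in top_experts(sample_users, k=2)]
```

A more careful approach might also use the `votes`, `compliments`, and `friends` fields, or learn the weights from data rather than fixing them by hand.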
For further information check here.