Dataset Yelp's Academic Dataset - Rostlab/DM_CS_WS_2016-17 GitHub Wiki

Dataset Yelp's Academic Dataset

Summary

Yelp's Academic Dataset is a treasure trove of local business data covering categories such as Restaurants, Hotels, Education, Travel, and Local Services. The data is collated from 10 cities across 4 countries: the UK, Canada, Germany, and the US. It would be interesting to learn about user opinions and cultural trends in these countries, to predict seasonal effects on a particular business, and to find out which kinds of business are currently in demand, in case you ever want to start one of your own.

Prediction Goals

  • Figuring out the trend setters: which businesses made things popular?
  • Seasonal effects: is winter preferred for certain businesses?
  • Does location have an impact on the business?
  • Finding the expert users
  • Business trends and recommendations
  • Finding n-grams in customer reviews: which words are frequently used for well-reputed restaurants?
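The n-gram goal above can be illustrated with a minimal sketch. It assumes review texts are already available as plain Python strings; the helper names `ngrams` and `top_ngrams` and the sample reviews are made up for illustration and are not part of the dataset or of any Yelp tooling.

```python
import re
from collections import Counter


def ngrams(text, n=2):
    """Return the word n-grams of `text`, lowercased, with punctuation dropped."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(reviews, n=2, k=3):
    """Count n-grams over an iterable of review texts and return the k most common."""
    counts = Counter()
    for text in reviews:
        counts.update(ngrams(text, n))
    return counts.most_common(k)


# Hypothetical review texts, standing in for reviews of well-rated restaurants.
sample_reviews = [
    "Great food and great service.",
    "The great food keeps us coming back.",
]

print(top_ngrams(sample_reviews, n=2, k=1))
```

In practice one would probably filter the reviews by the star rating of the business first and remove stop words before counting, so that the frequent n-grams are more informative.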

Long Description

Some facts about Yelp's Academic Dataset:
* Size: ~2.6 GB
* Format: JSON
* User Reviews: 2.7M
* Businesses: 86K
* Users: 687K

The dataset can yield plenty of interesting insights. It is diverse and contains a large collection of real-world business reviews and customer expectations. Because it spans a diverse set of cities, we can compare and contrast what makes a particular city different. What cuisines are people raving about in these different countries? Does location play a role in business success, and which cities favor which kinds of business? Which business type is trending most, that is, which one do people need or rely on in their daily lives? We can also try to detect changepoints for a business, such as which event, review, or point in time led to a business failure.
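As a rough sketch of how a seasonal question could be approached, the snippet below counts reviews per meteorological season. It assumes each review record carries a `date` field formatted like '2015-01-10'; that field name and format are assumptions here, and the helper names and sample records are hypothetical. The month-to-season mapping uses Northern Hemisphere seasons, which fits the four countries in the dataset.

```python
from collections import defaultdict
from datetime import date

# Northern Hemisphere meteorological seasons, keyed by month number.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def season_of(iso_date):
    """Map an ISO date string (YYYY-MM-DD) to its season name."""
    return SEASONS[date.fromisoformat(iso_date).month]


def reviews_per_season(reviews):
    """Count review records per season, using the assumed 'date' field."""
    counts = defaultdict(int)
    for review in reviews:
        counts[season_of(review["date"])] += 1
    return dict(counts)


# Hypothetical review records for a single business.
sample_reviews = [
    {"date": "2015-01-10"},
    {"date": "2015-12-24"},
    {"date": "2015-07-04"},
]

print(reviews_per_season(sample_reviews))
```

Comparing these counts (or average ratings per season) across business categories would be one simple way to check whether some businesses are favored in winter.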

All these predictions and findings will tell us about current trends in business success and failure, user expectations and opinions, and so on. We can also submit this project to Yelp (there is no deadline for academic research).

Notes on the Dataset

Each file is composed of a single object type, with one JSON object per line.

Business: {
    'type': 'business',
    'business_id': (encrypted business id),
    'name': (business name),
    'neighborhoods': [(hood names)],
    'full_address': (localized address),
    'city': (city),
    'state': (state),
    'latitude': latitude,
    'longitude': longitude,
     ...

Users: {
   'type': 'user',
   'user_id': (encrypted user id),
   'name': (first name),
   'review_count': (review count),
   'average_stars': (floating point average, like 4.31),
   'votes': {(vote type): (count)},
   'friends': [(friend user_ids)],
   'elite': [(years_elite)],
   'yelping_since': (date, formatted like '2012-03'),
   'compliments': {
       (compliment_type): (num_compliments_of_this_type),
       ...
   },
   'fans': (num_fans),
  }
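Because each file holds one JSON object per line, it can be streamed record by record instead of being loaded as a single document. The sketch below reads such lines and ranks users by number of elite years and then by review count, as one possible starting point for finding expert users. The ranking rule, the helper names, and the sample records are assumptions for illustration; an in-memory buffer stands in for the real user file.

```python
import io
import json


def read_json_lines(handle):
    """Yield one parsed object per non-empty line of a line-delimited JSON file."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def top_reviewers(users, k=2):
    """Rank users by elite years, then review count (an illustrative heuristic)."""
    return sorted(
        users,
        key=lambda u: (len(u.get("elite", [])), u.get("review_count", 0)),
        reverse=True,
    )[:k]


# Hypothetical records that follow the user fields listed above.
sample = io.StringIO(
    '{"type": "user", "user_id": "u1", "review_count": 12, "elite": []}\n'
    '{"type": "user", "user_id": "u2", "review_count": 340, "elite": [2014, 2015]}\n'
    '{"type": "user", "user_id": "u3", "review_count": 95, "elite": [2015]}\n'
)

users = list(read_json_lines(sample))
print([u["user_id"] for u in top_reviewers(users)])
```

For the real data, the same generator can be passed an open file handle, which keeps memory use low even for the multi-gigabyte files.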

For further information check here.

Links / Data / Other