Data Questions - wimlds/smart_cities GitHub Wiki

Below are questions that the organizers found interesting. Please use these samples as inspiration for the question(s) you want to explore! We've separated the questions by track, but don't let that cramp your creativity!

In parentheses, we've provided links to suggested datasets that you can use to answer each question. You may also wish to consider other datasets for another perspective or complementary information.

Describe the Dents

  1. Visualize the trends of different subway card uses. For example, are there peak times for student card use? (MTA Turnstiles)
  2. Based on building permits and 311 calls, where might building construction be taking place that is not permitted or is out of scope of what has been permitted? (311 Data)
  3. Deep dive into Uber and TLC data - Is Uber serving the outer boroughs more than Taxis? Are there differences in peak ride times? (TLC data, Uber)
  4. What is the density of tobacco and liquor licenses near parks and schools? (Tobacco and Liquor, Parks and Schools)
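
As one possible starting point for the license-density question, the sketch below counts how many points fall within a fixed radius of a park or school using the haversine great-circle distance. The function names, the radius, and the sample coordinates are illustrative assumptions, not part of any suggested dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(center, points, radius_km):
    """Count the points lying within radius_km of center."""
    return sum(
        1
        for lat, lon in points
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )


# Made-up coordinates for illustration only (not real license locations).
park = (40.0, -74.0)
licenses = [(40.0, -74.0), (40.001, -74.0), (41.0, -74.0)]
nearby = count_within(park, licenses, radius_km=0.5)
```

Dividing the count by the area of the circle would give a density. For a full-city analysis a spatial index or a GIS library would likely scale better than this pairwise loop.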

Model the Mayhem

  1. Is there a correlation between rental prices and distance to parks or subway stations? Can this be modeled for various neighborhoods? (Zillow rents, Distances)
  2. Can you quantify how much money Uber is taking from the Taxi industry? (TLC data, Uber)
  3. Is Uber making rush hour traffic worse? (Uber, Traffic data)
  4. Measure asthma risk by the number of asthma discharges in the SPARCS data, and build a model based on park access, crime rate, highway length, 311 complaints, construction, tree type, and so on. (SPARCS, Park Access, Crime Rate, Highway lengths, 311 Complaints, Tree Data)
  5. Generate an interactive map of NYC (possibly restricted to a smaller area, such as a single subway line) that displays the safety of walking routes around a subway stop. Create an algorithm to calculate a safety score, find all possible routes from a point, and map each route by its safety score. (Crime Data, Subway Stations)
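
The walking-route question describes three steps: score safety, enumerate routes, and rank them. A minimal sketch of that idea follows, assuming the street network is a small directed graph whose edge weights are incident counts per street segment. The graph, the node names, and the choice of "sum of incident counts" as the risk score are all hypothetical; a real score would need to be designed and validated against the crime data.

```python
def simple_routes(graph, start, goal, path=None):
    """Yield every route from start to goal that visits no node twice (depth-first)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in graph.get(start, {}):
        if neighbor not in path:
            yield from simple_routes(graph, neighbor, goal, path)


def route_risk(graph, route):
    """Risk of a route: sum of incident counts over its segments (lower is safer)."""
    return sum(graph[a][b] for a, b in zip(route, route[1:]))


def rank_routes(graph, start, goal):
    """Return all simple routes from start to goal, safest first."""
    routes = list(simple_routes(graph, start, goal))
    return sorted(routes, key=lambda r: route_risk(graph, r))


# Toy network with made-up incident counts on each street segment.
graph = {
    "station": {"A": 3, "B": 1},
    "A": {"home": 1},
    "B": {"home": 2, "A": 0},
    "home": {},
}
ranked = rank_routes(graph, "station", "home")
safest = ranked[0]
```

Enumerating every simple route grows exponentially with the size of the network, which is one reason to restrict the map to a small area around a single stop.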

Clean the Chaos

  1. Geocode crime data - can you find the overlap of ZIP codes and precincts? (Crime Data, Geospatial data)
  2. Clean and find anomalies in the TLC data set. Are there: Unrealistic fares? Unrealistic travel speeds? Wrong GPS coordinates? Long distance rides to other major cities disguised as taxi rides? (TLC data)
  3. Clean and analyze turnstile data provided by the MTA. What is the popularity of stations by time of day? By time of year? Are there any interesting blips in the data? (MTA)
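
For the TLC anomaly question, rule-based checks are one simple way to begin. The sketch below flags trips with implausible fares, speeds, or pickup coordinates. The `Trip` fields, the thresholds, and the bounding box are assumptions chosen for illustration; they are not the TLC schema or official limits and should be tuned against the actual data.

```python
from dataclasses import dataclass

# Approximate bounding box around NYC (assumed values, deliberately generous).
NYC_LAT = (40.4, 41.0)
NYC_LON = (-74.3, -73.6)


@dataclass
class Trip:
    # Hypothetical field names; map them to the real TLC columns.
    fare_usd: float
    distance_miles: float
    duration_hours: float
    pickup_lat: float
    pickup_lon: float


def flag_trip(trip, max_fare=500.0, max_mph=80.0):
    """Return a list of reasons a trip looks suspicious (empty if none)."""
    reasons = []
    if trip.fare_usd <= 0 or trip.fare_usd > max_fare:
        reasons.append("unrealistic fare")
    if trip.duration_hours <= 0:
        reasons.append("non-positive duration")
    elif trip.distance_miles / trip.duration_hours > max_mph:
        reasons.append("unrealistic speed")
    in_lat = NYC_LAT[0] <= trip.pickup_lat <= NYC_LAT[1]
    in_lon = NYC_LON[0] <= trip.pickup_lon <= NYC_LON[1]
    if not (in_lat and in_lon):
        reasons.append("pickup outside NYC bounding box")
    return reasons


# Made-up trips for illustration only.
plausible = Trip(12.5, 3.0, 0.25, 40.75, -73.98)
suspicious = Trip(-5.0, 100.0, 0.5, 0.0, 0.0)
```

Returning the reasons rather than a single boolean makes it easy to count how often each kind of anomaly occurs before deciding what to drop or repair.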