Data Access Instructions - wimlds/smart_cities GitHub Wiki

We have stored all the data on Google Cloud and will be using Google's BigQuery platform to access and download the data to our local machines using SQL commands (BigQuery SQL reference can be found here). The data is stored as tables in one of two databases:

  • smart_cities_data, or
  • bigquery-public-data.new_york

Check out here or the documentation pages of each data source for the exact table name or database information.

Python

To access the data through Python, you will first need to:

To download 10 rows from the MIT Streetscore dataset to your local machine you run the code:

from pandas.io.gbq import read_gbq

project = "spheric-crow-161317"
sample_query = "SELECT * FROM `smart_cities_data.mit_streetscore` LIMIT 10"

df = read_gbq(query=sample_query, project_id=project, dialect='standard')

R

To access the data through R, you will first need to:

  • Install the package bigrquery from GitHub using the command:
# install.packages('devtools')
devtools::install_github("rstats-db/bigrquery")

To count the number of trips by month from the TLC Yellow Taxi data from 2015 and download it to your computer you can use the code:

library(bigrquery)

projectid <- "spheric-crow-161317"
query <- "SELECT
  TIMESTAMP_TRUNC(pickup_datetime,
    MONTH) month,
  COUNT(*) trips
FROM
  `bigquery-public-data.new_york.tlc_yellow_trips_2015`
GROUP BY
  1
ORDER BY
  1"

df <- query_exec(query, projectid, useLegacySql = FALSE)