Data Access Instructions - wimlds/smart_cities GitHub Wiki
We have stored all the data on Google Cloud and will be using Google's BigQuery platform to access and download the data to our local machines using SQL commands (BigQuery SQL reference can be found here). The data is stored as tables in one of two databases:
- smart_cities_data, or
- bigquery-public-data.new_york
Check out here or the documentation pages of each data source for the exact table name or database information.
Python
To access the data through Python, you will first need to:
- Install the Google API Client
- Update Pandas to Version 19.0 or higher
To download 10 rows from the MIT Streetscore dataset to your local machine you run the code:
from pandas.io.gbq import read_gbq
project = "spheric-crow-161317"
sample_query = "SELECT * FROM `smart_cities_data.mit_streetscore` LIMIT 10"
df = read_gbq(query=sample_query, project_id=project, dialect='standard')
R
To access the data through R, you will first need to:
- Install the package bigrquery from GitHub using the command:
# install.packages('devtools')
devtools::install_github("rstats-db/bigrquery")
To count the number of trips by month from the TLC Yellow Taxi data from 2015 and download it to your computer you can use the code:
library(bigrquery)
projectid <- "spheric-crow-161317"
query <- "SELECT
TIMESTAMP_TRUNC(pickup_datetime,
MONTH) month,
COUNT(*) trips
FROM
`bigquery-public-data.new_york.tlc_yellow_trips_2015`
GROUP BY
1
ORDER BY
1"
df <- query_exec(query, projectid, useLegacySql = FALSE)