Doc: TLC - wimlds/smart_cities GitHub Wiki

The New York TLC taxi data set is a public data set provided by the Taxi and Limousine Commission (TLC). It includes records of all trips completed in yellow and green taxis in New York City. Data are available for 2009-2016 for yellow taxis and for 2013-2016 for green taxis. Each trip record contains pick-up and drop-off times, pick-up and drop-off locations, trip distance, itemized fares (recorded through meters installed in each taxi), driver-reported passenger counts, tips and payment information, and several other fields.
The for-hire vehicle (FHV) data, available for 2015 and 2016, contain fewer variables and do not include latitude-longitude pick-up and drop-off locations, only a location id.

The data are organized in separate files by vehicle type (yellow taxi, green taxi, or FHV) and by year. Be careful: these datasets are extremely large. The green taxi data are the smallest, with millions of rows per year (one row per ride); each of the FHV datasets has about 60 million rows; and each year of yellow taxi data contains 100-200 million rows.
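
Because every table follows the same naming pattern, the paths can be generated rather than typed. The following is a minimal sketch, not an official helper: the names `AVAILABLE_YEARS`, `tlc_table`, and `row_count_query` are hypothetical, and the year ranges are simply the ones listed on this page. It only builds query strings; running them would require a BigQuery client, which is not shown here.

```python
# Years available per vehicle type, as listed on this page.
AVAILABLE_YEARS = {
    "yellow": range(2009, 2017),
    "green": range(2013, 2017),
    "fhv": range(2015, 2017),
}


def tlc_table(vehicle: str, year: int) -> str:
    """Return the BigQuery table path for one vehicle type and year."""
    if vehicle not in AVAILABLE_YEARS:
        raise ValueError(f"unknown vehicle type: {vehicle!r}")
    if year not in AVAILABLE_YEARS[vehicle]:
        raise ValueError(f"no {vehicle} data for {year}")
    return f"bigquery-public-data.new_york.tlc_{vehicle}_trips_{year}"


def row_count_query(vehicle: str, year: int) -> str:
    """Build a query string that counts rows (one row per ride)."""
    return f"SELECT COUNT(*) AS n_trips FROM `{tlc_table(vehicle, year)}`"


# Print one row-count query per year of green taxi data.
for year in AVAILABLE_YEARS["green"]:
    print(row_count_query("green", year))
```

Checking the row count of a table first is one way to gauge its size before pulling any records.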

TLC Yellow Taxi Trip Data


Table path:

  • `bigquery-public-data.new_york.tlc_yellow_trips_2009`
  • `bigquery-public-data.new_york.tlc_yellow_trips_2010`
  • `bigquery-public-data.new_york.tlc_yellow_trips_2011`
  • `bigquery-public-data.new_york.tlc_yellow_trips_2012`
  • `bigquery-public-data.new_york.tlc_yellow_trips_2013`
  • `bigquery-public-data.new_york.tlc_yellow_trips_2014`
  • `bigquery-public-data.new_york.tlc_yellow_trips_2015`
  • `bigquery-public-data.new_york.tlc_yellow_trips_2016`

| Column Name | Type | Explanation |
| --- | --- | --- |
| vendor_id | String | A code indicating the TPEP provider that provided the record. 1 = Creative Mobile Technologies, LLC; 2 = VeriFone Inc. |
| pickup_datetime | Timestamp | The date and time when the meter was engaged. |
| dropoff_datetime | Timestamp | The date and time when the meter was disengaged. |
| passenger_count | Integer | The number of passengers in the vehicle. This is a driver-entered value. |
| trip_distance | Float | The elapsed trip distance in miles reported by the taximeter. |
| pickup_longitude | Float | Longitude where the meter was engaged. |
| pickup_latitude | Float | Latitude where the meter was engaged. |
| rate_code | Integer | The final rate code in effect at the end of the trip. 1 = Standard rate; 2 = JFK; 3 = Newark; 4 = Nassau or Westchester; 5 = Negotiated fare; 6 = Group ride. |
| store_and_fwd_flag | String | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y = store and forward trip; N = not a store and forward trip. |
| dropoff_longitude | Float | Longitude where the meter was disengaged. |
| dropoff_latitude | Float | Latitude where the meter was disengaged. |
| payment_type | String | A numeric code signifying how the passenger paid for the trip. 1 = Credit card; 2 = Cash; 3 = No charge; 4 = Dispute; 5 = Unknown; 6 = Voided trip. |
| fare_amount | Float | The time-and-distance fare calculated by the meter. |
| extra | Float | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges. |
| mta_tax | Float | $0.50 MTA tax that is automatically triggered based on the metered rate in use. |
| tip_amount | Float | Tip amount: this field is automatically populated for credit card tips. Cash tips are not included. |
| tolls_amount | Float | Total amount of all tolls paid in trip. |
| total_amount | Float | The total amount charged to passengers. Does not include cash tips. |
| imp_surcharge | Float | $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015. |
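
Several columns hold numeric codes rather than readable labels. As an illustration, the sketch below maps `rate_code` and `payment_type` values to the labels given above. The helper name `decode` and the sample record are hypothetical. Since `payment_type` is typed as a String in the yellow tables but as an Integer in the green tables, the helper converts the value with `int()` first and reports anything it cannot map instead of failing.

```python
# Code-to-label mappings copied from the column explanations above.
RATE_CODES = {
    1: "Standard rate",
    2: "JFK",
    3: "Newark",
    4: "Nassau or Westchester",
    5: "Negotiated fare",
    6: "Group ride",
}
PAYMENT_TYPES = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip",
}


def decode(code, mapping):
    """Return the label for a code, accepting either '2' or 2 as input."""
    try:
        return mapping.get(int(code), f"Unrecognized ({code})")
    except (TypeError, ValueError):
        # Covers missing values and non-numeric strings.
        return f"Unrecognized ({code})"


# Hypothetical record, for illustration only (not real data).
record = {"rate_code": 2, "payment_type": "1"}
print(decode(record["rate_code"], RATE_CODES))
print(decode(record["payment_type"], PAYMENT_TYPES))
```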

TLC Green Taxi Trip Data


Table path:

  • `bigquery-public-data.new_york.tlc_green_trips_2013`
  • `bigquery-public-data.new_york.tlc_green_trips_2014`
  • `bigquery-public-data.new_york.tlc_green_trips_2015`
  • `bigquery-public-data.new_york.tlc_green_trips_2016`

| Column Name | Type | Explanation |
| --- | --- | --- |
| vendor_id | String | A code indicating the TPEP provider that provided the record. 1 = Creative Mobile Technologies, LLC; 2 = VeriFone Inc. |
| pickup_datetime | Timestamp | The date and time when the meter was engaged. |
| dropoff_datetime | Timestamp | The date and time when the meter was disengaged. |
| store_and_fwd_flag | String | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y = store and forward trip; N = not a store and forward trip. |
| rate_code | Integer | The final rate code in effect at the end of the trip. 1 = Standard rate; 2 = JFK; 3 = Newark; 4 = Nassau or Westchester; 5 = Negotiated fare; 6 = Group ride. |
| pickup_longitude | Float | Longitude where the meter was engaged. |
| pickup_latitude | Float | Latitude where the meter was engaged. |
| dropoff_longitude | Float | Longitude where the meter was timed off. |
| dropoff_latitude | Float | Latitude where the meter was timed off. |
| passenger_count | Integer | The number of passengers in the vehicle. This is a driver-entered value. |
| trip_distance | Float | The elapsed trip distance in miles reported by the taximeter. |
| fare_amount | Float | The time-and-distance fare calculated by the meter. |
| extra | Float | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges. |
| mta_tax | Float | $0.50 MTA tax that is automatically triggered based on the metered rate in use. |
| tip_amount | Float | Tip amount: this field is automatically populated for credit card tips. Cash tips are not included. |
| tolls_amount | Float | Total amount of all tolls paid in trip. |
| ehail_fee | Float | |
| total_amount | Float | The total amount charged to passengers. Does not include cash tips. |
| payment_type | Integer | A numeric code signifying how the passenger paid for the trip. 1 = Credit card; 2 = Cash; 3 = No charge; 4 = Dispute; 5 = Unknown; 6 = Voided trip. |
| distance_between_service | Float | |
| time_between_service | Integer | |
| trip_type | Integer | A code indicating whether the trip was a street-hail or a dispatch that is automatically assigned based on the metered rate in use but can be altered by the driver. 1 = Street-hail; 2 = Dispatch. |
| imp_surcharge | Float | $0.30 improvement surcharge assessed on hailed trips at the flag drop. The improvement surcharge began being levied in 2015. |
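
Because `tip_amount` is populated only for credit card tips, any tipping statistic computed over all trips would treat cash trips as untipped. One possible way to handle this, sketched below under stated assumptions, is to restrict the calculation to trips with `payment_type` 1. The function name `credit_card_tip_share` and the sample rows are hypothetical, and the sketch assumes `payment_type` always holds a numeric code.

```python
CREDIT_CARD = 1  # payment_type code for credit card, per the tables above


def credit_card_tip_share(trips):
    """Mean of tip_amount / fare_amount over credit card trips.

    Trips with a non-positive fare are skipped to avoid division by zero.
    Returns None when no trip qualifies.
    """
    shares = [
        t["tip_amount"] / t["fare_amount"]
        for t in trips
        # int() because payment_type is a String in the yellow tables;
        # this assumes the value is always a numeric code.
        if int(t["payment_type"]) == CREDIT_CARD and t["fare_amount"] > 0
    ]
    return sum(shares) / len(shares) if shares else None


# Hypothetical rows, for illustration only (not real data).
sample_trips = [
    {"payment_type": 1, "fare_amount": 10.0, "tip_amount": 2.0},
    {"payment_type": "1", "fare_amount": 20.0, "tip_amount": 5.0},
    {"payment_type": 2, "fare_amount": 15.0, "tip_amount": 0.0},  # cash: excluded
]
print(credit_card_tip_share(sample_trips))
```

The cash trip in the sample is left out of the average entirely rather than counted as a zero tip.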

TLC FHV Trip Data

More information on the FHV data is available in this FiveThirtyEight GitHub repository, together with Uber data from 2014.


Table path:

  • `bigquery-public-data.new_york.tlc_fhv_trips_2015`
  • `bigquery-public-data.new_york.tlc_fhv_trips_2016`

| Column Name | Type | Explanation |
| --- | --- | --- |
| location_id | Integer | The TLC taxi zone of the trip pick-up. |
| pickup_datetime | Timestamp | The date and time of the trip pick-up. |
| dispatching_base_num | String | The TLC Base License Number of the base that dispatched the trip. |
| borough | String | |
| zone | String | |
| service_zone | String | |
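
Since these records carry a zone id and zone labels instead of coordinates, spatial summaries have to group on those labels. A minimal sketch of counting pick-ups per borough follows. The function name `pickups_by_borough` and the sample rows (including the borough values and ids) are hypothetical and only show the shape of the data.

```python
from collections import Counter


def pickups_by_borough(trips):
    """Count pick-ups per borough; a missing borough is counted as 'Unknown'."""
    return Counter((t.get("borough") or "Unknown") for t in trips)


# Hypothetical rows, for illustration only (not real data).
sample_trips = [
    {"location_id": 1, "borough": "Manhattan"},
    {"location_id": 2, "borough": "Brooklyn"},
    {"location_id": 3, "borough": "Manhattan"},
    {"location_id": None, "borough": None},
]
for borough, n in pickups_by_borough(sample_trips).most_common():
    print(borough, n)
```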