Accessing NYC Open Data’s API to Import Relevant City Datasets - BillionOysterProject/digital-platform-beta GitHub Wiki
New York City agencies are legally required to make certain datasets public and to update them regularly, which the City does through the website NYC Open Data. Using NYC Open Data’s API, we can pull in the latest information about public schools, parks, land use, and more.
NYC Open Data uses Socrata to manage its APIs. To learn how to work with the API, open the "API Docs" option that accompanies each dataset (click "API" in the top right). Here's an example from a dataset we use.
NOTE ON DATA NORMALIZATION: If you're going to use a field as a key to map one table to another, check it very carefully in each NYC dataset to make sure it does not contain extra spaces or characters. For example, the fields "ats_system_code" and "dbn" both hold the unique identifying code used for each school, and you can use them, like we do, to create a single table that pulls school location information AND demographic information from two different datasets. But the "ats_system_code" field in the 2017-2018 School Locations dataset contains an extra set of spaces that the "dbn" field in the Demographic Snapshot does not. If you try to use a function to map one to the other, it won't work unless you trim the extra spaces first. Differences in capitalization can occasionally cause the same problem, so check the formatting carefully!
ADDITIONAL NOTE ON DATA NORMALIZATION: It's also important to note that school names may not match exactly from dataset to dataset! For example, one dataset lists "The Longwood Academy of Discovery" and another "The Longwood Academy for Discovery." So matching on a consistent, unique field like ATS Code or DBN is key.
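As a rough illustration of the two notes above, here is a minimal Python sketch of normalizing the school code before joining rows from the two datasets. The function names and the sample rows are hypothetical (the code "99X999" and the school name are made up), and the platform itself may handle this differently; only the field names "ats_system_code" and "dbn" come from the datasets described here.

```python
def normalize_code(value):
    """Trim surrounding whitespace and upper-case a school code so keys compare equal."""
    return value.strip().upper()


def join_by_school_code(locations, demographics):
    """Merge location and demographic rows that share the same normalized school code."""
    # Index the demographic rows by their cleaned-up DBN for quick lookup.
    demo_by_code = {normalize_code(row["dbn"]): row for row in demographics}
    merged = []
    for loc in locations:
        code = normalize_code(loc["ats_system_code"])
        demo = demo_by_code.get(code)
        if demo is not None:
            # Combine both rows and store the cleaned code under both key names.
            merged.append({**loc, **demo, "ats_system_code": code, "dbn": code})
    return merged


# Made-up sample rows: the padding and lower-case letter mimic the issues described above.
locations = [{"ats_system_code": "99X999      ", "location_name": "Example School"}]
demographics = [{"dbn": "99x999", "school_name": "Example School (alt. spelling)"}]

merged = join_by_school_code(locations, demographics)
print(merged)
```

Note that the join ignores the school names entirely, so the mismatched spellings in the sample rows do not prevent a match.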
The platform is currently using the following open datasets:
- 2017-2018 School Locations
  We use this dataset for the sign-up process. (It's in use on the current platform and will be incorporated into the new sign-up form as well.) Initially, each teacher participant created their school's "organization" when they signed up for the platform, but this led to a number of duplicates when some teachers listed their school by name and others listed it by number. By pulling school information from the Department of Education (DOE), we:
  - Avoid duplicates
  - Shorten the sign-up form
  - Ensure a higher level of accuracy
  - Gain extra information about a school, like its City Council district or principal's phone number
  - Create an auto-updating system: the City is required to update this dataset at least yearly, so by pulling the data from the API, any changes to a school's info get updated automatically
- 2013-2018 Demographic Snapshot School
  This dataset contains student demographic and enrollment data by school from 2013-14 through 2017-18. Recently, the NYC DOE has started moving away from the commonly used "Title I" designation as its core metric for identifying high-needs schools. Instead, it uses Poverty % and the Economic Need Index. Each school has a unique identifying code in the City's database (called the "ATS Code" in some datasets and the "DBN" in others), and we associated that code with each school's "organization" in our database. That let us pull in this set of demographic data, which includes the economic data above, plus numbers and percentages of students by race, ethnicity, disability, and more. It's a way that we can make sure our team devotes extra resources and attention to the schools that need us the most.
- On the dataset's NYC Open Data page, click "API" and copy the URL for the "API Endpoint." You'll use this as the resource for your Diecast template.
- In your template, in the front matter, add each dataset as a separate binding. For example:
- name: nycSchoolDemographics
  resource: 'https://data.cityofnewyork.us/resource/98et-3mve.json'
  param_joiner: ','
  params:
    $limit: 15000
    year: '2017-18'
    $select:
      - year
      - economic_need_index
      - asian_2
      - black_2
      - dbn
      - english_language_learners_2
      - hispanic_2
      - poverty_2
      - students_with_disabilities_2
      - white_2
      - school_name
- name: nycSchoolLocations
  resource: 'https://data.cityofnewyork.us/resource/r2nx-nhxe.json'
  param_joiner: ','
  params:
    $limit: 15000
    $select:
      - ats_system_code
      - council_district
      - grades_text
      - location_1_address
      - location_1_city
      - location_1_state
      - location_1_zip
      - location_category_description
      - nta_name
      - location_type_description
      - location_name
- You'll need to use NYC Open Data's API conventions instead of the platform's API conventions. For example, instead of specifying "limit" as a parameter, you'll use "$limit" (see above).
- Use "param_joiner: ','" so that list values, like the "$select" fields, are joined with commas, per NYC Open Data's conventions.
- To pull only specific fields, use "$select" and list the exact name of each field, like the example above.
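To make the conventions in the steps above concrete, here is a small Python sketch that builds the kind of request URL the demographics binding describes: a "$limit", a comma-joined "$select", and a plain field filter. This is only an illustration of the NYC Open Data query format, not how Diecast builds its requests internally, and the helper name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_query_url(resource, select_fields, limit, filters=None):
    """Build an NYC Open Data (Socrata) query URL from a list of fields and a row limit."""
    params = dict(filters or {})
    params["$limit"] = limit
    # Socrata expects the selected fields as one comma-separated value.
    params["$select"] = ",".join(select_fields)
    # Keep "$" and "," unescaped so the URL stays readable.
    return resource + "?" + urlencode(params, safe="$,")


url = build_query_url(
    "https://data.cityofnewyork.us/resource/98et-3mve.json",
    ["dbn", "school_name", "economic_need_index"],
    15000,
    filters={"year": "2017-18"},
)
print(url)
```

Pasting the printed URL into a browser is a quick way to check that the field names in "$select" are spelled exactly as the dataset expects before adding them to a binding.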
Going forward, we're also planning to include other NYC Open Data sets. For each of our restoration sites, the database includes the unique identifying code that the NYC Parks Department uses for each city park, which means we can give participants information on a site's accessibility for people with disabilities, bathrooms, waterfront access points, and more. This will provide teachers with a crucial piece of support to help them plan and differentiate instruction for their students. Environmental datasets from agencies like the Department of Environmental Protection will allow students to compare their data to the City's. And information about political districts can support students who want to engage with local politicians about what they want to see in their communities.
Prospective Datasets:
- City Council Members
- Directory of Toilets in Public Parks
- Directory of Parks Disability Accessibility Facilities and Programs
Note: there are many other potentially useful datasets on NYC Open Data, but not all of them have an API. Those datasets would need to be integrated using methods other than the ones described here.
If you're interested in seeing a particular NYC Open Data set on the platform, you can get in touch with Heather Flanagan, the platform's developer and product manager, at: [email protected]