Data and Methodology

Data

The primary data source is a dataset of incidents responded to by the New York City Fire Department (FDNY) between 2013 and 2015, accessed from the New York City Open Data Portal. Incidents categorized as gas leaks related to natural gas or liquid petroleum gas (LPG) are aggregated at the zip code and census tract levels. The dataset provides basic geographic identifiers, including the zip code and a road segment identifier (the road name). These data were integrated with datasets of complaints (929,000 records through 2015), violations (1.85 million records through 2015), and work permits (2.43 million records through 2015) from the NYC Department of Buildings (DOB), as well as the NYC Primary Land Use Tax Lot Output (PLUTO) data (2015 release, 850,000 records). Each of these datasets is publicly available through NYC Open Data or NYC.gov. Finally, demographic data from the American Community Survey (ACS) were used. After integration and a rough manual vetting of appropriate features, the combined dataset yielded 735 features.

Allocation of gas leaks to geospatial boundaries

To use the geographical data for this research, the number of leaks per zip code was calculated for the years 2013, 2014, and 2015. This zip code aggregation created 195 data points, one for each zip code. The authors also associated the gas leak data with census tracts to increase the granularity of the analysis. The FDNY gas leak data contain zip codes but provide neither a specific street address for each incident nor census tract identifiers. The road name and zip code from each incident were therefore used to aggregate the data to a specific road segment. Next, a shapefile of all New York City roads was analyzed to create one or more road segments for each road. If a road was located in a single zip code, only one segment was created. However, if a road crossed two or more zip codes, it was split at each zip code boundary, so that the number of road segments equaled the number of zip codes the road crossed. For each newly created road segment-zip code feature, a small buffer was created to allow for geographic association.
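The keyed aggregation described above can be illustrated with a minimal sketch. It assumes each incident is a record with hypothetical `road`, `zip`, and `year` fields, and it normalizes road names by case and whitespace before matching; that normalization step is an assumption of this sketch, not something the source describes.

```python
from collections import Counter

def normalize_road_name(name):
    """Collapse case and whitespace so 'Main  St' and 'MAIN ST' match.
    (Assumed cleaning step; not described in the source.)"""
    return " ".join(name.upper().split())

def count_leaks(incidents):
    """Return leak counts keyed by (zip, year) and by (road, zip, year)."""
    per_zip = Counter()
    per_segment = Counter()
    for inc in incidents:
        road = normalize_road_name(inc["road"])
        per_zip[(inc["zip"], inc["year"])] += 1
        per_segment[(road, inc["zip"], inc["year"])] += 1
    return per_zip, per_segment

# Hypothetical incidents, for illustration only.
incidents = [
    {"road": "Main St", "zip": "10001", "year": 2013},
    {"road": "MAIN  ST", "zip": "10001", "year": 2013},
    {"road": "Main St", "zip": "10002", "year": 2014},
]
per_zip, per_segment = count_leaks(incidents)
```

Keying on the road name together with the zip code mirrors the one-segment-per-zip-code split: the same road in two zip codes produces two distinct keys.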
When the segment-matching procedure was completed, 94% of the road name-zip code aggregations were associated with the road segment buffers described above. The buffers were then overlaid on the census tracts of New York City’s five boroughs. A geospatial intersection was performed to determine the number of census tracts intersected by each road segment buffer. The gas leaks reported on a given segment were then allocated evenly among the census tracts that its buffer intersects. Finally, summing the allocated frequencies within each census tract gives the total number of gas leaks per census tract for the years 2013, 2014, and 2015.
An example of this operation can be seen in Figure \ref{965803}. Since multiple road segments can pass through the same census tract, a tract can have multiple frequencies associated with it, one from each intersecting segment. This aggregation process results in 2,163 census tracts having a gas leak count greater than zero. Each labeled census tract constitutes a data point in the analysis.
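The even allocation and summation can be sketched as follows. This is a minimal illustration, assuming the geospatial intersection has already produced a mapping from each road segment-zip code key to the tracts its buffer intersects. The names `segment_leaks`, `segment_tracts`, and the tract labels are hypothetical, and allowing fractional shares is an assumption of the sketch.

```python
from collections import defaultdict

def allocate_leaks(segment_leaks, segment_tracts):
    """Split each segment's leak count evenly across the tracts its
    buffer intersects, then sum the shares per tract."""
    tract_totals = defaultdict(float)
    for segment, leaks in segment_leaks.items():
        tracts = segment_tracts.get(segment, [])
        if not tracts:
            continue  # segment not matched to any buffer or tract
        share = leaks / len(tracts)  # even split; may be fractional
        for tract in tracts:
            tract_totals[tract] += share
    return dict(tract_totals)

# Hypothetical inputs, for illustration only.
segment_leaks = {("MAIN ST", "10001"): 6, ("BROADWAY", "10001"): 3}
segment_tracts = {
    ("MAIN ST", "10001"): ["tract_A", "tract_B", "tract_C"],
    ("BROADWAY", "10001"): ["tract_A"],
}
totals = allocate_leaks(segment_leaks, segment_tracts)
```

In this example `tract_A` receives a share from both segments, which is the case of multiple frequencies per tract noted above. The shares sum to the original number of leaks, so the allocation preserves the total count for matched segments.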