Introduction

“Noise complaints continue to be the number one quality of life issue for New York City residents” (http://www.nyc.gov/html/dep/html/noise/index.shtml). In SONYC (Bello et al., 2018), we are building a system whose end goal is the mitigation of noise pollution. This work is being done in collaboration with city agencies, including the DEP. Through conversations with them, we have learned that one of the largest sources of complaints is after-hours construction, and that most of these complaints seem to come from people who own more expensive houses. However, this impression needed further analysis. The question this project tries to answer is: do homeowners in more expensive neighborhoods submit more complaints to 311 than those in less expensive neighborhoods? Specifically, I tested whether neighborhoods with higher housing prices coincide with a higher percentage of noise complaints coming from after/before hours construction. To do so, I first ran a Pearson correlation test between the percentage of noise complaints coming from after/before hours construction and the housing price variable. Then I ran a one-way ANOVA across the housing price groups. Finally, I fit a least-squares regression model between these variables.
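
To make the three tests concrete, the following is a minimal sketch in plain Python of the statistics each one computes. It is not the code used in this project: the function names and the toy values are hypothetical, and p-values are omitted. In practice, library routines such as scipy.stats.pearsonr, scipy.stats.f_oneway and scipy.stats.linregress report these statistics together with their p-values.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def least_squares(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical toy values, one entry per PUMA (not real data).
price_share = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # % of high-priced homes
construction_pct = [2.0, 3.0, 3.5, 5.0, 6.5, 7.0]    # % after-hours complaints

r = pearson_r(price_share, construction_pct)
# Three hypothetical price groups of two PUMAs each.
f_stat = anova_f([construction_pct[0:2], construction_pct[2:4], construction_pct[4:6]])
slope, intercept = least_squares(price_share, construction_pct)
print(r, f_stat, slope, intercept)
```

The Pearson coefficient and the regression slope always share the same sign, so the three statistics give complementary views of the same relationship: its strength, whether it differs across price groups, and its size.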

Data

Two datasets are needed to answer my question: the 311 Noise Complaints (available here) and the Demographic, Social, Economic, and Housing Profiles by Community District/PUMA (available here). The 311 data covers 2010 to the present and the Housing data covers 2012 to 2016, so I focus only on data from 2012 to 2016. In addition to these two datasets, I need the PUMA dataset, which comes in shapefile format and lets me assign a PUMA to each record in the Noise Complaints data so that the first two datasets can be compared.
Following Minkoff (2016), instead of using the raw counts from the 311 Noise Complaints data, I decided to use the percentage of noise complaints coming from after/before hours construction per PUMA area. To compute it, I first downloaded the data and removed the rows with no values in the latitude and longitude columns, since those cannot be assigned to any PUMA area. I then converted the 'Created Date' column to datetime format so that I could drop the rows dated before 2012 or after 2016. After dropping the unnecessary columns, I created a new column combining the longitude and latitude values (the original location column was a string and could not be converted to a point variable) and converted the DataFrame to a Geopandas DataFrame. This new frame was spatially joined with the PUMA data to get the PUMA id for each row of the 311 data. Finally, grouping by PUMA gave me the number of noise complaints per area and, from it, the percentage of noise complaints in each area that come from after/before hours construction.
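
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: only 'Created Date' is named in the text, so the 'Latitude', 'Longitude' and 'Descriptor' column names, the descriptor label, the 'PUMA' shapefile column and all function names are assumptions, and the toy frame carries a ready-made 'puma' column so that the example runs without the shapefile.

```python
import pandas as pd

# Assumed label of the after/before hours construction category; check the
# actual 'Descriptor' values in the 311 export before relying on it.
CONSTRUCTION = "Noise: Construction Before/After Hours (NM1)"


def clean_complaints(raw):
    """Drop rows without coordinates and keep complaints from 2012 to 2016."""
    df = raw.dropna(subset=["Latitude", "Longitude"]).copy()
    df["Created Date"] = pd.to_datetime(df["Created Date"])
    keep = df["Created Date"].dt.year.between(2012, 2016)  # inclusive bounds
    return df[keep]


def attach_puma(df, puma_shapes, puma_col="PUMA"):
    """Spatially join complaint points to PUMA polygons (requires geopandas)."""
    import geopandas as gpd  # imported lazily; only this step needs it

    points = gpd.GeoDataFrame(
        df,
        geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
        crs="EPSG:4326",  # assumes coordinates are plain longitude/latitude
    )
    joined = gpd.sjoin(points, puma_shapes.to_crs(points.crs),
                       how="inner", predicate="within")
    return joined.rename(columns={puma_col: "puma"})


def after_hours_pct(df):
    """Percentage of each PUMA's noise complaints that are construction-related."""
    is_construction = df["Descriptor"].eq(CONSTRUCTION)
    return (is_construction.groupby(df["puma"]).mean()
            .mul(100).rename("pct_after_hours"))


# Hypothetical toy records (not real 311 data), already labelled with a PUMA.
toy = pd.DataFrame({
    "Created Date": ["2011-12-31 23:00:00", "2013-05-01 02:00:00",
                     "2014-06-01 03:00:00", "2015-01-01 10:00:00",
                     "2017-01-01 00:00:00", "2016-12-31 23:59:00"],
    "Latitude": [40.7, 40.7, None, 40.6, 40.6, 40.8],
    "Longitude": [-74.0, -74.0, -73.9, -73.9, -73.9, -73.95],
    "Descriptor": [CONSTRUCTION, CONSTRUCTION, CONSTRUCTION,
                   "Loud Music/Party", CONSTRUCTION, "Loud Music/Party"],
    "puma": ["A", "A", "A", "A", "B", "A"],
})

cleaned = clean_complaints(toy)
pct = after_hours_pct(cleaned)
print(pct)
```

With real data, attach_puma would be called between the two other steps, passing the PUMA shapefile loaded with geopandas.read_file. Taking the mean of a boolean column per group is a compact way to obtain the share of matching rows.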
For the Housing data, I removed all columns except the total number of owned houses (OOcHU2E) and the percentage of houses in each price category. Because the percentages in the lowest price categories were small, I combined them into a single, more informative column containing the percentage of houses priced at less than $300,000. This left 4 categories, and I created a new column called 'group' that assigns each PUMA to the price category holding its highest percentage of houses. After merging the complaints and housing datasets, I merged the result with the PUMA dataset to get the geometry variable used to make the plots.
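
As an illustration of this grouping step, here is a minimal pandas sketch. The column names, price bins and values are hypothetical stand-ins (the real profile uses coded column names such as OOcHU2E), so it shows the technique rather than the exact columns used.

```python
import pandas as pd

# Hypothetical percentages of owned houses per price category, one row per PUMA.
housing = pd.DataFrame({
    "puma": ["A", "B", "C"],
    "pct_under_50k": [5, 1, 0],
    "pct_50k_100k": [5, 1, 1],
    "pct_100k_150k": [10, 2, 1],
    "pct_150k_200k": [10, 3, 1],
    "pct_200k_300k": [20, 8, 2],
    "pct_300k_500k": [30, 20, 10],
    "pct_500k_1m": [15, 45, 25],
    "pct_over_1m": [5, 20, 60],
})

# Collapse the small low-price categories into a single "< $300,000" column.
low_cols = ["pct_under_50k", "pct_50k_100k", "pct_100k_150k",
            "pct_150k_200k", "pct_200k_300k"]
housing["pct_under_300k"] = housing[low_cols].sum(axis=1)
housing = housing.drop(columns=low_cols)

# Assign each PUMA to the category holding its largest share of houses.
categories = ["pct_under_300k", "pct_300k_500k", "pct_500k_1m", "pct_over_1m"]
housing["group"] = housing[categories].idxmax(axis=1)
print(housing[["puma", "group"]])
```

The resulting frame can then be merged with the per-PUMA complaint percentages on the shared PUMA identifier, for example with DataFrame.merge.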
Figure 1 shows the percentage of after/before hours construction complaints per PUMA. Areas in Manhattan have the highest percentage of after/before hours construction complaints relative to all noise complaints in the area, followed by four or five areas between Brooklyn and Queens.