<Mengyun Li, mengyunli0220, ml6506>
Abstract:
In this project I predict emergency response time in NYC. I chose incident type, location, borough, time of day, creation year, creation month, longitude, and latitude as prediction features and applied a random forest model to my training dataset. I found that most predicted emergency response times in NYC are less than a day.
Introduction:
Emergency response is an important part of urban management and is directly related to public safety. A general prediction of emergency response time could help staff at the NYC Office of Emergency Management (OEM) see which types of incident need an improved emergency response in the future. I searched Google and GitHub and did not find any relevant finished projects, in particular none using the same features I chose for model training. This project aims to classify the response time of different emergency incident patterns, so that we can predict how long it will take OEM to close a case based on borough, incident type, time of day, creation year, creation month, longitude, and latitude. I applied a random forest to do this.
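As a rough illustration of this setup, the sketch below fits a random forest classifier on the listed features using scikit-learn. It is a minimal sketch under assumptions, not the exact code used in this project: the column names, the response-time class labels, and the toy rows are all hypothetical, and the free-text location feature is left out for brevity.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names; the real dataset may label these differently.
CATEGORICAL = ["incident_type", "borough"]
NUMERIC = ["hour", "creation_year", "creation_month", "latitude", "longitude"]


def build_model(random_state=0):
    """One-hot encode the categorical features, then fit a random forest."""
    encode = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
        remainder="passthrough",  # numeric columns are passed through unchanged
    )
    forest = RandomForestClassifier(n_estimators=50, random_state=random_state)
    return Pipeline([("encode", encode), ("forest", forest)])


# Made-up rows for illustration only; these are not records from the dataset.
toy = pd.DataFrame({
    "incident_type": ["Fire", "Utility", "Fire", "Structural"],
    "borough": ["Manhattan", "Brooklyn", "Queens", "Bronx"],
    "hour": [14, 2, 9, 21],
    "creation_year": [2017, 2018, 2016, 2015],
    "creation_month": [3, 11, 7, 1],
    "latitude": [40.78, 40.65, 40.73, 40.84],
    "longitude": [-73.97, -73.95, -73.79, -73.87],
})
# Hypothetical response-time classes used as the prediction target.
labels = ["under_1_day", "over_1_day", "under_1_day", "over_1_day"]

model = build_model().fit(toy[CATEGORICAL + NUMERIC], labels)
predictions = model.predict(toy[CATEGORICAL + NUMERIC])
```

In a real run the model would be fit on a training split and evaluated on held-out incidents; predicting on the training rows here only shows the call pattern.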
Data:
Emergency Response Incidents
Link: https://data.cityofnewyork.us/Public-Safety/Emergency-Response-Incidents/pasr-j7fb
The dataset comes from NYC Open Data and includes incident type, location, borough, creation date (accurate to the second), close date (accurate to the second), latitude, and longitude. It covers eight years of data, from 2011 to 2018. In addition, I derived creation year and creation month from the creation date to use as model features.
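The date-derived features can be sketched with pandas as follows. This is a minimal sketch, assuming the timestamps are stored in columns named "Creation Date" and "Closed Date" (hypothetical names) and that response time is measured as the gap between the two; the sample rows are made up.

```python
import pandas as pd


def add_time_features(df, created_col="Creation Date", closed_col="Closed Date"):
    """Return a copy of df with year, month, hour, and response-time columns.

    The column names are assumptions; adjust them to match the actual export.
    """
    out = df.copy()
    # Unparseable timestamps become NaT instead of raising an error.
    created = pd.to_datetime(out[created_col], errors="coerce")
    closed = pd.to_datetime(out[closed_col], errors="coerce")
    out["creation_year"] = created.dt.year
    out["creation_month"] = created.dt.month
    out["creation_hour"] = created.dt.hour  # one possible "time of day" feature
    out["response_hours"] = (closed - created).dt.total_seconds() / 3600.0
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Creation Date": ["2017-03-05 14:30:00", "2018-11-20 02:15:10"],
    "Closed Date": ["2017-03-05 18:30:00", "2018-11-21 02:15:10"],
})
features = add_time_features(sample)
```

Using the creation hour for time of day is only one option; coarser buckets such as morning, afternoon, and night would work with the same approach.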
I only included the Emergency Response Incidents dataset from NYC Open Data in this project because it is the largest dataset I could find for free online. A few other sources provide emergency response incident data; however, their datasets are much smaller and have no primary key to uniquely identify each record. Including records from other sources might introduce duplicates into the dataset and affect the accuracy of the model's predictions.
The major issue I found with this dataset is that the borough column contains many typos. I cleaned them up manually using the replace function.
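A cleanup of this kind might look like the sketch below, which normalizes whitespace and case and then applies a manual correction table with pandas `Series.replace`. The misspellings shown are hypothetical examples, not the actual typos found in the dataset.

```python
import pandas as pd

# Hypothetical misspellings mapped to the correct borough name.
BOROUGH_FIXES = {
    "MANHATTEN": "MANHATTAN",
    "BROOKYLN": "BROOKLYN",
    "QUEEN": "QUEENS",
}


def clean_borough(series, fixes=BOROUGH_FIXES):
    """Trim whitespace, upper-case, then replace known typos (exact matches only)."""
    normalized = series.str.strip().str.upper()
    return normalized.replace(fixes)


# Made-up values for illustration only.
raw = pd.Series([" manhatten", "Bronx", "Brookyln", "Queens "])
cleaned = clean_borough(raw)
```

Normalizing case and whitespace first keeps the correction table short, since each typo only needs to be listed once.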