Data Cleaning - Snapping GPS points to a line
When a smartphone is the primary data collection device, location data can contain some amount of error or noise. Causes of imprecise location data include the GPS sensor not having a lock on a minimum number of satellites, current weather conditions, and, in cities with high-rise buildings, the urban canyon effect \cite{mat_gordon_bad_2013}.
Location data that forms a "cloud of points" around a street, rather than lying on the street centerline, can severely affect results. To remedy this, a "snapping" process has been developed to ensure that readings that stray from the street or bike lane are corrected.
Shapely \cite{gillies_shapely_2017}, an open source Python package for spatial data manipulation, provides the linear referencing functions project() and interpolate(), which together allow a point to be snapped to the nearest location on a line. An issue with this approach is that there may be several competing lines for a given point (i.e., all bike lanes in the vicinity). The objective is to snap each point onto the bike lane that is currently being ridden.
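To make the linear referencing step concrete, the following is a minimal pure-Python sketch of what snapping a point to a single line involves. The helper names project and interpolate are chosen to mirror the Shapely methods of the same name, but this is an illustrative re-implementation under simplifying assumptions, not Shapely's code: lines are plain lists of (x, y) vertices in a planar (projected) coordinate system, out-of-range distances are clamped to the line's ends, and the snap helper and the sample coordinates are hypothetical.

```python
import math

def project(line, point):
    """Distance along `line` (a list of (x, y) vertices) of the location nearest to `point`."""
    px, py = point
    best_dist_sq = math.inf
    best_along = 0.0
    travelled = 0.0  # length of the segments already walked
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            continue  # skip zero-length segments
        # Fraction along the segment of the perpendicular foot, clamped to the segment.
        t = ((px - x1) * dx + (py - y1) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        cx, cy = x1 + t * dx, y1 + t * dy
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        seg_len = math.sqrt(seg_len_sq)
        if dist_sq < best_dist_sq:
            best_dist_sq = dist_sq
            best_along = travelled + t * seg_len
        travelled += seg_len
    return best_along

def interpolate(line, distance):
    """Point located `distance` along `line`, clamped to the line's ends."""
    if distance <= 0:
        return line[0]
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if seg_len > 0 and travelled + seg_len >= distance:
            t = (distance - travelled) / seg_len
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += seg_len
    return line[-1]

def snap(line, point):
    """Snap `point` onto `line`: find its distance along the line, then the point at that distance."""
    return interpolate(line, project(line, point))

# Hypothetical L-shaped bike lane and a noisy GPS reading next to its first leg.
lane = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
snapped = snap(lane, (4.0, 1.5))
```

With Shapely geometries, the equivalent call would be line.interpolate(line.project(point)). Distances are only meaningful if the coordinates are in a projected system rather than raw latitude and longitude.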
One approach would be to simply choose the closest line. However, that would require calculating the distance from every point to every line, which would be computationally very expensive. Instead, a buffer is generated around each bike-lane segment and the points are spatially joined to those buffers. In addition to being more efficient, this method has the advantage of mapping every point directly to a bike-lane segment.
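The buffer-and-join idea can be sketched without any GIS library. The sketch below treats a point as lying inside a lane's buffer when its distance to the lane is at most the buffer radius, and uses the buffer's bounding box as a cheap pre-filter, which is roughly the role a spatial index plays in a real spatial join. All function names, lane identifiers, coordinates, and the 5-unit radius are hypothetical and chosen only for illustration. A polygon buffer produced by a GIS package approximates round corners, so results near the buffer edge may differ slightly.

```python
import math

def point_segment_distance(point, a, b):
    """Shortest distance from `point` to the segment from `a` to `b`."""
    px, py = point
    (x1, y1), (x2, y2) = a, b
    dx, dy = x2 - x1, y2 - y1
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - x1, py - y1)
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / seg_len_sq))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))

def distance_to_line(point, line):
    """Shortest distance from `point` to a polyline given as a list of vertices."""
    return min(point_segment_distance(point, a, b) for a, b in zip(line, line[1:]))

def buffered_bounds(line, radius):
    """Bounding box of the line, grown by `radius` on every side."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    return (min(xs) - radius, min(ys) - radius, max(xs) + radius, max(ys) + radius)

def join_points_to_lanes(points, lanes, radius):
    """Return {point_index: [lane_id, ...]} for lanes whose buffer contains the point."""
    bounds = {lane_id: buffered_bounds(line, radius) for lane_id, line in lanes.items()}
    matches = {}
    for i, (px, py) in enumerate(points):
        hits = []
        for lane_id, line in lanes.items():
            min_x, min_y, max_x, max_y = bounds[lane_id]
            # Cheap rejection: outside the buffered bounding box means outside the buffer.
            if not (min_x <= px <= max_x and min_y <= py <= max_y):
                continue
            # Exact test: within `radius` of the lane means inside its buffer.
            if distance_to_line((px, py), line) <= radius:
                hits.append(lane_id)
        matches[i] = hits
    return matches

# Two hypothetical lanes crossing at (50, 0), and four GPS readings.
lanes = {
    "main_st": [(0.0, 0.0), (100.0, 0.0)],
    "oak_ave": [(50.0, -50.0), (50.0, 50.0)],
}
points = [(10.0, 3.0), (49.0, 2.0), (50.0, 30.0), (200.0, 200.0)]
matches = join_points_to_lanes(points, lanes, radius=5.0)
```

In this sketch the reading near the crossing matches both lanes, and a reading farther than the radius from every lane gets an empty list, so the choice of radius determines which readings can be snapped at all.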
The disadvantage is that a point can be joined to more than one bike-lane segment. For example, at a street intersection or corner with two bike lanes, a point could theoretically belong to both intersecting bike lanes. To correct for this, we choose the same bike lane as the previous point.
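The previous-point rule can be sketched as a single pass over the readings in ride order. The function name, the input layout (one list of candidate lane identifiers per reading), and the lane names are hypothetical. The sketch also has to decide what happens when a reading is ambiguous and the previous lane is not among its candidates, for instance at the very first reading. Taking the first candidate in that case is an assumption made here, not something the text specifies.

```python
def resolve_lanes(candidates):
    """Pick one lane per reading, preferring the lane chosen for the previous reading.

    `candidates` holds one list of lane ids per reading, in ride order.
    Returns a list with one lane id (or None when there are no candidates) per reading.
    """
    resolved = []
    previous = None
    for hits in candidates:
        if not hits:
            chosen = None                 # reading matched no lane buffer
        elif len(hits) == 1:
            chosen = hits[0]              # unambiguous match
        elif previous in hits:
            chosen = previous             # ambiguous: stay on the previous lane
        else:
            chosen = hits[0]              # assumption: arbitrary fallback
        resolved.append(chosen)
        if chosen is not None:
            previous = chosen             # unmatched readings do not reset the lane
    return resolved

# Hypothetical ride: along main_st, through an intersection, then onto oak_ave.
candidates = [
    ["main_st"],
    ["main_st", "oak_ave"],
    ["oak_ave"],
    ["oak_ave", "main_st"],
]
resolved = resolve_lanes(candidates)
```

Here the second reading stays on main_st and the fourth stays on oak_ave, because in each case the previously chosen lane is among the candidates.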