Accurately geocoding collision data is necessary when building a collision database that will be used for spatial analyses. Geocoding is the process of assigning a longitude/latitude (X/Y) coordinate to a descriptive location. For collision data, the descriptive location is typically a primary road and a secondary intersecting road. To geocode a collision, the primary and secondary roads must match a location on a digital street network. When the collision location is described perfectly, without typos, it is usually easy to match. However, there are frequently typos, abbreviations, or other anomalies that make it difficult to match the collision to a street network. In addition, the street network may fail to recognize a valid street name, or may not yet include a new road, preventing the collision from being geocoded.
For these reasons it can be difficult to geocode collision data accurately, but its importance cannot be overstated. Even if you are not planning to view the collisions on a map, the geocoding process can assign collisions to the nearest road segment or intersection. This is important for basic tabular analyses, such as ranking intersections by number of collisions. Since the street names for the same intersection often differ slightly from one police report to the next, geocoding is necessary to standardize the intersection names.
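To make the ranking point concrete, here is a minimal sketch in Python. It assumes each record already carries an intersection ID assigned by geocoding; the records, street names, and IDs are all made up for illustration and do not come from our data.

```python
from collections import Counter

# Hypothetical records: (primary road as written, secondary road as written,
# intersection ID assigned by geocoding). All values are invented.
records = [
    ("MAIN ST", "1ST AVE", 101),
    ("MAIN STREET", "FIRST AV", 101),
    ("1ST AVE", "MAIN ST", 101),
    ("OAK ST", "2ND AVE", 202),
    ("OAK ST", "2ND AVE", 202),
]

# Ranking on the raw text splits one intersection into several spellings.
by_text = Counter((primary, secondary) for primary, secondary, _ in records)

# Ranking on the geocoded intersection ID groups the spellings together.
by_id = Counter(node_id for _, _, node_id in records)

print(by_text.most_common(1))
print(by_id.most_common(1))
```

In this toy data the raw text ranks the Oak/2nd intersection first with two collisions, while the geocoded IDs show that the Main/1st intersection actually has three.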
As part of the RoadSafe GIS service, we provide automatic geocoding and data updates of new collision data. For every new client, an initial setup process lays the foundation for accurate collision data geocoding. The goal is to maximize the number of collisions that can be geocoded and minimize location errors. Without getting into the entire geocoding process, I want to highlight the value-added approach we use at RoadSafe GIS to handle specific cases. If you are interested in a longer general overview of a collision geocoding process, you can refer to this article on the geocoding process I helped develop at UC Berkeley years ago. The methodology in that article is over 7 years old now, but it provided some of our original inspiration. Our geocoding process at RoadSafe GIS is very different today, however, and much better equipped to deliver accurate results at the local level.
We do this by:
- Conducting thorough manual reviews of the data to build an extensive exception list of intersections that need special attention.
- Editing the street network as necessary for new street geometries or names.
- Developing a Python script to clean the intersection names, handle the exceptions, or directly assign an XY coordinate.
This allows us to geocode a very high percentage of collisions and ensures the accuracy of the matched locations. Most importantly, the results are repeatable with new years of data, since our Python script reads directly from the exception list and handles each record appropriately. If future manual reviews identify any new issues, we simply add the new intersection to the exception list and it is taken care of automatically.
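The general idea of an exception-driven cleaning step can be sketched roughly as follows. This is not our production script; the abbreviation table, the exception entries, the street names, the coordinates, and the function names are all hypothetical and only illustrate the three outcomes described above: clean the names, apply a known correction, or assign an XY coordinate directly.

```python
import re

# Hypothetical abbreviation table; a real one would be much longer.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "AV": "AVE",
            "BOULEVARD": "BLVD", "BL": "BLVD", "ROAD": "RD"}

def clean_name(raw):
    """Uppercase, replace punctuation with spaces, standardize suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", raw.upper()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)

def intersection_key(primary, secondary):
    """Order-independent key, so A/B and B/A describe the same intersection."""
    return tuple(sorted((clean_name(primary), clean_name(secondary))))

# Hypothetical exception list, keyed by cleaned intersection.
# "rename" corrects a known bad spelling; "xy" bypasses the street network.
EXCEPTIONS = {
    ("1ST AVE", "MIAN ST"): {"rename": ("MAIN ST", "1ST AVE")},
    ("MAIN ST", "STORE DRIVEWAY"): {"xy": (-118.0, 34.0)},  # placeholder XY
}

def resolve(primary, secondary, exceptions=EXCEPTIONS):
    """Return ("xy", (x, y)) or ("match", key) to hand to the geocoder."""
    key = intersection_key(primary, secondary)
    rule = exceptions.get(key)
    if rule is None:
        return ("match", key)
    if "xy" in rule:
        return ("xy", rule["xy"])
    return ("match", intersection_key(*rule["rename"]))

print(resolve("Main Street", "1st Av."))
print(resolve("Mian St", "1st Ave"))
print(resolve("Store Driveway", "Main St."))
```

Because the rules live in a lookup table rather than in the code, handling a newly discovered problem intersection only requires adding one entry, and every later batch of records picks it up.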
That should give you an idea of how we geocode collisions. Over the course of our work, however, we have come across several special scenarios. I will outline those in future blog posts and discuss how we handled them in our process. For now, I have posted a brief description of each below. Stay tuned for our next post.
Offset intersections
Collins St intersects with Reseda Blvd in two separate locations.
Location does not exist in the street network data
The private driveway into The Home Depot off Roscoe Blvd.
Invalid offset direction
State Route 1 is typically oriented north/south, but in some locations it runs east/west or even slightly against its nominal direction. This can create a mismatch with the offset direction in the collision record.
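One simple way to detect this kind of mismatch is to compare the recorded offset direction with the dominant axis of the road segment at that location. The sketch below is only an illustration of that check, not a description of how we resolve it; the function names are hypothetical, and it assumes the segment endpoints are in a projected coordinate system where X and Y use the same units.

```python
def segment_axis(x1, y1, x2, y2):
    """Return "EW" if the segment runs mostly east/west, otherwise "NS"."""
    dx, dy = x2 - x1, y2 - y1
    return "EW" if abs(dx) > abs(dy) else "NS"

def offset_is_valid(direction, axis):
    """Check that an offset direction (N, S, E, W) fits the segment axis."""
    allowed = {"NS": {"N", "S"}, "EW": {"E", "W"}}
    return direction.upper() in allowed[axis]

# A segment that runs mostly east/west, with an offset recorded as north.
axis = segment_axis(0.0, 0.0, 500.0, 40.0)
print(axis, offset_is_valid("N", axis))
```

Records that fail the check can then be set aside for review instead of being placed at the wrong offset.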