As part of our ongoing work to test and improve our geocoding processes, we recently looked at SWITRS collision data for the City of San Diego. In past attempts to geocode data for San Diego, we found the available street networks were quite outdated compared to those for other areas in California. I presumed this was due to rapid population growth, but a quick Google search surprisingly suggested that might not have been the case: http://www.sandiegouniontribune.com/news/2011/mar/08/san-diego-growth-last-decade-is-the-slowest-ever/

Regardless of the reasons, it was difficult to geocode collisions since there were many new streets in the suburban areas. To geocode data in San Diego efficiently, we needed a better, more up-to-date street network. Fortunately, the great resources provided by SANDAG mostly solved that problem: we downloaded their street data from the Regional GIS Data Warehouse. Our geocoding process contains many steps that depend on the street network formats we have been using, so we wanted to see how adaptable it would be to a new network. It required us to build custom Locators and change some parts of the code, but in the end it was a success. This should allow us to get up and running much more quickly in the future when we need to use other street networks.

Having said that, it does require a good deal of manual labor on the first pass to geocode a higher percentage of collisions. Depending on the naming convention of streets and how the officers enter locations in the police reports, there are many variables to handle. For example, at an intersection like Ruffin Rd & Aero Dr there is a west junction and an east junction, which would normally cause our code to fail when attempting to offset north or south along Ruffin Rd. This type of intersection was previously discussed, and we have greatly improved the automation that handles these cases, but the two junctions on Ruffin Rd are simply too far apart. If our code searched for multiple junctions at that distance for every collision, it would really slow the process down. Therefore, we added an exception for this specific case and gave the junctions alternate names in the street network to differentiate the east and west intersections. It is still impossible to identify the exact junction when the collision is not offset north or south, short of manually reviewing the report or having a GPS coordinate, but it is at least a start. We also identified and matched many other locations at the intersections of mall or store entrances.

The end result is that we geocoded about 88.5% of local roadway collisions from 2010-2014 in the City of San Diego. If you are familiar with SWITRS, we used only non-state highway collisions (with the exception of state highway 78, which we included since it is a surface street for all of its portions in San Diego). The 88.5% is lower than we have been able to achieve in other areas in California, but further edits to the street network would likely raise it by only one or two percentage points. We also did not geocode collisions with invalid offset directions. For example, even when we can geocode to an intersection, if the collision is reported as offset north or south along the primary road and the road clearly runs east/west, we considered it invalid for geocoding. This occurred in approximately 2% of cases. Many other records simply had poor descriptions of the primary/secondary road and would need to be addressed individually to allow for geocoding. In the end, the geocoding result compares favorably to other efforts such as TIMS at UC Berkeley, where about 83% of the non-state highway collisions have been geocoded for San Diego.
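To make the offset-direction check described above concrete, here is a minimal sketch of how such a rule could be written. It is not our production code: the function names, the 30-degree tolerance, and the choice to let diagonal roads pass are all assumptions made for illustration, and it treats a road segment as a straight line between two points in projected coordinates.

```python
import math


def bearing_degrees(x1, y1, x2, y2):
    """Compass bearing (0 = north, 90 = east) of a straight segment.

    Assumes projected coordinates where x grows east and y grows north.
    """
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360


def road_orientation(bearing, tolerance=30.0):
    """Classify a bearing as 'NS', 'EW', or 'DIAGONAL'.

    The 30-degree tolerance is an illustrative assumption.
    """
    axis = bearing % 180  # a road runs both ways, so fold onto 0-180
    if axis <= tolerance or axis >= 180 - tolerance:
        return "NS"
    if abs(axis - 90) <= tolerance:
        return "EW"
    return "DIAGONAL"


def offset_is_valid(direction, bearing, tolerance=30.0):
    """Return False only when the offset direction clearly contradicts the road."""
    direction = (direction or "").strip().upper()
    if direction not in ("N", "S", "E", "W"):
        return True  # no usable offset direction, nothing to contradict
    orientation = road_orientation(bearing, tolerance)
    if orientation == "DIAGONAL":
        return True  # ambiguous road direction: give the record the benefit of the doubt
    wanted = "NS" if direction in ("N", "S") else "EW"
    return orientation == wanted


# Hypothetical east/west segment: an offset "north" along it is rejected.
east_west = bearing_degrees(0, 0, 100, 0)
print(offset_is_valid("N", east_west))  # False
print(offset_is_valid("E", east_west))  # True
```

Only a clear contradiction is rejected in this sketch, which matches the spirit of the rule: records on diagonal or curving roads would still need a closer look rather than being thrown out automatically.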

If you are interested in the geocoded data, we have made it available for download. This includes all fatal, injury and property damage only (PDO) collisions on non-state highways (plus state highway 78) for the City of San Diego from 2010-2014. The raw data template for SWITRS fields can be accessed here. Additional fields have been added to the collision data table for a standardized matching address (match_addr), primary road name (m_primaryr), and secondary road name (m_secondrd). These fields are very useful for ranking intersections.
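As a rough illustration of ranking by intersection, the sketch below counts collisions per pair of standardized road names. The field names m_primaryr and m_secondrd come from the download; everything else (the function name, the sample records, and the decision to sort the two road names so that "A & B" and "B & A" count as the same intersection) is an assumption made for this example.

```python
from collections import Counter


def rank_intersections(records, top=10):
    """Count collisions per intersection and return the most frequent ones.

    `records` is any iterable of dict-like rows, for example rows read with
    csv.DictReader from an export of the collision table.
    """
    counts = Counter()
    for rec in records:
        primary = (rec.get("m_primaryr") or "").strip().upper()
        secondary = (rec.get("m_secondrd") or "").strip().upper()
        if not primary or not secondary:
            continue  # skip records that were not matched to an intersection
        # Sort the names so the order the officer wrote them in does not matter.
        key = " & ".join(sorted((primary, secondary)))
        counts[key] += 1
    return counts.most_common(top)


# Hypothetical sample rows, not real collision records.
sample = [
    {"m_primaryr": "RUFFIN RD", "m_secondrd": "AERO DR"},
    {"m_primaryr": "AERO DR", "m_secondrd": "RUFFIN RD"},
    {"m_primaryr": "GENESEE AV", "m_secondrd": "BALBOA AV"},
    {"m_primaryr": "", "m_secondrd": "BALBOA AV"},
]

for name, total in rank_intersections(sample):
    print(total, name)
```

If the standardized fields are already written in a consistent order, the sorting step is unnecessary, and grouping on match_addr alone may be enough.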

If you have your own collision data or need data for a specific city/county geocoded, please contact us for a service quote at info@roadsafegis.com. We can offer quick turnarounds depending on the location and amount of data needed. Also, if you would like to test drive our Collision Database Management System and access the San Diego collision data from there, send us an email and we can set you up with a free demo.

Co-Founder of RoadSafe GIS. Bringing collision data, GIS and cloud technology together. Formerly of UC Berkeley SafeTREC and ESRI.