Google Streetview Road Accident

A Clustered Google Maps of 10k Dutch Traffic Accidents

The Open Data Portal is a website made by the Dutch Government. It includes a data set of all registered traffic accidents in the province of North-Holland from 2005 to 2009. We munge the data and place it all on a Google Map. We use MarkerClusterer to deal with the 10k+ markers.

Fullscreen map

The Open Data Portal

The Dutch have an Open Data Portal with data sets. These data sets come from a variety of government institutions.

The quality of the data sets is hit and miss. The good data sets are accessible, use sane formats, are of a decent size and scope, and have meta information. The bad data sets are unavailable, outdated, hastily created or lack content or context.

The Data Set

This data set, containing all registered traffic accidents in the province of North-Holland from 2005 to 2009, was released by Rijkswaterstaat. Rijkswaterstaat is part of the Dutch Ministry of Infrastructure and the Environment, the former Ministry of Transport, Public Works and Water Management.

It’s an interesting data set that is complete with information, such as:

  • The number of animals hurt in the accident
  • If there was only material damage
  • How many people went to the hospital
  • The location of the accident as written down by (first) responders

RD-Coordinates

Rijksdriehoeksmeting reference point

The location measurements for the traffic accidents are very precise. The problem is that they are not expressed as latitude/longitude, which Google Maps requires.

The Netherlands uses a variety of coordinate measurements, such as UTM, ETRS89, ITRS and even the 1841 Bessel ellipsoid. This data set uses Rijksdriehoeksmeting or RD-coordinates. The reference point is the “Onze-Lieve-Vrouwe” tower in Amersfoort (a famous landmark constructed around 1470).

Onze-lieve-vrouwe tower

Unusual format

A second problem with this data set is the format. Formats such as .CSV and JSON are easy to work with. .XLS and some custom formats are doable.

Some formats are not nice to see: WordPerfect files, .PDF images and the format for this data set, .DBF or dBase format. The information for this data set was probably stored in a legacy IT system for which no saner export format was available.

Munging

First we convert the .DBF file to .CSV. We end up with a .CSV file with 29 columns. We do not know what the column headers mean, but a quick Google search finds us the metadata description. We are interested in:

  • SLA = Number of victims
  • DODSLA = Total number of deaths
  • XCOR = The X RD-Coordinate
  • YCOR = The Y RD-Coordinate
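As a rough illustration of this step, here is a minimal Python sketch that keeps only those four columns. It assumes the converted file is comma-separated and has a header row with the names above. The `select_columns` helper, the friendlier key names and the sample rows are all made up for this example and are not taken from the real data set.

```python
import csv
import io

# Columns of interest, mapped to friendlier names (the names are our own choice).
COLUMNS = {"SLA": "victims", "DODSLA": "deaths", "XCOR": "x", "YCOR": "y"}


def select_columns(csv_file):
    """Yield one dict per accident, keeping only the four columns we need."""
    for row in csv.DictReader(csv_file):
        yield {new: row[old] for old, new in COLUMNS.items()}


# Made-up sample standing in for the 29-column export.
# "OTHER" represents the columns we ignore.
SAMPLE = (
    "SLA,DODSLA,XCOR,YCOR,OTHER\n"
    "2,0,121000,487000,a\n"
    "1,1,135500,512300,b\n"
)

records = list(select_columns(io.StringIO(SAMPLE)))
print(records[0])
```

All values stay strings at this point, exactly as `csv` reads them. Converting them to numbers can wait until the coordinates are transformed.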

Now we convert all RD-coordinates to latitude/longitude. This document in Dutch first described accurate 4th degree polynomials for the transformation of geodetic coordinates between the national datum of the Rijksdriehoeksmeting (RD) and WGS84 (G873). This sounds complex, and it probably is. Just look at this sample PHP code to convert RD to latitude and longitude:

function rd2wgs($x, $y)
{
    // Offsets from the RD origin (155000, 463000) in Amersfoort, scaled to units of 100 km
    $dX = ($x - 155000) * pow(10, -5);
    $dY = ($y - 463000) * pow(10, -5);

    // Latitude offset in arc seconds, as a polynomial in $dX and $dY
    $SomN = (3235.65389 * $dY)
        + (-32.58297 * pow($dX, 2))
        + (-0.2475 * pow($dY, 2))
        + (-0.84978 * pow($dX, 2) * $dY)
        + (-0.0655 * pow($dY, 3))
        + (-0.01709 * pow($dX, 2) * pow($dY, 2))
        + (-0.00738 * $dX)
        + (0.0053 * pow($dX, 4))
        + (-0.00039 * pow($dX, 2) * pow($dY, 3))
        + (0.00033 * pow($dX, 4) * $dY)
        + (-0.00012 * $dX * $dY);

    // Longitude offset in arc seconds
    $SomE = (5260.52916 * $dX)
        + (105.94684 * $dX * $dY)
        + (2.45656 * $dX * pow($dY, 2))
        + (-0.81885 * pow($dX, 3))
        + (0.05594 * $dX * pow($dY, 3))
        + (-0.05607 * pow($dX, 3) * $dY)
        + (0.01199 * $dY)
        + (-0.00256 * pow($dX, 3) * pow($dY, 2))
        + (0.00128 * $dX * pow($dY, 4))
        + (0.00022 * pow($dY, 2))
        + (-0.00022 * pow($dX, 2))
        + (0.00026 * pow($dX, 5));

    // Convert arc seconds to degrees and add them to the WGS84 position of the origin
    $Latitude = 52.15517 + ($SomN / 3600);
    $Longitude = 5.387206 + ($SomE / 3600);

    return array(
        'latitude' => $Latitude,
        'longitude' => $Longitude);
}
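The structure of that polynomial is easier to see when the coefficients are stored in a table. The following is a Python port, offered as a sketch rather than a reference implementation: the coefficients and origin values are copied from the PHP function above, while the name `rd_to_wgs84` and the table layout are our own.

```python
# Each entry is (power of dx, power of dy, coefficient), copied from the PHP code.
LAT_TERMS = [
    (0, 1, 3235.65389), (2, 0, -32.58297), (0, 2, -0.2475),
    (2, 1, -0.84978), (0, 3, -0.0655), (2, 2, -0.01709),
    (1, 0, -0.00738), (4, 0, 0.0053), (2, 3, -0.00039),
    (4, 1, 0.00033), (1, 1, -0.00012),
]
LON_TERMS = [
    (1, 0, 5260.52916), (1, 1, 105.94684), (1, 2, 2.45656),
    (3, 0, -0.81885), (1, 3, 0.05594), (3, 1, -0.05607),
    (0, 1, 0.01199), (3, 2, -0.00256), (1, 4, 0.00128),
    (0, 2, 0.00022), (2, 0, -0.00022), (5, 0, 0.00026),
]


def rd_to_wgs84(x, y):
    """Approximate (latitude, longitude) in degrees for RD coordinates in metres."""
    # Offsets from the RD origin, scaled to units of 100 km.
    dx = (x - 155000) * 1e-5
    dy = (y - 463000) * 1e-5
    # Both sums are in arc seconds, hence the division by 3600 below.
    som_n = sum(c * dx**p * dy**q for p, q, c in LAT_TERMS)
    som_e = sum(c * dx**p * dy**q for p, q, c in LON_TERMS)
    return 52.15517 + som_n / 3600, 5.387206 + som_e / 3600


# Sanity check: the RD origin itself should map to the base latitude/longitude.
origin_lat, origin_lon = rd_to_wgs84(155000, 463000)
# A point 10 km due north of the origin should have a higher latitude.
north_lat, north_lon = rd_to_wgs84(155000, 473000)
print(origin_lat, origin_lon, north_lat, north_lon)
```

At the origin every term vanishes, so the function returns the base values unchanged. That makes a cheap first test before trusting the output for the full data set.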

Finally we write all 10k+ observations from the data set to a 1.1 MB JSON file, with entries like this:

...
{"latitude":52.648147252398,"longitude":5.1075994799015,"victims":"6","deaths":"3"}
...
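A minimal Python sketch of how such records could be produced is shown here. It assumes each input row is a dict with the hypothetical keys `x`, `y`, `victims` and `deaths`, and it takes the coordinate conversion as a parameter. The `fake_convert` stand-in only exists to keep the example self-contained and should be replaced by a real RD to WGS84 conversion.

```python
import json


def to_json_records(rows, convert):
    """Build one JSON-ready dict per accident, using the key names from the sample."""
    records = []
    for row in rows:
        lat, lon = convert(float(row["x"]), float(row["y"]))
        records.append({
            "latitude": lat,
            "longitude": lon,
            # Counts are kept as strings, as in the sample entry above.
            "victims": row["victims"],
            "deaths": row["deaths"],
        })
    return records


def fake_convert(x, y):
    # Placeholder only: always returns the same made-up position.
    return 52.0, 5.0


rows = [{"x": "155000", "y": "463000", "victims": "6", "deaths": "3"}]
text = json.dumps(to_json_records(rows, fake_convert))
print(text)
```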

Putting it all on Google Maps

Presenting 10k markers on a Google Map creates a two-fold problem. Many computers cannot handle rendering 10k markers on a screen. Besides that, so many markers on screen would look awful as a data visualization. You’d want it to look presentable, in a reasonable time, on many devices and on all zoom levels.

MarkerClusterer

MarkerClusterer V3 allows you to put hundreds of thousands of markers on a Google Map. The canonical article on Google Developers is called “Too many Markers!”.

Algorithm clustering visualized

MarkerClusterer forms clusters based on grids. For more information on the algorithm, visit that article or read “MarkerClusterer in C# and HTML Canvas” for an illustrated guide.
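To give a feel for grid-based clustering, here is a deliberately simplified Python sketch. It is not MarkerClusterer's actual implementation: it bins points into fixed cells measured in degrees, whereas the library works on the map at the current zoom level. The function name `grid_clusters`, the `cell_size` parameter and the sample points are assumptions made for this example.

```python
import math
from collections import defaultdict


def grid_clusters(points, cell_size):
    """Group (lat, lon) points into square grid cells of `cell_size` degrees."""
    cells = defaultdict(list)
    for lat, lon in points:
        # Points that fall into the same grid cell share the same key.
        key = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        count = len(members)
        # Place the cluster marker at the average position of its members.
        center = (
            sum(lat for lat, _ in members) / count,
            sum(lon for _, lon in members) / count,
        )
        clusters.append({"center": center, "count": count})
    return clusters


# Two made-up points close together and one further away.
points = [(52.37, 4.89), (52.38, 4.88), (52.64, 5.11)]
clusters = grid_clusters(points, cell_size=0.1)
for cluster in clusters:
    print(cluster)
```

With a larger `cell_size` more points collapse into each cluster, which is roughly what happens on the map when you zoom out.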

Code & Data

Links:

Observations & Thoughts

Visualizing open data in this way could help both citizens and policy makers identify dangerous roads.

Road with many accidents

Most accident locations seem to be spaced out evenly. Maybe all roads are subdivided into blocks.

The subject matter is macabre, and it can be rather eerie to look at the location of a deadly accident in Street View (flag).

accident on farm road

skidmarks near accident

Perhaps the data could be clustered or analyzed further: What are the most deadly roads? Which intersections have the highest density of accidents? Would cross-validation show data patterns to make predictions?

Different Related Dataset

The Dutch news site nu.nl and the Dutch Ministry of Infrastructure and the Environment have released a new data set of 4406 deadly accidents on all roads in the Netherlands. It contains accidents from the period 2006-2012.

As this data set is released on Google Docs (which has .csv export) and contains longitude and latitude columns, visualizing it on a map was a quick process.

Fullscreen map

This data set contains the age and sex of the victims, and the mode of transportation. Some rare modes of transportation are:

  • Motorized skateboard (Male aged 20)
  • Mother with baby stroller (Female aged 0)
  • Wheelchair (Male aged 80)
  • Cart (Male aged 15)
  • Other solid object (Male aged 33)
  • Horse and carriage (7 occasions)

Both data sets warrant further analysis.

Acknowledgements

Thanks to the Open Data initiative for granting access to data that belongs to everyone, to Hack de Overheid (Hack the Government) for promoting more open data access, hackathons and applications, to Google for creating and hosting Google Maps, and to GitHub for hosting the code and data.

 

2 thoughts on “A Clustered Google Maps of 10k Dutch Traffic Accidents”

  1. Nice work. Very interesting techniques. I am embarking on similar pursuits with energy consumption data. See Liander Open Data.
