Monday, September 6, 2010

Data Quality Control

The computer science truism “Garbage-in, garbage-out” applies equally to GIS and spatial analyses. For maritime risk analysis, the sources of data vary widely in quality. At one end of the spectrum, some vessel traffic has been simulated on the basis of GIS or radar tracks, both of which provide quite accurate boat location data in near-continuous time. At the other extreme, survey data from recreational boaters are used to estimate the frequency and approximate location of this type of activity.

The MARIN researchers are scrupulous in maintaining records (metadata) on the sources of the data, their relative accuracy, and the assumptions and processes used in preparing the data. Some of these procedures are complex, combining automated and semi-automated methods for verifying and validating records and fields in the databases. For spatial analyses in particular, small discrepancies in location information can sometimes translate into large errors in the analytical results, which is why such meticulous data preparation is necessary. Under the general rubric of data-cleaning, we have coined the term geocleaning to refer to the specific processes by which location data (such as latitudes/longitudes) are verified and corrected.
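To make the idea of geocleaning concrete, here is a minimal sketch in Python. It is not the MARIN procedure itself, only an illustration of one kind of check such a process might include. The study-area bounding box, the function name geoclean, and the status labels are all hypothetical. The sketch assumes that a coordinate falling outside the study area may be the result of a common entry error (latitude and longitude swapped, or a dropped minus sign) and tries those corrections before rejecting the record.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular study area in decimal degrees (hypothetical values below)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def geoclean(lat: float, lon: float,
             box: BoundingBox) -> Tuple[Optional[Tuple[float, float]], str]:
    """Return (corrected (lat, lon) or None, status label).

    Tries the record as given, then a few assumed entry errors, and
    accepts the first candidate that lands inside the study area.
    """
    candidates = [
        ((lat, lon), "ok"),          # record is already plausible
        ((lon, lat), "swapped"),     # latitude and longitude transposed
        ((lat, -lon), "lon_sign"),   # missing or extra minus sign on longitude
        ((-lat, lon), "lat_sign"),   # missing or extra minus sign on latitude
    ]
    for (c_lat, c_lon), status in candidates:
        if box.contains(c_lat, c_lon):
            return (c_lat, c_lon), status
    return None, "rejected"


# Hypothetical study area, chosen only for illustration.
study_area = BoundingBox(min_lat=47.0, max_lat=49.0,
                         min_lon=-124.0, max_lon=-122.0)

for record in [(48.5, -123.0), (-123.0, 48.5), (48.5, 123.0), (10.0, 10.0)]:
    print(record, geoclean(*record, study_area))
```

In a semi-automated workflow, any record whose status is not "ok" would be flagged for a person to review rather than silently corrected, and the status label would be kept with the record as part of its metadata.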