DM4DEM: A GRASS-compatible tool for blunder detection of DEM
 

Gonzalo Durañona and Carlos López(*)
Centro de Cálculo, INCO
Facultad de Ingeniería CP 11300
Julio Herrera y Reissig 565, Montevideo, URUGUAY
Ph. +5982 7114229; Fax +5982 7115446
Email: gonzalod@interware.org, carlos@fing.edu.uy


    Keywords: DEM, accuracy assessment of source data, grid data, quality control

    Abstract
    Present-day GIS are complex pieces of software devoted to manipulating, analyzing, storing and reporting results about geographic data. However, there is a substantial lack of standard, readily available tools to critically analyze the input data itself in order to detect or highlight suspicious values. This complaint about present-day software is complementary to the one raised about data uncertainty, but it is not the same. Here we attempt to improve the accuracy of the dataset through judicious use of the supplied algorithms, which might help to reduce the uncertainty to some extent. DM4DEM (which stands for Data Mining for Digital Elevation Models) is an application for Geographic Information Systems that performs outlier detection on raster data in general, and on DEMs in particular. Its interface allows the user to locate unlikely elevation values in the dataset using different criteria or algorithms, including ones provided by the end user, and later to edit them within the same environment. Neither of the two outlier (or blunder) detection algorithms shipped with the software assumes any particular source for the DEM (e.g. contour lines, photogrammetric pairs, direct survey, etc.), which makes the tool well suited to end users, who might receive the data "as is" without metadata about its lineage. To the authors' knowledge, this is the first implementation of this feature in a popular GIS package.

    1. Introduction
    Recent efforts to describe the natural uncertainty of typical GIS datasets led to a review of the relationship between theory and software. Some authors claim that, although data quality (and accuracy as one of its components) has caught the attention of the research community, little has been transferred to current GIS software. These packages simply ignore the possibility of dealing with errors, outliers and similar artifacts, even though appropriate knowledge (and software!) has been developed elsewhere. Chrisman (1998) describes, for example, the pervasive use of the least squares method for most tasks within a GIS, even though it is well known that this method is badly affected by outliers. He illustrated his point with the operation of coordinate transformation. The circle is closed by considering that users will not be aware of the effects of uncertainty and errors, since their favorite software ignores that possibility, and thus they will not require accuracy statements in the metadata records associated with their datasets.
    This paper attempts to break this circle by describing a piece of software that can be added to a popular GIS package and that allows the user to apply state-of-the-art techniques for outlier detection in DEMs.

    (*) To whom correspondence should be directed

    2. Theory
    Although the software has been designed to be extensible, it ships with two algorithms by default. The technical references are Felicísimo, 1994 (F1994) and López, 1997, 2000a (L1997).
    Both methods produce an ordered list of unlikely elevations, with the most suspicious first. The simpler method is F1994. The idea is that the differences between a local interpolant and the actual elevation follow a Gaussian distribution. Once its parameters have been estimated, a "studentized" residual can be calculated for each point, and outlying values are revealed by large values of the statistic. This is a very simple procedure, but it relies on some strong hypotheses that do not hold in some real cases (López, 1997).
    The L1997 method dissects the DEM of size m×n into elongated strips of equal width w. In the column-wise direction, each strip of size m×w can be considered as a cloud of m points in R^w. Standard statistical techniques such as Principal Component Analysis can be used to analyze this cloud and pick out the most unlikely points (Hawkins, 1974; López, 1997). This approach produces a set of candidates for each strip, and full coverage of the DEM is achieved by considering all the strips. The stripping can be done either column-wise or row-wise, and each choice produces a different set of candidates. The points belonging to the intersection of both sets are the most unlikely ones, and they become the primary candidates. Once a point in R^w is selected, a sensitivity analysis is performed to identify which of the w coordinates has the largest effect on the statistic used, thus identifying an individual pixel in the raster image, that is, an individual elevation point. Details are given in the original reference.
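    The ranking idea behind F1994 can be illustrated with a minimal Python sketch. It is not the code shipped with DM4DEM: the choice of interpolant (the mean of the four edge neighbours), the plain standardization of the residuals and the function name rank_by_residual are all assumptions made for illustration, and border pixels are simply skipped.

```python
import numpy as np

def rank_by_residual(dem):
    """Rank interior pixels from most to least suspicious.

    Returns a list of (row, col, statistic) tuples. The interpolant and the
    standardization are illustrative assumptions, not the F1994 formulas.
    """
    z = np.asarray(dem, dtype=float)
    # Assumed local interpolant: mean of the four edge neighbours.
    interp = (z[:-2, 1:-1] + z[2:, 1:-1] + z[1:-1, :-2] + z[1:-1, 2:]) / 4.0
    resid = z[1:-1, 1:-1] - interp
    # Estimate the Gaussian parameters and standardize the residuals.
    t = (resid - resid.mean()) / resid.std(ddof=1)
    # Largest absolute statistic first.
    order = np.argsort(-np.abs(t), axis=None)
    rows, cols = np.unravel_index(order, t.shape)
    # +1 converts interior indices back to indices in the full DEM.
    return [(int(r) + 1, int(c) + 1, float(t[r, c])) for r, c in zip(rows, cols)]
```

    On a smooth surface with a single gross error, the erroneous pixel should head the list, followed by its immediate neighbours, whose interpolants are contaminated by the error.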
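    The stripping and intersection steps described above can be sketched as follows. This is a hedged illustration rather than the published algorithm: the statistic (a sum of squared, variance-scaled principal component scores), the form of the sensitivity analysis (resetting one coordinate at a time to its strip mean) and the names strip_pixels, primary_candidates and top are assumptions. Trailing columns that do not fill a whole strip are ignored.

```python
import numpy as np

def strip_pixels(dem, w, top=1):
    """Flag `top` pixels in each column-wise strip of width w; returns {(row, col)}."""
    m, n = dem.shape
    flagged = set()
    for start in range(0, n - w + 1, w):
        strip = dem[:, start:start + w]          # cloud of m points in R^w
        centered = strip - strip.mean(axis=0)
        # PCA through the SVD: the rows of vt are the principal directions.
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        var = s ** 2 / (m - 1)                   # variance along each direction
        keep = var > 1e-12 * var.max()           # drop numerically null directions
        vt, var = vt[keep], var[keep]

        def stat(points):
            # Assumed statistic: sum of squared, variance-scaled PC scores.
            return (((points @ vt.T) ** 2) / var).sum(axis=-1)

        d2 = stat(centered)
        for r in np.argsort(d2)[::-1][:top]:
            # Assumed sensitivity analysis: reset one coordinate at a time to
            # its strip mean and keep the one that lowers the statistic most.
            trials = np.tile(centered[r], (w, 1))
            np.fill_diagonal(trials, 0.0)
            j = int(np.argmin(stat(trials)))
            flagged.add((int(r), start + j))
    return flagged

def primary_candidates(dem, w, top=1):
    """Pixels flagged by both the column-wise and the row-wise stripping."""
    dem = np.asarray(dem, dtype=float)
    col_wise = strip_pixels(dem, w, top)
    # Row-wise stripping is column-wise stripping of the transpose.
    row_wise = {(r, c) for c, r in strip_pixels(dem.T, w, top)}
    return col_wise & row_wise
```

    Calling primary_candidates(dem, w=4) on a DEM stored as a two-dimensional array returns the set of (row, column) pixels selected in both directions. The strip width w and the number of points kept per strip are tuning choices that the sketch leaves to the caller.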

    3. Computer program
    The software was designed to be run either from the GRASS shell or from the TclTkGRASS toolbar. Its graphical interfaces follow the same style as the TclTkGRASS applications, so the user works in a familiar environment.
    Moreover, in keeping with the GRASS philosophy, the system follows the same programming conventions that allow the product to be used across platforms. It can be installed on different architectures, which makes it portable and eases its wide distribution. Development was carried out almost entirely on Linux, and the software was also tested on an AIX Unix system.

    3.1 Features
    The system is designed around the concepts of projects and runs. Each project corresponds to a single DEM under consideration, and by keeping projects apart we provide a simple form of multiuser environment. Within each project we offer the possibility of different runs, distinguished by the parameters used for each one. Each run might correspond to a different method and a different set of parameters. The system thus keeps track of the different projects the user works on, and it is integrated with the GRASS tools for visualization, storage and manipulation of the results from the algorithms.
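    One way to picture the project and run organization is the small Python sketch below. It is purely illustrative and says nothing about how DM4DEM actually stores its data: the class names, field names and the add_run helper are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """One execution of a detection method with a given set of parameters."""
    method: str                                     # e.g. "F1994" or "L1997"
    params: dict                                    # parameters used for this run
    candidates: list = field(default_factory=list)  # ordered, most suspicious first

@dataclass
class Project:
    """All the runs carried out on a single DEM."""
    dem_name: str                                   # the DEM under consideration
    runs: list = field(default_factory=list)

    def add_run(self, method, **params):
        # Each run is stored separately, so results from different methods
        # or parameter choices can be compared later.
        run = Run(method=method, params=dict(params))
        self.runs.append(run)
        return run
```

    For example, a project for one DEM could hold an F1994 run and an L1997 run with w=4 side by side, each with its own list of candidates.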

Figure 1. Interface for language selection

The software has the following functions:

Figure 2 Spanish version of the toolbar


Fig. 4 Candidate points and zoomed area as displayed in the system. The candidates are ordered, and they can be browsed one at a time. These plots are controlled by the interface illustrated in Figure 3.

 
6. Conclusions
We have briefly described yet another DEM editor. Its distinctive features include state-of-the-art algorithms for outlier detection, now integrated with a popular GIS package: GRASS. Once the algorithms suggest candidate outliers, the editor can make, track and undo changes to the original DEM, keeping different projects and different runs for each project. Through the GIS, it can display, rotate, illuminate, etc. selected areas of the DEM near the candidates, letting the user work interactively. Other features include native multilingual support (currently English and Spanish). It can also blindly impute the candidates' elevations with user-supplied rules, as described in the references. To help disseminate the technology, its requirements are modest: a UNIX environment and free software such as GRASS, Tcl/Tk and Octave for the main calculations. It is freely available on the Web in binary and source form at this site
References