DM4DEM: A GRASS-compatible tool for blunder detection
of DEM
Gonzalo Durañona and Carlos López(*)
Centro de Cálculo, INCO
Facultad de Ingeniería CP 11300
Julio Herrera y Reissig 565, Montevideo, URUGUAY
Ph. +5982 7114229; Fax +5982 7115446
Email: gonzalod@interware.org, carlos@fing.edu.uy
Abstract
Present-day GIS are complex pieces of software devoted to manipulating,
analyzing, storing and reporting results about geographic data. However, there
is a substantial lack of standard, readily available tools to critically
analyze the input data itself in order to detect or highlight suspicious
values. This complaint about present-day software is complementary to the
one raised about the uncertainty of data, but it is not the same. Here we
attempt to improve the accuracy of the dataset through judicious use of
the supplied algorithms, which might help reduce the uncertainty to
some extent. DM4DEM (which stands for Data Mining for Digital Elevation
Models) is an application for Geographic Information Systems that performs
outlier detection on raster data in general, and on DEMs in particular.
Its interface allows the user to locate unlikely elevation values
in the dataset using different criteria or algorithms, including ones provided
by the end user, and later edit them within the same environment. Both
outlier (or blunder) detection algorithms shipped with the software make
no assumption about the source of the DEM (e.g. contour lines, photogrammetric
pairs, direct survey), which makes the tool well suited to end
users, who might receive the data "as is", without metadata about
its lineage. To the authors' knowledge, this is the first implementation
of this feature in a popular GIS package.
1. Introduction
Recent efforts to describe the natural uncertainty of typical GIS datasets
have led to a review of the relationship between theory and software. Some authors
claim that, although data quality (with accuracy as one of its components)
has caught the attention of the research community, little of that work has
been transferred to current GIS software. Such software simply ignores the
possibility of dealing with errors, outliers and similar artifacts, even though
appropriate knowledge (and software!) has been developed elsewhere. Chrisman, 1998
describes, for example, the pervasive use of the least squares method for
most tasks within a GIS, even though it is well known that this method
is badly affected by outliers. He illustrated his point with the operation
of coordinate transformation. The circle is closed by considering that
users will not be aware of the effects of uncertainty and errors, since
their favorite software ignores that possibility, and thus they will
not require accuracy statements in the metadata records associated with
their datasets.
This paper attempts to break that circle by describing a piece of
software that can be added to a popular GIS package and allows the user
to apply state-of-the-art techniques for outlier detection in DEMs.
(*) To whom correspondence should be directed
2. Theory
Although the software has been designed to be extensible, it ships with
two algorithms by default. The technical references are Felicísimo,
1994 (F1994) and López, 1997, 2000a (L1997).
Both methods produce an ordered list of unlikely elevations, with
the most suspicious first. The simpler method is F1994. It assumes that
the differences between a local interpolant and the observed elevation follow
a Gaussian distribution. Once the parameters of that distribution have been
estimated, a "studentized" residual can be calculated for each point, and
outlying values are revealed by large values of the statistic. This is a very
simple procedure, which relies on some strong hypotheses that do not hold in
some real cases (López, 1997).
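The ranking idea behind F1994 can be illustrated with a minimal Python sketch. It is not the implementation shipped with DM4DEM. The local interpolant (the mean of the four edge neighbours), the plain standardization by the sample mean and standard deviation, and the function name are all assumptions made for illustration.

```python
import math


def rank_by_studentized_residual(dem):
    """Return interior cells as (|t|, row, col), most suspicious first."""
    rows, cols = len(dem), len(dem[0])
    residuals = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Assumed local interpolant: mean of the four edge neighbours.
            estimate = (dem[i - 1][j] + dem[i + 1][j]
                        + dem[i][j - 1] + dem[i][j + 1]) / 4.0
            residuals.append((dem[i][j] - estimate, i, j))
    n = len(residuals)
    mean = sum(r for r, _, _ in residuals) / n
    var = sum((r - mean) ** 2 for r, _, _ in residuals) / (n - 1)
    std = math.sqrt(var)
    if std == 0.0:
        # Every residual is identical: nothing stands out.
        return [(0.0, i, j) for _, i, j in residuals]
    # Standardize the residuals under the Gaussian assumption and sort.
    ranked = [(abs(r - mean) / std, i, j) for r, i, j in residuals]
    ranked.sort(reverse=True)
    return ranked


# Synthetic example: a tilted plane with one planted blunder at (3, 3).
dem = [[10.0 * i + 5.0 * j for j in range(7)] for i in range(7)]
dem[3][3] += 100.0
ranked = rank_by_studentized_residual(dem)
```

On this synthetic plane the planted blunder heads the list, and its four neighbours follow because the spike contaminates their interpolated values.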
The method L1997 divides the DEM of size m×n into elongated strips of
equal width w. In the column-wise direction, each strip of size m×w can
be considered as a cloud of m points in R^w. Standard statistical
techniques such as Principal Component Analysis can be used to analyze this
cloud and pick out the most unlikely points (Hawkins, 1974; López, 1997).
This approach produces a set of candidates for each strip, and full coverage
of the DEM is achieved by considering all the strips. The stripping can
be done either column-wise or row-wise, and each produces a different set
of candidates. The points belonging to the intersection of both sets are
the most unlikely ones, and they are the primary candidates. Once a point
in R^w is selected, a sensitivity analysis is performed to identify
which of the w coordinates has the largest effect on the statistic used,
thus identifying an individual pixel in the raster image, that is, an individual
elevation point. Details are given in the original reference.
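The strip-based procedure can be sketched in Python as follows. This is a hedged illustration, not the algorithm of López, 1997. As a stand-in for the PCA-based statistic it uses the squared Mahalanobis distance, which equals the sum of the squared principal component scores, each divided by its eigenvalue, when all components are kept. The sensitivity rule (reset one coordinate to the strip mean and see which reset lowers the statistic most), the number k of candidates kept per strip, the small ridge term and all function names are assumptions.

```python
def invert(matrix):
    """Gauss-Jordan inverse; adequate for a small positive definite matrix."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cloud_scorer(points):
    """Return the cloud mean and a squared-Mahalanobis-distance function."""
    n, w = len(points), len(points[0])
    mean = [sum(p[c] for p in points) / n for c in range(w)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
            for b in range(w)] for a in range(w)]
    # Assumed safeguard: a tiny ridge keeps degenerate strips invertible.
    ridge = 1e-9 * (1.0 + sum(cov[a][a] for a in range(w)) / w)
    for a in range(w):
        cov[a][a] += ridge
    inv = invert(cov)

    def dist2(x):
        d = [x[c] - mean[c] for c in range(w)]
        return sum(d[a] * inv[a][b] * d[b] for a in range(w) for b in range(w))

    return mean, dist2


def strip_candidates(dem, w, k):
    """Column-wise strips of width w; k suspicious pixels per strip.

    Columns left over when the width is not a multiple of w are ignored.
    """
    found = set()
    for start in range(0, len(dem[0]) - w + 1, w):
        points = [row[start:start + w] for row in dem]  # m points in R^w
        mean, dist2 = cloud_scorer(points)
        scores = sorted(((dist2(p), i) for i, p in enumerate(points)),
                        reverse=True)
        for score, i in scores[:k]:
            # Assumed sensitivity rule: which coordinate, once reset to the
            # strip mean, lowers the statistic the most?
            drops = []
            for c in range(w):
                trial = points[i][:]
                trial[c] = mean[c]
                drops.append((score - dist2(trial), c))
            found.add((i, start + max(drops)[1]))
    return found


def candidates(dem, w=2, k=2):
    """Pixels flagged by both the column-wise and the row-wise stripping."""
    col_wise = strip_candidates(dem, w, k)
    transposed = [list(col) for col in zip(*dem)]
    row_wise = {(r, c) for c, r in strip_candidates(transposed, w, k)}
    return col_wise & row_wise


# Synthetic example: small deterministic relief plus a blunder at (3, 4).
dem = [[100.0 + (i * i + 3 * j + i * j) % 7 for j in range(8)]
       for i in range(8)]
dem[3][4] += 500.0
suspects = candidates(dem, w=2, k=2)
```

In this example the planted pixel (3, 4) is among the returned suspects. Other pixels may be returned as well, because every strip contributes its k highest-scoring points whether or not they are real blunders.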
3. Computer program
This software was designed to be run either from the GRASS shell
or from the TclTk-Grass bar. The graphical interfaces of the DM4DEM
system follow the same style as the TclTkGrass applications, so the user
can work in a familiar environment.
Moreover, in keeping with the GRASS philosophy, the system follows the same programming
style, which allows the product to be used cross-platform. It can be installed
on different architectures, which makes it portable enough for wide
distribution. Development was done almost entirely on Linux, and the system was
also tested on an AIX Unix system.
3.1 Features
The system is designed around the concepts of projects and
runs. Each project corresponds to a single DEM under consideration,
and keeping projects apart provides a basic multiuser environment.
For each project we offer the possibility of different runs, distinguished
by the parameters used for each one. Each run might correspond to a different
method and set of parameters. The system thus keeps track of
the different projects the user works on, integrated with GRASS tools for
the visualization, storage and manipulation of the results from the algorithms.
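The relationship between projects and runs can be pictured with a small Python sketch. The class names, field names and parameter names are hypothetical and are not taken from the DM4DEM sources; the sketch only shows one project per DEM holding several runs that differ in method and parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of a detection method with a given parameter set."""
    method: str                      # e.g. "F1994" or "L1997"
    parameters: dict
    candidates: list = field(default_factory=list)  # ordered suspicious cells


@dataclass
class Project:
    """All runs performed on a single DEM."""
    dem_name: str
    runs: list = field(default_factory=list)

    def add_run(self, method, **parameters):
        run = Run(method, dict(parameters))
        self.runs.append(run)
        return run


# Hypothetical usage: two runs with different methods on the same DEM.
project = Project("example_dem")
project.add_run("F1994")
project.add_run("L1997", strip_width=3)
methods = [run.method for run in project.runs]
```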
Figure 1. Interface for language selection
The software has the following functions:
Figure 2. Spanish version of the toolbar
4. General computing flow
5. Testing and examples
For illustration purposes, we show the DEM described by Day and
Muller, 1988, corresponding to Montagne Sainte-Victoire, in France. There
are two DEMs for the same area, with different accuracies. The comparative
performance of the methods (in terms of accuracy improvement) has been
analyzed in López, 2000b. Further examples will be posted at
the web site of the system.
Figure 4. Candidate points and zoomed area as displayed in the system. The candidates are ordered, and they can be browsed one at a time. These plots are controlled by the interface illustrated in Figure 3.
6. Conclusions
We have briefly described (yet another!) DEM editor. Its unique feature is that it provides state-of-the-art algorithms for outlier detection integrated with a popular GIS package: GRASS. Once the algorithms suggest candidate outliers, the editor can make, track and undo changes to the original DEM, keeping different projects and different runs for each project. Through the GIS, it can display, rotate, illuminate, etc. selected areas of the DEM near the candidates, letting the user work interactively. Other features include native multilingual support (currently English and Spanish). It can also blindly impute the candidates' elevations with user-supplied rules, as described in the references. To help disseminate the technology, its requirements are modest: a UNIX environment and free software such as GRASS, Tcl/Tk and Octave (for the main calculations). It is freely available on the web in binary and source form at this site
References