
DM4DEM: A GRASS-compatible tool for blunder detection in DEMs

Gonzalo Durañona and Carlos López (*)
Centro de Cálculo, INCO
Facultad de Ingeniería CP 11300
Julio Herrera y Reissig 565, Montevideo, URUGUAY
Ph. +5982 7114229; Fax +5982 7115446
Email: gonzalod@interware.org, carlos@fing.edu.uy

Keywords:

Abstract
Present-day GIS are complex pieces of software devoted to manipulating, analyzing, storing and reporting results about geographic data. However, there is a substantial lack of standard, readily available tools to critically analyze the input data itself, in order to detect or highlight suspicious values. This complaint about present-day software is closely related to the one raised about the uncertainty of data, but it is not the same. Here we attempt to improve the accuracy of the dataset by judiciously using the supplied algorithms, which might help to reduce the uncertainty to some extent. DM4DEM (which stands for Data Mining for Digital Elevation Models) is an application for a Geographic Information System (GRASS) that performs outlier detection on raster data in general, and on DEMs in particular. Its interface allows the user to locate unlikely elevation values in the dataset using different criteria or algorithms, including algorithms provided by the end user, and later to edit them within the same environment. Neither of the two outlier (or blunder) detection algorithms shipped with the software assumes any particular source for the DEM (e.g. contour lines, photogrammetric pairs, direct survey), which makes the tool very suitable for end users who might receive the data just "as is", without metadata about its lineage. To the authors' knowledge, this is the first implementation of this feature in a popular GIS package.

1. Introduction
Recent efforts to describe the natural uncertainty of typical GIS datasets have led to a review of the relationship between theory and software. Some authors claim that, although data quality (with accuracy as one of its components) has caught the attention of the research community, little of this work has been transferred to current GIS software. Such software simply ignores the possibility of dealing with errors, outliers and similar artifacts, even though the appropriate knowledge (and software!) has been developed elsewhere. Chrisman (1998), for example, describes the pervasive use of the least squares method for most tasks within a GIS, even though it is well known that this method is badly affected by outliers; he illustrated his point with coordinate transformations. The circle closes because users will not be aware of the effects of uncertainty and errors, since their favorite software ignores such possibilities, and thus they will not demand accuracy statements in the metadata records associated with their datasets.
This paper attempts to break this circle by describing a piece of software that can be added to a popular GIS package and that allows the user to apply state-of-the-art techniques for outlier detection in DEMs.

(*) To whom correspondence should be directed

2. Theory
Although the software has been designed to be extensible, it incorporates two algorithms by default. The technical references are Felicísimo, 1994 (F1994) and López, 1997, 2000a (L1997).
Both methods produce an ordered list of unlikely elevations, with the most suspicious first. The simpler method is F1994. Its underlying idea is that the differences between the elevations and a local interpolant follow a Gaussian distribution. Once the parameters have been estimated, a "studentized" residual can be computed for each point, and outlying values are unmasked through large values of this statistic. This is a very simple procedure, but it relies on some strong hypotheses that do not hold in some real cases (López, 1997).
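To make the F1994 idea concrete, the following Python sketch ranks the interior cells of a DEM by their absolute studentized residual. It is only an illustration under our own assumptions, not the published implementation: the local interpolant is taken to be the mean of the eight neighbouring elevations, the residuals are standardized with their overall mean and population standard deviation, and the function name `f1994_candidates` is hypothetical. The DEM is represented as a list of rows.

```python
from statistics import mean, pstdev

def f1994_candidates(dem):
    """Rank interior cells by |studentized residual|, most suspicious first.

    Assumption (ours, not necessarily F1994's): the local interpolant is the
    mean of the 8 neighbouring elevations. Returns (score, (row, col)) pairs.
    """
    m, n = len(dem), len(dem[0])
    cells, residuals = [], []
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            neighbours = [dem[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            cells.append((i, j))
            residuals.append(dem[i][j] - mean(neighbours))
    if not residuals:
        return []  # grid too small to have interior cells
    mu, sigma = mean(residuals), pstdev(residuals)
    if sigma == 0:
        return []  # no residual stands out from the others
    scored = [(abs(r - mu) / sigma, cell) for r, cell in zip(residuals, cells)]
    scored.sort(key=lambda t: t[0], reverse=True)  # most suspicious first
    return scored
```

For a tilted plane with a single +50 spike at cell (2, 2) of a 5×5 grid, the spike is ranked first: the eight-neighbour mean reproduces a plane exactly, so only the spike and the cells around it get non-zero residuals.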
The L1997 method dissects the DEM, of size m×n, into elongated strips of equal width w. In the column-wise direction, each strip of size m×w can be considered as a cloud of m points in R^w. Standard statistical techniques such as Principal Component Analysis can be used to analyze such a cloud and pick the most unlikely points (Hawkins, 1974; López, 1997). This approach produces a set of candidates for each strip, and full coverage of the DEM is achieved by considering all the strips. The stripping can be done either column-wise or row-wise, and each direction produces a different set of candidates. The points belonging to the intersection of both sets are the most unlikely ones, and they become the primary candidates. Once a point in R^w is selected, a sensitivity analysis is performed to identify which of its w coordinates has the largest effect on the statistic used, thus identifying an individual pixel in the raster image, that is, an individual elevation point. Details are given in the original reference.
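The strip-and-intersect logic of L1997 can be sketched in Python as follows. This is a simplified stand-in under explicit assumptions, not the statistic of the original reference: the principal components are reduced to the leading axis of each strip (found by power iteration), a point's statistic is its squared distance to that axis (equivalently, the sum of its squared scores on the trailing components), `k` candidates are kept per strip, and all function names, parameters and defaults (`w`, `k`, `tol`) are hypothetical. The sensitivity step uses the fact that, with the axis held fixed, the gradient of this statistic with respect to the point is twice its residual vector, so the coordinate with the largest absolute residual identifies the pixel.

```python
from math import sqrt

def _leading_axis(points, iters=200):
    """Centre a point cloud and return (centred points, unit leading axis).

    Power iteration on the scatter matrix stands in for a full PCA here.
    """
    w = len(points[0])
    centre = [sum(p[k] for p in points) / len(points) for k in range(w)]
    centred = [[p[k] - centre[k] for k in range(w)] for p in points]
    scatter = [[sum(c[a] * c[b] for c in centred) for b in range(w)] for a in range(w)]
    v = [1.0 / sqrt(w)] * w
    for _ in range(iters):
        nxt = [sum(scatter[a][b] * v[b] for b in range(w)) for a in range(w)]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # constant strip: keep the starting direction
        v = [x / norm for x in nxt]
    return centred, v

def _strip_candidates(grid, w, k, tol):
    """One pass over column strips of width w; each row of a strip is a point in R^w."""
    n = len(grid[0])
    starts = list(range(0, n - w + 1, w))
    if starts[-1] + w < n:
        starts.append(n - w)  # last strip overlaps so that every column is covered
    pixels = set()
    for s in starts:
        centred, v = _leading_axis([row[s:s + w] for row in grid])
        scored = []
        for i, c in enumerate(centred):
            proj = sum(ck * vk for ck, vk in zip(c, v))
            resid = [ck - proj * vk for ck, vk in zip(c, v)]
            scored.append((sum(r * r for r in resid), i, resid))  # squared distance to axis
        scored.sort(key=lambda t: t[0], reverse=True)
        for stat, i, resid in scored[:k]:
            if stat <= tol:
                break  # the rest of this strip fits its axis: no candidate
            # Sensitivity with the axis held fixed: d(stat)/dx_q = 2 * resid[q],
            # so the coordinate with the largest |resid[q]| pins down the pixel.
            q = max(range(w), key=lambda idx: abs(resid[idx]))
            pixels.add((i, s + q))
    return pixels

def l1997_candidates(dem, w=3, k=1, tol=1e-6):
    """Primary candidates: pixels flagged by both the column-wise and row-wise passes."""
    if not 2 <= w <= min(len(dem), len(dem[0])):
        raise ValueError("strip width must be between 2 and the smaller grid side")
    by_columns = _strip_candidates(dem, w, k, tol)
    transposed = [list(col) for col in zip(*dem)]
    # In the transposed grid, (r, c) corresponds to the original pixel (c, r).
    by_rows = {(c, r) for (r, c) in _strip_candidates(transposed, w, k, tol)}
    return by_columns & by_rows
```

On a 10×10 tilted plane with a single blunder at cell (4, 4), both passes flag that cell and the intersection is {(4, 4)}; strips without the blunder lie exactly on their leading axis and yield no candidates.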

3. Computer program
This software was created to be executed either from the GRASS shell or from the TclTk-Grass bar. Its graphical interfaces follow the same style as the TclTkGrass applications, so the user works in a familiar environment.
Moreover, following the GRASS philosophy, the system uses the same programming style, which allows the product to be used cross-platform. It can be installed on different architectures, which makes it portable and suitable for wide distribution. Development was done almost entirely under Linux, and the system was also tested on an AIX Unix system.

3.1 Features
The system is designed around the concepts of projects and runs. Each project corresponds to a single DEM under consideration, and keeping projects apart provides a kind of multiuser environment. Within each project the user can perform different runs, distinguished by the parameters used in each one; each run may use a different method and different parameters. The system thus keeps track of the different projects the user works on, and it is integrated with GRASS tools for the visualization, storage and manipulation of the results produced by the algorithms.
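The project/run bookkeeping described above can be pictured with a minimal Python sketch, assuming one `Project` per DEM holding named `Run` records. The class and field names are hypothetical and merely mirror the description; the actual system is built with Tcl/Tk and Octave on top of GRASS.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """One execution of a detection method with a given parameter set."""
    method: str                      # e.g. "F1994" or "L1997"
    params: dict                     # method-specific parameters
    candidates: list = field(default_factory=list)  # ranked results, filled later

@dataclass
class Project:
    """One project per DEM under consideration; runs are kept apart by id."""
    dem_name: str
    runs: dict = field(default_factory=dict)

    def add_run(self, run_id, method, **params):
        if run_id in self.runs:
            raise ValueError(f"run {run_id!r} already exists in this project")
        self.runs[run_id] = Run(method, dict(params))
        return self.runs[run_id]
```

Keeping runs in a per-project dictionary lets the same DEM be analyzed with several methods and parameter sets without the results overwriting each other.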

Figure 1. Interface for language selection

The software has the following functions:

Picks unlikely elevation points with the advanced algorithms included in the program, currently those reported by Felicísimo, 1994 and López, 1997, 2000a.
Is easily extensible: new algorithms can be incorporated into the menu.
Shows 3D views around these points, with rotation, azimuth and viewing distance driven by sliders and buttons.
Suggests appropriate values for the suspicious elevations, with parameters that can be modified by the user.
Supports multiple languages, with Spanish and English already available (see Fig. 1).
Has revision facilities: it can show the history of modified points and the map labels.
Works wherever GRASS works, that is, on UNIX platforms in general, though it is not limited to them.
Allows the elevations to be modified either by automatic algorithms (supplied with the system or not) or by manual estimation.
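The extensibility, suggestion and revision features listed above could be organized along the following lines. This Python sketch is hypothetical throughout (the `DETECTORS` registry, the median-of-neighbours suggestion rule, the toy detector and the `EditLog` class are our own illustration, not the system's code): detection algorithms register themselves under a menu name, a replacement value is suggested from the neighbouring elevations, and every edit is logged so that it can be listed or undone.

```python
from statistics import median

DETECTORS = {}  # hypothetical menu registry: name -> detector(dem) -> ranked cells

def register(name):
    """Decorator that makes a detection algorithm appear in the menu."""
    def wrap(fn):
        DETECTORS[name] = fn
        return fn
    return wrap

def suggest_value(dem, i, j):
    """Hypothetical suggestion rule: median of the available neighbours."""
    m, n = len(dem), len(dem[0])
    values = [dem[a][b]
              for a in range(max(i - 1, 0), min(i + 2, m))
              for b in range(max(j - 1, 0), min(j + 2, n))
              if (a, b) != (i, j)]
    return median(values)

@register("neighbour-median-gap")
def gap_detector(dem):
    """Toy detector: rank interior cells by |z - suggested value|."""
    cells = [(i, j) for i in range(1, len(dem) - 1) for j in range(1, len(dem[0]) - 1)]
    return sorted(cells, key=lambda c: abs(dem[c[0]][c[1]] - suggest_value(dem, *c)),
                  reverse=True)

class EditLog:
    """Applies edits to a DEM and records them so they can be listed or undone."""
    def __init__(self, dem):
        self.dem = dem
        self.history = []  # (row, col, old value, new value), oldest first

    def edit(self, i, j, new_value):
        self.history.append((i, j, self.dem[i][j], new_value))
        self.dem[i][j] = new_value

    def undo(self):
        i, j, old_value, _ = self.history.pop()  # IndexError if nothing to undo
        self.dem[i][j] = old_value
```

Storing the old value with each edit makes undo a constant-time operation and gives the history of modified points for free.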

Figure 2. Spanish version of the toolbar

4. General computing flow

Figure 3. Spanish version of the 3D rotation controls.

5. Testing and examples
For illustration purposes, we show the DEM described by Day and Muller, 1988, corresponding to Montagne Sainte-Victoire, in France. Two DEMs with different accuracies are available for the same area. The comparative performance of the methods (in terms of accuracy improvement) has been analyzed in López, 2000b. Further examples will be posted on the web site of the system.
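When a more accurate DEM of the same area is available, one simple way to express accuracy improvement is the reduction in root-mean-square error (RMSE) against that reference, before and after editing. The sketch below assumes co-registered grids of equal size and uses hypothetical function names; it is not necessarily the evaluation protocol of López, 2000b.

```python
from math import sqrt

def rmse(dem, reference):
    """Root-mean-square elevation difference between two co-registered grids."""
    if len(dem) != len(reference) or any(len(a) != len(b) for a, b in zip(dem, reference)):
        raise ValueError("grids must have the same shape")
    diffs = [(a - b) ** 2 for row_a, row_b in zip(dem, reference)
             for a, b in zip(row_a, row_b)]
    return sqrt(sum(diffs) / len(diffs))

def accuracy_improvement(before, after, reference):
    """Positive when editing brought the DEM closer to the reference."""
    return rmse(before, reference) - rmse(after, reference)
```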

Figure 4. Candidate points and zoomed area as displayed in the system. The candidates are ordered and can be browsed one at a time. These plots are controlled by the interface illustrated in Figure 3.

6. Conclusions
We have briefly described (yet another!) DEM editor. Its unique feature is that it handles state-of-the-art algorithms for outlier detection while being integrated with a popular GIS package: GRASS. Once the algorithms suggest candidate outliers, the editor can make, track and undo changes to the original DEM, keeping different projects and different runs for each project. Through the GIS, it can display, rotate, illuminate, etc. selected areas of the DEM near the candidates, letting the user work in interactive mode. Other features include native multilingual support (currently English and Spanish). It can also blindly impute the elevations of the candidates with user-supplied rules, as described in the references. In order to disseminate the technology, its requirements are modest: a UNIX environment and free software such as GRASS, Tcl/Tk and Octave (the latter for the main calculations). It is freely available on the web, in binary and source form, at this site

References

Chrisman, N. R., 1998. Speaking truth to power: An agenda for change. In Lowell, K. and Jaton, A. (Eds.), Spatial Accuracy Assessment: Land Information Uncertainty in Natural Resources, 27-31. ISBN 1-57504-119-7.
Day, T. and Muller, J.P., 1988. Quality assessment of Digital Elevation Models produced by automatic stereo matchers from SPOT image pairs. In Proceedings of the 16th International Congress of the International Society for Photogrammetry and Remote Sensing, Kyoto, Commission III, 148-159.
Felicísimo, A., 1994. Parametric statistical method for error detection in digital elevation models. ISPRS Journal of Photogrammetry and Remote Sensing, 49, 4, 29-33.
Hawkins, D. M., 1974. The detection of errors in multivariate data, using Principal Components. Journal of the American Statistical Association, 69, 340-344.
López, C., 1997. Locating some types of random errors in Digital Terrain Models. International Journal of Geographical Information Science, 11, 7, 677-698.
López, C., 2000a. On the improving of elevation accuracy of Digital Elevation Models: a comparison of some error detection procedures. Transactions in GIS, 4, 1, 43-64.
López, C., 2000b. DM4DEM: An experiment on the Accuracy Improvement of Photogrammetrically derived DEM. In Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Amsterdam, The Netherlands, 2000.