Accuracy improvement of GIS datasets
DM4DEM: A GRASS-compatible tool for blunder detection of DEM
"DM4DEM: a GRASS-compatible tool for blunder detection of DEM", 2000 Durañona, G. and López, C. 4th. International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Amsterdam, The Netherlands, July 12-14, 191-194
Abstract
DM4DEM (which stands for Data Mining for Digital Elevation Models) is an application for Geographic Information Systems that performs outlier detection on raster data in general, and on DEMs in particular. The goal is to improve the accuracy of the dataset using only the available DEM and the supplied algorithms. Its interface allows the user to locate unlikely elevation values using either the algorithms already included or new ones provided by the end user. Once located, the outliers can be edited within the same environment. Neither of the outlier (or blunder) detection algorithms shipped with the software assumes any particular source for the DEM (e.g. contour lines, photogrammetric pairs or direct survey), which makes the tool well suited to end users who might receive the data "as is" without any extra information. To the authors' knowledge, this is the first implementation of this feature in a popular GIS package.
Current GIS software simply neglects any accuracy information about the data. Users are unaware of the effects of accuracy on the final result, partly because their favorite software ignores the issue. Once they realize these effects, specific tools to improve accuracy will be required. This paper describes one such tool.
Although the software has been designed to be extensible, it incorporates two algorithms by default. They are described in depth in Felicísimo, A., 1994, J. of Photogrammetry and Remote Sensing, 49, 4, 29-33; López, C., 1997, IJGIS, 11, 7, 677-698 and López, C., 2000 Transactions in GIS, 4, 1, 43-64, where some numerical results are also presented. Both methods produce an ordered list of unlikely elevations, with the most suspicious first. The simplest method is that of Felicísimo (1994), denoted F1994. The idea is that the differences between a local interpolant and the elevation follow a Gaussian distribution. Once the parameters of that distribution have been estimated, a "studentized" residual can be calculated for each point, and outlying values are unmasked through large values of the statistic. This is a very simple procedure, but it relies on some strong hypotheses that do not hold in some real cases.
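The F1994 idea can be illustrated with a short sketch. This is not the code shipped with DM4DEM: the function name, the choice of interpolant (the mean of the four edge-adjacent neighbours), the restriction to interior cells and the threshold of 3 are all assumptions made here for illustration, and the statistic is a plain standardised residual rather than the exact test of the original reference.

```python
import statistics


def f1994_style_candidates(dem, threshold=3.0):
    """Rank interior cells by a standardised residual against a local interpolant.

    Hypothetical helper: the interpolant and threshold are illustrative choices.
    """
    rows, cols = len(dem), len(dem[0])
    residuals = {}
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Assumed interpolant: mean of the four edge-adjacent neighbours.
            estimate = (dem[i - 1][j] + dem[i + 1][j]
                        + dem[i][j - 1] + dem[i][j + 1]) / 4.0
            residuals[(i, j)] = dem[i][j] - estimate

    values = list(residuals.values())
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []  # perfectly regular surface: nothing to flag

    # Standardise each residual and keep only the large ones.
    scored = [(cell, (d - mean) / spread) for cell, d in residuals.items()]
    flagged = [(cell, t) for cell, t in scored if abs(t) > threshold]
    # Most suspicious first, as in the ordered list described in the text.
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return flagged


# A tilted plane with one artificial blunder injected at row 3, column 3.
dem = [[float(i + j) for j in range(7)] for i in range(7)]
dem[3][3] += 50.0
candidates = f1994_style_candidates(dem)
```

On this synthetic surface the injected cell is the only one whose statistic exceeds the threshold. The neighbouring cells also show non-zero residuals because the blunder contaminates their interpolated values, which hints at why the Gaussian hypothesis can be fragile on real data.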
The method of López (1997), denoted L1997, dissects the DEM of size m × n into elongated strips of equal width w. In the column-wise direction, each strip of size m × w can be interpreted as a cloud of m points in R^w. Standard statistical techniques such as Principal Component Analysis can be used to analyze this cloud and pick the most unlikely points in R^w. This approach produces a set of candidates for each strip, and full coverage of the DEM is achieved by considering all the strips. The stripping can be done either column-wise or row-wise, and each direction produces a different set of candidates. The points belonging to the intersection of both sets are the most unlikely ones, and will be the primary candidates. Once a point in R^w is selected, a sensitivity analysis is performed to identify which of the w coordinates has the largest effect on the statistic used, thus identifying an individual pixel in the raster image, that is, an individual elevation point. Details are given in the original reference.
This software was created to be executed either from the GRASS shell or from the TclTk-GRASS bar. Its graphical interface follows the same style as the TclTk-GRASS applications, so the user can work in a familiar environment. Moreover, in keeping with the GRASS philosophy, the system follows the same programming style, which allows the product to be used across platforms. It can be installed on different architectures, which makes it more portable and eases wide distribution. Development was done almost entirely on Linux, and the system was also tested on an AIX Unix system.
The system is designed around the concepts of projects and runs. Each project corresponds to a single DEM under consideration, and by keeping them apart we provide some sort of multiuser environment. For each project we offer the possibility of different runs, differentiated by the parameters used for each one. Each run might correspond to a different method and a different set of parameters. The system keeps track of the different projects the user works on, and is integrated with the GRASS tools for visualization, storage and manipulation of the results of the algorithms. The software has the following functions:
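The strip decomposition can be sketched as follows. This is a simplified illustration, not the algorithm of the original reference: the function names are hypothetical, the statistic is assumed to be the squared distance of each point from the first principal direction (found here by power iteration), the sensitivity analysis is replaced by picking the coordinate with the largest residual, and the intersection is taken at pixel level.

```python
import math


def first_direction(points, iterations=50):
    """Approximate the first principal direction of centred points (power iteration)."""
    w = len(points[0])
    v = [1.0 / math.sqrt(w)] * w
    for _ in range(iterations):
        # Apply the scatter matrix X^T X to v without forming it explicitly.
        proj = [sum(p[k] * v[k] for k in range(w)) for p in points]
        nxt = [sum(t * p[k] for t, p in zip(proj, points)) for k in range(w)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        v = [x / norm for x in nxt]
    return v


def strip_candidates(grid, w, top=1):
    """Return pixels (row, col) flagged in column-wise strips of width w.

    Trailing columns that do not fill a whole strip are ignored in this sketch.
    """
    m, n = len(grid), len(grid[0])
    flagged = set()
    for start in range(0, n - w + 1, w):
        # Each row of the strip is one point in R^w; centre every coordinate.
        means = [sum(grid[i][start + k] for i in range(m)) / m for k in range(w)]
        points = [[grid[i][start + k] - means[k] for k in range(w)] for i in range(m)]
        v = first_direction(points)
        scored = []
        for i, p in enumerate(points):
            t = sum(p[k] * v[k] for k in range(w))
            resid = [p[k] - t * v[k] for k in range(w)]
            score = sum(r * r for r in resid)  # assumed statistic
            # Stand-in for the sensitivity step: coordinate with the largest residual.
            k_max = max(range(w), key=lambda k: abs(resid[k]))
            scored.append((score, i, start + k_max))
        scored.sort(reverse=True)
        flagged.update((i, j) for _, i, j in scored[:top])
    return flagged


def primary_candidates(dem, w, top=1):
    """Intersect the pixels flagged by column-wise and row-wise stripping."""
    by_columns = strip_candidates(dem, w, top)
    transposed = [list(col) for col in zip(*dem)]
    # Swap indices back so both sets use (row, col) of the original DEM.
    by_rows = {(i, j) for j, i in strip_candidates(transposed, w, top)}
    return by_columns & by_rows


# A tilted plane with one artificial blunder injected at row 5, column 4.
dem = [[10.0 * i + 10.0 * j for j in range(12)] for i in range(12)]
dem[5][4] += 50.0
primary = primary_candidates(dem, w=3)
```

In this synthetic example the injected pixel is flagged in both directions and therefore appears in the intersection. Plain PCA is itself sensitive to large blunders, so a sketch like this only behaves well when the terrain variation dominates the error, as it does here.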
Figure 1. Interface for language selection
Figure 2. Spanish version of the toolbar
The product is freely available on the Web, in both source and compiled form, for the above-mentioned operating systems.
To run DM4DEM you must first launch either GRASS or TclTk-GRASS from the X Window environment. If you are using TclTk-GRASS you have the option of adding a command to the TclTk-GRASS menu bar, but you can also call the software from the command line. To work with a DEM, you first create a new empty project or open a previous session. Elevation maps are imported into DM4DEM through the menu-driven window shown in Fig. 2, where you also have the option to change the language, to run one of the algorithms included with the package or add a new one, or to request help. You are then ready to search for unlikely elevation points. After a few seconds an ordered list of candidate points is produced, and you can move back and forth between them. The candidates can be modified manually, or automatically using a user-defined estimation formula. In manual mode, the point under analysis is highlighted and the current and suggested elevation values are presented.
You also have the option to rotate the DEM in the neighborhood of a point and to change the field of view, using the tools illustrated in Fig. 3. The changes are displayed in real time in an independent GRASS window, which helps with the visualization and lets you become comfortable and confident with the editing process. The project keeps a history of the candidates found, so you can save the project at any time and resume your work later. You can also save notes about the project you are working on.
Fig. 3 Spanish version of the 3D rotation controls.
Once you are satisfied with the DEM obtained, you can save the project or export the DEM as a single map to the GRASS environment. All these procedures are explained in the online help that comes with the software, and also on the web site.
Fig. 4 Candidate points and zoomed area as displayed in the system. The candidates are ordered, and they can be browsed one at a time. These plots are controlled by the interface illustrated in Fig. 3.
4. Conclusions
We have briefly described (yet another!) DEM editor. Its unique features include state-of-the-art algorithms for outlier detection, fully integrated with a popular GIS package: GRASS. Once the algorithms have suggested candidate elevations as outliers, the editor can make, track and undo changes to the original DEM, keeping different projects and different runs for each project. Through the GIS, it can display, rotate, illuminate, etc. selected areas of the DEM near the candidates, letting the user work in interactive mode. Other features include native multilingual support (currently English and Spanish). It can also blindly impute the candidates' elevations with user-supplied rules, as described in the references. In order to disseminate the technology, its hardware and software requirements are modest: a UNIX environment, freely available software such as GRASS and Tcl/Tk, and MATLAB or Octave for the main calculations. It is freely available on the Web in binary and source form at http://www.interware.org/dm4dem
Biography
Gonzalo Durañona
Carlos López holds a PhD degree in geoinformatics. He has been active in the academic and private sector regarding quality control of different GIS-type datasets, including DEM, Census Data, meteorological data, etc. using advanced methods like High Breakdown Statistics, Artificial Neural Networks, Cross Validation Maximum Likelihood, etc. or developing new ones. His work has been published in journals and conferences. He is currently with the National Clearinghouse of Uruguay, in charge of QC of GIS datasets.
Affiliation:
Author's Name: Gonzalo Durañona
Email: gonzalod@interware.org
Author's Name: Carlos López
Academic title(s): Ph. D.
Position: Consultant
Company's name: Clearinghouse Nacional de Datos Geográficos
Full address:
Rincón 575
CP 11400
Montevideo
URUGUAY
Phone: +598 99 138259
Fax: +598 2 3367447 ext 107
Email: carlos.lopez@ieee.org