Centro de Cálculo, Facultad de Ingeniería (11), CC 30
Internet: carlos.lopez@ieee.org
Montevideo, URUGUAY
Abstract
Removing outliers from records prior to their use is a major concern in any technical or scientific field. Meteorology is not
an exception, and an important effort has been made to devise methods to locate them, despite the fact that the task has been
wrongly regarded as purely technical. The currently applied methods are very crude because they are mostly
computerized versions of traditional criteria, failing to exploit the capabilities of modern computer systems. Extensive
comparisons among methods have not been carried out, because no reliable statistical comparison among different outlier detection
strategies can be made without a tool for generating instances of a database contaminated with artificial errors. This paper
describes a heuristic model suitable for simulating the usual errors observed in a 30-year, ten-station daily rainfall dataset,
which has been carefully checked for typing errors. We restrict ourselves to simulating only such errors. Some
methods are discussed, namely: a) choosing at random another value from the same dataset; b) choosing at random another value from
the same station; c) imperfectly modelling some driving mechanism for the errors. The results are compared with the
observed problem, and from them we were able to show that options a) and b) underpredict the difference between errors
and true values, while option c), even though imperfect, renders satisfactory results.
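
As an illustration only, options a) and b) can be sketched in a few lines of Python. This is a minimal sketch and not the model used in the paper: the function names contaminate and mean_abs_difference, the layout of the data (a list of daily rows with one value per station) and the number of errors injected are all assumptions made for the example. Option c) is not sketched, because the driving mechanism is not specified in this abstract.

```python
import random


def contaminate(data, n_errors, same_station, rng):
    """Inject artificial errors by copying values from other cells.

    data         -- list of rows (days), each a list of values (stations);
                    this layout is an assumption of the sketch
    n_errors     -- number of distinct cells to overwrite (hypothetical)
    same_station -- False: option a), donor taken from the whole dataset
                    True:  option b), donor taken from the same station
    rng          -- a random.Random instance, for reproducibility

    Returns the contaminated copy and a log of
    (day, station, true_value, wrong_value) tuples.
    """
    n_days, n_stations = len(data), len(data[0])
    out = [row[:] for row in data]  # leave the original untouched
    cells = [(d, s) for d in range(n_days) for s in range(n_stations)]
    log = []
    for d, s in rng.sample(cells, n_errors):
        if same_station:
            # option b): any other day of the same station
            donors = [(dd, s) for dd in range(n_days) if dd != d]
        else:
            # option a): any other cell of the dataset
            donors = [c for c in cells if c != (d, s)]
        dd, ss = rng.choice(donors)
        # The donor may hold the same value as the target (for example
        # two dry days), in which case the cell is left unchanged.
        wrong = data[dd][ss]
        out[d][s] = wrong
        log.append((d, s, data[d][s], wrong))
    return out, log


def mean_abs_difference(log):
    """Mean absolute difference between injected and true values."""
    return sum(abs(wrong - true) for _, _, true, wrong in log) / len(log)
```

Calling contaminate(data, 50, same_station=True, rng=random.Random(1)) would give one contaminated instance under option b). Repeating the call with different seeds gives the set of instances needed for a statistical comparison, and mean_abs_difference gives one simple way to compare the simulated errors with those observed.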
Presented at:
X Congresso Brasileiro de Meteorologia, Brasilia, 26-30 October, 1998