The most significant contribution of this work is the development of the PTidal program. PTidal is a general-purpose, second-order shallow-water model implemented to run on loosely coupled, shared-nothing UNIX computers. The model has been tailored for use in the Río de la Plata region, and its present capabilities and performance are well suited for real-time engineering applications and predictions, such as environmental risk assessment. Another important contribution is the inclusion of a low-reflective boundary condition in the scheme, which was unavoidable both in the Río de la Plata application and in the infinite-length channel simulation.
Two code variants were developed: the Explicit-Implicit and the Fully Implicit. Both attain a considerable speedup relative to the serial model. As expected, processor utilization is better on the denser meshes, where coarser-grained parallelism is available. These achievements fulfill the goals set for this work.
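To make the grain-size argument concrete, here is a rough sketch under the illustrative assumption of a strip decomposition into $p$ subdomains (not a measured figure): on an $N_x \times N_y$ grid each process updates about $N_x N_y / p$ points per step but exchanges only the $N_x$ points along each interface, so
\[
\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}} \;\propto\; \frac{N_x}{N_x N_y / p} \;=\; \frac{p}{N_y},
\]
a ratio that shrinks as the mesh is refined, which is why utilization improves on the denser meshes.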
The numerical scheme is accurate, at least of second order, as was verified numerically. The comparison with the analytical 1D solution introduced in this research showed near-exact agreement between the analytical solution and the model results.
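As a minimal sketch of how such an order-of-accuracy check can be carried out (the function and variable names are illustrative and not taken from PTidal): the model is run on two grids of spacing $h$ and $h/2$, the error against the analytical 1D solution is measured on each, and the observed order is estimated from the error ratio.
\begin{verbatim}
#include <math.h>

/* Observed order of convergence from errors on two grids.
 * err_h  : error norm against the analytical solution, spacing h
 * err_h2 : error norm on the refined grid, spacing h/2
 * Returns p such that err ~ C h^p; a value close to 2 confirms
 * second-order accuracy.                                        */
double observed_order(double err_h, double err_h2)
{
    return log(err_h / err_h2) / log(2.0);
}
\end{verbatim}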
Each of the two parallel versions has its pros and cons. The Explicit-Implicit program exhibits better parallelism, so processes waste less time waiting for messages, but it is subject to the CFL constraint on the time step. The Fully Implicit code, in contrast, does not parallelize as well because of the high contention between processes in the distributed tri-diagonal solver. That drawback is not severe in our case, since we work on clusters with a small number of nodes. On the other hand, this code need not satisfy the CFL condition, only the less restrictive stability condition explained below.
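For reference, the CFL restriction mentioned above takes the usual shallow-water form $\Delta t \le \Delta x / (|u| + \sqrt{g h})$. The following sketch shows how a safe explicit step could be chosen over the grid; the names and the safety factor are illustrative and do not reproduce PTidal's actual routine.
\begin{verbatim}
#include <math.h>

#define G 9.81   /* gravity, m/s^2 */

/* Largest time step allowed by the 1-D CFL condition
 * dt <= dx / (|u| + sqrt(g h)), scanned over the n wet points.
 * u[], h[] : velocity (m/s) and depth (m); dx : grid size (m);
 * safety   : fraction of the limit actually used, e.g. 0.8.    */
double cfl_time_step(const double *u, const double *h,
                     int n, double dx, double safety)
{
    double dt = 1.0e30;
    for (int i = 0; i < n; i++) {
        double c = fabs(u[i]) + sqrt(G * h[i]);   /* wave speed */
        if (c > 0.0 && dx / c < dt)
            dt = dx / c;
    }
    return safety * dt;
}
\end{verbatim}
With large depths the $\sqrt{g h}$ term dominates, which is why the deep region near the continental shelf is what limits the practical time steps quoted below.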
In practical applications over the Río de la Plata we have concluded that the time step must be further limited because of the steep variations of the sea bottom and the large depth near the continental shelf. This limitation matters mainly when simulating wind storm tides. On the 1 km grid we successfully tested the Fully Implicit model with a maximum time step of 2 minutes and the Explicit-Implicit model with a 1 minute step.
Only a few of the available results of the model application have been included here. For the astronomic and meteorological tide simulations we presented comparisons of water-level results, cotidal and corange maps, and velocity fields for both tidal regimes, highlighting the strong influence of storm surges on the region.
The slight difference between data and model prediction in the astronomic tide application may be explained by several factors. The physical domain is very complex. The artificial southern boundary, where an a priori prediction of the wave is imposed, is a source of uncertainty. The astronomic tide used as the ``real register'' for the comparison is computed from historic records, which may be inaccurate. The sea-bottom profile employed may also be a source of disagreement.
In the wind storm case the physical problem is even more complex. Coastal winds must be extrapolated to the grid points, since no data are available inside the Río de la Plata. Nevertheless, the agreement between the tide registers and the model prediction is qualitatively good.
Since the agreement between data and prediction is good, we consider PTidal a very useful tool for engineering applications such as the one introduced in figure 8.12.
This work also yields results in the field of parallel computer architectures. We found that the ``parallel-virtual'' equipment clearly outperforms the shared-memory (SMP) equipment, even though the latter has higher communication bandwidth and lower latency. This result confirms the finding of Warren et al. (1997) that the performance of floating-point programs depends far more on memory bandwidth than on network performance. Since the SMP machine must share its memory bus among its 4 processors, it behaves worse than the PVM cluster, where each node's processor has the full local memory bandwidth to itself.
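A memory-bound kernel in the spirit of the STREAM triad illustrates the point (a generic sketch, not a kernel taken from PTidal): every multiply--add drags three words through the memory system, so four processors sharing one bus saturate it long before the CPUs do, while each PVM node keeps its full local bandwidth.
\begin{verbatim}
/* STREAM-like triad: a[i] = b[i] + s*c[i].
 * Two loads and one store per multiply-add, so the achievable
 * floating-point rate is bounded by memory bandwidth,
 * not by processor speed.                                      */
void triad(double *a, const double *b, const double *c,
           double s, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
\end{verbatim}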
The smaller speedup obtained on the PC cluster is not disappointing: its performance/price ratio is even better because of the lower cost of its components. Other advantages are scalability, the use of commodity components, ease of use, and easier support. Using 6 relatively small PCs (Pentium 133 MHz), we obtained a sustained rate of 52 MFLOPS, well above that of the powerful workstation available at our department.