$next$ $up$ $previous$
Next: Some Recurring Difficulties Up: Generic Components for Grid Previous: A Case Study: Grid

Efficiency

When dealing with high-level implementations of algorithms, one generally runs the risk of losing performance with respect to a low level implementation. However, the generic approach using C++ templates often allows good compilers to optimize out much of this so-called abstraction penalty. We used compilers gcc 2.95, KAI KCC v3.4 and g77 v0.5.24³(with options -O3 -fforce-addr -funroll-loops) on Linux 2.2.14 running on a 450 MHz Pentium 86686 with 512K cache. We tested several grid sizes between 400 and 250.000 cells; the ratios of run times did not show large dependencies on grid size.

A first test case was the vertex-cell incidence counting algorithm shown on page $[*]$ , using a simple array-based Fortran77 data structure for triangular grids as point of reference:

   INTEGER ntri 
   INTEGER til(1:3,1:ntri), ncells(1:nv) 
   DO 20 c = 1,ntri
      DO 10 vc = 1,3
         ncells(til(vc,c)) = ncells(til(vc,c))+1 
10    CONTINUE
20 CONTINUE

Here ntri is the number of triangles, and til(v,c) is the index of vertex number v of cell c ( $1 \leq$ v $\leq 3$ ). It turns out that in this case, the KAI C++ compiler (options used: +K3) can completely eliminate the overhead due to abstraction.

The algorithm involves indirect addressing, which is much more typical of unstructured grid algorithms than are plain vector loops in a BLAS style.

Another test case involving geometric primitives still shows a non-vanishing overhead, where the factor varies between 1.2 and 1.8. For instance, the following loop for summing up the facet normals of a cell has an overhead of about 1.8 if a generic grid geometry is used, and of 1.2 if the geometry is specialized to the grid type (that is, using low-level code inside the calculation of normal()):⁴

coord_type normal(0,0);
for(CellIterator c(aGrid); c; ++c)
  for(FacetOnCellIterator fc(*c); fc; ++fc) 
    normal += Geom.normal(fc);

The only difference between the two implementations of normal() is, that in the high-level case coordinate values are stored in a grid function and accessed via intermediate vertices (carrying an extra pointer to a grid), whereas in the low-level case, coordinate values are stored in an array and are accessed by vertex indices obtained as in the Fortran til cell-vertex incidence array above.

The options used for KCC were +K3 -abstract_float -abstract_pointer -restrict -inline_implicit_space_time=100. Omitting the last option increases run time by a factor of about 3! A reason is probably the deeper nesting of function calls inside normal() in the high-level version.

The examples presented are in some respect a worst-case scenario for the performance of generic implementations using the micro-kernel, because no real work is performed inside the loops. The performance gain obtained by using a better data layout often outweighs the difference between high-level and low-level data access.

$next$ $up$ $previous$
Next: Some Recurring Difficulties Up: Generic Components for Grid Previous: A Case Study: Grid

Guntram Berti 2000-09-29