The goal of this post is to evaluate the performance of generating visualization output for large computations produced by ASPECT. ASPECT uses deal.II to generate the visualization output. By default we generate VTU files of the unstructured mesh. Instead of generating one file per MPI rank, the output can be grouped to a specified number of files (even a single one). These files are written using MPI I/O, which should allow for fast performance.

Machine setup and striping

Computations done on Frontera, nsinker benchmark, adaptive refinement
Computations were done on 32 nodes (56 cores each)
/scratch1/ LFS filesystem has 16 OSTs (file servers), up to 60 GB/s
Striping can be enabled (-1 = maximum striping) by calling
```
lfs setstripe -c -1 <file or fikder>
```
default striping is 1 (disabled)

Results

Conclusions

This little experiment showed some interesting results:

For now, grouping to 16 without striping gives the best performance. This is the default in ASPECT.
We can achieve up to 2 GB/s in performance. This is far from the theoretical maximum.
Overall, performance is good enough: The linear solver takes about 30 seconds for 800m DoFs (IO: 5 seconds).

Future work

Compare against HDF5 output.
Check why striping is slower than writing several files.
How about 32 files instead of 16?

References

Frontera Guide