Molecular Dynamics Exercise 2
Exercise 2: Let's Get Our Feet a Little Wet - OpenMP
- Gaining proficiency with multi-threaded parallelism through OpenMP.
- Understanding performance considerations of multi-threaded programs.
- Learning how to run multi-threaded programs on the HPC architecture.
You can move on when...
You have a working OpenMP parallel version of the Molecular Dynamics program, and have measured its performance over the specified scenarios.
As our first attempt at parallelizing the Molecular Dynamics program, we will use OpenMP. As you learned in the lectures, OpenMP is a quick and easy way to get parallelism out of a predominantly serial code by adding multi-threaded capabilities. The resulting program will not run across multiple nodes of the HPC machine, but it will run across multiple cores within a single node. How many cores that is depends on which architecture you were assigned (e.g. Ranger has 16 cores per node).
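As a reminder of what this looks like in practice, here is a minimal sketch of a loop parallelized with a single pragma. The function `advance_positions` and its arguments are hypothetical and are not taken from the MD program; the only point is that the loop iterations are independent, so OpenMP can divide them among threads.

```c
/* Hypothetical example, not part of the MD code:
   advance each coordinate by one time step. */
void advance_positions(double *x, const double *v, double dt, int n)
{
    /* Each iteration touches only element i, so iterations can run
       on different threads without interfering with each other. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        x[i] += dt * v[i];
    }
}
```

With GCC, OpenMP is enabled by the `-fopenmp` flag; other compilers use different flags, so check the documentation for your machine. The number of threads is normally taken from the `OMP_NUM_THREADS` environment variable. Compiled without OpenMP support, the pragma is ignored and the loop runs serially.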
- In the case of the MD program, there are several places in which OpenMP pragmas could be inserted, but not every loop is worth parallelizing. Find the loop that dominates the run time.
- Insert an OpenMP pragma at the appropriate spot to parallelize the loop.
- Consult the documentation for your HPC architecture to learn how to compile OpenMP code on the machine. Ask for help if needed. Links to the documentation are:
- For Ranger: http://services.tacc.utexas.edu/index.php/ranger-user-guide
- For Athena: http://www.nics.tennessee.edu/computing-resources/kraken
- For BlueFire: http://www.cisl.ucar.edu/computers/bluefire/
- Submit a small test job to the queue, once with 1 thread and once with 4 threads, and make sure you still get the same answers. (Look at the documentation to figure out how to run multi-threaded code.)
- Test and plot the performance of the code over 1, 2, 4, 8 and 16 threads, with matrix sizes of 1000, 10000, and 100000.
- You will want to use reductions to collect the TotKin and TotPot variables across threads.
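The reduction hint above can be sketched as follows. This is only an illustration under assumed names: `total_energies`, `mass`, `speed`, and `pot` are hypothetical, and the local variables `tot_kin` and `tot_pot` stand in for the program's TotKin and TotPot. The actual MD code will organize its loop differently.

```c
/* Hypothetical helper: accumulate kinetic and potential energy
   over n particles using an OpenMP reduction. */
void total_energies(const double *mass, const double *speed,
                    const double *pot, int n,
                    double *tot_kin_out, double *tot_pot_out)
{
    double tot_kin = 0.0;
    double tot_pot = 0.0;

    /* Each thread gets private copies of tot_kin and tot_pot,
       initialized to zero; the copies are summed when the loop ends. */
    #pragma omp parallel for reduction(+:tot_kin, tot_pot)
    for (int i = 0; i < n; i++) {
        tot_kin += 0.5 * mass[i] * speed[i] * speed[i];
        tot_pot += pot[i];
    }

    *tot_kin_out = tot_kin;
    *tot_pot_out = tot_pot;
}
```

Without the `reduction` clause, all threads would update the same two variables at once and the totals would be unreliable. Because the order of the floating-point additions can change with the thread count, expect agreement with the serial answer to within rounding rather than bit for bit.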
- Are you now comfortable with running multi-threaded programs on your HPC architecture? It gets more complicated from here.
- Was the scaling of the algorithm what you expected? How far could you take the multi-threaded versions?
- Have you increased the size of the molecular system you can handle with this parallelization? What is limiting you?
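When judging the scaling, it can help to turn the measured run times into speedup and parallel efficiency. The sketch below uses the standard definitions, speedup = T(1) / T(p) and efficiency = speedup / p; the function names are hypothetical, and the timings are whatever you measured in your own runs.

```c
/* Speedup of a run that took t_parallel seconds, relative to
   the 1-thread run that took t_serial seconds. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Parallel efficiency: the fraction of ideal (linear) speedup
   achieved with the given number of threads. */
double efficiency(double t_serial, double t_parallel, int threads)
{
    return speedup(t_serial, t_parallel) / (double)threads;
}
```

An efficiency close to 1 means the extra threads are being used well; a value that falls as the thread count grows suggests overhead, load imbalance, or a serial portion of the code is starting to dominate.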
- Try taking the OpenMP parallelism further; see if including nested parallelism improves performance.
- Try different scheduling methods to see if any provide better performance. Whether or not they do, why do you think that is?
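One convenient way to try several schedules without recompiling is the `schedule(runtime)` clause, which reads the schedule from the `OMP_SCHEDULE` environment variable (for example `static`, `dynamic,4`, or `guided`). The sketch below assumes a triangular loop over particle pairs, which is a common source of load imbalance. The function `pair_sum` and its squared-distance term are placeholders, not the potential used in the MD program.

```c
/* Hypothetical example: sum a placeholder pair term over all
   pairs i < j of a one-dimensional coordinate array. */
double pair_sum(const double *x, int n)
{
    double total = 0.0;

    /* Early values of i have many more j iterations than late ones,
       so an even static split can leave some threads idle.
       schedule(runtime) defers the choice to OMP_SCHEDULE. */
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            double d = x[i] - x[j];
            total += d * d;  /* placeholder for a real pair potential */
        }
    }
    return total;
}
```

If `OMP_SCHEDULE` is not set, the schedule used is implementation defined, so set it explicitly for each timing run and record the value alongside the result.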
- A solution to this example can be found here for C/C++ and FORTRAN.