Molecular Dynamics Exercise 2

Exercise 2: Let's Get Our Feet a Little Wet - OpenMP

Objectives:

Gaining proficiency with multi-threaded parallelism through OpenMP.
Understanding performance considerations of multi-threaded programs.
Learning how to run multi-threaded programs on the HPC architecture.

You can move on when...

You have a working OpenMP parallel version of the Molecular Dynamics program, and have measured its performance over the specified scenarios.

Description

As our first attempt at parallelizing the Molecular Dynamics program, we will use OpenMP. As you learned in the lectures, OpenMP is a quick and easy way to get parallelism out of a predominantly serial code by adding multi-threaded capabilities. So while this version will not run across multiple nodes of the HPC machine, it will run across multiple cores within a node. How many cores that is depends on which architecture you were assigned (e.g. Ranger has 16 cores per node).
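
As a quick sanity check of what a node offers, the sketch below asks the OpenMP runtime how many processors and threads it sees. The function name query_parallel_resources is hypothetical and not part of the course code; the fallback branch only exists so the file still builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: report what the OpenMP runtime sees on this node.
 * procs       <- number of processors available to the program
 * max_threads <- number of threads a parallel region would use by default
 * Without OpenMP enabled at compile time, both are reported as 1. */
void query_parallel_resources(int *procs, int *max_threads)
{
#ifdef _OPENMP
    *procs = omp_get_num_procs();
    *max_threads = omp_get_max_threads();
#else
    *procs = 1;
    *max_threads = 1;
#endif
}
```

The default thread count is normally controlled by the OMP_NUM_THREADS environment variable, so the second value can differ from the core count of the node.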

Instructions

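In the case of the MD program, there are several places in which OpenMP pragmas could be inserted. Parallelizing every loop is not automatically a win, since each parallel region adds thread-management overhead, so start with the loop that dominates the run time.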
Insert an OpenMP pragma at the appropriate spot to parallelize the loop.
Consult the documentation for your HPC architecture to learn how to compile OpenMP code on that machine. Ask for help if needed. Links to the documentation are:

For Ranger: http://services.tacc.utexas.edu/index.php/ranger-user-guide
For Athena: http://www.nics.tennessee.edu/computing-resources/kraken
For BlueFire: http://www.cisl.ucar.edu/computers/bluefire/

Submit a small test job to the queue using 1 thread and then 4 threads, and make sure both runs still give the same answers. (Look at the documentation to figure out how to run multi-threaded code.)
Test and plot the performance of the code over 1, 2, 4, 8 and 16 threads, with matrix sizes of 1000, 10000, and 100000.
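
The correctness check and the timing runs described above can be sketched as follows. This is a minimal illustration, not the course's MD code: kernel is a stand-in loop, and the names wall_seconds, kernel and timed_run are all hypothetical. The OpenMP flag depends on the compiler (GCC uses -fopenmp; check your machine's documentation for others).

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds; falls back to CPU time if OpenMP is disabled. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Stand-in for the MD computation: sums 0 + 1 + ... + (n-1). */
double kernel(int n)
{
    double sum = 0.0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += (double)i;
    }
    return sum;
}

/* Run the kernel with the requested thread count.
 * Returns the computed result and stores the elapsed time in *seconds. */
double timed_run(int nthreads, int n, double *seconds)
{
    double t0, result;
#ifdef _OPENMP
    omp_set_num_threads(nthreads);
#else
    (void)nthreads; /* serial build: thread count is ignored */
#endif
    t0 = wall_seconds();
    result = kernel(n);
    *seconds = wall_seconds() - t0;
    return result;
}
```

Calling timed_run for 1, 2, 4, 8 and 16 threads and dividing the 1-thread time by each measured time gives the speedup values to plot. Comparing the returned results across thread counts is the "same answers" check; for a real MD run, expect small floating-point differences because the summation order changes.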

Tips

You will want to use reductions to collect the TotKin and TotPot variables across threads.
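
A reduction of this kind might look like the sketch below. The Particle layout and the function total_energy are assumptions made for illustration; the actual MD program may store positions, velocities and energies differently. Only the names TotKin and TotPot come from the exercise.

```c
/* Hypothetical particle layout, for illustration only. */
typedef struct {
    double vx, vy, vz; /* velocity components */
    double mass;
    double pot;        /* per-particle potential energy, assumed already computed */
} Particle;

/* Accumulate total kinetic and potential energy over n particles.
 * reduction(+:kin, pot) gives every thread its own private kin and pot,
 * then adds the private copies together when the loop ends, which avoids
 * a data race on the shared totals. */
void total_energy(const Particle *p, int n, double *TotKin, double *TotPot)
{
    double kin = 0.0, pot = 0.0;
    int i;
    #pragma omp parallel for reduction(+:kin, pot)
    for (i = 0; i < n; i++) {
        double v2 = p[i].vx * p[i].vx + p[i].vy * p[i].vy + p[i].vz * p[i].vz;
        kin += 0.5 * p[i].mass * v2;
        pot += p[i].pot;
    }
    *TotKin = kin;
    *TotPot = pot;
}
```

Without the reduction clause, all threads would update the same two variables at once and the totals would be wrong in an unpredictable way.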

Questions

Are you now comfortable with running multi-threaded programs on your HPC architecture? It gets more complicated from here.
Was the scaling of the algorithm what you expected? How far could you take the multi-threaded versions?
Have you increased the size of the molecular system you can handle with this parallelization? What is limiting you?

Extra Credit

Try taking the OpenMP parallelism further; see if including nested parallelism improves performance.
Try different scheduling methods to see if any provide better performance. Whether or not they do, why do you think that is?
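
One convenient way to compare scheduling methods is schedule(runtime), which defers the choice to the OMP_SCHEDULE environment variable so the code does not need to be recompiled between runs. The sketch below uses a triangular pair loop as a stand-in for an MD pair interaction loop; count_pairs is a hypothetical function, not taken from the course code.

```c
/* Count the unordered pairs (i, j) with i < j among n particles.
 * The inner loop gets shorter as i grows, so iterations of the outer loop
 * carry unequal work; this is the kind of loop where the schedule matters.
 * schedule(runtime) reads the policy from OMP_SCHEDULE at run time. */
long count_pairs(int n)
{
    long pairs = 0;
    int i;
    #pragma omp parallel for schedule(runtime) reduction(+:pairs)
    for (i = 0; i < n; i++) {
        int j;
        for (j = i + 1; j < n; j++) {
            pairs += 1; /* a real code would evaluate the pair interaction here */
        }
    }
    return pairs;
}
```

Setting OMP_SCHEDULE to values such as "static", "dynamic,4" or "guided" before each run selects the policy. With a static schedule the threads that receive the low values of i get the longest inner loops, which is one reason a dynamic or guided schedule may balance the work better here.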

Hints

A solution to this example can be found here for C/C++ and FORTRAN.