HPC University: Introduction to Profiling

Tuesday April 30, 2013

Profiling measures where a program spends its time, which makes it a useful technique for determining which code regions are likely candidates for optimization or parallelization. As Donald Knuth said, premature optimization is the root of all evil (at least in software engineering). In practice one often has to balance simple, readable code against highly optimized code, so choosing when and where to optimize is very important. Two common heuristics for choosing where to optimize are to look for code regions that account for a large share of the run time, and to look for code regions that are called repeatedly.

Download the BCCD, burn it onto a CD or USB drive (see http://bccd.net/wiki/index.php/Burning_the_BCCD_to_CD or http://bccd.net/wiki/index.php/InstallInstructions#Create_a_bootable_USB_drive) and boot it up on another computer or in a virtual machine.

Go into the ~/Molecular-dynamics/mindy/src directory.

Open the Makefile in your favorite editor, and make these changes:

Add a CXXFLAGS line to the top of the file like this: CXXFLAGS=-pg

Add "-pg" to the CCFLAGS line in the default block, making it look like this: "CCFLAGS = -O3 -ffast-math -pg"

Build the software: nice make

Run a 10000-step simulation with some of the supplied test data: ./mindy_g++ 10000 ../test/alanin.pdb ../test/alanin.psf ../test/alanin.params

This produces a file called gmon.out in the current directory, containing profile and call-path information from the execution of mindy. Use the gprof utility to interpret it: gprof mindy_g++ gmon.out | less. The output is long. It begins with a flat profile, which lists the time the program spent in each function, and continues further down with a call graph, which shows for each function which functions called it and which functions it called.
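Because the combined report is long, it can help to request the two sections separately. The following is a minimal sketch: -p asks gprof for the flat profile only, -q for the call graph only, and -b suppresses the explanatory text gprof normally prints. The output file names flat_profile.txt and call_graph.txt are arbitrary choices made for this example, and the script assumes it is run in the mindy src directory after the simulation has finished.

```shell
#!/bin/sh
# Sketch: split the gprof report into its two sections.
# Assumes ./mindy_g++ and gmon.out are in the current directory.
# flat_profile.txt and call_graph.txt are arbitrary file names.
if [ -x ./mindy_g++ ] && [ -f gmon.out ]; then
    # -b: omit the explanatory blurbs, -p: flat profile only
    gprof -b -p ./mindy_g++ gmon.out > flat_profile.txt
    # -q: call graph only
    gprof -b -q ./mindy_g++ gmon.out > call_graph.txt
    # The flat profile is sorted by self time, so the top rows matter most.
    head -n 15 flat_profile.txt
else
    echo "mindy_g++ or gmon.out not found; build and run the instrumented binary first"
fi
```

Both files can then be opened in an editor or paged with less while working through the questions that follow.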

First examine the flat profile section at the top.

Where did mindy spend most of its time?

Can you figure out how the functions were called based on the number of times they were called in relation to the number of steps the program went through?
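One way to approach this question is to divide each function's call count by the number of simulation steps (10000 in the run above). A ratio near 1 suggests roughly one call per step, while a much larger ratio suggests the function is called from inside a loop within each step. The helper below is only a sketch and not part of mindy or gprof: calls_per_step is a made-up name, and it assumes the usual gprof flat-profile column order (% time, cumulative seconds, self seconds, calls, two per-call columns, name).

```shell
#!/bin/sh
# Sketch: print "calls per simulation step" for each flat-profile row.
# calls_per_step is a hypothetical helper name, not a gprof feature.
# Usage: gprof -b -p ./mindy_g++ gmon.out | calls_per_step 10000
calls_per_step() {
    steps="$1"
    awk -v steps="$steps" '
        # Data rows start with a number and have an integer call count in
        # column 4; header lines and rows without call counts are skipped.
        $1 ~ /^[0-9.]+$/ && $4 ~ /^[0-9]+$/ {
            # Demangled C++ names may contain spaces, so rejoin fields 7..NF.
            name = $7
            for (i = 8; i <= NF; i++) name = name " " $i
            printf "%-12.2f %s\n", $4 / steps, name
        }'
}

# Only run against real data when it is available in the current directory.
if [ -x ./mindy_g++ ] && [ -f gmon.out ]; then
    gprof -b -p ./mindy_g++ gmon.out | calls_per_step 10000
else
    echo "mindy_g++ or gmon.out not found; nothing to analyze"
fi
```

The first column of the output is the ratio and the second is the function name, in the same order as the flat profile.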

If you were to consider parallelizing a section of mindy, which function might you approach first and which paradigm (shared memory, distributed memory, GPGPU) might you try first? Consult the code in ComputeNonbonded.C to help you answer this question.

Now examine the call graph lower down in the output. The call graph is organized into entries, one per function, each identified by an index number. Each entry shows the time spent in the function and its children as a percentage of the program's total run time, the absolute amount of time spent in the function itself, the absolute amount of time spent in the functions it called (its children), and the number of times the function was called.

From examining the flat profile, you should know which function was called the most times and occupied the most CPU time. Use the call graph to figure out which parent function called it.
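In gprof's call graph, entries are separated by lines of dashes. Within an entry, the line that begins with an index in square brackets is the function the entry describes; the lines above it are its callers (parents) and the lines below it are the functions it calls (children). As a hedged sketch, the helper below prints only the entry for one function so its parents are easy to spot. The name callgraph_entry and the FUNC variable are inventions for this example, and function_name_here is a placeholder to be replaced with the name found in the flat profile.

```shell
#!/bin/sh
# Sketch: print the call-graph entry whose primary line mentions a function.
# callgraph_entry is a hypothetical helper name, not a gprof feature.
# Usage: gprof -b -q ./mindy_g++ gmon.out | callgraph_entry SomeFunction
callgraph_entry() {
    awk -v pat="$1" '
        /^-+$/ {                       # a line of dashes closes one entry
            if (hit) printf "%s", block
            block = ""; hit = 0
            next
        }
        { block = block $0 "\n" }      # collect every line of the entry
        # The primary line starts with "[index]"; parent and child lines
        # are indented, so they never match this pattern.
        /^\[[0-9]+\]/ && index($0, pat) { hit = 1 }
    '
}

# Placeholder: set FUNC to the function name taken from the flat profile.
func="${FUNC:-function_name_here}"
if [ -x ./mindy_g++ ] && [ -f gmon.out ]; then
    gprof -b -q ./mindy_g++ gmon.out | callgraph_entry "$func"
else
    echo "mindy_g++ or gmon.out not found; nothing to analyze"
fi
```

In the printed entry, any line above the one starting with the bracketed index names a parent of the function.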

The flat profile had some functions that took up a minimal amount of time on their own. The call graph, though, will show that they are all called by the same parent function. Which parent function is this? Find this function in the ComputeBonded.C source file. Are there dependencies that would preclude easy parallelization?

