This problem uses debugging tools, e.g. TotalView, idb, or gdb, to test your knowledge of Un*x/MPI debugging.
As an attached resource, you will find a piece of C code titled 'bug_code.c' (check the resources for "BugCode"). The crux of this problem is to find out why this program isn't completing successfully when using multiple nodes.
Successfully completing this problem requires you to debug the issues in this program.
Your solution should include:
* A description of how you went about debugging this program.
* A description of how you attached to the program in order to assess the situation (if applicable).
* A description of how you resolved the issue that was preventing the program from normal termination
* A description of how to change the source code to obtain the desired outcome.
* You may need to use X-Forwarding (the '-X' command-line option to ssh) if you plan on using a debugger such as TotalView installed on a remote machine.
* Research ways to modify variables with the debugging tools you have access to.
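One common way to make attaching to an MPI process easier, sketched here as an illustration rather than as part of the assignment, is to have each process print its host and PID and then spin until a debugger changes a variable. The helper name wait_for_debugger and the environment variable WAIT_FOR_DEBUGGER are hypothetical; in a real run you would call the helper right after MPI_Init.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: when the (hypothetical) environment variable
 * WAIT_FOR_DEBUGGER is set, print this process's host and PID, then spin
 * until a debugger sets `release` to a nonzero value.
 * Returns 1 if it waited for a debugger, 0 if it returned immediately. */
int wait_for_debugger(void)
{
    volatile int release = 0;  /* volatile: re-read on every pass, so a debugger write is seen */
    char host[256];

    if (getenv("WAIT_FOR_DEBUGGER") == NULL)
        return 0;              /* normal run: do not block */

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown");
    host[sizeof host - 1] = '\0';  /* gethostname may not terminate a truncated name */

    printf("PID %ld on %s waiting for debugger\n", (long)getpid(), host);
    fflush(stdout);            /* make sure the PID is visible before blocking */

    while (release == 0)
        sleep(1);              /* in gdb: set var release = 1, then continue */
    return 1;
}
```

After attaching with 'gdb -p <PID>', use 'bt' and 'frame' to select the wait_for_debugger frame, then 'set var release = 1' and 'continue'. This is also one way to modify a variable under gdb; other debuggers such as TotalView provide their own means of editing a variable's value.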
Please note: This and many of the following problems use the Message Passing Interface (MPI). Below are some helpful links to MPI implementations and general information about MPI; you may find them useful for future challenges.
- http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiexmpl/contents.html: MPI Tutorials
- This blog gives some convenient tips on setting up Visual Studio and Mac OS X with MPI.
- MPICH2: MPI Implementation on Windows
- A variety of other MPI implementations.
- MPI Cheat Sheet
There are two errors present. Both are related to the for-loop that calculates the mod for each process, and neither pertains to MPI. The first error is arithmetic: because my_mod and source are never initialized, the loop operates on garbage values and cannot produce a meaningful result. The value of source is relatively unimportant, so initializing source to 1 and my_mod to my_rank solves this problem and gives each process a unique mod result. This error pertains to lines 16 and 30.
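The following is a hypothetical stand-in for the per-process loop, not the contents of bug_code.c, whose loop body is not reproduced here. It only illustrates the shape of the first fix: both variables get defined values before the loop, and seeding my_mod with my_rank keeps each process's result distinct. The function name compute_mod, the constant MOD_BASE, and the update step are all assumptions; in the real program my_rank would come from MPI_Comm_rank.

```c
/* Hypothetical modulus; any power of two works with the odd multiplier below. */
#define MOD_BASE (1 << 20)

/* Hypothetical stand-in for the loop in bug_code.c.
 * Assumes 0 <= my_rank < MOD_BASE. */
int compute_mod(int my_rank, int iterations)
{
    int source = 1;        /* fix: source previously had no defined value */
    int my_mod = my_rank;  /* fix: my_mod previously started from garbage */

    for (int i = 0; i < iterations; i++) {
        /* x -> (5x + source) mod 2^20 is a bijection because 5 is odd,
         * so distinct starting ranks always map to distinct results. */
        my_mod = (5 * my_mod + source) % MOD_BASE;
    }
    return my_mod;
}
```

Compiling the original program with gcc -Wall will often warn about the uninitialized uses, which is a quick way to spot this class of error before reaching for a debugger.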
The second error is in the test expression of the for-loop. Because i is a signed short integer, it cannot hold values above SHRT_MAX (32767 on typical platforms). When it is incremented past SHRT_MAX, it wraps around to negative values on common compilers, so if the loop bound exceeds SHRT_MAX the condition never becomes false and the loop never ends. Changing i from a short to an int solves this error. This error is found in lines 17 and 29.
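The wrap-around can be reproduced without MPI. The sketch below mirrors the loop's test expression with both counter types; short_after_max, count_with_short, and count_with_int are hypothetical helpers, and the short version bails out as soon as the counter goes negative so the demonstration terminates instead of hanging. Converting an out-of-range value back to short is implementation-defined in C; the comments describe the usual behavior of gcc and clang.

```c
#include <limits.h>

/* Value of a short counter after one increment past SHRT_MAX. The increment is
 * done in int; storing SHRT_MAX + 1 back into a short is implementation-defined,
 * and gcc/clang wrap it to SHRT_MIN. */
short short_after_max(void)
{
    short i = SHRT_MAX;
    i++;
    return i;
}

/* Mirrors `for (short i = 0; i < bound; i++)`. Returns the number of completed
 * iterations, or -1 as soon as the counter wraps negative (where the original
 * loop would simply keep running). */
long count_with_short(int bound)
{
    long n = 0;
    for (short i = 0; i < bound; i++) {
        if (i < 0)
            return -1;
        n++;
    }
    return n;
}

/* The fixed loop: an int counter reaches any bound up to INT_MAX. */
long count_with_int(int bound)
{
    long n = 0;
    for (int i = 0; i < bound; i++)
        n++;
    return n;
}
```

For a bound of 40000, count_with_short returns -1 while count_with_int returns 40000; for bounds up to SHRT_MAX the two agree.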