Jacobi Exercise 1
Exercise 1: Starting Out
- Getting familiar with the high-performance computing platform you will be using for the workshop.
- Getting familiar with the Jacobi iteration algorithm used in all of these exercises.
You can move on when?
You have successfully compiled, submitted, and completed a run of the Jacobi program, and have plotted how the algorithm scales with matrix dimension.
Description
In Exercise 1, you will become familiar with the serial version of the algorithm described in the Background section. A reference implementation is provided. Your task is to examine it and make sure you understand it, compile it on your HPC architecture, and then submit several runs with different matrix sizes to see the performance characteristics of the code and of the processors in your machine.
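As a rough orientation before reading the reference code, the following is a minimal sketch of a serial Jacobi sweep on a square grid. It is not the workshop's implementation: the function names `make_grid` and `jacobi`, the row-major storage, and the boundary values (top edge held at 100.0, everything else starting at 0.0) are all assumptions made for illustration, so its numbers should not be expected to match the reference answers quoted in this exercise.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical setup: an n x n grid stored row-major, with the top edge
// held at 100.0 and every other cell starting at 0.0. The boundary
// values used by the workshop's reference code may differ.
std::vector<double> make_grid(std::size_t n) {
    assert(n >= 3);  // need at least one interior point
    std::vector<double> grid(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        grid[j] = 100.0;  // top row
    }
    return grid;
}

// Run a fixed number of Jacobi sweeps. In each sweep every interior point
// becomes the average of its four neighbours from the previous sweep.
std::vector<double> jacobi(std::vector<double> cur, std::size_t n, int iterations) {
    assert(cur.size() == n * n);
    std::vector<double> next = cur;  // copy so both buffers share the boundary values
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 1; i + 1 < n; ++i) {
            for (std::size_t j = 1; j + 1 < n; ++j) {
                next[i * n + j] = 0.25 * (cur[(i - 1) * n + j] + cur[(i + 1) * n + j] +
                                          cur[i * n + j - 1] + cur[i * n + j + 1]);
            }
        }
        std::swap(cur, next);  // the newest values are now in cur
    }
    return cur;
}
```

Two buffers are used so that every update in a sweep reads only values from the previous sweep, which is what distinguishes Jacobi from an in-place (Gauss-Seidel style) update. Each sweep touches every interior point once, so the work per iteration grows with the square of the matrix dimension.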
The program can be downloaded at:
- For Kraken: CC jacobi.cpp -o jacobi
- For Ranger: pgCC jacobi.cpp -o jacobi
- For Bluefire: xlC jacobi.cpp -o jacobi
- For Kraken: ftn jacobi.F -o jacobi
- For Ranger: pgf90 jacobi.F -o jacobi
- For Bluefire: xlF jacobi.F -o jacobi
jacobi <Dimension> <NumIterations> <RowPeek> <ColPeek>
Dimension - The size of one side of the square matrix
NumIterations - The number of fixed iterations
RowPeek, ColPeek - Specify the x,y coordinates on the grid of an
Because of differences in indexing, the FORTRAN version of the code produces a different answer for the same command line: 1.9977370057.
- Download the serial version of the code in your language of choice.
- Spend some time looking over the code. If there is something you don't understand, please ask an instructor for help.
- Compile the code with optimization level -O3.
- Test the code on a very small matrix (e.g. the inputs 10 100 3 3 should give 22.622).
- Submit the following matrix sizes for 100 iterations to the queue: 128, 256, 512, 1024, 4096.
- Make a plot of matrix dimension vs. time reported to determine the scaling of the algorithm.
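One way to put a number on the plot from the last step is to fit a straight line to log(time) against log(dimension): if time grows like N^p, the slope of that line is p. The sketch below does this with an ordinary least-squares fit. The function name `scaling_exponent` is hypothetical, and no timings are included, since they have to come from your own runs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares slope of log(time) against log(dimension).
// If time ~ c * N^p, the returned value estimates p.
double scaling_exponent(const std::vector<double>& dims,
                        const std::vector<double>& times) {
    assert(dims.size() == times.size() && dims.size() >= 2);
    const double m = static_cast<double>(dims.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t k = 0; k < dims.size(); ++k) {
        const double x = std::log(dims[k]);   // log of matrix dimension
        const double y = std::log(times[k]);  // log of reported run time
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    // Standard closed-form slope of the best-fit line y = a + p * x.
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);
}
```

Call it with the dimensions you submitted and the times your runs reported. A fitted exponent that drifts away from the value you expected at the larger sizes is a useful starting point for the questions that follow.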
Questions to Ponder...
- The scaling of your algorithm with matrix size should be relatively straightforward to determine. Are the results what you expected? If not, can you think why?
- What may limit the size of system you can run with this serial algorithm?
- Are there any compiler flags beyond -O3 that enhance the serial performance of the code?
- Are there any programmatic enhancements that could be made to improve performance?
- One could analyze this algorithm with in-depth performance tools to understand why it performs the way it does at certain sizes.
- The queue submission script for this exercise should be fairly similar to the one you used for the example hello_world at the beginning of the workshop.