ECE 757 Fall 2009 Homework 1 // Due in Lecture Wed Feb 18, 2009
The purpose of this assigment is to:
- give you experience writing simple shared memory programs using OpenMP and
Pthreads
- provide a gentle introduction to parallel programming
- provide a foundation for writing and/or running much more complex
programs on various parallel programming environments
This assignment is to be completed individually.
Programming environment:
You will do this assignment on niagara.ece.wisc.edu, which is a Sun Fire T2000 Server
containing a 32-thread Sun UltraSparc-T1 processor. To login to this machine
you can use your ECE login. If you do not have an ECE login, email the
instructors and we will arrange access for you.
Note: Since we are all sharing access to a single machine, please try to
start the assignment early. If multiple users are doing intensive runs at the
same time, they can greatly impact the performance of each other’s programs.
Program Description:
In this homework, you will be using OpenMP and Pthreads to parallelize a Finnish dice game
known as Kymppitonni ("Ten-thousand"). In this game, the objective is to reach
a total of 10,000 points in as few turns as possible. Each turn a player can
roll dice until they have rolled all dice or reached their point cutoff. The
point cutoff is strategically selected by the player. A high cutoff could net
more points, but also risks losing all accumulated points if the player rolls a
"dud".
The provided serial application plays multiple games of kymppitonni (per cutoff) varying the cutoff
value from a minimum cutoff to a maximum cutoff in increments of 50. Since all
games are independent from each other, this problem should be very
easy to parallelize.
The serial C code code is here.
Example call:
./kymppitonni <num games per cutoff> <minimum cutoff> <maximum cutoff>
The provided program tries to collect some statistics about the played games.
Specifically 1) The best realized cutoff point 2) The worst realized cutoff
point 3) The average point benefit obtained by playing with x dice still in
play. 4) The probability of rolling a "dud" with x dice still in play. All of
these statistics represent shared state across games which your parallel code
will have to respect.
Homework Requirements
This homework involves the completion of three primary tasks.
- Profiling and analysis of existing sequential code
- Parallelization of program with Pthreads
- Parallelization of program with OpenMP
Part 1 - Profiling and analysis
Before attempting to parallelize the program we will first attempt to estimate
the ideal scalability of the application using Amdhal’s law. The
existing application measures the total execution time in nanoseconds.
Using your best estimate, modify the program to measure the time of the expected
parallel segment and serial segment. Note: You should be using the gethrtime()
function which returns a value in terms of nanoseconds. Report the ratio of
parallel to total program execution time.
Using the expected parallel/serial ratio from your measurement, create a graph
showing the speedup expected by scaling from 1-32 processors. You only need
points for 1, 2, 4, 8, 16, and 32.
Part 2 - Pthreads implementation
In this section you will modify the existing program to properly support
threaded execution with pthreads. An excellent reference for pthreads can be
found here.
You should take
into consideration how to partition work into threads. Specifically, you
should be careful in assigning which threads execute certain games as not all
games require equal time to execute (those with very high cutoffs often take
much more time). You are completely free to restructure the code, define new
data structures, or interchange looping order - as long as individual games are
still simulated properly and global statistics are properly maintained. Also,
make sure the make the number of threads to execute a command line parameter to
your executable.
Note: The existing code makes a call to the library call random.
This function is not thread safe. You should replace this with a call to
rand_r. Look up the man page documentation for usage.
Once you have completed parallelizing your code, record the speedup using total execution
time ranging from 2-32 processors for the parallel code over the serial
version.
Part 3 - OpenMP implementation
Now parallelize your code using OpenMP rather than Pthreads. An excellent
reference for OpenMP is available here.
Note: You will also have to use rand_r instead for this code section.
Once you have completed parallelizing your code, record the speedup total execution
time ranging from 2-32 processors for the parallel code over the serial
version.
Deliverables:
Note: For each graph, you should have 2 data segments. Run the program
once with (games=10, minimum cutoff=50, maximum cutoff=7000) and again with
(games=1000, minimum cutoff=50, maximum cutoff=7000).
- Estimated parallel and sequential fractions from Part 1
- A graph showing expected speedup derived from the parallel and sequential
fractions. Graph should range from 1-32 processors.
- A graph showing your achieved speedup from parallelization with
Pthreads.
- A graph showing your achieved speedup from parallelization with
OpenMP.
- A 1-2 paragraph discussion comparing your Pthread results with the ideal
speedups expected from part 1. Make sure to discuss any observable
differences.
- A 1-2 paragraph discussion comparing your OpenMP results with the ideal
speedups expected from part 1. Make sure to discuss any observable
differences.
- Finally, directly compare the observed speedups of Pthreads and OpenMP.
You should write 1-2 paragraphs discussion which performed better and a
possible explantion for any differences
- Submit your source code using the learn@uw dropbox.
For further details on the Niagara platform (which may help you
explain your results), refer to these 752
lecture notes (starting at slide 50), or this 752 reading.