openmp – anderswallin.net

Here's a simple piece of C code (there is also a zipped version) for testing how to parallelize code with OpenMP. It compiles with
gcc -fopenmp otest.c -lm
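otest.c itself isn't reproduced in this post, so here is a minimal sketch of the same kind of test (hypothetical, not the original file). It times a CPU-bound loop parallelized with `#pragma omp parallel for`, measuring wall-clock time with `clock_gettime()` and CPU time with `clock()`. The pi-by-integration workload and the function names `compute_pi()`, `timed_run()` and `wall_seconds()` are my own assumptions; only the printed line format copies the timing output of the test program.

```c
/* Hypothetical sketch in the spirit of otest.c (not the original file). */
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Assumed CPU-bound workload: approximate pi with the midpoint rule
 * applied to 4/(1+x^2) on [0,1]. The iterations are split across
 * nthreads threads; each thread keeps a private partial sum that
 * OpenMP adds together at the end of the loop. */
double compute_pi(long steps, int nthreads)
{
    const double h = 1.0 / (double)steps;
    double sum = 0.0;
#pragma omp parallel for num_threads(nthreads) reduction(+:sum)
    for (long i = 0; i < steps; i++) {
        double x = (i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}

/* Run compute_pi() once and print wall-clock and CPU time. */
double timed_run(long steps, int nthreads)
{
    double t0 = wall_seconds();
    clock_t c0 = clock();
    double pi = compute_pi(steps, nthreads);
    double runtime = wall_seconds() - t0;
    /* clock() measures processor time used by the whole process */
    double cpu = (double)(clock() - c0) / CLOCKS_PER_SEC;
    printf("running with %d threads: runtime = %f s clock=%f\n",
           nthreads, runtime, cpu);
    return pi;
}
```

Calling `timed_run(100000000L, n)` for n = 1 to 8 from a `main()` should give output in the same shape as the timings in this post. The `num_threads()` clause fixes the thread count for the loop, and `reduction(+:sum)` gives each thread its own partial sum so the threads don't race on a shared variable. Without `-fopenmp`, GCC ignores the pragma and the loop simply runs serially.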

The CPU load while the program runs looks like this:

It looks like two logical CPUs never get used (the two low lines beyond "5" in the chart). The program prints some timing information:
running with 1 threads: runtime = 17.236827 s clock=17.230000
running with 2 threads: runtime = 8.624231 s clock=17.260000
running with 3 threads: runtime = 5.791805 s clock=17.090000
running with 4 threads: runtime = 5.241023 s clock=20.820000
running with 5 threads: runtime = 4.107738 s clock=20.139999
running with 6 threads: runtime = 4.045839 s clock=20.240000
running with 7 threads: runtime = 4.056122 s clock=20.280001
running with 8 threads: runtime = 4.062750 s clock=20.299999
which can be plotted like this:

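I'm measuring the processor time used by the program with clock(), which I hope is some kind of measure of how much work is performed. Here clock() adds up the CPU time of all threads, which is why it stays near 17 s while the wall-clock runtime drops. Note how the amount of work grows from about 17.2 s with 1 to 3 threads to 20-21 s with 4 or more threads, due to overheads related to creating threads and communicating between them. Another plot shows the speedup: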

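Speedup is usually defined as S(n) = T(1)/T(n), where T(n) is the wall-clock runtime with n threads. Here is a small sketch that computes it from the numbers printed above, together with the parallel efficiency S(n)/n and the relative amount of work clock(n)/clock(1). The helper names `speedup()`, `efficiency()`, `rel_work()` and `print_table()` are hypothetical.

```c
#include <stdio.h>

#define NRUNS 8

/* Wall-clock runtime (s) and clock() CPU time (s) for 1..8 threads,
 * copied from the timing output of the test program on the i7. */
static const double runtime_s[NRUNS] = {
    17.236827, 8.624231, 5.791805, 5.241023,
    4.107738, 4.045839, 4.056122, 4.062750 };
static const double cpu_s[NRUNS] = {
    17.230000, 17.260000, 17.090000, 20.820000,
    20.139999, 20.240000, 20.280001, 20.299999 };

/* S(n) = T(1) / T(n) */
double speedup(int n) { return runtime_s[0] / runtime_s[n - 1]; }

/* E(n) = S(n) / n; 1.0 would be perfect scaling */
double efficiency(int n) { return speedup(n) / n; }

/* CPU time relative to the single-threaded run */
double rel_work(int n) { return cpu_s[n - 1] / cpu_s[0]; }

void print_table(void)
{
    printf("threads  speedup  efficiency  work\n");
    for (int n = 1; n <= NRUNS; n++)
        printf("%7d  %7.2f  %10.2f  %4.2f\n",
               n, speedup(n), efficiency(n), rel_work(n));
}
```

With these numbers the speedup is about 3.3 with 4 threads and about 4.2 with 8 threads, while the CPU time with 4 threads is about 21% higher than with 1 thread.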
The i7 uses Hyper-Threading to present 8 logical CPUs to the system with only 4 physical cores, which is presumably why the speedup levels off at a little over 4. Anyone care to run this on a real 8-core machine? 🙂
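To check what the OS actually reports, here is a short sketch that queries the number of online logical CPUs with `sysconf(_SC_NPROCESSORS_ONLN)`, which is a widely supported extension (Linux, the BSDs, macOS) rather than strict POSIX. On a Hyper-Threading i7 this should count hardware threads, i.e. 8, not the 4 physical cores. When compiled with `-fopenmp` it also prints what the OpenMP runtime sees. `logical_cpus()` and `report_cpus()` are hypothetical names.

```c
#include <stdio.h>
#include <unistd.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of logical CPUs currently online (hardware threads, so
 * Hyper-Threading siblings are counted separately). Returns -1 on error. */
long logical_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

void report_cpus(void)
{
    printf("logical CPUs online: %ld\n", logical_cpus());
#ifdef _OPENMP
    /* Without OMP_NUM_THREADS set, the OpenMP runtime typically
     * defaults to one thread per logical CPU. */
    printf("OpenMP: omp_get_num_procs() = %d, omp_get_max_threads() = %d\n",
           omp_get_num_procs(), omp_get_max_threads());
#endif
}
```

Counting physical cores is more system-specific (on Linux, for example, it means looking at the topology information under /proc or /sys), so this sketch only reports logical CPUs.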

Next stop is getting this to work from a Boost.Python extension.
