Shared-memory programming with OpenMP
Note
Starting at l7Ob.pdf, slide 30
Producers and consumers
- Queue: a natural data structure to use in many multithreaded applications
- you know what a queue is
- Producer: produces requests for data
- Consumer: “consumes” the request by finding or generating the requested data
- Message passing: each thread could have a shared message queue → when one thread wants to “send a message”, it enqueues the message in the destination thread’s queue
- a thread can receive a message by dequeuing the message at the front of the line
for (sent_msgs = 0; sent_msgs < send_max; sent_msgs++) {
    send_msg();
    try_receive();
}
while (!done())
    try_receive();
- Sending messages: adding messages to the queue
mesg = random();
dest = random() % thread_count;
# pragma omp critical
enqueue(queue, dest, my_rank, mesg);
- Receiving messages: dequeueing messages from the queue
- only the queue owner can dequeue messages → synchronization is needed only when the queue holds a single message, since a dequeue could then conflict with a concurrent enqueue
if (queue_size == 0) return;
else if (queue_size == 1)
# pragma omp critical
    dequeue(queue, &src, &mesg);
else
    dequeue(queue, &src, &mesg);
print_message(src, mesg);
- Termination detection:
queue_size = enqueued - dequeued;
if (queue_size == 0 && done_sending == thread_count)
    // each thread increments done_sending after completing its for loop
    return TRUE;
else
    return FALSE;
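The pieces above can be put together in a minimal sketch of a single message queue. The type and function names (msg_node_t, msg_queue_t, queue_init, queue_done) are hypothetical and not taken from the slides, and synchronization is deliberately left to the caller so that the data structure itself stays visible.

```c
#include <stdlib.h>

/* Hypothetical message node: sender rank plus payload. */
typedef struct msg_node {
    int src;
    int mesg;
    struct msg_node* next;
} msg_node_t;

/* Hypothetical queue: linked list plus the two counters used for
   termination detection (queue_size = enqueued - dequeued). */
typedef struct {
    msg_node_t* front;
    msg_node_t* back;
    int enqueued;
    int dequeued;
} msg_queue_t;

void queue_init(msg_queue_t* q) {
    q->front = q->back = NULL;
    q->enqueued = q->dequeued = 0;
}

/* Append a message at the back. The caller provides mutual exclusion. */
void enqueue(msg_queue_t* q, int src, int mesg) {
    msg_node_t* n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->src = src;
    n->mesg = mesg;
    n->next = NULL;
    if (q->back == NULL) q->front = n;
    else q->back->next = n;
    q->back = n;
    q->enqueued++;
}

/* Remove the message at the front; returns 0 if the queue is empty. */
int dequeue(msg_queue_t* q, int* src, int* mesg) {
    msg_node_t* n = q->front;
    if (n == NULL) return 0;
    *src = n->src;
    *mesg = n->mesg;
    q->front = n->next;
    if (q->front == NULL) q->back = NULL;
    free(n);
    q->dequeued++;
    return 1;
}

/* Termination test from the notes: nothing is queued and every thread
   has finished its sending loop. */
int queue_done(const msg_queue_t* q, int done_sending, int thread_count) {
    int queue_size = q->enqueued - q->dequeued;
    return queue_size == 0 && done_sending == thread_count;
}
```

Keeping enqueued and dequeued as separate counters means the sender only ever writes one of them and the owner only ever writes the other, which is what makes the queue_size shortcut in the receive code plausible.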
Startup
- when the program begins execution:
- a single thread → the master thread
- gets command line arguments and allocates an array of message queues
- this array needs to be shared among the threads
- any thread can send to any other thread
- any thread can enqueue a message in any of the queues
- one or more threads may finish allocating their queues before some other threads → a thread that starts sending early could enqueue into a queue that has not been allocated yet
- we need an explicit barrier → blocks until all threads in the team have reached the barrier
# pragma omp barrier
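As a rough illustration of why the explicit barrier matters, the sketch below has the threads initialize a shared array (a stand-in for allocating the queues) and only afterwards read slots that another thread may have written. The names init_then_check and ready are made up for this example.

```c
#include <stdlib.h>

/* Returns 1 if every slot was seen as initialized after the barrier. */
int init_then_check(int n) {
    int* ready = calloc((size_t)n, sizeof *ready);
    int missing = 0;
    if (ready == NULL) return 0;

#pragma omp parallel
    {
        int i;
        /* Phase 1: each thread initializes its share; nowait removes the
           implicit barrier at the end of the loop. */
#pragma omp for nowait
        for (i = 0; i < n; i++)
            ready[i] = 1;

        /* No thread continues until all threads have finished phase 1. */
#pragma omp barrier

        /* Phase 2: read a slot that a different thread may have written. */
#pragma omp for reduction(+ : missing)
        for (i = 0; i < n; i++)
            if (!ready[(i + 1) % n]) missing++;
    }

    free(ready);
    return missing == 0;
}
```

Without the barrier (and with nowait in place), a fast thread could reach phase 2 while a neighbouring slot is still zero, which is the same hazard as enqueueing into an unallocated queue.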
The Atomic Directive
- can only protect critical sections that consist of a single C assignment statement from the following list:
x <op>= <expression>;
x++;
++x;
x--;
--x;
- <op> must be one of the following: +, *, -, /, &, ^, |, <<, >>
# pragma omp atomic
- Critical section: a statement that only does a load-modify-store
- many processors provide a special load-modify-store instruction
# pragma omp atomic
x += y++;
- only the update of x is protected above; y is not protected & could get messed up
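A small sketch of the atomic directive in use: a shared accumulator is updated by many threads, and each update has the allowed form x += <expression>. The function name atomic_sum is hypothetical. The right-hand side here only reads the private loop variable, so the x += y++ problem does not arise.

```c
/* Sum 1..n into a shared accumulator; each update is a single
   load-modify-store protected by atomic. */
long atomic_sum(int n) {
    long sum = 0;
    int i;
#pragma omp parallel for
    for (i = 1; i <= n; i++) {
#pragma omp atomic
        sum += i;
    }
    return sum;
}
```

In real code a reduction(+ : sum) clause would usually be the better choice for this loop; atomic is used here only to show the directive.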
Critical Sections
- OpenMP provides the option of adding a name to a critical directive
# pragma omp critical(name)
- do this when the blocks are unrelated → two blocks protected with critical directives with different names can be executed simultaneously
- however:
- names are set during compilation
- we want a different critical section for each thread’s queue, and the number of queues is only known at run time
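To make the named form concrete, here is a minimal sketch in which two unrelated shared variables each get their own critical section. The function min_max and the names update_min and update_max are invented for this example; note that both names are fixed in the source text, which is exactly the limitation described above for per-thread queues.

```c
/* Finds the min and max of a[0..n-1] (n >= 1). The two updates are
   unrelated, so they use differently named critical sections and one
   thread can update lo while another updates hi. */
void min_max(const int* a, int n, int* min_out, int* max_out) {
    int lo = a[0], hi = a[0];
    int i;
#pragma omp parallel for
    for (i = 1; i < n; i++) {
#pragma omp critical(update_min)
        if (a[i] < lo) lo = a[i];
#pragma omp critical(update_max)
        if (a[i] > hi) hi = a[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```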
Locks
- Lock: explicitly enforce mutual exclusion in a critical section
- consists of a data structure & functions
- lock structure shared among threads:
- master thread initializes the lock
- one thread must destroy it once all threads are done using it
- before entering critical region → thread sets the lock
- after finishing with the code → thread relinquishes the lock
- Lock types:
- Simple: set just once before it is unset
- Nested: can be set multiple times by the same thread
void omp_init_lock(omp_lock_t* lock_p); /* out */
void omp_set_lock(omp_lock_t* lock_p); /* in/out */
void omp_unset_lock(omp_lock_t* lock_p); /* in/out */
void omp_destroy_lock(omp_lock_t* lock_p); /* in/out */
Using locks in the message-passing program
// # pragma omp critical
// /* q_p = msg_queues[dest] */
// enqueue(q_p, my_rank, mesg);
/* q_p = msg_queues[dest] */
omp_set_lock(&q_p->lock);
enqueue(q_p, my_rank, mesg);
omp_unset_lock(&q_p->lock);
// # pragma omp critical
// /* q_p = msg_queues[my_rank] */
// dequeue(q_p, &src, &mesg);
/* q_p = msg_queues[my_rank] */
omp_set_lock(&q_p->lock);
dequeue(q_p, &src, &mesg);
omp_unset_lock(&q_p->lock);
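The q_p->lock pattern above can be tried out with a stripped-down sketch in which each "queue" is just a counter with its own lock. The names counted_queue_t, send_all and MAX_QUEUES are hypothetical. The #else branch is a serial fallback added as an assumption so that the file also builds when OpenMP is not enabled; it is not part of OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (assumption): trivial stand-ins for the lock API. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t* l) { *l = 0; }
static void omp_set_lock(omp_lock_t* l) { *l = 1; }
static void omp_unset_lock(omp_lock_t* l) { *l = 0; }
static void omp_destroy_lock(omp_lock_t* l) { (void)l; }
#endif

#define MAX_QUEUES 64

/* Hypothetical stand-in for a message queue: it only counts messages. */
typedef struct {
    omp_lock_t lock;
    int enqueued;
} counted_queue_t;

/* Sends msg_count messages round-robin to queue_count queues and returns
   the total number enqueued, or -1 if queue_count is out of range. */
int send_all(int queue_count, int msg_count) {
    counted_queue_t queues[MAX_QUEUES];
    int i, total = 0;
    if (queue_count < 1 || queue_count > MAX_QUEUES) return -1;

    for (i = 0; i < queue_count; i++) {     /* one thread initializes */
        omp_init_lock(&queues[i].lock);
        queues[i].enqueued = 0;
    }

#pragma omp parallel for
    for (i = 0; i < msg_count; i++) {
        counted_queue_t* q_p = &queues[i % queue_count];
        omp_set_lock(&q_p->lock);           /* blocks only senders to q_p */
        q_p->enqueued++;
        omp_unset_lock(&q_p->lock);
    }

    for (i = 0; i < queue_count; i++) {     /* one thread destroys */
        total += queues[i].enqueued;
        omp_destroy_lock(&queues[i].lock);
    }
    return total;
}
```

Because the lock lives inside each queue, threads sending to different queues never wait on each other, which a single unnamed critical directive cannot offer.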
Critical directives, atomic directives, or locks?
- atomic and unnamed critical directives may enforce mutual exclusion across the whole program, even between unrelated critical sections
- unnamed critical sections are mutually exclusive among themselves
- use named critical directives for unrelated critical regions
- locks should be used when mutual exclusion is needed for a data structure rather than for a block of code
Caveats
- do not mix the different types of mutual exclusion for a single critical section (the constructs do not exclude each other, so this can lead to incorrect results)
- there is no guarantee of fairness in the mutual exclusion constructs
- can be dangerous to nest mutual exclusion constructs
/* unnamed critical regions share the same implicit lock → the nested critical in f deadlocks */
#pragma omp critical
y = f(x);
...
double f(double x) {
#pragma omp critical
    z = g(x); /* z is shared */
    ...
}
/* named critical regions use different locks → f can enter critical(two) while critical(one) is held */
#pragma omp critical(one)
y = f(x);
...
double f(double x) {
#pragma omp critical(two)
    z = g(x); /* z is global */
    ...
}
Matrix-vector multiplication
Help
i am too lazy to take notes on all of this, look at l7Ob.pdf slides 48-52
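Since the notes defer to the slides here, the following is only a generic sketch of a row-parallel matrix-vector product in OpenMP and is not taken from the slides. The function name mat_vect_mult and the flat row-major layout of A are assumptions.

```c
/* y = A * x, where A is an m x n matrix stored row-major in a flat
   array. Rows are independent, so the outer loop is divided among the
   threads; j must be private because it is declared outside the loop. */
void mat_vect_mult(const double* A, const double* x, double* y, int m, int n) {
    int i, j;
#pragma omp parallel for private(j)
    for (i = 0; i < m; i++) {
        double sum = 0.0;   /* per-row accumulator, local to the iteration */
        for (j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}
```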
Thread safety
- Thread-safe code: code is thread safe if it can be simultaneously executed by multiple threads without causing issues
- some libraries might be strictly sequential
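- example (not thread safe): strtok keeps its position in static storage between calls, so concurrent calls from different threads interfere: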
#include <stdio.h>
#include <string.h>
#include <omp.h>
void Tokenize(
    char* lines[] ,   /* in/out */
    int line_count,   /* in */
    int thread_count  /* in */)
{
    int my_rank, i, j;
    char *my_token;
#pragma omp parallel num_threads(thread_count) \
    default(none) private(my_rank, i, j, my_token) \
    shared(lines, line_count)
    {
        my_rank = omp_get_thread_num();
#pragma omp for schedule(static, 1)
        for (i = 0; i < line_count; i++) {
            printf("Thread %d > line %d = %s", my_rank, i, lines[i]);
            j = 0;
            my_token = strtok(lines[i], " \t\n");
            while (my_token != NULL) {
                printf("Thread %d > token %d = %s\n", my_rank, j, my_token);
                my_token = strtok(NULL, " \t\n");
                j++;
            }
        } /* for i */
    } /* omp parallel */
} /* Tokenize */
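The usual repair for the Tokenize example is to keep the tokenizer state in the caller instead of in hidden static storage; POSIX offers strtok_r for this. The sketch below takes a slightly different route and uses only the ISO C functions strspn and strcspn, so every piece of scanning state is a local variable. The function names count_tokens and count_all_tokens are hypothetical, and the sketch counts tokens rather than printing them to keep it short.

```c
#include <string.h>

/* Counts whitespace-separated tokens in one line. All scanning state
   lives in local variables, so concurrent calls do not interfere, and
   the input line is not modified. */
int count_tokens(const char* line) {
    const char* delims = " \t\n";
    const char* p = line;
    int count = 0;
    for (;;) {
        p += strspn(p, delims);     /* skip leading delimiters */
        if (*p == '\0') break;
        p += strcspn(p, delims);    /* skip over the token itself */
        count++;
    }
    return count;
}

/* Total number of tokens over all lines, with lines dealt out to the
   threads one at a time as in the Tokenize example. */
int count_all_tokens(const char* const lines[], int line_count) {
    int total = 0;
    int i;
#pragma omp parallel for schedule(static, 1) reduction(+ : total)
    for (i = 0; i < line_count; i++)
        total += count_tokens(lines[i]);
    return total;
}
```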
Performance
Note
Start of l4a.pdf slides (notes fell off after this point)
What is performance?
- defined by 2 factors:
- computational requirements (what needs to be done)
- computational resources (what can be used to do it)
Scalability
- scale up: changing conditions/study of performance space
Note
Ended l4a.pdf slide 10, midterm review next lecture