Shared-memory programming with OpenMP

Note

Starting at l7Ob.pdf, slide 30

Producers and consumers

  • Queue: a natural data structure to use in many multithreaded applications
    • you know what a queue is
  • Producer: produces requests for data
  • Consumer: “consumes” the request by finding or generating the requested data
  • Message passing: each thread could have a shared message queue; when one thread wants to “send a message”, it enqueues the message in the destination thread’s queue
    • a thread can receive a message by dequeuing the message at the front of the line
for (sent_msgs = 0; sent_msgs < send_max; sent_msgs++) {
    send_msg();
    try_receive();   /* check own queue after each send */
}

while (!done())
    try_receive();   /* keep receiving until termination is detected */
  • Sending messages: adding messages to the queue
    mesg = random();
    dest = random() % thread_count;
#   pragma omp critical
    enqueue(queue, dest, my_rank, mesg);
  • Receiving messages: dequeueing messages from the queue
    • only the queue owner can dequeue messages, so dequeuing usually needs no synchronization; the only risky case is a queue holding a single message, where an enqueue and the dequeue could touch the same node
    if (queue_size == 0) return;
    else if (queue_size == 1) {
        /* an enqueue and this dequeue could conflict on the only node */
#       pragma omp critical
        dequeue(queue, &src, &mesg);
    } else {
        /* 2+ messages: only the owner dequeues, so no synchronization needed */
        dequeue(queue, &src, &mesg);
    }

    print_message(src, mesg);
  • Termination detection (see the Done sketch below):
queue_size = enqueued - dequeued;
if (queue_size == 0 && done_sending == thread_count)
    /* each thread increments done_sending after completing its send loop */
    return TRUE;
else
    return FALSE;
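A minimal sketch of how the pieces above might fit together; the struct fields and the Done signature are my guesses, not necessarily the slide's exact code:

#include <omp.h>

/* hypothetical per-thread message queue */
struct queue_s {
    omp_lock_t lock;                   /* used later, once we switch to locks */
    int enqueued, dequeued;            /* running totals, never decremented   */
    struct queue_node_s *front, *tail;
};

/* done_sending is a shared counter; each thread adds 1 after its send loop */
int Done(struct queue_s* q_p, int done_sending, int thread_count) {
    int queue_size = q_p->enqueued - q_p->dequeued;
    return queue_size == 0 && done_sending == thread_count;
}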

Startup

  • when the program begins execution:
    • a single thread, the master thread, is running
      • gets command line arguments and allocates an array of message queues
  • this array needs to be shared among the threads
    • any thread can send to any other thread
    • any thread can enqueue a message in any of the queues
  • one or more threads may finish allocating their queues before some other threads
  • we need an explicit barrier: it blocks until all threads in the team have reached the barrier (see the sketch below)
#   pragma omp barrier
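A minimal sketch of the startup pattern just described; msg_queues, Allocate_queue, and Start_threads are placeholder names, not necessarily the slide's identifiers:

#include <omp.h>

struct queue_s* Allocate_queue(void);   /* hypothetical helper */
struct queue_s** msg_queues;            /* shared: one queue per thread;
                                           array allocated by the master */

void Start_threads(int thread_count) {
#   pragma omp parallel num_threads(thread_count)
    {
        int my_rank = omp_get_thread_num();
        msg_queues[my_rank] = Allocate_queue();

        /* no thread may start sending until every queue exists */
#       pragma omp barrier

        /* ... send/receive loops run here ... */
    }
}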

The Atomic Directive

  • can only protect critical sections that consist of a single C assignment statement from the following list:
    • x <op> = <expression>;
      • <op> must be one of the following: +, *, -, /, &, ^, |, <<, >>
    • x++;
    • ++x;
    • x--;
    • --x;
#   pragma omp atomic
  • Critical section: a statement that only does a load-modify-store
  • many processors provide a special load-modify-store instruction, so atomic can be implemented more efficiently than a general critical section
#   pragma omp atomic
    x += y++;

only x is protected above; y is not protected, so its update could still race
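As a concrete use, the shared done_sending counter from the termination test is a natural fit for atomic, since the whole critical section is a single load-modify-store (a sketch; done_sending is assumed to be a shared int):

/* executed once per thread, right after its send loop finishes */
#pragma omp atomic
done_sending++;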

Critical Sections

  • OpenMP provides the option of adding a name to a critical directive
#   pragma omp critical(name)
  • two blocks protected by critical directives with different names can be executed simultaneously (see the sketch below)
  • however:
    • names are set during compilation
    • we want a different critical name for each thread’s queue
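A small illustration (the variable names are mine): the two updates below protect unrelated shared variables, so giving the directives different names lets them run at the same time instead of serializing against each other:

#pragma omp critical(sum_update)
total_sum += my_partial_sum;      /* excludes only other sum_update sections */

#pragma omp critical(count_update)
line_count += my_line_count;      /* may run concurrently with sum_update */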

Locks

  • Lock: explicitly enforce mutual exclusion in a critical section
    • consists of a data structure & functions
  • lock structure shared among threads:
    • the master thread (or some other single thread) initializes the lock
    • one thread must destroy it once all threads are finished with it
    • before entering the critical region, a thread sets (acquires) the lock
    • after finishing with the protected code, the thread relinquishes (unsets) the lock (lifecycle sketch after the prototypes below)
  • Lock types:
    • Simple: set just once before it is unset
    • Nested: can be set multiple times by the same thread
void omp_init_lock(omp_lock_t* lock_p);     /* out */
void omp_set_lock(omp_lock_t* lock_p);      /* in/out */
void omp_unset_lock(omp_lock_t* lock_p);    /* in/out */
void omp_destroy_lock(omp_lock_t* lock_p);  /* in/out */
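A sketch of the lifecycle with these functions, assuming each queue struct carries an omp_lock_t member named lock (as in the next section):

omp_init_lock(&q_p->lock);      /* once, before any thread uses the queue    */

omp_set_lock(&q_p->lock);       /* block until this thread owns the lock     */
/* ... operate on the queue ... */
omp_unset_lock(&q_p->lock);     /* release so other threads can proceed      */

omp_destroy_lock(&q_p->lock);   /* once, after all threads are done with it  */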

Using locks in the message-passing program

// #   pragma omp critical
//     /* q_p = msg_queues[dest] */
//     enqueue(q_p, my_rank, mesg);
 
/* q_p = msg_queues[dest] */
omp_set_lock(&q_p->lock);
enqueue(q_p, my_rank, mesg);
omp_unset_lock(&q_p->lock);
// #   pragma omp critical
//     /* q_p = msg_queues[my_rank] */
//     dequeue(q_p, &src, &mesg);
 
/* q_p = msg_queues[my_rank] */
omp_set_lock(&q_p->lock);
dequeue(q_p, &src, &mesg);
omp_unset_lock(&q_p->lock);

Critical directives, atomic directives, or locks?

  • atomic directives and unnamed critical directives each enforce mutual exclusion program-wide among sections of their own kind
    • e.g., all unnamed critical sections are mutually exclusive among themselves, even when they protect unrelated data
  • use named critical directives for unrelated critical regions
  • locks should be used when mutual exclusion is needed for a data structure rather than a particular block of code

Caveats

  1. do not mix different types of mutual exclusion constructs for a single critical section (this can lead to incorrect results; see the sketch after the nesting example below)
  2. there is no guarantee of fairness in the mutual exclusion constructs
  3. can be dangerous to nest mutual exclusion constructs
/* unnamed critical regions share the same implicit lock: a thread inside
   the critical region below calls f, which tries to enter another unnamed
   critical region, so the thread blocks waiting on itself (deadlock) */
#pragma omp critical
y = f(x);
...
double f(double x) {
    #pragma omp critical
    z = g(x);   /* z is shared */
    ...
}
/* named critical regions use different locks, so f's critical(two) region
   can be entered while the caller holds critical(one): no self-deadlock */
#pragma omp critical(one)
y = f(x);
...
double f(double x) {
    #pragma omp critical(two)
    z = g(x);   /* z is global */
    ...
}
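For caveat 1, a sketch of the kind of mixing to avoid (x and y are shared): an atomic directive and a critical directive do not exclude each other, so both updates below can still race even though each looks "protected":

/* thread A */
#pragma omp atomic
x += y;

/* thread B -- NOT mutually exclusive with the atomic update above */
#pragma omp critical
x = 2 * x;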

Matrix-vector multiplication

Help

i am too lazy to take notes on all of this, look at l7Ob.pdf slides 48-52
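For reference, a minimal sketch of the usual OpenMP matrix-vector product y = A*x, with A stored row-major in a 1-D array; this is the standard pattern, not necessarily the exact code on those slides:

void Omp_mat_vect(double a[], double x[], double y[],
                  int m, int n, int thread_count) {
    int i, j;
    /* rows are independent, so threads can each compute a block of y
       without any synchronization */
#   pragma omp parallel for num_threads(thread_count) \
        default(none) private(i, j) shared(a, x, y, m, n)
    for (i = 0; i < m; i++) {
        y[i] = 0.0;
        for (j = 0; j < n; j++)
            y[i] += a[i*n + j] * x[j];
    }
}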

Thread safety

  • Thread-safe code: code is thread safe if it can be simultaneously executed by multiple threads without causing problems (e.g., races on shared state)
  • some library functions were written for serial programs and are not thread safe, typically because they cache state in static storage between calls
  • example:
#include <stdio.h>
#include <string.h>
#include <omp.h>
 
void Tokenize(
    char* lines[],    /* in/out */
    int   line_count, /* in */
    int   thread_count /* in */) 
{
    int  my_rank, i, j;
    char *my_token;
 
    #pragma omp parallel num_threads(thread_count) \
        default(none) private(my_rank, i, j, my_token) \
        shared(lines, line_count)
    {
        my_rank = omp_get_thread_num();
 
        #pragma omp for schedule(static, 1)
        for (i = 0; i < line_count; i++) {
            printf("Thread %d > line %d = %s", my_rank, i, lines[i]);
            j = 0;
 
            /* strtok is NOT thread safe: it caches the string it is
               splitting in static storage shared by all threads */
            my_token = strtok(lines[i], " \t\n");
            while (my_token != NULL) {
                printf("Thread %d > token %d = %s\n", my_rank, j, my_token);
                my_token = strtok(NULL, " \t\n");
                j++;
            }
        } /* for i */
    } /* omp parallel */
} /* Tokenize */
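The problem in Tokenize is that strtok caches the string it is splitting in internal static storage, so threads tokenizing different lines at the same time clobber each other's state. A common fix (a sketch, assuming a POSIX C library) is the reentrant strtok_r, which keeps its position in a caller-supplied pointer:

char *saveptr;   /* per-thread tokenizer state; declare it inside the parallel
                    region (or list it as private) so each thread has its own */

my_token = strtok_r(lines[i], " \t\n", &saveptr);
while (my_token != NULL) {
    printf("Thread %d > token %d = %s\n", my_rank, j, my_token);
    my_token = strtok_r(NULL, " \t\n", &saveptr);
    j++;
}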

Performance

Note

Start of l4a.pdf slides (notes fell off after this point)

What is performance?

  • defined by 2 factors:
    • computational requirements (what needs to be done)
    • computational resources (what can be used to do it)

Scalability

  • scale up: changing conditions/study of performance space

Note

Ended l4a.pdf slide 10, midterm review next lecture