Shared-memory programming with OpenMP

Note

Starting at l7Ob.pdf, slide 30

Producers and consumers

  • Queue: a natural data structure to use in many multithreaded applications
    • you know what a queue is
  • Producer: produces requests for data
  • Consumer: “consumes” the request by finding or generating the requested data
  • Message passing: each thread could have a shared message queue; when one thread wants to “send a message”, it enqueues the message in the destination thread’s queue
    • a thread can receive a message by dequeuing the message at the front of the line
for (sent_msgs = 0; sent_msgs < send_max; sent_msgs++) {
    send_msg();
    try_receive();   /* check own queue after each send */
}

while (!done())
    try_receive();   /* keep receiving until termination is detected */
  • Sending messages: adding messages to the queue
    mesg = random();
    dest = random() % thread_count;
#   pragma omp critical
    enqueue(queue, dest, my_rank, mesg);
  • Receiving messages: dequeueing messages from the queue
    • only the queue owner can dequeue messages, so dequeuing usually needs no synchronization; the only risky case is a queue holding a single message, where an enqueue and the dequeue could touch the same node
    if (queue_size == 0) return;
    else if (queue_size == 1) {
        /* an enqueue and this dequeue could conflict on the only node */
#       pragma omp critical
        dequeue(queue, &src, &mesg);
    } else {
        /* 2+ messages: only the owner dequeues, so no synchronization needed */
        dequeue(queue, &src, &mesg);
    }

    print_message(src, mesg);
  • Termination detection (see the Done sketch below):
queue_size = enqueued - dequeued;
if (queue_size == 0 && done_sending == thread_count)
    /* each thread increments done_sending after completing its send loop */
    return TRUE;
else
    return FALSE;
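A minimal sketch of how the pieces above might fit together; the struct fields and the Done signature are my guesses, not necessarily the slide's exact code:

#include <omp.h>

/* hypothetical per-thread message queue */
struct queue_s {
    omp_lock_t lock;                   /* used later, once we switch to locks */
    int enqueued, dequeued;            /* running totals, never decremented   */
    struct queue_node_s *front, *tail;
};

/* done_sending is a shared counter; each thread adds 1 after its send loop */
int Done(struct queue_s* q_p, int done_sending, int thread_count) {
    int queue_size = q_p->enqueued - q_p->dequeued;
    return queue_size == 0 && done_sending == thread_count;
}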

Startup

  • when the program begins execution:
    • a single thread, the master thread, is running
      • gets command line arguments and allocates an array of message queues
  • this array needs to be shared among the threads
    • any thread can send to any other thread
    • any thread can enqueue a message in any of the queues
  • one or more threads may finish allocating their queues before some other threads
  • we need an explicit barrier: it blocks until all threads in the team have reached the barrier (see the sketch below)
#   pragma omp barrier
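A minimal sketch of the startup pattern just described; msg_queues, Allocate_queue, and Start_threads are placeholder names, not necessarily the slide's identifiers:

#include <omp.h>

struct queue_s* Allocate_queue(void);   /* hypothetical helper */
struct queue_s** msg_queues;            /* shared: one queue per thread;
                                           array allocated by the master */

void Start_threads(int thread_count) {
#   pragma omp parallel num_threads(thread_count)
    {
        int my_rank = omp_get_thread_num();
        msg_queues[my_rank] = Allocate_queue();

        /* no thread may start sending until every queue exists */
#       pragma omp barrier

        /* ... send/receive loops run here ... */
    }
}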

The Atomic Directive

  • can only protect critical sections that consist of a single C assignment statement from the following list:
    • x <op> = <expression>;
      • <op> must be one of the following: +, *, -, /, &, ^, |, <<, >>
    • x++;
    • ++x;
    • x--;
    • --x;
#   pragma omp atomic
  • Critical section: a statement that only does a load-modify-store
  • many processors provide a special load-modify-store instruction, so atomic can be implemented more efficiently than a general critical section
#   pragma omp atomic
    x += y++;

only x is protected above; y is not protected, so its update could still race
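As a concrete use, the shared done_sending counter from the termination test is a natural fit for atomic, since the whole critical section is a single load-modify-store (a sketch; done_sending is assumed to be a shared int):

/* executed once per thread, right after its send loop finishes */
#pragma omp atomic
done_sending++;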

Critical Sections

  • OpenMP provides the option of adding a name to a critical directive
#   pragma omp critical(name)
  • two blocks protected by critical directives with different names can be executed simultaneously (see the sketch below)
  • however:
    • names are set during compilation
    • we want a different critical name for each thread’s queue
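A small illustration (the variable names are mine): the two updates below protect unrelated shared variables, so giving the directives different names lets them run at the same time instead of serializing against each other:

#pragma omp critical(sum_update)
total_sum += my_partial_sum;      /* excludes only other sum_update sections */

#pragma omp critical(count_update)
line_count += my_line_count;      /* may run concurrently with sum_update */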

Locks

  • Lock: explicitly enforce mutual exclusion in a critical section
    • consists of a data structure & functions
  • lock structure shared among threads:
    • the master thread (or some other single thread) initializes the lock
    • one thread must destroy it once all threads are finished with it
    • before entering the critical region, a thread sets (acquires) the lock
    • after finishing with the protected code, the thread relinquishes (unsets) the lock (lifecycle sketch after the prototypes below)
  • Lock types:
    • Simple: set just once before it is unset
    • Nested: can be set multiple times by the same thread
void omp_init_lock(omp_lock_t* lock_p);     /* out */
void omp_set_lock(omp_lock_t* lock_p);      /* in/out */
void omp_unset_lock(omp_lock_t* lock_p);    /* in/out */
void omp_destroy_lock(omp_lock_t* lock_p);  /* in/out */
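A sketch of the lifecycle with these functions, assuming each queue struct carries an omp_lock_t member named lock (as in the next section):

omp_init_lock(&q_p->lock);      /* once, before any thread uses the queue    */

omp_set_lock(&q_p->lock);       /* block until this thread owns the lock     */
/* ... operate on the queue ... */
omp_unset_lock(&q_p->lock);     /* release so other threads can proceed      */

omp_destroy_lock(&q_p->lock);   /* once, after all threads are done with it  */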

Using locks in the message-passing program

// #   pragma omp critical
//     /* q_p = msg_queues[dest] */
//     enqueue(q_p, my_rank, mesg);
 
/* q_p = msg_queues[dest] */
omp_set_lock(&q_p->lock);
enqueue(q_p, my_rank, mesg);
omp_unset_lock(&q_p->lock);
// #   pragma omp critical
//     /* q_p = msg_queues[my_rank] */
//     dequeue(q_p, &src, &mesg);
 
/* q_p = msg_queues[my_rank] */
omp_set_lock(&q_p->lock);
dequeue(q_p, &src, &mesg);
omp_unset_lock(&q_p->lock);

Critical directives, atomic directives, or locks?

  • atomic directives and unnamed critical directives each enforce mutual exclusion program-wide among sections of their own kind
    • e.g., all unnamed critical sections are mutually exclusive among themselves, even when they protect unrelated data
  • use named critical directives for unrelated critical regions
  • locks should be used when mutual exclusion is needed for a data structure rather than a particular block of code

Caveats

  1. do not mix different types of mutual exclusion constructs for a single critical section (this can lead to incorrect results; see the sketch after the nesting example below)
  2. there is no guarantee of fairness in the mutual exclusion constructs
  3. can be dangerous to nest mutual exclusion constructs
/* unnamed critical regions share the same implicit lock: a thread inside
   the critical region below calls f, which tries to enter another unnamed
   critical region, so the thread blocks waiting on itself (deadlock) */
#pragma omp critical
y = f(x);
...
double f(double x) {
    #pragma omp critical
    z = g(x);   /* z is shared */
    ...
}
/* named critical regions use different locks, so f's critical(two) region
   can be entered while the caller holds critical(one): no self-deadlock */
#pragma omp critical(one)
y = f(x);
...
double f(double x) {
    #pragma omp critical(two)
    z = g(x);   /* z is global */
    ...
}
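For caveat 1, a sketch of the kind of mixing to avoid (x and y are shared): an atomic directive and a critical directive do not exclude each other, so both updates below can still race even though each looks "protected":

/* thread A */
#pragma omp atomic
x += y;

/* thread B -- NOT mutually exclusive with the atomic update above */
#pragma omp critical
x = 2 * x;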

Matrix-vector multiplication

Help

i am too lazy to take notes on all of this, look at l7Ob.pdf slides 48-52
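For reference, a minimal sketch of the usual OpenMP matrix-vector product y = A*x, with A stored row-major in a 1-D array; this is the standard pattern, not necessarily the exact code on those slides:

void Omp_mat_vect(double a[], double x[], double y[],
                  int m, int n, int thread_count) {
    int i, j;
    /* rows are independent, so threads can each compute a block of y
       without any synchronization */
#   pragma omp parallel for num_threads(thread_count) \
        default(none) private(i, j) shared(a, x, y, m, n)
    for (i = 0; i < m; i++) {
        y[i] = 0.0;
        for (j = 0; j < n; j++)
            y[i] += a[i*n + j] * x[j];
    }
}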

Thread safety

  • Thread-safe code: code is thread safe if it can be simultaneously executed by multiple threads without causing problems (e.g., races on shared state)
  • some library functions were written for serial programs and are not thread safe, typically because they cache state in static storage between calls
  • example:
#include <stdio.h>
#include <string.h>
#include <omp.h>
 
void Tokenize(
    char* lines[],    /* in/out */
    int   line_count, /* in */
    int   thread_count /* in */) 
{
    int  my_rank, i, j;
    char *my_token;
 
    #pragma omp parallel num_threads(thread_count) \
        default(none) private(my_rank, i, j, my_token) \
        shared(lines, line_count)
    {
        my_rank = omp_get_thread_num();
 
        #pragma omp for schedule(static, 1)
        for (i = 0; i < line_count; i++) {
            printf("Thread %d > line %d = %s", my_rank, i, lines[i]);
            j = 0;
 
            /* strtok is NOT thread safe: it caches the string it is
               splitting in static storage shared by all threads */
            my_token = strtok(lines[i], " \t\n");
            while (my_token != NULL) {
                printf("Thread %d > token %d = %s\n", my_rank, j, my_token);
                my_token = strtok(NULL, " \t\n");
                j++;
            }
        } /* for i */
    } /* omp parallel */
} /* Tokenize */
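The problem in Tokenize is that strtok caches the string it is splitting in internal static storage, so threads tokenizing different lines at the same time clobber each other's state. A common fix (a sketch, assuming a POSIX C library) is the reentrant strtok_r, which keeps its position in a caller-supplied pointer:

char *saveptr;   /* per-thread tokenizer state; declare it inside the parallel
                    region (or list it as private) so each thread has its own */

my_token = strtok_r(lines[i], " \t\n", &saveptr);
while (my_token != NULL) {
    printf("Thread %d > token %d = %s\n", my_rank, j, my_token);
    my_token = strtok_r(NULL, " \t\n", &saveptr);
    j++;
}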

Performance

Note

Start of l4a.pdf slides (notes fell off after this point)

What is performance?

  • defined by 2 factors:
    • computational requirements (what needs to be done)
    • computational resources (what can be used to do it)

Scalability

  • scale up: changing conditions/study of performance space

Note

Ended l4a.pdf slide 10, midterm review next lecture