We've already seen:
Let's now finish up our discussion of synchronization by talking about:
Readers/writer locks are used for exactly what their name implies: multiple readers can be using a resource with no writers, or one writer can be using a resource with no other writers or readers. This situation occurs often enough to warrant a special kind of synchronization primitive devoted exclusively to it.
Often you have a data structure that's shared by a bunch of threads. Obviously, only one thread can be writing to the data structure at a time. If more than one thread were writing, the threads could overwrite each other's data. To prevent this from happening, the writing thread obtains the rwlock (the readers/writer lock) in an exclusive manner, meaning that it and only it has access to the data structure. Note that the exclusivity of the access is controlled strictly by voluntary means. It's up to you, the system designer, to ensure that all threads that touch the data area synchronize by using the rwlocks.
The opposite occurs with readers. Since reading a data area is a non-destructive operation, any number of threads can be reading the data (even if it's the same piece of data that another thread is reading). An implicit point here is that no threads can be writing to the data area while any thread or threads are reading from it. Otherwise, the reading threads may be confused by reading a part of the data, getting preempted by a writing thread, and then, when the reading thread resumes, continuing to read data, but from a newer update of the data. A data inconsistency would then result.
Let's look at the calls that you use with rwlocks.
The first two calls are used to initialize the library's internal storage areas for the rwlocks:
int
pthread_rwlock_init (pthread_rwlock_t *lock,
const pthread_rwlockattr_t *attr);
int
pthread_rwlock_destroy (pthread_rwlock_t *lock);
The pthread_rwlock_init() function takes the lock argument (of type pthread_rwlock_t) and initializes it based on the attributes specified by attr. We're just going to use an attribute of NULL in our examples, which means: use the defaults. For detailed information about the attributes, see the BlackBerry 10 OS C Library Reference for pthread_rwlockattr_init(), pthread_rwlockattr_destroy(), pthread_rwlockattr_getpshared(), and pthread_rwlockattr_setpshared().
When done with the rwlock, you typically call pthread_rwlock_destroy() to destroy the lock, which invalidates it. You should never use a lock that is either destroyed or hasn't been initialized yet.
Next we need to fetch a lock of the appropriate type. As mentioned above, there are basically two modes of locks: a reader wants non-exclusive access, and a writer wants exclusive access. To keep the names simple, the functions are named after the user of the locks:
int pthread_rwlock_rdlock (pthread_rwlock_t *lock);
int pthread_rwlock_tryrdlock (pthread_rwlock_t *lock);
int pthread_rwlock_wrlock (pthread_rwlock_t *lock);
int pthread_rwlock_trywrlock (pthread_rwlock_t *lock);
There are four functions instead of the two that you may have expected. The expected functions are pthread_rwlock_rdlock() and pthread_rwlock_wrlock(), which are used by readers and writers, respectively. These are blocking calls—if the lock isn't available for the selected operation, the thread blocks. When the lock becomes available in the appropriate mode, the thread unblocks. Because the thread unblocked from the call, it can now assume that it's safe to access the resource protected by the lock.
Sometimes, though, a thread won't want to block, but instead just wants to see if it could get the lock. That's what the try versions are for. It's important to note that the try versions obtain the lock if they can; if they can't, they don't block, but instead return an error indication. The reason they must obtain the lock if they can is simple. Suppose that a thread wants to obtain the lock for reading, but doesn't want to wait in case the lock isn't available. The thread calls pthread_rwlock_tryrdlock() and is told that it can have the lock. If pthread_rwlock_tryrdlock() didn't actually acquire the lock at that point, bad things could happen: another thread could preempt the first one and lock the resource in an incompatible manner. When the first thread then went to actually acquire the lock (because it was told it could), it would use pthread_rwlock_rdlock(), and now it would block, because the resource is no longer available in that mode. So, if the try version didn't lock when it could, the thread that called it could still end up blocking anyway!
Finally, regardless of the way that the lock was used, we need some way of releasing the lock:
int pthread_rwlock_unlock (pthread_rwlock_t *lock);
Once a thread has done whatever operation it wanted to do on the resource, it would release the lock by calling pthread_rwlock_unlock(). If the lock is now available in a mode that corresponds to the mode requested by another waiting thread, then that thread would be made READY.
Note that we can't implement this form of synchronization with just a mutex. The mutex acts as a single-threading agent, which is okay for the writing case (where you want only one thread to be using the resource at a time) but it falls flat in the reading case, because only one reader is allowed. A semaphore can't be used either, because there's no way to distinguish the two modes of access—a semaphore allows multiple readers, but if a writer were to acquire the semaphore, as far as the semaphore is concerned this is no different from a reader acquiring it, and now you have the ugly situation of multiple readers and one or more writers!
Another common situation that occurs in multithreaded programs is the need for a thread to wait until something happens. This something can be anything! It can be the fact that data is now available from a device, or that a conveyor belt has now moved to the proper position, or that data has been committed to disk, or whatever. Another twist to throw in here is that several threads may need to wait for the given event. To accomplish this, we use either a condition variable or the much simpler sleepon lock.
To use sleepon locks, you actually need to perform several operations. Let's look at the calls first, and then look at how you use the locks.
int pthread_sleepon_lock (void);
int pthread_sleepon_unlock (void);
int pthread_sleepon_broadcast (void *addr);
int pthread_sleepon_signal (void *addr);
int pthread_sleepon_wait (void *addr);
As described above, a thread needs to wait for something to happen. The most obvious choice in the list of functions above is pthread_sleepon_wait(). But first, the thread needs to check if it really does have to wait. Let's set up an example. One thread is a producer thread that's getting data from some piece of hardware. The other thread is a consumer thread that's doing some form of processing on the data that just arrived. Let's look at the consumer first:
volatile int data_ready = 0;
consumer ()
{
while (1) {
while (!data_ready) {
// WAIT
}
// process data
}
}
The consumer is sitting in its main processing loop (the while (1)); it's going to do its job forever. The first thing it does is look at the data_ready flag. If this flag is a 0, it means there's no data ready. Therefore, the consumer should wait. Somehow, the producer wakes it up, at which point the consumer should reexamine its data_ready flag. Let's say that's exactly what happens, and the consumer looks at the flag and decides that it's a 1, meaning data is now available. The consumer goes off and processes the data, and then goes to see if there's more work to do, and so on.
We're going to run into a problem here. How does the consumer reset the data_ready flag in a synchronized manner with the producer? Obviously, we're going to need some form of exclusive access to the flag so that only one of those threads is modifying it at a given time. The method that's used in this case is built with a mutex, but it's a mutex that's buried in the implementation of the sleepon library, so we can access it only via two functions: pthread_sleepon_lock() and pthread_sleepon_unlock(). Let's modify our consumer:
consumer ()
{
while (1) {
pthread_sleepon_lock ();
while (!data_ready) {
// WAIT
}
// process data
data_ready = 0;
pthread_sleepon_unlock ();
}
}
Now we've added the lock and unlock around the operation of the consumer. This means that the consumer can now reliably test the data_ready flag, with no race conditions, and also reliably set the flag.
Okay, great. Now what about the WAIT call? As we suggested earlier, it's effectively the pthread_sleepon_wait() call. Here's the second while loop:
while (!data_ready) {
pthread_sleepon_wait (&data_ready);
}
The pthread_sleepon_wait() actually does three distinct steps:
1. Unlock the sleepon library's mutex.
2. Wait (block) until another thread calls pthread_sleepon_signal() or pthread_sleepon_broadcast() with the same address.
3. Re-lock the sleepon library's mutex.
The reason it has to unlock and lock the sleepon library's mutex is simple — since the whole idea of the mutex is to ensure mutual exclusion to the data_ready variable, this means that we want to lock out the producer from touching the data_ready variable while we're testing it. But, if we don't do the unlock part of the operation, the producer would never be able to set it to tell us that data is indeed available! The re-lock operation is done purely as a convenience; this way the user of the pthread_sleepon_wait() doesn't have to worry about the state of the lock when it wakes up.
Let's switch over to the producer side and see how it uses the sleepon library. Here's the full implementation:
producer ()
{
while (1) {
// wait for interrupt from hardware here...
pthread_sleepon_lock ();
data_ready = 1;
pthread_sleepon_signal (&data_ready);
pthread_sleepon_unlock ();
}
}
As you can see, the producer locks the mutex as well so that it can have exclusive access to the data_ready variable in order to set it.
Let's examine in detail what happens. We've identified the consumer and producer states as:
| State | Meaning |
|---|---|
| CONDVAR | Waiting for the underlying condition variable associated with the sleepon |
| MUTEX | Waiting for a mutex |
| READY | Capable of using, or already using, the CPU |
| INTERRUPT | Waiting for an interrupt from the hardware |
| Action | Mutex owner | Consumer state | Producer state |
|---|---|---|---|
| Consumer locks mutex | Consumer | READY | INTERRUPT |
| Consumer examines data_ready | Consumer | READY | INTERRUPT |
| Consumer calls pthread_sleepon_wait() | Consumer | READY | INTERRUPT |
| pthread_sleepon_wait() unlocks mutex | Free | READY | INTERRUPT |
| pthread_sleepon_wait() blocks | Free | CONDVAR | INTERRUPT |
| Time passes | Free | CONDVAR | INTERRUPT |
| Hardware generates data | Free | CONDVAR | READY |
| Producer locks mutex | Producer | CONDVAR | READY |
| Producer sets data_ready | Producer | CONDVAR | READY |
| Producer calls pthread_sleepon_signal() | Producer | CONDVAR | READY |
| Consumer wakes up, pthread_sleepon_wait() tries to lock mutex | Producer | MUTEX | READY |
| Producer releases mutex | Free | MUTEX | READY |
| Consumer gets mutex | Consumer | READY | READY |
| Consumer processes data | Consumer | READY | READY |
| Producer waits for more data | Consumer | READY | INTERRUPT |
| Time passes (consumer processing) | Consumer | READY | INTERRUPT |
| Consumer finishes processing, unlocks mutex | Free | READY | INTERRUPT |
| Consumer loops back to top, locks mutex | Consumer | READY | INTERRUPT |
The last entry in the table is a repeat of the first entry—we've gone around one complete cycle.
What's the purpose of the data_ready variable? It actually serves two purposes:
We'll defer the discussion of the difference between pthread_sleepon_signal() and pthread_sleepon_broadcast() to the discussion of condition variables, next.
Condition variables (or condvars) are remarkably similar to the sleepon locks we just saw above. In fact, sleepon locks are built on top of condvars, which is why we had a state of CONDVAR in the explanation table for the sleepon example. It bears repeating that the pthread_cond_wait() function releases the mutex, waits, and then reacquires the mutex, just like the pthread_sleepon_wait() function did.
Let's skip the preliminaries and redo the example of the producer and consumer from the sleepon topic, using condvars instead. Then we'll discuss the calls.
/*
* cp1.c
*/
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
int data_ready = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condvar = PTHREAD_COND_INITIALIZER;
void *
consumer (void *notused)
{
printf ("In consumer thread...\n");
while (1) {
pthread_mutex_lock (&mutex);
while (!data_ready) {
pthread_cond_wait (&condvar, &mutex);
}
// process data
printf ("consumer: got data from producer\n");
data_ready = 0;
pthread_cond_signal (&condvar);
pthread_mutex_unlock (&mutex);
}
}
void *
producer (void *notused)
{
printf ("In producer thread...\n");
while (1) {
// get data from hardware
// we'll simulate this with a sleep (1)
sleep (1);
printf ("producer: got data from h/w\n");
pthread_mutex_lock (&mutex);
while (data_ready) {
pthread_cond_wait (&condvar, &mutex);
}
data_ready = 1;
pthread_cond_signal (&condvar);
pthread_mutex_unlock (&mutex);
}
}
int main ()
{
printf ("Starting consumer/producer example...\n");
// create the producer and consumer threads
pthread_create (NULL, NULL, producer, NULL);
pthread_create (NULL, NULL, consumer, NULL);
// let the threads run for a bit
sleep (20);
}
Pretty much identical to the sleepon example we just saw, with a few variations (we also added some printf() functions and a main() so that the program would run!). Right away, the first thing that we see is a new data type: pthread_cond_t. This is the declaration of the condition variable; we've called ours condvar.
The next thing we notice is that the structure of the consumer is identical to that of the consumer in the previous sleepon example. We've replaced pthread_sleepon_lock() and pthread_sleepon_unlock() with the standard mutex versions (pthread_mutex_lock() and pthread_mutex_unlock()). The pthread_sleepon_wait() was replaced with pthread_cond_wait(). The main difference is that the sleepon library has a mutex buried deep within it, whereas when we use condvars, we explicitly pass the mutex. We get a lot more flexibility this way.
Finally, we notice that we've got pthread_cond_signal() instead of pthread_sleepon_signal() (again with the mutex passed explicitly).
In the sleepon section, we promised to talk about the difference between the pthread_sleepon_signal() and pthread_sleepon_broadcast() functions. Now we'll talk about the difference between the two analogous condvar functions, pthread_cond_signal() and pthread_cond_broadcast().
The short story is this: the signal version wakes up only one thread. So, if there are multiple threads blocked in the wait function, and a thread did the signal, then only one of the threads wakes up. Which one? The highest priority one. If there are two or more at the same priority, the ordering of wakeup is indeterminate. With the broadcast version, all blocked threads wake up.
It may seem wasteful to wake up all threads. On the other hand, it may seem sloppy to wake up only one (effectively random) thread.
Therefore, we should look at where it makes sense to use one over the other. Obviously, if you have only one thread waiting, as we did in either version of the consumer program, a signal does just fine—one thread wakes up, and it's the only thread that's currently waiting.
In a multithreaded situation, we need to ask: Why are these threads waiting? There are usually two possible answers:
Or:
In the first case, we can imagine that all the threads have code that might look like the following:
/*
* cv1.c
*/
#include <stdio.h>
#include <pthread.h>
pthread_mutex_t mutex_data = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cv_data = PTHREAD_COND_INITIALIZER;
int data;
void *
thread1 (void *notused)
{
for (;;) {
pthread_mutex_lock (&mutex_data);
while (data == 0) {
pthread_cond_wait (&cv_data, &mutex_data);
}
// do something
pthread_mutex_unlock (&mutex_data);
}
}
// thread2, thread3, and so on have the identical code.
In this case, it really doesn't matter which thread gets the data, provided that one of them gets it and does something with it.
However, if you have something like this, things are a little different:
/*
* cv2.c
*/
#include <stdio.h>
#include <pthread.h>
pthread_mutex_t mutex_xy = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cv_xy = PTHREAD_COND_INITIALIZER;
int x, y;
int isprime (int);
void *
thread1 (void *notused)
{
for (;;) {
pthread_mutex_lock (&mutex_xy);
while ((x > 7) && (y != 15)) {
pthread_cond_wait (&cv_xy, &mutex_xy);
}
// do something
pthread_mutex_unlock (&mutex_xy);
}
}
void *
thread2 (void *notused)
{
for (;;) {
pthread_mutex_lock (&mutex_xy);
while (!isprime (x)) {
pthread_cond_wait (&cv_xy, &mutex_xy);
}
// do something
pthread_mutex_unlock (&mutex_xy);
}
}
void *
thread3 (void *notused)
{
for (;;) {
pthread_mutex_lock (&mutex_xy);
while (x != y) {
pthread_cond_wait (&cv_xy, &mutex_xy);
}
// do something
pthread_mutex_unlock (&mutex_xy);
}
}
In these cases, waking up one thread isn't going to cut it! We must wake up all three threads and have each of them check to see if its predicate has been satisfied or not.
This nicely reflects the second case in our question above (Why are these threads waiting?). Since the threads are all waiting on different conditions (thread1() is waiting for x to be less than or equal to 7 or y to be 15, thread2() is waiting for x to be a prime number, and thread3() is waiting for x to be equal to y), we have no choice but to wake them all.
Sleepons have one principal advantage over condvars. Suppose that you want to synchronize many objects. With condvars, you would typically associate one condvar per object. Therefore, if you had M objects, you would most likely have M condvars. With sleepons, the underlying condvars (on top of which sleepons are implemented) are allocated dynamically as threads wait for a particular object. Therefore, using sleepons with M objects and N threads blocked, you would have (at most) N condvars (instead of M).
However, condvars are more flexible than sleepons, because:
The first point might just be viewed as being argumentative. The second point, however, is significant. When the mutex is buried in the library, this means that there can be only one per process—regardless of the number of threads in that process, or the number of different sets of data variables. This can be a very limiting factor, especially when you consider that you must use the one and only mutex to access any and all data variables that any thread in the process needs to touch!
A much better design is to use multiple mutexes, one for each data set, and explicitly combine them with condition variables as required. The power of this approach comes with a danger: there is absolutely no compile-time or run-time checking to make sure that you:
The easiest way around these problems is to have a good design and design review, and also to borrow techniques from object-oriented programming (like having the mutex contained in a data structure, having routines to access the data structure, and so on). Of course, how much of one or both you apply depends not only on your personal style, but also on performance requirements.
The key points to remember when using condvars are:
Here's a picture:
One interesting note. Since there is no checking, you can do things like associate one set of variables with mutex ABC, and another set of variables with mutex DEF, while associating both sets of variables with condvar ABCDEF:
This is actually quite useful. Since the mutex is always to be used for access and testing, this implies that you have to choose the correct mutex whenever you want to look at a particular variable. Fair enough — if you examine variable C, you need to lock mutex MutexABC. What if you changed variable E? Well, before you change it, you have to acquire the mutex MutexDEF. Then you change it, and hit condvar CondvarABCDEF to tell others about the change. Shortly thereafter, you release the mutex.
Now, consider what happens. Suddenly, you have a bunch of threads that had been waiting on CondvarABCDEF that now wake up (from their pthread_cond_wait()). The waiting function immediately attempts to reacquire the mutex. The critical point here is that there are two mutexes to acquire. This means that on an SMP system, two concurrent streams of threads can run, each examining what it considers to be independent variables, using independent mutexes.
BlackBerry 10 OS lets you do something else that's elegant. POSIX says that a mutex must operate between threads in the same process, and lets a conforming implementation extend that. BlackBerry 10 OS extends this by allowing a mutex to operate between threads in different processes. To understand why this works, recall that there really are two parts to what's viewed as the operating system—the kernel, which deals with scheduling, and the process manager, which worries about memory protection and processes (among other things). A mutex is really just a synchronization object used between threads. Since the kernel worries only about threads, it really doesn't care that the threads are operating in different processes—this is an issue for the process manager.
So, if you've set up a shared memory area between two processes, and you've initialized a mutex in that shared memory, there's nothing stopping you from synchronizing multiple threads in those two (or more!) processes via the mutex. The same pthread_mutex_lock() and pthread_mutex_unlock() functions still work.
Another thing that BlackBerry 10 OS has added is the concept of thread pools. You'll often notice in your programs that you want to be able to run a certain number of threads, but you also want to be able to control the behavior of those threads within certain limits. For example, in a server you may decide that initially just one thread should be blocked, waiting for a message from a client. When that thread gets a message and is off servicing a request, you may decide that it would be a good idea to create another thread, so that it could be blocked waiting in case another request arrived. This second thread would then be available to handle that request, and so on. After a while, when the requests had been serviced, you would now have a large number of threads sitting around, waiting for further requests. To conserve resources, you may decide to kill off some of those extra threads.
This is a common operation, and BlackBerry 10 OS provides a library to help with this.
It's important for the discussions that follow to realize there are really two distinct operations that threads (that are used in thread pools) perform:
The blocking operation doesn't generally consume CPU. In a typical server, this is where the thread is waiting for a message to arrive. Contrast that with the processing operation, where the thread may or may not be consuming CPU (depending on how the process is structured). In the thread pool functions that we'll look at later, you'll see that we have the ability to control the number of threads in the blocking operation as well as the number of threads that are in the processing operations.
BlackBerry 10 OS provides the following functions to deal with thread pools:
#include <sys/dispatch.h>
thread_pool_t *
thread_pool_create (thread_pool_attr_t *attr,
unsigned flags);
int
thread_pool_destroy (thread_pool_t *pool);
int
thread_pool_start (void *pool);
int
thread_pool_limits (thread_pool_t *pool,
int lowater,
int hiwater,
int maximum,
int increment,
unsigned flags);
int
thread_pool_control (thread_pool_t *pool,
thread_pool_attr_t *attr,
uint16_t lower,
uint16_t upper,
unsigned flags);
As you can see from the functions provided, you first create a thread pool definition using thread_pool_create(), and then start the thread pool via thread_pool_start(). When you're done with the thread pool, you can use thread_pool_destroy() to clean up after yourself. Note that you might never call thread_pool_destroy(), as in the case where the program is a server that runs forever. The thread_pool_limits() function is used to specify thread pool behavior and adjust attributes of the thread pool, and the thread_pool_control() function is a convenience wrapper for the thread_pool_limits() function.
So, the first function to look at is thread_pool_create(). It takes two parameters, attr and flags. The attr is an attributes structure that defines the operating characteristics of the thread pool (from <sys/dispatch.h>):
typedef struct _thread_pool_attr {
// thread pool functions and handle
THREAD_POOL_HANDLE_T *handle;
THREAD_POOL_PARAM_T
*(*block_func)(THREAD_POOL_PARAM_T *ctp);
void
(*unblock_func)(THREAD_POOL_PARAM_T *ctp);
int
(*handler_func)(THREAD_POOL_PARAM_T *ctp);
THREAD_POOL_PARAM_T
*(*context_alloc)(THREAD_POOL_HANDLE_T *handle);
void
(*context_free)(THREAD_POOL_PARAM_T *ctp);
// thread pool parameters
pthread_attr_t *attr;
unsigned short lo_water;
unsigned short increment;
unsigned short hi_water;
unsigned short maximum;
} thread_pool_attr_t;
We broke the thread_pool_attr_t type into two sections, one that contains the functions and handle for the threads in the thread pool, and another that contains the operating parameters for the thread pool.
Let's first look at the thread pool parameters to see how you control the number and attributes of threads that operate in this thread pool. The following diagram illustrates the relationship of the lo_water, hi_water, and maximum parameters:
(Note that CA is the context_alloc() function, CF is the context_free() function, blocking operation is the block_func() function, and processing operation is the handler_func().)
One other key parameter to controlling the threads is the flags parameter passed to the thread_pool_create() function. It can have one of the following values:
The above descriptions may seem a little dry. Let's look at an example.
You can find the complete version of tp1.c in Sample programs. Here, we just focus on the lo_water, hi_water, increment, and the maximum members of the thread pool control structure:
/*
* part of tp1.c
*/
#include <sys/dispatch.h>
int
main ()
{
thread_pool_attr_t tp_attr;
void *tpp;
...
tp_attr.lo_water = 3;
tp_attr.increment = 2;
tp_attr.hi_water = 7;
tp_attr.maximum = 10;
...
tpp = thread_pool_create (&tp_attr, POOL_FLAG_USE_SELF);
if (tpp == NULL) {
fprintf (stderr,
"%s: can't thread_pool_create, errno %s\n",
progname, strerror (errno));
exit (EXIT_FAILURE);
}
thread_pool_start (tpp);
...
After setting the members, we call thread_pool_create() to create a thread pool. This returns a pointer to a thread pool control structure (tpp), which we check against NULL (which would indicate an error). Finally, we call thread_pool_start() with the tpp thread pool control structure.
We specify POOL_FLAG_USE_SELF, which means that the thread that called thread_pool_start() is considered an available thread for the thread pool. So, at this point, there is only that one thread in the thread pool library. Since we have a lo_water value of 3, the library immediately creates increment number of threads (2 in this case). At this point, 3 threads are in the library, and all 3 of them are in the blocking operation. The lo_water condition is satisfied, because there are at least that number of threads in the blocking operation; the hi_water condition is satisfied, because there are fewer than that number of threads in the blocking operation; and finally, the maximum condition is satisfied as well, because we don't have more than that number of threads in the thread pool library.
Now, one of the threads in the blocking operation unblocks (for example, in a server application, a message was received). This means that now one of the three threads is no longer in the blocking operation (instead, that thread is now in the processing operation). Since the count of blocking threads is less than the lo_water, it trips the lo_water trigger and causes the library to create increment (2) threads. So now there are 5 threads total (4 in the blocking operation, and 1 in the processing operation).
More threads unblock. Let's assume that none of the threads in the processing operation completes any requests yet. Here's a table illustrating this, starting at the initial state (we use Proc Op for the processing operation, and Blk Op for the blocking operation, as in the previous diagram, Thread flow when using thread pools):
| Event | Proc Op | Blk Op | Total |
|---|---|---|---|
| Initial | 0 | 1 | 1 |
| lo_water trip | 0 | 3 | 3 |
| Unblock | 1 | 2 | 3 |
| lo_water trip | 1 | 4 | 5 |
| Unblock | 2 | 3 | 5 |
| Unblock | 3 | 2 | 5 |
| lo_water trip | 3 | 4 | 7 |
| Unblock | 4 | 3 | 7 |
| Unblock | 5 | 2 | 7 |
| lo_water trip | 5 | 4 | 9 |
| Unblock | 6 | 3 | 9 |
| Unblock | 7 | 2 | 9 |
| lo_water trip | 7 | 3 | 10 |
| Unblock | 8 | 2 | 10 |
| Unblock | 9 | 1 | 10 |
| Unblock | 10 | 0 | 10 |
As you can see, the library always checks the lo_water variable and creates increment threads at a time until it hits the limit of the maximum variable (as it did when the Total column reached 10—no more threads were created, even though the count had dropped below lo_water).
This means that at this point, there are no more threads waiting in the blocking operation. Let's assume that the threads are now finishing their requests (from the processing operation); watch what happens with the hi_water trigger:
| Event | Proc Op | Blk Op | Total |
|---|---|---|---|
| Completion | 9 | 1 | 10 |
| Completion | 8 | 2 | 10 |
| Completion | 7 | 3 | 10 |
| Completion | 6 | 4 | 10 |
| Completion | 5 | 5 | 10 |
| Completion | 4 | 6 | 10 |
| Completion | 3 | 7 | 10 |
| Completion | 2 | 8 | 10 |
| hi_water trip | 2 | 7 | 9 |
| Completion | 1 | 8 | 9 |
| hi_water trip | 1 | 7 | 8 |
| Completion | 0 | 8 | 8 |
| hi_water trip | 0 | 7 | 7 |
Notice how nothing really happened during the completion of processing for the threads until we tripped over the hi_water trigger. The way this is implemented is that as soon as a thread finishes, it looks at the number of receive-blocked threads and kills itself if there are too many (that is, more than hi_water) waiting at that point. The nice thing about the lo_water and hi_water limits in the structures is that you can effectively have an operating range where a sufficient number of threads are available, and you're not unnecessarily creating and destroying threads. In our case, after the operations performed by the above tables, we now have a system that can handle up to 4 requests simultaneously without creating more threads (7 - 4 = 3, which is the lo_water trip).
Now that we have a good feel for how the number of threads is controlled, let's turn our attention to the other members of the thread pool attribute structure:
// thread pool functions and handle
THREAD_POOL_HANDLE_T *handle;
THREAD_POOL_PARAM_T
*(*block_func)(THREAD_POOL_PARAM_T *ctp);
void
(*unblock_func)(THREAD_POOL_PARAM_T *ctp);
int
(*handler_func)(THREAD_POOL_PARAM_T *ctp);
THREAD_POOL_PARAM_T
*(*context_alloc)(THREAD_POOL_HANDLE_T *handle);
void
(*context_free)(THREAD_POOL_PARAM_T *ctp);
Recall from the diagram Thread flow when using thread pools, that the context_alloc() function gets called for every new thread being created. (Similarly, the context_free() function gets called for every thread being destroyed.)
The handle member of the structure (above) is passed to the context_alloc() function as its sole parameter. The context_alloc() function is responsible for performing any per-thread setup required and for returning a context pointer (called ctp in the parameter lists). Note that the contents of the context pointer are entirely up to you—the library doesn't care what you put into the context pointer.
Now that the context has been created by context_alloc(), the block_func() function is called to perform the blocking operation. Note that the block_func() function gets passed the results of the context_alloc() function. Once the block_func() function unblocks, it returns a context pointer, which gets passed by the library to the handler_func(). The handler_func() is responsible for performing the work—for example, in a typical server, this is where the message from the client is processed. The handler_func() must return a zero for now—non-zero values are reserved for future expansion by QSS. The unblock_func() is also reserved at this time; just leave it as NULL. The following pseudocode sample is based on the same flow as shown in Thread flow when using thread pools:
FOREVER DO
IF (#threads < lo_water) THEN
IF (#threads_total < maximum) THEN
create new thread
context = (*context_alloc) (handle);
ENDIF
ENDIF
retval = (*block_func) (context);
(*handler_func) (retval);
IF (#threads > hi_water) THEN
(*context_free) (context)
kill thread
ENDIF
DONE
Note that the above is greatly simplified; its only purpose is to show you the data flow of the ctp and handle parameters and to give some sense of the algorithms used to control the number of threads.