We didn't talk about the various parameters in the previous examples so that we could focus just on the message passing. Now let's take a look.
In the server example, we saw that the server created just one channel. It could certainly have created more, but generally, servers don't do that. (The most obvious example of a server with two channels is the Transparent Distributed Processing (TDP, also known as Qnet) native network manager—definitely an odd piece of software!)
As it turns out, there really isn't much need to create multiple channels in the real world. The main purpose of a channel is to give the server a well-defined place to listen for messages, and to give the clients a well-defined place to send their messages (via a connection). About the only time that you have multiple channels in a server is if the server wants to provide either different services, or different classes of services, depending on which channel the message arrived on. The second channel can be used, for example, as a place to drop wake up pulses—this ensures that they're treated as a different class of service than messages arriving on the first channel.
Previously, we said that you can have a pool of threads running in a server, ready to accept messages from clients, and that it didn't really matter which thread got the request. This is another aspect of the channel abstraction. Under previous versions of the QNX family of operating systems (notably QNX 4), a client would target messages at a server identified by a node ID and process ID. Since QNX 4 processes are single-threaded, there could be no confusion about whom the message was being sent to. However, once you introduce threads into the picture, a design decision had to be made about how to address the threads (really, the service providers). Since threads are ephemeral, it really didn't make sense to have the client connect to a particular node ID, process ID, and thread ID. Also, what if that particular thread was busy? We'd have to provide some way for a client to select a non-busy thread within a defined pool of service-providing threads.
Well, that's exactly what a channel is. It's the address of a pool of service-providing threads. The implication here is that a bunch of threads can issue a MsgReceive() function call on a particular channel, and block, with only one thread getting a message at a time.
Often a server needs to know who sent it a message. There are a number of reasons for this:
It would be cumbersome (and a security hole) to have the client provide this information with each and every message sent. Therefore, there's a structure filled in by the kernel whenever the MsgReceive() function unblocks because it got a message.
struct _msg_info
{
    int      nd;          // client's node descriptor
    int      srcnd;       // server's node descriptor, as seen by the client
    pid_t    pid;         // client's process ID
    int32_t  chid;        // channel that the message was received on
    int32_t  scoid;       // server connection ID
    int32_t  coid;        // client's connection ID
    int32_t  msglen;      // number of bytes received
    int32_t  tid;         // client's thread ID
    int16_t  priority;    // priority of the message
    int16_t  flags;       // message flags
    int32_t  srcmsglen;   // length of the source message, as sent
    int32_t  dstmsglen;   // length of the client's reply buffer
};
You pass it to the MsgReceive() function as the last argument. If you pass NULL, the structure simply isn't filled in. (The information can be retrieved later via the MsgInfo() call, so it's not gone forever!)
Let's look at the fields:
In the code sample above, notice how we save the receive ID returned by MsgReceive() and then hand it to MsgReply():

rcvid = MsgReceive (...);
...
MsgReply (rcvid, ...);
This is a key snippet of code, because it illustrates the binding between receiving a message from a client and being able to (sometime later) reply to that particular client. The receive ID is an integer that acts as a magic cookie that you need to hold onto if you want to interact with the client later. What if you lose it? It's gone. The client won't unblock from its MsgSend() until you (the server) die, or unless the client has a timeout on the message-passing call (and even then it's tricky; see the TimerTimeout() function in the BlackBerry 10 OS C Library Reference, and the discussion about its use in Clocks, timers, and getting a kick every so often).
Also, note that except in one special case (the MsgDeliverEvent() function which we'll look at later), once you've done the MsgReply(), that particular receive ID ceases to have meaning.
This brings us to the MsgReply() function.
MsgReply() accepts a receive ID, a status, a message pointer, and a message size. We've just finished discussing the receive ID; it identifies who the reply message should be sent to. The status variable indicates the return status that should be passed to the client's MsgSend() function. Finally, the message pointer and size indicate the location and size of the optional reply message that should be sent.
The MsgReply() function may appear to be very simple (and it is), but its applications require some examination.
There's absolutely no requirement that you reply to a client before accepting new messages from other clients via MsgReceive()! This can be used in a number of different scenarios. In a typical device driver, a client may make a request that won't be serviced for a long time. For example, the client may ask an Analog-to-Digital Converter (ADC) device driver to go out and collect 45 seconds worth of samples. In the meantime, the ADC driver shouldn't just close up shop for 45 seconds! Other clients might want to have requests serviced (for example, there might be multiple analog channels, or there might be status information that should be available immediately, and so on).
Architecturally, the ADC driver queues the receive ID that it got from the MsgReceive(), starts up the 45-second accumulation process, and goes off to handle other requests. When the 45 seconds are up and the samples have been accumulated, the ADC driver can find the receive ID associated with the request and then reply to the client.
You also want to hold off replying to a client in the case of the reply-driven server/subserver model (where some of the clients are the subservers). Since the subservers are looking for work, you would make a note of their receive IDs and store those away. When actual work arrived, then and only then would you reply to the subserver, thus indicating that it should do some work.
When you finally reply to the client, there's no requirement that you transfer any data. This is used in two scenarios. You may choose to reply with no data if the sole purpose of the reply is to unblock the client. Let's say the client just wants to be blocked until some particular event occurs, but it doesn't need to know which event. In this case, no data is required by the MsgReply() function; the receive ID is sufficient:
MsgReply (rcvid, EOK, NULL, 0);
This unblocks the client (but doesn't return any data) and returns the EOK success indication.
As a slight modification of that, you may want to return an error status to the client. In this case, you can't do that with MsgReply(), but instead must use MsgError():
MsgError (rcvid, EROFS);
In the above example, the server detects that the client is attempting to write to a read-only filesystem, and, instead of returning any actual data, returns an errno of EROFS back to the client.
Alternatively, you may have already transferred the data (via MsgWrite()), and there's no additional data to transfer.
Why the two calls? They're subtly different. While both MsgError() and MsgReply() unblock the client, MsgError() does not transfer any additional data, causes the client's MsgSend() function to return -1, and causes the client to have errno set to whatever was passed as the second argument to MsgError().
On the other hand, MsgReply() could transfer data (as indicated by the third and fourth arguments), and cause the client's MsgSend() function to return whatever was passed as the second argument to MsgReply(). MsgReply() has no effect on the client's errno.
Generally, if you're returning only a pass/fail indication (and no data), you use MsgError(), whereas if you're returning data, you use MsgReply(). Traditionally, when you do return data, the second argument to MsgReply() is a positive integer indicating the number of bytes being returned.
In the ConnectAttach() function, we required a Node Descriptor (ND), a process ID (PID), and a channel ID (CHID) to be able to attach to a server. So far we haven't talked about how the client finds this ND/PID/CHID information.
If one process creates the other, then it's easy—the process creation call returns with the process ID of the newly created process. Either the creating process can pass its own PID and CHID on the command line to the newly created process or the newly created process can issue the getppid() function call to get the PID of its parent and assume a well-known CHID.
What if we have two perfect strangers? This would be the case if, for example, a third party created a server and an application that you wrote wanted to talk to that server. The real issue is, how does a server advertise its location?
There are many ways of doing this. Here are four of them, in increasing order of programming elegance:

1. Open a well-known filename and store the ND/PID/CHID there.
2. Use global variables to advertise the ND/PID/CHID information.
3. Use the name_attach() and name_detach() functions.
4. Have the server become a resource manager.
The first approach is very simple, but can suffer from pathname pollution, where the /etc directory has all kinds of *.pid files in it. Since files are persistent (meaning they survive after the creating process dies and the machine reboots), there's no obvious method of cleaning up these files, except perhaps to have a grim reaper task that runs around seeing if these things are still valid.
There's another related problem. Since the process that created the file can die without removing the file, there's no way of knowing whether or not the process is still alive until you try to send a message to it. Worse yet, the ND/PID/CHID specified in the file may be so stale that it would have been reused by another program! The message that you send to that program is at best rejected, and at worst may cause damage. So that approach is out.
The second approach, where we use global variables to advertise the ND/PID/CHID values, is not a general solution, as it relies on the client's being able to access the global variables. And since this requires shared memory, it certainly won't work across a network! This generally gets used in either tiny test case programs or in very special cases, but always in the context of a multithreaded program. Effectively, all that happens is that one thread in the program is the client, and another thread is the server. The server thread creates the channel and then places the channel ID into a global variable (the node ID and process ID are the same for all threads in the process, so they don't need to be advertised). The client thread then picks up the global channel ID and performs the ConnectAttach() to it.
The third approach, where we use the name_attach() and name_detach() functions, works well for simple client/server situations.
The last approach, where the server becomes a resource manager, is definitely the cleanest and is the recommended general-purpose solution.
What if a low-priority process and a high-priority process send a message to a server at the same time?
If two processes send a message simultaneously, the entire message from the higher-priority process is delivered to the server first.
If both processes are at the same priority, then the messages are delivered in time order (since there's no such thing as absolutely simultaneous on a single-processor machine — even on an SMP box there is some ordering as the CPUs arbitrate kernel access among themselves).
We'll come back to some of the other subtleties introduced by this question when we look at priority inversions later.
So far you've seen the basic message-passing primitives. These are all that you need. However, there are a few extra functions that make life much easier. Let's consider an example using a client and server where we might need other functions.
The client issues a MsgSend() to transfer some data to the server. After the client issues the MsgSend() it blocks; it's now waiting for the server to reply.
An interesting thing happens on the server side. The server has called MsgReceive() to receive the message from the client. Depending on the design that you choose for your messages, the server may or may not know how big the client's message is. Why on earth would the server not know how big the message is? Consider the filesystem example that we've been using. Suppose the client does:
write (fd, buf, 16);
This works as expected if the server does a MsgReceive() and specifies a buffer size of, say, 1024 bytes. Since our client sent only a tiny message (28 bytes: the 16 bytes of data plus a 12-byte header added by the C library), we have no problems.
However, what if the client sends something bigger than 1024 bytes, say 1 megabyte?
write (fd, buf, 1000000);
How is the server going to gracefully handle this? We could, arbitrarily, say that the client isn't allowed to write more than n bytes. Then, in the client-side C library code for write(), we'd have to honor that limit and split the write request into several requests of n bytes each. This is awkward.
The other problem with this example would be, how big should n be?
You can see that this approach has major disadvantages: every function that transfers large amounts of data has to be modified to packetize its requests, the server has to commit receive buffers big enough for the largest allowed message, and there's no principled way to choose n in the first place.
Luckily, this problem has a fairly simple workaround that also gives us some advantages.
Two functions, MsgRead() and MsgWrite() , are especially useful here. The important fact to keep in mind is that the client is blocked. This means that the client isn't going to go and change data structures while the server is trying to examine them.
The MsgRead() function looks like this:
#include <sys/neutrino.h>
int MsgRead (int rcvid,
             void *msg,
             int nbytes,
             int offset);
MsgRead() lets your server read data from the blocked client's address space, starting offset bytes from the beginning of the client-specified send buffer, into the buffer specified by msg for nbytes. The server doesn't block, and the client doesn't unblock. MsgRead() returns the number of bytes it actually read, or -1 if there was an error.
So let's think about how we'd use this in our write() example. The C Library write() function constructs a message with a header that it sends to the filesystem server, fs-qnx4. The server receives a small portion of the message via MsgReceive(), looks at it, and decides where it's going to put the rest of the message. The fs-qnx4 server may decide that the best place to put the data is into some cache buffers it's already allocated.
Let's track an example:
So, the client has decided to send 4 KB to the filesystem. (Notice how the C Library stuck a tiny header in front of the data so that the filesystem could tell just what kind of request it actually was — we'll come back to this when we look at multi-part messages, and in even more detail when we look at resource managers.) The filesystem reads just enough data (the header) to figure out what kind of a message it is:
// part of the headers, fictionalized for example purposes
struct _io_write {
uint16_t type;
uint16_t combine_len;
int32_t nbytes;
uint32_t xtype;
};
typedef union {
uint16_t type;
struct _io_read io_read;
struct _io_write io_write;
...
} header_t;
header_t header; // declare the header
rcvid = MsgReceive (chid, &header, sizeof (header), NULL);
switch (header.type) {
...
case _IO_WRITE:
number_of_bytes = header.io_write.nbytes;
...
At this point, fs-qnx4 knows that 4 KB are sitting in the client's address space (because the message told it in the nbytes member of the structure) and that it should be transferred to a cache buffer. The fs-qnx4 server can issue:
MsgRead (rcvid, cache_buffer [index].data,
cache_buffer [index].size, sizeof (header.io_write));
Notice that the message transfer has specified an offset of sizeof (header.io_write) to skip the write header that was added by the client's C library. We're assuming here that cache_buffer [index].size is actually 4096 (or more) bytes.
Similarly, for writing data to the client's address space, we have:
#include <sys/neutrino.h>
int MsgWrite (int rcvid,
              const void *msg,
              int nbytes,
              int offset);
MsgWrite() lets your server write data to the client's address space, starting offset bytes from the beginning of the client-specified receive buffer. This function is most useful in cases where the server has limited space but the client wants to get a lot of information from the server.
For example, with a data acquisition driver, the client may specify a 4-megabyte data area and tell the driver to grab 4 megabytes of data. The driver really shouldn't need to have a big area like this lying around just in case someone asks for a huge data transfer.
The driver might have a 128 KB area for DMA data transfers, and then message-pass it piecemeal into the client's address space using MsgWrite() (incrementing the offset by 128 KB each time, of course). Then, when the last piece of data has been written, the driver MsgReply()'s to the client.
Note that MsgWrite() lets you write the data components at various places, and then either just wake up the client using MsgReply():
MsgReply (rcvid, EOK, NULL, 0);
or wake up the client after writing a header at the start of the client's buffer:
MsgReply (rcvid, EOK, &header, sizeof (header));
This is a fairly elegant trick for writing unknown quantities of data, where you know how much data you wrote only when you're done writing it. If you're using this method of writing the header after the data's been transferred, you must remember to leave room for the header at the beginning of the client's data area!