Receive IDs, channels, and other parameters

We didn't talk about the various parameters in the previous examples so that we could focus just on the message passing. Now let's take a look.

More about channels

In the server example, we saw that the server created just one channel. It could certainly have created more, but generally, servers don't do that. (The most obvious example of a server with two channels is the Transparent Distributed Processing (TDP, also known as Qnet) native network manager—definitely an odd piece of software!)

As it turns out, there really isn't much need to create multiple channels in the real world. The main purpose of a channel is to give the server a well-defined place to listen for messages, and to give the clients a well-defined place to send their messages (via a connection). About the only time that you have multiple channels in a server is if the server wants to provide either different services, or different classes of services, depending on which channel the message arrived on. The second channel can be used, for example, as a place to drop wake up pulses—this ensures that they're treated as a different class of service than messages arriving on the first channel.

Previously, we said that you can have a pool of threads running in a server, ready to accept messages from clients, and that it didn't really matter which thread got the request. This is another aspect of the channel abstraction. Under previous versions of the QNX family of operating systems (notably QNX 4), a client would target messages at a server identified by a node ID and process ID. Since QNX 4 processes are single-threaded, there could be no confusion about whom the message was being sent to. However, once you introduce threads into the picture, a design decision had to be made about how to address the threads (really, the service providers). Since threads are ephemeral, it really didn't make sense to have the client connect to a particular node ID, process ID, and thread ID. Also, what if that particular thread was busy? We'd have to provide some way for a client to select a non-busy thread within a defined pool of service-providing threads.

Well, that's exactly what a channel is. It's the address of a pool of service-providing threads. The implication here is that a bunch of threads can issue a MsgReceive() function call on a particular channel, and block, with only one thread getting a message at a time.
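Here's a sketch of that arrangement, with several worker threads blocked in MsgReceive() on one shared channel. (The handle_message() function and the worker count are hypothetical; a real server would also handle errors and shutdown.)

```c
#include <sys/neutrino.h>
#include <pthread.h>

#define NUM_WORKERS 4

static int chid;    // one channel, shared by every worker thread

// Hypothetical: process one request and eventually MsgReply() to the client.
extern void handle_message (int rcvid, void *msg);

// Each worker blocks in MsgReceive() on the same channel.  The kernel
// hands any given message to exactly one of the blocked threads.
static void *worker (void *arg)
{
    char msg [512];
    int  rcvid;

    for (;;) {
        rcvid = MsgReceive (chid, msg, sizeof (msg), NULL);
        if (rcvid != -1) {
            handle_message (rcvid, msg);
        }
    }
    return NULL;
}

int main (void)
{
    pthread_t tid;
    int       i;

    chid = ChannelCreate (0);
    for (i = 0; i < NUM_WORKERS; i++) {
        pthread_create (&tid, NULL, worker, NULL);
    }
    pthread_exit (NULL);    // let the workers run forever
}
```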

Who sent the message?

Often a server needs to know who sent it a message. There are a number of reasons for this:

  • Accounting
  • Access control
  • Context association
  • Class of service
  • and so on

It would be cumbersome (and a security hole) to have the client provide this information with each and every message sent. Therefore, there's a structure filled in by the kernel whenever the MsgReceive() function unblocks because it got a message.

struct _msg_info
{
    int     nd;
    int     srcnd;
    pid_t   pid;
    int32_t chid;
    int32_t scoid;
    int32_t coid;
    int32_t msglen;
    int32_t tid;
    int16_t priority;
    int16_t flags;
    int32_t srcmsglen;
    int32_t dstmsglen;
};

You pass it to the MsgReceive() function as the last argument. If you pass a NULL, nothing is filled in. (The information can be retrieved later via the MsgInfo() call, so it's not gone forever!)

Let's look at the fields:

nd, srcnd, pid, and tid
Node descriptors, process ID, and thread ID of the client. (Note that nd is the receiving node's node descriptor for the transmitting node; srcnd is the transmitting node's node descriptor for the receiving node.)
priority
The priority of the sending thread.
chid, coid
Channel ID that the message was sent to, and the connection ID used.
scoid
Server Connection ID. This is an internal identifier used by the kernel to route the message from the server back to the client. You don't need to know about it, except for the interesting fact that it is a small integer that uniquely represents the client.
flags
Contains a variety of flag bits: _NTO_MI_ENDIAN_BIG, _NTO_MI_ENDIAN_DIFF, _NTO_MI_NET_CRED_DIRTY, and _NTO_MI_UNBLOCK_REQ. The _NTO_MI_ENDIAN_BIG and _NTO_MI_ENDIAN_DIFF bits tell you about the endianness of the sending machine (in case the message came over the network from a machine with a different endianness); _NTO_MI_NET_CRED_DIRTY is used internally. We'll look at _NTO_MI_UNBLOCK_REQ in Using the _NTO_MI_UNBLOCK_REQ.
msglen
Number of bytes received.
srcmsglen
The length of the source message, in bytes, as sent by the client. This may be greater than the value in msglen, as would be the case when receiving less data than what was sent. Note that this member is valid only if _NTO_CHF_SENDER_LEN was set in the flags argument to ChannelCreate() for the channel that the message was received on.
dstmsglen
The length of the client's reply buffer, in bytes. This field is only valid if the _NTO_CHF_REPLY_LEN flag is set in the argument to ChannelCreate() for the channel that the message was received on.
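For example, here's a sketch (assuming a channel chid created earlier with ChannelCreate(); error handling omitted) of a server passing a struct _msg_info to MsgReceive() and inspecting a few of the fields:

```c
#include <sys/neutrino.h>
#include <stdio.h>

struct _msg_info info;
char             msg [512];
int              rcvid;

// chid was created earlier via ChannelCreate()
rcvid = MsgReceive (chid, msg, sizeof (msg), &info);

// who sent it?  (useful for accounting or access control)
printf ("Message from pid %d (priority %d), %d bytes received\n",
        info.pid, info.priority, info.msglen);

// did the client send more than our buffer could hold?
// (srcmsglen is valid only if the channel was created with
// _NTO_CHF_SENDER_LEN in its flags)
if (info.srcmsglen > info.msglen) {
    // fetch the rest later with MsgRead()
}
```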

The receive ID (a.k.a. the client cookie)

In the code sample above, notice how we:

rcvid = MsgReceive (...);
...
MsgReply (rcvid, ...);

This is a key snippet of code, because it illustrates the binding between receiving a message from a client and being able to (sometime later) reply to that particular client. The receive ID is an integer that acts as a magic cookie that you need to hold onto if you want to interact with the client later. What if you lose it? It's gone. The client won't unblock from its MsgSend() until you (the server) die, or until the client's timeout on the message-passing call expires (and even then it's tricky; see the TimerTimeout() function in the BlackBerry 10 OS C Library Reference, and the discussion about its use in Clocks, timers, and getting a kick every so often).

Note: Don't depend on the value of the receive ID to have any particular meaning—it may change in future versions of the operating system. You can assume that it's unique, in that you'll never have two outstanding clients identified by the same receive ID (if you did, the kernel wouldn't be able to tell them apart when you do the MsgReply()).

Also, note that except in one special case (the MsgDeliverEvent() function which we'll look at later), once you've done the MsgReply(), that particular receive ID ceases to have meaning.

This brings us to the MsgReply() function.

Replying to the client

MsgReply() accepts a receive ID, a status, a message pointer, and a message size. We've just finished discussing the receive ID; it identifies who the reply message should be sent to. The status variable indicates the return status that should be passed to the client's MsgSend() function. Finally, the message pointer and size indicate the location and size of the optional reply message that should be sent.

The MsgReply() function may appear to be very simple (and it is), but its applications require some examination.

Not replying to the client

There's absolutely no requirement that you reply to a client before accepting new messages from other clients via MsgReceive()! This can be used in a number of different scenarios. In a typical device driver, a client may make a request that won't be serviced for a long time. For example, the client may ask an Analog-to-Digital Converter (ADC) device driver to go out and collect 45 seconds worth of samples. In the meantime, the ADC driver shouldn't just close up shop for 45 seconds! Other clients might want to have requests serviced (for example, there might be multiple analog channels, or there might be status information that should be available immediately, and so on).

Architecturally, the ADC driver queues the receive ID that it got from the MsgReceive(), starts up the 45-second accumulation process, and goes off to handle other requests. When the 45 seconds are up and the samples have been accumulated, the ADC driver can find the receive ID associated with the request and then reply to the client.
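Bookkeeping like this is ordinary code; here's a minimal sketch (plain C, nothing QNX-specific, and the names are made up for the example) of a fixed-size FIFO that a driver could use to park receive IDs until the data is ready:

```c
#define MAX_PENDING 16

static int pending [MAX_PENDING];   // parked receive IDs
static int head, tail, count;

// Park a receive ID; returns 0 on success, -1 if the queue is full.
int park_rcvid (int rcvid)
{
    if (count == MAX_PENDING)
        return -1;
    pending [tail] = rcvid;
    tail = (tail + 1) % MAX_PENDING;
    count++;
    return 0;
}

// Retrieve the oldest parked receive ID, or -1 if none are pending.
// The caller would then MsgReply() to that client.
int unpark_rcvid (void)
{
    int rcvid;

    if (count == 0)
        return -1;
    rcvid = pending [head];
    head = (head + 1) % MAX_PENDING;
    count--;
    return rcvid;
}
```

A real driver would key the queue by request (so it can match each receive ID to its pending transfer), but the principle is the same: store the rcvid now, reply later.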

You also want to hold off replying to a client in the case of the reply-driven server/subserver model (where some of the clients are the subservers). Since the subservers are looking for work, you would make a note of their receive IDs and store those away. When actual work arrived, then and only then would you reply to the subserver, thus indicating that it should do some work.

Replying with no data, or an errno

When you finally reply to the client, there's no requirement that you transfer any data. This is used in two scenarios. You may choose to reply with no data if the sole purpose of the reply is to unblock the client. Let's say the client just wants to be blocked until some particular event occurs, but it doesn't need to know which event. In this case, no data is required by the MsgReply() function; the receive ID is sufficient:

MsgReply (rcvid, EOK, NULL, 0);

This unblocks the client (but doesn't return any data) and returns the EOK success indication.

As a slight modification of that, you may want to return an error status to the client. In this case, you can't do that with MsgReply(); instead, you must use MsgError():

MsgError (rcvid, EROFS);

In the above example, the server detects that the client is attempting to write to a read-only filesystem, and, instead of returning any actual data, returns an errno of EROFS back to the client.

Alternatively, you may have already transferred the data (via MsgWrite()), and there's no additional data to transfer.

Why the two calls? They're subtly different. While both MsgError() and MsgReply() unblock the client, MsgError() does not transfer any additional data, causes the client's MsgSend() function to return -1, and causes the client to have errno set to whatever was passed as the second argument to MsgError().

On the other hand, MsgReply() could transfer data (as indicated by the third and fourth arguments), and cause the client's MsgSend() function to return whatever was passed as the second argument to MsgReply(). MsgReply() has no effect on the client's errno.

Generally, if you're returning only a pass/fail indication (and no data), you use MsgError(), whereas if you're returning data, you use MsgReply(). Traditionally, when you do return data, the second argument to MsgReply() is a positive integer indicating the number of bytes being returned.
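Putting the two calls together, a server's reply path often looks something like this sketch (the condition and variables are hypothetical; note how the number of bytes being returned serves as the second argument to MsgReply()):

```c
if (write_to_read_only_fs) {
    // pass/fail only: the client's MsgSend() returns -1, with errno set to EROFS
    MsgError (rcvid, EROFS);
} else {
    // data reply: the client's MsgSend() returns nbytes, the number of bytes returned
    MsgReply (rcvid, nbytes, reply_buf, nbytes);
}
```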

Finding the server's ND/PID/CHID

In the ConnectAttach() function, we required a Node Descriptor (ND), a process ID (PID), and a channel ID (CHID) to be able to attach to a server. So far we haven't talked about how the client finds this ND/PID/CHID information.

If one process creates the other, then it's easy—the process creation call returns with the process ID of the newly created process. Either the creating process can pass its own PID and CHID on the command line to the newly created process or the newly created process can issue the getppid() function call to get the PID of its parent and assume a well-known CHID.
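Sketched out (the child's executable name and the spawn mechanism are illustrative; any process-creation call works), the two sides look like this:

```c
#include <sys/neutrino.h>
#include <spawn.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

// Parent (the server): create a channel and pass its ID on the command line
int  chid = ChannelCreate (0);
char chid_text [16];

sprintf (chid_text, "%d", chid);
spawnl (P_NOWAIT, "child", "child", chid_text, NULL);   // hypothetical child program

// Child (the client): parent's PID via getppid(), CHID from argv [1]
int coid = ConnectAttach (0, getppid (), atoi (argv [1]), 0, 0);
```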

What if we have two perfect strangers? This would be the case if, for example, a third party created a server and an application that you wrote wanted to talk to that server. The real issue is, how does a server advertise its location?

There are many ways of doing this. Here are four of them, in increasing order of programming elegance:

  1. Open a well-known filename and store the ND/PID/CHID there. This is the traditional approach taken by UNIX-style servers, where they open a file (for example, /etc/httpd.pid), write their process ID there as an ASCII string, and expect clients to open the file and fetch the process ID.
  2. Use global variables to advertise the ND/PID/CHID information. This is typically used in multi-threaded servers that need to send themselves messages, and is, by its nature, a very limited case.
  3. Use the name-location functions (name_attach() and name_detach() on the server side, and name_open() and name_close() on the client side).
  4. Take over a portion of the pathname space and become a resource manager.

The first approach is very simple, but can suffer from pathname pollution, where the /etc directory has all kinds of *.pid files in it. Since files are persistent (meaning they survive after the creating process dies and the machine reboots), there's no obvious method of cleaning up these files, except perhaps to have a grim reaper task that runs around seeing if these things are still valid.

There's another related problem. Since the process that created the file can die without removing the file, there's no way of knowing whether or not the process is still alive until you try to send a message to it. Worse yet, the ND/PID/CHID specified in the file may be so stale that it would have been reused by another program! The message that you send to that program is at best rejected, and at worst may cause damage. So that approach is out.

The second approach, where we use global variables to advertise the ND/PID/CHID values, is not a general solution, as it relies on the client's being able to access the global variables. And since this requires shared memory, it certainly won't work across a network! This generally gets used in either tiny test case programs or in very special cases, but always in the context of a multithreaded program. Effectively, all that happens is that one thread in the program is the client, and another thread is the server. The server thread creates the channel and then places the channel ID into a global variable (the node ID and process ID are the same for all threads in the process, so they don't need to be advertised). The client thread then picks up the global channel ID and performs the ConnectAttach() to it.

The third approach, where we use the name_attach() and name_detach() functions, works well for simple client/server situations.
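Here's a sketch of both sides (the name "my_server" is made up; name_attach() gives the server a channel to receive on, and name_open() returns a connection ID that the client can MsgSend() on):

```c
#include <sys/dispatch.h>

// Server side: register a name, then receive on the channel it gives us
name_attach_t *attach;

attach = name_attach (NULL, "my_server", 0);
rcvid  = MsgReceive (attach->chid, &msg, sizeof (msg), NULL);
// (a real server must also handle the pulses the kernel delivers on this
// channel, e.g., client-disconnect notifications)
// ... MsgReply (rcvid, ...); and, at shutdown, name_detach (attach, 0);

// Client side: look up the name, send, then clean up
int coid;

coid = name_open ("my_server", 0);
MsgSend (coid, &smsg, sizeof (smsg), &rmsg, sizeof (rmsg));
name_close (coid);
```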

The last approach, where the server becomes a resource manager, is definitely the cleanest and is the recommended general-purpose solution.

Note: POSIX file descriptors are implemented using connection IDs; that is, a file descriptor is a connection ID! The beauty of this scheme is that since the file descriptor that's returned from the open() is the connection ID, no further work is required on the client's end to be able to use that particular connection. For example, when the client calls read() later, passing it the file descriptor, this translates with very little overhead into a MsgSend() function.

What about priorities?

What if a low-priority process and a high-priority process send a message to a server at the same time?

Note: Messages are always delivered in priority order.

If two processes send a message simultaneously, the entire message from the higher-priority process is delivered to the server first.

If both processes are at the same priority, then the messages are delivered in time order (since there's no such thing as absolutely simultaneous on a single-processor machine — even on an SMP box there is some ordering as the CPUs arbitrate kernel access among themselves).

We'll come back to some of the other subtleties introduced by this question when we look at priority inversions later.

Reading and writing data

So far you've seen the basic message-passing primitives. These are all that you need. However, there are a few extra functions that make life much easier. Let's consider an example using a client and server where we might need other functions.

The client issues a MsgSend() to transfer some data to the server. After the client issues the MsgSend() it blocks; it's now waiting for the server to reply.

An interesting thing happens on the server side. The server has called MsgReceive() to receive the message from the client. Depending on the design that you choose for your messages, the server may or may not know how big the client's message is. Why on earth would the server not know how big the message is? Consider the filesystem example that we've been using. Suppose the client does:

write (fd, buf, 16);

This works as expected if the server does a MsgReceive() and specifies a buffer size of, say, 1024 bytes. Since our client sent only a tiny message (16 bytes of data plus a small header; 28 bytes in total), we have no problems.

However, what if the client sends something bigger than 1024 bytes, say 1 megabyte?

write (fd, buf, 1000000);

How is the server going to gracefully handle this? We could, arbitrarily, say that the client isn't allowed to write more than n bytes. Then, in the client-side C library code for write(), we could enforce this limit and split the write request into several requests of n bytes each. This is awkward.

The other problem with this example would be, how big should n be?

You can see that this approach has major disadvantages:

  • All functions that use message transfer with a limited size have to be modified in the C library so that the function packetizes the requests. This in itself can be a fair amount of work. It can also have unexpected side effects for multithreaded functions—what if the first part of the message from one thread gets sent, and then another thread in the client preempts the current thread and sends its own message? Where does that leave the original thread?
  • All servers must now be prepared to handle the largest possible message size that may arrive. This means that all servers have to have a data area that's big, or the library has to break up big requests into many smaller ones, thereby impacting speed.

Luckily, this problem has a fairly simple workaround that also gives us some advantages.

Two functions, MsgRead() and MsgWrite(), are especially useful here. The important fact to keep in mind is that the client is blocked. This means that the client isn't going to go and change data structures while the server is trying to examine them.

Note: In a multi-threaded client, the potential exists for another thread to mess around with the data area of a client thread that's blocked on a server. This is considered a bug (bad design) — the server thread assumes that it has exclusive access to a client's data area until the server thread unblocks the client.

The MsgRead() function looks like this:

#include <sys/neutrino.h>

int MsgRead (int rcvid,
             void *msg,
             int nbytes,
             int offset);

MsgRead() lets your server read data from the blocked client's address space, starting offset bytes from the beginning of the client-specified send buffer, into the buffer specified by msg for nbytes. The server doesn't block, and the client doesn't unblock. MsgRead() returns the number of bytes it actually read, or -1 if there was an error.

So let's think about how we'd use this in our write() example. The C Library write() function constructs a message with a header that it sends to the filesystem server, fs-qnx4. The server receives a small portion of the message via MsgReceive(), looks at it, and decides where it's going to put the rest of the message. The fs-qnx4 server may decide that the best place to put the data is into some cache buffers it's already allocated.

Let's track an example:

Diagram showing the fs-qnx4 message example.

So, the client has decided to send 4 KB to the filesystem. (Notice how the C Library stuck a tiny header in front of the data so that the filesystem could tell just what kind of request it actually was — we'll come back to this when we look at multi-part messages, and in even more detail when we look at resource managers.) The filesystem reads just enough data (the header) to figure out what kind of a message it is:

// part of the headers, fictionalized for example purposes
struct _io_write {
    uint16_t    type;
    uint16_t    combine_len;
    int32_t     nbytes;
    uint32_t    xtype;
};

typedef union {
    uint16_t           type;
    struct _io_read    io_read;
    struct _io_write   io_write;
    ...
} header_t;

header_t    header;    // declare the header

rcvid = MsgReceive (chid, &header, sizeof (header), NULL);

switch (header.type) {
...
case _IO_WRITE:
    number_of_bytes = header.io_write.nbytes;
    ...

At this point, fs-qnx4 knows that 4 KB are sitting in the client's address space (because the message told it in the nbytes member of the structure) and that it should be transferred to a cache buffer. The fs-qnx4 server can issue:

MsgRead (rcvid, cache_buffer [index].data,
         cache_buffer [index].size, sizeof (header.io_write));

Notice that the message transfer has specified an offset of sizeof (header.io_write) to skip the write header that was added by the client's C library. We're assuming here that cache_buffer [index].size is actually 4096 (or more) bytes.

Similarly, for writing data to the client's address space, we have:

#include <sys/neutrino.h>

int MsgWrite (int rcvid,
              const void *msg,
              int nbytes,
              int offset);

MsgWrite() lets your server write data to the client's address space, starting offset bytes from the beginning of the client-specified receive buffer. This function is most useful in cases where the server has limited space but the client wants to get a lot of information from the server.

For example, with a data acquisition driver, the client may specify a 4-megabyte data area and tell the driver to grab 4 megabytes of data. The driver really shouldn't need to have a big area like this lying around just in case someone asks for a huge data transfer.

The driver might have a 128 KB area for DMA data transfers, and then message-pass it piecemeal into the client's address space using MsgWrite() (incrementing the offset by 128 KB each time, of course). Then, when the last piece of data has been written, the driver MsgReply()'s to the client.

Diagram showing MsgWrite transferring several chunks.
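A sketch of that loop (the 128 KB chunk size and the dma_grab() helper are illustrative):

```c
#define CHUNK_SIZE  (128 * 1024)

int offset    = 0;
int remaining = total_bytes;    // e.g., the 4 MB the client asked for
int len;

while (remaining > 0) {
    len = (remaining < CHUNK_SIZE) ? remaining : CHUNK_SIZE;
    dma_grab (dma_buffer, len);                  // hypothetical: fill the DMA area
    MsgWrite (rcvid, dma_buffer, len, offset);   // copy it into the client's buffer
    offset    += len;
    remaining -= len;
}

MsgReply (rcvid, total_bytes, NULL, 0);          // all done; unblock the client
```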

Note that MsgWrite() lets you write the data components at various places, and then either just wake up the client using MsgReply():

MsgReply (rcvid, EOK, NULL, 0);

or wake up the client after writing a header at the start of the client's buffer:

MsgReply (rcvid, EOK, &header, sizeof (header));

This is a fairly elegant trick for writing unknown quantities of data, where you know how much data you wrote only when you're done writing it. If you're using this method of writing the header after the data's been transferred, you must remember to leave room for the header at the beginning of the client's data area!