Although the client/server model is easy to understand, and the most commonly used, there are two other variations on the theme. The first is the use of multiple threads and the second is a model called server/subserver that's sometimes useful for general design, but really shines in network-distributed designs. The combination of the two can be extremely powerful, especially on a network of SMP boxes!
As we discussed in Processes and threads, BlackBerry 10 OS has the ability to run multiple threads of execution in the same process. How can we use this to our advantage when we combine it with message passing?
The answer is fairly simple. We can start a pool of threads (using the thread_pool_*() functions that we talked about in Processes and threads), each of which can handle a message from a client.
This way, when a client sends us a message, we really don't care which thread gets it, as long as the work gets done. Servicing multiple clients with multiple threads, rather than with just one thread, is a powerful concept, and its main advantage is that the kernel can multitask the server among the various clients, without the server itself having to perform that multitasking.
On a single-processor machine, having a bunch of threads running means that they're all competing with each other for CPU time.
But, on an SMP box, we can have multiple threads competing for multiple CPUs, while sharing the same data area across those multiple CPUs. This means that we're limited only by the number of available CPUs on that particular machine.
Let's look at the server/subserver model, and then we'll combine it with the multiple threads model. In this model, a server still provides a service to clients, but because these requests may take a long time to complete, we need to be able to start a request and still be able to handle new requests as they arrive from other clients.
If we tried to do this with the traditional single-threaded client/server model, once one request was received and started, we wouldn't be able to receive any more requests unless we periodically stopped what we were doing, took a quick peek to see if there were any other requests pending, put those on a work queue, and then continued on, distributing our attention over the various jobs in the work queue. Not very efficient. You're practically duplicating the work of the kernel by time slicing between multiple jobs!
Imagine what this would look like if you were doing it. You're at your desk, and someone walks up to you with a folder full of work. You start working on it. As you're busy working, you notice that someone else is standing in the doorway of your cubicle with more work of equally high priority (of course)! Now you've got two piles of work on your desk. You're spending a few minutes on one pile, switching over to the other pile, and so on, all the while looking at your doorway to see if someone else is coming around with even more work.
The server/subserver model would make a lot more sense here. In this model, we have a server that creates several other processes (the subservers). These subservers each send a message to the server, but the server doesn't reply to them until it gets a request from a client. Then it passes the client's request to one of the subservers by replying to it with the job that it should perform. The following diagram illustrates this. Note the direction of the arrows — they indicate the direction of the sends!
If you were doing a job like this, you start by hiring some extra employees. These employees come to you (just as the subservers send a message to the server — hence the note about the arrows in the diagram above), looking for work to do. Initially, you might not have any, so you don't reply to their query. When someone comes into your office with a folder full of work, you say to one of your employees, "Here's some work for you to do." That employee then goes off and does the work. As other jobs come in, you delegate them to the other employees.
The trick to this model is that it's reply-driven — the work starts when you reply to your subservers. The standard client/server model is send-driven because the work starts when you send the server a message.
So why would the clients march into your office, and not the offices of the employees that you hired? Why are you arbitrating the work? The answer is fairly simple: you're the coordinator responsible for performing a particular task. It's up to you to ensure that the work is done. The clients who come to you with their work know you, but they don't know the names or locations of your (perhaps temporary) employees.
As you probably suspected, you can certainly mix multithreaded servers with the server/subserver model. The main trick is going to be determining which parts of the problem are best suited to being distributed over a network (generally those parts that won't use up the network bandwidth too much) and which parts are best suited to being distributed over the SMP architecture (generally those parts that want to use common data areas).
So why would we use one over the other? Using the server/subserver approach, we can distribute the work over multiple machines on a network. This effectively means that we're limited only by the number of available machines on the network (and network bandwidth, of course). Combining this with multiple threads on a bunch of SMP boxes distributed over a network yields clusters of computing, where the central arbitrator delegates work (via the server/subserver model) to the SMP boxes on the network.
File systems, serial ports, consoles, and sound cards all use the client/server model. A C language application program takes on the role of the client and sends requests to these servers. The servers perform whatever work was specified, and reply with the answer.
Some of these traditional client/server servers may actually be reply-driven (server/subserver) servers! This is because, to the ultimate client, they appear as a standard server, even though the server itself uses server/subserver methods to get the work done. What that means is that the client still sends a message to what it thinks is the service-providing process. What actually happens is that the service-providing process simply delegates the client's work to a different process (the subserver).
One of the more popular reply-driven programs is a fractal graphics program distributed over the network. The master program divides the screen into several areas, for example, 64 regions. At startup, the master program is given a list of nodes that can participate in this activity. The master program starts up worker (subserver) programs, one on each of the nodes, and then waits for the worker programs to send to the master.
The master then repeatedly picks unfilled regions (of the 64 on screen) and delegates the fractal computation work to the worker program on another node by replying to it. When the worker program has completed the calculations, it sends the results back to the master, which displays the result on the screen.
Because the worker program sent to the master, it's now up to the master to again reply with more work. The master continues doing this until all 64 areas on the screen have been filled.
Because the master program is delegating work to worker programs, the master program can't afford to become blocked on any one program! In a traditional send-driven approach, you expect the master to create a program and then send to it. Unfortunately, the master program isn't replied to until the worker program is done, meaning that the master program can't send simultaneously to another worker program, effectively negating the advantages of having multiple worker nodes.
The solution to this problem is to have the worker programs start up, and ask the master program if there's any work to do by sending it a message. Once again, we've used the direction of the arrows in the diagram to indicate the direction of the send. Now the worker programs are waiting for the master to reply. When something tells the master program to do some work, it replies to one or more of the workers, which causes them to go off and do the work. This lets the workers go about their business; the master program can still respond to new requests (it's not blocked waiting for a reply from one of the workers).
Multithreaded servers are indistinguishable from single-threaded servers from the client's point of view. In fact, the designer of a server can just turn on multithreading by starting another thread. In any event, the server can still make use of multiple CPUs in an SMP configuration, even if it is servicing only one client. What does that mean?

Let's revisit the fractal graphics example. When a subserver gets a request from the server to compute, there's absolutely nothing stopping the subserver from starting up multiple threads on multiple CPUs to service the one request. In fact, to make the application scale better across networks that have some SMP boxes and some single-CPU boxes, the server and subserver can initially exchange a message whereby the subserver tells the server how many CPUs it has; this lets the server know how many requests that subserver can service simultaneously. The server would then queue up more requests for the SMP boxes, allowing them to do more work than the single-CPU boxes.