Starting a process

Let's look at the function calls available to deal with threads and processes. Any thread can start a process; the only restrictions imposed are those that stem from basic security (file access, privilege restrictions, and so on). In all probability, you've already started other processes, whether from the system startup script, from the shell, or by having a program start another program on your behalf.

Starting a process from the command line

For example, from the shell you can type:

$ program1

This instructs the shell to start a program called program1 and to wait for it to finish. Or, you can type:

$ program2 &

This instructs the shell to start program2 without waiting for it to finish. We say that program2 is running in the background.

Starting a process from within a program

You don't usually care that the shell creates processes—this is a basic assumption about the shell. In some application designs, you can rely on shell scripts (batches of commands in a file) to do the work for you, but in other cases you want to create the processes yourself. For example, in a large multi-process system, you may want to have one master program start all the other processes for your application based on some kind of configuration file. Another example would include starting up processes when certain operating conditions (events) have been detected.

Let's take a look at the functions that BlackBerry 10 OS provides for starting up other processes (or transforming into a different program):

  • system()
  • the exec() family of functions
  • the spawn() family of functions
  • fork()
  • vfork()

Which function you use depends on two requirements: portability and functionality. As usual, there's a trade-off between the two.

All the calls that create a new process have the following in common. A thread in the original process calls one of these functions. Eventually, the function gets the process manager to create an address space for a new process. Then, the kernel starts a thread in the new process. This thread executes a few instructions, and calls main(). (In the case of fork() and vfork(), of course, the new thread begins execution in the new process by returning from the fork() or vfork().)

Starting a process with the system() call

The system() function is the simplest; it takes a command line, the same as you would type it at a shell prompt, and executes it. system() actually starts up a shell to handle the command that you want to perform.

An interactive editor is a typical example. While editing, you may need to shell out, check out some samples, and then come back into the editor, all without losing your place. In an editor that supports shell escapes, you can issue the command :!pwd, for example, to display the current working directory. To handle the :!pwd command, the editor runs this code:

system ("pwd");

Is system() suited for everything under the sun? Of course not, but it's useful for a lot of your process-creation requirements.
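
The value that system() returns is worth checking. The following is a minimal sketch, assuming a POSIX shell is available; run_command() is a hypothetical helper name, not part of any library. It decodes the wait status that system() returns into the command's exit status.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command line with system() and
 * return the command's exit status, or -1 if the command could not
 * be run or did not exit normally. */
int run_command(const char *command_line)
{
    int status = system(command_line);

    if (status == -1) {
        perror("system");       /* the shell could not be started */
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -1;                  /* for example, killed by a signal */
}
```

For example, run_command ("pwd") prints the current working directory and returns 0 if the command succeeded.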

Starting a process with the exec() and spawn() calls

The next process-creation functions we should look at are the exec() and spawn() families. Before we go into the details, let's see what the differences are between these two groups of functions.

The exec() family transforms the current process into another one. What we mean by that is that when a process issues an exec() function call, that process ceases to run the current program and begins to run another program. The process ID doesn't change — that process changed into another program. What happened to all the threads in the process? We'll come back to that when we look at fork().

The spawn() family, on the other hand, doesn't do that. Calling a member of the spawn() family creates another process (with a new process ID) that corresponds to the program specified in the function's arguments.
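
The claim that exec() keeps the process ID can be checked with a short sketch. It uses fork() (covered later in this section) only to get a disposable process to transform, and assumes a POSIX shell at /bin/sh; exec_keeps_pid() is a hypothetical name chosen for this illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if a program started with exec() runs
 * under the same process ID as the process that called exec(),
 * 0 if not, and -1 on error. */
int exec_keeps_pid(void)
{
    char script[64];
    int status;
    pid_t child = fork();

    if (child == -1) {
        return -1;
    }
    if (child == 0) {
        /* Ask the shell we are about to become to compare its own
         * pid ($$) with the pid this process has right now. */
        snprintf(script, sizeof script, "[ $$ -eq %ld ]", (long)getpid());
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);             /* reached only if execl() failed */
    }
    if (waitpid(child, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```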

Let's look at the different variants of the spawn() and exec() functions. In the table that follows, you see which ones are POSIX and which aren't. Of course, for maximum portability, use only the POSIX functions.

While these variants might appear to be overwhelming, there is a pattern to their suffixes:

A suffix of: Means:
l (lowercase L) The argument list is specified via a list of parameters given in the call itself, terminated by a NULL argument.
e An environment is specified.
p The PATH environment variable is used in case the full pathname to the program isn't specified.
v The argument list is specified via a pointer to an argument vector.

The argument list is a list of command-line arguments passed to the program.

Also, note that in the C library, spawnlp(), spawnvp(), and spawnlpe() all call spawnvpe(), which in turn calls spawnp(). The functions spawnle(), spawnv(), and spawnl() all eventually call spawnve(), which then calls spawn(). Finally, spawnp() calls spawn(). So, the root of all spawning functionality is the spawn() call.

Let's now take a look at the various spawn() and exec() variants in detail so that you can get a feel for the various suffixes used. Then, we'll see the spawn() call itself.

l suffix

For example, to invoke the ls command with the arguments -t, -r, and -l (meaning sort the output by time, in reverse order, and show the long version of the output), you can specify it as either:

/* To run ls and keep going: */
spawnl (P_WAIT, "/bin/ls", "/bin/ls", "-t", "-r", "-l", NULL);

/* To transform into ls: */
execl ("/bin/ls", "/bin/ls", "-t", "-r", "-l", NULL);

or, by using the v suffix variant:

char *argv [] =
{
    "/bin/ls",
    "-t",
    "-r",
    "-l",
    NULL
};

/* To run ls and keep going: */
spawnv (P_WAIT, "/bin/ls", argv);

/* To transform into ls: */
execv ("/bin/ls", argv);

Why the choice? It's provided as a convenience. You may have a parser already built into your program, and it would be convenient to pass around arrays of strings. In that case, use the v suffix variants. Or, you may be coding up a call to a program where you know what the parameters are. In that case, why bother setting up an array of strings when you know exactly what the arguments are? Just pass them to the l suffix variant.

Note that we passed the actual pathname of the program (/bin/ls) and the name of the program again as the first argument. We passed the name again to support programs that behave differently based on how they're invoked.
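
When the number of arguments is only known at run time, the v variants are the natural fit. The sketch below is an illustration under stated assumptions: run_vector() and list_directory() are hypothetical helpers, /bin/ls is assumed to exist, and the portable fork() plus execv() pair stands in for a spawnv() call with P_WAIT.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with a NULL-terminated argument
 * vector and return its exit status, or -1 on error. */
int run_vector(const char *path, char *const argv[])
{
    int status;
    pid_t child = fork();

    if (child == -1) {
        return -1;
    }
    if (child == 0) {
        execv(path, argv);
        _exit(127);                 /* reached only if execv() failed */
    }
    if (waitpid(child, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: list a directory, adding "-l" only on request,
 * so the length of the argument list is decided at run time. */
int list_directory(const char *dir, int long_format)
{
    char *argv[4];
    int n = 0;

    argv[n++] = "ls";               /* argv[0]: how the program is named */
    if (long_format) {
        argv[n++] = "-l";
    }
    argv[n++] = (char *)dir;        /* execv() doesn't modify the strings */
    argv[n] = NULL;                 /* the vector must end with NULL */
    return run_vector("/bin/ls", argv);
}
```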

e suffix

The e suffix versions pass an environment to the program. An environment is just that—a kind of context for the program to operate in. For example, you may have a spelling checker that has a dictionary of words. Instead of specifying the dictionary's location every time on the command line, you can provide it in the environment:

$ export DICTIONARY=/home/rk/.dict

$ spellcheck document.1

The export command tells the shell to create a new environment variable (in this case, DICTIONARY), and assign it a value (/home/rk/.dict).

To use a different dictionary, you have to alter the environment before running the program. This is easy from the shell:

$ export DICTIONARY=/home/rk/.altdict

$ spellcheck document.1

But how can you do this from your own programs? To use the e versions of spawn() and exec(), you specify an array of strings representing the environment:

char *env [] =
{
    "DICTIONARY=/home/rk/.altdict",
    NULL
};

// To start the spell-checker:
spawnle (P_WAIT, "/usr/bin/spellcheck", "/usr/bin/spellcheck",
         "document.1", NULL, env);

// To transform into the spell-checker:
execle ("/usr/bin/spellcheck", "/usr/bin/spellcheck",
        "document.1", NULL, env);

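One point is easy to miss: the environment passed to an e variant replaces the caller's environment instead of adding to it. The following sketch illustrates this with execle(), assuming a POSIX shell at /bin/sh; run_sh_with_env() is a hypothetical helper, and fork() is used only so that the caller survives the exec.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a shell script with exactly the environment
 * given in envp (a NULL-terminated array of "NAME=value" strings).
 * Returns the script's exit status, or -1 on error. */
int run_sh_with_env(const char *script, char *const envp[])
{
    int status;
    pid_t child = fork();

    if (child == -1) {
        return -1;
    }
    if (child == 0) {
        /* The argument list ends with a null pointer; the environment
         * follows it as the last parameter. */
        execle("/bin/sh", "sh", "-c", script, (char *)NULL, envp);
        _exit(127);             /* reached only if execle() failed */
    }
    if (waitpid(child, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an env array containing only DICTIONARY=/home/rk/.altdict, the script sees that variable and nothing else from the caller's environment. If the new program needs variables such as PATH, you have to include them in the array yourself.
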
p suffix

The p suffix versions search the directories in your PATH environment variable to find the executable. You probably noticed that all the examples have a hard-coded location for the executable: /bin/ls and /usr/bin/spellcheck. What about other executables? Unless you want to first find out the exact path for that particular program, it is best to have the user tell your program all the places to search for executables. The standard PATH environment variable does just that. Here's the one from a minimal system:

PATH=/proc/boot:/bin

This tells the shell that when you type a command, it should first look in the directory /proc/boot, and if it can't find the command there, it should look in the binaries directory, /bin. PATH is a colon-separated list of places to look for commands. You can add as many elements to the PATH as you want, but keep in mind that all pathname components are searched (in order) for the executable.

If you don't know the path to the executable, then you can use the p variants. For example:

// Using an explicit path:
execl ("/bin/ls", "/bin/ls", "-l", "-t", "-r", NULL);

// Search your PATH for the executable:
execlp ("ls", "ls", "-l", "-t", "-r", NULL);

If execl() can't find ls in /bin, it returns an error. The execlp() function searches all the directories specified in the PATH for ls, and returns an error only if it can't find ls in any of those directories. This is also great for multiplatform support: your program doesn't have to be coded to know about the different CPU names; it just finds the executable.

What if you do something like this?

execlp ("/bin/ls", "ls", "-l", "-t", "-r", NULL);

Does it search the environment? No. You told execlp() to use an explicit pathname, which overrides the normal PATH searching rule. If it doesn't find ls in /bin, that's it; no other attempts are made (this is identical to the way execl() works in this case).
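
The PATH search can be tried out with a short sketch. It assumes that a program called true can be found somewhere along the PATH; run_by_name() is a hypothetical helper, and the exit status 127 for "could not execute" is a convention borrowed from the shell, not something execlp() provides.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start the program `name` with no arguments.
 * A plain name is searched for along PATH; a name that contains a
 * slash is used as given. Returns the program's exit status, 127 if
 * it could not be executed, or -1 on error. */
int run_by_name(const char *name)
{
    int status;
    pid_t child = fork();

    if (child == -1) {
        return -1;
    }
    if (child == 0) {
        execlp(name, name, (char *)NULL);
        _exit(127);             /* execlp() returns only on failure */
    }
    if (waitpid(child, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```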

Is it dangerous to mix an explicit path with a plain command name (for example, the path argument /bin/ls, and the command name argument ls, instead of /bin/ls)? This is usually pretty safe, because:

  • A large number of programs ignore argv [0] anyway.
  • Those that do care usually call basename(), which strips off the directory portion of argv [0] and returns just the name.

The only compelling reason for specifying the full pathname for the first argument is that the program can print out diagnostics including this first argument, which can instantly tell you where the program was invoked from. This may be important when the program can be found in multiple locations along the PATH.
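
A program that behaves differently depending on how it was invoked might compare names as in the following sketch. The helper name invoked_as() is hypothetical; the only library call assumed beyond standard C is the POSIX basename() from <libgen.h>.

```c
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if argv0 names the program `name`,
 * ignoring any directory portion, and 0 otherwise. */
int invoked_as(const char *argv0, const char *name)
{
    char *copy;
    int match;

    /* basename() is allowed to modify its argument, so use a copy. */
    copy = malloc(strlen(argv0) + 1);
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, argv0);
    match = (strcmp(basename(copy), name) == 0);
    free(copy);
    return match;
}
```

In main(), a call such as invoked_as (argv [0], "ls") gives the same answer whether the program was started as ls or as /bin/ls.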

The spawn() family functions used above all take an extra parameter, the mode; in all the above examples, we've always specified P_WAIT. There are four mode values you can pass to change the behavior:

P_WAIT
The calling process (your program) is blocked until the newly created program has run to completion and exited.
P_NOWAIT
The calling program doesn't block while the newly created program runs. This allows you to start a program in the background, and continue running while the other program does its thing.
P_NOWAITO
Identical to P_NOWAIT, except that the SPAWN_NOZOMBIE flag is set, meaning that you don't have to worry about doing a waitpid() to clear the process's exit code.
P_OVERLAY
This flag turns the spawn() call into the corresponding exec() call! Your program transforms into the specified program, with no change in process ID.

It's generally clearer to use the exec() call if that's what you meant—it saves the maintainer of the software from having to look up P_OVERLAY in the C Library Reference!
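
The difference between waiting and not waiting can also be sketched with the POSIX posix_spawn() function. This is a different function from the spawn() family described here, used in this sketch only because it is widely portable. Both helper names are hypothetical. Starting the program and returning immediately corresponds roughly to P_NOWAIT; calling the second helper right away gives the effect of P_WAIT.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start a program and return immediately
 * (comparable to P_NOWAIT). Returns the child's pid, or -1. */
pid_t start_program(const char *path, char *const argv[])
{
    pid_t child;

    /* posix_spawn() returns 0 on success, an error number otherwise. */
    if (posix_spawn(&child, path, NULL, NULL, argv, environ) != 0) {
        return -1;
    }
    return child;
}

/* Hypothetical helper: block until the program finishes and return
 * its exit status, or -1 on error. This also clears the exit code,
 * so the child doesn't linger as a zombie. */
int wait_for_program(pid_t child)
{
    int status;

    if (waitpid(child, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```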

Plain spawn()

All spawn() functions eventually call the plain spawn() function. Here's the prototype for the spawn() function:

#include <spawn.h>

pid_t
spawn (const char *path,
       int fd_count,
       const int fd_map [],
       const struct inheritance *inherit,
       char * const argv [],
       char * const envp []);

We can immediately dispense with the path, argv, and envp parameters: we've already seen those above as representing the location of the executable (path), the argument vector (argv), and the environment (envp).

The fd_count and fd_map parameters go together. If you specify zero for fd_count, then fd_map is ignored, and it means that all file descriptors (except those modified by fcntl()'s FD_CLOEXEC flag) are inherited in the newly created process. If fd_count is non-zero, then it indicates the number of file descriptors contained in fd_map; only the specified ones are inherited.

The inherit parameter is a pointer to a structure that contains a set of flags, signal masks, and so on. For more details, refer to the BlackBerry 10 OS C Library Reference.
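
Controlling which file descriptors the new process ends up with is a common need. The sketch below is not the spawn() call shown above: it uses the POSIX posix_spawn() function and its file actions as a portable analogue of the fd_map idea, and capture_stdout() is a hypothetical helper. It connects the child's standard output to a pipe so that the parent can read what the child prints.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run a program with its standard output connected
 * to a pipe and copy up to size-1 bytes of its output into buf.
 * Returns the number of bytes captured, or -1 on error. */
ssize_t capture_stdout(const char *path, char *const argv[],
                       char *buf, size_t size)
{
    posix_spawn_file_actions_t actions;
    int pipe_fds[2];
    pid_t child;
    ssize_t n, total = 0;
    int rc;

    if (size == 0 || pipe(pipe_fds) == -1) {
        return -1;
    }

    /* Describe the child's descriptors: stdout becomes the pipe's write
     * end, and the child's copies of both pipe ends are closed. */
    posix_spawn_file_actions_init(&actions);
    posix_spawn_file_actions_adddup2(&actions, pipe_fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&actions, pipe_fds[0]);
    posix_spawn_file_actions_addclose(&actions, pipe_fds[1]);

    rc = posix_spawn(&child, path, &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(pipe_fds[1]);             /* the parent only reads */
    if (rc != 0) {
        close(pipe_fds[0]);
        return -1;
    }

    while ((size_t)total < size - 1 &&
           (n = read(pipe_fds[0], buf + total, size - 1 - total)) > 0) {
        total += n;
    }
    buf[total] = '\0';
    close(pipe_fds[0]);
    waitpid(child, NULL, 0);        /* clear the child's exit code */
    return total;
}
```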

Starting a process with the fork() call

Suppose you want to create a new process that's identical to the currently running process and have it run concurrently. You can approach this with a spawn() (and the P_NOWAIT parameter), giving the newly created process enough information about the exact state of your process so it can set itself up. However, this can be extremely complicated; describing the current state of the process can involve lots of data.

There is an easier way—the fork() function, which duplicates the current process. All the code is the same, and the data is the same as the creating (or parent) process's data.

Of course, it's impossible to create a process that's identical in every way to the parent process. Why? The most obvious difference between these two processes is going to be the process ID—we can't create two processes with the same process ID. If you look at fork()'s documentation in the BlackBerry 10 OS C Library Reference, you see that there is a list of differences between the two processes. You should read this list to be sure that you know these differences if you plan to use fork().

If both sides of a fork() look alike, how do you tell them apart? When you call fork(), you create another process executing the same code at the same location (that is, both are about to return from the fork() call) as the parent process. Let's look at some sample code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main (int argc, char **argv)
{
    pid_t retval;

    printf ("This is most definitely the parent process\n");
    fflush (stdout);
    retval = fork ();
    printf ("Which process printed this?\n");

    return (EXIT_SUCCESS);
}

After the fork() call, both processes are going to execute the second printf() call! If you run this program, it prints something like this:

This is most definitely the parent process
Which process printed this?
Which process printed this?

Both processes print the second line.

The way to tell the two processes apart is the fork() return value in retval. In the newly created child process, retval is zero; in the parent process, retval is the child's process ID. If fork() fails, it returns -1 in the calling process and no child is created.

Clear as mud? Here's another code snippet to clarify:

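pid_t child_pid;

printf ("The parent is pid %d\n", (int) getpid ());
fflush (stdout);

child_pid = fork ();
if (child_pid == -1) {
    perror ("fork");    /* no child was created */
} else if (child_pid > 0) {
    printf ("This is the parent, child pid is %d\n",
            (int) child_pid);
} else {
    printf ("This is the child, pid is %d\n",
            (int) getpid ());
}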

This program prints something like:

The parent is pid 4496
This is the parent, child pid is 8197
This is the child, pid is 8197

You can tell which process you are (the parent or the child) after the fork() by looking at fork()'s return value.
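
In practice, the parent usually also wants to know how the child finished. The following is a minimal sketch of the complete pattern; fork_and_collect() is a hypothetical helper, and the child here does nothing except exit with the status it was given.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, have the child exit with child_status,
 * and return the status that the parent collects (or -1 on error). */
int fork_and_collect(int child_status)
{
    int status;
    pid_t child_pid;

    fflush(stdout);     /* don't duplicate buffered output in the child */
    child_pid = fork();

    switch (child_pid) {
    case -1:                        /* fork failed: no child exists */
        perror("fork");
        return -1;
    case 0:                         /* this is the child */
        _exit(child_status);
    default:                        /* this is the parent */
        if (waitpid(child_pid, &status, 0) == -1) {
            return -1;
        }
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
}
```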

Starting a process with the vfork() call

The vfork() function can be a lot less resource intensive than the plain fork(), because it shares the parent's address space. The vfork() function creates a child, but then suspends the parent thread until the child calls exec() or exits (via exit() and friends). Additionally, vfork() works on physical memory model systems, whereas fork() can't—fork() needs to create the same address space, which just isn't possible in a physical memory model.
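
Because the child borrows the parent's address space, the safe pattern is for the child to do nothing except call an exec() function or _exit(). Here's a minimal sketch of that pattern; run_with_vfork() is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with the given argument vector using
 * vfork(). Returns the program's exit status, or -1 on error. */
int run_with_vfork(const char *path, char *const argv[])
{
    int status;
    pid_t child = vfork();

    if (child == -1) {
        return -1;
    }
    if (child == 0) {
        /* The parent is suspended and its memory is shared with us:
         * touch nothing, just become the new program. */
        execv(path, argv);
        _exit(127);             /* use _exit(), not exit(), on failure */
    }
    /* The parent resumes here once the child has called exec or exited. */
    if (waitpid(child, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```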

So what should you use?

Obviously, if you're porting existing code, you want to use whatever the existing code uses. For new code, you should avoid fork() if at all possible. Here's why:

  • Although fork() works with multiple threads, you need to register a pthread_atfork() handler and lock every single mutex before you fork, complicating the design.
  • The child of fork() duplicates all open file descriptors.
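
To give a feel for the first point, here's a sketch of the pthread_atfork() pattern for a single mutex. The names state_lock, install_fork_handlers(), and fork_and_lock_in_child() are hypothetical; a real multithreaded program would need a handler like this for every mutex it uses.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical mutex protecting some shared program state. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take the mutex before the fork so that no other thread holds it at
 * the instant the address space is duplicated. */
static void before_fork(void) { pthread_mutex_lock(&state_lock); }

/* Release it again afterward, in both the parent and the child. */
static void after_fork(void) { pthread_mutex_unlock(&state_lock); }

/* Register the handlers once, early in the program.
 * Returns 0 on success or an error number. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork, after_fork);
}

/* Hypothetical check: fork, and have the child take and release the
 * mutex. Returns 0 if the child managed to do so, -1 on error. */
int fork_and_lock_in_child(void)
{
    int status;
    pid_t child = fork();

    if (child == -1) {
        return -1;
    }
    if (child == 0) {
        if (pthread_mutex_lock(&state_lock) != 0) {
            _exit(1);
        }
        pthread_mutex_unlock(&state_lock);
        _exit(0);
    }
    if (waitpid(child, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```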

The choice between vfork() and the spawn() family boils down to portability, and what you want the child and parent to be doing. The vfork() function suspends the parent until the child calls exec() or exits, whereas the spawn() family of functions can allow both to run concurrently. The vfork() function, however, is subtly different between operating systems.