1. Motivation
In the beginning, processes generally didn't talk to each other except occasionally by leaving notes in the filesystem. But faster methods were needed to get real-time cooperation between processes.
Most interactions between processes fall into a small number of patterns:
- Shared state
A database, the Windows registry, the filesystem, the kernel scheduler data.
- Producer-consumer
One process generates a sequence of data that is consumed by another process.
- Worker threads
Several processes carry out tasks in parallel; this requires a task queue and possibly some shared state the workers are operating on.
We want to provide tools that let processes communicate with each other quickly to carry out tasks with these sorts of structures.
2. Shared memory
The simplest form of IPC is shared memory. Here the main issues are ConcurrencyControl (which we've already talked about) and possibly MemoryManagement, to map the same physical memory into two different address spaces (which we haven't talked about yet).
Advantage: Very fast.
Disadvantages: Requires careful locking to avoid trouble. Doesn't work across multiple machines. Doesn't (by itself) solve the producer-consumer problem.
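The mapping trick is easiest to see through the POSIX mmap interface. Below is a minimal sketch, not a definitive implementation: an anonymous MAP_SHARED region created before fork() is visible to both parent and child (unrelated processes would instead obtain a descriptor to map via shm_open).

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* map one shared, anonymous region visible to parent and child */
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return 1;
    *shared = 0;
    if (fork() == 0) {      /* child writes to the shared region... */
        *shared = 42;
        _exit(0);
    }
    wait(NULL);             /* ...and the parent sees the write */
    printf("child stored %d\n", *shared);
    return 0;
}
```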
3. Message passing
Here the model is that one process sends a message (some block of bits) and another process receives it, with actual delivery the responsibility of the operating system. The main advantages of message-passing are that it scales to multiple machines and that it is agnostic about the actual mechanism of delivery (see, for example, http://www.ietf.org/rfc/rfc1149.txt, the IETF standard for IP via Carrier Pigeon).
3.1. Interface
3.1.1. Channels and naming
One issue is how to specify where you are sending a message to and where you want to receive a message from. This is particularly tricky if you start out without knowing any friends.
A distinction can be made between direct delivery—sending messages to a specific process—and indirect delivery—sending messages to some channel without necessarily knowing what process might be on the other end. The latter is generally more useful, since it decouples a service (processing particular messages) from its implementation.
Some of the strategies used in various kinds of systems:
- Unix pipes
A pipe is an anonymous channel for shipping bytes. You create a pipe with the pipe system call, which gives you two file descriptors that you can then use standard POSIX I/O operations on. The recipient of a message is whoever is reading from the read end of the pipe. Since there is no way to hand file descriptors to an arbitrary process in most versions of Unix, pipe handles are usually inherited during a fork (see the sketch after this list).
- TCP/IP
An address is given by an IP address (32 bits, or 128 bits for IPv6) and a port number (16 bits). IP addresses may be obtained from more human-readable names using DNS. Many port numbers are standardized, e.g. 23 for telnet, 22 for ssh, 25 for SMTP, 80 for HTTP, 6000 for X11. (Try running nmap localhost on the Zoo sometime.) The recipient of a message is whatever process asked the kernel to open that port for it.
- Sun RPC
Layered on top of TCP/IP. Port numbers are assigned dynamically by a portmapper service (which itself is at a standard port).
- Peer-to-peer systems
Implement a distributed lookup service to translate queries to IP addresses.
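Here is a minimal sketch of the pipe-plus-fork pattern described above: pipe() returns the two file descriptors, and the child inherits both across fork(), giving parent and child a one-way byte channel.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                   /* fd[0] = read end, fd[1] = write end */
    if (pipe(fd) < 0)
        return 1;
    if (fork() == 0) {           /* child inherits both descriptors */
        char buf[64];
        close(fd[1]);            /* close the unused write end */
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child got: %s\n", buf);
        }
        _exit(0);
    }
    close(fd[0]);                /* parent keeps only the write end */
    write(fd[1], "hello", 5);
    close(fd[1]);
    wait(NULL);
    return 0;
}
```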
Even once you have a name, there is still the issue of setting up a channel or connection and what the behavior of that channel is. Some variants:
- Can anybody send to a channel or just a particular sender? In the former case, receiving a message should report who sent it.
- Can more than one process receive from a channel? (A channel that can be received from by multiple recipients is often called a mailbox.)
- Are messages clearly divided or do you just get a stream of bytes? The latter strategy is what you mostly get from Unix pipes and the TCP parts of the BSD network stack (which is used almost everywhere). The advantage from the point of view of the transport medium is both reduced complexity and the ability to package messages together or split them apart (since it doesn't know where the boundaries are anyway). The disadvantage for the user is that they have to separate messages by hand (but this can be hidden by libraries, or by using protocols like UDP that are message-oriented instead of connection-oriented); see the framing sketch after this list.
- Is communication synchronous (sender blocks until receiver receives) or asynchronous (the message gets there eventually)?
- What kind of buffering (if any) does the channel provide?
- What happens if messages are lost in transit?
- What happens if one of the participants in a channel dies?
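For concreteness, here is one common framing convention for recovering message boundaries from a byte stream, sketched with hypothetical send_msg/recv_msg helpers (not part of any standard library): prefix each message with its length in network byte order, and loop on the receiving side, since a stream read may return fewer bytes than requested.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly n bytes, looping over short reads;
 * returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t n) {
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Send one message as a 4-byte big-endian length followed by the body.
 * (A robust version would also loop over short writes.) */
static int send_msg(int fd, const void *buf, uint32_t len) {
    uint32_t wire = htonl(len);
    if (write(fd, &wire, sizeof wire) != (ssize_t)sizeof wire)
        return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* Receive one length-prefixed message into buf (at most cap bytes);
 * returns the message length, or -1 on error. */
static ssize_t recv_msg(int fd, void *buf, size_t cap) {
    uint32_t len;
    if (read_full(fd, &len, sizeof len) < 0)
        return -1;
    len = ntohl(len);
    if (len > cap)
        return -1;           /* message too big for caller's buffer */
    if (read_full(fd, buf, len) < 0)
        return -1;
    return (ssize_t)len;
}
```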
3.1.2. Sending a message
The interface for sending a message can be pretty straightforward: something like send(msg, length, recipient), where recipient is the name of a channel. From the sender's perspective, the message is gone and it can go do whatever it likes.
3.1.3. Receiving a message
This is trickier. Since the recipient doesn't control when a message is sent (or how long it takes to arrive), we need to somehow get its attention. There are two basic approaches:
- Event handler
Recipient registers a callback function with the kernel. When a message comes in, the kernel interrupts the recipient and runs the callback function.
- Event loop
Recipient checks its messages from time to time using either a blocking receive function or a non-blocking polling function.
The advantages and disadvantages are similar to those for preemptive vs non-preemptive scheduling. Messages coming in through an event loop are tidier but less timely than messages delivered through interruptions.
Note that we can always simulate the event-loop model with an event handler by having the event handler store incoming messages in a buffer. Conversely, a multi-threaded process can simulate an event handler by having a listener thread whose only purpose is to wait for incoming messages.
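A minimal sketch of the listener-thread version, assuming the hypothetical recv_msg helper from the framing sketch above: one thread does nothing but block in receive and hand each message to a callback, which gives the event-handler interface built out of an event loop.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* The application's "event handler" for one incoming message. */
static void handle_message(const char *msg, ssize_t len) {
    fwrite(msg, 1, (size_t)len, stdout);
}

/* Block in receive forever, delivering each message to the handler;
 * recv_msg is the hypothetical helper sketched earlier. */
static void *listener(void *arg) {
    int fd = *(int *)arg;
    char buf[1024];
    ssize_t n;
    while ((n = recv_msg(fd, buf, sizeof buf)) >= 0)
        handle_message(buf, n);
    return NULL;
}

/* elsewhere, at startup:
 *     pthread_t tid;
 *     pthread_create(&tid, NULL, listener, &channel_fd);
 */
```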
3.2. Implementation
When we look at implementation, we suddenly realize that message-passing doesn't solve the producer-consumer problem, it instantiates it.
3.2.1. Using shared memory
For asynchronous message-passing, we need to allocate buffer space somewhere in the kernel for each channel. A send operation is now handled by a system call that (a) blocks if the buffer is full (e.g., using a semaphore), and (b) writes the message to the end of the buffer once there is room. Receive is the reverse: block first if the buffer is empty (a second semaphore), then read from the start of the buffer. The start and end of the buffer can be maintained by keeping two pointers that wrap around at the end, giving a ring buffer architecture. Another option, which allows unbounded buffering subject to the memory limits and patience of the kernel, is a dynamically-allocated queue.
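Here is a minimal user-space sketch of that ring buffer, using two counting semaphores (free slots and waiting messages) plus a mutex to protect the pointers. A kernel implementation has the same shape, just with kernel synchronization primitives instead of POSIX ones.

```c
#include <pthread.h>
#include <semaphore.h>

#define NSLOTS 16

struct channel {
    void *slot[NSLOTS];
    int head, tail;          /* next read / next write position */
    sem_t full, empty;       /* messages waiting / slots free */
    pthread_mutex_t lock;
};

void channel_init(struct channel *c) {
    c->head = c->tail = 0;
    sem_init(&c->full, 0, 0);
    sem_init(&c->empty, 0, NSLOTS);
    pthread_mutex_init(&c->lock, NULL);
}

void channel_send(struct channel *c, void *msg) {
    sem_wait(&c->empty);               /* block while the buffer is full */
    pthread_mutex_lock(&c->lock);
    c->slot[c->tail] = msg;
    c->tail = (c->tail + 1) % NSLOTS;  /* wrap around: ring buffer */
    pthread_mutex_unlock(&c->lock);
    sem_post(&c->full);                /* one more message waiting */
}

void *channel_receive(struct channel *c) {
    void *msg;
    sem_wait(&c->full);                /* block while the buffer is empty */
    pthread_mutex_lock(&c->lock);
    msg = c->slot[c->head];
    c->head = (c->head + 1) % NSLOTS;
    pthread_mutex_unlock(&c->lock);
    sem_post(&c->empty);               /* one more slot free */
    return msg;
}
```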
For synchronous message-passing, the buffer can be much smaller since it only has to hold one message in transit. Here the main implementation issue is how a send operation interacts with the scheduler. Since we are going to block the sender anyway, do we just queue it up or should we do something like give up the rest of its time-slice to the recipient to get a fast response? Further optimizations may be possible if we can immediately switch to the recipient, like passing very small messages in registers or very big messages by mapping physical memory between address spaces. These optimizations might be especially important when message-passing is a critical system bottleneck, as in microkernels.
3.2.2. Across a network
Typical approach: Buffer on sender's machine, and then separate kernel thread and/or network driver pulls data out of buffer and sends it out through the network hardware. Receiver has matching structure, with incoming data being stuffed in buffers for later delivery.
Many details to work out: acknowledgments, retransmission, checksums, etc. Not our problem.
3.3. Exceptions
Bad things can happen. Suppose the recipient dies; how do we notify the sender?
- Unix approach
SIGPIPE is delivered to the sender's signal handler (or, if the signal is not caught, the sender dies with a Broken Pipe error), and write operations return errors; see the sketch after this list.
- TCP approach
A special acknowledgment message says the connection is closed; the kernel translates this into appropriate errors.
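On the Unix side, the usual idiom is to ignore SIGPIPE so that a write to a broken pipe reports the error through errno instead of killing the sender. A minimal sketch:

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Once at startup: turn broken-pipe kills into ordinary write errors. */
void ipc_init(void) {
    signal(SIGPIPE, SIG_IGN);
}

/* A write that notices when the recipient has gone away. */
int send_bytes(int fd, const void *buf, size_t n) {
    if (write(fd, buf, n) < 0 && errno == EPIPE) {
        fprintf(stderr, "recipient is gone\n");
        return -1;
    }
    return 0;
}
```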
How about lost messages?
- Retransmit
The kernel takes responsibility for resending the message until it gets through (and is acknowledged).
- Ignore it
Sometimes messages just get lost; too bad.
4. Remote procedure calls
Allows process P to call some specific procedure in process Q (generally process Q has to register the procedure as available) and get back a result. Process Q may or may not be on the same machine.
Implementation is basically two messages: call and return. Arguments to call are translated into standard form (marshaled) for transmission. Process P generally blocks while waiting for the return.
Simplifies interface since we already understand procedure calls. But blocking P may be expensive (can be dealt with by using multiple threads in P).
Example: HTTP GET.
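A minimal sketch of what a client-side stub can look like under the hood, assuming the hypothetical send_msg/recv_msg framing helpers sketched earlier: marshaling is just translation to a standard wire format, and the blocking receive is the "return".

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Client stub for a remote add(a, b): marshal the arguments into
 * network byte order, send the call message, then block waiting for
 * the return message and unmarshal the result. */
int32_t rpc_add(int fd, int32_t a, int32_t b) {
    uint32_t req[2] = { htonl((uint32_t)a), htonl((uint32_t)b) };
    uint32_t reply;
    send_msg(fd, req, sizeof req);       /* the "call" message   */
    recv_msg(fd, &reply, sizeof reply);  /* the "return" (blocks) */
    return (int32_t)ntohl(reply);
}
```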
5. Remote method invocation
Fancy-pants Java RPC. The problem with RPC for OO languages is that arguments might be objects. RPC would send a bitwise copy of the object, but complex objects (particularly objects with state) can't be marshaled. So instead we replace each object in P that is included as an argument with a delegate, a new object that lives on Q but forwards any method invocations back to its progenitor on P (via RMI again, now running in reverse). Note that allowing these reverse calls means that P can't block completely during an RMI or we'll get deadlock. This is handled by giving each process a listener thread that processes incoming RMI requests, spawning worker threads to carry out the actual operations.
6. Efficiency issues
Any form of IPC can be expensive, since dealing with multiple processes probably involves both context switches and copying data. One way to deal with the cost is bundling: many messages or procedure calls can be packed together so that the overhead is amortized. This is done, for example, by the X11 protocol and by the HTTP KeepAlive mechanism. But this usually requires user cooperation or at a minimum asynchronous communication (so the kernel can bundle the messages for the user).