1. Overview
This assignment has three parts for you to implement:
- A simple IPC mechanism.
- A disk cache system.
- A filesystem.
The IPC mechanism and the cache system can be implemented independently. The filesystem will depend on the cache system (and possibly the IPC system as well).
2. What you start with
Because we are dealing with relatively high-level structures, we will ask you to build them at the userspace level, in processes running on top of the standard Linux kernel. Nonetheless, we have set things up to look somewhat like what you would see inside a self-contained kernel.
There is a single 1 MiB shared-memory block that should be used for all communication and synchronization between processes. The function kmem_get in kmem.c or kmem in utils.c can be used to access this block.
There is a simulated disk (represented as a file in the real filesystem) that you should implement your disk cache and filesystem on top of. This simulated disk provides low-level operations disk_read and disk_write in disk.c for reading and writing, both of which are synchronous. Since they don't return until a block is actually read from or written to the physical disk, they are pretty slow. Part of the job of your cache system is to speed up disk operations by serving repeated reads from the cache and by delaying writes to disk.
You can use the mutexes, condition variables, and (memory-based) semaphores provided by the pthreads library for synchronization. We have provided wrappers in utils.c for the initialization routines for these synchronization primitives that allow them to be used in the shared-memory block. You can find out more about how to use them from their man pages, e.g. man pthread_mutex_lock, man pthread_cond_wait, man sem_overview.
You should not use any other mechanisms to communicate between processes.
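The wrappers in utils.c take care of the details, but the key point is that a mutex, condition variable, or semaphore placed in the shared-memory block must be initialized as process-shared. The sketch below shows what that involves using the raw pthreads and POSIX semaphore calls; the helper names here are invented for illustration and are not the actual utils.c interface.

    #include <pthread.h>
    #include <semaphore.h>

    /* Illustration only: initializing primitives that live in the shared
     * block so that more than one process can use them.  The assignment's
     * real wrappers are in utils.c. */
    static void shared_mutex_init(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    static void shared_cond_init(pthread_cond_t *c)
    {
        pthread_condattr_t attr;
        pthread_condattr_init(&attr);
        pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_cond_init(c, &attr);
        pthread_condattr_destroy(&attr);
    }

    static void shared_sem_init(sem_t *s, unsigned value)
    {
        sem_init(s, 1, value);   /* nonzero second argument: shared between processes */
    }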
Several test programs are given. Run make test to build and run all the standard test programs; if your implementation works, make should finish without signaling an error (it's OK if it ignores an error when trying to read test_not_found). Run make run_sieve to run the Sieve of Eratosthenes demo, which is intended to flush out synchronization bugs; this should also finish without any errors. (You may want to send the output to a file.)
3. What you need to write
3.1. mbox.c and mbox_internal.h
These files implement a simple message-passing system.
- mbox_send
This should send a message with the given length (possibly 0). It should block if the queue is full. Since there is no mechanism for returning errors, you should do something reasonable if count is too big or too small.
- mbox_recv
Receives a message, copying it into the provided buffer and storing its size in *count. It should block if no messages are available.
- mbox_init
Initializes the mboxes data structure that you defined in mbox_internal.h.
See mbox.h for details on arguments.
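As a starting point, one possible shape for mbox_internal.h is sketched below: each mailbox is a fixed-size ring buffer of message slots protected by a process-shared mutex, with condition variables for senders and receivers to block on. The constants, field names, and number of mailboxes are assumptions for illustration, not part of the specification.

    #include <pthread.h>
    #include <stddef.h>

    #define MBOX_SLOTS    8      /* messages buffered per mailbox (assumed) */
    #define MBOX_MSG_MAX  128    /* maximum message length (assumed)        */
    #define MBOX_COUNT    16     /* number of mailboxes (assumed)           */

    struct mbox_message {
        size_t length;                      /* 0 is a legal length          */
        char   data[MBOX_MSG_MAX];
    };

    struct mbox {
        pthread_mutex_t lock;               /* protects everything below    */
        pthread_cond_t  not_full;           /* senders wait here            */
        pthread_cond_t  not_empty;          /* receivers wait here          */
        int head;                           /* next slot to receive from    */
        int tail;                           /* next slot to send into       */
        int count;                          /* messages currently queued    */
        struct mbox_message slots[MBOX_SLOTS];
    };

    /* The whole array lives in the 1 MiB shared-memory block. */
    struct mbox_table {
        struct mbox boxes[MBOX_COUNT];
    };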
3.2. cache.c and cache_internal.h
The cache consists of a set of frames backed by disk blocks. It is accessed primarily by the cache_pin and cache_release functions, which behave a bit like malloc and free. Reading and writing cached data involves standard C pointer operations. Since we do not have a paging system to track writes for us (and I was too lazy to fake write detection using mprotect), cached pages must be marked dirty by hand when written to (only the filesystem should ever have to do this).
- cache_pin
This pins a disk block in the cache. As long as the disk block is pinned, it can't be replaced (although its data may still be written to disk). Returns a pointer to the shared-memory address that holds the disk block. cache_pin may be called more than once on the same block; if this happens, each process should get the same address, and the block should not be released until cache_release has been called once for each cache_pin.
- cache_mark_dirty
Marks a pinned block as dirty.
- cache_release
Undoes cache_pin. The second argument can be set to 1 to mark the block as dirty before releasing it; this is equivalent to calling cache_mark_dirty first, but encourages the user to think about whether they are making blocks dirty.
- cache_sync
Writes all dirty blocks back to disk.
- cache_get_statistics
Returns statistics on cache hits, misses, and flushes since the last call to cache_init.
- cache_init
Initializes the cache data structure that you defined in cache_internal.h.
You should implement some reasonable page-replacement algorithm for your cache, where by reasonable we mean something that doesn't touch the disk after an initial load-up phase if the processes are using a set of blocks that fit in the cache (so FIFO, random, and the notorious MRU are not reasonable by this standard). Since the simulated disk driver uses synchronous I/O to bypass Linux's buffering, it should be pretty obvious if you are not caching enough. Cached blocks should not be written back to disk unless (a) they are marked as dirty, and (b) either the page-replacement algorithm needs space or the user calls cache_sync.
For more details on arguments, see cache.h.
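One reasonable organization for cache_internal.h, sketched below under assumptions (the block size, frame count, and field names are invented; the real block size comes from disk.h), is a frame table in shared memory where each frame records the disk block it holds, a pin count, a dirty bit, and a last-use counter for LRU replacement.

    #include <pthread.h>

    #define BLOCK_SIZE    512    /* assumed; use the value from disk.h      */
    #define CACHE_FRAMES  64     /* assumed number of frames in the cache   */

    struct cache_frame {
        int      block;              /* disk block held here, -1 if free        */
        int      pin_count;          /* > 0 means the frame may not be replaced */
        int      dirty;              /* must be written back before reuse       */
        unsigned last_used;          /* timestamp for LRU replacement           */
        char     data[BLOCK_SIZE];   /* the cached contents                     */
    };

    struct cache {
        pthread_mutex_t lock;        /* one coarse lock over the whole cache    */
        pthread_cond_t  unpinned;    /* wait here when every frame is pinned    */
        unsigned clock;              /* source of last_used timestamps          */
        unsigned hits, misses, flushes;
        struct cache_frame frames[CACHE_FRAMES];
    };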
3.3. fs.c and fs_internal.h
This is the most complicated part of the system, in terms of the number of entry points.
The basic idea is to provide a flat filesystem with 16-character filenames. To make life easier, files cannot be destroyed or truncated (so the free list can be pretty simple). A new file can be created using fs_create. An existing file can be opened using fs_open; note that fs_open should not create a file if it doesn't exist already. Any file opened by fs_open will have its current position set to the beginning of the file; if you want to append to a file, you will need to call fs_seek.
You can read from or write to a file using fs_read and fs_write. These behave much like standard Posix read and write; in particular, they may only partially complete the requested operation. This can occur in fs_read if you read past the end of a file. In fs_write, this should only occur if you run out of space. Both functions return the number of bytes successfully read or written. The functions fs_read and fs_write should be atomic operations; a process should not be able to see only part of the results of an otherwise successful write, and it should be the case that once a write completes, all subsequent reads see its results.
The file position can be adjusted using fs_seek. File positions should be local to each process; it should be possible for process P to work on one part of a file at the same time process Q is working somewhere else. The corresponding function fs_tell returns the current file position. The length of a file can be determined by calling fs_eof.
The function fs_used should return the number of disk blocks used by the filesystem.
Two initialization functions should be provided: fs_init is called once by kinit to initialize the shared-memory data structures you define in fs_internal.h; fs_format can be called at any time to reformat the filesystem. Formatting the disk should have the effect of deleting all files and resetting the filesystem to its initial state. You don't actually have to wipe out all the data you've written to do this; it is enough to clear out directory structures.
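One possible layout, sketched below, keeps the directory in a fixed run of blocks at the start of the disk; because files are never deleted or truncated, free space can be managed with a single next-free-block counter, and per-process state such as the current position of each open file stays in that process's private memory. The constants and field names here are assumptions for illustration; the real entry points and argument types are in fs.h.

    #include <stdint.h>

    #define FS_NAME_LEN     16
    #define FS_MAX_FILES    64   /* directory capacity (assumed)             */
    #define FS_FILE_BLOCKS  32   /* direct block pointers per file (assumed) */

    /* On disk: a superblock plus a fixed-size directory occupying the
     * first few blocks of the disk. */
    struct fs_dirent {
        char     name[FS_NAME_LEN];      /* not necessarily NUL-terminated   */
        uint32_t size;                   /* file length in bytes             */
        uint32_t nblocks;                /* valid entries in blocks[]        */
        uint32_t blocks[FS_FILE_BLOCKS]; /* direct pointers, grown on demand */
        uint32_t in_use;                 /* 0 means this slot is free        */
    };

    struct fs_superblock {
        uint32_t magic;                  /* detects an unformatted disk      */
        uint32_t next_free_block;        /* bump allocator: files are never
                                            deleted, so freed blocks never
                                            need to be tracked               */
        struct fs_dirent dir[FS_MAX_FILES];
    };

    /* Per process, in private memory: file positions are local, so two
     * processes can work on different parts of the same file at once. */
    struct fs_open_file {
        int      dirent;                 /* directory index, -1 if unused    */
        uint32_t position;               /* this process's current offset    */
    };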
3.4. README
You should submit a README file documenting the design of your modules and giving the results of your measurements (see below).
4. Measurements
In addition to writing these mechanisms, we will ask you to measure all of the following:
- The average time to deliver a short message sent between two processes via your IPC system.
- The average times to handle cache hits and cache misses in cache_pin, where all blocks are released immediately after being marked dirty.
- The average time to write a single byte to a file, where the average is taken over some large number of bytes written to the same file starting in an empty filesystem. You should include the time needed to flush any pages to disk with the cache_sync call.
You may need to write your own test programs to measure these quantities. You should use wall-clock time on a lightly loaded Zoo node as a measure and document the properties of the machine you are running on (cat /proc/cpuinfo may be handy for this). You should attempt as much as possible to isolate the costs of what you are measuring from other costs that may be incurred.
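A minimal wall-clock timing pattern is sketched below (the operation being timed and the iteration count are placeholders); running the measured operation many times and dividing by the count helps separate its cost from one-time startup costs.

    #include <stdio.h>
    #include <time.h>

    /* Sketch of a wall-clock timing loop.  Replace run_one_operation()
     * with the call being measured (e.g. one fs_write of a single byte). */
    static void run_one_operation(void) { /* ... */ }

    int main(void)
    {
        const long iterations = 100000;   /* large enough to smooth out noise */
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iterations; i++)
            run_one_operation();
        clock_gettime(CLOCK_MONOTONIC, &end);

        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("average: %.9f seconds per operation\n", elapsed / iterations);
        return 0;
    }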
5. Design review
For your design review, you should have designed the data structures needed to support the mbox and cache systems and should have a written description of the structure of your filesystem, including the data stored on disk, the data stored in shared memory, and the data stored in each process's private address space.
6. Extra credit
We are offering up to two points extra credit for non-trivial extensions of the basic system. You can get 1 point for adding file deletion and truncation to the file system. You can get an additional point for adding a new feature of comparable complexity (check with us if you aren't sure if what you want to do is exciting enough). It is your responsibility to clearly document your new features and provide tools to demonstrate that they work.
7. Files
Filename | Modify? | Description
--- | --- | ---
cache.h | no | Describes interface to cache system
cache_internal.h | yes | Defines internal data structures for cache system
cache.c | yes | Implementation of cache system
cache_sync.c | no | Utility for flushing cache
cache_stats.c | no | Utility for reporting cache statistics
disk.h | no | Interface to simulated disk
disk.c | no | Implementation of simulated disk
fs.h | no | Interface to filesystem
fs_internal.h | yes | Internal data structures for filesystem
fs.c | yes | Implementation of filesystem
kinit.c | maybe | Initializes shared-memory data structures
klib.h | no | Wrapper for kernel system interfaces
kmem.h | no | Interface to shared memory
kmem.c | no | Implementation of shared memory
kstruct.h | maybe | Organization of shared memory
Makefile | no | Makefile
mbox.h | no | IPC interface
mbox_internal.h | yes | IPC data structures
mbox.c | yes | IPC implementation
read_file.c | no | Copies files to stdout
read_mbox.c | no | Echoes messages to stdout
sieve.c | no | Parallel Sieve of Eratosthenes: test with make run_sieve
primes.txt | no | A list of all primes < 10000, if you want to compare with the results of sieve
test_cache.c | no | Test code for cache system
test_disk.c | no | Test code for disk
test_fs.c | no | Test code for filesystem
test_kmem.c | no | Test code for kmem
test_mbox.c | no | Test code for IPC system
utils.h | no | Handy utility functions
utils.c | no | Handy utility functions
write_file.c | no | Copy stdin to file
write_mbox.c | no | Send message passed in argv
README | yes | Documentation
8. Notes
Normally you will run kinit to restart the system. This will call your initialization routines to set up the shared memory structure. Note that shared memory contents persist even if you don't have any processes running (they are hidden in /dev/shm/kmem<your uid>). This means that you can't make any assumptions about the initial state of shared memory. Note also that kinit doesn't kill any processes for you, and that odd things may happen if you leave old processes running with access to the shared memory. Running make kill should wipe everything out, but check with ps just in case.
Similarly, the state of the disk persists (it's stored in /tmp/disk<your uid>.img). Some programs (like test_fs) will format the disk for you. Most won't, so you will need to run the fs_format program before tinkering with the disk.
The disk driver does not protect against simultaneous disk operations; instead, it punishes them. You are responsible for making sure that only one process accesses the disk at a time. You may not modify the internal disk_lock and disk_unlock routines for this purpose, nor may you use Linux file locks; instead, use pthreads mutexes or some such structure as appropriate in cache.c.
The simplest design for the cache system is probably to have each process update the cache as needed itself, with enough locking to prevent confusion.
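Building on the cache structures sketched in section 3.2, cache_pin in that design might follow roughly the control flow below. The helper functions are hypothetical, the exact signature of cache_pin is defined in cache.h, and the disk transfers are left as comments since their interface is in disk.h; note that doing disk I/O while holding the cache lock also satisfies the one-process-at-a-time rule above.

    /* Sketch only: assumes a global "struct cache *cache" pointing into the
     * shared-memory block, using the frame-table types sketched in 3.2. */
    void *cache_pin(int block)
    {
        pthread_mutex_lock(&cache->lock);
        for (;;) {
            struct cache_frame *f = find_frame_holding(block);   /* hypothetical helper */
            if (f != NULL) {                     /* hit */
                cache->hits++;
                f->pin_count++;
                f->last_used = ++cache->clock;
                pthread_mutex_unlock(&cache->lock);
                return f->data;
            }
            f = choose_unpinned_lru_frame();     /* hypothetical helper */
            if (f == NULL) {                     /* every frame is pinned */
                pthread_cond_wait(&cache->unpinned, &cache->lock);
                continue;                        /* re-scan after waking  */
            }
            cache->misses++;
            if (f->dirty) {
                cache->flushes++;
                /* write f->data back to block f->block (disk_write, see disk.h) */
                f->dirty = 0;
            }
            f->block = block;
            /* read the requested block from disk into f->data (disk_read, see disk.h) */
            f->pin_count = 1;
            f->last_used = ++cache->clock;
            pthread_mutex_unlock(&cache->lock);
            return f->data;
        }
    }

cache_release would then decrement pin_count under the same lock, set the dirty flag if its second argument asks for it, and signal unpinned when the count reaches zero.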