1. Overview
This assignment has three parts for you to implement:
- A simple IPC mechanism.
- A disk cache system.
- A filesystem.
The IPC mechanism and the cache system can be implemented independently. The filesystem will depend on the cache system (and possibly the IPC system as well).
2. What you start with
Because we are dealing with relatively high-level structures, we will ask you to build them at the userspace level, in processes running on top of the standard Linux kernel. Nonetheless, we have set things up to look somewhat like what you would see inside a self-contained kernel.
There is a single 1 MiB shared-memory block that should be used for all communication and synchronization between processes. The function kmem_get in kmem.c or kmem in utils.c can be used to access this block.
There is a simulated disk (represented as a file in the real filesystem) that you should implement your disk cache and filesystem on top of. This simulated disk provides low-level operations disk_read and disk_write in disk.c for reading and writing, both of which are synchronous. Since they don't return until a block is actually read from or written to the physical disk, they are pretty slow. Part of the job of your cache system is to speed up disk operations by serving repeated reads from the cache and by delaying writes to disk.
You can use the mutexes, condition variables, and (memory-based) semaphores provided by the pthreads library for synchronization. We have provided wrappers in utils.c for the initialization routines for these synchronization primitives that allow them to be used in the shared-memory block. You can find out more about how to use them from their man pages, e.g. man pthread_mutex_lock, man pthread_cond_wait, man sem_overview.
You should not use any other mechanisms to communicate between processes.
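The wrappers in utils.c take care of the details, but the key point is that a mutex, condition variable, or semaphore placed in the shared-memory block must be initialized as process-shared. The sketch below shows what that involves using the raw pthreads and POSIX semaphore calls; the helper names here are invented for illustration and are not the actual utils.c interface.

    #include <pthread.h>
    #include <semaphore.h>

    /* Illustration only: initializing primitives that live in the shared
     * block so that more than one process can use them.  The assignment's
     * real wrappers are in utils.c. */
    static void shared_mutex_init(pthread_mutex_t *m)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    static void shared_cond_init(pthread_cond_t *c)
    {
        pthread_condattr_t attr;
        pthread_condattr_init(&attr);
        pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_cond_init(c, &attr);
        pthread_condattr_destroy(&attr);
    }

    static void shared_sem_init(sem_t *s, unsigned value)
    {
        sem_init(s, 1, value);   /* nonzero second argument: shared between processes */
    }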
Several test programs are given. Run make test to build and run all the standard test programs; if your implementation works, make should finish without signaling an error (it's OK if it ignores an error when trying to read test_not_found). Run make run_sieve to run the Sieve of Eratosthenes demo, which is intended to flush out synchronization bugs; this should also finish without any errors. (You may want to send the output to a file.)
3. What you need to write
3.1. mbox.c and mbox_internal.h
These files implement a simple message-passing system.
- mbox_send
This should send a message with the given length (possibly 0). It should block if the queue is full. Since there is no mechanism for returning errors, you should do something reasonable if count is too big or too small.
- mbox_recv
Receives a message, copying it into the provided buffer and storing its size in *count. It should block if no messages are available.
- mbox_init
Initializes the mboxes data structure that you defined in mbox_internal.h.
See mbox.h for details on arguments.
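As a starting point, one possible shape for mbox_internal.h is sketched below: each mailbox is a fixed-size ring buffer of message slots protected by a process-shared mutex, with condition variables for senders and receivers to block on. The constants, field names, and number of mailboxes are assumptions for illustration, not part of the specification.

    #include <pthread.h>
    #include <stddef.h>

    #define MBOX_SLOTS    8      /* messages buffered per mailbox (assumed) */
    #define MBOX_MSG_MAX  128    /* maximum message length (assumed)        */
    #define MBOX_COUNT    16     /* number of mailboxes (assumed)           */

    struct mbox_message {
        size_t length;                      /* 0 is a legal length          */
        char   data[MBOX_MSG_MAX];
    };

    struct mbox {
        pthread_mutex_t lock;               /* protects everything below    */
        pthread_cond_t  not_full;           /* senders wait here            */
        pthread_cond_t  not_empty;          /* receivers wait here          */
        int head;                           /* next slot to receive from    */
        int tail;                           /* next slot to send into       */
        int count;                          /* messages currently queued    */
        struct mbox_message slots[MBOX_SLOTS];
    };

    /* The whole array lives in the 1 MiB shared-memory block. */
    struct mbox_table {
        struct mbox boxes[MBOX_COUNT];
    };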
3.2. cache.c and cache_internal.h
The cache consists of a set of frames backed by disk blocks. It is accessed primarily by the cache_pin and cache_release functions, which behave a bit like malloc and free. Reading and writing cached data involves standard C pointer operations. Since we do not have a paging system to track writes for us (and I was too lazy to fake write detection using mprotect), cached pages must be marked dirty by hand when written to (only the filesystem should ever have to do this).
- cache_pin
This pins a disk block in the cache. As long as the disk block is pinned, it can't be replaced (although its data may still be written to disk). Returns a pointer to the shared-memory address that holds the disk block. cache_pin may be called more than once on the same block; if this happens, each process should get the same address, and the block should not be released until cache_release has been called once for each cache_pin.
- cache_mark_dirty
Marks a pinned block as dirty.
- cache_release
Undoes cache_pin. The second argument can be set to 1 to mark the block as dirty before releasing it; this is equivalent to calling cache_mark_dirty first, but encourages the user to think about whether they are making blocks dirty.
- cache_sync
Writes all dirty blocks back to disk.
- cache_get_statistics
Returns statistics on cache hits, misses, and flushes since the last call to cache_init.
- cache_init
Initializes the cache data structure that you defined in cache_internal.h.
You should implement some reasonable page-replacement algorithm for your cache, where by reasonable we mean something that doesn't touch the disk after an initial load-up phase if the processes are using a set of blocks that fit in the cache (so FIFO, random, and the notorious MRU are not reasonable by this standard). Since the simulated disk driver uses synchronous I/O to bypass Linux's buffering, it should be pretty obvious if you are not caching enough. Cached blocks should not be written back to disk unless (a) they are marked as dirty, and (b) either the page-replacement algorithm needs space or the user calls cache_sync.
For more details on arguments, see cache.h.
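One reasonable organization for cache_internal.h, sketched below under assumptions (the block size, frame count, and field names are invented; the real block size comes from disk.h), is a frame table in shared memory where each frame records the disk block it holds, a pin count, a dirty bit, and a last-use counter for LRU replacement.

    #include <pthread.h>

    #define BLOCK_SIZE    512    /* assumed; use the value from disk.h      */
    #define CACHE_FRAMES  64     /* assumed number of frames in the cache   */

    struct cache_frame {
        int      block;              /* disk block held here, -1 if free        */
        int      pin_count;          /* > 0 means the frame may not be replaced */
        int      dirty;              /* must be written back before reuse       */
        unsigned last_used;          /* timestamp for LRU replacement           */
        char     data[BLOCK_SIZE];   /* the cached contents                     */
    };

    struct cache {
        pthread_mutex_t lock;        /* one coarse lock over the whole cache    */
        pthread_cond_t  unpinned;    /* wait here when every frame is pinned    */
        unsigned clock;              /* source of last_used timestamps          */
        unsigned hits, misses, flushes;
        struct cache_frame frames[CACHE_FRAMES];
    };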
3.3. fs.c and fs_internal.h
This is the most complicated part of the system, in terms of the number of entry points.
The basic idea is to provide a flat filesystem with 16-character filenames. To make life easier, files cannot be destroyed or truncated (so the free list can be pretty simple). A new file can be created using fs_create. An existing file can be opened using fs_open; note that fs_open should not create a file if it doesn't exist already. Any file opened by fs_open will have its current position set to the beginning of the file; if you want to append to a file, you will need to call fs_seek.
You can read from or write to a file using fs_read and fs_write. These behave much like standard Posix read and write; in particular, they may only partially complete the requested operation. This can occur in fs_read if you read past the end of a file. In fs_write, this should only occur if you run out of space. Both functions return the number of bytes successfully read or written. The functions fs_read and fs_write should be atomic operations; a process should not be able to see only part of the results of an otherwise successful write, and it should be the case that once a write completes, all subsequent reads see its results.
The file position can be adjusted using fs_seek. File positions should be local to each process; it should be possible for process P to work on one part of a file at the same time process Q is working somewhere else. The corresponding function fs_tell returns the current file position. The length of a file can be determined by calling fs_eof.
The function fs_used should return the number of disk blocks used by the filesystem.
Two initialization functions should be provided: fs_init is called once by kinit to initialize the shared-memory data structures you define in fs_internal.h; fs_format can be called at any time to reformat the filesystem. Formatting the disk should have the effect of deleting all files and resetting the filesystem to its initial state. You don't actually have to wipe out all the data you've written to do this; it is enough to clear out directory structures.
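One possible layout, sketched below, keeps the directory in a fixed run of blocks at the start of the disk; because files are never deleted or truncated, free space can be managed with a single next-free-block counter, and per-process state such as the current position of each open file stays in that process's private memory. The constants and field names here are assumptions for illustration; the real entry points and argument types are in fs.h.

    #include <stdint.h>

    #define FS_NAME_LEN     16
    #define FS_MAX_FILES    64   /* directory capacity (assumed)             */
    #define FS_FILE_BLOCKS  32   /* direct block pointers per file (assumed) */

    /* On disk: a superblock plus a fixed-size directory occupying the
     * first few blocks of the disk. */
    struct fs_dirent {
        char     name[FS_NAME_LEN];      /* not necessarily NUL-terminated   */
        uint32_t size;                   /* file length in bytes             */
        uint32_t nblocks;                /* valid entries in blocks[]        */
        uint32_t blocks[FS_FILE_BLOCKS]; /* direct pointers, grown on demand */
        uint32_t in_use;                 /* 0 means this slot is free        */
    };

    struct fs_superblock {
        uint32_t magic;                  /* detects an unformatted disk      */
        uint32_t next_free_block;        /* bump allocator: files are never
                                            deleted, so freed blocks never
                                            need to be tracked               */
        struct fs_dirent dir[FS_MAX_FILES];
    };

    /* Per process, in private memory: file positions are local, so two
     * processes can work on different parts of the same file at once. */
    struct fs_open_file {
        int      dirent;                 /* directory index, -1 if unused    */
        uint32_t position;               /* this process's current offset    */
    };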
3.4. README
You should submit a README file documenting the design of your modules and giving the results of your measurements (see below).
4. Measurements
In addition to writing these mechanisms, we will ask you to measure all of the following:
- The average time to deliver a short message sent between two processes via your IPC system.
- The average times to handle cache hits and cache misses in cache_pin, where all blocks are released immediately after being marked dirty.
- The average time to write a single byte to a file, where the average is taken over some large number of bytes written to the same file starting in an empty filesystem. You should include the time needed to flush any pages to disk with the cache_sync call.
You may need to write your own test programs to measure these quantities. You should use wall-clock time on a lightly loaded Zoo node as a measure and document the properties of the machine you are running on (cat /proc/cpuinfo may be handy for this). You should attempt as much as possible to isolate the costs of what you are measuring from other costs that may be incurred.
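A minimal wall-clock timing pattern is sketched below (the operation being timed and the iteration count are placeholders); running the measured operation many times and dividing by the count helps separate its cost from one-time startup costs.

    #include <stdio.h>
    #include <time.h>

    /* Sketch of a wall-clock timing loop.  Replace run_one_operation()
     * with the call being measured (e.g. one fs_write of a single byte). */
    static void run_one_operation(void) { /* ... */ }

    int main(void)
    {
        const long iterations = 100000;   /* large enough to smooth out noise */
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iterations; i++)
            run_one_operation();
        clock_gettime(CLOCK_MONOTONIC, &end);

        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("average: %.9f seconds per operation\n", elapsed / iterations);
        return 0;
    }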
5. Design review
For your design review, you should have designed the data structures needed to support the mbox and cache systems and should have a written description of the structure of your filesystem, including the data stored on disk, the data stored in shared memory, and the data stored in each process's private address space.
6. Extra credit
We are offering up to two points extra credit for non-trivial extensions of the basic system. You can get 1 point for adding file deletion and truncation to the file system. You can get an additional point for adding a new feature of comparable complexity (check with us if you aren't sure if what you want to do is exciting enough). It is your responsibility to clearly document your new features and provide tools to demonstrate that they work.
7. Files
Filename | Modify? | Description
--- | --- | ---
cache.h | no | Describes interface to cache system
cache_internal.h | yes | Defines internal data structures for cache system
cache.c | yes | Implementation of cache system
cache_sync.c | no | Utility for flushing cache
cache_stats.c | no | Utility for reporting cache statistics
disk.h | no | Interface to simulated disk
disk.c | no | Implementation of simulated disk
fs.h | no | Interface to filesystem
fs_internal.h | yes | Internal data structures for filesystem
fs.c | yes | Implementation of filesystem
kinit.c | maybe | Initializes shared-memory data structures
klib.h | no | Wrapper for kernel system interfaces
kmem.h | no | Interface to shared memory
kmem.c | no | Implementation of shared memory
kstruct.h | maybe | Organization of shared memory
Makefile | no | Makefile
mbox.h | no | IPC interface
mbox_internal.h | yes | IPC data structures
mbox.c | yes | IPC implementation
read_file.c | no | Copies files to stdout
read_mbox.c | no | Echoes messages to stdout
sieve.c | no | Parallel Sieve of Eratosthenes: test with make run_sieve
primes.txt | no | A list of all primes < 10000, if you want to compare with the results of sieve
test_cache.c | no | Test code for cache system
test_disk.c | no | Test code for disk
test_fs.c | no | Test code for filesystem
test_kmem.c | no | Test code for kmem
test_mbox.c | no | Test code for IPC system
utils.h | no | Handy utility functions
utils.c | no | Handy utility functions
write_file.c | no | Copy stdin to file
write_mbox.c | no | Send message passed in argv
README | yes | Documentation
8. Notes
Normally you will run kinit to restart the system. This will call your initialization routines to set up the shared memory structure. Note that shared memory contents persist even if you don't have any processes running (they are hidden in /dev/shm/kmem<your uid>). This means that you can't make any assumptions about the initial state of shared memory. Note also that kinit doesn't kill any processes for you, and that odd things may happen if you leave old processes running with access to the shared memory. Running make kill should wipe everything out, but check with ps just in case.
Similarly, the state of the disk persists (it's stored in /tmp/disk<your uid>.img). Some programs (like test_fs) will format the disk for you. Most won't, so you will need to run the fs_format program before tinkering with the disk.
The disk driver does not protect against simultaneous disk operations; instead, it punishes them. You are responsible for making sure that only one process accesses the disk at a time. You may not modify the internal disk_lock and disk_unlock routines for this purpose, nor may you use Linux file locks; instead, use pthreads mutexes or some such structure as appropriate in cache.c.
The simplest design for the cache system is probably to have each process update the cache as needed itself, with enough locking to prevent confusion.
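Building on the cache structures sketched in section 3.2, cache_pin in that design might follow roughly the control flow below. The helper functions are hypothetical, the exact signature of cache_pin is defined in cache.h, and the disk transfers are left as comments since their interface is in disk.h; note that doing disk I/O while holding the cache lock also satisfies the one-process-at-a-time rule above.

    /* Sketch only: assumes a global "struct cache *cache" pointing into the
     * shared-memory block, using the frame-table types sketched in 3.2. */
    void *cache_pin(int block)
    {
        pthread_mutex_lock(&cache->lock);
        for (;;) {
            struct cache_frame *f = find_frame_holding(block);   /* hypothetical helper */
            if (f != NULL) {                     /* hit */
                cache->hits++;
                f->pin_count++;
                f->last_used = ++cache->clock;
                pthread_mutex_unlock(&cache->lock);
                return f->data;
            }
            f = choose_unpinned_lru_frame();     /* hypothetical helper */
            if (f == NULL) {                     /* every frame is pinned */
                pthread_cond_wait(&cache->unpinned, &cache->lock);
                continue;                        /* re-scan after waking  */
            }
            cache->misses++;
            if (f->dirty) {
                cache->flushes++;
                /* write f->data back to block f->block (disk_write, see disk.h) */
                f->dirty = 0;
            }
            f->block = block;
            /* read the requested block from disk into f->data (disk_read, see disk.h) */
            f->pin_count = 1;
            f->last_used = ++cache->clock;
            pthread_mutex_unlock(&cache->lock);
            return f->data;
        }
    }

cache_release would then decrement pin_count under the same lock, set the dirty flag if its second argument asks for it, and signal unpinned when the count reaches zero.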