1. The problem
Suppose we have an abstract data type that represents some sort of container, such as a list or dictionary. We'd like to be able to do something to every element of the container; say, count them up. How can we write operations on the abstract data type to allow this, without exposing the implementation?
To make the problem more concrete, let's suppose we have an abstract data type that represents the set of all non-negative numbers less than some fixed bound. The core of its interface might look like this:
1.1. nums.h
1 /*
2 * Abstract data type representing the set of numbers from 0 to
3 * bound-1 inclusive, where bound is passed in as an argument at creation.
4 */
5 typedef struct nums *Nums;
6
7 /* Create a Nums object with given bound. */
8 Nums nums_create(int bound);
9
10 /* Destructor */
11 void nums_destroy(Nums);
12
13 /* Returns 1 if nums contains element, 0 otherwise */
14 int nums_contains(Nums nums, int element);
1.2. nums.c
1 #include <stdlib.h>
2 #include "nums.h"
3
4 struct nums {
5 int bound;
6 };
7
8 Nums nums_create(int bound)
9 {
10 struct nums *n;
11 n = malloc(sizeof(*n));
12 n->bound = bound;
13 return n;
14 }
15
16 void nums_destroy(Nums n) { free(n); }
17
18 int nums_contains(Nums n, int element)
19 {
20 return element >= 0 && element < n->bound;
21 }
From the outside, a Nums acts like the set of numbers from 0 to bound - 1; nums_contains will insist that it contains any int that is in this set and contains no int that is not in this set.
Let's suppose now that we want to loop over all elements of some Nums, say to add them together. In particular, we'd like to implement the following pseudocode, where nums is some Nums instance:
One way to do this would be to build the loop into some operation in nums.c, including its body. But we'd like to be able to substitute any body for the sum += i line. Since we can't see the inside of a Nums, we need to have some additional operation or operations on a Nums that lets us write the loop. How can we do this?
2. Option 1: Function that returns a sequence
A data-driven approach might be to add a nums_contents function that returns a sequence of all elements of some instance, perhaps in the form of an array or linked list. The advantage of this approach is that once you have the sequence, you don't need to worry about changes to (or destruction of) the original object. The disadvantage is that you have to deal with storage management issues, and have to pay the costs in time and space of allocating and filling in the sequence. This can be particularly onerous for a "virtual" container like Nums, since we could conceivably have a Nums instance with billions of elements.
Bearing these facts in mind, let's see what this approach might look like. We'll define a new function nums_contents that returns an array of ints, terminated by a -1 sentinel:
We might use it like this:
Despite the naturalness of the approach, returning a sequence in this case leads to the most code complexity of the options we will examine.
3. Option 2: Iterator with first/done/next operations
If we don't want to look at all the elements at once, but just want to process them one at a time, we can build an iterator. An iterator is an object that allows you to step through the contents of another object, by providing convenient operations for getting the first element, testing when you are done, and getting the next element if you are not. In C, we try to design iterators to have operations that fit well in the top of a for loop.
For the Nums type, we'll make each Nums its own iterator. The new operations are given here:
And we use them like this:
Not only do we completely avoid the overhead of building a sequence, we also get much cleaner code. It helps in this case that all we need to find the next value is the previous one; for a more complicated problem we might have to create and destroy a separate iterator object that holds the state of the loop. But for many tasks in C, the first/done/next idiom is a pretty good one.
4. Option 3: Iterator with function argument
Suppose we have a very complicated iteration, say one that might require several nested loops or even a recursion to span all the elements. In this case it might be very difficult to provide first/done/next operations, because it would be hard to encode the state of the iteration so that we could easily pick up in the next operation where we previously left off. What we'd really like to do is to be able to plug arbitrary code into the innermost loop of our horrible iteration procedure, and do it in a way that is reasonably typesafe and doesn't violate our abstraction barrier. This is a job for function pointers, and an example of the functional programming style in action.
We'll define a nums_foreach function that takes a function as an argument:
The f_data argument is used to pass extra state into the passed-in function f; it's a void * because we want to let f work on any sort of extra state it likes.
Now to do our summation, we first define an extra function sum_helper, which adds each element to an accumulator pointed to by f_data:
We then feed sum_helper to the nums_foreach function:
There is a bit of a nuisance in having to define the auxiliary sum_helper function and in all the casts to and from void, but on the whole the complexity of this solution is not substantially greater than the first/done/next approach. Which you should do depends on whether it's harder to encapsulate the state of the iterator (in which case the functional approach is preferable) or of the loop body (in which case the first/done/next approach is preferable), and whether you need to bail out of the loop early (which would require special support from the foreach procedure, perhaps checking a return value from the function). However, it's almost always straightforward to encapsulate the state of a loop body; just build a struct containing all the variables that it uses, and pass a pointer to this struct as f_data.
5. Appendix: Complete code for Nums
Here's a grand unified Nums implementation that provides all the interfaces we've discussed:
5.1. nums.h
1 /*
2 * Abstract data type representing the set of numbers from 0 to
3 * bound-1 inclusive, where bound is passed in as an argument at creation.
4 */
5 typedef struct nums *Nums;
6
7 /* Create a Nums object with given bound. */
8 Nums nums_create(int bound);
9
10 /* Destructor */
11 void nums_destroy(Nums);
12
13 /* Returns 1 if nums contains element, 0 otherwise */
14 int nums_contains(Nums nums, int element);
15 /*
16 * Returns a freshly-malloc'd array containing all elements of n,
17 * followed by a sentinel value of -1.
18 */
19 int *nums_contents(Nums n);
20
21 /* Three-part iterator */
22 int nums_first(Nums n); /* returns smallest element in n */
23 int nums_done(Nums n, int val); /* returns 1 if val is past end */
24 int nums_next(Nums n, int val); /* returns next value after val */
25
26 /* Call f on every element of n with with extra argument f_data */
27 void nums_foreach(Nums n, void (*f)(int, void *f_data), void *f_data);
5.2. nums.c
1 #include <stdlib.h>
2 #include "nums.h"
3
4 struct nums {
5 int bound;
6 };
7
8 Nums nums_create(int bound)
9 {
10 struct nums *n;
11 n = malloc(sizeof(*n));
12 n->bound = bound;
13 return n;
14 }
15
16 void nums_destroy(Nums n) { free(n); }
17
18 int nums_contains(Nums n, int element)
19 {
20 return element >= 0 && element < n->bound;
21 }
22
23 int *
24 nums_contents(Nums n)
25 {
26 int *a;
27 int i;
28 a = malloc(sizeof(*a) * (n->bound + 1));
29 for(i = 0; i < n->bound; i++) a[i] = i;
30 a[n->bound] = -1;
31 return a;
32 }
33
34 int nums_first(Nums n) { return 0; }
35 int nums_done(Nums n, int val) { return val >= n->bound; }
36 int nums_next(Nums n, int val) { return val+1; }
37
38 void nums_foreach(Nums n, void (*f)(int, void *), void *f_data)
39 {
40 int i;
41 for(i = 0; i < n->bound; i++) f(i, f_data);
42 }
And here's some test code to see if it all works:
5.3. test-nums.c
1 #include <stdio.h>
2 #include <setjmp.h>
3 #include <signal.h>
4 #include <unistd.h>
5 #include <stdlib.h>
6
7 #include "nums.h"
8 #include "tester.h"
9
10 static void sum_helper(int i, void *f_data)
11 {
12 *((int *) f_data) += i;
13 }
14
15 int
16 main(int argc, char **argv)
17 {
18 Nums nums;
19 int sum;
20 int *contents;
21 int *p;
22 int i;
23
24 tester_init();
25
26 TRY { nums = nums_create(100); } ENDTRY;
27 TEST(nums_contains(nums, -1), 0);
28 TEST(nums_contains(nums, 0), 1);
29 TEST(nums_contains(nums, 1), 1);
30 TEST(nums_contains(nums, 98), 1);
31 TEST(nums_contains(nums, 99), 1);
32 TEST(nums_contains(nums, 100), 0);
33
34 sum = 0;
35 contents = nums_contents(nums);
36 for(p = contents; *p != -1; p++) {
37 sum += *p;
38 }
39 free(contents);
40 TEST(sum, 4950);
41
42 sum = 0;
43 for(i = nums_first(nums); !nums_done(nums, i); i = nums_next(nums, i)) {
44 sum += i;
45 }
46 TEST(sum, 4950);
47
48 sum = 0;
49 nums_foreach(nums, sum_helper, (void *) &sum);
50 TEST(sum, 4950);
51 tester_report(stdout, argv[0]);
52 return tester_result();
53 }
$ make test gcc -g3 -ansi -pedantic -Wall -c -o test-nums.o test-nums.c gcc -g3 -ansi -pedantic -Wall -c -o nums.o nums.c gcc -g3 -ansi -pedantic -Wall -c -o tester.o tester.c gcc -g3 -ansi -pedantic -Wall -o test-nums test-nums.o nums.o tester.o ./test-nums OK!