[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

Linked lists are about the simplest data structure beyond arrays. They aren't very efficient for many purposes, but have very good performance for certain specialized applications.

1. Stacks and linked lists

On my desk is a pile of books that I am supposed to put away on my bookshelf. I don't care much about how they are organized, I just want to be able to dump a book quickly so that I can later go through and put all of them back at once. Because it's easiest just to dump each book on the top of the pile, I have effectively built a data structure called a stack, which supports a push operation (add a book to the top of the pile) and a pop operation (take the top book off and return it).

To build a stack in a program, I can adopt a similar strategy. When a new item appears, I will box it up inside a struct that will also include a pointer to the item below it in the stack. Since that item will point to the item below it, and so on, I end up with an arbitrarily long chain of pointers (usually ending in a null pointer).

The cost of push and pop operations are both O(1): they don't depend on the number of elements in the stack, since they are only working on the top element or two. Doing almost anything else (e.g., finding the k-th element of the stack or searching for a particular value) is likely to be much more expensive, O(n) in the worst case. The reason is that unlike an array, a linked list scatters its contents throughout memory, and the only way to get at a particular element is to crawl through all the ones that precede it.

1.1. Implementation

A very concrete implementation of a stack using a linked list might look like this:

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 /* Functional-style stack */
   5 /* All operations consume the old value and return the new value of the stack */
   6 
   7 typedef struct stack {
   8     char *book;         /* malloc'd name of book */
   9     struct stack *next; /* next item in stack, or 0 if there aren't any more */
  10 } *Stack;
  11 
  12 #define EMPTY_STACK (0)
  13 
  14 /* create a new empty stack */
  15 Stack
  16 stackCreate(void)
  17 {
  18     return EMPTY_STACK;
  19 }
  20 
  21 /* push a new book on the stack */
  22 /* copies second argument book (so you don't need to keep it around) */
  23 /* returns 0 on allocation error */
  24 Stack
  25 stackPush(Stack old, const char *book)
  26 {
  27     Stack new;
  28 
  29     new = malloc(sizeof(*new));
  30     if(new == 0) return 0;
  31 
  32     new->next = old;
  33     new->book = strdup(book);
  34 
  35     if(new->book == 0) {
  36         free(new);
  37         return 0;
  38     }
  39 
  40     return new;
  41 }
  42 
  43 /* pops a book off the stack */
  44 /* returns new tail of stack, stores book in *book */
  45 /* *book is malloc'd data that the caller should free */
  46 /* Stores 0 in *book if stack is empty */
  47 Stack
  48 stackPop(Stack old, char **book /*RETVAL*/)
  49 {
  50     Stack new;
  51 
  52     if(old == 0) {
  53         *book = 0;
  54         return 0;
  55     } else {
  56         new = old->next;
  57         *book = old->book;
  58         free(old);
  59         return new;
  60     }
  61 }
  62 
  63 /* frees a stack */
  64 void
  65 stackDestroy(Stack s)
  66 {
  67     char *book;
  68 
  69     while(s) {
  70         s = stackPop(s, &book);
  71         free(book);
  72     }
  73 }
functional_stack.c

The meat of this implementation is in stackPush and stackPop. These act like a constructor/destructor pair for individual stack elements. The stackPush funciton also does the work of linking the new element into the stack and copying the book argument. The choice to copy book is a design decision: it insulates the contents of the stack from any changes that might happen to book after stackPush exits (perhaps book is a buffer that is reused to read a new line from the input after each call to stackPush), but it may be more than what the user needs, and it forces the user to free the returned value in stackPop. An alternative would be to let the user call strdup for themselves.

1.2. A more complicated implementation

The simple implementation of a stack above doesn't really act like a single object; instead, it acts more like a family of constant values you can do arithmetic on, where applying stackPush or stackPop consumes some old value and generates a new one. For many applications, it is more natural to just think of having a single stack that changes over time.

We can do this with another layer of indirection. Instead of having a Stack be a pointer to the first element, we'll have it be a pointer to a pointer to the first element. This requires a little bit more work inside the implementation but looks nicer from the outside.

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 /* Imperative-style stack */
   5 /* All operations modify the stack in place */
   6 
   7 /* a Stack is a pointer to a pointer to the first element of a linked list */
   8 typedef struct stack {
   9     char *book;         /* malloc'd name of book */
  10     struct stack *next; /* next item in stack, or 0 if there aren't any more */
  11 } **Stack;
  12 
  13 /* create a new empty stack */
  14 Stack
  15 stackCreate(void)
  16 {
  17     Stack s;
  18     
  19     s = malloc(sizeof(struct stack *));
  20     *s = 0;
  21 
  22     return s;
  23 }
  24 
  25 /* push a new book on the stack */
  26 /* copies second argument book (so you don't need to keep it around) */
  27 /* returns 0 on allocation error or 1 on success */
  28 int
  29 stackPush(Stack s, const char *book)
  30 {
  31     struct stack *new;
  32 
  33     new = malloc(sizeof(*new));
  34 
  35     new->next = *s;
  36     new->book = strdup(book);
  37 
  38     if(new->book == 0) {
  39         free(new);
  40         return 0;
  41     }
  42 
  43     *s = new;
  44     return 1;
  45 }
  46 
  47 /* pops a book off the stack */
  48 /* returns 0 if stack is empty */
  49 char *
  50 stackPop(Stack s)
  51 {
  52     struct stack *new;
  53     char *book;
  54 
  55     if(*s == 0) {
  56         return 0;
  57     } else {
  58         book = (*s)->book;
  59 
  60         /* we have to save (*s)->next before we free it */
  61         new = (*s)->next;
  62         free(*s);
  63         *s = new;
  64         return book;
  65     }
  66 }
  67 
  68 /* returns true if s is empty */
  69 int
  70 stackEmpty(Stack s)
  71 {
  72     return (*s) == 0;
  73 }
  74 
  75 /* frees a stack */
  76 void
  77 stackDestroy(Stack s)
  78 {
  79     while(!stackEmpty(s)) {
  80         free(stackPop(s));
  81     }
  82 
  83     free(s);
  84 }
object_stack.c

Here we have added a new stackEmpty routine because it's no longer obvious how to check (and we might someday want to change Stack to some other type where testing (*s) == 0 is no longer the right way to do it).

Note that the structure of the linked list is exactly the same in both implementations: the only difference is that the "imperative" version adds an extra level of indirection at the beginning. Yet another alternative is to have a dummy stack element at the beginning and do pushes and pops starting with the second place in the stack (i.e., replace (*s) with s->next in the second implementation). This consumes more space (we end up allocating space to hold s->book even though we don't put anything there), but can make things simpler for some more complicated linked-list structures.

1.3. Building a stack out of an array

When the elements of a stack are small, or when a maximum number of elements is known in advance, it often makes sense to build a stack from an array (with a variable storing the index of the top element) instead of a linked list. The reason is that pushes and pops only require updating the stack pointer instead of calling malloc or free to allocate space, and pre-allocating is almost always faster than allocating as needed. This is the strategy used to store the function call stack in almost all programs (the exception is in languages like Scheme, where the call stack is allocated on the heap because stack frames may outlive the function call that creates them).

2. Looping over a linked list

Looping over a linked list is not hard if you have access to the next pointers. (For a more abstract way to do this see C/Iterators.)

Let's imagine somebody gave us a pointer to the first struct stack in a list; call this pointer first. Then we can write a loop like this that prints the contents of the stack:

   1 void
   2 stackPrint(struct stack *first)
   3 {
   4     struct stack *elt;
   5 
   6     for(elt = first; elt != 0; elt = elt->next) {
   7         puts(elt->book);
   8     }
   9 }

There's not a whole lot to notice here except that for is perfectly happy to iterate over something that isn't a range of integers. The running time is linear in the length of the list (O(n)).

3. Looping over a linked list backwards

What if we want to loop over a linked list backwards? The next pointers all go the wrong way, so we have to save a trail of breadcrumbs to get back. The safest way to do this is to reverse the original list into an auxiliary list:

   1 void
   2 stackPrintReversed(struct stack *first)
   3 {
   4     struct stack *elt;
   5     Stack s2;                   /* uses imperative implementation */
   6 
   7     s2 = stackCreate();
   8 
   9     for(elt = first; elt != 0; elt = elt->next) {
  10         s2 = stackPush(s2, elt->book);
  11     }
  12 
  13     stackPrint(s2);
  14     stackDestroy(s2);
  15 }

Pushing all the elements from the first list onto s2 puts the first element on the bottom, so when we print s2 out, it's in the reverse order of the original stack.

We can also write a recursive function that prints the elements backwards. This function effectively uses the function call stack in place of the extra stack s2 above.

   1 void
   2 stackPrintReversedRecursive(struct stack *first)
   3 {
   4     if(first != 0) {
   5         /* print the rest of the stack */
   6         stackPrintReversedRecursive(first->next);
   7 
   8         /* then print the first element */
   9         puts(first->book);
  10     }
  11 }

The code in stackPrintReversedRecursive is shorter than the code in stackPrintReversed, and it is likely to be faster since it doesn't require allocating a second stack and copying all the elements. But it will only work for small stacks: because the function call stack is really a fixed-size array, if the input to stackPrintReversedRecursive is too big the recursion will go too deep cause a stack overflow.

If we want to do this sort of thing a lot, we should build a doubly-linked list, with a pointer in each element both to the next element and the previous element instead of a singly-linked list (see below for more).

4. Queues

Stacks are last-in-first-out (LIFO) data structures: when we pop, we get the last item we pushed. What if we want a first-in-first-out (FIFO) data structure? Such a data structure is called a queue and can also be implemented by a linked list. The difference is that if we want O(1) time for both the enqueue (push) and dequeue (pop) operations, we must keep around pointers to both ends of the linked list.

Here is a simple queue holding ints:

   1 #include <stdlib.h>
   2 
   3 struct elt {
   4     int item;
   5     struct elt *next;
   6 };
   7 
   8 struct queue {
   9     struct elt *head;   /* first element in queue, or 0 if empty */
  10     struct elt *tail;   /* last element in queue */
  11 };
  12 
  13 typedef struct queue *Queue;
  14 
  15 Queue
  16 queueCreate(void)
  17 {
  18     Queue q;
  19 
  20     q = malloc(sizeof(*q));
  21 
  22     if(q) {
  23         q->head = 0;
  24     }
  25 
  26     return q;
  27 }
  28 
  29 void
  30 enqueue(Queue q, int item)
  31 {
  32     struct elt *e;
  33 
  34     e = malloc(sizeof(*e));
  35     if(e == 0) abort();
  36 
  37     e->item = item;
  38     e->next = 0;
  39     
  40     if(q->head == 0) {
  41         /* special case for empty queue */
  42         q->head = q->tail = e;
  43     } else {
  44         /* patch e in after tail */
  45         q->tail->next = e;
  46         q->tail = e;
  47     }
  48 }
  49 
  50 int
  51 queueEmpty(Queue q)
  52 {
  53     return q->head == 0;
  54 }
  55 
  56 #define EMPTY_QUEUE (-1)
  57 
  58 /* returns EMPTY_QUEUE if queue is empty */
  59 int
  60 dequeue(Queue q)
  61 {
  62     struct elt *e;
  63     int retval;
  64 
  65     if(queueEmpty(q)) {
  66         return EMPTY_QUEUE;
  67     } else {
  68         /* pop first element off */
  69         e = q->head;
  70         q->head = e->next;
  71 
  72         /* save its contents and free it */
  73         retval = e->item;
  74         free(e);
  75 
  76         return retval;
  77     }
  78 }
queue.c

It is a bit trickier to build a queue out of an array than to build a stack. The difference is that while a stack pointer can move up and down, leaving the base of the stack in the same place, a naive implementation of a queue would have head and tail pointers both marching ever onward across the array leaving nothing but empty cells in their wake. While it is possible to have the pointers wrap around to the beginning of the array when they hit the end, if the queue size is unbounded the tail pointer will eventually catch up to the head pointer. At this point (as in a stack that overflows), it is necessary to allocate more space and copy the old elements over.

5. Deques and doubly-linked lists

Suppose we want a data structure that represents a line of elements where we can push or pop elements at either end. Such a data structure is known as a deque (pronounced like "deck"), and can be implemented with all operations taking O(1) time by a doubly-linked list, where each element has a pointer to both its successor and its predecessor.

An ordinary singly-linked list is not good enough. The reason is that even if we keep a pointer to both ends as in a queue, when it comes time to pop an element off the tail, we have no pointer to its predecessor ready to hand; the best we can do is scan from the head until we get to an element whose successor is the tail, which takes O(n) time.

The solution is to build something like the code below. To avoid special cases, we use a lot of 2-element arrays of pointers, so that pushing or popping from either end of the queue is completely symmetric.

   1 typedef struct deque *Deque;
   2 
   3 #define FRONT (0)
   4 #define BACK (1)
   5 
   6 #define NUM_DIRECTIONS (2)
   7 
   8 #define EMPTY_QUEUE (-1)
   9 
  10 Deque dequeCreate(void);
  11 
  12 void dequePush(Deque d, int direction, int value);
  13 
  14 int dequePop(Deque d, int direction);
  15 
  16 int dequeEmpty(Deque d);
  17 
  18 void dequeDestroy(Deque d);
deque.h

   1 #include <stdlib.h>
   2 #include <assert.h>
   3 
   4 #include "deque.h"
   5 
   6 struct elt {
   7     int value;
   8     struct elt *next[NUM_DIRECTIONS];
   9 };
  10 
  11 struct deque {
  12     struct elt *head[NUM_DIRECTIONS];
  13 };
  14 
  15 Deque
  16 dequeCreate(void)
  17 {
  18     Deque d;
  19 
  20     d = malloc(sizeof(*d));
  21     if(d) {
  22         d->head[FRONT] = d->head[BACK] = 0;
  23     } 
  24 
  25     return d;
  26 }
  27 
  28 void
  29 dequePush(Deque d, int direction, int value)
  30 {
  31     struct elt *e;
  32 
  33     assert(direction == FRONT || direction == BACK);
  34 
  35     e = malloc(sizeof(*e));
  36     if(e == 0) abort();  /* kill process on malloc failure */
  37     
  38     e->next[direction] = d->head[direction];
  39     e->next[!direction] = 0;
  40     e->value = value;
  41 
  42     if(d->head[direction]) {
  43         d->head[direction]->next[!direction] = e;
  44     }
  45 
  46     d->head[direction] = e;
  47 
  48     /* special case for empty deque */
  49     /* e becomes the head on both sides */
  50     if(d->head[!direction] == 0) {
  51         d->head[!direction] = e;
  52     }
  53 }
  54 
  55 int
  56 dequePop(Deque d, int direction)
  57 {
  58     struct elt *e;
  59     int retval;
  60 
  61     assert(direction == FRONT || direction == BACK);
  62 
  63     e = d->head[direction];
  64 
  65     if(e == 0) return EMPTY_QUEUE;
  66 
  67     /* else remove it */
  68     d->head[direction] = e->next[direction];
  69 
  70     if(d->head[direction] != 0) {
  71         d->head[direction]->next[!direction] = 0;
  72     } else { 
  73         /* special case when e is the last element */
  74         /* clear other head field */
  75         d->head[!direction] = 0;
  76     }
  77 
  78     retval = e->value;
  79 
  80     free(e);
  81 
  82     return retval;
  83 }
  84 
  85 int
  86 dequeEmpty(Deque d)
  87 {
  88     return d->head[FRONT] == 0;
  89 }
  90 
  91 void
  92 dequeDestroy(Deque d)
  93 {
  94     while(!dequeEmpty(d)) {
  95         dequePop(d, FRONT);
  96     }
  97 
  98     free(d);
  99 }
deque.c

And here is some test code: test_deque.c, Makefile.

6. Circular linked lists

For some applications, there is no obvious starting or ending point to a list, and a circular list (where the last element points back to the first) may be appropriate. Circular doubly-linked lists can also be used to build deques; a single pointer into the list tracks the head of the deque, with some convention adopted for whether the head is an actual element of the list (at the front, say, with its left neighbor at the back) or a dummy element that is not considered to be part of the list.

The selling point of circular doubly-linked lists as a concrete data structure is that insertions and deletions can be done anywhere in the list with only local information. For example, here are some routines for manipulating a doubly-linked list directly. We'll make our lives easy and assume (for the moment) that the list has no actual contents to keep track of.

   1 #include <stdlib.h>
   2 
   3 /* directions for doubly-linked list next pointers */
   4 #define RIGHT (0)
   5 #define LEFT (1)
   6 
   7 struct elt {
   8     struct elt *next[2];
   9 };
  10 
  11 typedef struct elt *Elt;
  12 
  13 /* create a new circular doubly-linked list with 1 element */
  14 /* returns 0 on allocation error */
  15 Elt
  16 listCreate(void)
  17 {
  18     Elt e;
  19 
  20     e = malloc(sizeof(*e));
  21     if(e) {
  22         e->next[LEFT] = e->next[RIGHT] = e;
  23     }
  24 
  25     return e;
  26 }
  27 
  28 /* remove an element from a list */
  29 /* Make sure you keep a pointer to some other element! */
  30 /* does not free the removed element */
  31 void
  32 listRemove(Elt e)
  33 {
  34     /* splice e out */
  35     e->next[RIGHT]->next[LEFT] = e->next[LEFT];
  36     e->next[LEFT]->next[RIGHT] = e->next[RIGHT];
  37 }
  38     
  39 /* insert an element e into list in direction dir from head */
  40 void
  41 listInsert(Elt head, int dir, Elt e)
  42 {
  43     /* fill in e's new neighbors */
  44     e->next[dir] = head->next[dir];
  45     e->next[!dir] = head;
  46 
  47     /* make neigbhors point back at e */
  48     e->next[dir]->next[!dir] = e;
  49     e->next[!dir]->next[dir] = e;
  50 }
  51 
  52 /* split a list, removing all elements between e1 and e2 */
  53 /* e1 is the leftmost node of the removed subsequence, e2 rightmost */
  54 /* the removed elements are formed into their own linked list */
  55 /* comment: listRemove could be implemented as listSplit(e,e) */
  56 void
  57 listSplit(Elt e1, Elt e2)
  58 {
  59     /* splice out the new list */
  60     e2->next[RIGHT]->next[LEFT] = e1->next[LEFT];
  61     e1->next[LEFT]->next[RIGHT] = e2->next[RIGHT];
  62 
  63     /* fix up the ends */
  64     e2->next[RIGHT] = e1;
  65     e1->next[LEFT] = e2;
  66 }
  67 
  68 /* splice a list starting at e2 after e1 */
  69 /* e2 becomes e1's right neighbor */
  70 /* e2's left neighbor becomes left neighbor of e1's old right neighbor */
  71 void
  72 listSplice(Elt e1, Elt e2)
  73 {
  74     /* fix up tail end */
  75     e2->next[LEFT]->next[RIGHT] = e1->next[RIGHT];
  76     e1->next[RIGHT]->next[LEFT] = e2->next[LEFT];
  77 
  78     /* fix up e1 and e2 */
  79     e1->next[RIGHT] = e2;
  80     e2->next[LEFT] = e1;
  81 }
  82 
  83 /* free all elements of the list containing e */
  84 void
  85 listDestroy(Elt e)
  86 {
  87     Elt target;
  88     Elt next;
  89 
  90     /* we'll free elements until we get back to e, then free e */
  91     /* note use of pointer address comparison to detect end of loop */
  92     for(target = e->next[RIGHT]; target != e; target = next) {
  93         next = target->next[RIGHT];
  94         free(target);
  95     }
  96 
  97     free(e);
  98 }
circular.c

The above code might or might not actually work. What if it doesn't? It may make sense to include some sanity-checking code that we can run to see if our pointers are all going to the right place:

   1 /* assert many things about correctness of the list */
   2 /* Amazingly, this is guaranteed to abort or return no matter
   3    how badly screwed up the list is. */
   4 void
   5 listSanityCheck(Elt e)
   6 {
   7     Elt check;
   8 
   9     assert(e != 0);
  10 
  11     check = e;
  12 
  13     do {
  14 
  15         /* are our pointers consistent with our neighbors? */
  16         assert(check->next[RIGHT]->next[LEFT] == check);
  17         assert(check->next[LEFT]->next[RIGHT] == check);
  18 
  19         /* on to the next */
  20         check = check->next[RIGHT];
  21 
  22     } while(check != e);
  23 }

What if we want to store something in this list? The simplest approach is to extend the definition of struct elt:

   1 struct elt {
   2     struct elt *next[2];
   3     char *name;
   4     int socialSecurityNumber;
   5     int gullibility;
   6 };

But then we can only use the code for one particular type of data. An alternative approach is to define a new Elt-plus struct:

   1 struct fancyElt {
   2     struct elt *next[2];
   3     char *name;
   4     int socialSecurityNumber;
   5     int gullibility;
   6 };

and then use pointer casts to pretend the fancy structs into Elts:

   1     struct fancyElt *e;
   2 
   3     e = malloc(sizeof(*e));
   4 
   5     /* fill in fields on e */
   6 
   7     listInsert(someList, (Elt) e);

The trick here is that as long as the initial part of the struct fancyElt looks like a struct elt, any code that expects a struct elt will happily work with it and ignore the fields that happen to be sitting later in memory. (This trick is how C++ inheritance works.)

The downside is that if something needs to be done with the other fields (e.g freeing e->name if e is freed), then the Elt functions won't know to do this. So if you use this trick you should be careful.

A similar technique using void * pointers is described in C/GenericContainers.

7. What linked lists are and are not good for

Linked lists are good for any task that involves inserting or deleting elements next to an element you already have a pointer to; such operations can usually be done in O(1) time. They generally beat arrays (even resizeable arrays) if you need to insert or delete in the middle of a list, since an array has to copy any elements above the insertion point to make room; if inserts or deletes always happen at the end, an array may be better.

Linked lists are not good for any operation that requires random access, since reaching an arbitrary element of a linked list takes as much as O(n) time. For such applications, arrays are better if you don't need to insert in the middle; if you do, you should use some sort of tree (see BinaryTrees).

8. Further reading

KernighanPike gives an example of a linked list in Section 2.7. A description of many different kinds of linked lists with pictures can be found at Linked_list.


CategoryProgrammingNotes


2014-06-17 11:58