Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

These are notes on implementing graphs and graph algorithms in C. For a general overview of graphs, see GraphTheory. For pointers to specific algorithms on graphs, see GraphAlgorithms.

1. Graphs

A graph consists of a set of nodes or vertices together with a set of edges or arcs where each edge joins two vertices. Unless otherwise specified, a graph is undirected: each edge is an unordered pair {u,v} of vertices, and we don't regard either of the two vertices as having a distinct role from the other. However, it is more common in computing to consider directed graphs or digraphs in which edges are ordered pairs (u,v); here the vertex u is the source of the edge and vertex v is the sink or target of the edge. Directed edges are usually drawn as arrows and undirected edges as curves or line segments; see GraphTheory for examples. It is always possible to represent an undirected graph as a directed graph where each undirected edge {u,v} becomes two oppositely directed edges (u,v) and (v,u).

Given an edge (u,v), the vertices u and v are said to be incident to the edge and adjacent to each other. The number of vertices adjacent to a given vertex u is the degree of u; this can be divided into the out-degree (number of vertices v such that (u,v) is an edge) and the in-degree (number of vertices v such that (v,u) is an edge). A vertex v adjacent to u is called a neighbor of u, and (in a directed graph) is a predecessor of u if (v,u) is an edge and a successor of u if (u,v) is an edge. We will allow a node to be its own predecessor and successor.

2. Why graphs are useful

Graphs can be used to model any situation where we have things that are related to each other in pairs; for example, all of the following can be represented by graphs:

Family trees: Nodes are members, with an edge from each parent to each of their children.
Transportation networks: Nodes are airports, intersections, ports, etc. Edges are airline flights, one-way roads, shipping routes, etc.
Assignments: Suppose we are assigning classes to classrooms. Let each node be either a class or a classroom, and put an edge from a class to a classroom if the class is assigned to that room. This is an example of a bipartite graph, where the nodes can be divided into two sets S and T and all edges go from S to T.

3. Operations on graphs

What would we like to do to graphs? Generally, we first have to build a graph by starting with a set of nodes and adding in any edges we need, and then we want to extract information from it, such as "Is this graph connected?", "What is the shortest path in this graph from s to t?", or "How many edges can I remove from this graph before some nodes become unreachable from other nodes?" There are standard algorithms for answering all of these questions; the information these algorithms need is typically (a) given a vertex u, what successors does it have; and sometimes (b) given vertices u and v, does the edge (u,v) exist in the graph?

4. Representations of graphs

A good graph representation will allow us to answer one or both of these questions quickly. There are generally two standard representations of graphs that are used in graph algorithms, depending on which question is more important.

For both representations, we simplify the representation task by insisting that vertices be labeled 0, 1, 2, ..., n-1, where n is the number of vertices in the graph. If we have a graph with different vertex labels (say, airport codes), we can enforce an integer labeling by a preprocessing step where we assign integer labels, and then translate the integer labels back into more useful user labels afterwards. The preprocessing step can usually be done in O(n) time, which is likely to be smaller than the cost of whatever algorithm we are running on our graph, and the savings in code complexity and running time from working with just integer labels will pay this cost back many times over.

4.1. Adjacency matrices

An adjacency matrix is just a matrix a where a[i][j] is 1 if (i,j) is an edge in the graph and 0 otherwise. It's easy to build an adjacency matrix, and adding or testing for the existence of an edges takes O(1) time. The downsides of adjacency matrices are that enumerating the outgoing edges from a vertex takes O(n) time even if there aren't very many, and the O(n²) space cost is high for "sparse graphs," those with much fewer than n² edges.

4.2. Adjacency lists

An adjacency list representation of a graph creates a list of successors for each node u. These lists may be represented as linked lists (the typical assumption in algorithms textbooks), or in languages like C may be represented by variable-length arrays. The cost for adding an edge is still O(1), but testing for the existence of an edge (u,v) rises to O(d⁺(u)), where d⁺(u) is the out-degree of u (i.e., the length of the list of u's successors). The cost of enumerating the successors of u is also O(d⁺(u)), which is clearly the best possible since it takes that long just to write them all down. Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n+m), where m is the total number of edges.

Adjacency lists are thus most useful when we mostly want to enumerate outgoing edges of each node. This is common in search tasks, where we want to find a path from one node to another or compute the distances between pairs of nodes. If other operations are important, we can optimize them by augmenting the adjacency list representation; for example, using sorted arrays for the adjacency lists reduces the cost of edge existence testing to O(log(d⁺(u))), and adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d^-(u)) time, where d^-(u) is u's in-degree.

Adjacency lists also require much less space than adjacency matrices for sparse graphs: O(n+m) vs O(n²) for adjacency matrices. For this reason adjacency lists are more commonly used than adjacency matrices.

4.2.1. An implementation

Here is an implementation of a basic graph type using adjacency lists.

   1 /* basic directed graph type */
   2 
   3 typedef struct graph *Graph;
   4 
   5 /* create a new graph with n vertices labeled 0..n-1 and no edges */
   6 Graph graph_create(int n);
   7 
   8 /* free all space used by graph */
   9 void graph_destroy(Graph);
  10 
  11 /* add an edge to an existing graph */
  12 /* doing this more than once may have unpredictable results */
  13 void graph_add_edge(Graph, int source, int sink);
  14 
  15 /* return the number of vertices/edges in the graph */
  16 int graph_vertex_count(Graph);
  17 int graph_edge_count(Graph);
  18 
  19 /* return the out-degree of a vertex */
  20 int graph_out_degree(Graph, int source);
  21 
  22 /* return 1 if edge (source, sink) exists), 0 otherwise */
  23 int graph_has_edge(Graph, int source, int sink);
  24 
  25 /* invoke f on all edges (u,v) with source u */
  26 /* supplying data as final parameter to f */
  27 /* no particular order is guaranteed */
  28 void graph_foreach(Graph g, int source,
  29         void (*f)(Graph g, int source, int sink, void *data),
  30         void *data);

graph.h

   1 #include <stdlib.h>
   2 #include <assert.h>
   3 
   4 #include "graph.h"
   5 
   6 /* basic directed graph type */
   7 /* the implementation uses adjacency lists
   8  * represented as variable-length arrays */
   9 
  10 /* these arrays may or may not be sorted: if one gets long enough
  11  * and you call graph_has_edge on its source, it will be */
  12 
  13 struct graph {
  14     int n;              /* number of vertices */
  15     int m;              /* number of edges */
  16     struct successors {
  17         int d;          /* number of successors */
  18         int len;        /* number of slots in array */
  19         char is_sorted; /* true if list is already sorted */
  20         int list[1];    /* actual list of successors */
  21     } *alist[1];
  22 };
  23 
  24 /* create a new graph with n vertices labeled 0..n-1 and no edges */
  25 Graph
  26 graph_create(int n)
  27 {
  28     Graph g;
  29     int i;
  30 
  31     g = malloc(sizeof(struct graph) + sizeof(struct successors *) * (n-1));
  32     assert(g);
  33 
  34     g->n = n;
  35     g->m = 0;
  36 
  37     for(i = 0; i < n; i++) {
  38         g->alist[i] = malloc(sizeof(struct successors));
  39         assert(g->alist[i]);
  40 
  41         g->alist[i]->d = 0;
  42         g->alist[i]->len = 1;
  43         g->alist[i]->is_sorted= 1;
  44     }
  45     
  46     return g;
  47 }
  48 
  49 /* free all space used by graph */
  50 void
  51 graph_destroy(Graph g)
  52 {
  53     int i;
  54 
  55     for(i = 0; i < g->n; i++) free(g->alist[i]);
  56     free(g);
  57 }
  58 
  59 /* add an edge to an existing graph */
  60 void
  61 graph_add_edge(Graph g, int u, int v)
  62 {
  63     assert(u >= 0);
  64     assert(u < g->n);
  65     assert(v >= 0);
  66     assert(v < g->n);
  67 
  68     /* do we need to grow the list? */
  69     while(g->alist[u]->d >= g->alist[u]->len) {
  70         g->alist[u]->len *= 2;
  71         g->alist[u] =
  72             realloc(g->alist[u], 
  73                 sizeof(struct successors) + sizeof(int) * (g->alist[u]->len - 1));
  74     }
  75 
  76     /* now add the new sink */
  77     g->alist[u]->list[g->alist[u]->d++] = v;
  78     g->alist[u]->is_sorted = 0;
  79 
  80     /* bump edge count */
  81     g->m++;
  82 }
  83 
  84 /* return the number of vertices in the graph */
  85 int
  86 graph_vertex_count(Graph g)
  87 {
  88     return g->n;
  89 }
  90 
  91 /* return the number of vertices in the graph */
  92 int
  93 graph_edge_count(Graph g)
  94 {
  95     return g->m;
  96 }
  97 
  98 /* return the out-degree of a vertex */
  99 int
 100 graph_out_degree(Graph g, int source)
 101 {
 102     assert(source >= 0);
 103     assert(source < g->n);
 104 
 105     return g->alist[source]->d;
 106 }
 107 
 108 /* when we are willing to call bsearch */
 109 #define BSEARCH_THRESHOLD (10)
 110 
 111 static int
 112 intcmp(const void *a, const void *b)
 113 {
 114     return *((const int *) a) - *((const int *) b);
 115 }
 116 
 117 /* return 1 if edge (source, sink) exists), 0 otherwise */
 118 int
 119 graph_has_edge(Graph g, int source, int sink)
 120 {
 121     int i;
 122 
 123     assert(source >= 0);
 124     assert(source < g->n);
 125     assert(sink >= 0);
 126     assert(sink < g->n);
 127 
 128     if(graph_out_degree(g, source) >= BSEARCH_THRESHOLD) {
 129         /* make sure it is sorted */
 130         if(! g->alist[source]->is_sorted) {
 131             qsort(g->alist[source]->list,
 132                     g->alist[source]->d,
 133                     sizeof(int),
 134                     intcmp);
 135         }
 136         
 137         /* call bsearch to do binary search for us */
 138         return 
 139             bsearch(&sink,
 140                     g->alist[source]->list,
 141                     g->alist[source]->d,
 142                     sizeof(int),
 143                     intcmp)
 144             != 0;
 145     } else {
 146         /* just do a simple linear search */
 147         /* we could call lfind for this, but why bother? */
 148         for(i = 0; i < g->alist[source]->d; i++) {
 149             if(g->alist[source]->list[i] == sink) return 1;
 150         }
 151         /* else */
 152         return 0;
 153     }
 154 }
 155 
 156 /* invoke f on all edges (u,v) with source u */
 157 /* supplying data as final parameter to f */
 158 void
 159 graph_foreach(Graph g, int source,
 160     void (*f)(Graph g, int source, int sink, void *data),
 161     void *data)
 162 {
 163     int i;
 164 
 165     assert(source >= 0);
 166     assert(source < g->n);
 167 
 168     for(i = 0; i < g->alist[source]->d; i++) {
 169         f(g, source, g->alist[source]->list[i], data);
 170     }
 171 }

graph.c

And here is some test code: test_graph.c.

4.3. Implicit representations

For some graphs, it may not make sense to represent them explicitly. An example might be the word-search graph from CS223/2005/Assignments/HW10, which consists of all words in a dictionary with an edge between any two words that differ only by one letter. In such a case, rather than building an explicit data structure containing all the edges, we might generate edges as needed when computing the neighbors of a particular vertex. This gives us an implicit or procedural representation of a graph.

Implicit representations require the ability to return a vector or list of values from a the neighborhood-computing function; some ways of doing this are described in C/Iterators.

5. Searching for paths in a graph

A path is a sequence of vertices v₁, v₂, ... v_k where each pair (v_i, v_i+1) is an edge. Often we want to find a path from a source vertex s to a target vertex t, or more generally to detect which vertices are reachable from a given source vertex s. We can solve these problems by using any of several standard graph search algorithms, of which the simplest and most commonly used are DepthFirstSearch and BreadthFirstSearch.

Both of these search algorithms are a special case of a more general algorithm for growing a directed tree in a graph rooted at a given node s. Here we are using tree as a graph theorist would, to mean a set of k nodes joined by k-1 edges; this is similar to trees used in data structures except that there are no limits on the number of children a node can have and no ordering constraints within the tree.

The general tree-growing algorithm might be described as follows:

Start with a tree consisting of just s.
If there is at least one edge that leaves the tree (i.e. goes from a node in the current tree to a node outside the current tree), pick the "best" such edge and add it and its sink to the tree.
Repeat step 2 until no edges leave the tree.

Practically, steps 2 and 3 are implemented by having some sort of data structure that acts as a bucket for unprocessed edges. When a new node is added to the tree, all of its outgoing edges are thrown into the bucket. The "best" outgoing edge is obtained by applying some sort of pop, dequeue, or delete-min operation to the bucket, depending on which it provides; if this edge turns out to be an internal edge of the tree (maybe we added its sink after putting it in the bucket), we throw it away. Otherwise we mark the edge and its sink as belonging to the tree and repeat.

The output of the generic tree-growing algorithm typically consists of (a) marks on all the nodes that are reachable from s, and (b) for each such node v, a parent pointer back to the source of the edge that brought v into the tree. Often these two values can be combined by using a null parent pointer to represent the absence of a mark (this usually requires making the root point to itself so that we know it's in the tree). Other values that may be useful are a table showing the order in which nodes were added to the tree. For even more possibilities see DepthFirstSearch.

What kind of tree we get depends on what we use for the bucket---specifically, on what edge is returned when we ask for the "best" edge. Two easy cases are:

The bucket is a stack. When we ask for an outgoing edge, we get the last edge inserted. This has the effect of running along as far as possible through the graph before backtracking, since we always keep going from the last node if possible. The resulting algorithm is called DepthFirstSearch and yields a DepthFirstSearch tree. If we don't care about the lengths of the paths we consider, DepthFirstSearch is a perfectly good algorithm for testing connectivity, and has several other useful properties (described on the algorithm's own page).
The bucket is a queue. Now when we ask for an outgoing edge, we get the first edge inserted. This favors edges that are close to the root: we don't start consider edges from nodes adjacent to the root until we have already added all the root's successors to the tree, and similarly we don't start considering edges at distance k until we have already added all the closer nodes to the tree. This gives BreadthFirstSearch, which constructs a shortest-path tree in which every path from the root to a node in the tree has the minimum length.

Structurally, these algorithms are almost completely identical; indeed, if we organize the stack/queue so that it can pop from both ends, we can switch between DepthFirstSearch and BreadthFirstSearch just by choosing one operation or another. This is what is done in the implementation below. Since it's ugly to have a flag parameter to a function that radically changes its behavior, the combined search function is wrapped inside two separate functions dfs and bfs that are exported to the outside of the module.

The running time of either algorithm is very fast: we pay O(1) per vertex in setup costs and O(1) per edge during the search (assuming the input is in adjacency-list form), giving a linear O(n+m) total cost. Often it is more expensive to set up the graph in the first place than to run a search on it.

5.1. Depth-first and breadth-first search

   1 /* Typical usage:
   2  *
   3  *    struct search_info *s;
   4  *    int n;
   5  *
   6  *    s = search_info_create(g);
   7  *
   8  *    n = graph_vertices(g);
   9  *    for(i = 0; i < n; i++) {
  10  *        dfs(s, i);
  11  *    }
  12  *
  13  *    ... use results in s ...
  14  *
  15  *    search_info_destroy(s);
  16  *
  17  */
  18 
  19 /* summary information per node for dfs and bfs */
  20 /* this is not intended to be opaque---user can read it */
  21 /* (but should not write it!) */
  22 
  23 #define SEARCH_INFO_NULL (-1) /* for empty slots */
  24 
  25 struct search_info {
  26     Graph graph;
  27     int reached;        /* count of reached nodes */
  28     int *preorder;      /* list of nodes in order first reached */
  29     int *time;          /* time[i] == position of node i in preorder list */
  30     int *parent;        /* parent in DFS or BFS forest */
  31     int *depth;         /* distance from root */
  32 };
  33 
  34 /* allocate and initialize search results structure */
  35 /* you need to do this before passing it to dfs or bfs */
  36 struct search_info *search_info_create(Graph g);
  37 
  38 /* free search_info data---does NOT free graph pointer */
  39 void search_info_destroy(struct search_info *);
  40 
  41 /* perform depth-first search starting at root, updating results */
  42 void dfs(struct search_info *results, int root);
  43 
  44 /* perform breadth-first search starting at root, updating results */
  45 void bfs(struct search_info *results, int root);

search.h

   1 #include <stdlib.h>
   2 #include <assert.h>
   3 
   4 #include "graph.h"
   5 #include "search.h"
   6 
   7 /* create an array of n ints initialized to SEARCH_INFO_NULL */
   8 static int *
   9 create_empty_array(int n)
  10 {
  11     int *a;
  12     int i;
  13 
  14     a = malloc(sizeof(*a) * n);
  15     assert(a);
  16 
  17     for(i = 0; i < n; i++) {
  18         a[i] = SEARCH_INFO_NULL;
  19     }
  20 
  21     return a;
  22 }
  23 
  24 /* allocate and initialize search results structure */
  25 /* you need to do this before passing it to dfs or bfs */
  26 struct search_info *
  27 search_info_create(Graph g)
  28 {
  29     struct search_info *s;
  30     int n;
  31 
  32     s = malloc(sizeof(*s));
  33     assert(s);
  34 
  35     s->graph = g;
  36     s->reached = 0;
  37 
  38     n = graph_vertex_count(g);
  39 
  40     s->preorder = create_empty_array(n);
  41     s->time = create_empty_array(n);
  42     s->parent = create_empty_array(n);
  43     s->depth = create_empty_array(n);
  44 
  45     return s;
  46 } 
  47 
  48 /* free search_info data---does NOT free graph pointer */
  49 void
  50 search_info_destroy(struct search_info *s)
  51 {
  52     free(s->depth);
  53     free(s->parent);
  54     free(s->time);
  55     free(s->preorder);
  56     free(s);
  57 }
  58 
  59 /* used inside search routines */
  60 struct edge {
  61     int u;          /* source */
  62     int v;          /* sink */
  63 };
  64 
  65 /* stack/queue */
  66 struct queue {
  67     struct edge *e;
  68     int bottom;
  69     int top;
  70 };
  71 
  72 static void
  73 push_edge(Graph g, int u, int v, void *data)
  74 {
  75     struct queue *q;
  76 
  77     q = data;
  78 
  79     assert(q->top < graph_edge_count(g) + 1);
  80 
  81     q->e[q->top].u = u;
  82     q->e[q->top].v = v;
  83     q->top++;
  84 }
  85 
  86 /* this rather horrible function implements dfs if use_queue == 0 */
  87 /* and bfs if use_queue == 1 */
  88 static void
  89 generic_search(struct search_info *r, int root, int use_queue)
  90 {
  91     /* queue/stack */
  92     struct queue q;
  93 
  94     /* edge we are working on */
  95     struct edge cur;
  96 
  97     /* start with empty q */
  98     /* we need one space per edge */
  99     /* plus one for the fake (root, root) edge */
 100     q.e = malloc(sizeof(*q.e) * (graph_edge_count(r->graph) + 1));
 101     assert(q.e);
 102 
 103     q.bottom = q.top = 0;
 104 
 105     /* push the root */
 106     push_edge(r->graph, root, root, &q);
 107 
 108     /* while q.e not empty */
 109     while(q.bottom < q.top) {
 110         if(use_queue) {
 111             cur = q.e[q.bottom++];
 112         } else {
 113             cur = q.e[--q.top];
 114         }
 115 
 116         /* did we visit sink already? */
 117         if(r->parent[cur.v] != SEARCH_INFO_NULL) continue;
 118 
 119         /* no */
 120         assert(r->reached < graph_vertex_count(r->graph));
 121         r->parent[cur.v] = cur.u;
 122         r->time[cur.v] = r->reached;
 123         r->preorder[r->reached++] = cur.v;
 124         if(cur.u == cur.v) {
 125             /* we could avoid this if we were certain SEARCH_INFO_NULL */
 126             /* would never be anything but -1 */
 127             r->depth[cur.v] = 0;
 128         } else {
 129             r->depth[cur.v] = r->depth[cur.u] + 1;
 130         }
 131 
 132         /* push all outgoing edges */
 133         graph_foreach(r->graph, cur.v, push_edge, &q);
 134     }
 135 
 136     free(q.e);
 137 }
 138 
 139 void
 140 dfs(struct search_info *results, int root)
 141 {
 142     generic_search(results, root, 0);
 143 }
 144 
 145 void
 146 bfs(struct search_info *results, int root)
 147 {
 148     generic_search(results, root, 1);
 149 }

search.c

And here is some test code: test_search.c. You will need to compile test_search.c together with both search.c and graph.c to get it to work.

5.2. Other variations on the basic algorithm

Stacks and queues are not the only options for the bucket in the generic search algorithm. Some other choices are:

A priority queue keyed by edge weights. If the edges have weights, the generic tree-builder can be used to find a tree containing s with minimum total edge weight.¹ The basic idea is to always pull out the lightest edge. The resulting algorithm runs in O(n + m log m) time (since each heap operation takes O(log m) time), and is known as Prim's algorithm. See Prim's algorithm for more details.
A priority queue keyed by path lengths. Here we assume that edges have lengths, and we want to build a shortest-path tree where the length of the path is no longer just the number of edges it contains but the sum of their weights. The basic idea is to keep track of the distance from the root to each node in the tree, and assign each edge a key equal to the sum of the distance to its source and its length. The resulting search algorithm, known as Dijkstra's algorithm, will give a shortest-path tree if all the edge weights are non-negative. See ShortestPath or Dijkstra's algorithm.

CategoryProgrammingNotes

This only works if the graph is undirected, i.e. if for every edge uv there is a matching edge vu with the same weight. (1)