These are notes on implementing graphs and graph algorithms in C. For a general overview of graphs, see GraphTheory. For pointers to specific algorithms on graphs, see GraphAlgorithms.
1. Graphs
A graph consists of a set of nodes or vertices together with a set of edges or arcs where each edge joins two vertices. Unless otherwise specified, a graph is undirected: each edge is an unordered pair {u,v} of vertices, and we don't regard either of the two vertices as having a distinct role from the other. However, it is more common in computing to consider directed graphs or digraphs in which edges are ordered pairs (u,v); here the vertex u is the source of the edge and vertex v is the sink or target of the edge. Directed edges are usually drawn as arrows and undirected edges as curves or line segments; see GraphTheory for examples. It is always possible to represent an undirected graph as a directed graph where each undirected edge {u,v} becomes two oppositely directed edges (u,v) and (v,u).
Given an edge (u,v), the vertices u and v are said to be incident to the edge and adjacent to each other. The number of vertices adjacent to a given vertex u is the degree of u; this can be divided into the out-degree (number of vertices v such that (u,v) is an edge) and the in-degree (number of vertices v such that (v,u) is an edge). A vertex v adjacent to u is called a neighbor of u, and (in a directed graph) is a predecessor of u if (v,u) is an edge and a successor of u if (u,v) is an edge. We will allow a node to be its own predecessor and successor.
2. Why graphs are useful
Graphs can be used to model any situation where we have things that are related to each other in pairs; for example, all of the following can be represented by graphs:
- Family trees
- Nodes are members, with an edge from each parent to each of their children.
- Transportation networks
- Nodes are airports, intersections, ports, etc. Edges are airline flights, one-way roads, shipping routes, etc.
- Assignments
Suppose we are assigning classes to classrooms. Let each node be either a class or a classroom, and put an edge from a class to a classroom if the class is assigned to that room. This is an example of a bipartite graph, where the nodes can be divided into two sets S and T and all edges go from S to T.
3. Operations on graphs
What would we like to do to graphs? Generally, we first have to build a graph by starting with a set of nodes and adding in any edges we need, and then we want to extract information from it, such as "Is this graph connected?", "What is the shortest path in this graph from s to t?", or "How many edges can I remove from this graph before some nodes become unreachable from other nodes?" There are standard algorithms for answering all of these questions; the information these algorithms need is typically (a) given a vertex u, what successors does it have; and sometimes (b) given vertices u and v, does the edge (u,v) exist in the graph?
4. Representations of graphs
A good graph representation will allow us to answer one or both of these questions quickly. There are generally two standard representations of graphs that are used in graph algorithms, depending on which question is more important.
For both representations, we simplify the representation task by insisting that vertices be labeled 0, 1, 2, ..., n-1, where n is the number of vertices in the graph. If we have a graph with different vertex labels (say, airport codes), we can enforce an integer labeling by a preprocessing step where we assign integer labels, and then translate the integer labels back into more useful user labels afterwards. The preprocessing step can usually be done in O(n) time, which is likely to be smaller than the cost of whatever algorithm we are running on our graph, and the savings in code complexity and running time from working with just integer labels will pay this cost back many times over.
4.1. Adjacency matrices
An adjacency matrix is just a matrix a where a[i][j] is 1 if (i,j) is an edge in the graph and 0 otherwise. It's easy to build an adjacency matrix, and adding or testing for the existence of an edges takes O(1) time. The downsides of adjacency matrices are that enumerating the outgoing edges from a vertex takes O(n) time even if there aren't very many, and the O(n2) space cost is high for "sparse graphs," those with much fewer than n2 edges.
4.2. Adjacency lists
An adjacency list representation of a graph creates a list of successors for each node u. These lists may be represented as linked lists (the typical assumption in algorithms textbooks), or in languages like C may be represented by variable-length arrays. The cost for adding an edge is still O(1), but testing for the existence of an edge (u,v) rises to O(d+(u)), where d+(u) is the out-degree of u (i.e., the length of the list of u's successors). The cost of enumerating the successors of u is also O(d+(u)), which is clearly the best possible since it takes that long just to write them all down. Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n+m), where m is the total number of edges.
Adjacency lists are thus most useful when we mostly want to enumerate outgoing edges of each node. This is common in search tasks, where we want to find a path from one node to another or compute the distances between pairs of nodes. If other operations are important, we can optimize them by augmenting the adjacency list representation; for example, using sorted arrays for the adjacency lists reduces the cost of edge existence testing to O(log(d+(u))), and adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d-(u)) time, where d-(u) is u's in-degree.
Adjacency lists also require much less space than adjacency matrices for sparse graphs: O(n+m) vs O(n2) for adjacency matrices. For this reason adjacency lists are more commonly used than adjacency matrices.
4.2.1. An implementation
Here is an implementation of a basic graph type using adjacency lists.
1 /* basic directed graph type */
2
3 typedef struct graph *Graph;
4
5 /* create a new graph with n vertices labeled 0..n-1 and no edges */
6 Graph graph_create(int n);
7
8 /* free all space used by graph */
9 void graph_destroy(Graph);
10
11 /* add an edge to an existing graph */
12 /* doing this more than once may have unpredictable results */
13 void graph_add_edge(Graph, int source, int sink);
14
15 /* return the number of vertices/edges in the graph */
16 int graph_vertex_count(Graph);
17 int graph_edge_count(Graph);
18
19 /* return the out-degree of a vertex */
20 int graph_out_degree(Graph, int source);
21
22 /* return 1 if edge (source, sink) exists), 0 otherwise */
23 int graph_has_edge(Graph, int source, int sink);
24
25 /* invoke f on all edges (u,v) with source u */
26 /* supplying data as final parameter to f */
27 /* no particular order is guaranteed */
28 void graph_foreach(Graph g, int source,
29 void (*f)(Graph g, int source, int sink, void *data),
30 void *data);
1 #include <stdlib.h>
2 #include <assert.h>
3
4 #include "graph.h"
5
6 /* basic directed graph type */
7 /* the implementation uses adjacency lists
8 * represented as variable-length arrays */
9
10 /* these arrays may or may not be sorted: if one gets long enough
11 * and you call graph_has_edge on its source, it will be */
12
13 struct graph {
14 int n; /* number of vertices */
15 int m; /* number of edges */
16 struct successors {
17 int d; /* number of successors */
18 int len; /* number of slots in array */
19 char is_sorted; /* true if list is already sorted */
20 int list[1]; /* actual list of successors */
21 } *alist[1];
22 };
23
24 /* create a new graph with n vertices labeled 0..n-1 and no edges */
25 Graph
26 graph_create(int n)
27 {
28 Graph g;
29 int i;
30
31 g = malloc(sizeof(struct graph) + sizeof(struct successors *) * (n-1));
32 assert(g);
33
34 g->n = n;
35 g->m = 0;
36
37 for(i = 0; i < n; i++) {
38 g->alist[i] = malloc(sizeof(struct successors));
39 assert(g->alist[i]);
40
41 g->alist[i]->d = 0;
42 g->alist[i]->len = 1;
43 g->alist[i]->is_sorted= 1;
44 }
45
46 return g;
47 }
48
49 /* free all space used by graph */
50 void
51 graph_destroy(Graph g)
52 {
53 int i;
54
55 for(i = 0; i < g->n; i++) free(g->alist[i]);
56 free(g);
57 }
58
59 /* add an edge to an existing graph */
60 void
61 graph_add_edge(Graph g, int u, int v)
62 {
63 assert(u >= 0);
64 assert(u < g->n);
65 assert(v >= 0);
66 assert(v < g->n);
67
68 /* do we need to grow the list? */
69 while(g->alist[u]->d >= g->alist[u]->len) {
70 g->alist[u]->len *= 2;
71 g->alist[u] =
72 realloc(g->alist[u],
73 sizeof(struct successors) + sizeof(int) * (g->alist[u]->len - 1));
74 }
75
76 /* now add the new sink */
77 g->alist[u]->list[g->alist[u]->d++] = v;
78 g->alist[u]->is_sorted = 0;
79
80 /* bump edge count */
81 g->m++;
82 }
83
84 /* return the number of vertices in the graph */
85 int
86 graph_vertex_count(Graph g)
87 {
88 return g->n;
89 }
90
91 /* return the number of vertices in the graph */
92 int
93 graph_edge_count(Graph g)
94 {
95 return g->m;
96 }
97
98 /* return the out-degree of a vertex */
99 int
100 graph_out_degree(Graph g, int source)
101 {
102 assert(source >= 0);
103 assert(source < g->n);
104
105 return g->alist[source]->d;
106 }
107
108 /* when we are willing to call bsearch */
109 #define BSEARCH_THRESHOLD (10)
110
111 static int
112 intcmp(const void *a, const void *b)
113 {
114 return *((const int *) a) - *((const int *) b);
115 }
116
117 /* return 1 if edge (source, sink) exists), 0 otherwise */
118 int
119 graph_has_edge(Graph g, int source, int sink)
120 {
121 int i;
122
123 assert(source >= 0);
124 assert(source < g->n);
125 assert(sink >= 0);
126 assert(sink < g->n);
127
128 if(graph_out_degree(g, source) >= BSEARCH_THRESHOLD) {
129 /* make sure it is sorted */
130 if(! g->alist[source]->is_sorted) {
131 qsort(g->alist[source]->list,
132 g->alist[source]->d,
133 sizeof(int),
134 intcmp);
135 }
136
137 /* call bsearch to do binary search for us */
138 return
139 bsearch(&sink,
140 g->alist[source]->list,
141 g->alist[source]->d,
142 sizeof(int),
143 intcmp)
144 != 0;
145 } else {
146 /* just do a simple linear search */
147 /* we could call lfind for this, but why bother? */
148 for(i = 0; i < g->alist[source]->d; i++) {
149 if(g->alist[source]->list[i] == sink) return 1;
150 }
151 /* else */
152 return 0;
153 }
154 }
155
156 /* invoke f on all edges (u,v) with source u */
157 /* supplying data as final parameter to f */
158 void
159 graph_foreach(Graph g, int source,
160 void (*f)(Graph g, int source, int sink, void *data),
161 void *data)
162 {
163 int i;
164
165 assert(source >= 0);
166 assert(source < g->n);
167
168 for(i = 0; i < g->alist[source]->d; i++) {
169 f(g, source, g->alist[source]->list[i], data);
170 }
171 }
And here is some test code: test_graph.c.
4.3. Implicit representations
For some graphs, it may not make sense to represent them explicitly. An example might be the word-search graph from CS223/2005/Assignments/HW10, which consists of all words in a dictionary with an edge between any two words that differ only by one letter. In such a case, rather than building an explicit data structure containing all the edges, we might generate edges as needed when computing the neighbors of a particular vertex. This gives us an implicit or procedural representation of a graph.
Implicit representations require the ability to return a vector or list of values from a the neighborhood-computing function; some ways of doing this are described in C/Iterators.
5. Searching for paths in a graph
A path is a sequence of vertices v1, v2, ... vk where each pair (vi, vi+1) is an edge. Often we want to find a path from a source vertex s to a target vertex t, or more generally to detect which vertices are reachable from a given source vertex s. We can solve these problems by using any of several standard graph search algorithms, of which the simplest and most commonly used are DepthFirstSearch and BreadthFirstSearch.
Both of these search algorithms are a special case of a more general algorithm for growing a directed tree in a graph rooted at a given node s. Here we are using tree as a graph theorist would, to mean a set of k nodes joined by k-1 edges; this is similar to trees used in data structures except that there are no limits on the number of children a node can have and no ordering constraints within the tree.
The general tree-growing algorithm might be described as follows:
- Start with a tree consisting of just s.
- If there is at least one edge that leaves the tree (i.e. goes from a node in the current tree to a node outside the current tree), pick the "best" such edge and add it and its sink to the tree.
- Repeat step 2 until no edges leave the tree.
Practically, steps 2 and 3 are implemented by having some sort of data structure that acts as a bucket for unprocessed edges. When a new node is added to the tree, all of its outgoing edges are thrown into the bucket. The "best" outgoing edge is obtained by applying some sort of pop, dequeue, or delete-min operation to the bucket, depending on which it provides; if this edge turns out to be an internal edge of the tree (maybe we added its sink after putting it in the bucket), we throw it away. Otherwise we mark the edge and its sink as belonging to the tree and repeat.
The output of the generic tree-growing algorithm typically consists of (a) marks on all the nodes that are reachable from s, and (b) for each such node v, a parent pointer back to the source of the edge that brought v into the tree. Often these two values can be combined by using a null parent pointer to represent the absence of a mark (this usually requires making the root point to itself so that we know it's in the tree). Other values that may be useful are a table showing the order in which nodes were added to the tree. For even more possibilities see DepthFirstSearch.
What kind of tree we get depends on what we use for the bucket---specifically, on what edge is returned when we ask for the "best" edge. Two easy cases are:
The bucket is a stack. When we ask for an outgoing edge, we get the last edge inserted. This has the effect of running along as far as possible through the graph before backtracking, since we always keep going from the last node if possible. The resulting algorithm is called DepthFirstSearch and yields a DepthFirstSearch tree. If we don't care about the lengths of the paths we consider, DepthFirstSearch is a perfectly good algorithm for testing connectivity, and has several other useful properties (described on the algorithm's own page).
The bucket is a queue. Now when we ask for an outgoing edge, we get the first edge inserted. This favors edges that are close to the root: we don't start consider edges from nodes adjacent to the root until we have already added all the root's successors to the tree, and similarly we don't start considering edges at distance k until we have already added all the closer nodes to the tree. This gives BreadthFirstSearch, which constructs a shortest-path tree in which every path from the root to a node in the tree has the minimum length.
Structurally, these algorithms are almost completely identical; indeed, if we organize the stack/queue so that it can pop from both ends, we can switch between DepthFirstSearch and BreadthFirstSearch just by choosing one operation or another. This is what is done in the implementation below. Since it's ugly to have a flag parameter to a function that radically changes its behavior, the combined search function is wrapped inside two separate functions dfs and bfs that are exported to the outside of the module.
The running time of either algorithm is very fast: we pay O(1) per vertex in setup costs and O(1) per edge during the search (assuming the input is in adjacency-list form), giving a linear O(n+m) total cost. Often it is more expensive to set up the graph in the first place than to run a search on it.
5.1. Depth-first and breadth-first search
1 /* Typical usage:
2 *
3 * struct search_info *s;
4 * int n;
5 *
6 * s = search_info_create(g);
7 *
8 * n = graph_vertices(g);
9 * for(i = 0; i < n; i++) {
10 * dfs(s, i);
11 * }
12 *
13 * ... use results in s ...
14 *
15 * search_info_destroy(s);
16 *
17 */
18
19 /* summary information per node for dfs and bfs */
20 /* this is not intended to be opaque---user can read it */
21 /* (but should not write it!) */
22
23 #define SEARCH_INFO_NULL (-1) /* for empty slots */
24
25 struct search_info {
26 Graph graph;
27 int reached; /* count of reached nodes */
28 int *preorder; /* list of nodes in order first reached */
29 int *time; /* time[i] == position of node i in preorder list */
30 int *parent; /* parent in DFS or BFS forest */
31 int *depth; /* distance from root */
32 };
33
34 /* allocate and initialize search results structure */
35 /* you need to do this before passing it to dfs or bfs */
36 struct search_info *search_info_create(Graph g);
37
38 /* free search_info data---does NOT free graph pointer */
39 void search_info_destroy(struct search_info *);
40
41 /* perform depth-first search starting at root, updating results */
42 void dfs(struct search_info *results, int root);
43
44 /* perform breadth-first search starting at root, updating results */
45 void bfs(struct search_info *results, int root);
1 #include <stdlib.h>
2 #include <assert.h>
3
4 #include "graph.h"
5 #include "search.h"
6
7 /* create an array of n ints initialized to SEARCH_INFO_NULL */
8 static int *
9 create_empty_array(int n)
10 {
11 int *a;
12 int i;
13
14 a = malloc(sizeof(*a) * n);
15 assert(a);
16
17 for(i = 0; i < n; i++) {
18 a[i] = SEARCH_INFO_NULL;
19 }
20
21 return a;
22 }
23
24 /* allocate and initialize search results structure */
25 /* you need to do this before passing it to dfs or bfs */
26 struct search_info *
27 search_info_create(Graph g)
28 {
29 struct search_info *s;
30 int n;
31
32 s = malloc(sizeof(*s));
33 assert(s);
34
35 s->graph = g;
36 s->reached = 0;
37
38 n = graph_vertex_count(g);
39
40 s->preorder = create_empty_array(n);
41 s->time = create_empty_array(n);
42 s->parent = create_empty_array(n);
43 s->depth = create_empty_array(n);
44
45 return s;
46 }
47
48 /* free search_info data---does NOT free graph pointer */
49 void
50 search_info_destroy(struct search_info *s)
51 {
52 free(s->depth);
53 free(s->parent);
54 free(s->time);
55 free(s->preorder);
56 free(s);
57 }
58
59 /* used inside search routines */
60 struct edge {
61 int u; /* source */
62 int v; /* sink */
63 };
64
65 /* stack/queue */
66 struct queue {
67 struct edge *e;
68 int bottom;
69 int top;
70 };
71
72 static void
73 push_edge(Graph g, int u, int v, void *data)
74 {
75 struct queue *q;
76
77 q = data;
78
79 assert(q->top < graph_edge_count(g) + 1);
80
81 q->e[q->top].u = u;
82 q->e[q->top].v = v;
83 q->top++;
84 }
85
86 /* this rather horrible function implements dfs if use_queue == 0 */
87 /* and bfs if use_queue == 1 */
88 static void
89 generic_search(struct search_info *r, int root, int use_queue)
90 {
91 /* queue/stack */
92 struct queue q;
93
94 /* edge we are working on */
95 struct edge cur;
96
97 /* start with empty q */
98 /* we need one space per edge */
99 /* plus one for the fake (root, root) edge */
100 q.e = malloc(sizeof(*q.e) * (graph_edge_count(r->graph) + 1));
101 assert(q.e);
102
103 q.bottom = q.top = 0;
104
105 /* push the root */
106 push_edge(r->graph, root, root, &q);
107
108 /* while q.e not empty */
109 while(q.bottom < q.top) {
110 if(use_queue) {
111 cur = q.e[q.bottom++];
112 } else {
113 cur = q.e[--q.top];
114 }
115
116 /* did we visit sink already? */
117 if(r->parent[cur.v] != SEARCH_INFO_NULL) continue;
118
119 /* no */
120 assert(r->reached < graph_vertex_count(r->graph));
121 r->parent[cur.v] = cur.u;
122 r->time[cur.v] = r->reached;
123 r->preorder[r->reached++] = cur.v;
124 if(cur.u == cur.v) {
125 /* we could avoid this if we were certain SEARCH_INFO_NULL */
126 /* would never be anything but -1 */
127 r->depth[cur.v] = 0;
128 } else {
129 r->depth[cur.v] = r->depth[cur.u] + 1;
130 }
131
132 /* push all outgoing edges */
133 graph_foreach(r->graph, cur.v, push_edge, &q);
134 }
135
136 free(q.e);
137 }
138
139 void
140 dfs(struct search_info *results, int root)
141 {
142 generic_search(results, root, 0);
143 }
144
145 void
146 bfs(struct search_info *results, int root)
147 {
148 generic_search(results, root, 1);
149 }
And here is some test code: test_search.c. You will need to compile test_search.c together with both search.c and graph.c to get it to work.
5.2. Other variations on the basic algorithm
Stacks and queues are not the only options for the bucket in the generic search algorithm. Some other choices are:
A priority queue keyed by edge weights. If the edges have weights, the generic tree-builder can be used to find a tree containing s with minimum total edge weight.1 The basic idea is to always pull out the lightest edge. The resulting algorithm runs in O(n + m log m) time (since each heap operation takes O(log m) time), and is known as Prim's algorithm. See Prim's algorithm for more details.
A priority queue keyed by path lengths. Here we assume that edges have lengths, and we want to build a shortest-path tree where the length of the path is no longer just the number of edges it contains but the sum of their weights. The basic idea is to keep track of the distance from the root to each node in the tree, and assign each edge a key equal to the sum of the distance to its source and its length. The resulting search algorithm, known as Dijkstra's algorithm, will give a shortest-path tree if all the edge weights are non-negative. See ShortestPath or Dijkstra's algorithm.
This only works if the graph is undirected, i.e. if for every edge uv there is a matching edge vu with the same weight. (1)