[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

B-trees are a data structure designed to take advantage of the block structure of memory devices, particularly disks.

1. The memory hierarchy

Most of the time, programmers will assume that memory is flat, meaning that all memory references are equally expensive. While hardware designers work very hard to make memory references cheap, they are not all equally cheap.

Typical CPUs implement several layers of caching backed by system memory and then virtual memory. The cache layers are very fast memory (typically on the same die as the CPU); system memory is slower; virtual memory is slower by 6 orders of magnitude. Typically data is moved to the fastest available level of the cache when it is accessed; data that hasn't been accessed in a while is pushed out to slower levels to make room. To amortize the overhead costs of this process, data is managed in blocks, which may range from as little as 64 bytes for the on-chip caches to as much as 4096 bytes or more for the virtual memory system.

The effect of storing data in caches is that it's cheaper—sometimes much cheaper—to access data at an address close to other addresses you've looked at recently than to access data far away.

2. Caching and data structures

The structure of the cache system means that, all else being equal, a data structure will run faster if it has good locality, which means that pieces of the data structure accessed around the same time tend to be in about the same place. For example, given the choice between searching through an unsorted array and an unsorted linked list, you are better off searching the array: each position you look at in the array comes right after the previous position, so is likely to be stored in the same block. In contrast, the elements of the linked list will be wherever malloc managed to scrounge up space: in the worst case, one element per virtual memory block, each of which needs to be paged in from disk if you are running low on space. The disadvantage of the array over the linked list is that calling realloc may be expensive; one way to avoid this is to build a linked list of large nodes each of which holds an array, as done in this sample code: stackBlockList.c. This is actually slightly slower than just building a stack as an array (stackArray.c) but much faster than the simple linked-list approach (stackList.c).

A similar issue arises with binary search trees. Because we are only jumping through O(log n) nodes, the added cost of bad locality is not as noticeable as with linked lists, but we'd like to reduce it, especially if we expect our tree nodes to be stored on a slow device like a disk.

3. B-trees

B-trees solve this problem by grouping what would be many nodes together into a single giant node. The idea is that each node contains an array k keys and pointers to k+1 children, for a value of k that lies somewhere between m/2 and m (except for the root) where m is a parameter of the tree called the order. Searching a B-tree consists of finding the child in between the two keys that our target lies between. A picture of one node in a B-tree:

 14 38 57
A  B  C  D

Here there are three keys and four children A, B, C, and D. Node A is the root of a B-tree holding all keys less than 14, B the root of a B-tree holding all keys strictly between 14 and 38, etc.

At the bottom of the tree are leaf nodes that don't hold any pointers. A sophisticated implementation of a B-tree may take advantage of this to pack more keys into the leaves while still keeping the size of each node fitting neatly inside the block size (the implementation given below is not that sophisticated). At the top is a root node, which is just like any other internal node except that it may contain fewer than m/2 keys.

3.1. Insertions

To insert a key, we start by inserting it at the leaf, by moving any larger keys up to make room and then writing it into the sorted array. If the leaf becomes full, this may require some rebalancing.

B-trees are always completely balanced, with each leaf the same distance from the root. The mechanism for balancing a B-tree is that when a node becomes full, it splits into two nodes and pushes a key up into its parent. For example, we might split a three-key node into two one-key nodes like this:

 14 38 57
A  B  C  D
    
    (38)
 14      57
A  B    C  D

Pushing the key up into the parent will increase the number of keys in the parent and may force the parent to split as well. If we find ourselves splitting the root, we create a new root with one key and two children. This is the only way the height of the tree can increase.

3.2. Deletions

Deletion is mostly just the reverse of insertion, but has a lot of messy special cases. As in a binary search tree, it's complicated to delete a key from an internal node; the solution is to delete a key from a nearby leaf instead. We do not implement deletion in the code below, but you can read about it in HorowitzEtAl §11.2.4.

4. Implementation

Here is a basic implementation, not-very-tuned implementation of a B-tree. Despite the horrendous amount of copying in btInsert, the test program runs about four times faster with this implementation than an adapted version runs on the AVL tree code from C/AvlTree. There are many possible reasons for this (the AVL tree code is not very optimized), but cache performance and avoiding malloc overhead are probably the big factors.

   1 /* implementation of a B-tree */
   2 typedef struct btNode *bTree;
   3 
   4 /* create a new empty tree */
   5 bTree btCreate(void);
   6 
   7 /* free a tree */
   8 void btDestroy(bTree t);
   9 
  10 /* return nonzero if key is present in tree */
  11 int btSearch(bTree t, int key);
  12 
  13 /* insert a new element into a tree */
  14 void btInsert(bTree t, int key);
  15 
  16 /* print all keys of the tree in order */
  17 void btPrintKeys(bTree t);
bTree.h

   1 #include <stdio.h>
   2 #include <stdlib.h>
   3 #include <assert.h>
   4 #include <string.h>
   5 
   6 #include "bTree.h"
   7 
   8 #define MAX_KEYS (1024)
   9 
  10 struct btNode {
  11     int isLeaf;     /* is this a leaf node? */
  12     int numKeys;    /* how many keys does this node contain? */
  13     int keys[MAX_KEYS];
  14     struct btNode *kids[MAX_KEYS+1];  /* kids[i] holds nodes < keys[i] */
  15 };
  16 
  17 bTree
  18 btCreate(void)
  19 {
  20     bTree b;
  21 
  22     b = malloc(sizeof(*b));
  23     assert(b);
  24 
  25     b->isLeaf = 1;
  26     b->numKeys = 0;
  27 
  28     return b;
  29 }
  30 
  31 void
  32 btDestroy(bTree b)
  33 {
  34     int i;
  35 
  36     if(!b->isLeaf) {
  37         for(i = 0; i < b->numKeys + 1; i++) {
  38             btDestroy(b->kids[i]);
  39         }
  40     }
  41 
  42     free(b);
  43 }
  44 
  45 /* return smallest index i in sorted array such that key <= a[i] */
  46 /* (or n if there is no such index) */
  47 static int
  48 searchKey(int n, const int *a, int key)
  49 {
  50     int lo;
  51     int hi;
  52     int mid;
  53 
  54     /* invariant: a[lo] < key <= a[hi] */
  55     lo = -1;
  56     hi = n;
  57 
  58     while(lo + 1 < hi) {
  59         mid = (lo+hi)/2;
  60         if(a[mid] == key) {
  61             return mid;
  62         } else if(a[mid] < key) {
  63             lo = mid;
  64         } else {
  65             hi = mid;
  66         }
  67     }
  68 
  69     return hi;
  70 }
  71 
  72 int
  73 btSearch(bTree b, int key)
  74 {
  75     int pos;
  76 
  77     /* have to check for empty tree */
  78     if(b->numKeys == 0) {
  79         return 0;
  80     }
  81 
  82     /* look for smallest position that key fits below */
  83     pos = searchKey(b->numKeys, b->keys, key);
  84 
  85     if(pos < b->numKeys && b->keys[pos] == key) {
  86         return 1;
  87     } else {
  88         return(!b->isLeaf && btSearch(b->kids[pos], key));
  89     }
  90 }
  91 
  92 /* insert a new key into a tree */
  93 /* returns new right sibling if the node splits */
  94 /* and puts the median in *median */
  95 /* else returns 0 */
  96 static bTree
  97 btInsertInternal(bTree b, int key, int *median)
  98 {
  99     int pos;
 100     int mid;
 101     bTree b2;
 102 
 103     pos = searchKey(b->numKeys, b->keys, key);
 104 
 105     if(pos < b->numKeys && b->keys[pos] == key) {
 106         /* nothing to do */
 107         return 0;
 108     }
 109 
 110     if(b->isLeaf) {
 111 
 112         /* everybody above pos moves up one space */
 113         memmove(&b->keys[pos+1], &b->keys[pos], sizeof(*(b->keys)) * (b->numKeys - pos));
 114         b->keys[pos] = key;
 115         b->numKeys++;
 116 
 117     } else {
 118 
 119         /* insert in child */
 120         b2 = btInsertInternal(b->kids[pos], key, &mid);
 121         
 122         /* maybe insert a new key in b */
 123         if(b2) {
 124 
 125             /* every key above pos moves up one space */
 126             memmove(&b->keys[pos+1], &b->keys[pos], sizeof(*(b->keys)) * (b->numKeys - pos));
 127             /* new kid goes in pos + 1*/
 128             memmove(&b->kids[pos+2], &b->kids[pos+1], sizeof(*(b->keys)) * (b->numKeys - pos));
 129 
 130             b->keys[pos] = mid;
 131             b->kids[pos+1] = b2;
 132             b->numKeys++;
 133         }
 134     }
 135 
 136     /* we waste a tiny bit of space by splitting now
 137      * instead of on next insert */
 138     if(b->numKeys >= MAX_KEYS) {
 139         mid = b->numKeys/2;
 140 
 141         *median = b->keys[mid];
 142 
 143         /* make a new node for keys > median */
 144         /* picture is:
 145          *
 146          *      3 5 7
 147          *      A B C D
 148          *
 149          * becomes
 150          *          (5)
 151          *      3        7
 152          *      A B      C D
 153          */
 154         b2 = malloc(sizeof(*b2));
 155 
 156         b2->numKeys = b->numKeys - mid - 1;
 157         b2->isLeaf = b->isLeaf;
 158 
 159         memmove(b2->keys, &b->keys[mid+1], sizeof(*(b->keys)) * b2->numKeys);
 160         if(!b->isLeaf) {
 161             memmove(b2->kids, &b->kids[mid+1], sizeof(*(b->kids)) * (b2->numKeys + 1));
 162         }
 163 
 164         b->numKeys = mid;
 165 
 166         return b2;
 167     } else {
 168         return 0;
 169     }
 170 }
 171 
 172 void
 173 btInsert(bTree b, int key)
 174 {
 175     bTree b1;   /* new left child */
 176     bTree b2;   /* new right child */
 177     int median;
 178 
 179     b2 = btInsertInternal(b, key, &median);
 180 
 181     if(b2) {
 182         /* basic issue here is that we are at the root */
 183         /* so if we split, we have to make a new root */
 184 
 185         b1 = malloc(sizeof(*b1));
 186         assert(b1);
 187 
 188         /* copy root to b1 */
 189         memmove(b1, b, sizeof(*b));
 190 
 191         /* make root point to b1 and b2 */
 192         b->numKeys = 1;
 193         b->isLeaf = 0;
 194         b->keys[0] = median;
 195         b->kids[0] = b1;
 196         b->kids[1] = b2;
 197     }
 198 }
bTree.c

See also testBTree.c and Makefile.


2014-06-17 11:57