Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

Consider a typical C program. We have a large collection of global symbols (mostly functions), and the code and/or data for each needs to go somewhere in memory. How do we decide where it goes?

1. Fixed addresses chosen by the programmer

The simplest approach (for the computer) is to let the programmer specify fixed addresses for everything. So hello() goes at 0x43770, exit() goes at 0x3817, etc. This has a lot of problems:

The programmer has to keep track of a lot of boring clerical details.
What if we put two things in the same place by accident?
What if we need to move something or make it bigger?

Consequently, nobody does this, and our compilation tools actually make it pretty hard to do it.

2. Fixed address chosen by the linker and/or loader

Instead, we design our object files to be relocatable: in addition to the compiled machine code, each file contains a symbol table listing the symbols it defines and uses, together with relocation entries recording where in the code each symbol's address is needed. A linker resolves these symbols to specific fixed locations and writes its choices everywhere a symbol is used. The linker is also typically responsible for finding necessary routines in libraries.

On Unix, the linker is usually called ld and runs after the rest of the compilation is done. You can see what it is doing by looking at symbol tables using the nm program. For example, given the following short program short.c:

int
main(int argc, char **argv)
{
    return 0;
}

The output of nm short.o after running gcc -c short.c (on an x86-64 Linux machine) looks like this:

0000000000000000 T main

This says that short.o contains one text (code) symbol, main, at address 0; since the linker has not yet placed it, this is just its offset within the object file's text section. The output of nm short after running gcc -o short short.o looks like this:

00000000005008c0 A __bss_start
000000000040042c t call_gmon_start
00000000005008c0 b completed.4829
00000000005006d0 d __CTOR_END__
00000000005006c8 d __CTOR_LIST__
00000000005008a8 D __data_start
00000000005008a8 W data_start
00000000004005c0 t __do_global_ctors_aux
0000000000400450 t __do_global_dtors_aux
00000000005008b0 D __dso_handle
00000000005006e0 d __DTOR_END__
00000000005006d8 d __DTOR_LIST__
00000000005006f0 D _DYNAMIC
00000000005008c0 A _edata
00000000005008c8 A _end
00000000004005f8 T _fini
00000000005008c0 a __fini_array_end
00000000005008c0 a __fini_array_start
0000000000400490 t frame_dummy
00000000004006c0 r __FRAME_END__
0000000000500888 D _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000004003c0 T _init
00000000005008c0 a __init_array_end
00000000005008c0 a __init_array_start
0000000000400608 R _IO_stdin_used
00000000005006e8 d __JCR_END__
00000000005006e8 d __JCR_LIST__
                 w _Jv_RegisterClasses
0000000000400550 T __libc_csu_fini
00000000004004d0 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
00000000004004b8 T main
00000000005008b8 d p.4828
0000000000400400 T _start

Here we have a lot of extra symbols from the C library and its startup code. Note that one of them, __libc_start_main@@GLIBC_2.2.5, is marked U (undefined); it will be resolved at load time using dynamic linking (see below).

3. Dynamic linking

The process of linking at load time (when a program starts) is called dynamic linking. It is commonly used for shared libraries, where static linking at link time would require making a copy, possibly a very large one, of each library in each program. Instead, the OS uses virtual-memory address translation to map a single physical copy of the library into the address spaces of all processes that use it. Because we don't know in advance which libraries will be loaded or where, we don't want to fix the location of this copy, so we delay resolving library symbols until load time. This can be done either in the kernel itself or by a userspace program (e.g. ld.so on Linux).

In some systems (e.g. Multics), dynamic linking could even allow library routines to be replaced in a running process. Most modern systems don't provide this feature.

4. Dynamic loading

We can defer linking even further by allowing a program to run without including all of the procedures it might eventually need. Instead, the program loads new object files after it is already running, a process called dynamic loading. In its most general form, this can be used to add new functionality to an already-compiled program by letting it load new modules. It can also let a program avoid loading infrequently used routines or modules into memory until they are actually needed.

This latter technique, on-demand dynamic loading or autoloading, typically involves replacing routines in dynamically loaded modules with stubs, which call the dynamic loader the first time they are called and then replace themselves with the newly loaded routine.
