allows coders to use abstract names (e.g. getline) instead of concrete names (e.g. "the location 612 bytes from the beginning of the executable code in module iosys")
the whole point is WHEN names are bound to addresses.
History of Address Binding
Machine language
Everything was absolute
binding happens at CODING TIME
(-) addresses bound too early => if anything changes, everything else has to change accordingly
(SOLUTION) Assembler
Assembler Language
coders now use symbolic names
binding happens at ASSEMBLY TIME (when the assembler translates the source)
(-) libraries of code compound the address assignment problem
now we want to merge pre-built libraries into one program
(SOLUTION) linker with two functions: relocation and library search
Linkers
Linkers made it possible to link libraries into one binary
binding happens at LINK TIME
(-) With the advent of operating systems, a program no longer has the entire memory at its disposal (multiprogramming)
now a program shares the memory with OS itself and other programs
(SOLUTION) relocating loader
Relocating Loader
Now linkers and loaders are separate
binding happens at LOAD TIME
linker: address binding, assigning relative addresses within each program
loader: final relocation step to assign actual addresses
(-) programs quickly became larger than available memory
(SOLUTION) Overlays
Overlays
programmers arrange for different parts of a program to share the same memory, with each overlay loaded on demand when another part of the program calls into it.
(+) still useful in some memory-limited embedded environments
(+) with hardware relocation and virtual memory, the linker/loader got simpler, since each program now has an entire address space to itself
linkers: programs linked to be loaded at fixed addresses
HW: address translation (i.e. relocation)
(-) need for sharing libraries between different programs (sharing across address space)
(SOLUTION) compiler and assembler create object code in multiple sections
Shared Libraries
share some parts of programs, esp. executable code
(+) save storage
compilers/assemblers now create object code in multiple sections
e.g. one section for read-only code, another for writable
the linker has to combine all sections of each type so that the linked program has all code in one place and all data in another
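A minimal sketch of that combining step, assuming hypothetical object files represented as dictionaries that map a section name to raw bytes (the names `text` and `data` are chosen only for illustration). It is not how any particular linker is implemented; it just shows like-typed sections being concatenated, and where each input's piece ends up.

```python
def combine_sections(object_files):
    """Concatenate same-named sections from every object file.

    Returns (merged, placement):
      merged    maps section name -> combined bytes
      placement maps (file index, section name) -> offset in the merged section
    """
    merged = {}
    placement = {}
    for index, obj in enumerate(object_files):
        for name, contents in obj.items():
            buf = merged.setdefault(name, bytearray())
            placement[(index, name)] = len(buf)  # where this piece starts
            buf.extend(contents)
    return {name: bytes(buf) for name, buf in merged.items()}, placement


# Two hypothetical object files, each with a code and a data section.
obj_a = {"text": b"\x01\x02\x03", "data": b"\xaa"}
obj_b = {"text": b"\x04\x05", "data": b"\xbb\xcc"}
merged, placement = combine_sections([obj_a, obj_b])
```

The `placement` table matters later: once a piece has moved away from offset zero, every address that points into it has to be adjusted by that amount.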
Static shared libraries
each library is bound to specific addresses at the time the library is built
linker binds program references to library routines to those specific addresses at link time
(-) every time the library changes, all programs that use the library need to be re-linked
Dynamically-linked libraries
library sections and symbols are not bound to actual addresses until the program that uses the library starts running
Linking vs Loading
Functionalities
Program Loading
Copy a program from disk into main memory so that it's ready to run
Sometimes it literally just copies data from disk to memory, but in other cases it also has to:
allocate storage in memory
set protection bits
arrange for virtual memory to map virtual addresses to disk pages
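The loading steps above can be sketched with a toy loader. This is only an illustration under stated assumptions: memory is a plain `bytearray`, the page size of 4096 is assumed, and protections are recorded as strings such as `"r-x"` in a dictionary. A real loader asks the operating system to do this work rather than copying bytes itself.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def load(segments):
    """Copy each segment into a fresh memory image and record page protections.

    segments: list of (load_address, contents, protection) tuples, where
    protection is an illustrative string such as "r-x" or "rw-".
    """
    end = max(addr + len(data) for addr, data, _ in segments)
    # Allocate storage: round up to a whole number of zero-filled pages.
    size = -(-end // PAGE_SIZE) * PAGE_SIZE
    memory = bytearray(size)
    protections = {}
    for addr, data, prot in segments:
        memory[addr:addr + len(data)] = data  # copy from "disk" into memory
        first = addr // PAGE_SIZE
        last = (addr + len(data) - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            protections[page] = prot  # set protection bits for the page
    return memory, protections


memory, protections = load([
    (0x0000, b"\x90" * 10, "r-x"),  # code segment
    (0x1000, b"hello", "rw-"),      # data segment
])
```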
Relocation
compilers/assemblers generally create each piece of object code with program addresses starting at zero
however, few computers let you load the program at zero
relocation:
assign load address to various parts of the program, and
adjust the code/data in the program to reflect the assigned addresses
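A minimal sketch of the second half of relocation, assuming the object code is a list of integer words assembled as if loaded at zero, and assuming a hypothetical relocation list that names which words hold addresses. The function name and data layout are invented for illustration.

```python
def relocate(words, reloc_offsets, load_address):
    """Return a copy of `words` with every address word adjusted.

    words:         object code as integers, assembled as if loaded at 0
    reloc_offsets: indices of the words that hold addresses
    load_address:  where the code will actually be placed
    """
    patched = list(words)
    for i in reloc_offsets:
        patched[i] += load_address
    return patched


# Word 1 holds the address 4, which refers to a location inside this code.
code = [0x10, 4, 0x20, 0x30, 0x99]
relocated = relocate(code, reloc_offsets=[1], load_address=0x5000)
```

Only the words named in the relocation list change; ordinary instructions and constants are copied untouched, which is why the object file has to record where the addresses are.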
Symbol Resolution
when a program is built from multiple subprograms, the references from one subprogram to another are made through symbols.
e.g. if the main program calls sqrt from a library, the linker resolves the symbol by noting the location assigned to sqrt in the library and patching the caller's object code so that the call instruction refers to that location
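The sqrt case can be sketched as follows. Everything here is hypothetical: the opcode value, the address 0x4000, and the representation of code as a list of integer words with a placeholder 0 where the address belongs.

```python
def resolve(code, references, symbol_table):
    """Patch each symbolic reference with the address of its symbol.

    code:         list of integer words; referenced slots hold a placeholder 0
    references:   list of (word index, symbol name) pairs
    symbol_table: symbol name -> assigned address
    """
    patched = list(code)
    for index, name in references:
        if name not in symbol_table:
            raise KeyError(f"undefined symbol: {name}")
        patched[index] = symbol_table[name]
    return patched


CALL = 0xE8                         # made-up opcode for this sketch
caller = [CALL, 0]                  # "call sqrt", address not yet known
library_symbols = {"sqrt": 0x4000}  # location assigned to sqrt
linked = resolve(caller, [(1, "sqrt")], library_symbols)
```

A reference to a name that no module defines cannot be patched, so the sketch raises an error for it instead of guessing.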
Two-Pass Linking
INPUT: set of input object files (and libraries)
OUTPUT: output object file + (optional) load map + (optional) debugger symbols
each input object file contains:
a set of segments, contiguous blocks of code or data
at least one symbol table (exported symbols, imported symbols): (cf. XMR table)
What it does:
PASS 1: BUILD segment table:
scan input files to find the sizes of segments and collect the definitions and references of all symbols exported/imported
PASS 2: read/relocate object code:
replace symbol refs with numeric addresses
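The two passes can be sketched with a toy linker. The object file layout below (`code`, `defs`, `refs`) is a hypothetical one made up for this example, and addresses are counted in words rather than bytes to keep it short.

```python
def link(object_files, base=0):
    """Toy two-pass link over hypothetical object files.

    Each object file is a dict with:
      "code": list of integer words, addresses relative to 0
      "defs": symbol name -> offset within this file's code
      "refs": list of (word index, symbol name) to be filled in
    """
    # Pass 1: assign each segment a start address and collect definitions.
    segment_starts = []
    symbols = {}
    address = base
    for obj in object_files:
        segment_starts.append(address)
        for name, offset in obj["defs"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = address + offset
        address += len(obj["code"])

    # Pass 2: copy the code and replace symbol references with addresses.
    output = []
    for obj in object_files:
        code = list(obj["code"])
        for index, name in obj["refs"]:
            if name not in symbols:
                raise KeyError(f"undefined symbol: {name}")
            code[index] = symbols[name]
        output.extend(code)
    return output, symbols, segment_starts


main = {"code": [1, 0, 2], "defs": {"main": 0}, "refs": [(1, "helper")]}
lib = {"code": [7, 8], "defs": {"helper": 1}, "refs": []}
output, symbols, starts = link([main, lib], base=100)
```

Here `main` refers to `helper`, which is defined in a file that comes later. That is the reason for two passes: the address of `helper` is not known until every segment size has been seen.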
Object Files
What's Inside Object Files
Header info: size of code, name of source files, creation date, etc.
Object code: binary instructions and data
Relocation info: list of places in the object code that have to be fixed up when linker changes the addresses of the object code
Symbols: global (exported) symbols defined in the modules, imported symbols from other modules or defined by the linker.
Debugging info: source file/line info, local symbols, original C typedef, etc.
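The five parts listed above can be pictured as a small data structure. This is only a sketch: the class and field names are invented for illustration and do not correspond to any real object file format, whose on-disk layouts differ.

```python
from dataclasses import dataclass, field


@dataclass
class Relocation:
    offset: int       # place in the object code that must be fixed up
    symbol: str = ""  # symbol the fix-up refers to, if any


@dataclass
class Symbol:
    name: str
    offset: int = 0       # meaningful only for defined symbols
    defined: bool = True  # False for an imported (undefined) symbol


@dataclass
class ObjectFile:
    header: dict = field(default_factory=dict)       # sizes, source name, date
    code: bytes = b""                                # instructions and data
    relocations: list = field(default_factory=list)  # places to fix up
    symbols: list = field(default_factory=list)      # exported and imported
    debug_info: dict = field(default_factory=dict)   # line info, local symbols

    def exported(self):
        return [s.name for s in self.symbols if s.defined]

    def imported(self):
        return [s.name for s in self.symbols if not s.defined]


obj = ObjectFile(
    header={"source": "main.c", "code_size": 4},
    code=b"\x00\x01\x02\x03",
    relocations=[Relocation(offset=1, symbol="sqrt")],
    symbols=[Symbol("main", 0), Symbol("sqrt", defined=False)],
)
```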
Types of Object Files
linkable files:
contains extensive symbol and relocation info needed by the linker in addition to the object code
executable files:
object code (usually page-aligned so that it can be mapped into the address space) with little or no relocation info
can have some symbol info if dynamic linking is used