linker loader - modrpc/info GitHub Wiki

Overview

  • What are linkers/loaders?
    • they bind abstract names to concrete names
    • they allow coders to use abstract names (e.g. getline) instead of concrete names (e.g. "the location 612 bytes from the beginning of the executable code in module iosys")
    • the key question is WHEN names are bound to addresses.

History of Address Binding

Machine language

  • Everything was absolute
  • binding happens at CODING TIME
  • (-) addresses are bound too early => if anything changes, everything else must change accordingly
  • (SOLUTION) Assembler

Assembler Language

  • coders now use symbolic names
  • binding happens at ASSEMBLY TIME (the assembler's counterpart of compile time)
  • (-) libraries of code compound the address assignment problem
    • now we want to merge pre-built libraries into one program
  • (SOLUTION) linker with two functions: relocation and library search

Linkers

  • Linkers made it possible to link libraries into one binary
  • binding happens at LINK TIME
  • (-) With the advent of operating systems, a program no longer has the entire memory at its disposal (multiprogramming)
    • now a program shares the memory with the OS itself and other programs
  • (SOLUTION) relocating loader

Relocating Loader

  • Now linkers and loaders are separate
  • binding happens at LOAD TIME
    • linker: address binding, assigning relative addresses within each program
    • loader: final relocation step to assign actual addresses
  • (-) programs quickly became larger than available memory
  • (SOLUTION) Overlays

Overlays

  • programmers arrange for different parts of a program to share the same memory, with each overlay loaded on demand when another part of the program calls into it.
  • (+) still useful in some memory-limited embedded environments
  • (-) it is the programmer's responsibility to handle the sharing
  • (SOLUTION) Virtual memory (with HW-supported relocation)

Virtual memory

  • (+) linkers/loaders got simpler, since each program now has the entire address space to itself
  • linkers: programs linked to be loaded at fixed addresses
  • HW: address translation (i.e. relocation)
  • (-) need for sharing libraries between different programs (sharing across address space)
  • (SOLUTION) compiler and assembler create object code in multiple sections

Shared Libraries

  • share some parts of programs, esp. executable code
  • (+) saves storage
  • compilers/assemblers now create object code in multiple sections
    • e.g. one section for read-only code, another for writable data
    • the linker has to combine all sections of each type so that the linked program has all code in one place and all data in another
  • Static shared libraries
    • each library is bound to specific addresses at the time the library is built
    • linker binds program references to library routines to those specific addresses at link time
    • (-) every time a library changes, all programs that use the library need to be re-linked
  • Dynamically-linked libraries:
    • library sections and symbols are not bound to actual addresses until the program that uses the library starts running
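
The "combine all sections of each type" step above can be sketched as follows. This is only an illustration: the module names, the section names `.text` and `.data`, and the function `combine_sections` are assumptions made for the example, not something these notes define.

```python
from collections import defaultdict


def combine_sections(modules):
    """Concatenate same-named sections from every module.

    `modules` is a list of (module_name, {section_name: bytes}) pairs.
    Returns (merged, placement):
      merged    maps section name -> combined bytes
      placement maps (module_name, section_name) -> offset of that module's
                contribution inside the combined section
    """
    merged = defaultdict(bytearray)
    placement = {}
    for module_name, sections in modules:
        for section_name, payload in sections.items():
            # The contribution starts where the combined section currently ends.
            placement[(module_name, section_name)] = len(merged[section_name])
            merged[section_name] += payload
    return {name: bytes(data) for name, data in merged.items()}, placement


# Hypothetical inputs: two modules, each with a code and a data section.
modules = [
    ("main.o", {".text": b"\x01\x02\x03\x04", ".data": b"\xaa\xbb"}),
    ("util.o", {".text": b"\x05\x06", ".data": b"\xcc"}),
]
merged, placement = combine_sections(modules)
```

The `placement` table records where each module's piece ended up, which is the information a later relocation step would need.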

Linking vs Loading

Functionalities

Program Loading

  • Copy a program from disk into main memory so that it's ready to run
  • Sometimes it literally just copies data from disk to memory, but in other cases it also has to:
    • allocate storage in memory
    • set protection bits
    • arrange for virtual memory to map virtual addresses to disk pages
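
A toy model of the loading steps above, assuming a simulated memory made of fixed-size pages. The page size, the segment descriptor fields (`file_offset`, `size`, `vaddr`, `perms`) and the function `load_program` are all hypothetical. The sketch also copies bytes eagerly, whereas a real loader would more likely map pages and let them be read from disk on demand.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def load_program(image, segments):
    """Copy each segment of `image` (the file contents) into simulated memory.

    Returns (memory, protection):
      memory     maps page number -> bytearray holding that page
      protection maps page number -> permission string such as "r-x"
    """
    memory = {}
    protection = {}
    for seg in segments:
        start = seg["file_offset"]
        data = image[start:start + seg["size"]]
        for i, byte in enumerate(data):
            page, offset = divmod(seg["vaddr"] + i, PAGE_SIZE)
            if page not in memory:
                memory[page] = bytearray(PAGE_SIZE)  # allocate storage
            memory[page][offset] = byte              # copy from "disk"
            protection[page] = seg["perms"]          # set protection bits
    return memory, protection


# Hypothetical executable: 4 bytes of code followed by 5 bytes of data.
image = b"CODE" + b"DATA!"
segments = [
    {"file_offset": 0, "size": 4, "vaddr": 0x1000, "perms": "r-x"},
    {"file_offset": 4, "size": 5, "vaddr": 0x2000, "perms": "rw-"},
]
memory, protection = load_program(image, segments)
```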

Relocation

  • compilers/assemblers generally create each object file with program addresses starting at zero
  • however, few computers let you load the program at zero
  • relocation:
    • assign load address to various parts of the program, and
    • adjust the code/data in the program to reflect the assigned addresses
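
A minimal sketch of the "adjust the code/data" step, assuming the object code was assembled at address zero and that the relocation info is simply a list of offsets of 32-bit little-endian address fields. The function name `relocate`, the field width, and the placeholder opcode bytes are assumptions chosen for the example.

```python
import struct


def relocate(code, reloc_offsets, load_address):
    """Return a copy of `code` in which every 32-bit little-endian address
    field listed in `reloc_offsets` has `load_address` added to it."""
    patched = bytearray(code)
    for offset in reloc_offsets:
        (value,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (value + load_address) & 0xFFFFFFFF)
    return bytes(patched)


# Hypothetical object code: a placeholder opcode byte, a 4-byte address field
# holding 0x10 (relative to address zero), and another placeholder byte.
code = b"\x01" + struct.pack("<I", 0x10) + b"\x02"
relocated = relocate(code, [1], 0x400000)
```

Only the bytes named in the relocation list change; everything else is copied as is.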

Symbol Resolution

  • when a program is built from multiple subprograms, the references from one subprogram to another are made through symbols.
  • e.g. when a main program calls the library routine sqrt, the linker resolves the symbol by noting the location assigned to sqrt in the library and patching the caller's object code so that the call instruction refers to that location
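
The sqrt case can be sketched as a lookup against a table of definitions. The addresses, the module name `main.o`, the unresolved symbol `cbrt`, and the function `resolve_symbols` are made up for illustration.

```python
def resolve_symbols(definitions, references):
    """Match each symbolic reference against the known definitions.

    definitions: {symbol_name: address}
    references:  list of (module, offset, symbol_name)
    Returns (patches, undefined):
      patches   is a list of (module, offset, address) to write into the code
      undefined is a list of (module, symbol_name) with no definition
    """
    patches, undefined = [], []
    for module, offset, symbol in references:
        if symbol in definitions:
            patches.append((module, offset, definitions[symbol]))
        else:
            undefined.append((module, symbol))
    return patches, undefined


# Hypothetical data: the library defines sqrt; main.o calls sqrt and cbrt.
definitions = {"sqrt": 0x5000}
references = [("main.o", 0x24, "sqrt"), ("main.o", 0x30, "cbrt")]
patches, undefined = resolve_symbols(definitions, references)
```

A linker would typically report anything left in `undefined` as an error, or go on to search further libraries for it.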

Two-Pass Linking

  • INPUT: set of input object files
    • each input object file contains:
      • a set of segments, contiguous blocks of code or data
      • at least one symbol table (exported symbols, imported symbols): (cf. XMR table)
  • OUTPUT: output object file + (optional) load map + (optional) debugger symbols
  • What it does:
    • PASS 1: BUILD segment table:
      • scan input files to find the sizes of segments and collect the definitions and references of all symbols exported/imported
    • PASS 2: read/relocate object code:
      • replace symbol refs with numeric addresses
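
The two passes can be sketched end to end on a toy object format. Everything about the format is an assumption for the example: objects are plain dicts, segments are laid out one after another from a hypothetical base address, and each import names a 32-bit little-endian field to be overwritten with the symbol's address.

```python
import struct


def link(objects, base=0x1000):
    """Toy two-pass linker; returns (output_bytes, symbol_table)."""
    # PASS 1: build the segment table and the global symbol table.
    seg_addr = {}  # (object name, segment name) -> assigned address
    symbols = {}   # symbol name -> absolute address
    cursor = base
    for obj in objects:
        for seg_name, data in obj["segments"].items():
            seg_addr[(obj["name"], seg_name)] = cursor
            cursor += len(data)
    for obj in objects:
        for sym, (seg_name, offset) in obj["exports"].items():
            symbols[sym] = seg_addr[(obj["name"], seg_name)] + offset

    # PASS 2: read the object code again and patch in numeric addresses.
    output = bytearray()
    for obj in objects:
        for seg_name, data in obj["segments"].items():
            patched = bytearray(data)
            for ref_seg, offset, sym in obj["imports"]:
                if ref_seg != seg_name:
                    continue
                if sym not in symbols:
                    raise KeyError(f"undefined symbol: {sym}")
                struct.pack_into("<I", patched, offset, symbols[sym])
            output += patched
    return bytes(output), symbols


# Hypothetical inputs: main.o refers to helper, which util.o defines.
main_o = {
    "name": "main.o",
    "segments": {".text": b"\x01" + b"\x00" * 4},
    "exports": {"main": (".text", 0)},
    "imports": [(".text", 1, "helper")],
}
util_o = {
    "name": "util.o",
    "segments": {".text": b"\x02\x03"},
    "exports": {"helper": (".text", 0)},
    "imports": [],
}
image, symbols = link([main_o, util_o])
```

Two passes are needed because `main.o` refers to `helper` before the linker has seen where `util.o` will be placed; the address is only known once pass 1 has sized every segment.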

Object Files

What's Inside Object Files

  • Header info: size of code, name of source files, creation date, etc.
  • Object code: binary instructions and data
  • Relocation info: list of places in the object code that have to be fixed up when linker changes the addresses of the object code
  • Symbols: global (exported) symbols defined in the modules, imported symbols from other modules or defined by the linker.
  • Debugging info: source file/line info, local symbols, original C typedef, etc.
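
One way to picture the five parts listed above is as a record type. This is not any real object file format; the class and field names are hypothetical and only mirror the list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Header:
    code_size: int    # size of the object code in bytes
    source_file: str  # name of the source file
    created: str      # creation date, kept as text in this sketch


@dataclass
class Relocation:
    offset: int                   # place in the object code to fix up
    symbol: Optional[str] = None  # symbol the fixup refers to, if any


@dataclass
class Symbol:
    name: str
    kind: str                     # "export" or "import"
    offset: Optional[int] = None  # defined location, for exports only


@dataclass
class ObjectFile:
    header: Header
    code: bytes
    relocations: list = field(default_factory=list)
    symbols: list = field(default_factory=list)
    debug_info: dict = field(default_factory=dict)

    def exported(self):
        return [s.name for s in self.symbols if s.kind == "export"]

    def imported(self):
        return [s.name for s in self.symbols if s.kind == "import"]


# Hypothetical module that defines main and calls sqrt.
obj = ObjectFile(
    header=Header(code_size=5, source_file="main.c", created="unknown"),
    code=b"\x01\x00\x00\x00\x00",
    relocations=[Relocation(offset=1, symbol="sqrt")],
    symbols=[Symbol("main", "export", 0), Symbol("sqrt", "import")],
)
```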

Types of Object Files

  • linkable files:
    • contains extensive symbol and relocation info needed by the linker in addition to the object code
  • executable files:
    • object code (usually page-aligned so that it can be mapped into the address space) with little or no relocation info
    • can have some symbol info in case dynamic linking is done
  • loadable files: