Linking - ltqusst/video_notes GitHub Wiki

Thanks to these references:

https://reverseengineering.stackexchange.com/questions/1992/what-is-plt-got

https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html

https://www.airs.com/blog/archives/39

https://www.airs.com/blog/archives/40

https://stackoverflow.com/questions/6538501/linking-two-shared-libraries-with-some-of-the-same-symbols

https://www.bottomupcs.com/libraries_and_the_linker.xhtml

memo

  • readelf needs "-W" (wide output), otherwise it truncates long fields such as symbol names.
  • objdump -C demangles C++ symbol names into human-readable form.

to show                  readelf                objdump
all symbols              readelf -s             objdump --syms
dynamic symbols          readelf --dyn-syms     objdump -T
shared lib dependency    readelf -d

-Wl,--default-symver gives every unversioned exported symbol a default version (the soname), so that a reference from another shared object does not bind to it accidentally at dynamic link time.

symbol

  • a name and a value.
  • for a function: its name and its address
  • for a static/global variable: its name and its address

When the value of a symbol is unknown, it is an undefined symbol.

linking process

(either static linking or dynamic linking):

  • assign an address to each defined symbol.
  • resolve each undefined symbol by finding a defined symbol with the same name, then apply the relocations.

relocation

  • can be listed with the command "readelf -r xxx | less"
  • represents one place in the instructions or initialized data that references a symbol, for example a jump/call to a function, an instruction that loads the address of a variable or function into a register, or a statically initialized pointer to an external function or variable.
  • is composed of an Offset, a target symbol and an addend
  • means: set the contents at this Offset to the value of the symbol plus the addend (PC-relative relocation types additionally subtract the address of the patched location)
  • is needed because the target symbol's value cannot be determined until linking. A call to a static function in the same section does not need a relocation.

-fPIC (Position Independent Code) for shared lib

The linker assigns run-time virtual addresses to each section of each object. For example, it combines the .data sections from all objects into one big .data section, and all .text sections into one big .text section. Relocations let references keep working after these moves. The static linker applies relocations by modifying contents (part of an instruction's encoding, for example).

Now consider a shared library. The .so file is loaded into one physical memory region and mapped at different virtual addresses in different processes, so its code cannot assume any absolute run-time virtual address when referencing other symbols. Patching the code at load time, as the static linker does, does not work for shared code either: there is only one physical copy of the code, but it is mapped at a different virtual address in each process.

So position independent code was invented for shared libraries (generated by passing the flag -fPIC to the compiler). A few highlights:

  1. the dynamic linker loads all sections of a .so together as one unit; the read-only/TEXT sections are shared between all processes that need them, the DATA sections are not shared.
  2. all symbols within these sections, including DATA, are referenced internally in a position independent way.
  3. internal references (within the same .so file) use relative addressing (relative to the rip register).
  4. external references (outside the same .so file) go through the PLT (Procedure Linkage Table) for function calls, and through the GOT for data.

A PLT entry is an indirect jump that stands in for an external function: it loads the real target address from the corresponding GOT (Global Offset Table) entry. The GOT lives in a DATA section, so the dynamic linker can (and will) fill it in differently for each process.

The relocations left for the dynamic linker in a .so file can be checked with "readelf -r xxx.so".

example

xxx.cpp

#include <cstdio>   // for printf

const char * funcD(void); //external undefined symbol
static int local_subfunc(int i){
    return i*8;
}
static void local_func(int i){
    printf("can you see me=%d, %s?\n", local_subfunc(i), funcD());
}

objdump -d xxx.o

Disassembly of section .text:

0000000000000000 <_ZL13local_subfunci>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d fc                mov    %edi,-0x4(%rbp)
   7:   8b 45 fc                mov    -0x4(%rbp),%eax
   a:   c1 e0 03                shl    $0x3,%eax
   d:   5d                      pop    %rbp
   e:   c3                      retq   

000000000000000f <_ZL10local_funci>:
   f:   55                      push   %rbp
  10:   48 89 e5                mov    %rsp,%rbp
  13:   53                      push   %rbx
  14:   48 83 ec 18             sub    $0x18,%rsp
  18:   89 7d ec                mov    %edi,-0x14(%rbp)
  1b:   e8 00 00 00 00          callq  20                    # reloc: 00000000001c funcD
  20:   48 89 c3                mov    %rax,%rbx
  23:   8b 45 ec                mov    -0x14(%rbp),%eax
  26:   89 c7                   mov    %eax,%edi
  28:   e8 d3 ff ff ff          callq  0                     # no reloc needed, position independent code
  2d:   48 89 da                mov    %rbx,%rdx
  30:   89 c6                   mov    %eax,%esi
  32:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # reloc: 000000000035 .rodata
  39:   b8 00 00 00 00          mov    $0x0,%eax
  3e:   e8 00 00 00 00          callq  43                    # reloc: 00000000003f printf
  43:   90                      nop
  44:   48 83 c4 18             add    $0x18,%rsp
  48:   5b                      pop    %rbx
  49:   5d                      pop    %rbp
  4a:   c3                      retq 

readelf -r xxx.o

Relocation section '.rela.text' at offset 0xb10 contains 12 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000001c  001400000004 R_X86_64_PLT32    0000000000000000 _Z5funcDv - 4
000000000035  000600000002 R_X86_64_PC32     0000000000000000 .rodata - 4
00000000003f  001500000004 R_X86_64_PLT32    0000000000000000 printf - 4

conclusion:

  • references to local symbols in the same section (a static function, for example) are resolved with position independent code (ip-relative addressing) and need no relocation.
  • references to data in other sections need a relocation of type R_X86_64_PC32, because the linker groups and combines sections.
  • references to external symbols need a relocation of type R_X86_64_PLT32.

So there are two ways to reference a symbol:

  1. position independent code (relative addressing).
  2. an indirect jump through a function table (like a C++ virtual function pointer table).

Method 1 is good for symbols within the same .so file. Method 2 is needed to reference symbols in other .so files.

xxx.so after linking (annotated disassembly)

               section .rodata
 8a0:                           db     "can you see me=%d, %s?\n", 0
Disassembly of section .plt:
00000000000006a0 <_Z5funcDv@plt>:
 6a0:   ff 25 72 09 20 00       jmpq   *0x200972(%rip)        # 201018 <_GLOBAL_OFFSET_TABLE_+0x18>
 6a6:   68 00 00 00 00          pushq  $0x0
 6ab:   e9 e0 ff ff ff          jmpq   690 <_init+0x28>

Disassembly of section .plt.got:
00000000000006b0 <.plt.got>:
 6b0:   ff 25 22 09 20 00       jmpq   *0x200922(%rip)        # 200fd8 <_DYNAMIC+0x1e0> printf@GLIBC_2.2.5 + 0
 6b6:   66 90                   xchg   %ax,%ax

Disassembly of section .text:
00000000000007df <_ZL10local_funci>:
 7df:   55                      push   %rbp
 7e0:   48 89 e5                mov    %rsp,%rbp
 7e3:   53                      push   %rbx
 7e4:   48 83 ec 18             sub    $0x18,%rsp
 7e8:   89 7d ec                mov    %edi,-0x14(%rbp)
 7eb:   e8 b0 fe ff ff          callq  6a0 <_Z5funcDv@plt>                  # trampoline: through GOT@201018
 7f0:   48 89 c3                mov    %rax,%rbx
 7f3:   8b 45 ec                mov    -0x14(%rbp),%eax
 7f6:   89 c7                   mov    %eax,%edi
 7f8:   e8 d3 ff ff ff          callq  7d0 <_ZL13local_subfunci>            # no reloc, internal reference
 7fd:   48 89 da                mov    %rbx,%rdx
 800:   89 c6                   mov    %eax,%esi
 802:   48 8d 3d 97 00 00 00    lea    0x97(%rip),%rdi        # 8a0         # no reloc, internal reference
 809:   b8 00 00 00 00          mov    $0x0,%eax
 80e:   e8 9d fe ff ff          callq  6b0 <_Z5funcDv@plt+0x10>             # trampoline through GOT@200fd8 (printf, via .plt.got)
 813:   90                      nop
 814:   48 83 c4 18             add    $0x18,%rsp
 818:   5b                      pop    %rbx
 819:   5d                      pop    %rbp
 81a:   c3                      retq   

Section .got
200fd8                          dq     0x000000000000000                     # dynamic relocation: R_X86_64_GLOB_DAT printf@GLIBC_2.2.5

Section .got.plt
_Z5funcDv@GOT
201018                          dq     0x000000000202000                     # dynamic relocation: R_X86_64_JUMP_SLOT _Z5funcDv


readelf -r xxx.so

Relocation section '.rela.dyn' at offset 0x548 contains 11 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200de0  000000000008 R_X86_64_RELATIVE                    7a0
000000200de8  000000000008 R_X86_64_RELATIVE                    760
000000201020  000000000008 R_X86_64_RELATIVE                    201020
000000200fd8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 printf@GLIBC_2.2.5 + 0
000000201028  000200000001 R_X86_64_64       0000000000000000 _Z5funcDv + 0

Relocation section '.rela.plt' at offset 0x650 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000201018  000200000007 R_X86_64_JUMP_SLO 0000000000000000 _Z5funcDv + 0
