Linking - ltqusst/video_notes GitHub Wiki
Thanks to them:
https://reverseengineering.stackexchange.com/questions/1992/what-is-plt-got
https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
https://www.airs.com/blog/archives/39
https://www.airs.com/blog/archives/40
https://www.bottomupcs.com/libraries_and_the_linker.xhtml
- `readelf` needs `-W` (wide output); otherwise it truncates long strings such as symbol names.
- `objdump -C` demangles C++ mangled names into human-readable form.
| to show | command1 | command2 |
|---|---|---|
| all symbols | readelf -s | objdump --syms |
| dynamic symbols | readelf --dyn-syms | objdump -T |
| shared lib dependency | readelf -d | |
`-Wl,--default-symver` can be used to add a default symbol version to all symbols the binary exports, so that an external shared object does not get dynamically linked against them by accident.
A symbol is:
- a name and a value, for example:
- a function's name and its address
- a static/global variable's name and its address

When the value of a symbol is unknown, it is an undefined symbol.
The linker's job (for either static linking or dynamic linking):
- assign an address to each defined symbol.
- resolve each undefined symbol by finding a defined symbol with the same name, then apply the relocations.
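The two steps above can be sketched with a toy model. This is only an illustration under simplifying assumptions: `Symbol`, `ToyObject` and `toy_link` are hypothetical names, objects are laid out back to back, and there are no sections, symbol binding rules or versions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A symbol is a name plus a value; an undefined symbol has no value yet.
struct Symbol {
    std::string name;
    bool defined;         // false: undefined symbol, `value` is meaningless
    std::uint64_t value;  // offset inside its own object when defined
};

// Hypothetical, heavily simplified object file: a size and a symbol list.
struct ToyObject {
    std::uint64_t size;
    std::vector<Symbol> symbols;
};

// Step 1: place the objects back to back starting at `base` and record a
// final address for every defined symbol in `table`.
// Step 2: look up every undefined symbol by name in that table.
// Returns the names that are still unresolved.
std::vector<std::string> toy_link(const std::vector<ToyObject>& objects,
                                  std::uint64_t base,
                                  std::map<std::string, std::uint64_t>& table) {
    std::uint64_t cursor = base;
    for (const ToyObject& obj : objects) {
        for (const Symbol& sym : obj.symbols) {
            if (sym.defined) {
                assert(sym.value < obj.size);
                table[sym.name] = cursor + sym.value;
            }
        }
        cursor += obj.size;
    }
    std::vector<std::string> unresolved;
    for (const ToyObject& obj : objects)
        for (const Symbol& sym : obj.symbols)
            if (!sym.defined && table.count(sym.name) == 0)
                unresolved.push_back(sym.name);
    return unresolved;
}
```

In a static link a non-empty result would be an "undefined reference" error; when building a shared library such names are typically left for the dynamic linker to resolve at load time.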
A relocation:
- can be inspected with the command `readelf -r xxx | less`
- represents one place in the instructions or initialized data that references a symbol, for example a jump instruction to some function, a register load instruction that needs the address of a variable/function, or a static initializer of a pointer to an external function or variable.
- is composed of an Offset, a target symbol and an addend.
- means: set the contents at this Offset to the value of this symbol plus this addend (PC-relative types such as R_X86_64_PC32 additionally subtract the address of the patched location).
- is needed because the target symbol's value cannot be determined until link time; a local function call (a call to a static function in the same section) does not need a relocation.
The linker re-assigns run-time virtual addresses to each section of each object. For example, it combines the .data sections of all objects into one big .data section, and all .text sections into one big .text section. Relocations keep the references working after these moves. The static linker performs relocations by modifying the contents (for example, part of an instruction's encoding).
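As a rough sketch of "modifying contents", the function below patches a byte buffer the way a static linker would for two calculations: S + A for an absolute 64-bit field and S + A - P for a 32-bit PC-relative field (S is the symbol value, A the addend, P the address of the patched field). The names `Reloc`, `RelocKind` and `apply_reloc` are made up for this illustration; a real linker handles many more relocation types plus overflow checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two relocation calculations, with S = symbol value, A = addend,
// P = address of the field being patched.
enum class RelocKind {
    Abs64,  // S + A, stored as 8 bytes
    Pc32    // S + A - P, stored as 4 bytes
};

struct Reloc {
    std::uint64_t offset;  // where to patch, relative to the section start
    RelocKind kind;
    std::uint64_t symbol;  // S, already resolved by the linker
    std::int64_t addend;   // A
};

// Patch `section`, which the linker placed at address `section_addr`,
// in place and in little-endian byte order.
void apply_reloc(std::vector<std::uint8_t>& section,
                 std::uint64_t section_addr, const Reloc& r) {
    const std::uint64_t place = section_addr + r.offset;  // P
    std::uint64_t value = r.symbol + static_cast<std::uint64_t>(r.addend);
    std::size_t width = 8;
    if (r.kind == RelocKind::Pc32) {
        value -= place;  // unsigned wrap-around gives the two's complement
        width = 4;
    }
    assert(r.offset + width <= section.size());
    for (std::size_t i = 0; i < width; ++i)
        section[r.offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
}
```

Checked against the xxx.so disassembly further down this page: the `callq` to funcD sits at 0x7eb, so its 4-byte field is at P = 0x7ec; with S = 0x6a0 (the PLT entry) and A = -4 the field becomes 0x6a0 - 4 - 0x7ec = -0x150, which is the `b0 fe ff ff` seen after the `e8` opcode.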
Now consider a shared library. A .so file is loaded into a single region of physical memory and mapped at different virtual addresses in different processes, so it cannot make any assumption about absolute run-time virtual addresses when referencing other symbols. Load-time relocation by modifying code, the way the static linker does, cannot work for shared code: there is only one physical copy of the code, yet it is mapped at a different virtual address in each process.
So, for shared libraries, position independent code (PIC) generation was invented (enabled by passing the flag -fPIC to the compiler). A few highlights:
- the dynamic linker loads all sections of one .so as a whole into physical memory and shares the RO/TEXT sections between all processes that need them; DATA sections are not shared.
- all symbols within the sections, including DATA, are referenced internally in a position independent way.
- relative addressing (relative to the rip register) is used for internal references (within the same .so file).
- the PLT (Procedure Linkage Table) is used for external references (outside the .so file).
A PLT entry is an indirect-jump stub that acts as the local entry point of an external symbol; it loads the real target address from the corresponding GOT (Global Offset Table) entry. The GOT lives in the DATA section, so the dynamic linker can (and will) fill it in differently for each process.
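The PLT/GOT idea can be imitated in plain C++ with a table of function pointers. This is only a simulation of the mechanism, not how the dynamic linker is implemented: `toy_got`, `toy_resolver`, `funcD_plt` and `real_funcD` are hypothetical names, and the lazy "resolve on first call" behaviour shown here is an assumption about the binding mode.

```cpp
#include <cassert>

using Func = const char* (*)();

// Stands in for the real function that lives in another shared library.
const char* real_funcD() { return "funcD from another .so"; }

// Forward declaration of the toy resolver (plays the dynamic linker).
const char* toy_resolver();

// Toy GOT: writable data, one slot per external function. The slot starts
// out pointing at the resolver, so the first call takes the slow path.
Func toy_got[1] = { toy_resolver };

// Counts how often the slow path ran.
int resolver_runs = 0;

// Toy resolver: find the real address, patch the GOT slot, then forward
// the call. Only data is written; the code below never changes.
const char* toy_resolver() {
    ++resolver_runs;
    toy_got[0] = real_funcD;
    return toy_got[0]();
}

// Toy PLT stub: a single indirect call through the GOT slot.
const char* funcD_plt() {
    assert(toy_got[0] != nullptr);
    return toy_got[0]();
}
```

Calling `funcD_plt()` twice runs the resolver only once; the second call goes straight through the patched slot. Because the slot is data, every process can hold a different address in it while sharing the same stub code.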
The relocations to be done for a .so file can be checked with `readelf -r xxx.so`.
xxx.cpp

```cpp
#include <cstdio>

const char * funcD(void); // external undefined symbol

static int local_subfunc(int i){
    return i*8;
}

static void local_func(int i){
    printf("can you see me=%d, %s?\n", local_subfunc(i), funcD());
}
```

`objdump -d xxx.o`
```
Disassembly of section .text:

0000000000000000 <_ZL13local_subfunci>:
   0: 55                      push %rbp
   1: 48 89 e5                mov %rsp,%rbp
   4: 89 7d fc                mov %edi,-0x4(%rbp)
   7: 8b 45 fc                mov -0x4(%rbp),%eax
   a: c1 e0 03                shl $0x3,%eax
   d: 5d                      pop %rbp
   e: c3                      retq

000000000000000f <_ZL10local_funci>:
   f: 55                      push %rbp
  10: 48 89 e5                mov %rsp,%rbp
  13: 53                      push %rbx
  14: 48 83 ec 18             sub $0x18,%rsp
  18: 89 7d ec                mov %edi,-0x14(%rbp)
  1b: e8 00 00 00 00          callq 20                 # reloc: 00000000001c funcD
  20: 48 89 c3                mov %rax,%rbx
  23: 8b 45 ec                mov -0x14(%rbp),%eax
  26: 89 c7                   mov %eax,%edi
  28: e8 d3 ff ff ff          callq 0                  # no reloc needed, position independent code
  2d: 48 89 da                mov %rbx,%rdx
  30: 89 c6                   mov %eax,%esi
  32: 48 8d 3d 00 00 00 00    lea 0x0(%rip),%rdi       # reloc: 000000000035 .rodata
  39: b8 00 00 00 00          mov $0x0,%eax
  3e: e8 00 00 00 00          callq 43                 # reloc: 00000000003f printf
  43: 90                      nop
  44: 48 83 c4 18             add $0x18,%rsp
  48: 5b                      pop %rbx
  49: 5d                      pop %rbp
  4a: c3                      retq
```
`readelf -r xxx.o`

```
Relocation section '.rela.text' at offset 0xb10 contains 12 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000001c  001400000004 R_X86_64_PLT32    0000000000000000 _Z5funcDv - 4
000000000035  000600000002 R_X86_64_PC32     0000000000000000 .rodata - 4
00000000003f  001500000004 R_X86_64_PLT32    0000000000000000 printf - 4
```
conclusion:
- references to local symbols in the same section (such as a call to a static function) are completed with position independent code (rip-relative addressing), without the need for relocations.
- references to data in other sections need a relocation of type R_X86_64_PC32, because the linker will group and combine sections.
- references to external symbols need a relocation of type R_X86_64_PLT32.
There are two methods of referencing a symbol without patching code at load time:
1. position independent code (relative addressing).
2. an indirect jump through a function table (like a C++ virtual function pointer table).

Method 1 is good for symbols within the same .so file. We need method 2 to access/reference symbols in other .so files.
After linking into xxx.so:

```
section .rodata
 8a0: db "can you see me=%d, %s?\n", 0

Disassembly of section .plt:

00000000000006a0 <_Z5funcDv@plt>:
 6a0: ff 25 72 09 20 00       jmpq *0x200972(%rip)     # 201018 <_GLOBAL_OFFSET_TABLE_+0x18>
 6a6: 68 00 00 00 00          pushq $0x0
 6ab: e9 e0 ff ff ff          jmpq 690 <_init+0x28>

Disassembly of section .plt.got:

00000000000006b0 <.plt.got>:
 6b0: ff 25 22 09 20 00       jmpq *0x200922(%rip)     # 200fd8 <_DYNAMIC+0x1e0> printf@GLIBC_2.2.5 + 0
 6b6: 66 90                   xchg %ax,%ax

Disassembly of section .text:

00000000000007df <_ZL10local_funci>:
 7df: 55                      push %rbp
 7e0: 48 89 e5                mov %rsp,%rbp
 7e3: 53                      push %rbx
 7e4: 48 83 ec 18             sub $0x18,%rsp
 7e8: 89 7d ec                mov %edi,-0x14(%rbp)
 7eb: e8 b0 fe ff ff          callq 6a0 <_Z5funcDv@plt>          # trampoline: through GOT@201018
 7f0: 48 89 c3                mov %rax,%rbx
 7f3: 8b 45 ec                mov -0x14(%rbp),%eax
 7f6: 89 c7                   mov %eax,%edi
 7f8: e8 d3 ff ff ff          callq 7d0 <_ZL13local_subfunci>    # no reloc, internal reference
 7fd: 48 89 da                mov %rbx,%rdx
 800: 89 c6                   mov %eax,%esi
 802: 48 8d 3d 97 00 00 00    lea 0x97(%rip),%rdi                # 8a0 # no reloc, internal reference
 809: b8 00 00 00 00          mov $0x0,%eax
 80e: e8 9d fe ff ff          callq 6b0 <_Z5funcDv@plt+0x10>     # trampoline: through GOT@200fd8 (the .plt.got stub for printf)
 813: 90                      nop
 814: 48 83 c4 18             add $0x18,%rsp
 818: 5b                      pop %rbx
 819: 5d                      pop %rbp
 81a: c3                      retq

Section .got
 200fd8 dq 0x000000000000000   # dynamic relocation: R_X86_64_GLOB_DAT printf

Section .got.plt
 _Z5funcDv@GOT
 201018 dq 0x000000000202000   # dynamic relocation: R_X86_64_JUMP_SLOT _Z5funcDv
```
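The rip-relative targets printed in the dump can be rechecked by hand: rip points at the next instruction, so the target is the instruction address plus the instruction length plus the signed 32-bit displacement. The helper names `disp32` and `rip_target` below are invented for this small sketch.

```cpp
#include <cassert>
#include <cstdint>

// Rebuild the signed 32-bit displacement from the four little-endian bytes
// that objdump prints after the opcode.
std::int32_t disp32(std::uint8_t b0, std::uint8_t b1,
                    std::uint8_t b2, std::uint8_t b3) {
    const std::uint32_t raw = static_cast<std::uint32_t>(b0) |
                              (static_cast<std::uint32_t>(b1) << 8) |
                              (static_cast<std::uint32_t>(b2) << 16) |
                              (static_cast<std::uint32_t>(b3) << 24);
    return static_cast<std::int32_t>(raw);
}

// Target of a rip-relative operand: rip already holds the address of the
// NEXT instruction, i.e. insn_addr + insn_len.
std::uint64_t rip_target(std::uint64_t insn_addr, unsigned insn_len,
                         std::int32_t disp) {
    assert(insn_len >= 1 && insn_len <= 15);  // x86 instruction length limit
    return insn_addr + insn_len +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}
```

For example, `rip_target(0x6a0, 6, disp32(0x72, 0x09, 0x20, 0x00))` gives 0x201018, the .got.plt slot of funcD, and `rip_target(0x7eb, 5, disp32(0xb0, 0xfe, 0xff, 0xff))` gives 0x6a0, the PLT entry. Neither result depends on where the .so is loaded, because only distances inside the file are encoded.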
`readelf -r xxx.so`

```
Relocation section '.rela.dyn' at offset 0x548 contains 11 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200de0  000000000008 R_X86_64_RELATIVE                    7a0
000000200de8  000000000008 R_X86_64_RELATIVE                    760
000000201020  000000000008 R_X86_64_RELATIVE                    201020
000000200fd8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 printf@GLIBC_2.2.5 + 0
000000201028  000200000001 R_X86_64_64       0000000000000000 _Z5funcDv + 0

Relocation section '.rela.plt' at offset 0x650 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000201018  000200000007 R_X86_64_JUMP_SLO 0000000000000000 _Z5funcDv + 0
```
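The R_X86_64_RELATIVE entries carry no symbol: the value stored is the load base plus the addend (B + A). A minimal sketch of what happens at load time, assuming a toy memory model (`ToyMemory`, `RelativeReloc` and `apply_relative` are hypothetical names, and the load bases used with it are made-up addresses):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One R_X86_64_RELATIVE entry: no symbol, only an offset and an addend.
struct RelativeReloc {
    std::uint64_t offset;  // link-time address of the data word to patch
    std::uint64_t addend;  // link-time address the word should point to
};

// Toy process image: run-time address -> 64-bit word. It stands in for the
// private, writable DATA pages of one process.
using ToyMemory = std::map<std::uint64_t, std::uint64_t>;

// R_X86_64_RELATIVE stores B + A, where B is the load base chosen for this
// process. Both the patched location and the stored value shift by B.
void apply_relative(ToyMemory& mem, std::uint64_t load_base,
                    const std::vector<RelativeReloc>& relocs) {
    assert(load_base % 0x1000 == 0);  // mappings are page aligned
    for (const RelativeReloc& r : relocs)
        mem[load_base + r.offset] = load_base + r.addend;
}
```

Applying the first three `.rela.dyn` entries above with two different bases yields two different sets of pointers, which is fine because they live in per-process DATA pages; the shared TEXT pages are never touched.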