StudyNote by freestyle - iamroot9C-arm/linux GitHub Wiki

=================================================================== lwn.net issues

- embedded linux conference
	http://events.linuxfoundation.org/events/archive/2016/embedded-linux-conference
- The dynamic debugging interface
	https://lwn.net/Articles/434833/
- Documentation/dynamic-debug-howto.txt
	https://lwn.net/Articles/434856/
- dynamic debug
	https://lwn.net/Articles/286191/

=================================================================== Questions

- How do I debug starting from head.S?
	(gdb) file vmlinux
	(gdb) target remote :1234
	(gdb) load vmlinux
	(gdb) b *0x80008000
	add-symbol-file arch/arm/boot/compressed/vmlinux <- is this needed?

- Where does execution start at boot?
	vmlinux uImage zImage …

- atags / devicetree
	Why devicetree?
- Why are coprocessor instructions needed?

[aarch64 related]
- What is the difference between the terms aarch64 and arm64?
  What do the official ARM manuals call it?
  Which name does the kernel use?
- Why did arm64 stop needing the mach and plat directories?
- What are the arm64 FVP, RTSM, and Foundation Model? Can they be used for analysis the way qemu can?

=================================================================== QEMU with ddd

GDB client using gdbremote protocol

Advantages
+ Qemu is open source and has gdbremote stub
+ No "real" hardware required
+ Good for testing generic kernel code, on many architectures
+ Good environment for developing GDB linux awareness extensions
Disadvantages
- If your bug is SoC or board related it is unlikely to be useful


toolchain download
	http://www.linaro.org/downloads/

	wget … / tar -xJf …
	sudo mkdir -p /opt/crosstools
	sudo mv gcc-linaro-5.3-2016.02-x86_64_arm-linux-gnueabihf /opt/crosstools
	export PATH=$PATH:/opt/crosstools/…/bin
	export CROSS_COMPILE=arm-linux-gnueabihf-

1. make ARCH=arm vexpress_defconfig
2. make ARCH=arm xconfig
3. make ARCH=arm zImage -j8 -sw V=1 > build.log 2>&1
4. make ARCH=arm vexpress-v2p-ca9.dtb

QEMU
	git clone git://git.qemu.org/qemu.git qemu.git
	cd qemu.git
	./configure --target-list=aarch64-softmmu,arm-softmmu,arm-linux-user --enable-fdt
		ERROR: DTC (libfdt) version >= 1.4.0 not present. Your options:
	         (1) Preferred: Install the DTC (libfdt) devel package
	         (2) Fetch the DTC submodule, using:
	             git submodule update --init dtc	<--- ran this
	make


[Reference] http://www.bennee.com/~alex/blog/2014/05/09/running-linux-in-qemus-aarch64-system-emulation-mode/

qemu-system-arm -kernel arch/arm/boot/zImage -dtb arch/arm/boot/dts/vexpress-v2p-ca9.dtb -m 512 -M vexpress-a9 -cpu cortex-a9 -append "console=ttyAMA0"




qemu linux build...
http://imvoid.wordpress.com/2013/05/21/building-and-booting-linux-using-qemu/
http://www.linuxforu.com/2011/06/qemu-for-embedded-systems-development-part-1/

Debugging ARM programs inside QEMU
http://balau82.wordpress.com/2010/08/17/debugging-arm-programs-inside-qemu/

qemu build
(--enable-sdl requires installing the package that provides libsdl; what is the difference between softmmu and arm-linux-user in target-list?)
$ ./configure --enable-sdl --disable-kvm --enable-debug --target-list="arm-softmmu arm-linux-user"
$ make -j4
$ make install



qemu-system-arm -M vexpress-a9 -dtb ./arch/arm/boot/dts/vexpress-v2p-ca9.dtb -kernel ./arch/arm/boot/zImage -append "root=/dev/mmcblk0 console=ttyAMA0" -sd ../Images/RootFS.ext3 -serial stdio -s -S

# qemu-system-arm -kernel arch/arm/boot/zImage -dtb ./rtsm_ve-cortex_a15x4.dtb -m 512 -M vexpress-a15 -serial stdio -append "console=ttyAMA0"

$ qemu-system-aarch64 -m 512 -kernel linux-system-foundation.axf -hda saucy-arm64-multiarch.img

https://balau82.wordpress.com/2012/03/31/compile-linux-kernel-3-2-for-arm-and-emulate-with-qemu/

Running Linux in QEMU's aarch64 system emulation mode
http://www.bennee.com/~alex/blog/2014/05/09/running-linux-in-qemus-aarch64-system-emulation-mode/

=================================================================== AArch64

Understanding the aarch64 memory model
https://www.kernel.org/doc/Documentation/arm64/memory.txt



- How does a 64-bit OS switch to aarch32 when it runs a 32-bit application?
	process reset?
	through mode control?
- Automatic event signaling ???
	This enables power-efficient, high-performance spinlocks.
- Larger register files
	Thirty-one 64-bit general-purpose registers increase performance and reduce stack use.
	does that count include banked registers???
- Hardware-accelerated cryptography
	are dedicated instructions provided???

2016.05.14 read through chapter 2.

The hypervisor ... the guest OS ...


Chapter 3.Fundamentals of ARMv8
- Exception level : In ARMv8, execution occurs at one of four Exception levels.
    In AArch64, the Exception level determines the level of privilege, in a similar way to the privilege levels defined in ARMv7. 
- security states : ARMv8-A provides two security states, Secure and Non-secure. The Non-secure state is also referred to as the Normal World.
- Execution States: The ARMv8 architecture defines two Execution States, AArch64 and AArch32.

- In AArch64, the processor modes are mapped onto the Exception levels as in Figure 3-6.
  As in ARMv7 (AArch32) when an exception is taken, the processor changes to the Exception level (mode) that supports the handling of the exception.

To change between execution states at the same Exception level, you have to switch to a higher Exception level then return to the original Exception level.
	=> the execution state only changes on the way into a higher EL or on the return from it.

There are times when you must change the execution state of your system.
This could be, for example, if you are running a 64-bit operating system, and want to run a 32-bit application at EL0. To do this, the system must change to AArch32.
When a 64-bit OS runs a 32-bit ELF, the execution state has to be switched to 32-bit; which kernel code does that?



2016.05.28 read through chapter 3.


chapter 4.

The selected Stack Pointer can be indicated by a suffix to the Exception level:
	t Indicates use of the SP0 Stack Pointer.
	h Indicates use of the SPx Stack Pointer.
Note
The t and h suffixes are based on the terminology of thread and handler, introduced in ARMv7-M.

When in AArch64 at an Exception level other than EL0, the processor can use either:
• A dedicated 64-bit stack pointer associated with that Exception level (SP_ELn).
• The stack pointer associated with EL0 (SP_EL0).
EL0 can only ever access SP_EL0.

Exception level		Options
==================================
EL0			EL0t
EL1			EL1t, EL1h	; why does EL1t exist? It just accesses the EL0 SP...
EL2			EL2t, EL2h
EL3			EL3t, EL3h
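
A minimal C sketch of the table above, assuming the usual rule that the SP select bit is 0 for the "t" variants (SP_EL0) and 1 for the "h" variants (SP_ELn). The helper names selected_sp and mode_name are made up for illustration and do not come from the guide or the kernel.

```c
#include <stdio.h>

/* Hypothetical helper: which SP_ELx register is in use.
 * el    : current Exception level, 0..3
 * spsel : 0 selects SP_EL0 ("t" suffix), 1 selects SP_ELn ("h" suffix)
 * Returns the x of SP_ELx. EL0 can only ever use SP_EL0. */
static int selected_sp(int el, int spsel)
{
    if (el == 0 || spsel == 0)
        return 0;
    return el;
}

/* Hypothetical helper: writes a name such as "EL1h" into buf
 * (buf must hold at least 5 bytes). */
static void mode_name(int el, int spsel, char *buf, size_t len)
{
    snprintf(buf, len, "EL%d%c", el, (el != 0 && spsel) ? 'h' : 't');
}
```

For example, selected_sp(1, 0) is 0 (EL1t runs on SP_EL0) while selected_sp(1, 1) is 1 (EL1h runs on SP_EL1).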

When ERET returns from an exception, SPSR_ELn is copied into PSTATE and execution resumes at the address stored in ELR_ELn.

Table 8-4 shows how the CPSR is replaced by named fields within PSTATE.
	MRS X0, NZCV


* The CP15 instructions are gone; the same control is done through system configuration registers.
In AArch64, system configuration is controlled through system registers, and accessed using MSR and MRS instructions.
This contrasts with ARMv7-A, where such registers were typically accessed through coprocessor 15 (CP15) operations.
The name of a register tells you the lowest Exception level that it can be accessed from.
For example:
	• TTBR0_EL1 is accessible from EL1, EL2, and EL3.	// an _EL1 register is accessible from EL1, 2, and 3.
	• TTBR0_EL2 is accessible from EL2 and EL3.


A kernel built for aarch64 with CONFIG_COMPAT supports both aarch32 and aarch64.
	- Does it switch based on the ELF header?
	- Or does it decide by decoding the instruction that raised the system call?



6.3.11 Synchronization primitives
	LDXR/STXR
	LDXP/STXP : to allow code to atomically update a location that spans two registers.
	LDAXR/STLXR

	CONFIG_ARM64_LSE_ATOMICS

13.2.1 One-way barriers



13.2 Barriers



gcc -dM -E -xc /dev/null



6.18
5.1.2 Addressing
	Alignment checking



6.25
6.3 is next

	zero extension
	sign extension
		            00 1010
		0000 0000 0000 1010

			11 1111 0001
		1111 1111 1111 0001

	x29 : fp
	x30 : lr
	sp and pc are separate registers (not part of x0-x30)
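
The two bit patterns above (6-bit 00 1010 zero-extended, 10-bit 11 1111 0001 sign-extended) can be reproduced with a small C sketch. The function names are illustrative only; this models what the extension does to a value, not any particular A64 instruction.

```c
#include <stdint.h>

/* Zero-extend: keep the low `bits` bits of v, clear everything above. */
static uint64_t zero_extend(uint64_t v, unsigned bits)
{
    return bits >= 64 ? v : v & ((UINT64_C(1) << bits) - 1);
}

/* Sign-extend: copy bit (bits - 1) into all higher bits (1 <= bits <= 64). */
static int64_t sign_extend(uint64_t v, unsigned bits)
{
    uint64_t sign = UINT64_C(1) << (bits - 1);   /* mask of the sign bit */
    v = zero_extend(v, bits);
    return (int64_t)((v ^ sign) - sign);
}
```

With these, zero_extend(0x0A, 6) stays 0x000A, and the low 16 bits of sign_extend(0x3F1, 10) are 0xFFF1, matching the rows above.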


7.2
	8. Porting to A64 is next

	* addressing mode
	  - offset mode
		LDR X0, [X1, X2, LSL #3]
	  - pre-index
	    LDR X0, [X1, #8]!
	  - post-index
		LDR X0, [X1], #8
	  - pc-relative (new)
		LDR X0, <label>
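
A toy C model of the first three addressing modes, to make the difference in base-register update explicit. The struct and function names are hypothetical; memory is just a byte array and only X0..X2 are modelled.

```c
#include <stdint.h>
#include <string.h>

struct regs { uint64_t x0, x1, x2; };

/* Read 8 bytes from the toy memory at byte address addr. */
static uint64_t load64(const void *mem, uint64_t addr)
{
    uint64_t v;
    memcpy(&v, (const unsigned char *)mem + addr, sizeof v);
    return v;
}

/* LDR X0, [X1, X2, LSL #3] : address = X1 + (X2 << 3), X1 unchanged */
static void ldr_offset(struct regs *r, const void *mem)
{
    r->x0 = load64(mem, r->x1 + (r->x2 << 3));
}

/* LDR X0, [X1, #8]! : X1 += 8 first, then load from the new X1 */
static void ldr_pre_index(struct regs *r, const void *mem)
{
    r->x1 += 8;
    r->x0 = load64(mem, r->x1);
}

/* LDR X0, [X1], #8 : load from X1, then X1 += 8 */
static void ldr_post_index(struct regs *r, const void *mem)
{
    r->x0 = load64(mem, r->x1);
    r->x1 += 8;
}
```

Pre-index and post-index both leave X1 advanced by 8; they differ only in whether the load uses the old or the new value.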

	* ldar / stlr
	  - load-acquire
	  - store-release


7.9
	
	9. The ABI for ARM 64-bit Architecture
	10.2.6 The Exception Syndrome Register


7.16
	up to 11.1.


7.23
	armv8 page granule
	vipt-nonaliasing

	A53 vs. A57 comparison
	Table 2-1 Comparison of ARMv8-A processors

	Where are WA, RA (Write alloc, Read alloc) configured?
		Other descriptions suggest it is per memory location…

	11.4 PoC, PoU
		PoC : Point of Coherency
		PoU : Point of Unification

	11.5 Cache maintenance
		Resume from: "For each of these operations, you can select which of the entries the operation should apply to:"

7.30
	Only IS is used as a TLB option. Sometimes it is applied and sometimes it is not?

	inner shareable, outer shareable : http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CEGDBEJE.html
	13.3.1
	https://community.arm.com/thread/4800
	https://community.arm.com/thread/3005
	http://events.linuxfoundation.org/sites/events/files/slides/weak-to-weedy.pdf
	https://community.arm.com/thread/5400
	https://www.arm.com/files/pdf/CacheCoherencyWhitepaper_6June2011.pdf

		The division between inner and outer is implementation defined, but typically the set of inner attributes is used by caches that are integrated
		into the processor, whereas the outer attributes are exported from the processor to the external memory bus and are therefore potentially used
		by cache hardware external to the core or cluster.

	local -> non-shareable, is -> inner shareable
	static inline void local_flush_tlb_all(void)
	{
	    dsb(nshst);
	    asm("tlbi   vmalle1");
	    dsb(nsh);
	    isb();
	}

	static inline void flush_tlb_all(void)
	{
	    dsb(ishst);
	    asm("tlbi   vmalle1is");
	    dsb(ish);
	    isb();
	}

	page 12-6 is next


		local_flush_tlb_all	asm("tlbi   vmalle1");
		flush_tlb_all		asm("tlbi   vmalle1is");

			=> when the inner domains are grouped separately, which instruction is used? Or is no instruction needed at all?

8.6
	Each TLB entry can cover a different size, so how are the sizes told apart?
	VA | PA | attribute   <- is the size part of the attribute? Of the VA?

	The ARMv8-A architecture provides a feature known as contiguous block entries to efficiently use TLB space.
	Translation table block entries each contain a contiguous bit. When set,			<- what does "block" mean here?
	this bit signals to the TLB that it can cache a single entry covering translations for multiple blocks. A lookup can
	index anywhere into an address range covered by a contiguous block. 
	
	Which granules are supported???
		Page table granules are 4KB, 16KB, 64KB (CONFIG_ARM64_4K_PAGES, 16K, 64K)
		Support is reported in the TGRAN fields of the ID_AA64MMFR0_EL1 register.

	config PGTABLE_LEVELS
	    int
	    default 2 if ARM64_16K_PAGES && ARM64_VA_BITS_36
	    default 2 if ARM64_64K_PAGES && ARM64_VA_BITS_42
	    default 3 if ARM64_64K_PAGES && ARM64_VA_BITS_48
	    default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
	    default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
	    default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48
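
The Kconfig defaults above follow from simple arithmetic: each table holds one page of 8-byte descriptors, so every level resolves (page_shift - 3) bits of the VA. A hedged C sketch (pgtable_levels is an illustrative name, not a kernel function):

```c
/* Number of translation levels needed to resolve va_bits of virtual
 * address with pages of size (1 << page_shift).
 * Each level indexes a page-sized table of 8-byte entries, so it
 * resolves (page_shift - 3) bits; the page offset needs no table. */
static unsigned pgtable_levels(unsigned va_bits, unsigned page_shift)
{
    unsigned bits_per_level = page_shift - 3;
    return (va_bits - page_shift + bits_per_level - 1) / bits_per_level;
}
```

For example, 4KB pages (shift 12) with 39 VA bits give (39 - 12) / 9 = 3 levels, and 64KB pages (shift 16) with 42 VA bits give 26 / 13 = 2 levels.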


	Based on the code we have read
	arch/arm/kernel/head.S
		mcr p15, 0, r4, c2, c0, 0       @ load page table pointer

	arch/arm/mm/proc-v7.S
	__v7_ca9mp_setup:
		v7_ttb_setup r10, r4, r8, r5        @ TTBCR, TTBRx setup
			mcr p15, 0, \zero, c2, c0, 2    @ TTB control register
				// ARM B3.5.4 Selecting between TTBR0 and TTBR1, Short-descriptor translation table format
				// TTBR0 covers 0 to 2**(32-N)-1 (N is 0~7), so the split can only sit at 2GB or below. This is why the kernel does not use ttbr0/1 the way ARM intended.
			ALT_SMP(orr \ttbr0, \ttbr0, #TTB_FLAGS_SMP)
			ALT_SMP(orr \ttbr1, \ttbr1, #TTB_FLAGS_SMP)
			mcr p15, 0, \ttbr1, c2, c0, 1   @ load TTB1			// only TTB1 is written here; it keeps a copy of swapper_pg_dir

	arch/arm/mm/proc-v7-2level.S
	ENTRY(cpu_v7_switch_mm)
	Sets TTB0 and the context ID so that the context starts running with the new mm.
	mcr p15, 0, r0, c2, c0, 0       @ set TTB 0					// on context switch, ttbr0 is replaced with current->mm.pgd.


	arch/arm/include/asm/mmu_context.h
		check_and_switch_context			// Required during context switch to avoid speculative page table walking with the wrong TTBR.

	secondary_startup
		ldr r4, [r7, lr]            @ get secondary_data.pgdir		// prepared in __cpu_up
		ldr r8, [r7, lr]            @ get secondary_data.swapper_pg_dir
		adr lr, BSYM(__enable_mmu)      @ return address
		ARM(   add pc, r10, #PROCINFO_INITFUNC ) @ initialise processor	// via __v7_proc, __v7_ca9mp_setup runs v7_ttb_setup, which puts r8 into ttbr1.
	__enable_mmu:
		Coming from secondary_startup, r4 holds idmap_pgd (maps only the __turn_mmu_on ~ cpu_v7_reset region; set up in early_initcall(init_static_idmap)) and r8 holds swapper_pg_dir.
					idmap_pgd = pgd_alloc(&init_mm);	// called from init_static_idmap(); copies the page table entries for the kernel and I/O regions.
		Where is r8 written?
		mcr p15, 0, r4, c2, c0, 0       @ load page table pointer (stored in TTB 0)
		b __turn_mmu_on
			__secondary_switched					// when entered from secondary_startup.
				b   secondary_start_kernel
					cpu_switch_mm(mm->pgd, mm);		// the pgd of init_mm is swapper_pg_dir.
						cpu_v7_switch_mm
							mcr p15, 0, r0, c2, c0, 0       @ set TTB 0

	[References]
		https://lkml.org/lkml/2013/6/26/544
		http://elinux.org/Tims_Notes_on_ARM_memory_allocation


	Currently reading page 12-7, chapter 12.2
		https://community.arm.com/thread/5400



8.13
	With a 64KB granule and a 42-bit VA, a table has 8192 entries.

	the page table entry refers to a 512MB page (it is a block descriptor).

	Because we have a 512MB page, bits [28:0] of the VA are taken to form PA[28:0]. 
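
As a sketch of that last sentence: for a 512MB block, the output address from the block descriptor supplies the upper bits and VA[28:0] is passed through unchanged. The function name is illustrative only.

```c
#include <stdint.h>

#define BLOCK_SHIFT_512M 29   /* 512MB = 1 << 29 */

/* Combine the block's output address with VA[28:0] to form the PA. */
static uint64_t block_va_to_pa(uint64_t block_base, uint64_t va)
{
    uint64_t mask = (UINT64_C(1) << BLOCK_SHIFT_512M) - 1;
    return (block_base & ~mask) | (va & mask);
}
```

For example, a block based at 0x40000000 maps a VA whose low 29 bits are 0x12345678 to PA 0x52345678.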


	12.4.2 Effect of granule sizes on translation tables

	2level
	1. First choose the granule and the VA size.
		G=64KB, VA=42bit
	   If VA[63:42] is all 1s, TTBR1 is used; if all 0s, TTBR0
	2. contains 8192 64-bit page table entries, and is indexed via VA[41:29]. 
	   Bits [28:16] of the VA are used to index the level 3 page table entry.
	3. 64KB page [15:0]
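
The three steps above, written out as a C sketch for the 64KB granule / 42-bit VA case. The struct and function names are hypothetical; the bit ranges are the ones listed in the steps.

```c
#include <stdint.h>

struct va_fields {
    int      ttbr;       /* 0 -> TTBR0, 1 -> TTBR1 */
    unsigned l2_index;   /* VA[41:29], 8192 entries */
    unsigned l3_index;   /* VA[28:16], 8192 entries */
    unsigned offset;     /* VA[15:0], offset inside the 64KB page */
};

/* Returns 0 on success, -1 when VA[63:42] is neither all 0s nor all 1s
 * (such an address is outside both translation ranges). */
static int split_va_64k_42(uint64_t va, struct va_fields *out)
{
    uint64_t top = va >> 42;             /* the upper 22 bits */

    if (top == 0)
        out->ttbr = 0;
    else if (top == UINT64_C(0x3FFFFF))
        out->ttbr = 1;
    else
        return -1;

    out->l2_index = (unsigned)((va >> 29) & 0x1FFF);
    out->l3_index = (unsigned)((va >> 16) & 0x1FFF);
    out->offset   = (unsigned)(va & 0xFFFF);
    return 0;
}
```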

		Read 12.5 (virtual address caching).

9.3 

	TTBR0_EL2
	Holds the base address of the translation table for the stage 1 translation of memory accesses from EL2.

	VTTBR_EL2
	Holds the base address of the translation table for the stage 2 translation of memory accesses from Non-secure EL0 and EL1.
		The page table for IPAs handed over from the guest OS.
		VMID: The VMID for the translation table.

	
	The SCTLR_ELn bits can be cached in a TLB entry. Therefore, changing the bit in the SCTLR might not affect entries already in the TLBs. When modifying these bits, a TLB invalidate and ISB sequence is necessary. 

	In the SCR register description: "This bit is permitted to be cached in a TLB." ???


	arch/arm64/kernel/process.c : tpidr is switched on context switch
	arch/arm64/mm/kernel.c

	Week 12 done.



9.10
	ARM B2.8.2 memory type
	Strongly-ordered	Device-nGnRnE
	Device memory type	Device-nGnRE

	Two types: normal and device.
	"Strongly ordered" and device barely differed, so ARMv8 dropped the separate strongly-ordered type.

	In addition to the memory type, attributes also provide control over cacheability, shareability, access, and execution permissions.
	Shareable and cache properties pertain only to Normal memory. Device regions are always deemed to be non-cacheable and outer-shareable. 


	The memory type is not directly encoded in the translation table entry.
	Instead, each block entry specifies a 3-bit index into a table of memory types.
	This table is stored in the Memory Attribute Indirection Register MAIR_ELn. 
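
A small C sketch of that indirection, assuming the stage 1 descriptor layout where AttrIndx sits in bits [4:2] and MAIR_ELn holds eight 8-bit attribute fields. The function name and the sample MAIR value in the note below are made up for illustration.

```c
#include <stdint.h>

/* Look up the 8-bit memory attribute selected by a block/page descriptor.
 * desc : translation table descriptor; AttrIndx is bits [4:2]
 * mair : value of MAIR_ELn; Attr<n> is bits [8n+7:8n] */
static uint8_t mair_attr(uint64_t mair, uint64_t desc)
{
    unsigned idx = (unsigned)((desc >> 2) & 0x7);
    return (uint8_t)(mair >> (idx * 8));
}
```

With a sample MAIR of 0x4404FF (Attr0 = 0xFF, Attr1 = 0x04, Attr2 = 0x44), a descriptor whose AttrIndx is 1 selects 0x04, which is the Device-nGnRE encoding.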

	http://blog.tek-life.com/%E7%90%86%E8%A7%A3armv8-device-memory%E7%9A%84%E4%B8%89%E4%B8%AA%E5%B1%9E%E6%80%A7/
	This means that a DSB barrier, executed by the processor that performed the write to the No Early Write Acknowledgement location, completes only after the write has reached its endpoint in the memory system. Typically, this endpoint is a peripheral or physical memory.


	Load-Acquire (LDAR)
	All loads and stores that are after an LDAR in program order, and that match the shareability domain of the target address, must be observed after the LDAR.
		=> no load or store that follows the LDAR (in the same shareability domain as the target address) can be observed ahead of it.

	Store-Release (STLR)
	All loads and stores preceding an STLR that match the shareability domain of the target address, must be observed before the STLR.

	There are also exclusive versions of the above, LDAXR and STLXR, available.

	Unlike the data barrier instructions, which take a qualifier to control which shareability domains see the effect of the barrier,
	the LDAR and STLR instructions use the attribute of the address accessed.
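
In C11 the same one-way ordering is expressed with acquire loads and release stores; compilers targeting AArch64 typically emit LDAR and STLR for them, though that mapping is a property of the toolchain and is stated here as an assumption. A minimal message-passing sketch (all names are illustrative):

```c
#include <stdatomic.h>

static int payload;        /* plain data, published through `ready` */
static atomic_int ready;   /* 0 = nothing published yet */

/* Writer: store the data, then release-store the flag.
 * The release ordering keeps the payload write before the flag write. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: acquire-load the flag. If it is set, the payload written
 * before the release store is guaranteed to be visible.
 * Returns 1 and fills *out when data was published, 0 otherwise. */
static int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

In real use publish() and try_consume() would run on different threads or cores; the single-threaded call sequence is enough to see the API shape.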

	Currently reading 13.2.1 One-way barriers
	Three bits of the page table entry are used as an index into MAIR; I should look at an actual page table entry to confirm.


9.24
	p.1774 Attribute fields in stage 2 VMSAv8-64 Block and Page descriptors
	Lower attributes
		SH, bits[9:8]		Shareability field,
		MemAttr, bits[5:2]	Stage 2 memory attributes,
		see The memory region attributes for stage 2 translations, EL1&0 translation regime on page D4-1786.

	An LDAR instruction guarantees that any memory access instructions after the LDAR, are only visible after the load-acquire.
	A store-release guarantees that all earlier memory accesses are visible before the store-release
	becomes visible and that the store is visible to all parts of the system capable of storing cached data at the same time.
		=> at the same moment, a store-release makes the store visible to every part of the system that can hold cached data.


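	*** Once the VA length and page size are chosen, the number of translation levels is determined.
		What block size can an L1 translation entry map?

		Figure 12-6 Translation table control register
		The two-bit Translation Granule (TG) TG1 and TG0 fields give the granule size for kernel or user space respectively, 00=4KB, 01=16KB, 11=64KB.
			; check against the ARMv8 ARM: there TCR_EL1.TG0 is 00=4KB, 01=64KB, 10=16KB and TCR_EL1.TG1 is 01=16KB, 10=4KB, 11=64KB, so the two fields do not share one encoding.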

	13.3 Memory Attribute

	14.
		MPIDR_EL1
			; the PG also mentions MPIDR_EL3, but the ARMv8 ARM has no register of that name.
			; ATF plat code reads MPIDR: aff0 is used as the cpu id and aff1 as the cluster id.
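
A sketch of pulling the affinity fields out of an MPIDR_EL1 value, assuming the architectural layout (Aff0 bits [7:0], Aff1 bits [15:8], Aff2 bits [23:16], Aff3 bits [39:32]). The function name is illustrative and is not the ATF helper.

```c
#include <stdint.h>

/* Return affinity level 0..3 from an MPIDR_EL1 value.
 * Aff0..Aff2 are contiguous bytes; Aff3 lives at bits [39:32]. */
static unsigned mpidr_aff(uint64_t mpidr, unsigned level)
{
    static const unsigned shift[4] = { 0, 8, 16, 32 };
    return (unsigned)((mpidr >> shift[level & 3]) & 0xFF);
}
```

For an MPIDR value of 0x80000102, Aff0 (cpu id) is 2 and Aff1 (cluster id) is 1.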

	Currently reading here
	Interrupt handling can also be load balanced across cores. This can help improve performance or save energy. Balancing interrupts across cores or reserving cores for particular types of interrupts can result in reduced interrupt latency. This might also result in reduced cache use which helps improve performance.

ARM p.1519
On taking an exception to AArch64 state:
• The PE state is saved in the SPSR_ELx at the Exception level the exception is taken to. See Saved Program Status Registers (SPSRs) on page D1-1506.
• The preferred return address is saved in the ELR_ELx at the Exception level the exception is taken to. See Exception Link Registers (ELRs) on page D1-1509.
• All of PSTATE.{D, A, I, F} are set to 1. See Process state, PSTATE on page D1-1511.
• If the exception is a synchronous exception or an SError interrupt, information characterizing the reason for the exception is saved in the ESR_ELx at the Exception level the exception is taken to. See Use of the ESR_EL1, ESR_EL2, and ESR_EL3 on page D1-1521.
• Execution moves to the target Exception level, and starts at the address defined by the exception vector. Which exception vector is used is also an indicator of whether the exception came from a lower Exception level or the current Exception level. See Exception vectors on page D1-1520.
• The stack pointer register selected is the dedicated stack pointer register for the target Exception level. See The stack pointer registers on page D1-1505.
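
A toy C model of the entry sequence above and of the matching ERET. It is a deliberate simplification: PSTATE is kept in SPSR layout (D, A, I, F at bits 9..6, the EL at M[3:2], the SP select at M[0]), ESR_ELx and the vector selection are left out, and every name is hypothetical.

```c
#include <stdint.h>

#define PSTATE_DAIF   (UINT64_C(0xF) << 6)  /* D=bit 9, A=8, I=7, F=6 */
#define PSTATE_M_MASK UINT64_C(0xF)         /* EL in [3:2], SP select in [0] */

struct cpu_model {
    uint64_t pstate;
    uint64_t pc;
    uint64_t spsr[4];   /* spsr[n] stands for SPSR_ELn, n = 1..3 */
    uint64_t elr[4];    /* elr[n]  stands for ELR_ELn,  n = 1..3 */
};

/* Take an exception to target_el, entering at `vector`. */
static void take_exception(struct cpu_model *c, unsigned target_el,
                           uint64_t vector, uint64_t return_addr)
{
    c->spsr[target_el] = c->pstate;       /* save the PE state */
    c->elr[target_el]  = return_addr;     /* preferred return address */
    c->pstate |= PSTATE_DAIF;             /* mask D, A, I, F */
    /* move to the target EL using its dedicated stack pointer (ELnh) */
    c->pstate = (c->pstate & ~PSTATE_M_MASK) | (target_el << 2) | 1u;
    c->pc = vector;
}

/* ERET: restore PSTATE from SPSR_ELn and resume at ELR_ELn. */
static void eret(struct cpu_model *c)
{
    unsigned el = (unsigned)((c->pstate >> 2) & 3);
    c->pstate = c->spsr[el];
    c->pc     = c->elr[el];
}
```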

D1.14 Asynchronous exception types, routing, masking and priorities
p.1553


10.01

	Timer
	CNTFRQ_EL0, Counter-timer Frequency register
		Holds the clock frequency of the system counter.
		It is only the register that is per-core
	CNTPCT_EL0, Counter-timer Physical Count register
		Holds the 64-bit physical count value.
	CNTKCTL_EL1 controls whether EL0 can access the system timer.


	SMP
		Symmetric Multi-Processing (SMP) is a software architecture that dynamically determines the roles of individual cores.
		Each core in the cluster has the same view of memory and of shared hardware.
	AMP
		An Asymmetric Multi-processing (AMP) system enables you to statically assign individual roles
		to a core within a cluster so that, in effect, you have separate cores, each performing separate jobs within each cluster. 


	14.3 is next…

10.22
	Where is the shareability attribute held?
		D4.3.3 Memory attribute fields in the VMSAv8-64 translation table format descriptors
		Among the lower attributes: SH, bits[9:8].
		Table D4-36 SH[1:0] field encoding for Normal memory, VMSAv8-64 translation table format

		The ARMv8 processors use the MOESI protocol. 


	http://egloos.zum.com/studyfoss/v/5144244
	http://jake.dothome.co.kr/cache4/
	http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0425/ch03s12s01.html


	A write can only be performed if the cache line is in a Modified or Exclusive state.
	If it is in a Shared state, all other cached copies must be invalidated first. A write moves the line into a Modified state.
		=> what about the Owned (O) state?

	A cache can discard a shared line at any time, changing it to an Invalid state. A Modified line is written back first.
		=> what about the Owned (O) state?

	If a cache holds a line in a Modified state, reads from other caches in the system receive the updated data from the cache.
	Conventionally, this is achieved by first writing the data to main memory and then changing the cache line to a Shared state, before performing a read.
		=> core 0: Modified -> Owned
		   core 1: Invalid -> Shared

	A cache that has a line in an Exclusive state must move the line to a Shared state when another cache reads that line.
		=> the line was E, but once another cache reads it, it becomes S.

	A Shared state might not be precise. If one cache discards a Shared line,
	another cache might not be aware that it can now move the line to an Exclusive state.
		=> when one cache drops its S copy, the other cache could move to E, but it may not know that.
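
The transitions quoted above can be written as a small state machine. This is a sketch of textbook MOESI, not of any specific ARM core: the handling of the Owned state (the open question in the notes) is an assumption taken from the generic protocol, and all names are illustrative.

```c
enum moesi { M_INVALID, M_SHARED, M_EXCLUSIVE, M_OWNED, M_MODIFIED };

/* Local write. Returns the new state; *invalidate_others is set to 1
 * when copies in other caches must be invalidated first.
 * M and E are the only copy, so they can be written directly.
 * S and O may coexist with copies elsewhere (assumption for O: generic
 * MOESI). I is treated as a miss that also needs exclusive ownership. */
static enum moesi on_local_write(enum moesi s, int *invalidate_others)
{
    *invalidate_others = !(s == M_MODIFIED || s == M_EXCLUSIVE);
    return M_MODIFIED;
}

/* Another cache reads a line that this cache holds. */
static enum moesi on_remote_read(enum moesi s)
{
    switch (s) {
    case M_MODIFIED:  return M_OWNED;   /* supply the data, keep write-back duty */
    case M_EXCLUSIVE: return M_SHARED;
    default:          return s;         /* O, S and I are unchanged */
    }
}
```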


	The ACP connects to the SCU.
	When the FPGA-based accelerator produces a new result, the data needs
	to be passed back to the processor as quickly as possible, so that the processor can update its view of the data.

	The Accelerator Coherency Port (ACP) is an optional AXI 64-bit slave port that can be connected to non-cached
	AXI master peripherals, such as a DMA engine or cryptographic engine.

	ARM Cortex-A9 processor-based SoC FPGAs include a feature called an Accelerator Coherency Port (ACP).
	Through the ACP, new data produced by an FPGA-based hardware accelerator is transferred directly to the
	processor's L2 cache, via a low-latency direct connection (Figure 1). This operation is performed not just
	quickly, but coherently too.
	=> if the line is in the cache, the cached data is what gets used.


	Cache coherence between other masters, including between clusters, is handled by the CCI-400.

	Extending hardware coherency to a multi-cluster system requires a coherent bus protocol.
	The AMBA 4 ACE specification includes the AXI Coherency Extensions (ACE).
	The full ACE interface enables hardware coherency between clusters and enables an SMP operating system to run on many cores.


10.29
	14.4.1 Compute subsystems and mobile applications is next

	Idle state
	• Standby. - the core is left powered up, with clock gating. Entered via WFI or WFE. Similar to retention.
		The difference is evident to an external debugger and in hardware implementation,
	• Retention. - similar to standby except for External Debug Request.
	• Power down.
	• Dormant mode.
		Dormant mode is an implementation of a power-down state. In dormant mode, the core logic is powered down, but the cache RAMs are left powered up. 
		Dormant mode is therefore much more likely to be useful in a single core environment rather than in a cluster.
	• Hotplug. - how does it differ from power down? Still not clear after reading items 1 and 2; in power down the context is saved and restored, while hotplug is like booting a secondary core..


	17. Security
	Only stage one translations are allowed in the Secure world and there is no TTBR1_EL3. Really? Only stage one?
	The AArch64 EL1 translation table registers are not banked between security states and therefore the value of TTBR0_EL1, TTBR1_EL1,
	and TCR_EL1 must be saved and restored for each world as part of the Secure Monitor's context switching operation. 


	17.1 TrustZone hardware architecture

		Second from the bottom of the page.
		TTBR1_EL3 does not exist architecturally.
		TTBR0_EL1, TTBR1_EL1, and TCR_EL1 must be saved and restored for each world as part of the Secure Monitor's context switching operation.
		

11.05
	Rest of 17.
		ARMv7 sets the NS bit of the SCR (Secure Configuration Register).

		ARM ARM B1.5 The Security Extensions
			Changing from Secure to Non-secure state


* PERF/FTRACE analysis
		https://github.com/brendangregg/perf-tools

		ftrace, perf-events

		dtrace

	PERF
		http://www.brendangregg.com/perf.html
		https://perf.wiki.kernel.org/index.php/Main_Page
		https://www.youtube.com/watch?v=kWnx6eOGVYo

	Books - content relevant to the topic
		context switching
		timer (out of date - wheel)

	The Linux Programming Interface: A Linux and UNIX System Programming Handbook (read in the Korean edition)

	ftrace
	Credentials : security related

========================================================================================================================================================== kernel 2016.11.05 : trace basic 2016.11.12 : ๊ด‘์žฅ 2016.11.19 : ftrace internal http://blog.daum.net/_blog/BlogTypeView.do?blogid=0YW8F&articleno=127&_bloghome_menu=recenttext

* ftrace ๊ธฐ๋ณธ ์‚ฌ์šฉ ๋ฐฉ๋ฒ•

	mount -t debugfs nodev /sys/kernel/debug
	cd /sys/kernel/debug/tracing
	cat available_tracers
	echo function > ./current_tracer
	vi trace
	echo function_graph > ./current_tracer


* Using FTRACE in the kernel

	Makefile contents : veneer code is inserted through a compiler option

	ifdef CONFIG_FUNCTION_TRACER
	KBUILD_CFLAGS   += -pg
	ifdef CONFIG_DYNAMIC_FTRACE
		ifdef CONFIG_HAVE_C_RECORDMCOUNT
			BUILD_C_RECORDMCOUNT := y
			export BUILD_C_RECORDMCOUNT
		endif
	endif
	endif

* [Linux Kernel Trace] 01_1 - Using ftrace



	echo 'main(){}' | arm-linux-gnueabihf-gcc -x c -S -o output - -pg -static

	…
	main:
		push {lr}
		bl      __gnu_mcount_nc

	arch/arm64/kernel/stacktrace.c


	arch/arm/kernel/entry-common.S   : __gnu_mcount_nc
	arch/arm64/kernel/entry-ftrace.S : _mcount

	With DYNAMIC_FTRACE, each call site executes a nop; once a tracer is selected, the nop is patched into a call.
	arch/arm64/kernel/ftrace.c : ftrace_update_ftrace_func
		ftrace_stub
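
A user-space C sketch of the idea behind ftrace_stub: the profiling hook calls through a function pointer and does nothing while that pointer still refers to the stub. This models the non-dynamic dispatch only (the nop patching of DYNAMIC_FTRACE cannot be shown in portable C), and every name ending in _model, plus counting_tracer and set_tracer, is made up for illustration.

```c
typedef void (*trace_func_t)(unsigned long ip, unsigned long parent_ip);

/* Default target: does nothing, like ftrace_stub. */
static void ftrace_stub_model(unsigned long ip, unsigned long parent_ip)
{
    (void)ip;
    (void)parent_ip;
}

static trace_func_t trace_function = ftrace_stub_model;

static unsigned long hits;      /* number of traced calls */
static unsigned long last_ip;   /* address reported by the last traced call */

/* Example tracer: counts calls and remembers the caller address. */
static void counting_tracer(unsigned long ip, unsigned long parent_ip)
{
    (void)parent_ip;
    hits++;
    last_ip = ip;
}

/* What the compiler-inserted call at function entry would reach. */
static void mcount_model(unsigned long ip, unsigned long parent_ip)
{
    if (trace_function != ftrace_stub_model)
        trace_function(ip, parent_ip);
}

/* Select a tracer; passing a null pointer restores the stub. */
static void set_tracer(trace_func_t fn)
{
    trace_function = fn ? fn : ftrace_stub_model;
}
```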

	mcount, tracepoint, kprobes

	- mcount : with -pg, every function entry gets a bl _mcount. On arm, mcount is implemented in entry-common.S. Behaviour depends on the selected tracer.
	- tracepoint : 
		https://www.kernel.org/doc/Documentation/trace/tracepoints.txt
		https://www.kernel.org/doc/Documentation/trace/tracepoint-analysis.txt
	- kprobes : calls __arm_kprobe() to replace the original opcode with a breakpoint instruction (int3 on x86).
	  (kprobe, jprobe, kretprobe)

==========================================================================================================================================================

Page Reclaim ULK Ch.17

Block LKD Ch.14

buffer head : a descriptor for a single buffer

I/O is done in units of blocks. What is called a buffer ...

block : held in memory after its data has been read, or while a write is pending.

struct bio, bio_vec

page cache : keeps the contents of disk blocks that have already been accessed, so that the VFS layer touches the disk as little as possible.

o Architecture classification
Multi
SMP : a system in which two or more identical processors share one memory, I/O devices, interrupts, and other resources over a single system bus, with each processor running a different program and processing different data.
	UMA - a system in which all processors share the same memory bus.
	NUMA - non-uniform memory access
		http://en.wikipedia.org/wiki/Non-uniform_memory_access

		sharing memory locally in a multiprocessing system; bottleneck
		On the buses between the clusters of a NUMA SMP system, data moves using SCI (scalable coherent interface) technology. SCI coordinates what is called cache coherence across the nodes of multiple clusters.
AMP : an architecture in which two or more processors each perform their own specific function.
  For example, one processor runs the main operating system while another handles I/O operations.
  In this setup each processor runs its own kernel image from main memory, and the address spaces are separate as well.
  (the processors may run the same kernel or different kernels)


heterogeneous multi-core

intel developers guide
www.coreboot.org

o Analysis tools / knowhow
- Generate the preprocessed output of a specific file
	make mm/memory.i

- Build a specific directory (module)
	make SUBDIRS=drivers/block/company modules

- Generate all intermediate files during the build
	Add -save-temps=obj to KBUILD_CFLAGS

- Generate a dump file (d: disassemble, S: source, s: full contents, l: line numbers, x: all headers)
	arm-linux-gnueabihf-objdump -dS vmlinux > vmlinux.txt

- After a kernel build, the commands used for the build are left in each directory.
	vi .vmlinux.cmd

o profiling DDMS oprofile

o ftrace
https://www.kernel.org/doc/Documentation/trace/ftrace.txt
https://www.kernel.org/doc/Documentation/trace/ftrace-design.txt
http://elinux.org/Ftrace

http://lwn.net/Articles/322666/
http://lwn.net/Articles/365835/
http://lwn.net/Articles/366796/
http://lwn.net/Articles/370423/

http://events.linuxfoundation.org/slides/2010/linuxcon_japan/linuxcon_jp2010_rostedt.pdf

http://www.kandroid.org/board/board.php?board=conference&command=body&no=81

- kernel internal operation tracer
- the trace-related functions that kept showing up during kernel analysis turn out to exist for exactly this kind of analysis
- since it goes through /sys and debugfs, the results can be checked directly at runtime.

- in config:
"Kernel hacking"
	"Tracers"

file:///Users/freestyle/Downloads/30.kandroid-8th-debugging-ftrace-20111016-1550.pdf

o http://elinux.org/Kernel_Trace_Systems

o Data Structure

TREE - rbtree

	http://thesoul214.tistory.com/113
	http://blog.secmem.org/177

	http://ko.wikipedia.org/wiki/%EB%A0%88%EB%93%9C-%EB%B8%94%EB%9E%99_%ED%8A%B8%EB%A6%AC
	http://en.wikipedia.org/wiki/Red%E2%80%93black_tree
	http://sweeper.egloos.com/900135


TREE - ast-tree
	http://talkingaboutme.tistory.com/514
	https://www.google.co.kr/#newwindow=1&q=ast+tree


LIST - doubly linked list

HLIST

o DMA http://elinux.org/images/4/49/20140429-dma.pdf

DMA Engine != DMA mapping

* architecture specific
#include <asm/cacheflush.h>
#include <asm/outercache.h>


* DMA Mapping API

linux/dma-mapping.h
│
├─ linux/dma-attrs.h
├─ linux/dma-direction.h
├─ linux/scatterlist.h
│
#ifdef CONFIG_HAS_DMA
└─ asm/dma-mapping.h
#else
└─ asm-generic/dma-mapping-broken.h
#endif

linux/dma-mapping.h
│
├─ linux/dma-attrs.h
├─ linux/dma-direction.h
├─ linux/scatterlist.h
│
└─ arch/arm/include/asm/dma-mapping.h
 │
 ├─ asm-generic/dma-mapping-common.h
 └─ asm-generic/dma-coherent.h

o dma mapping
	arch/arm/mm/dma-mapping.c
	Used by get_dma_ops() when archdata does not specify its own ops.

struct dma_map_ops arm_dma_ops = {
	.alloc		= arm_dma_alloc,
	.free		= arm_dma_free,
	.mmap		= arm_dma_mmap,
	โ€ฆ
};

o DMA Engine Driver PL08x drivers/dma/amba-pl08x.c

64bit VA arm64:numa: Add numa support for arm64 platforms.

The percpu first chunk is placed in lowmem <- isn't that slow when it sits on a different node?
It is copied once per CPU.
With numa, the dynamic chunks are placed on the numa node.

39bit

SMP

o percpu

o spinlock
	if (smp)	// spinlock_api_smp.h
		preempt_disable();
		do_raw_spin_lock();
	else		// spinlock_api_up.h
		preempt_disable();

o smp_mb() ? dmb() : barrier()

scheduling

o preempt_disable()
	#define preempt_disable() \
	do { \
		inc_preempt_count(); \
		barrier(); \
	} while (0)

o preempt_count
	#define inc_preempt_count() add_preempt_count(1)
	#define add_preempt_count(val) do { preempt_count() += (val); } while (0)
	#define preempt_count() (current_thread_info()->preempt_count)

o barrier()	/* compiler optimize barrier */
	#define barrier() __asm__ __volatile__("": : :"memory")
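
The macros above can be tried out in user space. The following is only a sketch: `preempt_count_model` and the `*_model` functions are hypothetical stand-ins for the per-thread counter and the kernel macros, and nothing here actually affects scheduling. It shows the nesting behaviour: preemption counts as enabled again only when the counter returns to 0.

```c
#include <assert.h>

/* Compiler-only barrier, same form as the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for current_thread_info()->preempt_count. */
static int preempt_count_model;

static void preempt_disable_model(void)
{
	preempt_count_model++;
	barrier();	/* keep the protected code after the increment */
}

static void preempt_enable_model(void)
{
	barrier();	/* keep the protected code before the decrement */
	assert(preempt_count_model > 0);	/* going below 0 would be a bug */
	preempt_count_model--;
}

/* Non-zero when the modelled task could be preempted. */
static int preemptible_model(void)
{
	return preempt_count_model == 0;
}
```

Calls nest: two `preempt_disable_model()` calls need two `preempt_enable_model()` calls before `preemptible_model()` reports true again.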

thread_info

struct thread_info {
	. . .
	int preempt_count;		/* 0 => preemptable, <0 => bug */
	struct task_struct *task;	/* main task structure */
	. . .
}

struct task_struct {
	. . .
	void *stack;
	. . .
}

/* task_struct -> thread_info */
#define task_thread_info(task)  ((struct thread_info *)(task)->stack)

union thread_union {
	struct thread_info thread_info;
	unsigned long stack[THREAD_SIZE/sizeof(long)];
};

o arch/arm/include/asm/thread_info.h
static inline struct thread_info *current_thread_info(void)
{
	register unsigned long sp asm ("sp");
	return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}
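
The masking trick in current_thread_info() works because thread_info sits at the bottom of a THREAD_SIZE-aligned stack block. The sketch below models this in user space; the `*_model` types, the helper names and the 8 KiB THREAD_SIZE are assumptions made for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192u	/* assumed: 8 KiB kernel stack */

struct thread_info_model {
	int preempt_count;
};

/* Stack and thread_info share one THREAD_SIZE-aligned block. */
union thread_union_model {
	struct thread_info_model thread_info;
	unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
};

static_assert(sizeof(union thread_union_model) == THREAD_SIZE,
	      "the union must cover exactly one stack block");

/* Same masking as current_thread_info(), applied to any address in the block. */
static struct thread_info_model *thread_info_from_sp(uintptr_t sp)
{
	return (struct thread_info_model *)(sp & ~((uintptr_t)THREAD_SIZE - 1));
}

/* Allocate one aligned block; returns NULL on failure. */
static union thread_union_model *alloc_thread_union(void)
{
	return aligned_alloc(THREAD_SIZE, sizeof(union thread_union_model));
}
```

Any address inside the block, from the first word to the last byte, masks down to the same thread_info pointer.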

    ZONE_MOVABLE

* What is zone_movable? Pages that can be moved, so that page fragmentation can be dealt with.
* Almost all user pages are movable
* How large is zone_movable?
* NUMA...

    SPINLOCK

/* with the DEBUG variants left out */

***** include/linux/spinlock_api_smp.h

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	preempt_disable();	/* no preemption */
	do_raw_spin_lock(lock);
}

***** include/linux/spinlock.h

static inline void spin_lock(spinlock_t *lock)
{
	raw_spin_lock(&lock->rlock);
}

#define raw_spin_lock(lock)	_raw_spin_lock(lock)

***** include/linux/spinlock_api_up.h

#define _raw_spin_lock(lock)	__LOCK(lock)

do_raw_spin_lock() { arch_spin_lock }

static inline void spin_lock_irq(spinlock_t *lock)
{
	raw_spin_lock_irq(&lock->rlock);
}

#define spin_lock_irqsave(lock, flags)				\
do {								\
	raw_spin_lock_irqsave(spinlock_check(lock), flags);	\
} while (0)


    completion

kernel/sched/completion.c

complete()
	signals a single thread waiting on this completion
complete_all()
	signals all threads waiting on this completion

wait_for_completion()
	waits for completion of a task
wait_for_completion_timeout()
	waits for completion of a task (w/timeout)

    compiler syntax

o <linux/compiler.h>

o __typeof__
	Refers to the type of an expression.

#define max(a,b) \
({ typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a > _b ? _a : _b; })

http://gcc.gnu.org/onlinedocs/gcc/Typeof.html
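
The reason for the typeof form is that each argument is evaluated exactly once. The sketch below compares it with a naive macro; `safe_max`, `naive_max` and the counting helpers are hypothetical names for this illustration, and the statement expression `({ ... })` is a GCC/Clang extension.

```c
#include <assert.h>

/* Statement-expression form: each argument is copied into a local once. */
#define safe_max(a, b)			\
({	__typeof__(a) _a = (a);		\
	__typeof__(b) _b = (b);		\
	_a > _b ? _a : _b; })

/* Naive form for comparison: the larger argument is evaluated twice. */
#define naive_max(a, b) ((a) > (b) ? (a) : (b))

static int calls;	/* counts how often next_value() runs */

static int next_value(void)
{
	return ++calls;
}

/* Number of next_value() calls made by one safe_max() use. */
static int evaluations_with_safe_max(void)
{
	calls = 0;
	(void)safe_max(next_value(), 0);
	return calls;
}

/* Number of next_value() calls made by one naive_max() use. */
static int evaluations_with_naive_max(void)
{
	calls = 0;
	(void)naive_max(next_value(), 0);
	return calls;
}
```

With a side-effecting argument the naive macro runs the side effect twice, which is exactly what the temporaries `_a` and `_b` avoid.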


    per-CPU allocator

[References]
o http://studyfoss.egloos.com/5375570
o http://studyfoss.egloos.com/5377666
o http://www.makelinux.net/ldd3/chp-8-sect-5
o http://lwn.net/Articles/22911/

[Purpose]
o Keep a copy of a same-typed variable for each CPU so that it can be accessed without lock contention.
o Because each CPU has its own copy, the hardware cache hit rate goes up.

[Kinds]
o Why do we need both static and dynamic?
	static - fixed at compile time... the kernel itself, built-in modules; what exactly is "reserved"?
	dynamic - modules; instances that can be added or extended?

#include <linux/percpu.h>

o Declaring a static per-CPU variable
	DEFINE_PER_CPU(type, name)
o Accessing a static per-CPU variable
	get_cpu_var(sockets_in_use)++;
	put_cpu_var(sockets_in_use);
o Accessing another cpu's per-CPU variable
	per_cpu(variable, int cpu_id);			// takes a variable and yields a variable

o Allocating a dynamic per-CPU variable
	void *alloc_percpu(type);
	void *__alloc_percpu(size_t size, size_t align);
o Freeing a dynamic per-CPU variable
	free_percpu
o Using a dynamic per-CPU variable
	per_cpu_ptr(void *per_cpu_var, int cpu_id);	// takes a pointer and returns a pointer

[Examples]
o When used for statistics and the like, sum the values of all CPUs when reporting.
o TODO: write a usage example for each API.

[Implementation]
o setup_per_cpu_areas
	1. pcpu_embed_first_chunk
	2. delta = pcpu_base_addr - __per_cpu_start;
		pcpu_base_addr : lowest address of the memory the groups were given by bootmem
		__per_cpu_start : start address of the .data..percpu section
	3. For each cpu:
		__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
		pcpu_unit_offsets is the group base_offset plus the per-unit offset

o __this_cpu_ptr
	Looks up the current processor id's entry in __per_cpu_offset, which setup_per_cpu_areas filled in.

o Let's analyze per_cpu_ptr, which is used to access dynamic variables.

#define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))

SHIFT_PERCPU_PTR(p, off)
	returns p + off.

#define per_cpu_offset(x) (__per_cpu_offset[x])

unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
	Fetches entry x of this global array. The array was initialized in setup_per_cpu_areas.

In the end, the location of ptr inside the given cpu's unit block (32KB each) is returned.
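
The offset arithmetic can be modelled in user space. This is a sketch under assumptions: `percpu_template` stands in for the .data..percpu section, `first_chunk` for the first chunk, and every `*_model` name, NR_CPUS value and unit layout is hypothetical. It shows that "template address + per-CPU offset" lands on that CPU's private copy, and how a statistics reader sums the copies.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4	/* assumed CPU count for the model */

/* One "unit": the per-CPU variables of a single CPU. */
struct percpu_unit {
	long sockets_in_use;	/* example variable name taken from the note */
};

static struct percpu_unit percpu_template;	/* stands in for .data..percpu */
static struct percpu_unit first_chunk[NR_CPUS];	/* one unit per CPU */
static uintptr_t per_cpu_offset_model[NR_CPUS];

/* Model of the setup step: copy the template and record each CPU's offset. */
static void setup_per_cpu_areas_model(void)
{
	uintptr_t delta = (uintptr_t)first_chunk - (uintptr_t)&percpu_template;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		first_chunk[cpu] = percpu_template;
		per_cpu_offset_model[cpu] =
			delta + (uintptr_t)cpu * sizeof(struct percpu_unit);
	}
}

/* Model of per_cpu_ptr(): shift a template address by the CPU's offset. */
static void *per_cpu_ptr_model(void *ptr, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (void *)((uintptr_t)ptr + per_cpu_offset_model[cpu]);
}

/* Statistics-style read: add up every CPU's copy. */
static long sum_over_cpus_model(long *template_var)
{
	long total = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		total += *(long *)per_cpu_ptr_model(template_var, cpu);
	return total;
}
```

Each CPU updates only its own copy, so no lock is needed on the update path; only the reader that wants a total walks all copies.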


    kmem_cache

o kmem_cache_create
o kmem_cache_destroy

o kmem_cache_open
	sets up the kmem_cache structure itself; allocates and initializes kmem_cache_cpu and kmem_cache_node
o kmem_cache_close
	frees kmem_cache_cpu and kmem_cache_node


    RCU (read, copy, update)

One of the synchronization mechanisms.
A lock-free mechanism that lets readers run concurrently with a single updater, which improves system scalability.
	* Note : synchronization between updaters is not provided. Updaters need a separate synchronization mechanism among themselves.

Mainly used for objects that are mostly read and rarely written.
Similar to a reader-writer lock, but reads never block (wait-free) and their overhead is extremely small.
Writes, on the other hand, are expensive. The time complexity of the synchronization a write needs grows with the size of the RCU-protected object.


Basic principles
1. Publish-Subscribe mechanism (reader access; insertion)
	; publish - rcu_assign_pointer(), subscribe - rcu_dereference()
2. Wait For Pre-Existing RCU Readers to Complete (updater; deletion)
	; destructive operation
3. Maintain Multiple Versions of Recently Updated Objects (reader access)


example)
rcu_read_lock(void)
rcu_dereference()
rcu_read_unlock(void)



1. Publish-Subscribe mechanism (for insertion)

	Category | Publish			Retract					Subscribe

	Pointers | rcu_assign_pointer()		rcu_assign_pointer(..., NULL)		rcu_dereference()

	Lists	 | list_add_rcu()		list_del_rcu()				list_for_each_entry_rcu()	
		   list_add_tail_rcu()
		   list_replace_rcu()

	Hlists   | hlist_add_after_rcu()	hlist_del_rcu()				hlist_for_each_entry_rcu()
		   hlist_add_before_rcu()
		   hlist_add_head_rcu()
		   hlist_replace_rcu()
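
The pointer row of the table can be approximated with C11 atomics. This is a single-threaded sketch, not kernel code: `publish()` and `subscribe()` are hypothetical stand-ins for rcu_assign_pointer() and rcu_dereference(), and an acquire load is used where the kernel relies on dependency ordering. Grace periods and reclamation are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int a;
	int b;
};

/* The RCU-protected pointer in this model. */
static _Atomic(struct foo *) gp;

/* Stand-in for rcu_assign_pointer(): release store, so the
 * initialization of *p is visible before the pointer is. */
static void publish(struct foo *p)
{
	atomic_store_explicit(&gp, p, memory_order_release);
}

/* Stand-in for rcu_dereference(): acquire load (assumed substitute
 * for the kernel's dependency-ordered load). */
static struct foo *subscribe(void)
{
	return atomic_load_explicit(&gp, memory_order_acquire);
}

/* Reader: a + b of the current version, or -1 if nothing is published. */
static int reader_sum(void)
{
	struct foo *p = subscribe();

	return p ? p->a + p->b : -1;
}
```

Publishing a second version does not touch the first one; an old reader could keep using it, which is the "multiple versions" principle. Retracting is modelled as `publish(NULL)`.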


2. Wait For Pre-Existing RCU Readers to Complete (for delete)

	The main benefit of RCU is that it can be used without worrying about performance degradation, scalability limits,
	complicated deadlock problems, or memory-leak hazards.


	* The key idea of rcu is that a destructive operation is split into two phases.
	removal phase : blocks new references to the data being removed. The new version of the data stays accessible.
			References to each piece of data never see an intermediate state.
			This is why readers can access the data concurrently.
	reclamation phase : frees the memory of the old data.
			It does not start until every reader reference to the old data has finished.

	The updater can run the removal phase immediately, whereas
	the reclamation phase is deferred until the readers that were active during the removal phase complete.
	That is, the updater must be able to learn that readers have left their critical sections, either by blocking or by registering a callback.

	However, rcu_read_lock() and rcu_read_unlock() are merely preempt_disable()/preempt_enable(), and
	on non-PREEMPT kernels they generate no code at all. So there is no direct way to tell whether a reader has left its critical section.

	Instead, classic rcu relies on the rule that a read-side critical section cannot sleep or block: once a context switch occurs,
	the cpu is judged to have left the critical section. So once every cpu has context-switched at least once, the read-side critical sections are over.

	In practice interrupts, NMIs, CPU hotplug and so on make the real implementation more complex. RT is more complex still and needs a different approach.


3. Maintain Multiple Versions of Recently Updated Objects (for readers)








Main API

	http://lwn.net/Articles/264090/

	The core RCU API is quite small:
	a.  rcu_read_lock()
	b.  rcu_read_unlock()
	c.  synchronize_rcu() / call_rcu()
	d.  rcu_assign_pointer()
	e.  rcu_dereference()



	1. Assignment
	rcu_assign_pointer() - assign to RCU-protected pointer
		ensuring that any concurrent RCU readers will see any prior initialization.
		(publishes the new structure)



	rcu_read_lock()			; increments current's rcu_read_lock_nesting, barrier
		rcu_dereference() - reads the published value.
	rcu_read_unlock()

	RCU improves concurrency between an updater and readers. It is not a mechanism that provides concurrency among updaters.


	* update-side primitives
		synchronize_rcu() - Wait For Pre-Existing RCU Readers to Complete
			with PREEMPT_RCU : call_rcu
			with non-PREEMPT RCU : synchronize_sched
		call_rcu()
		


Terms
srcu : sleepable RCU


Location
include/linux/rcupdate.h
	rcu_read_lock
	rcu_assign_pointer
include/linux/rculist.h
	list_for_each_entry_rcu
kernel/rcutiny.c
kernel/rcutree.c



[References]
	barrios RCU
	http://www.rdrop.com/users/paulmck/RCU/whatisRCU.html
	http://summerlight.tistory.com/11


* Grace Period
	rcu_start_gp
		rsp->gpnum++;

		rcu_for_each_node_breadth_first(rsp, rnp) {
			rnp->gpnum = rsp->gpnum;

	note_new_gpnum
		__note_new_gpnum
			rdp->gpnum = rnp->gpnum;

    completion

[Material] https://www.linux.co.kr/home/lecture/index.php?cateNo=&secNo=&theNo=&leccode=11129

A synchronization mechanism for signalling a 'complete' event between tasks.
A task waiting for the 'complete' event sleeps, so waiting cannot be done in interrupt context.

If complete() was called first, the event has already happened, so the waiter returns immediately without being scheduled out.


struct completion completion;

init_completion(&completion);
wait_for_completion(&completion);

/* kernel/sched/core.c */
complete(&completion);		// wakes a single task.
complete_all(&completion);	// wakes every task waiting on the queue.

See also:  complete(), wait_for_completion() (and friends _timeout,
 _interruptible, _interruptible_timeout, and _killable), init_completion(),
 and macros DECLARE_COMPLETION(), DECLARE_COMPLETION_ONSTACK(), and INIT_COMPLETION().

    bitops

#include <linux/bitops.h>
#include <asm/bitops.h>		/* architecture specific, included from <linux/bitops.h> */

bit search, bit set / atomic operations, hweight

arch/arm/include/asm/bitops.h
o set_bit family
	#define set_bit(nr,p)           ATOMIC_BITOP(set_bit,nr,p)
	#define clear_bit(nr,p)         ATOMIC_BITOP(clear_bit,nr,p)
	#define change_bit(nr,p)        ATOMIC_BITOP(change_bit,nr,p)
o fls family
o find family
	find_first_zero_bit -> _find_first_zero_bit_le

o test_and_set_bit family


Under arch/arm/lib/
	bitops.h          ; declares the bitop and testop assembly macros; included by the assembly routines below
	changebit.S       ; bitop   _change_bit, eor
	clearbit.S        ; bitop   _clear_bit, bic
	setbit.S          ; bitop   _set_bit, orr
	testchangebit.S   ; testop  _test_and_change_bit, eor, str
	testclearbit.S    ; testop  _test_and_clear_bit, bicne, strne
	testsetbit.S      ; testop  _test_and_set_bit, orreq, streq

	findbit.S
	_find_first_zero_bit_le
	_find_next_zero_bit_le
	_find_first_bit_le
	_find_next_bit_le
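
The semantics of the test_and_* and find_* routines listed above can be sketched in portable C11. The `*_model` functions are hypothetical user-space stand-ins, not the ARM assembly versions; they only illustrate "return the old bit value while atomically changing it" and "index of the first clear bit".

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr; return its previous value (0 or 1). */
static int test_and_set_bit_model(unsigned int nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long old = atomic_fetch_or(&addr[nr / BITS_PER_LONG], mask);

	return (old & mask) != 0;
}

/* Atomically clear bit nr; return its previous value (0 or 1). */
static int test_and_clear_bit_model(unsigned int nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long old = atomic_fetch_and(&addr[nr / BITS_PER_LONG], ~mask);

	return (old & mask) != 0;
}

/* Index of the first zero bit, or size if every bit is set. */
static unsigned int find_first_zero_bit_model(_Atomic unsigned long *addr,
					      unsigned int size)
{
	for (unsigned int nr = 0; nr < size; nr++) {
		unsigned long word = atomic_load(&addr[nr / BITS_PER_LONG]);

		if (!(word & (1UL << (nr % BITS_PER_LONG))))
			return nr;
	}
	return size;
}
```

A typical use is an ID allocator: find the first zero bit, then claim it with the test-and-set variant and retry if it returned 1.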

syscall (system call)

arch/arm/kernel/entry-common.S
	    .equ NR_syscalls,0
	#define CALL(x) .equ NR_syscalls,NR_syscalls+1		// NR_syscalls ends up as the number of CALL() entries.
	#include "calls.S"

arch/arm/kernel/calls.S
	CALL(sys_restart_syscall)
	CALL(sys_exit)
	...

include/linux/syscalls.h - Linux syscall interfaces (non-arch-specific)
	...
	#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)

	#define SYSCALL_DEFINEx(x, sname, ...)              \
	    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)

	#define __SYSCALL_DEFINEx(x, name, ...)             \
	    asmlinkage long sys##name(__SC_DECL##x(__VA_ARGS__))

	asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd,
	                unsigned long arg);


fs/ioctl.c
	#include <linux/syscalls.h>
	SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
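
How SYSCALL_DEFINE3 turns a (type, name) list into a sys_* function can be tried with a stripped-down macro. This is a simplified sketch: `SC_DECL3`, `SYSCALL_DEFINE3_MODEL` and `sys_add3` are hypothetical, and the real kernel macros do more than paste the name and pair up the arguments.

```c
#include <assert.h>

/* Pair up (type, name) arguments into a parameter list. */
#define SC_DECL3(t1, a1, t2, a2, t3, a3) t1 a1, t2 a2, t3 a3

/* Simplified stand-in for SYSCALL_DEFINE3: pastes "sys_" onto the name. */
#define SYSCALL_DEFINE3_MODEL(name, ...) \
	long sys_##name(SC_DECL3(__VA_ARGS__))

/* Expands to: long sys_add3(int a, int b, long c) */
SYSCALL_DEFINE3_MODEL(add3, int, a, int, b, long, c)
{
	return a + b + c;
}
```

The macro invocation looks like a declaration, but after preprocessing it is an ordinary function definition, which is why searching the source for `sys_ioctl(` does not find the ioctl body.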





* Finding where a system call is defined
	1) Check the list in arch/arm/kernel/calls.S
	
	2-1) Look up the symbol name in System.map
	$ arm-linux-gnueabihf-addr2line -f -e vmlinux 800b5c54
	sys_read
	/home/freestyle/kernel/iamroot9C/fs/read_write.c:463

	2-2) make ARCH=arm tags
	:tj sys_read

* system call trace
	$ strace ./user_program






* ARM Exception Vector Table (low/high)

Exception			Entry Mode
-------------------------------
Reset				SVC
Undefined Instruction		UNDEF
Software Interrupt		SVC
Prefetch Abort			ABT
Data Abort			ABT
Reserved			Reserved
IRQ				IRQ
FIQ				FIQ		


* arch/arm/kernel/entry-armv.S
==============================
.LCvswi:
    .word   vector_swi				// address of vector_swi

	...

    .globl  __vectors_start
__vectors_start:
 ARM(   swi SYS_ERROR0  )
 THUMB( svc #0      )
 THUMB( nop         )
    W(b)    vector_und + stubs_offset
    W(ldr)  pc, .LCvswi + stubs_offset		//<- on svc (=swi), loads the address of vector_swi into pc
    W(b)    vector_pabt + stubs_offset
    W(b)    vector_dabt + stubs_offset
    W(b)    vector_addrexcptn + stubs_offset
    W(b)    vector_irq + stubs_offset
    W(b)    vector_fiq + stubs_offset
==============================


* arch/arm/kernel/entry-common.S
==============================
ENTRY(vector_swi)
    sub sp, sp, #S_FRAME_SIZE
    stmia   sp, {r0 - r12}          @ Calling r0 - r12
 ARM(   add r8, sp, #S_PC       )
 ARM(   stmdb   r8, {sp, lr}^       )   @ Calling sp, lr
    mrs r8, spsr            @ called from non-FIQ mode, so ok.
    str lr, [sp, #S_PC]         @ Save calling PC
    str r8, [sp, #S_PSR]        @ Save CPSR
    str r0, [sp, #S_OLD_R0]     @ Save OLD_R0
    zero_fp
	...
    adr lr, BSYM(ret_fast_syscall)  @ return address
    ldrcc   pc, [tbl, scno, lsl #2]     @ call sys_* routine	//<- after the syscall, returns to ret_fast_syscall

	...

ret_fast_syscall:
	...
    bl  do_work_pending			//<- if (likely(thread_flags & _TIF_NEED_RESCHED)) schedule();
==============================




* Finding the syscall invocation code in a user program

==============================
#define _GNU_SOURCE      /* or _BSD_SOURCE or _SVID_SOURCE */
#include <unistd.h>
#include <sys/syscall.h> /* For SYS_xxx definitions */
#include <sys/types.h>   /* For pid_t */

static inline pid_t my_gettid (void)
{
    return (pid_t) syscall (SYS_gettid); /* or __NR_gettid */
}
==============================

	${CROSS_COMPILE}gcc -o ex ex.c -static
	${CROSS_COMPILE}objdump -dSlx ex | cat > ex_dump.txt

	1) system call made through glibc
		'svc 0' in __libc_do_syscall()

	2) call made through syscall()
		'svc 0' in syscall()

    Virtual Process Memory

struct mm_struct	; memory management information for the process.
struct vm_area_struct	; 

==================================================================================

    Kernel Memory Layout on ARM Linux

    Russell King <[email protected]>
         November 17, 2005 (2.6.15)

This document describes the virtual memory layout which the Linux kernel uses for ARM processors. It indicates which regions are free for platforms to use, and which are used by generic code.

The ARM CPU is capable of addressing a maximum of 4GB virtual memory space, and this must be shared between user space processes, the kernel, and hardware devices.

As the ARM architecture matures, it becomes necessary to reserve certain regions of VM space for use for new facilities; therefore this document may reserve more VM space over time.

Start		End		Use
--------------------------------------------------------------------------
ffff8000	ffffffff	copy_user_page / clear_user_page use.
				For SA11xx and Xscale, this is used to
				setup a minicache mapping.

ffff4000	ffffffff	cache aliasing on ARMv6 and later CPUs.

ffff1000	ffff7fff	Reserved.
				Platforms must not use this address range.

ffff0000	ffff0fff	CPU vector page.
				The CPU vectors are mapped here if the
				CPU supports vector relocation (control
				register V bit.)

fffe0000	fffeffff	XScale cache flush area.  This is used
				in proc-xscale.S to flush the whole data
				cache. (XScale does not have TCM.)

fffe8000	fffeffff	DTCM mapping area for platforms with
				DTCM mounted inside the CPU.

fffe0000	fffe7fff	ITCM mapping area for platforms with
				ITCM mounted inside the CPU.

fff00000	fffdffff	Fixmap mapping region.  Addresses provided
				by fix_to_virt() will be located here.

ffc00000	ffefffff	DMA memory mapping region.  Memory returned
				by the dma_alloc_xxx functions will be
				dynamically mapped here.

ff000000	ffbfffff	Reserved for future expansion of DMA
				mapping region.

VMALLOC_START	VMALLOC_END-1	vmalloc() / ioremap() space.
				Memory returned by vmalloc/ioremap will
				be dynamically placed in this region.
				Machine specific static mappings are also
				located here through iotable_init().
				VMALLOC_START is based upon the value
				of the high_memory variable, and VMALLOC_END
				is equal to 0xff000000.

PAGE_OFFSET	high_memory-1	Kernel direct-mapped RAM region.
				This maps the platforms RAM, and typically
				maps all platform RAM in a 1:1 relationship.

PKMAP_BASE	PAGE_OFFSET-1	Permanent kernel mappings
				One way of mapping HIGHMEM pages into kernel
				space.

MODULES_VADDR	MODULES_END-1	Kernel module space
				Kernel modules inserted via insmod are
				placed here using dynamic mappings.

00001000	TASK_SIZE-1	User space mappings
				Per-thread mappings are placed here via
				the mmap() system call.

00000000	00000fff	CPU vector page / null pointer trap
				CPUs which do not support vector remapping
				place their vector page here.  NULL pointer
				dereferences by both the kernel and user
				space are also caught via this mapping.

Please note that mappings which collide with the above areas may result in a non-bootable kernel, or may cause the kernel to (eventually) panic at run time.

Since future CPUs may impact the kernel mapping layout, user programs must not access any memory which is not mapped inside their 0x0001000 to TASK_SIZE address range. If they wish to access these areas, they must set up their own mappings using open() and mmap().

==================================================================================

  • Study list

Task
	- task creation, execution, state transitions
	- scheduling ; the algorithm by which the kernel decides which of the runnable threads in the system gets the cpu resource
		O(1)
		CFS
	- signal
	- ipc

Memory
	- physical memory management
		slab/slob/slub
		buddy allocator (lazy buddy)
	- virtual memory management
		vmalloc
		vmap
	- segmentation / paging
	- page fault
	- page cache

	- kmem_cache_create, kmem_cache_alloc, kmem_cache_free, kmem_cache_destroy
		/proc/slabinfo
		struct kmem_cache
	- Behavior: how do memory allocation (kmalloc, vmalloc, __get_page_size?) and reclaim work, and what happens when memory runs short?

FS
	- file management
	- access control
	- inode management
	- directory management
	- super block management

Network
	- socket interface
	- tcp/ip protocol

Device Driver
	- driver

Kernel Mechanism
	- critical section / locking
		spinlock
		semaphore
		mutex
		rwlock : readers and writers use different lock conditions.
			While a writer holds the lock, no other writer or reader can enter; readers may hold the lock together with other readers.
			A writer also has to wait while any reader holds the lock, so writer starvation can occur.
			The implementation is in include/linux/rwlock.h.
			A writer can enter only when the lock is 0, i.e. not while a writer or reader holds it. After entering it sets the lock to 0x80000000 (the most negative value).
			A reader can enter only when the lock is non-negative, i.e. not while a writer holds it. Reader locks can nest. After entering, lock+1.
		rcu
		percpu
	- cgroup
	- userspace <-> kernel communication
		(syscall)
		ioctl
		netlink socket
	- sysfs/debugfs

Architecture Specific - SMP - atomic operation - interrupt - MMU/TLB (pagetable)

Other thing - scripts/*** - tools/perf

o function call trace
	summarized in kernel/list.txt
	Ftrace
o Document the call routines in UML
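
The rwlock counter rule noted under Kernel Mechanism in the list above (a writer needs the lock word to be 0 and stores 0x80000000; a reader needs no writer and adds 1) can be modelled with C11 atomics. This is a hypothetical user-space sketch with try-lock functions only; `rwlock_model_t` and the `*_model` functions are not the kernel's arch_rwlock_t implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WRITER_BIT 0x80000000u	/* value stored while a writer holds the lock */

typedef struct {
	_Atomic uint32_t lock;	/* 0: free, N: N readers, WRITER_BIT: one writer */
} rwlock_model_t;

/* A writer may enter only when the word is 0 (no readers, no writer). */
static bool write_trylock_model(rwlock_model_t *rw)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong(&rw->lock, &expected, WRITER_BIT);
}

static void write_unlock_model(rwlock_model_t *rw)
{
	atomic_store(&rw->lock, 0);
}

/* A reader may enter unless the writer bit is set; each reader adds 1. */
static bool read_trylock_model(rwlock_model_t *rw)
{
	uint32_t old = atomic_load(&rw->lock);

	while (!(old & WRITER_BIT)) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&rw->lock, &old, old + 1))
			return true;
	}
	return false;
}

static void read_unlock_model(rwlock_model_t *rw)
{
	atomic_fetch_sub(&rw->lock, 1);
}
```

The starvation problem is visible in the model: as long as at least one reader keeps the count above 0, `write_trylock_model()` keeps failing.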

------------------------------------------------------------------------

The program counter, instruction counter, and location counter refer to the same thing: a register that holds the address of the instruction currently being executed. Its contents are handed to the control unit according to the instruction held in the MDR (Memory Data Register).

The location counter then passes its value to the MAR (Memory Address Register), which now holds the address of the memory location being accessed, and the MBR (Memory Buffer Register) holds a copy of the memory contents at the address given by the MAR.

While a label is still in the binary it is a relative address based on the location counter; once loaded into memory it is combined with the MAR to become the actual address.

What are the advantages of the pc-relative approach?

To start, look at the Makefile and the lds. Which files to read for a vexpress build (where does the bootloader load the compressed image?)

VMSA <- arm arm 1602
ID_MMFR0
r11 <- MMFR0 value
Ctrl + W ]

When turning on the MMU, enable the cache first, then enable the MMU
r3 0x60004000

ARM VMSA B3-17 p.1457
Table format: B3-41 Table. P.1466

Cortex A Programming Guide Page Table Entry Format

Page Directory and Page Table

ARM ARM P.1541 Domain settings

Why adr is used : pc + label (relative address) -> absolute address

piggy.gzip <- the compressed Image

arch/arm/boot/compressed/head.S, from line 115

  • directives can be looked up with "info as"

as manual: directive .type

arch/arm/kernel/head.S

.align <- on arm the argument is a power of two (2 ** n)
	http://www.spinics.net/lists/arm-kernel/msg11567.html
	[from as manual] The way the required alignment is specified varies from system to system.

http://sourceware.org/binutils/docs/as/ARM-Directives.html
	.align expression [, expression]
	This is the generic .align directive. For the ARM however if the first argument is zero (ie no alignment is needed) the assembler will behave as if the argument had been 2 (ie pad to the next four byte boundary). This is for compatibility with ARM's own assembler.

20120915
From line 1420. Why is the flush structured so that the code itself looks up the cache line size, index size and so on before issuing the operations?

CP15DSB <- full system 0
DSB <- an instruction was added for this. "This option is referred to as the full system DSB"

GOT? I still do not understand what "relocated" means here.

.word LC0 <- isn't the value filled in at assembly time, rather than fixed up later by the linker? My guess is simple: it probably refers to the relocation done when the images overlap.

restart: adr r0, LC0 (loads the address of LC0 into r0; adr is a pseudo-instruction resolved at assembly time)
	-> add r0, pc, #offset <- at run time, does pc hold the memory location rather than the location counter value?

Clue: If delta is zero, we are running at the address we were linked at.

restart:
	adr	r0, LC0		<- r0 gets pc + offset
	ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}	; r1 gets the value of .word LC0
	sub	r0, r0, r1	; so naturally r0 and r1 can differ, can't they?

20120922

KERNEL_RAM_VADDR

swapper_pg_dir
	virtual address of the initial page table; page tables 16K below KERNEL_RAM_VADDR

inline assembly http://wiki.kldp.org/KoreanDoc/html/EmbeddedKernel-KLDP/app3.basic.html

.macro <- generates assembly output. Whatever follows the name is an argument.

The processor ID is handled by __lookup_processor_type

	mrc	p15, 0, r9, c0, c0		@ get processor id
	bl	__lookup_processor_type		@ r5=procinfo r9=cpuid
.proc.info.init <- a .section defined in arch/arm/mm/proc-v7.S

The procinfo structure is reportedly declared in asm/procinfo.h.
	.long .
	.long __proc_info_begin	<- are the addresses used at assembly time all VAs?

In vmlinux.lds:
	__proc_info_begin = .;
	*(.proc.info.init)
	__proc_info_end = .;
.proc.info.init is in arch/arm/mm/proc-v7.S.
	__v7_ca9mp_proc_info: <- cortex-a9 mp

Structure: arch/arm/mm/cache-v7.S

__initdata http://blog.naver.com/PostView.nhn?blogId=wjdguddnr00&logNo=20018010170&parentCategoryNo=9&viewDate=&currentPage=1&listtype=0

arch/arm/Makefile TEXT_OFFSET := $(textofs-y)

PAGE_OFFSET [VA] 0x8000 0000
TEXT_OFFSET [offset] (0x00008000 at make time)

Does loading a label address with adr give a physical address? "." and labels are VAs by default.

With the MMU disabled, a VA presumably accesses the PA directly? "if the MMU is disabled, they use the flat address mapping, and all mappings are considered global." Label values are basically offsets, right? With the MMU disabled, is the pc value a VA or a PA?

http://www.iamroot.org/xe/61099#comment_61348

head.S
	r5 : procinfo
	r9 : get processor id
	r8 : PAGE_OFFSET(virtual => physical)

arch/arm/include/asm/memory.h
	__pv_stub
	__virt_to_phys
	__phys_to_virt

vmlinux.lds
	.init.pv_table : {
		__pv_table_begin = .;
		*(.pv_table)
		__pv_table_end = .;
	}

Top 8 bits (why must the low 24 bits be aligned?)
	__PV_BITS_31_24 0x81000000
	0x80008000 <- kernel start
r8 holds the PAGE_OFFSET address converted to PA
	str r8, [r7] @ save computed PHYS_OFFSET to __pv_phys_offset

__HEAD           <- .section    ".head.text","ax"

__fixup_pv_table:
	adr	r0, 1f
	ldmia	r0, {r3-r5, r7}
	sub	r3, r0, r3	@ PHYS_OFFSET - PAGE_OFFSET	; used to convert r4, r5, r7 to PA
	add	r4, r4, r3	@ adjust table start address	; r8 holds PAGE_OFFSET converted to PA above
	add	r5, r5, r3	@ adjust table end address
	add	r7, r7, r3	@ adjust __pv_phys_offset address	; r3 is PA-VA; adding it to a VA gives the PA
	str	r8, [r7]	@ save computed PHYS_OFFSET to __pv_phys_offset
	mov	r6, r3, lsr #24	@ constant for add/sub instructions	; r6 is the top 8 bits of (PA-VA)
	teq	r3, r6, lsl #24 @ must be 16MiB aligned
THUMB(	it	ne		@ cross section branch )
	bne	__error
	str	r6, [r7, #4]	@ save to __pv_offset	; stores r6 at __pv_phys_offset + 4
	b	__fixup_a_pv_table
ENDPROC(__fixup_pv_table)

	.align
1:	.long	.			@ r3
	.long	__pv_table_begin	@ r4
	.long	__pv_table_end		@ r5
2:	.long	__pv_phys_offset	@ r7

.text   



.data   
.globl  __pv_phys_offset
.type   __pv_phys_offset, %object                                                                                                                         

__pv_phys_offset:
.long 0
.size __pv_phys_offset, . - __pv_phys_offset
__pv_offset:
.long 0

#define __pa(x) __virt_to_phys((unsigned long)(x))

Why is pv_table used? What does it store?

ldrcc r7, [r4], #4
r3 is PA-VA

	b	2f
1:	ldr	ip, [r7, r3]		; converts the fetched add/sub instruction address (VA) to PA and loads the instruction
	bic	ip, ip, #0x000000ff	; clears the low 8 bits
	orr	ip, ip, r6	@ mask in offset bits 31-24	; r6 is the top 8 bits of (PA-VA); this rewrites the instruction
	str	ip, [r7, r3]
2:	cmp	r4, r5
	ldrcc	r7, [r4], #4	@ use branch for delay slot	; loads the entry at r4 (__pv_table_begin) into r7 (r7 is a VA)
	bcc	1b
	mov	pc, lr

What is pv_table for? Why patch all of these?
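
The patch loop can be modelled in C to see what "rewriting the instruction" means. This is a sketch under assumptions: the table holds array indices instead of virtual addresses, and `fixup_pv_table_model` and its parameters are hypothetical. The bic/orr pair becomes "clear the low 8 bits, then OR in the top byte of the offset".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * insns:      instruction words of the (modelled) kernel image
 * table:      indices of the add/sub instructions to patch
 *             (the real pv_table stores their virtual addresses)
 * offset_hi8: top byte of (PHYS_OFFSET - PAGE_OFFSET), r6 in the assembly
 */
static void fixup_pv_table_model(uint32_t *insns, const size_t *table,
				 size_t entries, uint32_t offset_hi8)
{
	assert(offset_hi8 <= 0xffu);

	for (size_t i = 0; i < entries; i++) {
		uint32_t insn = insns[table[i]];

		insn &= ~0x000000ffu;	/* bic ip, ip, #0x000000ff */
		insn |= offset_hi8;	/* orr ip, ip, r6 */
		insns[table[i]] = insn;
	}
}
```

Only the 8-bit immediate field of each listed instruction changes; instructions that are not listed in the table are left alone.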

PHYS_OFFSET vmlinux.lds.S 0x80000000 PAGE_OFFSET 0x80000000

arch/arm/include/asm/page.h
	#ifdef CONFIG_ARM_LPAE
	#include <asm/pgtable-3level-types.h>
	#else
	#include <asm/pgtable-2level-types.h>
	#endif

MMU table entry cortex a series pg

config ARM_PATCH_PHYS_VIRT
	bool "Patch physical to virtual translations at runtime" if EMBEDDED
	default y
	depends on !XIP_KERNEL && MMU
	depends on !ARCH_REALVIEW || !SPARSEMEM
	help
	  Patch phys-to-virt and virt-to-phys translation functions at
	  boot and module load time according to the position of the
	  kernel in system memory.

	  This can only be used with non-XIP MMU kernels where the base
	  of physical memory is at a 16MB boundary.

	  Only disable this option if you know that you do not require
	  this feature (eg, building a kernel for a single machine) and
	  you need to shrink the kernel to the minimal size.

Code that patches the PA->VA and VA->PA translations at boot time and at module load time. (What does "patch" mean here?) Because it depends on where the kernel sits in system memory, it cannot be used with XIP kernels, and since it is a VA<->PA translation it applies to systems that use an MMU.

For performance, this work (which work?) is done even if the image size grows. "where the base of physical memory is at a 16MB boundary." Why it must be 16MB aligned -> of 0xAABBBBBB only 0xAA is used.

The PA<->VA translation is fixed up at run time. The translation has the following form.
	PA = VA + ([PA]PHYS_OFFSET - [VA]PAGE_OFFSET)
	VA = PA - ([PA]PHYS_OFFSET - [VA]PAGE_OFFSET)

([PA]PHYS_OFFSET - [VA]PAGE_OFFSET) is computed at run time ahead of MMU initialization, that is, before the MMU has been set up.
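
The two formulas and the 16MB alignment rule can be checked with plain 32-bit arithmetic. This is a minimal sketch with hypothetical names (`set_phys_offset`, `virt_to_phys_model`, `phys_to_virt_model`); PAGE_OFFSET uses the 0x80000000 value from these notes and the physical base in the usage is just an example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET 0x80000000u	/* build-time constant; value from the notes */

/* PHYS_OFFSET - PAGE_OFFSET, computed once at "boot" in this model. */
static uint32_t pv_offset;

/* Record the delta; reject physical bases that are not 16 MiB aligned. */
static bool set_phys_offset(uint32_t phys_offset)
{
	uint32_t delta = phys_offset - PAGE_OFFSET;	/* wraps modulo 2^32 */

	if (delta & 0x00ffffffu)	/* only bits 31..24 fit the patched immediate */
		return false;
	pv_offset = delta;
	return true;
}

/* PA = VA + (PHYS_OFFSET - PAGE_OFFSET) */
static uint32_t virt_to_phys_model(uint32_t va)
{
	return va + pv_offset;
}

/* VA = PA - (PHYS_OFFSET - PAGE_OFFSET) */
static uint32_t phys_to_virt_model(uint32_t pa)
{
	return pa - pv_offset;
}
```

With RAM assumed at 0x60000000, the kernel start 0x80008000 maps to 0x60008000 and back. A base such as 0x60100000 is rejected because its delta has bits set below bit 24.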

Patch details http://www.mail-archive.com/[email protected]/msg42598.html

Subject: [RFC 4/5] ARM: P2V: introduce phys_to_virt/virt_to_phys runtime patching

This idea came from Nicolas, Eric Miao produced an initial version, which was then rewritten into this.

Patch the physical to virtual translations at runtime. As we modify the code, this makes it incompatible with XIP kernels, but allows us to achieve this with minimal loss of performance.

As many translations are of the form:

    physical = virtual + (PHYS_OFFSET - PAGE_OFFSET)
    virtual = physical - (PHYS_OFFSET - PAGE_OFFSET)

we generate an 'add' instruction for __virt_to_phys(), and a 'sub' instruction for __phys_to_virt(). We calculate at run time (PHYS_OFFSET - PAGE_OFFSET) by comparing the address prior to MMU initialization with where it should be once the MMU has been initialized, and place this constant into the above add/sub instructions.

Once we have (PHYS_OFFSET - PAGE_OFFSET), we can calculate the real PHYS_OFFSET as PAGE_OFFSET is a build-time constant, and save this for the C-mode PHYS_OFFSET variable definition to use.

At present, we are unable to support Realview with Sparsemem enabled as this uses a complex mapping function, and MSM as this requires a constant which will not fit in our math instruction.

Signed-off-by: Russell King [email protected]

'.' and 'adr 1f' are used to obtain the physical address of the label.
[r7] (__pv_phys_offset) holds [PA]PHYS_OFFSET.
[r7, #4] (__pv_offset) holds the top byte of (PHYS_OFFSET - PAGE_OFFSET).

str r8, [r7]        @ save computed PHYS_OFFSET to __pv_phys_offset
mov r6, r3, lsr #24 @ constant for add/sub instructions @ r6 = top byte of (PHYS_OFFSET - PAGE_OFFSET)
teq r3, r6, lsl #24 @ must be 16MiB aligned
bne __error
str r6, [r7, #4]    @ save to __pv_offset
b   __fixup_a_pv_table

1: .long .                @ loaded into r3; adr(1f) [PA] - . [VA] is computed and stored back in r3
.long __pv_table_begin    @ loaded into r4, then converted to PA
.long __pv_table_end      @ loaded into r5, then converted to PA
2: .long __pv_phys_offset @ loaded into r7, then converted to PA

...

.align

2: .long __pv_phys_offset

.data
.globl  __pv_phys_offset
.type   __pv_phys_offset, %object

__pv_phys_offset:
.long 0
.size __pv_phys_offset, . - __pv_phys_offset
__pv_offset:
.long 0

1: ldr ip, [r7, r3] @ r7 is an entry read from pv_table, r3 is PA-VA. pv_table entries are VAs, so convert to PA to find the real location and load the data there (the add/sub instruction) into ip
bic ip, ip, #0x000000ff @ clear the last (lowest) byte (see the instruction encoding)
orr ip, ip, r6 @ mask in offset bits 31-24 @ store the top byte of the offset computed earlier @ (the instructions are those of __virt_to_phys, __phys_to_virt)
str ip, [r7, r3] @ store the modified data back
2: cmp r4, r5 @ runs between __pv_table_begin and __pv_table_end
ldrcc r7, [r4], #4 @ use branch for delay slot @ "branch delay slot"???
bcc 1b @ fetches the value stored at r4 (pv_table), i.e. a .long 1b entry, the address of an instruction
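The fixup loop above can be modelled in C. This is only a sketch under assumptions: pv_fixup_sketch is a hypothetical name, and a plain array of instruction words stands in for the instructions that the pv_table entries point at. It shows the 16MiB alignment check and how the top byte of the offset lands in the low byte of each instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C model of the fixup loop. `insns` stands in for the
 * add/sub instructions listed in pv_table (an assumption made to keep
 * the sketch self-contained).
 * Returns 0 on success, -1 if the offset is not 16MiB aligned.
 */
int pv_fixup_sketch(uint32_t *insns, size_t count,
                    uint32_t phys_offset, uint32_t page_offset)
{
    uint32_t delta = phys_offset - page_offset; /* PHYS_OFFSET - PAGE_OFFSET */
    uint32_t top = delta >> 24;                 /* mov r6, r3, lsr #24 */

    assert(insns != NULL);
    if (delta != (top << 24))                   /* teq r3, r6, lsl #24 */
        return -1;                              /* bne __error */

    for (size_t i = 0; i < count; i++) {
        uint32_t insn = insns[i];               /* ldr ip, [r7, r3] */
        insn &= ~(uint32_t)0xff;                /* bic ip, ip, #0x000000ff */
        insn |= top;                            /* orr ip, ip, r6 */
        insns[i] = insn;                        /* str ip, [r7, r3] */
    }
    return 0;
}
```

With example values PHYS_OFFSET = 0x60000000 and PAGE_OFFSET = 0x80000000 the offset is 0xE0000000, so 0xE0 is written into the low byte of every listed instruction. An offset such as 0xE0100000 is rejected because its low 24 bits are not zero.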

20121020
r5 variant, r6 revision -> variant|revision
Extract 12 bits starting at bit 4:
ubfx r0, r0, #4, #12 @ primary part number
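What ubfx does can be written out in C. A minimal sketch: the function name ubfx is just borrowed from the mnemonic, and the ID value used in the usage note is an example input, not something taken from these notes.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits of `value` starting at bit `lsb`, like ARM's UBFX. */
uint32_t ubfx(uint32_t value, unsigned lsb, unsigned width)
{
    uint32_t mask;

    assert(width >= 1 && lsb < 32 && lsb + width <= 32);
    mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (value >> lsb) & mask;
}
```

For an example ID register value of 0x410fc090, ubfx(value, 4, 12) yields 0xc09, the primary part number field.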

The Cortex-A9 TRM lists the processor ID reset value and so on.

errata: configured in arch/arm/mach-vexpress/Kconfig

.alt.smp.init

arch/arm/kernel/vmlinux.lds.S
#ifdef CONFIG_SMP_ON_UP
.init.smpalt : {
__smpalt_begin = .;
*(.alt.smp.init)
__smpalt_end = .;
}
#endif

head.S rewrites the whole table: __fixup_smp_on_up:

ALT_SMP() ALT_UP()

__create_page_tables /** arch/arm/mm/proc-v7.S __v7_proc __v7_ca9mp_setup **/ ARM( add pc, r10, #PROCINFO_INITFUNC ) jumps to __v7_setup: … in mm/proc-v7.S

Runs the per-architecture setup. Architecture bug fixes are done here too.

mm/proc-v7-2level.S .macro v7_ttb_setup, zero, ttbr0, ttbr1, tmp

See Cortex-A Series Programmer's Guide, 10. MMU Translation Table..

TTBRx, TTBCR TTBR0 - TTBR1 - TTBCR: Translation Table Base Control Register

page walk: TLB miss -> whether to do a page walk or raise a translation fault.

v7_ttb_setup r10, r4, r8, r5 @ TTBCR, TTBRx setup

r10 => zero
r4 => ttbr0 ; ttbr0 kernel. a value that does not change
r8 => ttbr1 ; ttbr1 Application
r5 => tmp

/* PTWs cacheable, inner WBWA shareable, outer WBWA not shareable */
#define TTB_FLAGS_SMP TTB_IRGN_WBWA|TTB_S|TTB_NOS|TTB_RGN_OC_WBWA
TTB_IRGN_WBWA   = ((0 << 0) | (1 << 6))
TTB_S           = (1 << 1)
TTB_NOS         = (1 << 5)
TTB_RGN_OC_WBWA = (1 << 3)
#define PMD_FLAGS_SMP PMD_SECT_WBWA|PMD_SECT_S
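To see what value actually gets ORed into the TTBRx registers, the flag bits listed above can be combined in C. A small sketch: the bit values are the ones written in the note, while ttb_with_flags and the table base address in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed in the note above. */
#define TTB_S           (1u << 1)
#define TTB_RGN_OC_WBWA (1u << 3)
#define TTB_NOS         (1u << 5)
#define TTB_IRGN_WBWA   ((0u << 0) | (1u << 6))
#define TTB_FLAGS_SMP   (TTB_IRGN_WBWA | TTB_S | TTB_NOS | TTB_RGN_OC_WBWA)

/* Model of "orr \ttbr, \ttbr, #TTB_FLAGS_SMP" for a table base address. */
uint32_t ttb_with_flags(uint32_t table_base)
{
    /* The low bits of the base must be free for the flags. */
    assert((table_base & TTB_FLAGS_SMP) == 0);
    return table_base | TTB_FLAGS_SMP;
}
```

The combined flags come out as 0x6a, so a hypothetical table base of 0x60004000 becomes 0x6000406a.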

#ifdef CONFIG_MMU
mcr p15, 0, r10, c8, c7, 0 @ invalidate I + D TLBs
v7_ttb_setup r10, r4, r8, r5 @ TTBCR, TTBRx setup
ldr r5, =PRRR @ PRRR ; 0x034
ldr r6, =NMRR @ NMRR ; 0x038
mcr p15, 0, r5, c10, c2, 0 @ write PRRR
mcr p15, 0, r6, c10, c2, 1 @ write NMRR
#endif

/*
 * Macro for setting up the TTBRx and TTBCR registers.
 * - \ttb0 and \ttb1 updated with the corresponding flags.
 */
.macro v7_ttb_setup, zero, ttbr0, ttbr1, tmp
mcr p15, 0, \zero, c2, c0, 2 @ TTB control register
orr \ttbr0, \ttbr0, #TTB_FLAGS_SMP
orr \ttbr1, \ttbr1, #TTB_FLAGS_SMP
mcr p15, 0, \ttbr1, c2, c0, 1 @ load TTB1
.endm

cache... inner region outer region

PRRR NMRR

external coherency management

ThumbEE <- from which architecture version was it added? mrc p15, 0, r0, c0, c1, 0

ThumbEE register summary ARM p.94

r5, r6: clear=0x0120c302, mmuset=0x10c03c7d. The control register is read into r0, the bits given by the values above are cleared and set, and then it returns (where is the result used after the return?)
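The clear-then-set step on the control register value can be sketched in C. The two constants come from the note; apply_crval is a hypothetical name, and the sanity check that the two masks do not overlap is an addition of this sketch, not something the kernel is known to enforce.

```c
#include <assert.h>
#include <stdint.h>

#define CR_CLEAR  0x0120c302u   /* "clear" value from the note */
#define CR_MMUSET 0x10c03c7du   /* "mmuset" value from the note */

/* Clear the bits in `clear`, then set the bits in `set`. */
uint32_t apply_crval(uint32_t sctlr, uint32_t clear, uint32_t set)
{
    /* Sketch-only sanity check: no bit is both cleared and set. */
    assert((clear & set) == 0);
    sctlr &= ~clear;
    sctlr |= set;
    return sctlr;
}
```

Starting from 0, apply_crval(0, CR_CLEAR, CR_MMUSET) gives 0x10c03c7d. Starting from all ones it gives 0xfedf3cfd, which shows exactly which bits the clear mask removes.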

SCTLR.TRE ARM p.1688

Security Extensions ARM p.1154

__LINUX_ARM_ARCH__ is set in:

arch/arm/Makefile arch-$(CONFIG_CPU_32v7) :=-D__LINUX_ARM_ARCH__=7 $(call cc-option,-march=armv7-a,-march=armv5t -Wa$(comma)-march=armv7-a)

The Linux kernel does not use CPU_USE_DOMAINS. Only AP… and TEX is not used either?

In __enable_mmu: mcr p15, 0, r4, c2, c0, 0 @ load page table pointer (TTBR0) (r4 is the [PA] page table pointer)

Why are things like mov r0, r0 and isb issued before turning on the MMU? Is there a guideline document?

head-common.S __mmap_switched: runs with the MMU turned on

__data_loc, _sdata, __bss_start

http://sourceware.org/binutils/docs/ld/ http://korea.gnu.org/manual/release/ld/ld-sjp/ld-ko_3.html (Korean translation) http://korea.gnu.org/manual/release/ld/ld-mahajjh/ld_3.html (Korean translation)

http://wiki.kldp.org/wiki.php/XIPOverview

AT ( ldadr ) - ld manual: the expression ldadr that follows the AT keyword specifies the load address of the section. The default (if the AT keyword is not used) is the same as the relocation address. This feature is designed to make it easy to build a ROM image. For example, this SECTIONS definition creates two output sections: one called '.text', which starts at 0x1000, and one called '.mdata', which is loaded at the end of the '.text' section even though its relocation address is 0x2000. The symbol _data is defined with the value 0x2000.

With CONFIG_SMP, both ALT_SMP and ALT_UP are used.
ALT_SMP <- when an SMP kernel runs on SMP?
ALT_UP <- when an SMP kernel runs on UP?

r4 Save processor ID
r5 Save machine type
r6 Save atags pointer
r7 cr_alignment:, cr_no_alignment:

Arguments passed when start_kernel is called

ALIGN () - ld manual

kernel build system - what are the .cmd files for??? vmlinux.lds.S, vmlinux.lds.h -> vmlinux.lds

:set incsearch

Isn't the entry created for __turn_mmu_on the same as the one for the kernel region? Let's check.

2012.11.03

http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html http://www.iamroot.org/xe/7111

__init (needed only during init; freed from memory once init is done)

#define __init __section(.init.text) __cold notrace (when is the init section freed?)

likely(), unlikely() <- how do they work internally?
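On the question of how likely() and unlikely() work: with branch profiling disabled they reduce to GCC's __builtin_expect, which only gives the compiler a hint about the expected value. A minimal sketch follows; checked_div is a hypothetical caller added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hint macros built on GCC's __builtin_expect; !! normalises to 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: the error path is marked as the unlikely one. */
int checked_div(int a, int b, int *out)
{
    assert(out != NULL);
    if (unlikely(b == 0))
        return -1;
    *out = a / b;
    return 0;
}
```

The hint does not change the result of the condition. It only lets the compiler lay out the expected path as the fall-through case.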

__cold: the cold code is split out separately, so the size probably shrinks? paths leading???

Is there a way to jump automatically to the nearest #ifdef or #else?

asmlinkage void __init start_kernel(void)

void __init __weak smp_setup_processor_id(void): only an empty shell is defined here; on architectures that implement it, the real implementation is called instead.

is_smp !!smp_on_up <- to pass it on as a boolean..

smp_on_up as extern in C ??

smp_setup_processor_id assigns the cpu_logical_map numbers

MPIDR ; Multiprocessor affinity register <- need to read the TRM. (What is the concept of a cluster?)

cgroup http://studyfoss.egloos.com/5505982 http://studyfoss.egloos.com/5506102 http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt

cgroups are handled dynamically through filesystem facilities? Then what about inside the kernel? And what are they used for outside the kernel?

cgroup_init_early

Kernel synchronization: atomic_t atomic_set(&atomic, 1); atomic_add <- implemented with ldrex, strex.

open access exclusive access

When there are two: ldrex ldrex strex strex <- fail. success

http://www.iamroot.org/xe/66152 <- when there are three, there is a problem where Thread1's strex ends up succeeding. http://en.wikipedia.org/wiki/Load-link/store-conditional http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204ik/Cihbghef.html

local monitor

TRM 6.2 TLB Organization Micro TLB, Main TLB

ldrex and strex must come as a pair. If a strex falls inside another pair, it fails and returns 1. If 1 is returned, retry from the ldrex <- this is what guarantees atomic execution between one's own ldrex and strex

Atomic operations added from ARMv6K. spinlock and semaphore are also implemented internally with these instructions

The case where a context switch happens and ldrex, strex are done on the same memory

Why clrex is needed -> the pending access has to be made to fail so that the value is read again.

global exclusive monitor … multiple threads.

A3.4.1 Exclusive access instructions and Non-shareable memory regions

Load-link/store-conditional Micro TLB, Main TLB <- the Cortex-A9 splits these into two?

2012.11.10 inline assembly: what is the clobber list?

o <- an offsettable address. Why give such a constraint? Isn't everything addressable anyway?

asm volatile("@ atomic_add\n"
"1: ldrex %0, [%3]\n"
" add %0, %0, %4\n"
" strex %1, %0, [%3]\n"
" teq %1, #0\n"
" bne 1b"
: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
: "r" (&v->counter), "Ir" (i)
: "cc");

ldrex result, &v->counter <- result
add result, result, i
strex tmp, result, &v->counter <- tmp
teq tmp, #0
bne 1b
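The same load / modify / conditional-store / retry shape can be written portably with C11 atomics. This is only an analogy under assumptions: a compare-exchange loop stands in for the ldrex/strex pair, and atomic_add_return_sketch is a hypothetical name, not the kernel's function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Add `i` to `*v` atomically and return the new value. */
int atomic_add_return_sketch(atomic_int *v, int i)
{
    int old, result;

    assert(v != NULL);
    old = atomic_load_explicit(v, memory_order_relaxed);   /* like ldrex */
    do {
        result = old + i;                                  /* add */
        /*
         * Like strex followed by "bne 1b": if *v changed since it was
         * read, the exchange fails, `old` is refreshed, and we retry.
         */
    } while (!atomic_compare_exchange_weak_explicit(
                 v, &old, result,
                 memory_order_relaxed, memory_order_relaxed));
    return result;
}
```

As in the assembly, the update is never applied on top of a stale value: a failed store simply sends the code back around the loop.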

cgroups: a grouping of tasks (by category). Hands a set of tasks to the subsystems together with a set of parameters.

subsystem: a module that does something specific with groups of tasks. Usually a resource controller, for example one that applies per-cgroup limits.

http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt http://studyfoss.egloos.com/5505982 http://studyfoss.egloos.com/5506102

cgroup

/* Per-subsystem/per-cgroup state maintained by the system. */
struct cgroup_subsys_state
struct cgroup *cgroup; /* the cgroup this subsystem is attached to (what does that mean? does a subsystem attach to a cgroup?) */

init_task <- EXPORT_SYMBOL

What is the relationship between css_set and cgroups? Why is it managed through init_css_set_link?

2012.11.10

Looked at the init task

__read_mostly #define __read_mostly __attribute__((__section__(".data..read_mostly")))

2012.11.17

cgroup diagram (Chinese) http://linux.chinaunix.net/techdoc/net/2008/12/23/1054425.shtml

Namhyung Kim's blog http://studyfoss.egloos.com/5505982 http://studyfoss.egloos.com/5506102

cgroup : a class that bundles processes into a group. What does a cgroup hierarchy mean?

init_css_set_link.cgrp_link_list is added to the css_sets list of the cgroup (inside rootnode). Why? When is it used? init_css_set_link.cg_link_list is added to the cg_links list of init_css_set. Why? When is it used? cgroups referenced from a css_set

css_set : a collection of the subsystem settings (cgroup_subsys_state) configured on the cgroups a process belongs to

cg_cgroup_link - why does it exist? It links a css_set to its cgroup(s).

cgroup_subsys

cgroup_subsys_state

cgroupfs_root (&rootnode) contains a cgroup, top_cgroup.

What are cftsets?

/* init base cftset */
cgroup_init_cftsets(ss);
if (ss->base_cftypes) {
ss->base_cftset.cfts = ss->base_cftypes;
list_add_tail(&ss->base_cftset.node, &ss->cftsets);
}

cgroup_init_cftsets(&cgroup_subsys)

base_cftset (struct cftype_set) is embedded in subsys itself
ss->base_cftset.cfts = ss->base_cftypes
/*
  struct cgroup_subsys cpuset_subsys = {
    .base_cftypes = files,    /* struct cftype files. */
  };
*/

Each cgroup_subsys-> cftsets /* list of cftype_sets (struct list_head) */
struct cftype *base_cftypes;
struct cftype_set base_cftset;

struct cftype_set
struct list_head node /* chained at subsys->cftsets */
struct cftype *cfts

cpu_subsys ss->base_cftypes ; pointer to the cftype in cpuset.c

cgroup_init_early

cgroup_init

rootnode is the hierarchy for cgroups that are not used by any subsystem #define dummytop (&rootnode.top_cgroup)

This needs to be examined around a few data structures.

  1. static struct css_set init_css_set; It is the first thing initialized in cgroup_init_early(). According to the comment, it is used by init and its children and has to be set up before any 'hierarchies' are mounted. It contains pointers to the root state for each subsystem. It is also used as the anchor of the css_sets list.

  2. static struct cgroupfs_root rootnode; According to the comment, rootnode is the "dummy hierarchy". It is reserved for unattached subsystems. It contains only a single cgroup, and all tasks belong to that cgroup. -> What does unattached mean?

    struct cgroupfs_root is

struct str {}; // declares an empty struct
struct str x; // declaring a variable of it is allowed

2012.11.24

cgrp->subsys[ss->subsys_id]

Seems to register fn on the work queue: INIT_WORK(&css->dput_work, css_dput_fn);

process point of view - cgroup - css_set - subsys

What is the relationship between a task and a css_set? When is a new css_set created?

cache line bouncing http://barriosstory.blogspot.kr/2008/03/cache.html

tick

__CHECKER__ <- sparse, a static analysis tool

arch_spin_lock ldrex / strex (LL/SC instructions)

local_irq_save(flags) <- saves the local CPU's irq state and disables interrupts
local_irq_restore(flags)

arch_local_irq_save() cpsid

2012.12.01

#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)

wfe() <- waits forever?

/*
 * Prevent the compiler from merging or refetching accesses. The compiler
 * is also forbidden from reordering successive instances of ACCESS_ONCE(),
 * but only when the compiler is aware of some particular ordering. One way
 * to make the compiler aware of ordering is to put the two invocations of
 * ACCESS_ONCE() in different C statements.
 *
 * This macro does absolutely -nothing- to prevent the CPU from reordering,
 * merging, or refetching absolutely anything at any time. Its main intended
 * use is to mediate communication between process-level code and irq/NMI
 * handlers, all running on the same CPU.
 */
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

lockval.tickets.next != lockval.tickets.owner
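The next/owner comparison above is the core of a ticket lock. Below is a minimal portable sketch using C11 atomics, under assumptions: the struct and function names are hypothetical, and a plain busy loop stands in where the ARM code waits with wfe() and wakes waiters with sev.

```c
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock_sketch {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

/* The lock is held whenever some ticket has been taken but not served. */
int ticket_lock_is_locked(struct ticket_lock_sketch *l)
{
    return atomic_load(&l->next) != atomic_load(&l->owner);
}

void ticket_lock_acquire(struct ticket_lock_sketch *l)
{
    /* Take a ticket, then wait until it is our turn. */
    unsigned int my = atomic_fetch_add_explicit(&l->next, 1u,
                                                memory_order_relaxed);
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != my)
        ;   /* the ARM implementation sleeps in wfe() here */
}

void ticket_lock_release(struct ticket_lock_sketch *l)
{
    assert(ticket_lock_is_locked(l));
    /* Serve the next ticket; the ARM implementation then issues sev. */
    atomic_fetch_add_explicit(&l->owner, 1u, memory_order_release);
}
```

Because tickets are handed out in order, waiters get the lock in the order they arrived, which is the fairness property a plain test-and-set spinlock lacks.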

__ARMEB__ <- when big endian

wfe / sev <- an instruction pair used in spinlocks. wait for event / send event
cortex-a9 mpcore TRM 2.5 Event communication with an external agent using WFE/SEV
A peripheral connected on the coherency port or any other external agent can participate in the WFE/SEV event communication of the Cortex-A9 MPCore processor by using the EVENTI pin.

When this pin is asserted, it sends an event message to all the Cortex-A9 processors in the cluster. This is similar to executing a SEV instruction on one processor of the cluster. This enables the external agent to signal to the processors that it has released a semaphore and that the processors can leave the power saving mode. The EVENTI input pin must remain high at least one CPUCLK clock cycle to be visible by the processors.

The external agent can see that at least one of the Cortex-A9 processors in the cluster has executed an SEV instruction by checking the EVENTO pin. This pin is set high for one CLK clock cycle when any of the Cortex-A9 processor in the cluster executes an SEV instruction.

Notifier Chain : moved to a separate section.

__builtin_return_address(0)

How do I jump to the opening #ifdef or #ifndef that encloses the current line?

2012.12.08

clockevent_chain is static, so it has file scope. tick_notifier is registered on it. raw_notifier_chain_register, the function that actually does the registration, is called from several places.

Things reached from raw_smp_processor_id... static inline struct thread_info *current_thread_info(void) __attribute_const__;

__attribute__((const))

register unsigned long sp asm ("sp"); http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Global-Reg-Vars.html

struct task_struct init_task = INIT_TASK(init_task); <- we have been skipping over this one without paying attention?
union thread_union init_thread_union __init_task_data =
{ INIT_THREAD_INFO(init_task) };

#define __init_task_data __attribute__((__section__(".data..init_task"))) <- what is this section name?

#define INIT_TASK(tsk)
. . . .stack = &init_thread_info, <-- #define init_thread_info (init_thread_union.thread_info) // init_thread_union is declared as a global in init/init_task.c . . .

current_thread_info() return (struct thread_info *)(sp & ~(THREAD_SIZE - 1)); <- this is the thread_info of init_thread_info.
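The masking in current_thread_info() works because the stack and thread_info share one THREAD_SIZE-aligned block, so clearing the low bits of any stack pointer inside the block gives its base. A sketch under assumptions: the 8KiB THREAD_SIZE is an assumed value, and thread_info_base is a hypothetical helper working on plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed: 8KiB kernel stacks */

static_assert((THREAD_SIZE & (THREAD_SIZE - 1u)) == 0,
              "THREAD_SIZE must be a power of two for the mask to work");

/* Round a stack pointer down to the base of its THREAD_SIZE block. */
uintptr_t thread_info_base(uintptr_t sp)
{
    return sp & ~((uintptr_t)THREAD_SIZE - 1u);
}
```

For example, a stack pointer of 0x80401f3c maps to the block base 0x80400000, and so does every other address in that 8KiB block.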

INIT_TASK's .stack and INIT_THREAD_INFO's .task tie the two together.

This rounds the bit count up to a whole number of longs. ((nr) + (8 * 4) - 1) / (8 * 4)

#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d)) <- wouldn't bit operations be better? How?
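On the "wouldn't bit operations be better" question: a shift-based version is easy to write when the divisor is a known power of two, and compilers generally turn division by such a constant into a shift anyway. A small sketch follows; bits_to_words32 is a hypothetical helper that assumes 32-bit words.

```c
#include <assert.h>

#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))

static_assert(BITS_TO_LONGS(1) == 1, "one bit still needs one long");

/* Shift-based equivalent for 32-bit words: (nr + 31) / 32. */
unsigned long bits_to_words32(unsigned long nr)
{
    return (nr + 31ul) >> 5;
}
```

The macro form has the advantage that it keeps working when sizeof(long) is 8, while the shift form hard-codes the word size.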

#define set_bit(nr,p) ATOMIC_BITOP(set_bit,nr,p)
#define ATOMIC_BITOP(name,nr,p) _##name(nr,p)

_set_bit(nr, p) <- nr
bitop _set_bit, orr <- setbit.S macro
.macro bitop, name, instr <- analysing the macro: it finds the word index within a run of consecutive words and sets the corresponding bit at that position with orr. The reason bitop is written directly in assembly like this seems to be ldrex, strex.

UNWIND(.fnstart) <- used to record in the unwind table where the call came from, for cases like divide by 0 … UNWIND(.fnend) http://sourceware.org/binutils/docs/as/ARM-Unwinding-Tutorial.html http://lkml.indiana.edu/hypermail/linux/kernel/1105.1/02577.html http://sourceware.org/binutils/docs/as/ARM-Directives.html#arm%5ffnstart

#define set_bit(nr,p) ATOMIC_BITOP(set_bit,nr,p)
#define clear_bit(nr,p) ATOMIC_BITOP(clear_bit,nr,p)
#define change_bit(nr,p) ATOMIC_BITOP(change_bit,nr,p)
#define test_and_set_bit(nr,p) ATOMIC_BITOP(test_and_set_bit,nr,p)
#define test_and_clear_bit(nr,p) ATOMIC_BITOP(test_and_clear_bit,nr,p)
#define test_and_change_bit(nr,p) ATOMIC_BITOP(test_and_change_bit,nr,p)

bitop _set_bit, orr
bitop _clear_bit, bic
bitop _change_bit, eor

page_address_pool page_address_map[LAST_PKMAP] <- what is this used for? As a cache? #define LAST_PKMAP PTRS_PER_PTE

How many levels of paging does Linux currently use?

INIT_LIST_HEAD(head_name)
list_add(<item to attach>.list, head_name) <- always inserts right after head.
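Why list_add always lands right after head is easy to see from a stripped-down doubly linked list. This is a sketch modelled on the kernel's list idea, with hypothetical function names (init_list_head, list_add_sketch) rather than the kernel's own macros.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert `item` between `head` and whatever currently follows it. */
void list_add_sketch(struct list_head *item, struct list_head *head)
{
    assert(item != NULL && head != NULL);
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}
```

Adding a and then b to an empty list gives head -> b -> a -> head, so the most recently added entry is the first one reached from the head, which is stack-like ordering.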

page_address_htable … how does the hash table work?

http://gcc.gnu.org/onlinedocs/cpp/Standard-Predefined-Macros.html __STDC__ <- it is built into the compiler, but how can things like this be checked?

__glue(name,fn)
#ifdef __STDC__
#define ____glue(name,fn) name##fn <- this is the one we get...
#else
#define ____glue(name,fn) name/**/fn <- so that it is not recognised as the single symbol namefn
#endif

static union { char c[4]; unsigned long l; } endian_test __initdata = { { 'l', '?', '?', 'b' } };
#define ENDIANNESS ((char)endian_test.l)

2012.12.15

armv7 VIPT nonaliasing: uses a few more tag bits …

r8(PA) <= r8(VA) + r4(PA-VA) --- head.S
The result of converting PAGE_OFFSET (VA) to PA is PHYS_OFFSET. __pv_phys_offset (vexpress uses a 2G/2G split, so CONFIG_PAGE_OFFSET is 0x80000000)
PHYS_OFFSET is stored in __pv_phys_offset

arch/arm/mach-vexpress/v2m.c has several MACHINE_START entries

What is stored between __tagtable_begin and __tagtable_end is the tagtable. { tag; parse; }

Takes the __atags_pointer handed over from head-common.S, converts it from PA to VA (phys_to_virt) and stores it in tags

ATAGS : ATAG_MEM is mandatory. It gives size and address. http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html

ATAG_MEM is handled in arch/arm/kernel/setup.c.

u-boot passes parameters to the kernel through ATAGs

R0 => 0
R1 => machine type
R2 => ATAG์˜ base address
  		Physical memory address (mandatory)

2012.12.22

__attribute__ ((used)) : it will be used from inline assembly, so keep it through optimisation even if nothing else calls it explicitly
__attribute__ ((unused)) : do not print a warning at build time even if it is not used

size -= start & ~PAGE_MASK; // if start sticks out 3K past a page boundary, take 3K off size.
bank->start = PAGE_ALIGN(start); // the part that aligns start

bank->size = size & ~(phys_addr_t)(PAGE_SIZE - 1); // the part that actually rounds down
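The three lines above can be tried out on concrete numbers. A sketch under assumptions: 4KiB pages, 32-bit addresses, and hypothetical names (struct membank_sketch, align_bank); the arithmetic itself follows the lines quoted in the note.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u                        /* assumed 4KiB pages */
#define PAGE_MASK (~(PAGE_SIZE - 1u))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1u) & PAGE_MASK)

static_assert((PAGE_SIZE & (PAGE_SIZE - 1u)) == 0, "power of two");

struct membank_sketch {
    uint32_t start;
    uint32_t size;
};

/* Page-align a bank the way the quoted lines do. */
struct membank_sketch align_bank(uint32_t start, uint32_t size)
{
    struct membank_sketch bank;

    size -= start & ~PAGE_MASK;              /* drop the in-page offset */
    bank.start = PAGE_ALIGN(start);          /* round start up to a page */
    bank.size = size & ~(PAGE_SIZE - 1u);    /* round size down to pages */
    return bank;
}
```

With start = 0x60000c00 (3K into a page) and size = 0x10000, the bank becomes start = 0x60001000 and size = 0xf000, so it stays inside the originally reported range.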

#define tag_next(t) ((struct tag *)((__u32 *)(t) + (t)->hdr.size)) Why is it done this way, rather than writing it out explicitly as t + (4 * size)?
#define tag_size(type) ((sizeof(struct tag_header) + sizeof(struct type)) >> 2)

/* Untouched command line saved by arch-specific code. */
boot_command_line
/* Untouched saved command line (eg. for /proc) */
char *saved_command_line;
/* Command line for parameter parsing */
static char *static_command_line;

cmd_line <- copy of boot_command_line
*cmdline_p = cmd_line is assigned

Order?
setup_arch(&command_line) calls setup_machine_tags()
In setup_machine_tags(): from = default_command_line (=CONFIG_CMDLINE), strlcpy(boot_command_line, from, …)
strlcpy(cmd_line, boot_command_line, …)
*cmdline_p = cmd_line; (cmdline_p is command_line)
In setup_command_line(command_line): strcpy (saved_command_line, boot_command_line); strcpy (static_command_line, command_line);

  • If the ATAGS include ATAG_CMDLINE, parse_tag_cmdline() copies the cmdline passed in the ATAG into default_command_line

2013.01.05

EXPORT_SYMBOL /proc/ksyms: a symbol table is built so that modules can use functions inside the kernel.

There is one __ksymtab_strings, and a __ksymtab+SYM for each symbol (there is a _gpl variant too) ; to look something up in __ksymtab, the position is found in __*_strings, the section address is located, and the symbol is accessed.

typeof <- used on a function too? http://gcc.gnu.org/onlinedocs/gcc/Typeof.html

The __setup_param macro places the entry in .init.setup, and do_early_param goes through the entries one by one and runs them. #define early_param() __setup_param <- a pair of parameter and function to run

do_early_param loops between __setup_start and __setup_end.
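The idea of walking a table of (parameter, function) pairs can be shown without linker sections. A sketch under assumptions: an ordinary array stands in for the entries the linker gathers between __setup_start and __setup_end, and every name here (struct setup_entry, setup_table, do_early_param_sketch, the two handlers) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct setup_entry {
    const char *str;                 /* parameter name */
    int (*setup_func)(char *val);    /* handler to run for it */
};

static int setup_mem(char *val)   { (void)val; return 0; }
static int setup_quiet(char *val) { (void)val; return 0; }

/* Stand-in for the entries collected in the .init.setup section. */
static const struct setup_entry setup_table[] = {
    { "mem",   setup_mem },
    { "quiet", setup_quiet },
};

/* Run every handler whose name matches; return how many ran. */
int do_early_param_sketch(const char *param, char *val)
{
    const struct setup_entry *p;
    const struct setup_entry *end =
        setup_table + sizeof(setup_table) / sizeof(setup_table[0]);
    int handled = 0;

    assert(param != NULL);
    for (p = setup_table; p < end; p++) {
        if (strcmp(param, p->str) == 0) {
            p->setup_func(val);
            handled++;
        }
    }
    return handled;
}
```

In the kernel the table is not written out in one place: each macro use drops one entry into the section, and the start/end symbols from the linker script play the role of setup_table and end here.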

  • Before the console is brought up, output is only accumulated in the buffer. (from group B, who analysed printk)

The earlyprintk parameter only goes in on x86 and the like.

2013.01.12

In the Makefile: include/generated/utsrelease.h: include/config/kernel.release FORCE $(call filechk,utsrelease.h)

include/config/kernel.release contains 3.6.0-rc1+

The generated include/generated/utsrelease.h contains #define UTS_RELEASE "3.6.0-rc1+"

sort.c function (1. heapify 2. sort: take the largest value, put it at the very end of the array, rebuild the heap, and repeat). It takes a compare function and a swap function as arguments.

Here the pfn index is derived from the physical address and compared (__phys_to_pfn)

What is stored in meminfo? Why is meminfo sorted? mem=SIZE@ADDR

What is a bank for? It is not used as a purely physical concept: with highmem, if a bank overlaps vmalloc_min, one more bank is added.

If sanity_check_meminfo does a continue, doesn't the bank stay as it was?
struct membank *bank = &meminfo.bank[j]; // bank points at the j-th bank (j is not incremented when we arrive via continue)
*bank = meminfo.bank[i]; // the i-th bank's value is copied into *bank. (i is incremented even on continue, so a bank that hits continue is dropped.)
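The i/j pattern described above is an in-place compaction: i reads every bank, j only advances for banks that are kept. A sketch under assumptions: the skip flag is a hypothetical stand-in for whatever condition triggers the continue, and the struct and function names are made up for illustration.

```c
#include <assert.h>

struct bank_sketch {
    unsigned long start;
    unsigned long size;
    int skip;           /* hypothetical: nonzero means "hit continue" */
};

/* Compact `banks` in place; return the number of banks kept. */
int compact_banks(struct bank_sketch *banks, int nr)
{
    int i, j;

    assert(nr >= 0);
    for (i = 0, j = 0; i < nr; i++) {
        struct bank_sketch *bank = &banks[j];

        *bank = banks[i];       /* copy candidate i into slot j */
        if (bank->skip)
            continue;           /* j stays put: slot j gets overwritten */
        j++;                    /* keep it: next candidate goes to j+1 */
    }
    return j;
}
```

So a skipped bank does sit in slot j for a moment, but the next iteration overwrites it, and the returned count (the new nr_banks) never includes it.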

Does arm not use highmem? (depends on MMU)

  • It does. First, high memory means physical memory that is not permanently mapped (a mapping being the link between the address space and the actual physical memory).

  • arm uses a 4GB address space. Within it, the user memory space and the kernel space (including memory mapped IO) are used.

  • So if a large amount of physical memory is installed, not all of it can be permanently mapped.

  • In the code, if bank->start is 4GB or above it is highmem. (whether CONFIG_HIGHMEM is on or not)

  • Then, in the code that runs when CONFIG_HIGHMEM is on:
    if (__va(bank->start) >= vmalloc_min || // the bank start address is at or above vmalloc_min, or
    __va(bank->start) < (void *)PAGE_OFFSET) // below PAGE_OFFSET (start address of RAM: VA), then it is highmem.
    highmem = 1;

In the end, the virtual address for arm_lowmem_limit (start + size - 1), plus 1, is stored in high_memory, and memblock_set_current_limit() is called to store the limit in memblock.current_limit.

  • And what is memblock? It holds information about the memory regions and the reserved regions. The memory regions are built from meminfo; overlapping ones are merged and represented as a single region. The reserved regions represent reserved space such as the memory the kernel is loaded into and the memory initrd is loaded into.

vmalloc : virtually contiguous (if vmalloc=size is not given, early_vmalloc is not called and the default is 240.) kmalloc : physically contiguous

What is vmalloc_min for? (Why is the part that overlaps vmalloc stored in a new bank?)

  ff00 0000 (VMALLOC_END) <- why VMALLOC_END is this (16MB), I don't know
-  f00 0000 (240 << 20)
-   80 0000 (VMALLOC_OFFSET)
= ef80 0000 (vmalloc_min)
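The subtraction above is easy to check in C. A small sketch using the constants from the note; vmalloc_min_for is a hypothetical helper, and the size is taken in MB as with the vmalloc= default of 240.

```c
#include <assert.h>
#include <stdint.h>

#define VMALLOC_END    0xff000000u
#define VMALLOC_OFFSET (8u << 20)      /* 0x00800000 */

/* vmalloc_min = VMALLOC_END - (size in MB << 20) - VMALLOC_OFFSET */
uint32_t vmalloc_min_for(uint32_t vmalloc_mb)
{
    /* Guard against wrapping below address 0. */
    assert(((uint64_t)vmalloc_mb << 20) + VMALLOC_OFFSET <= VMALLOC_END);
    return VMALLOC_END - (vmalloc_mb << 20) - VMALLOC_OFFSET;
}
```

For the default of 240MB this gives 0xef800000, matching the hand calculation.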

  • For printk (KERN_CRIT, ...), it spins in a while loop (according to group B)

  • (How did the kernel set up the page table mappings while coming up? Did it map only the kernel region?)

2013.01.19

Things not to confuse:

  1. kernel memory space: of vexpress 1/3, 2/2, 3/1, it is 2/2

  2. the memory range the kernel can actually map

sanity_check_meminfo() converts addresses such as bank->start to VA before deciding

How much physical memory can the arm kernel cover? vexpress 2/2 split

Starting virtual address of an application in ld ???

Something that lets close to 2GB be mapped directly ...

ZONE_NORMAL: physical address to virtual address 1:1 mapping
ZONE_HIGH: mapping table

Ways of making physical memory known - 1:1 mapping through the page table - lpae (the way for 4GB and above)

highmem on arm is, in our case, up to 2GB??? In any case, the part of physical memory that cannot be 1:1 mapped into the kernel

http://lwn.net/Articles/440221/

arm_lowmem_limit ; what is this variable for? (a physical address) It is also used when setting current_limit in memblock.

See the config HIGHMEM help text in arch/arm/Kconfig

If CONFIG_HIGHMEM is not set but highmem is flagged (above ULONG_MAX), that bank is dropped and processing moves on to the next bank.

What happens if the banks' memory ranges overlap?

  • It would be nice to be shown which macro the current line sits inside.

cacheid_init in setup.c set cacheid to CACHEID_VIPT_NONALIASING
CACHEID_VIPT_ALIASING

Let's look into the VIVT aliasing problem. ; several virtual addresses are tied to one physical address, which causes a coherency problem.

With VIPT the tag is a physical address, so aliasing can be detected? vipt-nonaliasing vipt-aliasing

MMU corelink400

sanity_check_meminfo(); <- in the end it sets high_memory, and arm_lowmem_limit is set as memblock's current_limit. How is memblock used?

arm_memblock_init(&meminfo, mdesc): meminfo came up in arm_add_memory where bank is declared. mdesc is a local variable in setup_arch,

Walks the bank information in meminfo (nr_banks, bank - start/size/highmem) and adds each bank to memblock with memblock_add. (memblock ; logical memory block. A logical space called a region) Regions are added to memblock.memory.

base, end <- values taken out of meminfo

In memblock_add_region, the first pass counts the number of regions needed to accommodate the new data, and the second pass does the actual insertion (the first pass only counts the regions that must be added)

2013.02.16

Cortex a series PG MMU

level-2 PGD, PTE
level-3 PGD, PMD, PTE (LPAE)

arch/arm/include/asm/pgtable-2level.h
hw : PGD (4096), PTE (256)
linux : three level page table. It also fits a two-level page table structure. Linux nominally has three levels but only two (PGD, PTE) are used.

In linux: 2048 entries at the first level, each 8 bytes (two pointers to the second level); 512 entries at the second level

Each entry of the real page directory corresponds to 1MB, but the ARM Linux kernel manages entries in units of 2MB.

What is protection_map[] for? mem_types[MT_LOW_VECTORS]

prot_l1: where the pte value for the hardware is stored

build_mem_type_table <- what is this?

What is mem_types for?

It builds the information that goes into mem_types: kern_pgprot = user_pgprot = cp->pte // set to something like the cache policy
kern_pgprot |= L_PTE_SHARED;
and assigns it to the attributes of each mem_types member. (like mem_types[MT_LOW_VECTORS])

static struct mem_type mem_types[] = {
    [MT_DEVICE] = {       /* Strongly ordered / ARMv6 shared device */
        .prot_pte   = PROT_PTE_DEVICE | L_PTE_MT_DEV_SHARED |
                  L_PTE_SHARED,
        .prot_l1    = PMD_TYPE_TABLE,
        .prot_sect  = PROT_SECT_DEVICE | PMD_SECT_S,
        .domain     = DOMAIN_IO,
    },

pgprot_user, pgprot_kernel?

prepare_page_table
#define KERNEL_RAM_VADDR (PAGE_OFFSET + TEXT_OFFSET) ; 0x80000000 + 0x8000
.equ swapper_pg_dir, KERNEL_RAM_VADDR - PG_DIR_SIZE ; 0x80008000 - 0x4000

#define pgd_index(addr) ((addr) >> PGDIR_SHIFT) ; PGDIR_SHIFT is 21. An index in 2MB units
#define pgd_offset(mm, addr) ((mm)->pgd + pgd_index(addr)) ; adds pgd_index to pgd to get the location (address) of the entry within the pgd.
#define pgd_offset_k(addr) pgd_offset(&init_mm, addr)

With 4 levels (LPAE?): PGD PUD PMD PTE

With the current 2-level paging, PUD and PMD are handled transparently. 2MB

clean_pmd_entry tlb_op (TLB_DCLEAN, โ€ฆ)

B4.1.46 DCCMVAC, Data Cache Clean by MVA to PoC, VMSA: from the MVA up to the PoC. Modified Virtual Address (MVA), point of coherency (PoC)

MVA The term Modified Virtual Address (MVA) relates to the Fast Context Switch Extension (FCSE) mechanism, described in Appendix J Fast Context Switch Extension (FCSE). When the FCSE is absent or disabled, the MVA and VA have the same value.

PoC Point of Coherency

(wait) what was the difference again? cache flush / clean invalidate

prepare_page_table: 0 ~ MODULES_VADDR ~ PAGE_OFFSET (continues if not XIP_KERNEL…), then the end address of memblock.memory.regions[0] ~ VMALLOC_START (VMALLOC_START is the next 8MB-aligned address after high_memory). The pmds covering these ranges are cleared (the pmd pointers are set to 0, then cleaned) (wait, was the kernel loaded in the first memblock region?)

map_lowmem map_desc : page table mapping constructs...

2013.02.23

memblock.region mergeโ€ฆ

arch/arm/mm/proc-v7.S __v7_proc __v7_ca9mp_setup

arch/arm/mm/tlb-v7.S define_tlb_functions v7wbi, v7wbi_tlb_flags_up, flags_smp=v7wbi_tlb_flags_smp

#define v7wbi_tlb_flags_smp (TLB_WB | TLB_DCLEAN | TLB_BARRIER |
TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | TLB_V7_UIS_ASID) D0380000

Rn AND Operand2: updates the CPSR flags. ne, Not equal. Z==0

PTE_HWTABLE_OFF + PTE_HWTABLE_SIZE #define PTRS_PER_PTE 512 #define PTRS_PER_PMD 1 #define PTRS_PER_PGD 2048

#define PTE_HWTABLE_PTRS (PTRS_PER_PTE) #define PTE_HWTABLE_OFF (PTE_HWTABLE_PTRS * sizeof(pte_t)) #define PTE_HWTABLE_SIZE (PTRS_PER_PTE * sizeof(u32))

memblock_alloc_base 4096, 4096, 0 memblock_alloc_base_nid(size, align, max_addr, MAX_NUMNODES = 1); memblock_find_in_range_node(0, max_addr, size, align, nid) 0 4096 4096 1

phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start, phys_addr_t end, phys_addr_t size, phys_addr_t align, int nid) 0, 0=>memblock.current_limit, 4096, 4096, 1

                                 the addresses of the local variables this_start, this_end are passed in

for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL)

                       ?, 1,    0,      memblock.current_limit, NULL

__next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid)

             i = (u64)ULLONG_MAX
                       &i, 1, &this_start, &this_end, NULL

void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start, phys_addr_t *out_end, int *out_nid)

create_mapping, called from paging_init

(6 attendees)

2013.03.02

  • Try running with memblock dump enabled.

paging_init call chain (each function calls the next):
	map_lowmem
	create_mapping		<- loops in 2MB units
	alloc_init_pud		<- pud is the same as pgd, so next equals the end that was passed in
	alloc_init_section	<- if end is not aligned either, ptes are created; in that case ptes are created for the whole 2MB
	alloc_init_pte
	early_alloc
	early_alloc_aligned	<- returns __va(pa)
	memblock_alloc		<- memblock_alloc adds the range to reserved
	memblock_alloc_base
	__memblock_alloc_base
	memblock_alloc_base_nid
	memblock_find_in_range_node
	for_each_free_mem_range_reverse
	__next_free_mem_range_rev

mem->cnt : 1 rsv->cnt : 3

mi : memory.regions cnt-1 ri : reserved.regions cnt <- ri starts at cnt because it refers to the part that will be newly added to reserved.

The kernel executable code region, initrd, swapper_pg_dir, device tree (NULL), chip-specific regions, etc. are added to memblock.reserved -> adding a range to reserved does not remove it from memory.

for_each_free_mem_range_reverse initial condition : i = ULLONG_MAX, __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid); termination condition : when i != ULLONG_MAX becomes false…

if (r_end <= m_start) ; m_start break; if (m_end > r_start) ; r_start is the end address of the previous reserved region and also the start of the area that can be reserved next …

kernel.h has #define clamp(val, min, max) … which limits val to at least min and at most max.

#define __round_mask(x, y) ((typeof(x))((y)-1))
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
#define round_down(x, y) ((x) & ~__round_mask(x, y))

round_down(x, y) <- x is the address to align, y is the alignment unit. Why round_down? Because the search runs from the top of the free range downwards, rounding (end - size) down gives an aligned start address that still satisfies the minimum requested size.
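
The rounding macros and clamp above can be tried in userspace. This is a minimal sketch, not the kernel code: clamp is simplified (no type checking), and top_down_candidate is a hypothetical helper that mimics how a top-down search might pick a start address, assuming y/align is a power of two.

```c
#include <stdint.h>

/* Userspace copies of the kernel macros quoted above (y must be a power of two). */
#define __round_mask(x, y) ((__typeof__(x))((y) - 1))
#define round_up(x, y)     ((((x) - 1) | __round_mask(x, y)) + 1)
#define round_down(x, y)   ((x) & ~__round_mask(x, y))

/* Simplified clamp without the kernel's type checking. */
#define clamp(val, lo, hi) ((val) < (lo) ? (lo) : ((val) > (hi) ? (hi) : (val)))

/*
 * Hypothetical helper: highest aligned start address inside
 * [range_start, range_end) that still leaves room for `size` bytes.
 * Returns 0 when the block does not fit.
 */
static uint32_t top_down_candidate(uint32_t range_start, uint32_t range_end,
                                   uint32_t size, uint32_t align)
{
    uint32_t cand;

    if (range_end < size)
        return 0;
    cand = round_down(range_end - size, align);
    return (cand >= range_start) ? cand : 0;
}
```

Rounding down rather than up matters here: rounding (end - size) up could push the block past the end of the free range.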

this_start, this_end ; return the usable range

Documentation/printk-formats.txt %pF <- things like __builtin_return_address…

arch/arm/include/asm/pgtable-2level.h What I still do not understand: why are the hwtable and the Linux pt kept separate? Why does the size have to be 512?

Cortex-A series PG p.139 The build_mem_type_table function sets t->prot_l1 |= PMD_DOMAIN(t->domain);

2013.03.09

tst <- bitwise AND. Pass the bits you want to test. XXXne <- then check with this

When MOVS is used with Rd = R15, the value of SPSR is copied into CPSR, so it is also used as an exception-return instruction. (This will make sense after reading the exception part.)

2013.03.16

teq <- xor

<- Did not fully understand the code we read today. ldrex, strex ; STREX{cond} Rd, Rt, [Rn {, #offset}] Rd is 0 if the store completed exclusively, otherwise 1. http://en.wikipedia.org/wiki/Load-link/store-conditional http://www.iamroot.org/xe/66152

cmpxchg http://en.wikipedia.org/wiki/Compare-and-swap
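
The ldrex/strex retry pattern can be approximated in portable C. The sketch below uses C11 atomics instead of ARM assembly, so it is an analogy rather than the kernel's implementation; sketch_test_and_set_bit is a made-up name. A weak compare-exchange may fail spuriously, much like STREX returning 1, which is why the update sits in a loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Atomically set bit `nr` in *word and return its previous value.
 * atomic_compare_exchange_weak may fail spuriously, much like STREX
 * returning 1, so the update is retried in a loop.
 */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_uint *word)
{
    unsigned int mask = 1u << nr;
    unsigned int old = atomic_load(word);

    /* On failure `old` is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(word, &old, old | mask))
        ;
    return (old & mask) != 0;
}
```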

What is TLS? Thread Local Storage Documentation/arm/kernel_user_helpers.txt

.rept … .endr (.rept is definitely the repeat directive…)

arch/arm/mm/proc-v7.S .macro __v7_proc initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0

arch/arm/mm/cache-v7.S .macro define_cache_functions name:req .long \name()_coherent_kern_range

Everything is defined in the upper part of this file, and the macro is invoked at the very bottom as define_cache_functions v7.

sigreturn handler http://studyfoss.egloos.com/5182475

In include/asm/assembler.h, #define USER(x…): what does __ex_table store? 9001f is a label in the calling file, arch/arm/mm/cache-v7.S.

DOMAIN is reportedly going away.. not used from v7 on.

arm_lowmem_limit = bank->start + bank->size; high_memory = __va(arm_lowmem_limit - 1) + 1;

2013.03.23

devicemaps_init struct map_desc map: fills in the map structure and calls create_mapping ; create_mapping is the function that fills the page table

2013.03.30

ct_ca9x4_io_desc

vm_area_add_early

each one is added with vm_area_add_early

/* 2MB large area for motherboard's peripherals static mapping */ #define V2M_PERIPH 0xf8000000

/* Tile's peripherals static mappings should start here */ #define V2T_PERIPH 0xf8200000

fill_pmd_gaps

unified tlb instruction tlb, data tlb

Multiprocessor effects on TLB maintenance operations on page B3-1379.

arm_bootmem_init boot_pages - number of pages needed to represent the pfns as a bitmap

Meaning of contig_page_data (contiguous)

paging_init devicemaps_init, kmap_init done; reading bootmem_init

(Addendum) mm/bootmem.c struct pglist_data __refdata contig_page_data = { … } #define __refdata __section(.ref.data) <- places it in the .ref.data section. What is the ref section? # define __section(S) __attribute__ ((__section__(#S)))

init_bootmem_node(pgdat, __phys_to_pfn(bitmap), start_pfn, end_pfn) initializes pgdat (the bitmap location is also stored in node_bootmem_map)

2013.04.06

cpsid i <- the possible flags are a, i, f ARM.pdf 1147 A, bit[8] Asynchronous abort mask bit. I, bit[7] IRQ mask bit. F, bit[6] FIQ mask bit.

bx lr ; the instruction used to return from a subroutine.

test_and_set_bit bitops ; the bit operations are implemented directly in assembly.

gcc -dM -E - < /dev/null

2013.04.13

enum zone_type <- the zones are determined by the config.

(Wait, when memblock reserves a range, is it removed from the memory regions?) min : first pfn of meminfo, max_low : last pfn. zone_size (in pfns) : max_low - min zhole_size : pfns in holes. According to the English comment, holes = node_size - sum(bank_sizes)

pg_data_t *pgdat; pgdat = NODE_DATA(0); (bootmem only set up pgdat->bdata)

typedef struct pglist_data { … } pg_data_t; node_spanned_pages : total number of physical pages including holes (for NUMA, all added together.) node_present_pages : total number of physical pages excluding holes

http://studyfoss.egloos.com/5226996 http://www.mjmwired.net/kernel/Documentation/kbuild/modules.txt

#define __ref __section(.ref.text) noinline #define __init_refok __ref

Code in a ref section may reference .init and .exit code without triggering an error at the modpost stage.

static bool __section(.data.unlikely) __warned; these are gathered in the .data.unlikely section,

unlikely(x) __builtin_expect(!!(x), 0) __builtin_return_address (level) -> check later what code this actually turns into.

WARN_ON

preempt_disable increments preempt_count. 0 means preemptible <- is this what the scheduler looks at when it checks preemptibility?

#define preempt_count() (current_thread_info()->preempt_count)

http://studyfoss.egloos.com/5682616 https://lwn.net/Articles/508991/ http://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

unwind_backtrace () reading; finished

Module.symvers

2013.04.20

include/linux/mm_types.h struct page { โ€ฆ }

2013.04.27

arm_bootmem_free free_area_init_node alloc_node_mem_map done <- mem_map allocation; analyzing free_area_init_core

20130427: paging_init: ... analyzing free_area_init_core

2013.05.04

pageblock??? Why does it only contain PB_migrate? A pageblock needs 3 bits.

Should also look at page movable. Documentation/memory-hotplug.txt Documentation/vm/page_migration

set_pageblock_migratetype(page, MIGRATE_MOVABLE) is called only for pfns that fall on a pageblock_nr_pages boundary.

This calls set_pageblock_flags_group
  between PB_migrate ~ PB_migrate_end (0~2)
    the bit position is found as bitidx (the index for the page block nr) + start_bitidx
    bits that are set in flags are set, the rest are cleared. Here flags is MIGRATE_MOVABLE.

2013.05.11

__flush_dcache_page __cpuc_flush_dcache_area == v7_flush_kern_dcache_area

define_cache_functions v7

clean, flush [Source] http://www.iamroot.org/xe/?mid=Kernel_4_ARM11&page=3&listStyle=gallery&document_srl=5705

[Flush] : -------------------------------------------------------------

  • Clears some stored data to 0.
  • Clears the valid bit in the cache line to 0.
  • <Caution> When write-back is used, a Clean must be done before the Flush. (Why) Otherwise data is lost.

[Clean] :-------------------------------------------------------------- Forces the values of dirty cache lines to be written from the cache to main memory, then resets the dirty bit in the cache line to 0.

  • In other words, the operation of saving the cache's data to main memory

[Purpose of Flush, Clean] :------------------------------------------------- When the system's memory configuration is changed (the cases below), a Flush or Clean is performed.

  • Access permissions
  • Changing the cache or buffer policy
  • Remapping virtual addresses

[Additional notes] -----------------------------------------------------------

  • Valid bit : . indicates that the cache line is active , . it holds data first read from main memory . meaning the processor core can currently use it

  • Dirty bit : . tells whether the cache line holds data that differs from the value stored in main memory. . '1' means main memory and the cache contents differ

[Conclusion] ------------------------------------------------------------- The question from January 12, 2008 can be settled simply as follows.

  • FLUSH : just discard
  • CLEAN : write to main memory first, then discard

vivt vipt

cache color http://www.spinics.net/lists/arm-kernel/msg19798.html

pgprot_kernel = __pgprot(L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY | kern_pgprot);

XN

2013.05.18

paging_init __flush_dcache_page flush_pfn_alias set_top_pte

MCR{cond} coproc, #opc1, Rt, CRn, CRm{, #opc2} MCRR{cond} coproc, #opc1, Rt, Rt2, CRm

/** 20120818 save bootloader info (ID, ATAGS) **/ mov r7, r1 @ save architecture ID … mov r3, r7

decompress_kernel() <- arch_id machine_arch_type is the value passed in from decompress_kernel when no CONFIG_MACH_XXX is set

mdesc = setup_machine_tags(machine_arch_type); for_each_machine_desc(p) <- v2m.c

arch/arm/tools/mach-types

machine_is_xxx CONFIG_xxxx MACH_TYPE_xxx number

vexpress MACH_VEXPRESS VEXPRESS 2272

arch_write_lock

iotable_init initializes vmlist; entries are registered in vmlist with vm_area_add_early. These are the entries in ct_ca9x4_io_desc.

scu controller, gic_handle_irq registered as .handle_irq

qemu vexpress references https://developer.mozilla.org/en-US/docs/Developer_Guide/Virtual_ARM_Linux_environment https://wiki.linaro.org/PeterMaydell/QemuVersatileExpress https://wiki.linaro.org/Boards/Vexpress http://www.tiedyedfreaks.org/eric/src/linux/Documentation/devicetree/bindings/arm/vexpress.txt

(Addendum) iomem_resource: a resource structure. The root of the tree of I/O memory resources. http://blog.daum.net/english_100/80

request_standard_resources request_resource(&iomem_resource, res);

2013.06.01

clocks_calc_mult_shift

for (sft = 32; sft > 0; sft--) {
    tmp = (u64) to << sft;
    tmp += from / 2;
    do_div(tmp, from);
    if ((tmp >> sftacc) == 0)
        break;
}

from = 24MHz, to = 1000000000 (0x3b9aca00)

tmp = 1000000000 << 32 (high word 0x3b9aca00, low word 0x00000000), then += 12MHz (from / 2)

static DEFINE_TIMER(sched_clock_timer, sched_clock_poll, 0, 0);

machine_desc <- v2m.c

MACHINE_START(VEXPRESS, "ARM-Versatile Express") .nr = MACH_TYPE_VEXPRESS .name = "ARM-Versatile Express"

struct clock_data {

};

/* calculate the mult/shift to convert counter ticks to ns. */ clocks_calc_mult_shift(&cd.mult, &cd.shift, rate, NSEC_PER_SEC, 0);
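
To see what the loop quoted above produces for the 24MHz case, here is a userspace sketch. The sftacc preamble is reproduced from memory of the kernel function and should be treated as an assumption; do_div() is replaced by plain 64-bit division, and the sketch_ names are made up.

```c
#include <stdint.h>

/*
 * Userspace sketch of the mult/shift search quoted above.
 * `from` is the counter frequency, `to` the target frequency
 * (NSEC_PER_SEC for nanoseconds), `maxsec` the conversion range.
 */
static void sketch_calc_mult_shift(uint32_t *mult, uint32_t *shift,
                                   uint32_t from, uint32_t to, uint32_t maxsec)
{
    uint64_t tmp;
    uint32_t sft, sftacc = 32;

    /* Each significant bit of (maxsec * from) >> 32 costs one bit of accuracy. */
    tmp = ((uint64_t)maxsec * from) >> 32;
    while (tmp) {
        tmp >>= 1;
        sftacc--;
    }

    /* Largest shift whose rounded multiplier still fits in sftacc bits. */
    for (sft = 32; sft > 0; sft--) {
        tmp = (uint64_t)to << sft;
        tmp += from / 2;
        tmp /= from;            /* do_div() in the kernel */
        if ((tmp >> sftacc) == 0)
            break;
    }
    *mult = (uint32_t)tmp;
    *shift = sft;
}

/* ns = (cycles * mult) >> shift */
static uint64_t sketch_cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
    return (cyc * mult) >> shift;
}
```

With from = 24000000, to = 1000000000 and maxsec = 0, this search stops at shift 26, and one second's worth of ticks converts back to 1000000000 ns.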

How are epoch and jiffies related? jiffies : incremented by 1 every time the timer interrupt fires epoch : one generation? Related to time quantum consumption? Time since a new era…? Since 1970-01-01…?

HZ -> 100 times per second, i.e. a 10ms resolution timer is used. sched_clock_timer.data = msecs_to_jiffies (w - (w/10)); (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ) ____ (m + 10 - 1) / 10

m / (1000 / 100) = m / 1000 * 100 = seconds * (1/period), i.e. how many ticks...
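
The rounding formula above is easy to check in isolation. A minimal sketch, assuming HZ = 100 as in these notes (the SKETCH_ names are made up; the real msecs_to_jiffies has more cases for other HZ values):

```c
#define SKETCH_HZ            100u
#define SKETCH_MSEC_PER_SEC  1000u

/* Round a millisecond count up to whole ticks (HZ divides 1000 here). */
static unsigned long sketch_msecs_to_jiffies(unsigned int m)
{
    unsigned int ms_per_tick = SKETCH_MSEC_PER_SEC / SKETCH_HZ; /* 10 ms */

    return (m + ms_per_tick - 1) / ms_per_tick;
}
```

Adding (ms_per_tick - 1) before dividing makes the result round up, so a 1ms timeout still waits at least one full tick.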


What is timers/highres.txt?

Linux keeps a 64-bit value while the CPU counter is 32-bit, so it will overflow and wrap; it has to be refreshed again around that time, ...

2013.06.08

20130601: setup_arch: finished (needs a write-up)

kernel/cpu.c Where are these initialized? cpu_possible_mask cpu_online_mask cpu_present_mask

2013.06.15

pcpu_alloc_alloc_info - allocate percpu allocation info; the last function we read last week

struct pcpu_alloc_info Is this the one that holds the overall information? groups[] sits at the end as a pointer (flexible array member),

struct pcpu_group_info unsigned int *cpu_map; /* unit->cpu map, empty entries contain NR_CPUS */

pcpu_alloc_alloc_info

upa /* units_per_alloc */ the allocation unit.

nr_units is the sum of group_cnt over all groups.

cpu_map <- each group holds the start position that belongs to its own group.

Iterating over the groups: gi->base_offset = unit * ai->unit_size; // assign gi->base_offset (unit_size * number of units) unit += gi->nr_units; // accumulate the groups' nr_units for the next iteration

group_map[0,1,2,3] = 0; // map that tells which group each cpu belongs to

gi->cpu_map[gi->nr_units++] = cpu; // each group's cpu_map tells which cpus it holds

pcpu_embed_first_chunk pcpu_page_first_chunk

gi->nr_units /* aligned # of units */

min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE); alloc_size = roundup(min_unit_size, atom_size);

// how many units fit into alloc_size
upa = alloc_size / min_unit_size; ai->unit_size = alloc_size / upa;

  • areas holds what was allocated per group. Later it is used to point at positions per unit.
  • base = min(ptr, base); while looping, the smallest of the per-group values (the lowest address) ai->groups[group].base_offset = areas[group] - base

ai->unit_size is 32KB
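
The unit size arithmetic above can be reproduced with a small sketch. Assumptions: PCPU_MIN_UNIT_SIZE is taken as 32KB, the page size as 4KB, and the loop that shrinks upa is recalled from the kernel rather than quoted in these notes; all sketch_ names are made up.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE      4096u
#define SKETCH_MIN_UNIT_SIZE  (32u << 10)   /* assumed value of PCPU_MIN_UNIT_SIZE */

struct sketch_unit_layout {
    size_t alloc_size;   /* bytes obtained per allocation */
    size_t upa;          /* units per allocation */
    size_t unit_size;    /* bytes per unit */
};

static size_t sketch_roundup(size_t x, size_t y)
{
    return ((x + y - 1) / y) * y;
}

static struct sketch_unit_layout sketch_unit_size(size_t size_sum, size_t atom_size)
{
    struct sketch_unit_layout l;
    size_t min_unit = size_sum > SKETCH_MIN_UNIT_SIZE ? size_sum : SKETCH_MIN_UNIT_SIZE;

    l.alloc_size = sketch_roundup(min_unit, atom_size);
    l.upa = l.alloc_size / min_unit;
    /* Shrink upa until units divide the allocation evenly and stay page aligned. */
    while (l.upa > 1 &&
           (l.alloc_size % l.upa || (l.alloc_size / l.upa) % SKETCH_PAGE_SIZE))
        l.upa--;
    l.unit_size = l.alloc_size / l.upa;
    return l;
}
```

With a size_sum below 32KB and atom_size = 4KB, the result is one 32KB unit per allocation, which matches the 32KB noted above. A 2MB atom_size would instead pack several units into one allocation.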

While looping, the static area is copied into the area belonging to each cpu of the group.

include/linux/init.h #define __initdata __section(.init.data)

  • What is the difference between alloc_bootmem and alloc_bootmem_nopanic?

kernel/smp.c int nr_cpu_ids __read_mostly = NR_CPUS; if the actual number of cpus is smaller, nr_cpu_ids gets that value…

ai->groups[group].base_offset = areas[group] - base; // areas is what was allocated per group; the offset says how far it is from base.

Difference between cpu and unit? Units count up in order, [0 1 2 3] [4 5 6 7]. Cpus may not. unit_map[cpu]

base_offset

arch/arm/kernel/vmlinux.lds.S PERCPU_SECTION(L1_CACHE_BYTES)

arch/arm/kernel/vmlinux.lds

<linux/percpu.h> <asm-generic/percpu.h>

setup_per_cpu_areas(), called from start_kernel()
    pcpu_embed_first_chunk(reserved_size, dyn_size, atom_size) /* PERCPU_MODULE_RESERVE = 8k, PERCPU_DYNAMIC_RESERVE = 12k, PAGE_SIZE = 4k */

    pcpu_build_alloc_info(reserved_size, dyn_size, atom_size)
        ai = pcpu_alloc_alloc_info()
        /* fill in the members of ai */

        for (0 ~ nr_groups) /* walk ai->groups[group] and fill in each gi */
    size_sum = ai->static_size + ai->reserved_size + ai->dyn_size;
    areas_size = PFN_ALIGN(ai->nr_groups * sizeof(void *));
    areas = alloc_bootmem_nopanic(areas_size);

    /* allocate, copy and determine base address */
    for (0 ~ nr_groups)

    /*
     * Copy data and free unused parts.  This should happen after all
     * allocations are complete; otherwise, we may end up with
     * overlapping groups.
     */
    for (0 ~ nr_groups)

    /* base address is now known, determine group base offsets */
    for (0 ~ nr_groups)

    rc = pcpu_setup_first_chunk(ai, base);  /* summarized below */

delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
for_each_possible_cpu(cpu)
    __per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
    /* this sets the values that per_cpu_offset(x) looks up later */

/*
 * Set up the percpu first chunk.
 * (first chunk: the chunk that represents the static per-cpu area used by the kernel)
 * Allocate and initialize the chunk data structures.
 */
pcpu_setup_first_chunk(ai, base)

    pcpu_nr_groups = ai->nr_groups;
    /* left-hand sides are global variables; right-hand sides are memory obtained with alloc_bootmem and already initialized */
    pcpu_group_offsets = group_offsets;
    pcpu_group_sizes = group_sizes;
    pcpu_unit_map = unit_map;
    pcpu_unit_offsets = unit_off;

In pcpu_setup_first_chunk, schunk and dchunk are each created, and dchunk becomes pcpu_first_chunk. Then pcpu_chunk_relocate(pcpu_first_chunk, -1); is called to move the chunk to the slot that matches its size.

So where does schunk end up?

struct pcpu_chunk {
	struct list_head	list;		/* linked to pcpu_slot lists */
	int			free_size;	/* free bytes in the chunk */
	int			contig_hint;	/* max contiguous size hint */
	void			*base_addr;	/* base address of this chunk */
	int			map_used;	/* # of map entries used */
	int			map_alloc;	/* # of map entries allocated */
	int			*map;		/* allocation map */
	void			*data;		/* chunk data */
	bool			immutable;	/* no [de]population allowed */
	unsigned long		populated[];	/* populated bitmap */
};

Used space is recorded as a negative number, free space as a positive number: map[0] = -1024; map[1] = 64; map[2] = -128; …
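
A small sketch of how such a signed map can be walked. This only illustrates the sign convention described above; the function names are made up, alignment is ignored, and the real allocator also splits and merges entries.

```c
#include <stddef.h>

/*
 * Sum the free bytes in an allocation map where used areas are stored
 * as negative sizes and free areas as positive sizes.
 */
static int sketch_map_free_bytes(const int *map, size_t n)
{
    int free_bytes = 0;

    for (size_t i = 0; i < n; i++)
        if (map[i] > 0)
            free_bytes += map[i];
    return free_bytes;
}

/*
 * First-fit search: return the byte offset of the first free area that
 * can hold `size` bytes (size > 0), or -1 if none fits.
 */
static int sketch_map_find(const int *map, size_t n, int size)
{
    int off = 0;

    for (size_t i = 0; i < n; i++) {
        if (map[i] >= size)
            return off;
        off += map[i] < 0 ? -map[i] : map[i];
    }
    return -1;
}
```

The offset of an entry is simply the sum of the absolute values before it, so no separate offset array is needed.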

Location of the variable that belongs to a given cpu, for a given per-cpu pointer:
#define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))

#define SHIFT_PERCPU_PTR(__p, __offset) ({				\
	__verify_pcpu_ptr((__p));					\
	RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset)); \
})
/* in the end this is just ptr + off */
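
The "ptr + offset" idea can be modelled in userspace. This is a toy under stated assumptions: a static byte array stands in for the first chunk, units are laid out back to back, and every sketch_ name (including the counter at byte 8) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_CPUS   4
#define SKETCH_UNIT_SIZE 64

/* One unit per CPU, laid out back to back (a stand-in for the first chunk). */
static unsigned char sketch_chunk[SKETCH_NR_CPUS * SKETCH_UNIT_SIZE];

/* Stand-in for __per_cpu_offset[]: where each CPU's unit starts. */
static size_t sketch_per_cpu_offset(int cpu)
{
    return (size_t)cpu * SKETCH_UNIT_SIZE;
}

/* ptr + offset, which is what SHIFT_PERCPU_PTR boils down to. */
static void *sketch_per_cpu_ptr(size_t var_off, int cpu)
{
    return sketch_chunk + var_off + sketch_per_cpu_offset(cpu);
}

/* Hypothetical per-CPU counter living at byte 8 of every unit. */
static void sketch_counter_inc(int cpu)
{
    int v;
    void *p = sketch_per_cpu_ptr(8, cpu);

    memcpy(&v, p, sizeof(v));
    v++;
    memcpy(p, &v, sizeof(v));
}

static int sketch_counter_read(int cpu)
{
    int v;

    memcpy(&v, sketch_per_cpu_ptr(8, cpu), sizeof(v));
    return v;
}
```

Each CPU touches only its own copy, so the counter needs no lock as long as a task stays on its CPU while updating it.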

Key points

  • Why percpu is needed
  • Definitions of terms chunk - unit (???) a descriptor for memory allocated in multiples of the DFL count group - cpu (???)
  • Using percpu variables
    1. Declaration : dynamic, static
    2. Use

pcpu_alloc ; dynamic allocation

pcpu_chunk_slot ; derives the slot index from the chunk's free_size. (How do chunk and slot relate?)

/*
 * Set up the percpu first chunk.
 * (first chunk: the chunk that represents the static per-cpu area used by the kernel)
 * Allocate and initialize the chunk data structures.
 */
pcpu_setup_first_chunk

Chunks other than the first chunk are later placed in the vmalloc address space,

percpu

Since version 2.6.30, allocation of dynamic percpu areas and of static percpu areas has been unified and both are available through the same interface. Internally the percpu area is managed in units called chunks (<- are they fixed-size?), and each chunk consists of units, one assigned to each CPU. On a NUMA system, however, CPUs can belong to different nodes, so the CPUs (units) on the same node are further bundled into a group.

Once the first chunk has been allocated, pcpu_setup_first_chunk() is called. It stores all of the following percpu information so that later allocations can refer to it.

  • pcpu_base_addr: virtual address at which the first chunk was allocated
  • pcpu_nr_groups: number of percpu groups (number of NUMA nodes)
  • pcpu_group_offsets: table holding the start offset of each group
  • pcpu_group_sizes: table holding the size of each group
  • pcpu_nr_units: number of percpu units (number of CPUs)
  • pcpu_unit_map: table for looking up the unit number from the cpu number
  • pcpu_unit_offsets: table holding the start offset of each unit
  • pcpu_unit_size: size of a unit
  • pcpu_atom_size: minimum unit for page allocation
  • pcpu_nr_slots: number of slots used for chunk management

2013.06.22

percpu

2013.06.29

start_kernel setup_per_cpu_areas done build_all_zonelists nr_free_pagecache_pages

2013.07.06

start_kernel build_all_zonelists nr_free_pagecache_pages

Reading stop_machine.

get_online_cpus might_sleep might_resched

CONFIG_PREEMPT_VOLUNTARY

CONFIG_PREEMPT_NONE <- Kconfig has this as the default. Why? I heard the kernel has supported preemption since 2.6?

Use of clrex.

Reading mutex_lock

  • Let's set a date for rcu_lock.

mutex owner: in owner_running, why is there a barrier() between the lock->owner check and return owner->on_cpu??? Meaning of on_cpu in task_struct...

resched_task set_tsk_need_resched set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);

Scheduling features are set to true/false in kernel/sched/features.h

* __schedule
	context_switch
		prepare_task_switch
		switch_mm
		switch_to
		finish_task_switch

* Functions that call __schedule
	schedule
	preempt_schedule
	preempt_schedule_irq
	__cond_resched	<- from _cond_resched when should_resched() is true. <- cond_resched <-

2013.07.06

Analyzing mutex_lock. Case where the mutex is acquired right away: atomic_cmpxchg. Case where there is an owner and we wait in spin_on_owner.

2013.07.13

Analyzing mutex. Case where the owner changed and we have to go onto the wait list: add to wait_list. atomic_xchg

TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE, TASK_KILLABLE … When mutex_lock_common is reached from the fastpath, it is called with TASK_UNINTERRUPTIBLE. http://www.test104.com/kr/tech/3844.html

The schedule function is for later…

preempt_enable()
	: counterpart of preempt_disable; makes the task preemptible again.

	preempt_enable_no_resched();
	barrier();
	preempt_check_resched();	// returns immediately while preemption is disabled.

might_sleep
	: used to reduce latency by adding more "explicit preemption points".
	  Voluntarily runs the scheduler before a long-running piece of work

	# define might_sleep() do { might_resched(); } while (0)
	# define might_resched() _cond_resched()

need_resched
	: checks the resched flag in thread_info to see whether scheduling is needed

	return unlikely(test_thread_flag(TIF_NEED_RESCHED));

should_resched
	: checks that the current task needs to be rescheduled and is not already being preempted.

	return need_resched() && !(preempt_count() & PREEMPT_ACTIVE);

__cond_resched
	: marks the current task as being preempted and calls the scheduler.

	add_preempt_count(PREEMPT_ACTIVE);
	__schedule();
	sub_preempt_count(PREEMPT_ACTIVE);

_cond_resched
	: resched

	if (should_resched()) {
		__cond_resched();
		return 1;
	}
	return 0;
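
The interplay of need_resched, PREEMPT_ACTIVE and _cond_resched above can be modelled with a toy state machine. This is a sketch, not kernel code: the struct, the fake scheduler and the PREEMPT_ACTIVE bit value are all assumptions made for illustration.

```c
#include <stdbool.h>

#define SKETCH_PREEMPT_ACTIVE 0x40000000u  /* assumed bit position */

struct sketch_thread {
    unsigned int preempt_count;
    bool need_resched;          /* stands in for TIF_NEED_RESCHED */
    int schedule_calls;         /* how often the fake scheduler ran */
};

static bool sketch_should_resched(const struct sketch_thread *t)
{
    return t->need_resched && !(t->preempt_count & SKETCH_PREEMPT_ACTIVE);
}

static void sketch_schedule(struct sketch_thread *t)
{
    t->schedule_calls++;
    t->need_resched = false;    /* this toy scheduler clears the flag itself */
}

static int sketch_cond_resched(struct sketch_thread *t)
{
    if (!sketch_should_resched(t))
        return 0;
    t->preempt_count += SKETCH_PREEMPT_ACTIVE;   /* add_preempt_count() */
    sketch_schedule(t);
    t->preempt_count -= SKETCH_PREEMPT_ACTIVE;   /* sub_preempt_count() */
    return 1;
}
```

The PREEMPT_ACTIVE check is what stops a second reschedule from being started while one is already in progress.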

kernel preemption
	: in __irq_svc (an irq taken in supervisor mode), after irq_handler has run,
	  if TIF_NEED_RESCHED is set the task is handled by svc_preempt.

	svc_preempt:
		preempt_schedule_irq	// returns immediately while preemption is disabled.

IPI .. Inter-Processor Interrupt

vexpress uses get_irqnr_preamble

2013.07.20

p->on_cpu p->on_rq

ttwu_do_activate

ftrace <- not analyzed

CONFIG_PERF_EVENTS perf_sw_event

kernel/trace/trace_* tools/perf/* tools/perf/design.txt

CGROUP_SCHED Group CPU scheduler

sched_feat(x) <- where is the list of defined features? kernel/sched/features.h

sched_domainโ€ฆ

sched_clock

lock-less list. โ€ฆ llist.h

workqueue struct worker pool

struct completion

Agreed items o Revisit stop_machine later. o Revive the wiki. o Write up, on the wiki, the topics we previously agreed to revisit. Review timing -

2013.07.27

Why is highest_zoneidx needed?

notifier chain name cpu_chain cb: page_alloc_cpu_notify pri: 0

Reference: tick_init clockevents_chain cb: tick_notify pri: 0

__module_param_call … section ("__param") <- __param is in vmlinux.lds …

setup_arch parse_early_param parse_early_options parse_args "early options" <- printed when pr_debug is enabled parse_one

core_param __module_param_call <- from several functions…

module_param module_param_named module_param_cb __module_param_call

#define module_param_call

kernel/params.c #define STANDARD_PARAM_DEF struct kernel_param_ops param_ops_##name

module_param /sys/module/xxx/parameters <- permission cat /sys/module/battery/parameters/cache_time

o x -> x -> x o x -> x o x -> x -> x -> x o o

build_all_zonelists ~ pidhash_init

o Why is a hash node built as hlist_node **pprev, *next? It keeps the advantages of a doubly linked list while making the interface simpler. delete needs one comparison less.
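
A minimal userspace sketch of the pprev idea. The sketch_ types mirror the shape of hlist_head/hlist_node but are simplified stand-ins, not the kernel's list.h.

```c
#include <stddef.h>

struct sketch_hlist_node {
    struct sketch_hlist_node *next;
    struct sketch_hlist_node **pprev;   /* address of the pointer that points at us */
};

struct sketch_hlist_head {
    struct sketch_hlist_node *first;
};

static void sketch_hlist_add_head(struct sketch_hlist_node *n,
                                  struct sketch_hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* No "am I the first node?" test: *pprev is either head->first or prev->next. */
static void sketch_hlist_del(struct sketch_hlist_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

Because pprev points at a pointer rather than at a node, the head needs only one pointer, which halves the size of a large hash table's bucket array.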

o hash function

o The technique of passing fn as unique_id to create a symbol named xxx_fn that stores a string.

Is the library code called sort a heapsort???

2013.08.03

o hash node review: if the memory allocation fails, the table is created with the size halved each time. o A list that uses a bit lock: the first node keeps the lock bit for the list in the LSB of its next pointer. When the second node is deleted, care is taken that the lock bit of the previous node, i.e. the first node, is not simply overwritten.

2013.08.10

map_lowmem : creates the mapping table (page table) for the lowmem region for_each_memblock(memory, reg) : iterates over the regions of memblock's memory type struct map_desc map

struct memblock memory : reserved : reserved regions. kernel executable code,

arm_memblock_init

http://studyfoss.egloos.com/5444259 boot_cpu_init set_cpu_online set_cpu_active set_cpu_present set_cpu_possible

Cortex-A Series PG instruction cache : VIPT data cache : PIPT

But the boot output looks like this: CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache

Covered in the 6th-group book, p.192

cache aliasing : the problem of different virtual addresses pointing to the same physical address

Start of write-up

start_kernel ... setup_arch(&command_line); ... parse_early_param();

sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL);
sanity_check_meminfo();
arm_memblock_init(&meminfo, mdesc);

paging_init(mdesc);
  1. In setup_arch() ... init_mm.start_code = (unsigned long) _text; init_mm.end_code = (unsigned long) _etext; init_mm.end_data = (unsigned long) _edata; init_mm.brk = (unsigned long) _end; ...

    init_mm is a global structure with the initial values below. _text, _etext, _edata, _end are symbols defined in arch/arm/kernel/vmlinux.lds.S.

struct mm_struct init_mm = {
	.mm_rb = RB_ROOT,
	/** 20130216 * swapper_pg_dir : 0x8000_4000 * */
	.pgd = swapper_pg_dir,
	.mm_users = ATOMIC_INIT(2),
	.mm_count = ATOMIC_INIT(1),
	.mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem),
	.page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
	.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
	INIT_MM_CONTEXT(init_mm)
};

  2. meminfo is filled in by the arm_add_memory function.

meminfo is a global variable of type struct meminfo; it holds the configuration used by the memory initialization functions. struct meminfo { int nr_banks; struct membank bank[NR_BANKS]; };

arm_add_memory is called from the place where the "mem" parameter is handled and from the place where ATAG_MEM is handled.

Q. Is arm_add_memory called for both the ATAG_MEM case and the early_param case? A. Even when ATAG_MEM is present, it is overridden if early_param (mem=) is given. One bank is then added per mem= option.

  • Where ATAGs are handled: setup_arch setup_machine_tags parse_tags parse_tag t = &__tagtable_begin; t < &__tagtable_end; t++

The symbols appear in arch/arm/kernel/vmlinux.lds.S. .init.tagtable : { __tagtable_begin = .; *(.taglist.init) __tagtable_end = .; }

Structure data declared with the __tagtable macro, as in __tagtable(ATAG_MEM, parse_tag_mem32);, is stored in .taglist.init.

#define __tag __used __attribute__((__section__(".taglist.init")))
#define __tagtable(tag, fn) \
static const struct tagtable __tagtable_##fn __tag = { tag, fn }

...
  • Where the mem= boot parameter is handled: setup_arch parse_early_param ... parse_args
  3. What sanity_check_meminfo does

=================================================================================

Walk the banks and set highmem if the condition below holds
if (bank->start > ULONG_MAX)
    highmem = 1;

With CONFIG_HIGHMEM, the following case is checked as well
if (__va(bank->start) >= vmalloc_min ||			/* vmalloc_min : (void *)(VMALLOC_END - (240 << 20) - VMALLOC_OFFSET); */
    __va(bank->start) < (void *)PAGE_OFFSET)	/* PAGE_OFFSET (VA) */
    highmem = 1;

bank->highmem = highmem;

if ((!bank->highmem) && (bank->start + bank->size > arm_lowmem_limit))
    arm_lowmem_limit = bank->start + bank->size;	/* largest end address among the banks */

After the loop
high_memory = __va(arm_lowmem_limit - 1) + 1;

=================================================================================

In the vexpress setup, where CONFIG_HIGHMEM is not defined, highmem is 0 and that value is stored in each bank's highmem field.

arm_lowmem_limit ends up as the largest value of (bank end address + 1) found while walking the banks in meminfo.
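
The limit calculation can be sketched in userspace. Assumptions: a linear map with PHYS_OFFSET = 0x60000000 and PAGE_OFFSET = 0x80000000 (the values noted for vexpress in these notes), and a hypothetical single 128MB bank in the usage; the sketch_ names are made up.

```c
#include <stdint.h>
#include <stddef.h>

struct sketch_bank {
    uint32_t start;     /* physical start address */
    uint32_t size;      /* size in bytes */
    int highmem;
};

/* Largest end address (start + size) over all non-highmem banks. */
static uint32_t sketch_lowmem_limit(const struct sketch_bank *bank, size_t n)
{
    uint32_t limit = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t end = bank[i].start + bank[i].size;

        if (!bank[i].highmem && end > limit)
            limit = end;
    }
    return limit;
}

/* __va() for a linear map: VA = PA - PHYS_OFFSET + PAGE_OFFSET. */
static uint32_t sketch_va(uint32_t pa, uint32_t phys_offset, uint32_t page_offset)
{
    return pa - phys_offset + page_offset;
}

/* high_memory = __va(limit - 1) + 1: translate the last valid byte, then step past it. */
static uint32_t sketch_high_memory(uint32_t limit, uint32_t phys_offset,
                                   uint32_t page_offset)
{
    return sketch_va(limit - 1, phys_offset, page_offset) + 1;
}
```

The -1/+1 keeps __va() applied to an address that actually lies inside the bank, since the limit itself is one byte past the end.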

___________________________ from arch/arm/Kconfig _______________________________

config HIGHMEM <- omap and exynos(5) enable and use this (is it needed at all on 64-bit?) bool "High Memory Support" depends on MMU help
The address space of ARM processors is only 4 Gigabytes large and it has to accommodate user address space, kernel address space as well as some memory mapped IO. That means that, if you have a large amount of physical memory and/or IO, not all of the memory can be "permanently mapped" by the kernel. The physical memory that is not permanently mapped is called "high memory".

  Depending on the selected kernel/user memory split, minimum 
  vmalloc space and actual amount of RAM, you may not need this
  option which should result in a slightly faster kernel. 

  If unsure, say n.

  4. What arm_memblock_init does

    arm_memblock_init(&meminfo, mdesc);
        memblock_add            /* adds to memblock's memory regions */
            memblock_add_region /* handles the various cases with goto repeat etc. and merges overlapping parts of memblock */
        memblock_reserve        /* registers the executable code region etc. as reserved in memblock */

  • Things that end up reserved: kernel text through bss; a correctly specified initrd (addresses not registered in memory cannot be used); swapper_pg_dir <- 80004000 ; what does this name mean? (swapper_pg_dir is the virtual address of the initial page table.); device tree (if used); CONFIG_CMA; ARCH reserved

VA                                  PA

VMALLOC_END    = 0xff000000UL

VMALLOC_START  = high_memory + VMALLOC_OFFSET (8*1024*1024)

high_memory  ....................  arm_lowmem_limit
               = __va(arm_lowmem_limit - 1) + 1

PAGE_OFFSET  ....................  PHYS_OFFSET
               = CONFIG_PAGE_OFFSET 0x80000000     __pv_phys_offset 0x60000000

MODULES_VADDR  = PAGE_OFFSET - 16MB

TASK_SIZE      = PAGE_OFFSET - 16MB

  • To be replaced with a picture at the seminar. Documentation/arm/Porting Documentation/arm/memory.txt
  5. Analyzing paging_init

    build_mem_type_table();

    Q. writeback is the default, so why is WRITEALLOC used on smp? A. So that the Snoop Control Unit can find the latest value written by another core.

    • write alloc policy: on a write that misses in the cache, the line is first brought into the cache and then written. A write-back cache uses write allocate, … from https://en.wikipedia.org/wiki/Cache_(computing)

prepare_page_table(); // write up again from here

What does MODULES_VADDR mean?

arch/arm/include/asm/pgtable-2level.h PMD_SIZE 1 << PMD_SHIFT (21) PGDIR_SIZE 1 << PGDIR_SHIFT (21)

 * This leads to the page tables having the following layout:
 *
 *    pgd             pte
 * |        |
 * +--------+
 * |        |       +------------+ +0
 * +- - - - +       | Linux pt 0 |
 * |        |       +------------+ +1024
 * +--------+ +0    | Linux pt 1 |
 * |        |-----> +------------+ +2048
 * +- - - - + +4    |  h/w pt 0  |
 * |        |-----> +------------+ +3072
 * +--------+ +8    |  h/w pt 1  |
 * |        |       +------------+ +4096

PTRS_PER_PTE - 512 PTRS_PER_PMD - 1 PTRS_PER_PGD - 2048

PMD covers 2MB…

31:21          20:12         11:0
pgd = pmd   |  pte        |  offset
2048 entries   512 entries
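
The 11/9/12-bit split above can be checked with a few shifts and masks. A minimal sketch using the constants from these notes (PGDIR_SHIFT 21, 512 ptes per table, 4KB pages); the sketch_ names are made up.

```c
#include <stdint.h>

#define SKETCH_PGDIR_SHIFT  21
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_PTRS_PER_PTE 512

/* bits 31:21 select one of 2048 pgd entries (2MB each) */
static unsigned int sketch_pgd_index(uint32_t va)
{
    return va >> SKETCH_PGDIR_SHIFT;
}

/* bits 20:12 select one of 512 ptes inside that 2MB range */
static unsigned int sketch_pte_index(uint32_t va)
{
    return (va >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
}

/* bits 11:0 are the byte offset inside the 4KB page */
static unsigned int sketch_page_offset(uint32_t va)
{
    return va & ((1u << SKETCH_PAGE_SHIFT) - 1);
}
```

For example, PAGE_OFFSET 0x80000000 lands exactly on pgd index 1024, the middle of the 2048-entry table.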

In arch/arm/include/asm/memory.h:

/*
 * PAGE_OFFSET - the virtual address of the start of the kernel image
 * TASK_SIZE - the maximum size of a user space task.
 *             ; PAGE_OFFSET minus the space set aside below it for modules
 * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area
 */
#define PAGE_OFFSET		UL(CONFIG_PAGE_OFFSET)
#define TASK_SIZE		(UL(CONFIG_PAGE_OFFSET) - UL(0x01000000))
#define TASK_UNMAPPED_BASE	(UL(CONFIG_PAGE_OFFSET) / 3)

#define MODULES_VADDR		(PAGE_OFFSET - 16*1024*1024)

/*
 * The highmem pkmap virtual space shares the end of the module area.
 */
#ifdef CONFIG_HIGHMEM
#define MODULES_END		(PAGE_OFFSET - PMD_SIZE)
#else
#define MODULES_END		(PAGE_OFFSET)
#endif
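To see what these macros work out to, here is a small sketch. It assumes PAGE_OFFSET is 0x80000000 (the CONFIG_PAGE_OFFSET value mentioned earlier in these notes) and a 2MB PMD_SIZE; `modules_end()` and `module_area_size()` are hypothetical helpers for illustration only.

```c
#include <stdint.h>

#define PAGE_OFFSET   0x80000000u                 /* assumed CONFIG_PAGE_OFFSET */
#define PMD_SIZE      (1u << 21)                  /* 2MB */
#define MODULES_VADDR (PAGE_OFFSET - 16u * 1024 * 1024)

/* With CONFIG_HIGHMEM the last PMD below PAGE_OFFSET is handed to pkmap. */
static inline uint32_t modules_end(int highmem)
{
    return highmem ? PAGE_OFFSET - PMD_SIZE : PAGE_OFFSET;
}

/* Bytes between MODULES_VADDR and MODULES_END (hypothetical helper). */
static inline uint32_t module_area_size(int highmem)
{
    return modules_end(highmem) - MODULES_VADDR;
}
```

Under these assumptions the module area starts at 0x7f000000 and shrinks from 16MB to 14MB when CONFIG_HIGHMEM is enabled.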

kernel/module.c
SYSCALL_DEFINE3(init_module, ...
	load_module
		layout_and_allocate
			/* Allocate and move to the final place */
			move_module
				module_alloc_update_bounds
					module_alloc
						__vmalloc_node_range (MODULES_VADDR ~ MODULES_END)

2013.08.17

highmem <- the region that is not permanently mapped. See arch/arm/Kconfig and http://www.makelinux.net/ldd3/chp-15-sect-1

Q. MODULES_VADDR (= PAGE_OFFSET - 16MB): what if kernel modules need more than 16MB? When insmod adds a module, is all of its code loaded only into this region?

memblock's memory and reserved regions.

When is the MMU turned on? It is already on before start_kernel is reached. -> If paging_init changes the page tables, flush.

create_mapping fills in the page tables. Callers of create_mapping:

  • map_lowmem ; kernel lowmem region (the region mapped 1:1 onto RAM)
  • devicemaps_init ; creates mappings for the vector table and peripheral regions
  • iotable_init ; this also creates the io table for devices… v2m

Headers included by pgtable.h: with CONFIG_ARM_LPAE it is arch/arm/include/asm/pgtable-3level.h; otherwise (the Cortex-A9 vexpress falls here) it is arch/arm/include/asm/pgtable-2level.h * the pgd and pte layout is described there

Functions that fill in the page table: set_top_pte set_pte_ext => cpu_set_pte_ext => __glue(CPU_NAME,_set_pte_ext) => cpu_v7_set_pte_ext

cpu_v7_set_pte_ext (arch/arm/mm/proc-v7-2level.S) fills in one Linux pt entry and one h/w pt entry at a time

In mmu.c: pgd pud pmd pte pgd=pud=pmd[2], pte

static struct mem_type mem_types[] = { … };

memblock_alloc also records the allocation in memblock reserved.

In the pte table:

*    pgd             pte
* |        |
* +--------+
* |        |       +------------+ +0
* +- - - - +       | Linux pt 0 |           <- how is this one referenced? When else is it used?
* |        |       +------------+ +1024
* +--------+ +0    | Linux pt 1 |
* |        |-----> +------------+ +2048
* +- - - - + +4    |  h/w pt 0  |
* |        |-----> +------------+ +3072
* +--------+ +8    |  h/w pt 1  |
* |        |       +------------+ +4096

Q. Is a 1.8MB mapping built from a 1MB section plus ptes? -> In alloc_init_section: // if end is not section-aligned, the else branch is taken. if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) { } else { }
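The alignment test in that condition can be tried in isolation. A minimal sketch, assuming 1MB sections and leaving out the `type->prot_sect` check (treated here as always non-zero); `can_map_with_sections()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SHIFT 20
#define SECTION_SIZE  (1u << SECTION_SHIFT)      /* 1MB */
#define SECTION_MASK  (~(SECTION_SIZE - 1))

/*
 * True only when the virtual start, virtual end and physical start are all
 * 1MB aligned. OR-ing the three values means a single misaligned one is
 * enough to set a low bit and fail the test.
 */
static bool can_map_with_sections(uint32_t addr, uint32_t end, uint32_t phys)
{
    return ((addr | end | phys) & ~SECTION_MASK) == 0;
}
```

So a range whose end lands at, say, 0x801cd000 fails the test and would go down the else (pte) path, even though its start is section-aligned.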

Printing type->prot_sect gives 70670

Q. Code that references kernel ptes: arch/arm/include/asm/pgtable.h #define pte_offset_kernel(pmd,addr) (pmd_page_vaddr(*(pmd)) + pte_index(addr)) pmd_page_vaddr

alloc_init_section alloc_init_pte early_pte_alloc pte_offset_kernel pte_offset_map pte_offset_map_lock

[Reference] include/asm-generic/pgtable.h

*** Among the comments added to arch/arm/include/asm/pgtable.h: PAGE_MASK is 0xfffff000.

*** In arch/arm/include/asm/pgtable.h, comment pmd_page_vaddr accurately: it returns the start address of the pte table. Comment pte_offset_kernel accurately as well.

*** The comment on pud_addr_end looks wrong? It follows pgtable-nopud.h <- included from arch/arm/include/asm/pgtable.h. Where is pgtable.h included from? proc-v7.S includes asm/pgtable.h

  • Analyze separately

Looking at arch/arm/kernel/vmlinux.lds.S:

#ifdef CONFIG_SMP_ON_UP
	.init.smpalt : {
		__smpalt_begin = .;
		*(.alt.smp.init)
		__smpalt_end = .;
	}
#endif

Looking at the .config file:

Kernel Features

CONFIG_HAVE_SMP=y
CONFIG_SMP=y
CONFIG_SMP_ON_UP=y
CONFIG_ARM_CPU_TOPOLOGY=y

So the .init.smpalt section is included. Searching the generated vmlinux ELF for .init.smpalt shows that it contains instructions.

$ arm-linux-gnueabihf-readelf -e vmlinux | grep smpalt
  [16] .init.smpalt PROGBITS 804632dc 4632dc 005000 00 A 0 0 4
$ arm-linux-gnueabihf-objdump -d -j .init.smpalt vmlinux | more

vmlinux: file format elf32-littlearm

Disassembly of section .init.smpalt:

804632dc <__smpalt_begin>:
804632dc: 80486b28 .word 0x80486b28
804632e0: 00000000 .word 0x00000000
804632e4: 804468c0 .word 0x804468c0
804632e8: e320f000 nop {0}
804632ec: 804480ec .word 0x804480ec
804632f0: e320f000 nop {0}
804632f4: 80448108 .word 0x80448108
804632f8: e320f000 nop {0}
804632fc: 80448120 .word 0x80448120
80463300: e320f000 nop {0}

…

The macros that place entries in this section are in arch/arm/include/asm/assembler.h.

#ifdef CONFIG_SMP
#define ALT_SMP(instr...)					\
9998:	instr
/*
 * Note: if you get assembler errors from ALT_UP() when building with
 * CONFIG_THUMB2_KERNEL, you almost certainly need to use
 * ALT_SMP( W(instr) ... )
 */
#define ALT_UP(instr...)					\
	.pushsection ".alt.smp.init", "a"			;\
	.long	9998b						;\
9997:	instr							;\
	.if . - 9997b != 4					;\
		.error "ALT_UP() content must assemble to exactly 4 bytes";\
	.endif							;\
	.popsection

…

The code that processes this section is in arch/arm/kernel/head.S.

#ifdef CONFIG_SMP_ON_UP
	bl	__fixup_smp
#endif

It reads the MPIDR register to decide whether the system is currently SMP.


Looking at arch/arm/kernel/head.S:

	.align

1:	.word	.
	.word	__smpalt_begin
	.word	__smpalt_end

/** 20130518 smp_on_up **/
	.pushsection .data
	.globl	smp_on_up
smp_on_up:
	ALT_SMP(.long	1)
	ALT_UP(.long	0)
	.popsection
#endif

$ vi System.map, then search for smp_on_up. Or:
$ arm-linux-gnueabihf-objdump -dslx vmlinux
80486b28 g .data 00000000 smp_on_up

$ arm-linux-gnueabihf-objdump -Dslx -j .data vmlinux
80486b20 68932980 a4143480 01000000 00000000 h.)...4.........
Checking the value shows 01.

(Running under qemu with ddd attached, print smp_on_up prints 1 right away.)

The information above is not conclusive on its own, so look at the boot log:

CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
smp_twd: clock not found: -2
Calibrating local timer... 97.32MHz.
hw perfevents: enabled with ARMv7 Cortex-A9 PMU driver, 1 counters available
Setting up static identity map for 0x606d9980 - 0x606d99d8
CPU1: Booted secondary processor
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
CPU1: Unknown IPI message 0x0
CPU2: Booted secondary processor
CPU2: thread -1, cpu 2, socket 0, mpidr 80000002
CPU2: Unknown IPI message 0x0
CPU3: Booted secondary processor
CPU3: thread -1, cpu 3, socket 0, mpidr 80000003
CPU3: Unknown IPI message 0x0
Brought up 4 CPUs
SMP: Total of 4 processors activated (1216.10 BogoMIPS).

Looking at head.S again:

#ifdef CONFIG_SMP_ON_UP
	__INIT
__fixup_smp:
and r3, r9, #0x000f0000 @ architecture version
teq r3, #0x000f0000 @ CPU ID supported?
bne __fixup_smp_on_up @ no, assume UP

bic r3, r9, #0x00ff0000
bic r3, r3, #0x0000000f @ mask 0xff00fff0
mov r4, #0x41000000
orr r4, r4, #0x0000b000
orr r4, r4, #0x00000020 @ val 0x4100b020
teq r3, r4          @ ARM 11MPCore?
moveq   pc, lr          @ yes, assume SMP

mrc p15, 0, r0, c0, c0, 5   @ read MPIDR
and r0, r0, #0xc0000000 @ multiprocessing extensions and
teq r0, #0x80000000     @ not part of a uniprocessor system?
moveq   pc, lr          @ yes, assume SMP

The top two bits of the value read from MPIDR are compared; if they are 10, the system is judged to be SMP and the routine returns immediately. Otherwise execution continues into __fixup_smp_on_up below.

So here it returns immediately.
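The last three instructions of that check amount to a mask-and-compare, which can be replayed on the MPIDR values from the boot log above. A minimal sketch; `mpidr_says_smp()` is a hypothetical name, and the mrc read itself is replaced by a plain argument.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors "and r0, r0, #0xc0000000; teq r0, #0x80000000":
 * bit 31 set   -> the multiprocessing-extensions register format is in use
 * bit 30 clear -> the core is not part of a uniprocessor system
 */
static bool mpidr_says_smp(uint32_t mpidr)
{
    return (mpidr & 0xc0000000u) == 0x80000000u;
}
```

The boot log reports mpidr 80000000 through 80000003 for CPU0 to CPU3, so every core passes this test, which matches smp_on_up reading back as 1.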

[Reference] exynos 4412 http://com.odroid.com/sigong/blog/blog_list.php?bid=131

2013.08.24

The role of the L2 AP bits depends on the L1 DOMAIN setting. MANAGER: access permissions are not checked. CLIENT: the L2 AP attributes are used.

kmap_init

What is PKMAP for? PKMAP_BASE ~ PAGE_OFFSET-1: permanent kernel mappings

CONFIG_HIGHMEM -> That means that, if you have a large amount of physical memory and/or IO, not all of the memory can be "permanently mapped" by the kernel

page table: holds the virtual address -> physical address mapping information

init_bootmem_node init_bootmem_core link_bootmem

2013.08.31

mem_init : according to its comment, it marks the free areas in mem_map and reports how much memory is free. mem_map is a struct page * and is assigned NODE_DATA(0)->node_mem_map.

alloc_node_mem_map allocates memory sized for the number of pfns, stores it in map, and puts that value into node_mem_map.

bdata_list : the list of bdata entries. Linked in init_bootmem_core.

bdata->node_bootmem_map : set in init_bootmem_core by bdata->node_bootmem_map = phys_to_virt(PFN_PHYS(mapstart)); mapstart is the pfn where the bitmap is to be placed.

Pages in use are marked with 1 by __reserve.

bootmem_data_t *bdata

/*
 * node_bootmem_map is a map pointer - the bits represent all physical
 * memory pages (including holes) on the node.
 */
typedef struct bootmem_data {
	/** 20130420
	 * node_min_pfn : pfn of the start address of the node's physical memory
	 * node_low_pfn : pfn of the end address of the node's physical memory (lowmem)
	 * node_bootmem_map : virtual address where the bitmap is located
	 * last_end_off : last physical offset of the space used by the struct pages
	 *                that manage page frames (from node_min_pfn)
	 * hint_idx : PFN_UP(last_end_off)
	 **/
	unsigned long node_min_pfn;
	unsigned long node_low_pfn;
	void *node_bootmem_map;
	unsigned long last_end_off;
	unsigned long hint_idx;
	struct list_head list;
} bootmem_data_t;

In free_all_bootmem_core:

start = bdata->node_min_pfn;
map = bdata->node_bootmem_map;
order = ilog2(BITS_PER_LONG);
...
__free_pages_bootmem(pfn_to_page(start), order);

What do _count and _mapcount mean in struct page? _count is the reference count for the page.

o The _count field stores the page's usage count, that is, how many references to the page exist.
  Currently page_count simply reads _count and returns it.

If the page is in use, a positive integer is returned. A page may be used by the page cache (in which case the mapping field points to the address_space object associated with the page), as private data (pointed to by the private field), or as a mapping in a process's page table. http://hooneyo.tistory.com/entry/%EB%A9%94%EB%AA%A8%EB%A6%AC-%EA%B4%80%EB%A6%AC

o _count is a usage count indicating the number of references to this page in the kernel. When its value reaches 0, the kernel knows that the page instance is not currently in use and can therefore be removed. If its value is greater than 0, the instance should on no account be removed from memory. If you are not familiar with reference counters, you should consult Appendix C for further information.

_mapcount: o _mapcount indicates at how many points the page is shared. The original value of the counter is −1. It is assigned the value 0 when the page is inserted in the reverse mapping data structures and is incremented by 1 for each additional user. This enables the kernel to check quickly how many users are using the page in addition to the owner. o _mapcount indicates how many entries in the page table point to the page. It counts how many page-table entries reference the page.

The flags field of struct page records the section, nid, and zone information.

struct per_cpu_pages *pcp;

hot-n-cold-page http://lwn.net/Articles/14768/

pageblock : pages are grouped and managed in chunks (why in units of 1024?). The attributes are defined on the first page of each 1024.

What get_pageblock_bitmap does: returns zone->pageblock_flags, which exists per zone. pageblock_flags is a bitmap that setup_usemap allocates with size usemapsize. usemapsize is the size in bytes needed to represent the pageblocks covering zonesize as a bitmap. pageblock_order is (11-1) and is used to divide zonesize.

memmap_init_zone sets the migrate type to MIGRATE_MOVABLE for each page whose PFN is a multiple of 1024

set_pageblock_flags_group(page, type, PB_migrate, PB_migrate_end) <- the first page of each 1024. PB_migrate ~ PB_migrate_end NR_PAGEBLOCK_BITS is 3.

In arm_bootmem_free, zone_size[0] = max_low - min; <- the difference between the first and last pfn, i.e. the number of page frames. zone_spanned_pages_in_node is passed as the 3rd argument of setup_usemap.

usemapsize = (number of pfns >> order) * NR_PAGEBLOCK_BITS (3). That value rounded up to a multiple of the unsigned long width and then divided by 8 (to get bytes) is usemap_size. (The migrate type needs 3 bits.)
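The arithmetic can be checked with a few numbers. This is a sketch under stated assumptions: pageblock_order 10 and NR_PAGEBLOCK_BITS 3 as recorded in these notes, a 32-bit unsigned long for the ARM target, and an initial round-up of zonesize to a whole pageblock (from memory of mm/page_alloc.c; the notes above do not mention it). `round_up_to()` is a hypothetical helper.

```c
#define PAGEBLOCK_ORDER     10                       /* (11 - 1), per the notes */
#define PAGEBLOCK_NR_PAGES  (1ul << PAGEBLOCK_ORDER) /* 1024 pages per block */
#define NR_PAGEBLOCK_BITS   3ul                      /* per the notes */
#define TARGET_BITS_PER_LONG 32ul                    /* assumption: 32-bit ARM */

/* Round x up to the next multiple of step (hypothetical helper). */
static unsigned long round_up_to(unsigned long x, unsigned long step)
{
    return ((x + step - 1) / step) * step;
}

/* Bytes needed for the pageblock flag bitmap of a zone of zonesize pages. */
static unsigned long usemap_size(unsigned long zonesize)
{
    unsigned long bits;

    bits = round_up_to(zonesize, PAGEBLOCK_NR_PAGES) >> PAGEBLOCK_ORDER;
    bits *= NR_PAGEBLOCK_BITS;                       /* 3 bits per pageblock */
    bits = round_up_to(bits, TARGET_BITS_PER_LONG);  /* whole longs only */
    return bits / 8;                                 /* bits -> bytes */
}
```

For a 512MB zone (131072 pages of 4KB) this gives 128 pageblocks, 384 bits, and therefore 48 bytes of bitmap.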

usemapsize is used by setup_usemap for the bootmem allocation.

Are the following variables reset to 0 in every function that is called??? zone->all_unreclaimable = 0; zone->pages_scanned = 0;

zone->pageset stores the address of the percpu variable boot_pageset.

free_all_bootmem
	free_all_bootmem_core
		__free_pages_bootmem
			__free_pages

free_hot_cold_page <- hangs the page on the percpu pcp list. If too many are queued, it frees the number given by the batch size back to buddy. (The front of each per-migrate-type list is hot, the back is cold.) __free_pages_ok

free_pcppages_bulk __mod_zone_page_state <- currently reading

struct per_cpu_pageset { ... vm_stat_diff[NR_VM_ZONE_STAT_ITEMS]; <--- what is this? }

page : ANONYMOUS page

vm_event_states is also a percpu variable.

The __free_pages function is skipped for now.

Looking at what free_all_bootmem_core does: memory released from bootmem is no longer marked in use in the bitmap, but the struct page bookkeeping does not reflect that yet, so it is freed with __free_pages_bootmem.

20130831 : mem_init: free_all_bootmem done

================================================================ NOTIFIER CHAIN

0) A mechanism in which callback functions are registered on a list in priority order,
  and the registered callbacks are invoked when an event occurs
1) Declare a struct notifier_block.
2) Register the notifier_block.
3) Registration/invocation
  * register function : notifier_chain_register
	Walks from the notifier head and links the notifier_block at its priority-sorted position.
	A notifier chain is a singly linked list with a separate head, and the list is managed through RCU-protected pointers.
  * unregister function : notifier_chain_unregister
  * call function : notifier_call_chain

4) Usually a notifier_chain *head is declared as a global variable and registration goes through a dedicated function.
	register : register_cpu_notifier
	call : cpu_notify
	The notifier_block is registered on the notifier head named cpu_chain.

	The notifier head is declared as follows.
	static RAW_NOTIFIER_HEAD(cpu_chain);

	RAW_NOTIFIER_HEAD is a macro that declares a struct variable and initializes its .head member.

	  #define RAW_NOTIFIER_HEAD(name)                 \
	      struct raw_notifier_head name =             \
	          RAW_NOTIFIER_INIT(name)

	  struct raw_notifier_head {
	      struct notifier_block __rcu *head;
	  };



struct notifier_block
	int (*notifier_call)()
	struct notifier_block __rcu *next;
	int priority;

head : each type has its own head and api.
	location: kernel/notifier.c

	struct atomic_notifier_head   : atomic version, contains a spinlock
	struct blocking_notifier_head : blocking version, contains an rwsem
	struct raw_notifier_head      : raw version, no extra machinery
	struct srcu_notifier_head     : srcu version, contains sleepable RCU
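The priority-ordered registration described in this section can be sketched in plain userspace C. This is a simplified stand-in for the raw variant, not the kernel code: there is no RCU or locking, the call loop ignores return values (the kernel can stop the chain early), and `chain_register`, `chain_unregister`, `chain_call`, `struct call_log` and `log_priority` are all hypothetical names.

```c
#include <stddef.h>

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb, unsigned long event, void *data);
    struct notifier_block *next;
    int priority;
};

/* Insert n so the singly linked list stays sorted by descending priority. */
static void chain_register(struct notifier_block **nl, struct notifier_block *n)
{
    while (*nl != NULL) {
        if (n->priority > (*nl)->priority)
            break;
        nl = &(*nl)->next;
    }
    n->next = *nl;
    *nl = n;
}

/* Unlink n; returns 0 on success, -1 if n is not on the chain. */
static int chain_unregister(struct notifier_block **nl, struct notifier_block *n)
{
    while (*nl != NULL) {
        if (*nl == n) {
            *nl = n->next;
            return 0;
        }
        nl = &(*nl)->next;
    }
    return -1;
}

/* Call every block in list order; returns how many callbacks ran. */
static int chain_call(struct notifier_block *head, unsigned long event, void *data)
{
    struct notifier_block *nb, *next;
    int calls = 0;

    for (nb = head; nb != NULL; nb = next) {
        next = nb->next;          /* fetch first: the callback may unlink nb */
        nb->notifier_call(nb, event, data);
        calls++;
    }
    return calls;
}

/* Demo callback: records the priority of each block in call order. */
struct call_log { int prio[8]; int count; };

static int log_priority(struct notifier_block *nb, unsigned long event, void *data)
{
    struct call_log *log = data;

    (void)event;
    if (log->count < 8)
        log->prio[log->count++] = nb->priority;
    return 0;
}
```

Registering blocks with priorities 0, 10 and 5 in that order and then calling the chain runs them as 10, 5, 0, which is the ordering the register step above is meant to guarantee.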

================================================================ iommu

Some of it lives under drivers/iommu, some is defined in files such as arch/arm/mach-XXX/devices-iommu.c, and related functions are also in arch/arm/mm/dma-mapping.c.

What is it?

  • A memory management unit (MMU) that connects a DMA-capable I/O bus to main memory.
  • Just as the system MMU translates virtual addresses to physical addresses when the CPU accesses memory, the IOMMU translates virtual addresses (device addresses or I/O addresses) to physical addresses. It also covers memory protection.
  • On a bus with a narrow address width, the IOMMU lets a device reach memory wherever it is located (otherwise the data has to be copied into memory the bus can address - needs verification)

This raises a question: isn't it a problem if the IOMMU and the MMU point at the same memory? When is the IOMMU table built?

Advantages of an IOMMU

  1. A large region can be allocated without a physically contiguous allocation.
  2. Abnormal device reads/writes (outside the region allocated to the device) can be restricted.
  3. Under virtualization, a guest os can safely access hardware directly.

How is it used? (interface-oriented) Accessed through the dma interface. init_iommu is called via subsyscall arch/arm/mm/dma-mapping.c

================================================================ ioremap

To look into

DMA KERNEL ARM DMA API

max_low_pfn

end = pfn_to_page(pfn2 - 1) + 1; <- done this way for the case where the range is not contiguous

Documentation/cgroups/cpusets.txt

seqlock seqcount

cpu_relax () on ARCH 6 it is smp_mb(); elsewhere (7 included) barrier() ; an optimization barrier (plain memory clobber)

2013.09.07 seqlock's concept
o For a protected resource that is small, simple, and accessed often, where writes are rare but must be fast
o The reader itself checks for a conflict with a writer and redoes the access if one occurred
o Not suitable for data structures that contain pointers.

  • read : long, frequent work
  • write : short, infrequent work

do { seq = read_seqbegin (&lock2); } while (read_seqretry (&lock2, seq));
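The loop above omits the actual read between read_seqbegin and read_seqretry. A userspace sketch of the same idea with C11 atomics follows; it is an illustration of the sequence-counter technique, not the kernel's seqlock_t, and it assumes a single writer (the kernel serializes writers with a spinlock). `struct seq_pair`, `pair_init`, `pair_write` and `pair_read` are hypothetical names.

```c
#include <stdatomic.h>

struct seq_pair {
    atomic_uint seq;   /* even: stable, odd: a write is in progress */
    atomic_int  a, b;  /* the small payload being protected */
};

static void pair_init(struct seq_pair *p)
{
    atomic_init(&p->seq, 0);
    atomic_init(&p->a, 0);
    atomic_init(&p->b, 0);
}

/* Writer: bump seq to odd, update the payload, bump seq back to even. */
static void pair_write(struct seq_pair *p, int a, int b)
{
    atomic_fetch_add_explicit(&p->seq, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->a, a, memory_order_relaxed);
    atomic_store_explicit(&p->b, b, memory_order_relaxed);
    atomic_fetch_add_explicit(&p->seq, 1, memory_order_release);
}

/* Reader: retry while a write was in progress or completed in between. */
static void pair_read(struct seq_pair *p, int *a, int *b)
{
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&p->seq, memory_order_acquire);
        *a = atomic_load_explicit(&p->a, memory_order_relaxed);
        *b = atomic_load_explicit(&p->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1);
}
```

Readers never block the writer; they only pay for a retry when a write overlaps their read, which is why the pattern suits small data that is read often and written rarely.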

__alloc_pages_nodemask ; the heart of the zoned buddy allocator (seqlock read) get_mems_allowed ; read_seqcount_begin put_mems_allowed ; read_seqcount_retry


When order == 0, __free_pages calls free_hot_cold_page: the page is not returned straight to buddy but added to the pcp list, and pages are freed to buddy only when the list is at or above its watermark. Otherwise __free_pages_ok is called.

Either way it ends in __free_one_page

buddy system Page's order is recorded in page_private(page) field.

  • Draw a diagram of the relationship between zonelist and zoneref.

2013.09.14

zlc : zone list cache

cpuset_zone_allowed_softwall cpuset_zone_allowed_hardwall

cpuset : Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to a set of tasks.

softwall hardwall <- GFP_HARDWALL

What is each of the zone's vm_stat[…] entries? What does it mean for a page to be reclaimable? nr_swap_pages anon ; not a file mapping

What is the zone's dirty_balance_reserve for?

Are the global vmstat and the zone vmstat updated together?

enum zone_stat_item <- what does each one mean? cat /proc/meminfo

Why does zone_watermark_ok walk through each order when checking? return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, zone_page_state(z, NR_FREE_PAGES));

  • If all of a zone's free pages sit in the low orders, a high-order request cannot be satisfied no matter how many free pages there are.

    Values passed in: mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; zone_page_state : fetches the zone's NR_FREE_PAGES count.

    min = mark;
    long lowmem_reserve = z->lowmem_reserve[classzone_idx]; // lowmem_reserve is looked up by the preferred zone's index
    int o;

    free_pages -= (1 << order) - 1; // free_pages is the number of pages currently available in the zone

    if (alloc_flags & ALLOC_HIGH)
        min -= min / 2;
    if (alloc_flags & ALLOC_HARDER)
        min -= min / 4;

    if (free_pages <= min + lowmem_reserve) // check the minimum required for the zone as a whole
        return false;

    for (o = 0; o < order; o++) {
        /* At the next order, this order's pages become unavailable */
        free_pages -= z->free_area[o].nr_free << o;

        /* Require fewer higher order pages to be free */
        min >>= 1;

        if (free_pages <= min)
            return false;
    }
    return true;
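The per-order loop is easier to follow with concrete numbers. Below is a cut-down sketch of the same check, assuming a toy zone that only tracks free blocks per order and a single lowmem_reserve value; the ALLOC_HIGH and ALLOC_HARDER adjustments are left out. `struct zone_sketch`, `zone_free_pages` and `watermark_ok` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_ORDER 11

struct zone_sketch {
    unsigned long nr_free[MAX_ORDER]; /* free blocks at each order */
    long lowmem_reserve;
};

/* Total free pages: a block at order o counts as 1 << o pages. */
static long zone_free_pages(const struct zone_sketch *z)
{
    long total = 0;

    for (int o = 0; o < MAX_ORDER; o++)
        total += (long)(z->nr_free[o] << o);
    return total;
}

static bool watermark_ok(const struct zone_sketch *z, int order, long mark)
{
    long min = mark;
    long free_pages = zone_free_pages(z) - ((1L << order) - 1);

    if (free_pages <= min + z->lowmem_reserve)
        return false;

    for (int o = 0; o < order; o++) {
        /* Blocks smaller than the request cannot satisfy it. */
        free_pages -= (long)(z->nr_free[o] << o);
        /* Require fewer free pages at each higher order. */
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}
```

With 1000 free pages all at order 0 and a mark of 100, an order-0 request passes but an order-3 request fails on the first loop iteration, which is the fragmentation case described just above the code.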

lazy buddy allocator ; frequent allocation and freeing means looping through split and merge over and over. To cut that overhead, __free_pages does not merge immediately; merging happens once the count falls below the watermark.

2013.09.21 HUGEPAGE: on arm it was added to support lpae.

http://lwn.net/Articles/478097/

Added starting with the ARMv7 Cortex-A15. Cortex-A Series PG Cortex-A15 TRM

ID_MMFR0 : VMSA support

LPAE : 40bit address space [31:30][29:21][20:12] 4 512 512 1G 2MB 4KB

Changed source ; http://lwn.net/Articles/441989/ CONFIG_ARM_LPAE ex) #include "proc-v7-3level.S"

arch/arm/mm/proc-v7.S __v7_setup

arch/arm/kernel/head.S __enable_mmu


Freeing the memory of the regions placed in a separate section via __init

kernel_init init_post free_initmem poison_init_mem free_area <- the function analyzed on September 7

2013.09.28 spin_lock, spin_unlock

To make allocation and freeing fast, slab keeps separate caches per cpu and per node. As a result, the metadata for the slab allocator itself grows as the system gets larger. Each slab inside a cache also keeps its own metadata within the space allocated to it, which is another source of performance loss.

The slub allocator appeared to address these problems. It keeps no separate metadata in each slab. Instead it uses the freelist, inuse, and offset fields of the struct page allocated per page to manage slabs very 'simply'. In addition, an allocation only needs to take the lock of the slab involved, which also improves performance.

might_sleep <- also on the memory allocation path: if (should_resched()) __cond_resched __schedule

zonelist ; a data structure holding global zone information regardless of node. Filled in by build_zonelists

#define high_wmark_pages(z) (z->watermark[WMARK_HIGH])

vm_total_pages = the pages above the high watermark page_group_by_mobility_disabled

http://www.linux-arm.org/LinuxKernel/ARMKernelTrees http://kernelnewbies.org/Linux_2.6

http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git


  • page : the smallest unit in which virtual memory manages physical memory. It is also the smallest unit managed by the MMU. One struct page holds the information about one page
  • where struct page is defined : include/linux/mm_types.h struct page { unsigned long flags; atomic_t _count; }; flags
    • represents the page state (dirty, locked, and so on)
    • where the related macros are defined : include/linux/page-flags.h _count
    • used to store the usage count.
    • where the related functions are defined : include/linux/mm.h

2013.10.05

Further study

  • HugeTLB
  • CompoundPage

page์˜ _mapcount _mapcount is the number of page-table entries pointing to the page. _count is a general reference count.


2013.10.12 mm_init kmem_cache_init calls __get_free_pages here; following it leads to the code that allocates memory from buddy.

Starting from prep_new_page. pagefault_disable does inc_preempt_count and then a barrier. How does disabling preemption relate to page faults? An exception such as a page fault can also be regarded as user context (the opposite of interrupt context), because the kernel accesses current at that point. What is the sequence by which the handler runs when a page fault occurs?

get_page_from_freelist
	buffered_rmqueue
		prep_new_page

==================================================================================

* kmap (arch/arm/mm/highmem.c)

kmap/kunmap are used to map highmem pages into, and unmap them from, the kernel's virtual address space.
This is a concern only for the kernel: for a userspace process, highmem and normal pages alike are reached through the page tables.
	- from ‘Professional Linux Kernel Architecture’

Get a page frame via alloc_pages or similar, then call kmap to obtain a virtual address for it.
	Usage: vaddr = kmap(page_ptr)
	http://kldp.org/node/137435


Mapping functions

kmap			; 
	- kmap_high	; returns immediately if a VA already exists for the page, otherwise calls map_new_virtual and returns its result.
			  It sleeps when no new mapping is available, so it cannot be used in interrupt context.
		- map_new_virtual
			- set_pte_at (maps into pkmap_page_table, which kmap_init allocated beforehand)
				- set_pte_ext
kmap_atomic		; usable in interrupt context - it does not sleep.
	- set_top_pte (Fixmap address)


Lookup functions

kmap_high_get		; takes a page pointer and returns the VA if the page is mapped in high memory, otherwise NULL.




Related documents
	kmap_atomic
	Good SMP scalability can be obtained by using kmap_atomic(), which is lockless. 
	http://linux-mm.org/HighMemory

	Understanding the Linux Kernel 8.1.6 Kernel Mappings of High-Memory Page Frames
	Documentation/arm/memory.txt
	Documentation/vm/highmem.txt

==================================================================================

kmap_init creates a pte table for PKMAP_BASE and installs it. kmap kmap_high <- why does kmap_atomic not call it? Because kmap_high can sleep. vaddr = page_address (page) ; looked up in struct page_address_map and returned. Where is the mapping created? if (!vaddr) map_new_virtual

kmap_init /* create a pte table and set its address in the pmd entry covering the PKMAP_BASE region */ pkmap_page_table = early_pte_alloc(pmd_off_k(PKMAP_BASE), PKMAP_BASE, _PAGE_KERNEL_TABLE);

PKMAP_BASE is the start address of the kernel region - PMD (2MB).
  • early_pte_alloc pmd_off_k (PKMAP_BASE) : location of the pmd (l1 table) for the address PKMAP_BASE refers to PKMAP_BASE : the returned address is that of the pte entry, within the pte table, that maps PKMAP_BASE _PAGE_KERNEL_TABLE : the remaining attribute bits of the entry

    1. Allocate space for the pte table.
    2. Fill the pmd entry pointed to by pmdp with the address and attributes of the pte table. The returned address is [the address of the pte entry] for addr within the table pmdp points to.

2013.10.19

o Purpose: kmap and kmap_atomic are both used to map a high memory VA onto physical memory. The page frame must already be allocated.

o lock kmap : uses a global lock, so it can become a bottleneck on SMP systems. kmap_atomic : uses no global lock. Maps at a fixed, per-cpu address.

o Differences that follow from the lock kmap : safe even if the task is scheduled onto another cpu. kmap_atomic : if another process running on the same cpu needs the same address, the mapping can no longer be used, so the task must not be scheduled out.

o Where usable kmap : not usable in interrupt handlers kmap_atomic : usable in interrupts.

[Source] http://slgi97.egloos.com/10973585

/* Find the page of interest. */
struct page *page = find_get_page(mapping, offset);
/* Gain access to the contents of that page. */
void *vaddr = kmap_atomic(page);
/* Do something to the contents of that page. */
memset(vaddr, 0, PAGE_SIZE);
/* Unmap that page. */
kunmap_atomic(vaddr);

Caveat when holding several mappings at once: unmap in the reverse order of mapping.

vaddr1 = kmap_atomic(page1);
vaddr2 = kmap_atomic(page2);
memcpy(vaddr1, vaddr2, PAGE_SIZE);
kunmap_atomic(vaddr2);
kunmap_atomic(vaddr1);

http://verimuch.tistory.com/96 preempt_disable() increments preempt_count by 1, preempt_enable() decrements preempt_count by 1

kmap_atomic preemption_disable

preemption_enable kunmap_atomic

Analyzing kmap_atomic…

Per memory.txt, fff00000 ~ fffdffff is the fixmap region. Addresses returned by fix_to_virt() fall in this region. 1M - 128K = 896K

Pinning a virtual address regardless of the physical address is called fixmap. #define __fix_to_virt(x) (FIXADDR_START + ((x) << PAGE_SHIFT)) vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
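The fixmap address arithmetic can be sketched as follows. Assumptions, labeled because the notes do not state them here: FIXADDR_TOP is 0xfffe0000 (one past the fffdffff end quoted above), FIX_KMAP_BEGIN is 0, KM_TYPE_NR is 16, and idx is formed as type + KM_TYPE_NR * cpu. `fix_to_virt()` stands in for the `__fix_to_virt` macro; `kmap_atomic_vaddr()` and `fixmap_nr_pages()` are hypothetical helpers.

```c
#include <stdint.h>

#define PAGE_SHIFT      12
#define FIXADDR_START   0xfff00000u
#define FIXADDR_TOP     0xfffe0000u   /* assumption: end of region, exclusive */
#define FIX_KMAP_BEGIN  0u            /* assumption */
#define KM_TYPE_NR      16u           /* assumption: slots per cpu */

/* Same arithmetic as the __fix_to_virt(x) macro quoted above. */
static inline uint32_t fix_to_virt(uint32_t idx)
{
    return FIXADDR_START + (idx << PAGE_SHIFT);
}

/* Each cpu gets its own run of KM_TYPE_NR page-sized slots. */
static inline uint32_t kmap_atomic_vaddr(uint32_t cpu, uint32_t type)
{
    uint32_t idx = type + KM_TYPE_NR * cpu;

    return fix_to_virt(FIX_KMAP_BEGIN + idx);
}

/* Number of 4KB pages in the fixmap region. */
static inline uint32_t fixmap_nr_pages(void)
{
    return (FIXADDR_TOP - FIXADDR_START) >> PAGE_SHIFT;
}
```

Under these assumptions the region holds 224 pages (224 * 4KB = 896KB, matching the 1M - 128K figure), and cpu 1 starts its slots at 0xfff10000.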

/* write mk_pte(page, kmap_prot) at the top_pte address for vaddr, then flush the tlb */ set_top_pte(vaddr, mk_pte(page, kmap_prot));

pgd_offset_k <- converts addr into an index into the pgd and returns the address of that pgd entry.

   VA            PA
+--------+    +--------+
|        |    |        |
|        |    |        |
|        |    |        |
+--------+    +--------+

Translation goes through pgd -> pud -> pmd -> pte; given virt, each helper finds the specific entry to look in.

Why the name top_pmd? 0xffff0000 : start address of the reset vector. The pmd covering this address range is top_pmd.

top_pmd = pmd_off_k(0xffff0000); /* The pmd table for the upper-most set of pages. */

xxx_off_k returns the address of the entry in the pmd or pte table that holds the translation for an address.

set_top_pte finds the address of the pte entry for vaddr under top_pmd and writes the pte value there. (Does the address have to fall in the same pmd as the fixmap addresses?)

set_top_pte(va, pte)
{
	/* return the address of the pte entry for va under the pmd that top_pmd points to (the pmd entry covering the high vector table) */
	pte_t *ptep = pte_offset_kernel(top_pmd, va);
	set_pte_ext(ptep, pte, 0);
	local_flush_tlb_kernel_page(va);
}

set_top_pte is also called at the end of kmap_atomic. vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); yields a specific address above 0xfff00000UL, with idx derived from the cpu and the type. set_top_pte(vaddr, mk_pte(page, kmap_prot));

#define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))

In devicemaps_init(): vectors …

Documentation/cachetlb.txt tlb : a cache of VA -> PA translations, obtained from the software page tables. When a page table changes, the TLB may still hold stale translations. The kernel therefore provides several flush functions to call after changing the page tables.

  1. void flush_tlb_all(void)
  2. void flush_tlb_mm(struct mm_struct *mm)
  3. void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned long end)
  4. void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
  5. void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
  6. void tlb_migrate_finish(struct mm_struct *mm)

The tlb flush functions are sometimes called directly and sometimes through the arch/arm/include/asm/tlb.h interface.

  • common tlb flush interface: arch/arm/include/asm/tlbflush.h

  • the ARMv7 functions actually called: arch/arm/mm/tlb-v7.S

  • common cache flush interface: arch/arm/include/asm/cacheflush.h

  • the ARMv7 functions actually called: arch/arm/mm/cache-v7.S

2013.10.26

Starting with a recap of kmap_atomic().

/* writes pte at the top_pte address for va and flushes the tlb */
static inline void set_top_pte(unsigned long va, pte_t pte)
{
	/* follow top_pmd (address of a pmd entry) to get the pte address for va */
	pte_t *ptep = pte_offset_kernel(top_pmd, va);
	/* write the given pte value at ptep */
	set_pte_ext(ptep, pte, 0);
	/* the pte changed, so flush the tlb */
	local_flush_tlb_kernel_page(va);
}

hash ํ•จ์ˆ˜ struct hlist_head struct hlist_node

arch/arm/kernel/smp_tlb.c tlb_ops_need_broadcast … why?

smp_mb(): with CONFIG_SMP it becomes dmb() (a CP15 coprocessor operation up to v6); otherwise it is replaced by a compiler barrier. arch/arm/include/asm/barrier.h

dmb dsb isb

2013.11.02

Current position: start_kernel mm_init kmem_cache_init /* slub */ __get_free_pages <- from here we wandered far off track...

kmap kmap_high map_new_virtual ; analyzed

MULTI_TLB

mm/proc-v7.S: __v7_proc __v7_ca9mp_setup. Checking with ddd: tlb v7wbi_tlb_fns user v6_user_fns cache v7_cache_fns _cache_fns arch/arm/mm/proc-macros.S v7_flush_icache_all

2013.11.09

What do the attributes managed by linux mean? L_PTE_PRESENT L_PTE_USER

Reading set_pte_at… set_pte_ext

CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache

Member of struct page: struct address_space *mapping ; what is address_space?

https://kldp.org/node/122482

PG_swapcache swapper_space ; swap when physical memory runs short. Basically, an address space object is a kernel object for the purpose to manage various kinds of page cache

page_mapping

set_bit -> calls _set_bit; __setbit -> calls __set_bit (non-atomic version)

flush_pfn_alias (cache_colour): Enhanced cache control operations using MCRR. Clean and invalidate data cache range: mcrr p15, 0, %1, %0, c14. The MCRR ARM instruction is available on ARMv6 and later and on the E variants of ARMv5.

PoC : Point of Coherency, PoU : Point of Unification

IS : Inner shareable <-> Outer shareable

lock_kmap() <- why is the irq state not saved? Because it will not be used from an interrupt routine? unlock_kmap()

page_address_slot() page_slot() ; gets the hash table slot for a struct page. page_address_htable[hash_ptr(page, PA_HASH_ORDER)]; // made up of 1 << PA_HASH_ORDER slots. // a page_address_slot consists of a lock and a list_head. <- is each slot locked separately? // page_address_map entries can hang off this slot's list.

Initial state : entries are registered in page_address_pool by page_address_init()

set_page_address // add: remove a map from the pool, fill in the mapping information, and add it to the matching slot list of the hash table // remove: find and remove it from the matching slot list of the hash table, then add it to the tail of the pool
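The slot lookup noted above can be sketched in user space as follows. This is only an illustration: `demo_hash_ptr`, `struct demo_slot`, the multiplier constant and the value chosen for PA_HASH_ORDER are assumptions made for the example, not the kernel's hash_ptr() or its page_address_slot definition. Only the idea of a table of 1 << PA_HASH_ORDER slots indexed by a hash of the page pointer comes from the notes.

```c
#include <stddef.h>
#include <stdint.h>

#define PA_HASH_ORDER 7                       /* assumed value for the example */
#define PA_NR_SLOTS   (1u << PA_HASH_ORDER)   /* table has 1 << PA_HASH_ORDER slots */

/* Hypothetical stand-in for page_address_slot: a list head (lock omitted). */
struct demo_slot {
	const void *first;                    /* first mapping hashed to this slot */
};

static struct demo_slot demo_htable[PA_NR_SLOTS];

/* Illustrative pointer hash: multiply, then keep the top `bits` bits. */
static unsigned int demo_hash_ptr(const void *ptr, unsigned int bits)
{
	uint32_t v = (uint32_t)(uintptr_t)ptr;

	return (v * 0x9E3779B1u) >> (32 - bits);
}

/* Return the slot that a given page pointer hashes to. */
static struct demo_slot *demo_page_slot(const void *page)
{
	return &demo_htable[demo_hash_ptr(page, PA_HASH_ORDER)];
}
```

Because every slot has its own list (and, in the kernel, its own lock), two pages that hash to different slots can be looked up without contending with each other.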

2013.11.16

compound page <- see prep_compound_page [PG_head][PG_tail][PG_tail]…[PG_tail] _count=0 _count=0 _count=0

set_page_private(page, 0) <- what is the private field used for??? set_page_private(page, mt) <- for pages managed by the pcp lists, private stores the migratetype. (for buddy pages, the order?)

balance_pgdat kswapd: where classzone_idx is used

See the gfp attributes in include/linux/gfp.h. In get_page_from_freelist: if ((alloc_flags & ALLOC_CPUSET) && !cpuset_zone_allowed_softwall(zone, gfp_mask)) continue;

cpuset (7) HARDWALL, SOFTWALL???

__get_free_pages /* common */ alloc_pages alloc_pages_node __alloc_pages __alloc_pages_nodemask <- 'heart' of the zoned buddy allocator get_page_from_freelist <- finished this function __alloc_pages_slowpath

2013.11.23

__zone_watermark_ok

ALLOC_WMARK_MIN <- 0 ALLOC_WMARK_LOW <- 1 ALLOC_WMARK_HIGH <- 2

Only two bits of alloc_flags are examined. mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
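A minimal user-space sketch of that lookup, assuming the index values 0/1/2 listed above and assuming the first non-index flag sits at 0x04 so that the mask covers exactly two bits. `struct demo_zone`, `demo_select_mark` and the ALLOC_HARDER value are hypothetical names and values used only for illustration.

```c
/* The low two bits of alloc_flags index watermark[]. */
enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

#define ALLOC_WMARK_MIN     WMARK_MIN
#define ALLOC_WMARK_LOW     WMARK_LOW
#define ALLOC_WMARK_HIGH    WMARK_HIGH
#define ALLOC_NO_WATERMARKS 0x04                       /* assumed: first flag above the index bits */
#define ALLOC_WMARK_MASK    (ALLOC_NO_WATERMARKS - 1)  /* 0x03: the two index bits */
#define ALLOC_HARDER        0x10                       /* assumed example of an unrelated flag */

struct demo_zone {
	unsigned long watermark[NR_WMARK];
};

/* Pick the watermark selected by alloc_flags, ignoring all other flag bits. */
static unsigned long demo_select_mark(const struct demo_zone *z, int alloc_flags)
{
	return z->watermark[alloc_flags & ALLOC_WMARK_MASK];
}
```

Masking first means callers can OR in other behaviour flags without disturbing which watermark is chosen.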

======================================================= from Professional Linux Kernel Architecture (p.174)

pages_min, pages_high, and pages_low are "watermarks" used when pages are swapped out. The kernel can write pages to hard disk if insufficient RAM memory is available. These three elements influence the behavior of the swapping daemon. <--- kswapd

o If more than pages_high pages are free, the state of the zone is ideal.

free pages > pages_high - there is enough memory.

o If the number of free pages falls below pages_low, the kernel begins to swap pages out onto the hard disk.

free pages < pages_low - start swapping out to hard disk.

o If the number of free pages falls below pages_min, the pressure to reclaim pages is increased because free pages are urgently needed in the zone. Chapter 18 will discuss various means of the kernel to find relief.

free pages < pages_min - reclaim pages aggressively.

=======================================================
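The three thresholds quoted above can be read as a simple classification of a zone's free page count. The sketch below is an illustration only; the enum, the function name and the "in between" state are hypothetical and are not kernel definitions.

```c
enum demo_zone_state {
	DEMO_ZONE_IDEAL,       /* free pages > pages_high */
	DEMO_ZONE_BETWEEN,     /* between pages_low and pages_high: not covered by the quote */
	DEMO_ZONE_START_SWAP,  /* free pages < pages_low */
	DEMO_ZONE_URGENT       /* free pages < pages_min */
};

/* Classify a zone by comparing its free pages against the three watermarks. */
static enum demo_zone_state demo_classify(unsigned long free_pages,
					  unsigned long pages_min,
					  unsigned long pages_low,
					  unsigned long pages_high)
{
	if (free_pages < pages_min)
		return DEMO_ZONE_URGENT;
	if (free_pages < pages_low)
		return DEMO_ZONE_START_SWAP;
	if (free_pages > pages_high)
		return DEMO_ZONE_IDEAL;
	return DEMO_ZONE_BETWEEN;
}
```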

#define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)

kswapd: DEFINE_WAIT(wait) in kswapd_try_to_sleep

#define DEFINE_WAIT(name) DEFINE_WAIT_FUNC(name, autoremove_wake_function) autoremove_wake_function default_wake_function try_to_wake_up

Where it is woken: wakeup_kswapd wake_up_interruptible(&pgdat->kswapd_wait);

mm/page_alloc.c
#define ALLOC_NO_WATERMARKS	0x01 /* don't check watermarks at all */
#define ALLOC_WMARK_MIN		0x02 /* use pages_min watermark */
#define ALLOC_WMARK_LOW		0x04 /* use pages_low watermark */
#define ALLOC_WMARK_HIGH	0x08 /* use pages_high watermark */
#define ALLOC_HARDER		0x10 /* try to alloc harder */
#define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
#define ALLOC_CPUSET		0x40 /* check for correct cpuset */

10011 (supervisor)

Q. Can ARM take another interrupt while already in interrupt mode?

  • softirq #define softirq_count() (preempt_count() & SOFTIRQ_MASK)

arch/arm/kernel/entry-armv.S #ifdef CONFIG_MULTI_IRQ_HANDLER .globl handle_arch_irq <- the part that reserves the space

arch/arm/kernel/setup.c:setup_arch handle_arch_irq = mdesc->handle_irq;

arch/arm/kernel/smp.c:handle_IPI __inc_irq_stat

#define __inc_irq_stat(cpu, member) __IRQ_STAT(cpu, member)++

#define __IRQ_STAT(cpu, member) (irq_stat[cpu].member)

Where it is checked: #define local_softirq_pending() \
__IRQ_STAT(smp_processor_id(), __softirq_pending)

__local_bh_disable((unsigned long)__builtin_return_address(0), SOFTIRQ_OFFSET); add_preempt_count(cnt);

In autoremove_wake_function -> default_wake_function -> try_to_wake_up, current is passed in via curr->private and that task is woken; shouldn't the task to be woken be passed instead?

In __wake_up_common: list_for_each_entry_safe <- look at __wake_up_common a bit more.

congested BDI … <- (crowded, congested) backing device info http://lwn.net/Articles/326552/

The job of flushing dirty pages has moved over time: bdflush -> pdflush -> per-BDI flusher

pdflush is a kernel thread that writes dirty pages to disk. It is run explicitly through sync(), or implicitly when the page cache has used up its pages, when pages have stayed in memory too long, or when /proc/sys/vm/dirty_ratio is exceeded.


Misleading OOM messages [from] http://barriosstory.blogspot.kr/2009/05/misleading-oom-messages.html

"Failure to reclaim memory" "Unable to satisfy memory allocation request and not making progress reclaiming from other sources."

Occurs when the system's free pages are at or below zone->watermark and reclaim can make no further progress.

2013.11.30

io_schedule_timeout ; why does the io wait last for a fixed timeout value? During this period the task runs as TASK_UNINTERRUPTIBLE.

PREEMPT_ACTIVE <- incremented in __cond_resched and preempt_schedule_irq

struct thread_info, which holds all required processor-specific low-level information about the thread. struct task_struct

<sched.h> struct thread_info { … }

<include/linux/sched.h>
union thread_union {
	struct thread_info thread_info;
	unsigned long stack[THREAD_SIZE/sizeof(long)];
};
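Because thread_info shares its storage with the kernel stack in this union, it can be located from any address inside the stack by rounding down to the THREAD_SIZE boundary. The sketch below illustrates that idea in user space; the 8 KiB THREAD_SIZE, `struct demo_thread_info` and `demo_thread_info_from_sp` are assumptions for the example, and the union must be aligned to THREAD_SIZE for the masking to work.

```c
#include <stdint.h>

#define THREAD_SIZE 8192   /* assumed: 8 KiB kernel stack, must be a power of two */

/* Hypothetical stand-in for struct thread_info. */
struct demo_thread_info {
	unsigned long flags;
	int cpu;
};

/* thread_info sits at the lowest address of the stack area. */
union demo_thread_union {
	struct demo_thread_info thread_info;
	unsigned long stack[THREAD_SIZE / sizeof(long)];
};

/* Round a stack address down to the THREAD_SIZE boundary to find thread_info. */
static struct demo_thread_info *demo_thread_info_from_sp(uintptr_t sp)
{
	return (struct demo_thread_info *)(sp & ~((uintptr_t)THREAD_SIZE - 1));
}
```

The stack grows down from the top of the union toward thread_info, which is why a stack overflow corrupts thread_info first.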

2013.12.07

PF_MEMALLOC <- the task is currently allocating memory.

from 'Professional Linux Kernel Architecture'

It is invoked after the PF_MEMALLOC flag has been set for the task to indicate to the remaining kernel code that all subsequent memory allocations are needed in the search for memory.

In __perform_reclaim:

current->flags |= PF_MEMALLOC;		// set in the task's flags (Process Flag); declared in include/linux/sched.h.

try_to_free_pages();

current->flags &= ~PF_MEMALLOC;

In __alloc_pages_slowpath:

if (current->flags & PF_MEMALLOC)	// if reclaim is already running, go to nopage to prevent a recursive call.
	goto nopage;

It may be necessary for try_to_free_pages to allocate new memory for its own work. As this additional memory is needed to obtain fresh memory

In __alloc_pages_slowpath, goto nopage when PF_MEMALLOC is set. According to the comment, this prevents direct reclaim from recursing: "Avoid recursion of direct reclaim"

In zone_reclaim, when PF_MEMALLOC is set, no scan is performed and ZONE_RECLAIM_NOSCAN is returned.
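The recursion guard described above can be modelled in a few lines of user-space C. All `demo_` names are hypothetical, and the nested allocation inside the reclaim step is an assumption added to show why the flag is needed; this is a sketch of the pattern, not the kernel code.

```c
#define PF_MEMALLOC 0x00000800   /* flag bit used for illustration */

struct demo_task {
	unsigned int flags;
};

static int demo_reclaim_runs;    /* counts how many reclaim passes actually ran */

static int demo_alloc_slowpath(struct demo_task *cur);

/* Models try_to_free_pages(): reclaim itself may need to allocate memory. */
static void demo_try_to_free_pages(struct demo_task *cur)
{
	demo_reclaim_runs++;
	(void)demo_alloc_slowpath(cur);   /* nested allocation made during reclaim */
}

/* Models __perform_reclaim(): mark the task for the duration of reclaim. */
static void demo_perform_reclaim(struct demo_task *cur)
{
	cur->flags |= PF_MEMALLOC;
	demo_try_to_free_pages(cur);
	cur->flags &= ~PF_MEMALLOC;
}

/* Models the check in __alloc_pages_slowpath(): returns 0 for "nopage". */
static int demo_alloc_slowpath(struct demo_task *cur)
{
	if (cur->flags & PF_MEMALLOC)
		return 0;                 /* already reclaiming: avoid recursion */
	demo_perform_reclaim(cur);
	return 1;
}
```

Without the flag check, the nested allocation would start another reclaim pass, which could allocate again, and so on without bound.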

/sys/fs/cgroup/cpuset

Frequency meter kernel/cpuset.c

classzone_idx <- in the end it is a zone index, but what does classzone_idx mean? highest

  • wakeup_kswapd
  • kswapd itself also has a classzone_idx: balanced_classzone_idx

https://lkml.org/lkml/2010/12/10/180 (the lowmem_reserve[] selection quoted below is implemented in __zone_watermark_ok)

When kswapd is woken up for a high-order allocation, it takes account of the highest usable zone by the caller (the classzone idx). During allocation, this index is used to select the lowmem_reserve[] that should be applied to the watermark calculation in zone_watermark_ok().

When balancing a node, kswapd considers the highest unbalanced zone to be the classzone index. This will always be at least be the callers classzone_idx and can be higher. However, sleeping_prematurely() always considers the lowest zone (e.g. ZONE_DMA) to be the classzone index. This means that sleeping_prematurely() can consider a zone to be balanced that is unusable by the allocation request that originally woke kswapd. This patch changes sleeping_prematurely() to use a classzone_idx matching the value it used in balance_pgdat().

preferred_zone is filled in by __alloc_pages_nodemask. In get_page_from_freelist: classzone_idx = zone_idx(preferred_zone);

/*
 * This struct contains information about a zone in a zonelist. It is stored
 * here to avoid dereferences into large structures and lookups of tables
 */
struct zoneref {
	struct zone *zone;	/* Pointer to actual zone */
	int zone_idx;		/* zone_idx(zoneref->zone) */
};

/*
 * One allocation request operates on a zonelist. A zonelist
 * is a list of zones, the first one is the 'goal' of the
 * allocation, the other zones are fallback zones, in decreasing
 * priority. …
 */
struct zonelist {
	struct zonelist_cache *zlcache_ptr;	// NULL or &zlcache
	/* 20130629
	 * declares an array of zoneref structs
	 * 20130907
	 * MAX_ZONES_PER_ZONELIST : total number of NODEs * number of zone types
	 */
	struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1];
};

/* create zonelists for each node's pgdat … */ build_all_zonelists() __build_all_zonelists set_zonelist_order() <-- ZONELIST_ORDER_ZONE, because this is UMA. build_zonelists() j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES - 1); (adds the zones set up in pgdat to the zonelist's _zonerefs.) zoneref_set_zone(zone, &zonelist->_zonerefs[nr_zones++])

build_zonelists() local_node = pgdat->node_id;

j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES - 1); then build_zonelists_node for local_node + 1 up to MAX_NUMNODES, then build_zonelists_node for 0 up to local_node

pgdat->classzone_idx

node_zones <- this node's own list of zones; node_zonelists <- zonelists covering all nodes

Each zone has its own watermarks. pfmemalloc_watermark_ok <- pfmemalloc also checks the watermarks.

#define min_wmark_pages(z) (z->watermark[WMARK_MIN])
#define low_wmark_pages(z) (z->watermark[WMARK_LOW])
#define high_wmark_pages(z) (z->watermark[WMARK_HIGH])

  • What about vm_event_item? When can it be checked, and what kind of background data can it serve as?

20131214

#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)

zone_pgdat ; pointer to the pgdat of the node the zone belongs to, set in free_area_init_core
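The zone_idx macro is plain pointer arithmetic: a zone's index is its distance from the start of its node's node_zones array, reached through the zone_pgdat back-pointer. A small user-space sketch follows; the `demo_` types, the MAX_NR_ZONES value and `demo_init_pgdat` are hypothetical and used only for illustration.

```c
#define MAX_NR_ZONES 3   /* assumed for the example */

struct demo_pgdat;

struct demo_zone {
	struct demo_pgdat *zone_pgdat;   /* back-pointer to the owning node */
};

struct demo_pgdat {
	struct demo_zone node_zones[MAX_NR_ZONES];
};

/* Same shape as the macro in the notes: index = element pointer - array base. */
#define demo_zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)

/* Set the back-pointers, as free_area_init_core is noted to do. */
static void demo_init_pgdat(struct demo_pgdat *pgdat)
{
	for (int i = 0; i < MAX_NR_ZONES; i++)
		pgdat->node_zones[i].zone_pgdat = pgdat;
}
```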

add_wait_queue __add_wait_queue prepare_to_wait __add_wait_queue

shrinker?

http://lwn.net/Articles/414149/ https://www.google.co.kr/#newwindow=1&q=drivers+staging http://lwn.net/Articles/324279/

http://kernelnewbies.org/ http://linux-mm.org/ http://lwn.net/ http://www.tldp.org/LDP/tlk/tlk-toc.html

20131221

spin_lock_irq - disables local irqs but does not need to save the state. For cases where it is not used from an interrupt routine? spin_lock_irqsave - saves the previous state.

local_irq_disable

2014.01.04

__perform_reclaim try_to_free_pages do_try_to_free_pages shrink_zones <- performed, among all zones, on the zones at or below the one selected by gfp_mask shrink_zone lruvec = mem_cgroup_zone_lruvec shrink_lruvec(lruvec, sc) shrink_list …

struct page: _count <- refcount; _mapcount <- count of ptes mapped in mms, to show when page is mapped & limit reverse map searches

__pagevec_lru_add <- moves the pages in pvec onto the zone's per-type lru lists. pagevec_lru_move_fn <- page->_count is 1 when the page is created and +1 on get_page. This function always calls release_pages. release_pages decrements the usage count and, if the result is 0, removes the page from the lru, i.e. when the page is no longer referenced and only remains on the lru. But then isn't it released on every lru_move?

The reset value of _mapcount is -1; where is it incremented each time the page is mapped into page tables?

page_add_file_rmap <- _mapcount page_add_new_anon_rmap

  • reverse mapping : see Professional Linux Kernel Architecture 4.8

If MEMCG is not used, there is one lruvec per zone.
struct zone {				// initialized in free_area_init_core
	struct lruvec lruvec;
};

struct lruvec {				// a vector of lru lists
	struct list_head lists[NR_LRU_LISTS];
	struct zone_reclaim_stat reclaim_stat;	// recent_rotated, recent_scanned:
						// how many pages went active->inactive and
						// how many were read during reclaim
#ifdef CONFIG_MEMCG
	struct zone *zone;
#endif
};

In the past, what is now lruvec seems to have been included directly in struct zone.

This can be seen in Memory Zones in Ch.3 of Professional Linux Kernel Architecture.

========================================================================================

2014.01.11

Pages are taken one at a time from the tail of the lru list (starting with those least likely to be in the cache). Once the head is reached, no further prefetch is done.

shrink_active_list | shrink_inactive_list
	isolate_lru_pages
		prefetchw_prev_lru_page	// pull into the data cache, starting from the tail of the lru.
		__isolate_lru_page	// detach pages starting from the tail of the lru (this operation is the scan)
	// the detached pages are examined; page_referenced pages are counted as rotated.
	// of those, file cache pages of executables are stored on l_active
	// the remaining pages are stored on l_inactive
	move_active_pages_to_lru(l_active)	// add the pages on l_active, l_inactive to the lruvec's active, inactive lru lists (at the front).
	move_active_pages_to_lru(l_inactive)	// inside the function the page's _count is decremented; if it reaches 0 the page goes on l_hold.
	free_hot_cold_page_list	// free the pages registered on l_hold as cold pages.

What does it mean for a page to be active or inactive?

scan_control is set up in try_to_free_pages. -> isolate_mode is decided from these attributes. -> isolate_mode is the directive attribute for __isolate_lru_page.

page->mapping struct address_space <- what is this? struct address_space swapper_space

get_page is what increments the _count member, which is the page's refcount.

After isolate_lru_pages, the number of pages scanned up to the isolation is stored in zone->pages_scanned /* accumulated number of pages scanned since the last reclaim */; the number of isolated pages is stored in reclaim_stat->recent_scanned; /* the other field is recent_rotated */

  • study cond_resched
  • struct vm_area_struct

mlocked_vma_newpage -> if the vma is locked, reflect that in the page's flags.

putback_lru_page
	if the page is evictable
		lru_cache_add_lru (puts it on the active lru)

enum type of the page flags.
	functions for the PG_active flag : PageActive, SetPageActive, ClearPageActive
	PG_referenced

rmap : reverse map <- to be analyzed next week!!!

========================================================================================

2014.01.18

o Is there a way to generate UML or doxygen output from the current config and follow the code with it? I would like to see the overall structure at a glance, the way ddd shows it; isn't there a diagram?

mem_map : global variable marking the start of the struct page array

o allocated in alloc_node_mem_map o #define page_to_pfn __page_to_pfn #define __pfn_to_page(pfn) (mem_map + ((pfn) - ARCH_PFN_OFFSET)) Works out how far a given pfn lies from the first pfn of physical memory and uses that offset from mem_map to find the pfn's struct page.
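The pfn <-> struct page conversion for a flat mem_map is plain array indexing, which the following user-space sketch makes concrete. The `demo_` names, the ARCH_PFN_OFFSET value (RAM assumed to start at 0x60000000 with 4 KiB pages) and the 16-entry array are assumptions for illustration only.

```c
#define ARCH_PFN_OFFSET 0x60000UL   /* assumed: pfn of the first page of RAM */
#define DEMO_NR_PAGES   16

struct demo_page {
	unsigned long flags;
};

static struct demo_page demo_mem_map_storage[DEMO_NR_PAGES];
static struct demo_page *mem_map = demo_mem_map_storage;   /* start of the page array */

/* pfn -> struct page: offset from the first pfn, used as an index into mem_map. */
#define demo_pfn_to_page(pfn)  (mem_map + ((pfn) - ARCH_PFN_OFFSET))

/* struct page -> pfn: the inverse, index in mem_map plus the first pfn. */
#define demo_page_to_pfn(page) ((unsigned long)((page) - mem_map) + ARCH_PFN_OFFSET)
```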

Understanding the Linux Kernel : 3. Processes

main fields of struct page

  • flags Array of flags (see Table 8-2). Also encodes the zone number to which the page frame belongs.
  • _count Page frame's reference counter.
  • _mapcount Number of Page Table entries that refer to the page frame (-1 if none)
  • private Available to the kernel component that is using the page. If the page is free, this field is used by the buddy system.
  • mapping "Reverse Mapping for Anonymous Pages" in Chapter 17
  • index Used by several kernel components with different meanings. For instance, it identifies the position of the data stored in the page frame within the page's disk image or within an anonymous region (Chapter 15), or it stores a swapped-out page identifier ( Chapter 17).
  • lru Contains pointers to the least recently used doubly linked list of pages.

Professional Linux Kernel Architecture : 18.6 Page Reclaim

Figure 18-11 "Big picture"

Direct page reclaim : invoked when an allocation fails because memory is short. Swap Daemons : one kswapd runs per NUMA node.

When vexpress is emulated with qemu, there is one zone and one kswapd running.

cat /proc/zoneinfo

Node 0, zone Normal โ€ฆ

ps -ef | grep kswapd

302 0 0:00 [kswapd0]

The function both paths call is shrink_zone. shrink_active_list shrink_inactive_list - shrink_page_list

========================================================================================

2014.01.25

slab/slub/slob

  • What is slub? A memory allocator derived from the slab allocator. Linux's main memory managers are buddy and slab. buddy is used to manage memory in units of a page or larger, and slab is used to manage memory smaller than that.

  • When is it used?

  • What are the main APIs?

  • How are allocation and freeing done?

  • How is performance measured, and what are the tuning points?

    cat /proc/slabinfo cat /sys/kernel/slab/

First, the kernel source currently being analyzed has two files with slub in the name.
mm/slub.c
 * SLUB: A slab allocator that limits cache line use instead of queuing
 * objects in per cpu and per node lists.
 *
 * The allocator synchronizes using per slab locks or atomic operations
 * and only uses a centralized lock to manage a pool of partial slabs.

include/linux/slub_def.h * SLUB : A Slab allocator without object queues.

free_area_init_node free_area_init_core memmap_init == memmap_init_zone init_page_count(page); // _count : 1 reset_page_mapcount(page); // _mapcount : -1

__free_pages_bootmem: every page's _count is 0; only the first page's _count is set to 1 before calling __free_pages. __free_pages(page, order);

__free_pages
	if (put_page_testzero(page)) {
		if (order == 0)
			free_hot_cold_page(page, 0);	// add to the per-cpu list
		else
			__free_pages_ok(page, order);	// add to buddy
	}
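The decision made in __free_pages can be sketched in user space as below. The `demo_` names and the returned enum are hypothetical, and the counter here is a plain int, whereas the kernel uses an atomic decrement-and-test; this only illustrates the control flow.

```c
#include <stdbool.h>

struct demo_page {
	int _count;                       /* reference count (non-atomic in this sketch) */
};

enum demo_free_path { DEMO_FREE_NONE, DEMO_FREE_PERCPU, DEMO_FREE_BUDDY };

/* Drop one reference; true when the count reached zero. */
static bool demo_put_page_testzero(struct demo_page *page)
{
	return --page->_count == 0;
}

/* Order-0 pages go to the per-cpu list, larger blocks to the buddy lists. */
static enum demo_free_path demo_free_pages(struct demo_page *page, unsigned int order)
{
	if (!demo_put_page_testzero(page))
		return DEMO_FREE_NONE;    /* still referenced elsewhere */
	return order == 0 ? DEMO_FREE_PERCPU : DEMO_FREE_BUDDY;
}
```

This is why __free_pages_bootmem sets the first page's _count to 1 beforehand: the single decrement then reaches zero and the block is actually freed.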

__free_pages_ok free_one_page __free_one_page

get_page atomic_inc(&page->_count); put_page if (unlikely(PageCompound(page))) else if (put_page_testzero(page)) __put_single_page(page);

__put_single_page __page_cache_release(page); free_hot_cold_page(page, 0);

struct page flags: the upper bits record zone information. set_page_section set_page_zone set_page_node

page_zone: used to look up the zone data structure from a page

wmark_min : reserved pages wmark_low : wmark_high :

alloc_pages returns the address of the page descriptor struct. __get_free_pages returns a linear address, so it cannot allocate from the high memory area, which is not permanently mapped to linear addresses.

Action Modifier Zone Modifier Type Flags

==================================================== HIGHMEM mapping

Functions that map pages of the high memory area into virtual memory.

kmap // sleepable. cannot be used in an interrupt routine. // mapped between PKMAP_BASE and PAGE_OFFSET-1 kmap_high map_new_virtual // once a place to map is found, pkmap_count = 1 // set_pte_at pkmap_count++

kmap_high_get			// if the page is already mapped, return the mapped VA. pkmap_count++

kmap_atomic // non-sleepable. can be used in an interrupt routine // mapped between fff00000 and fffdffff // set_top_pte

pkmap_count: 0 = nobody is mapping it; 1 = mapped and then released, but the TLB has not been flushed yet, so it cannot be reused for now; 2>

page_address_htable: tracks the association between page frames and permanent page mappings

P.339: memory area management is next

========================================================================================

2014.02.08

data structure for a cache: kmem_cache_t

struct kmem_cache {
	struct kmem_cache_cpu __percpu *cpu_slab;
	…
	struct kmem_cache_node *node[MAX_NUMNODES];
};

The "mosquito coil" book p.390, Understanding the Linux Kernel (Korean edition) p.342 kmem_cache_init kmem_cache_open ("kmem_cache_node") kmem_cache_open ("kmem_cache")

kmem_cache_alloc
create_kmalloc_cache("kmalloc-96")
create_kmalloc_cache("kmalloc-192")
create_kmalloc_cache("kmalloc")		// powers of 2

/* called after the kmem_cache slab has been initialized */ kmem_cache_create __kmem_cache_create kmem_cache_destroy

kmem_cache_open is the operation that bootstraps the kmem_cache mechanism. It is called from kmem_cache_init and create_kmalloc_cache. During initialization, kmem_cache_init creates kmem_cache_node and kmem_cache.

kmem_cache_create: creates a new kmem_cache at runtime

create_kmalloc_cache kmem_cache_alloc kmem_cache_open

oo_make <- stores the order in the upper 16 bits and the number of objects in the lower 16 bits
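The packing described for oo_make can be sketched as follows. This version takes the object count directly as a parameter, which is a simplification assumed for the example, and all `demo_` names are hypothetical.

```c
#define OO_SHIFT 16
#define OO_MASK  ((1UL << OO_SHIFT) - 1)

/* One word holding both values: order in the upper bits, objects in the lower 16. */
struct demo_order_objects {
	unsigned long x;
};

static struct demo_order_objects demo_oo_make(unsigned int order, unsigned int objects)
{
	struct demo_order_objects oo = { ((unsigned long)order << OO_SHIFT) + objects };

	return oo;
}

/* Extract the page order (upper bits). */
static unsigned int demo_oo_order(struct demo_order_objects oo)
{
	return (unsigned int)(oo.x >> OO_SHIFT);
}

/* Extract the number of objects per slab (lower 16 bits). */
static unsigned int demo_oo_objects(struct demo_order_objects oo)
{
	return (unsigned int)(oo.x & OO_MASK);
}
```

Keeping both values in one word lets them be read or replaced together in a single store, so a reader never sees an order from one configuration paired with an object count from another.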

slab_destroy kmem_freepages free_pages

Interfaces: kmem_cache_alloc slab_alloc kmem_cache_free slab_free

Why virt_to_head_page is used: it returns the struct page * that contains a given object, and that page represents the slab.

kmem_getpages alloc_pages

struct kmem_cache {
	struct kmem_cache_cpu __percpu *cpu_slab;
	…
	int cpu_partial;	/* Number of per cpu partial objects to keep around */
};

struct kmem_cache_cpu {
	void **freelist;	/* Pointer to next available object */
	unsigned long tid;	/* Globally unique transaction id */
	struct page *page;	/* The slab from which we are allocating */
	struct page *partial;	/* Partially allocated frozen slabs */
};

struct kmem_cache_node { };

kmem_cache_alloc __kmalloc
	slab_alloc		// slab_alloc is also called from __kmalloc_node, __kmalloc, etc.
		c->freelist	// fast path: return the object that kmem_cache_cpu's freelist points to
		__slab_alloc	// when the freelist has no object (NULL) or the node information does not match. local_irq_save.
			new_slab_objects
				get_partial	// if this fails, new_slab is called
				new_slab

	early_kmem_cache_node_alloc	// called directly before the slub allocator is initialized
		new_slab		// returns struct page *.
			allocate_slab	// allocates a slab.
				alloc_slab_page	// allocates pages from the page allocator
			inc_slabs_node

enum slab_state { // state of the slab allocator DOWN, … };

kmemcheck https://www.kernel.org/doc/Documentation/kmemcheck.txt

CONFIG_SLUB_STATS

// virtual address of the memory the page refers to
start = page_address(page);
last = start;

for_each_object(p, s, start, page->objects) {
	setup_object(s, page, last);
	set_freepointer(s, last, p);
	last = p;
}

What new_slab does - sets the freepointer of each object.

- page->freelist = start;	// set to the location of the first free object
- page->inuse = page->objects;	// inuse is the number of objects in use
- page->frozen = 1;		// frozen at first
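The loop above threads a free list through the objects themselves: each free object stores the address of the next one. Below is a user-space sketch of the same loop shape. The object size, object count, the free-pointer offset of 0 and every `demo_` name are assumptions for illustration, and the setup_object step is left out.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_OBJ_SIZE 32   /* assumed object size, including room for the free pointer */
#define DEMO_OBJECTS  4
#define DEMO_OFFSET   0    /* assumed: free pointer stored at the start of each object */

static unsigned char demo_slab[DEMO_OBJ_SIZE * DEMO_OBJECTS];

/* Store the address of the next free object inside `object`. */
static void demo_set_freepointer(void *object, void *next)
{
	memcpy((unsigned char *)object + DEMO_OFFSET, &next, sizeof(next));
}

/* Read back the next-free pointer stored inside `object`. */
static void *demo_get_freepointer(void *object)
{
	void *next;

	memcpy(&next, (unsigned char *)object + DEMO_OFFSET, sizeof(next));
	return next;
}

/* Chain every object to the one after it; the last one terminates the list. */
static void *demo_init_freelist(void)
{
	unsigned char *start = demo_slab;
	unsigned char *last = start;

	for (unsigned char *p = start; p < start + sizeof(demo_slab); p += DEMO_OBJ_SIZE) {
		demo_set_freepointer(last, p);   /* first pass points start at itself; fixed next pass */
		last = p;
	}
	demo_set_freepointer(last, NULL);
	return start;                            /* head of the free list */
}
```

Allocation from such a list is then just "take the head and follow its stored pointer", which is what makes the c->freelist fast path cheap.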

slab object offset : the offset equals the size of the metadata, and the freepointer is located after it.

What is kmem_cache_node meant to manage? - It has interface functions such as the following. get_node

The last member of struct kmem_cache is the following. struct kmem_cache_node *node[MAX_NUMNODES]; It holds as many kmem_cache_node pointers as there are nodes.

early_kmem_cache_node_alloc

kmem_cache kmem_cache_cpu (its own data structure) kmem_cache_node

The functions called at runtime are kmalloc_node __kmalloc_node

get_node ; takes a kmem_cache and returns the kmem_cache_node for the given node

========================================================================================

20140301

Checking the values from the previously analyzed code with ddd

pcpu_build_alloc_info alloc_size : 32KB ai->unit_size = alloc_size / upa;

When pcpu_embed_first_chunk is called, PAGE_SIZE is used as atom_size.

pcpu_size_to_slot size : 16 pcpu_unit_size : 32768

What is the slot concept? Why the size… slot:2 pcpu_nr_slots:15

struct pcpu_chunk { struct list_head list; /* linked to pcpu_slot lists */ … };

pcpu_setup_first_chunk sets up the global variables.

static int pcpu_unit_pages __read_mostly;
static int pcpu_unit_size __read_mostly;
static int pcpu_nr_units __read_mostly;
static int pcpu_atom_size __read_mostly;
static int pcpu_nr_slots __read_mostly;
static size_t pcpu_chunk_struct_size __read_mostly;

PCPU_SLOT_BASE_SHIFT: 32 units form one slot

pcpu_slot ; an array of list heads. pcpu_setup_first_chunk allocates as many list_heads as there are slots

pcpu_size_to_slot returns the slot index for a chunk's free_size.
Each time an allocation or free happens, pcpu_chunk_relocate is called and, if needed, updates the slot the chunk is linked on.
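The size-to-slot mapping can be sketched as below. It follows the formula as I understand it from mm/percpu.c (highest set bit of the size, shifted by PCPU_SLOT_BASE_SHIFT, with the last slot reserved for a completely free chunk). Treat that formula as an assumption; the `demo_` names are hypothetical. It does reproduce the values observed with ddd above: size 16 gives slot 2, and a unit size of 32768 gives 15 slots.

```c
#define PCPU_SLOT_BASE_SHIFT 5

static int demo_unit_size = 32768;   /* pcpu_unit_size value observed in the notes */

/* Position of the highest set bit, 1-based; 0 for x == 0. */
static int demo_fls(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Map a free size in bytes to a slot index, never lower than 1. */
static int demo_raw_size_to_slot(int size)
{
	int slot = demo_fls((unsigned int)size) - PCPU_SLOT_BASE_SHIFT + 2;

	return slot > 1 ? slot : 1;
}

/* Number of slots: enough for the unit size, plus extra slots at the end. */
static int demo_nr_slots(void)
{
	return demo_raw_size_to_slot(demo_unit_size) + 2;
}

/* A chunk whose whole unit is free is kept in the last slot. */
static int demo_size_to_slot(int size)
{
	if (size == demo_unit_size)
		return demo_nr_slots() - 1;
	return demo_raw_size_to_slot(size);
}
```

Since the slot grows with the logarithm of the free size, an allocation of a given size can start searching at its own slot and skip chunks that are too full to satisfy it.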

 [*|*][*|*][*|*][*|*][*|*] . . . [*|*]

chunk | {[|]} | {[|]} | {[|]} ..

========================================================================================

20140308

Current analysis position: kmem_cache_init kmem_cache_open … init_kmem_cache_nodes alloc_kmem_cache_cpus __alloc_percpu // <- falling into the world of percpu

__alloc_percpu pcpu_alloc mutex_lock(&pcpu_alloc_mutex) spin_lock_irqsave(&pcpu_lock, flags)

pcpu_populate_chunk pcpu_alloc_pages ; allocate pages pcpu_map_pages ; map the pages to the vmalloc address range __pcpu_map_pages map_kernel_range_noflush vmap_page_range_noflush

- vmalloc is to be analyzed after the percpu analysis -

vmap_page_range_noflush vmap_pud_range vmap_pmd_range

__addr_to_pcpu_ptr(chunk->base_addr + off)

pcpu_first_chunk ; 0x810ba100 chunk->base_addr ; 0x8109b000

pcpu_base_addr : the lowest address allocated to pcpu in the first chunk (8109b000)

80844000 __per_cpu_load 80844000 __per_cpu_start (.data..percpu) 80844000 cpu_loops_per_jiffy

chunk nr_unit

pcpu_alloc_area: percpu allocation

========================================================================================

20140315

_this_cpu_generic_cmpxchg_double: cmpxchg over two variables, the transaction id and the other value to compare. if (pcpu1 == oval1 && pcpu2 == oval2) pcpu1 = nval1, pcpu2 = nval2;

Preemption is enabled and interrupts are disabled (no spinlock is used)
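The double compare-and-exchange noted above updates two values only if both still hold their expected old values. The following is a non-atomic user-space model of that logic; `demo_cmpxchg_double` and its parameter layout are hypothetical, and the interrupt disabling that the notes mention as the source of atomicity is not reproduced here.

```c
#include <stdbool.h>

/*
 * Non-atomic model: succeed and store both new values only when both
 * current values match the expected old ones; otherwise change nothing.
 */
static bool demo_cmpxchg_double(void **p1, unsigned long *p2,
				void *oval1, unsigned long oval2,
				void *nval1, unsigned long nval2)
{
	if (*p1 == oval1 && *p2 == oval2) {
		*p1 = nval1;
		*p2 = nval2;
		return true;
	}
	return false;
}
```

Pairing the freelist pointer with a transaction id means an update fails if anything touched the per-cpu state in between, even when the freelist pointer happens to have the same value again.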

kmem_cache_open is a handmade function. It sets up the body of struct kmem_cache, initializes via init_kmem_cache_nodes, and initializes via alloc_kmem_cache_cpus. free_kmem_cache_nodes calls kmem_cache_free once per node and resets the variables to NULL.

+---------------+
|       *       | --- struct kmem_cache_cpu
+---------------+
|               |
|               |
|               |
|               |
|               |
|               |
|               |
|               |
+---------------+
| * | * | * | * | --- each is a struct kmem_cache_node
+---------------+

In init_kmem_cache_nodes, when slab_state is DOWN,
	the node is obtained through early_kmem_cache_node_alloc.



virt_to_head_page
	: returns the slab that the object belongs to.
	: a slab instance is represented by a struct page *.
new_slab
slab_alloc
slab_free

kmem_cache_init

struct kmem_cache itself is also allocated through the slub allocator.
However, the slub allocator cannot be used yet, so it is initialized through kmem_cache_open.

/* kmem_cache_open for slab_state == DOWN. */
kmem_cache = (void *)__get_free_pages(GFP_NOWAIT, order);	// allocate memory from buddy

kmem_cache_node = (void *)kmem_cache + kmalloc_size;

kmem_cache_open(kmem_cache_node, "kmem_cache_node",		// kmem_cache_open initializes the struct kmem_cache that manages kmem_cache_node
    sizeof(struct kmem_cache_node),
    0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);

slab_state = PARTIAL;

temp_kmem_cache = kmem_cache;
kmem_cache_open(kmem_cache, "kmem_cache", kmem_size,
    0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
kmem_cache = kmem_cache_alloc(kmem_cache, GFP_NOWAIT);		// create a slab instance
memcpy(kmem_cache, temp_kmem_cache, kmem_size);

temp_kmem_cache_node = kmem_cache_node;
kmem_cache_node = kmem_cache_alloc(kmem_cache, GFP_NOWAIT);
memcpy(kmem_cache_node, temp_kmem_cache_node, kmem_size);




kmem_cache_create	; creates a new struct kmem_cache
kmem_cache_alloc	; allocates one kmem_cache object from slub

	"kmem_cache_node"
	"kmem_cache"

kmem_cache_free

========================================================================================

20140322

kmem_cache_create __kmem_cache_create s = kmalloc kmem_cache_open (s) sysfs_slab_add // <- /sys/kernel/slab

pcpu_create_chunk pcpu_alloc_chunk

========================================================================================

20140329

linux/mm/vmalloc.c - Support of BIGMEM added by Gerhard Wichert - SMP-safe vmalloc/vfree/ioremap - Major rework to support vmap/vunmap - Numa awareness

/*** Page table manipulation functions ***/

vmalloc/vmap/ioremap o Documentation/arm/memory.txt VMALLOC_START VMALLOC_END-1

o Professional Linux Kernel Architecture
	3.5.7 Allocation of Discontiguous Pages in the Kernel

o Understanding the Linux Kernel
	8.3. Noncontiguous Memory Area Management

vmalloc related - barrios's summary document https://app.box.com/shared/bss1x8cq5x

========================================================================================

20140405

  • the part that maps memblock's physical memory

setup_arch paging_init map_lowmem create_mapping

vm_map_ram
	vb_alloc or alloc_vmap_area
	vmap_page_range
		if fail, vm_unmap_ram

vm_unmap_ram
	vb_free
		vunmap_page_range

========================================================================================

20140412

new_vmap_block

vmap_block : manages vmap_area in units of blocks. vmap_block_queue : per_cpu_var.

========================================================================================

20140419

purge_fragmented_blocks: summary of vmap/vunmap

mm_init … vmalloc_init(); // creates vmap_areas for the entries registered on vmlist and registers them

struct vmap_area

__vmalloc __vmalloc_node __vmalloc_area_node

vmap __vmalloc_area_node map_vm_area (common to vmap and vmalloc) vmap_page_range vmap_page_range_noflush (common)

pcpu_alloc pcpu_populate_chunk pcpu_alloc_pages pcpu_map_pages __pcpu_map_pages map_kernel_range_noflush vmap_page_range_noflush (common)

vmalloc __vmalloc_node_flags __vmalloc_node __vmalloc_node_range __get_vm_area_node

__get_vm_area_node: walks vmlist looking for a suitable entry.

free_unmap_vmap_area_noflush unmap_vmap_area vunmap_page_range free_vmap_area_noflush if (vmap_lazy_nr > lazy_max_pages()) try_purge_vmap_area_lazy

try_purge_vmap_area_lazy __purge_vmap_area_lazy purge_fragmented_blocks_allcpus purge_fragmented_blocks

free_vmap_block

new_vmap_block

vmap_block free, dirty alloc_map dirty_map

========================================================================================

20140426

HZ	- number of ticks generated per second
jiffies -

sched_class : callback function (handler) interface
	kernel/sched/sched.h ์— ์ •์˜

Under kernel/sched, each scheduler has its own set of functions.
	Specific schedulers are not left out depending on CONFIG.
	obj-y += core.o clock.o idle_task.o fair.o rt.o stop_task.o

runqueue's root domain
	what is its relationship to sched_class?


root domain




NO_HZ : a way to reduce scheduling-clock ticks
https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt

CONFIG_NO_HZ=y		; the common case. Suppresses scheduling ticks on idle CPUs.
CONFIG_NO_HZ=n		; the vexpress case. Scheduler ticks are never suppressed.
CONFIG_NO_HZ_FULL	; usable for rt. Suppresses ticks when the CPU is idle or has only one runnable task.


Early in start_kernel…
tick_init();		; registers the clockevent notifier.
boot_cpu_init();	; registers the boot cpu in the cpu masks.



SCHED_SOFTIRQ
	kinds of bottom halves : softirq, tasklet, workqueue


With CONFIG_HOTPLUG, these are stored in sections such as cpuinit/cpuinitdata.

	Where are the notifier blocks registered with register_cpu_notifier called?


fair.c

========================================================================================

20140510

struct mm_struct
	struct vm_area_struct *mmap;
struct vmap_area
	va_start, va_end ; holds the address range, plus the hooks (rb_node, list) for linking it into an rb-tree and a list.
struct vm_struct
	Structure describing an address range mapped via vmalloc, ioremap, pcpu_create_chunk, etc.
	Has struct page **pages and vaddr as members.

struct vm_area_struct



Documentation/filesystems/sysfs.txt
fs/sysfs/symlink.c
For example, look at the kmem_cache entry under /sys/kernel/slab



/* Walk a vmap address to the struct page it maps. */
struct page *vmalloc_to_page(const void *vmalloc_addr)
	; point of curiosity: pcpu_addr_to_page calls this function.

pcpu_map_pages
pcpu_set_page_chunk(pages[pcpu_page_idx(cpu, i)], chunk)

The first chunk is created in a fixed address space, but the other chunks are created dynamically.
For each mapped page, the address of the chunk structure is written into the page's index.

__pcpu_ptr_to_addr(ptr) <- still needs to be understood.


Study slub's slowpath (both alloc and free).
pcpu_create_chunk analysis.
	1. allocate and initialize the chunk structure
	2. allocate the data space the chunk will use (vm_struct)
	3. the chunk's address


When a pcpu ptr is freed, only the map information is updated;
the actual memory reclaim, pcpu_reclaim, is queued on a workqueue and called from there.

free_partial needs analysis

slow-path
	kmem_cache_alloc etc. -> slab_alloc -> __slab_alloc
	kmem_cache_free etc. -> slab_free -> __slab_free

free_slab -> __free_slab


RCU
rcutree.c:call_rcu_sched (= call_rcu)


Address translation

static __always_inline void *lowmem_page_address(const struct page *page)
{
	return __va(PFN_PHYS(page_to_pfn(page)));
}
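The chain in lowmem_page_address (page -> pfn -> physical -> virtual) can be modeled in user space. This is only a sketch: it assumes a flat mem_map, 4 KiB pages, and made-up PHYS_OFFSET / PAGE_OFFSET values; va() and pfn_phys() are stand-ins for the kernel's __va() and PFN_PHYS().

```c
#include <stdint.h>

/* Illustrative constants (assumptions); real values depend on platform and config. */
#define PAGE_SHIFT   12
#define PHYS_OFFSET  0x60000000u	/* assumed start of RAM */
#define PAGE_OFFSET  0xC0000000u	/* assumed start of the kernel linear map */
#define NR_PAGES     16

struct page { unsigned long flags; };	/* stand-in for the kernel's struct page */
static struct page mem_map[NR_PAGES];	/* flat model: one descriptor per page frame */

/* Index in mem_map plus the pfn of the first RAM page. */
static uint32_t page_to_pfn(const struct page *page)
{
	return (uint32_t)(page - mem_map) + (PHYS_OFFSET >> PAGE_SHIFT);
}

/* Page frame number -> physical address. */
static uint32_t pfn_phys(uint32_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Physical -> virtual for the linear (lowmem) mapping. */
static uint32_t va(uint32_t phys)
{
	return phys - PHYS_OFFSET + PAGE_OFFSET;
}

static uint32_t lowmem_page_address(const struct page *page)
{
	return va(pfn_phys(page_to_pfn(page)));
}
```

With these assumed offsets, the descriptor at mem_map[3] maps to virtual address 0xC0003000.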

========================================================================================

20140517

struct idr
struct idr_layer

clockevents and dynticks
http://lwn.net/Articles/223185/
	CONFIG_NO_HZ_FULL


Management of page frames
	- bootmem allocator
	- buddy allocator
	- slab allocator (policy : slub)
	- 

kmem_cache_alloc
kmalloc/kzalloc

========================================================================================

20140524

try_to_free_pages
	/* the part that corresponds to direct reclaim */
	do_try_to_free_pages
		shrink_zones
			shrink_zone
				shrink_lruvec
					shrink_list
						shrink_active_list

						; isolates up to nr_to_scan pages, then puts them back on the appropriate lru.
						shrink_inactive_list
							shrink_page_list


Using a wait queue
prepare_to_wait()
io_schedule_timeout() or something
finish_wait()

wait_on_page_bit needs to be analyzed
page_referenced to be analyzed later


lru list - active/inactive (lists holding the pages that can be evicted)

shrink_active_list

shrink_inactive_list


KSM: Kernel Samepage Merging


Why is page_lock_anon_vma so complicated…?
Why does anon_vma_free call anon_vma_lock and anon_vma_unlock?

========================================================================================

20140531

* rmap : reverse mapping (Professional Linux Kernel Architecture 4.8)
	mm/rmap.c - physical to virtual reverse mappings

page table : virtual -> physical

A data structure holding the page table entry information of every process that references a given page.
When the page is swapped out, that must be recorded in every page table that references it.

1. When a page is 'mapped',
   it is associated with a process but 'need not necessarily be in active use'.

2. The number of 'references to a page' indicates how actively the page is used.
   In order to determine this number, the kernel must first establish a link
   between a page and all its users and must then resort to a few tricks to find
   out how actively the page is used.


struct page {
	…
	atomic_t _mapcount;
	…
};

_mapcount indicates how many places share the page.
It starts at -1, becomes 0 when the page is added to the reverse mapping data structure, and goes up by 1 for each additional mapping.
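The off-by-one convention of _mapcount is easy to get wrong, so here is a small user-space model. struct page_model and the helper names are hypothetical stand-ins; the real kernel uses atomic operations on struct page, which this sketch leaves out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the field discussed above. */
struct page_model {
	int _mapcount;
};

/* A fresh page is not mapped anywhere: the counter starts at -1. */
static void page_model_init(struct page_model *p)
{
	p->_mapcount = -1;
}

/* Called once per page table entry that starts pointing at the page. */
static void page_model_add_rmap(struct page_model *p)
{
	p->_mapcount++;
}

/* Called once per page table entry that stops pointing at the page. */
static void page_model_remove_rmap(struct page_model *p)
{
	p->_mapcount--;
}

/* Number of mappings is the stored value plus one. */
static int page_model_mapcount(const struct page_model *p)
{
	return p->_mapcount + 1;
}

/* Mapped at least once as soon as the counter reaches 0. */
static bool page_model_mapped(const struct page_model *p)
{
	return p->_mapcount >= 0;
}
```

Starting the counter at -1 lets "becomes 0" mean "first mapping", so the stored value is always the number of mappings minus one.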


try_to_unmap ; unmaps the page from the page tables of every process that has it mapped.
	try_to_unmap_anon
		try_to_unmap_one
	try_to_unmap_file
		try_to_unmap_one
	


__do_page_fault
	handle_mm_fault
		handle_pte_fault
			pte_mkyoung


* page flags
	… as seen through pageflags: http://studyfoss.egloos.com/5512112
	Professional Linux kernel	chap.3, page 152 by the number printed at the bottom of the page.

========================================================================================

20140607

The Page Frame Reclaiming Algorithm (PFRA)

RSS : number of pages in use as anon + file

Q. Is memory shared by several processes also a candidate for swap?

Q. anon_vma? Where is it created and filled in?

	The anon_vma element serves as a pointer to the management structure that is associated with
	each list and comprises a list head and an associated lock.

Need to look at how entries get registered on anon_vma_chain. (It has not come up in the routines covered so far.)
	anon_vma_init shows up later.

dup_mm
	dup_mmap
		anon_vma_fork
			anon_vma = anon_vma_alloc();
			avc = anon_vma_chain_alloc(GFP_KERNEL);
			anon_vma->root = pvma->anon_vma->root;
			get_anon_vma(anon_vma->root);
			anon_vma_chain_link(vma, avc, anon_vma);

anon_vma_chain_link
	list_add(&avc->same_vma, &vma->anon_vma_chain);
	list_add_tail(&avc->same_anon_vma, &anon_vma->head);




Understanding the Linux Kernel
	ch17.2.1 Reverse Mapping for Anonymous Pages
Professional Linux Kernel Architecture
	ch16. Page and Buffer Cache

swap-related material (ULK 17.4.6  The Swap Cache)
	mm/swap_state.c
	struct address_space swapper_space;



In the try_to_unmap_file case
	it iterates with prio_tree_iter.
	Priority Search Tree (PST)

try_to_free_swap

try_to_release_page
	if   mapping->a_ops->releasepage
	else try_to_free_buffers
		drop_buffers

Q. What does is_page_cache_freeable mean?
Q. Among the page flags, swapbacked?

UTLK ch15. Page Caches needs reading (including address_space)
	radix tree
writeback needs reading.

What is filemap?
pagemap? https://www.kernel.org/doc/Documentation/vm/pagemap.txt



try_to_unuse
	swap_writepage

========================================================================================

2014.06.14

handle_pte_fault
	!pte_present
		pte_none
			do_anonymous_page
		pte_file	; L_PTE_FILE
			do_nonlinear_fault

PageAnon
	page->mapping & PAGE_MAPPING_ANON	; when mapped into a user VMA

/**
 * page_is_file_cache - should the page be on a file LRU or anon LRU?
 * @page: the page to test
 *
 * Returns 1 if @page is page cache page backed by a regular filesystem,
 * or 0 if @page is anonymous, tmpfs or otherwise ram or swap backed.
 * Used by functions that manipulate the LRU lists, to sort a page
 * onto the right LRU list.
 *
 * We would like to get this info without a page flag, but the state
 * needs to survive until the page is last deleted from the LRU, which
 * could be as far down as __page_cache_release.
 */
/** 20140104
 * Returns whether the page belongs on the file LRU or the anon LRU.
 * Returns 1 if the page is a page cache page backed by a regular filesystem, otherwise 0.
 **/
static inline int page_is_file_cache(struct page *page)
{
	return !PageSwapBacked(page);
}

========================================================================================

2014.06.21

IPI message

Sender side


smp_call_function_single
	- if the target cpu is the current cpu, run the function right away with interrupts disabled

	- otherwise, send IPI_CALL_FUNC_SINGLE through generic_exec_single.

enum ipi_msg_type {
    IPI_TIMER = 2,
    IPI_RESCHEDULE,
    IPI_CALL_FUNC,
    IPI_CALL_FUNC_SINGLE,
    IPI_CPU_STOP,
};


Receiver side
handle_IPI

	case IPI_CALL_FUNC:
		irq_enter();
		generic_smp_call_function_interrupt();
		irq_exit();
		break;

	case IPI_CALL_FUNC_SINGLE:
		irq_enter();
		generic_smp_call_function_single_interrupt();
		irq_exit();
		break;




/* kernel/rcu.h needs to be analyzed */
struct rcu_dynticks

set_task_cpu

rcutree.c
	rcu_irq_enter
	rcu_irq_leave




/* on vexpress, gic_raise_softirq is registered in ct_ca9x4_init_cpu_map */
smp_cross_call(&mask, IPI_CPU_STOP);

========================================================================================

2014.06.28

_mod_timer, lock_timer_base, del_timer_sync: the call depth is deep, so analyze later.
kernel/timer.c
	del_timer_sync

First pass ALLOC_WMARK_LOW and try get_page_from_freelist.
	When choosing a zone to allocate from, get_page_from_freelist skips zones whose dirty pages have reached the limit.

__alloc_pages_slowpath

#define ALLOC_WMARK_MIN  : 
#define ALLOC_WMARK_LOW  :
#define ALLOC_WMARK_HIGH : 

If the free page count falls below the watermark, the alloc fails.
ALLOC_HARDER : lowers the watermark that is checked (min -= min / 4).
ALLOC_HIGH   : lowers the watermark that is checked (min -= min / 2).
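As far as I can tell from kernels of this era, __zone_watermark_ok lets ALLOC_HIGH and ALLOC_HARDER relax the threshold instead of raising it. The sketch below is a simplified user-space model under that assumption: it ignores the allocation order and lowmem_reserve, and the flag values are illustrative.

```c
#include <stdbool.h>

/* Illustrative flag bits (assumptions for this sketch). */
#define ALLOC_HARDER	0x10	/* try to alloc harder */
#define ALLOC_HIGH	0x20	/* __GFP_HIGH set */

/*
 * Simplified watermark check: returns true when free_pages is strictly
 * above the (possibly relaxed) watermark "mark".
 */
static bool zone_watermark_ok_sketch(long free_pages, long mark, int alloc_flags)
{
	long min = mark;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;		/* high-priority callers may dip to half */
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;		/* and a further quarter when trying harder */

	return free_pages > min;
}
```

For a watermark of 100 pages, a plain request needs more than 100 free pages, an ALLOC_HIGH request more than 50, and ALLOC_HIGH|ALLOC_HARDER more than 38.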

========================================================================================

20140705

get_mems_allowed and put_mems_allowed use a read/write seqlock.

seqlock - linux/seqlock.h
read critical section  : 
write critical section : 
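The two critical sections left blank above can be sketched in user space with C11 atomics. This is not the kernel's seqlock implementation, only a model of the idea: the writer makes the sequence odd while it updates, and the reader retries if it saw an odd value or if the sequence changed underneath it. All names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical pair of values protected by a sequence counter. */
struct seq_pair {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_int a;
	atomic_int b;
};

static void seq_pair_init(struct seq_pair *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->a, 0);
	atomic_init(&s->b, 0);
}

/* Write critical section: bump seq before and after the update. */
static void seq_pair_write(struct seq_pair *s, int a, int b)
{
	atomic_fetch_add(&s->seq, 1);	/* now odd */
	atomic_store(&s->a, a);
	atomic_store(&s->b, b);
	atomic_fetch_add(&s->seq, 1);	/* even again */
}

/* Read critical section: never blocks the writer, retries instead. */
static void seq_pair_read(struct seq_pair *s, int *a, int *b)
{
	unsigned int start;

	do {
		/* wait until no write is in progress */
		do {
			start = atomic_load(&s->seq);
		} while (start & 1);

		*a = atomic_load(&s->a);
		*b = atomic_load(&s->b);
		/* retry if a writer ran while we were reading */
	} while (atomic_load(&s->seq) != start);
}
```

The design choice is that readers never take a lock; they pay with a possible retry, which suits data that is read often and written rarely.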


kmalloc
__get_pages_from_freelist


* __builtin_constant_p returns the integer 1 if its argument is known to be a constant at compile time, and 0 otherwise.
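A short illustration of __builtin_constant_p, assuming GCC or Clang (it is a compiler extension, not standard C). The function names are made up for this sketch. The volatile read has a side effect, so the builtin cannot treat it as a constant.

```c
/* A literal is always a compile-time constant. */
static int constant_p_literal(void)
{
	return __builtin_constant_p(42);
}

/* So is an arithmetic expression built only from constants. */
static int constant_p_expression(void)
{
	return __builtin_constant_p(sizeof(long) * 8);
}

/* A volatile load must happen at run time, so this yields 0. */
static int constant_p_runtime(volatile int *p)
{
	return __builtin_constant_p(*p);
}
```

The kernel uses this to pick a cheaper code path when a size or index argument is a constant and fall back to the generic path otherwise.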

========================================================================================

20140712

lib/idr.c

	Used to assign integer IDs to objects and to look objects up by id.
	When user address space needs to reach a timer object that lives in kernel address space,
	an id is handed out and the object is accessed through it.

	- idr_get_new(struct idr *idp, void *ptr, int *id)
		; assign a new ID for the pointer ptr and return it via id

	- void *idr_find(struct idr *idp, int id)
		; return the pointer corresponding to id

	- void idr_remove(struct idr *idp, int id)
		; clear the entry for id


	- sub_alloc / sub_remove

struct idr_layer {
	unsigned long		 bitmap;	/* A zero bit means "space here" */
	struct idr_layer __rcu	*ary[1<<IDR_BITS];
	int			 count;		/* When zero, we can release it */
	int			 layer;		/* distance from leaf */
	struct rcu_head		 rcu_head;
};

struct idr {
	struct idr_layer __rcu	*top;		// top of the IDR layers; the root of the tree.
	struct idr_layer	*id_free;	//
	int			 layers;	/* only valid without concurrent changes */
	int			 id_free_cnt;
	spinlock_t		 lock;
};

o kernel/events/core.c

	perf_event_init

		idr_init (&pmu_idr)			; initializes pmu_idr.

		perf_pmu_register (pmu, name, type)	; registers the pmu in pmus.
			idr_pre_get (&pmu_idr)		; fills id_free with a list of idr_layer structures.
			idr_get_new_above (&pmu_idr)

		One pmu_idr entry is created per pmu, and an integer value is generated for each event type.


	perf_event_open					; provided as a SYSCALL. Returns an fd, which is then used as the handle that receives ioctl (perf_ioctl) commands.

		perf_event_alloc

			perf_init_event
				pmu = idr_find(&pmu_idr, event->attr.type)	; looks up the pmu that was mapped in pmu_idr, using the attr passed in from user space.
				pmu->event_init(event);


o arch/arm/kernel/perf_event.c


	perf_cpu_context (per cpu)
		perf_event_context (one ctx and one *task_ctx each)



o registering a device in sys

	device_initialize
	dev_set_name
	dev_set_drvdata
	device_add

	put_device

o install the perf package, then run the commands
	perf list
	sudo perf top



Analyzing sub_alloc.
	pa is indexed 0 ~ MAX_LEVEL-1. 0 is the leaf.

	pa [leaf]…[top][NULL] is filled in the following order.

	Assume starting_id is 65. idp->layers becomes 2.
	id => 65
	p => top
	l => layers - 1 = 1

	n = (id >> (IDR_BITS*l)) & IDR_MASK;	// 2 (0~31, 32~63)
	bm = ~p->bitmap;			// | | | | | … | | | assume every bit except n is 1.
	m = find_next_bit(&bm, IDR_SIZE, n);	// n:2, m:3

	if (m != n) {
		sh = IDR_BITS*l;		// sh:5
		id = ((id >> sh) ^ n ^ m) << sh; // (2 ^ 2 ^ 3) << 5 = 3 * 32 = 96
	}
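The arithmetic in the walk-through above (id 65 at layer 1, slot 2 full, next free slot 3, new id 96) can be checked with a small user-space model of one step. find_next_bit_sketch and sub_alloc_step are hypothetical helpers for this sketch; IDR_BITS is taken as 5, matching the shift used in the notes, and the real sub_alloc also handles backing up a level, which is left out here.

```c
#define IDR_BITS 5
#define IDR_SIZE (1 << IDR_BITS)
#define IDR_MASK (IDR_SIZE - 1)

/* First set bit of bm at index >= start, or size if there is none. */
static int find_next_bit_sketch(unsigned long bm, int size, int start)
{
	for (int i = start; i < size; i++)
		if (bm & (1UL << i))
			return i;
	return size;
}

/*
 * One descent step at layer l. In "bitmap" a 1 bit means the slot is full.
 * Returns the (possibly advanced) id, or -1 if the layer has no free slot
 * at or after the slot that id selects.
 */
static int sub_alloc_step(int id, int l, unsigned long bitmap)
{
	int n = (id >> (IDR_BITS * l)) & IDR_MASK;	/* slot chosen by id */
	unsigned long bm = ~bitmap;			/* 1 bit = free slot */
	int m = find_next_bit_sketch(bm, IDR_SIZE, n);

	if (m == IDR_SIZE)
		return -1;
	if (m != n) {
		int sh = IDR_BITS * l;

		/* swap slot n for slot m and clear all lower bits of id */
		id = ((id >> sh) ^ n ^ m) << sh;
	}
	return id;
}
```

The XOR with n and then m replaces the slot index in place, and shifting back left zeroes the lower layers so the search restarts at the first id of the new slot.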

Analyzing sub_remove.
	Called with the maximum bit shift value and the id to delete.
	p  [*] ---> idp->top
	paa [*]---> [**]     // points at each node on the path, one per layer of pa.
	pa [NULL][&idp->top][position in the lower layer that holds the id to delete][**][**][**][**]

	e.g.) id to delete: 65, shift: 5

20140719

perf_pmu_register

register_cpu_notifier

blocking_notifier_chain_register

struct swevent_htable
	struct swevent_hlist

kernel/events/core.c

2014.08.16

note quiescent state
note_new_gpnum

a different grace period

rcu_assign_pointer


	rcu_start_gp_per_cpu

rcu_preempt_boost_start_gp



softirq handler, rcu_kthread
	rcu_process_callbacks	/* Do RCU core processing for the current CPU. */
		for_each_rcu_flavor(rsp)
			__rcu_process_callbacks
				rcu_process_gp_end
					__rcu_process_gp_end
				rcu_check_quiescent_state
				if (cpu_has_callbacks_ready_to_invoke(rdp))
					invoke_rcu_callbacks(rsp, rdp);

invoke_rcu_core
	raise_softirq(RCU_SOFTIRQ);



Called through early_initcall(rcu_spawn_kthreads) only when CONFIG_RCU_BOOST is set.
rcu_spawn_kthreads
	kthread_run(rcu_kthread, "rcu_kthread");




synchronize_rcu() : wait until a grace period has elapsed.
	PREEMPTIVE : 
	non-PREEMPTIVE : 
call_rcu() : callback form of synchronize_rcu()


Where rcu_start_gp is called from
	1. rcu_report_qs_rsp

	2. force_quiescent_state

	3. rcu_prepare_for_idle

	4. call_rcu_sched, call_rcu_bh
	   call_rcu, kfree_call_rcu

		__call_rcu
			__call_rcu_core here


rcu_process_gp_end : once the current gp has ended, advances this cpu's callbacks and marks the gp as finished.
Where rcu_process_gp_end is called from
	rcu_process_callbacks <- called from rcu_kthread
	the callback registered with open_softirq, i.e. the callback run for RCU_SOFTIRQ.



Once every cpu has entered a qs state, end_gp

1. when entering idle
2. when already idle

Does a context switch mean that synchronize_rcu is considered finished?


open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
Where raise_softirq(RCU_SOFTIRQ) gets called
	- call sites of invoke_rcu_core
		rcu_do_batch
		rcu_check_callbacks
		__call_rcu_core
		rcu_prepare_for_idle


completed gpnum
	gpnum is incremented (++) and assigned when a gp starts.

gp start

The barrios document explains it as below, but in the PREEMPTION RCU case…
	"Thus, a context switch on every CPU in the system means that all earlier RCU read-side critical sections have finished their work."


synchronize_rcu
	synchronize_sched
		if (rcu_blocking_is_gp())
			return;
		wait_rcu_gp(call_rcu_sched);

rdp ๊ตฌ์กฐ์ฒด
	struct rcu_head *nxtlist;
	struct rcu_head **nxttail[RCU_NEXT_SIZE];

	The rcu_heads are chained together in a singly linked list,
		[next [*], func [*]]
	and the nxttail entries are pointers to an rcu_head's next field, used to advance the list in batches.

structures that carry an rcu_head (objects that embed struct rcu_head)
	nxtlist ~ nxttail[RCU_DONE_TAIL]   ; done

rcu_do_batch /* runs the callback functions */
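The nxtlist / nxttail layout described above is one singly linked list cut into segments by tail pointers. A minimal user-space sketch follows; struct rdp_sketch and the helper functions are hypothetical, and mark_all_done collapses all grace-period bookkeeping into a single step purely to show how the pointers move.

```c
#include <stddef.h>

enum {
	RCU_DONE_TAIL,		/* callbacks whose grace period has ended */
	RCU_WAIT_TAIL,		/* waiting for the current grace period */
	RCU_NEXT_READY_TAIL,	/* ready for the next grace period */
	RCU_NEXT_TAIL,		/* not yet assigned to a grace period */
	RCU_NEXT_SIZE
};

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Hypothetical cut-down version of the per-cpu rcu_data fields. */
struct rdp_sketch {
	struct rcu_head *nxtlist;
	struct rcu_head **nxttail[RCU_NEXT_SIZE];
};

/* Empty list: every segment tail points at the list head pointer. */
static void rdp_init(struct rdp_sketch *rdp)
{
	rdp->nxtlist = NULL;
	for (int i = 0; i < RCU_NEXT_SIZE; i++)
		rdp->nxttail[i] = &rdp->nxtlist;
}

/* New callbacks always go on the very end of the list. */
static void rdp_enqueue(struct rdp_sketch *rdp, struct rcu_head *head)
{
	head->next = NULL;
	*rdp->nxttail[RCU_NEXT_TAIL] = head;
	rdp->nxttail[RCU_NEXT_TAIL] = &head->next;
}

/* Pretend a grace period covering everything queued so far has ended. */
static void rdp_mark_all_done(struct rdp_sketch *rdp)
{
	for (int i = 0; i < RCU_NEXT_TAIL; i++)
		rdp->nxttail[i] = rdp->nxttail[RCU_NEXT_TAIL];
}

/* Count entries from nxtlist up to *nxttail[RCU_DONE_TAIL]. */
static int rdp_count_done(struct rdp_sketch *rdp)
{
	int n = 0;

	for (struct rcu_head **pp = &rdp->nxtlist;
	     pp != rdp->nxttail[RCU_DONE_TAIL];
	     pp = &(*pp)->next)
		n++;
	return n;
}
```

Because each tail is a pointer to a next field, moving a whole batch from one segment to the next is a pointer assignment; no list node is touched.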
	rdp->qlen_lazy -= count_lazy; /* # of lazy queued callbacks */

2014.08.23

__schedule
	/*
	 * Note a context switch.  This is a quiescent state for RCU-sched,
	 * and requires special handling for preemptible RCU.
	 * The caller must have disabled preemption.
	 */
	rcu_note_context_switch
		rcu_sched_qs			// records a qs caused by scheduler activity.
		rcu_preempt_note_context_switch	// understand???


rcu_check_callbacks
	rcu_sched_qs
	rcu_bh_qs

__rcu_pending
	check_cpu_stall
	…




kfree_rcu
	__kfree_rcu
		kfree_call_rcu
			if CONFIG_PREEMPT_RCU
				__call_rcu(…, &rcu_preempt_state, 1)
			else
				__call_rcu(…, &rcu_sched_state, 1)

call_rcu
	if CONFIG_PREEMPT_RCU
		__call_rcu(…, &rcu_preempt_state, 0)
	else
		call_rcu = call_rcu_sched
		__call_rcu(…, &rcu_sched_state, 0)

2014.08.30

__schedule


handle_IPI
	case IPI_TIMER:
		irq_enter();
		ipi_timer();
			evt->event_handler(evt);
		irq_exit();
		break;		


tick_setup_periodic(newdev, 0);
	tick_set_periodic_handler(dev, broadcast); // broadcast = 0
		if (!broadcast)
			dev->event_handler = tick_handle_periodic;


sp804_timer_interrupt			// what was actually called in the trace
	evt->event_handler(evt);



call_rcu (= call_rcu_sched)

After a qs has occurred on every cpu, execution reaches rcu_report_qs_rsp.
Where are the pending callbacks invoked after that?
	rcu_do_batch

rcu_process_callbacks	// callers: 1. RCU_SOFTIRQ handler 2. rcu_kthread
			// 1. where RCU_SOFTIRQ is raised : invoke_rcu_core
			// 



/* the part that does the check */
// where jiffies is updated by the timer interrupt
tick_handle_periodic	/* tick_setup_periodic, */
	tick_periodic
		if (tick_do_timer_cpu == cpu) {
			do_timer(1)
				jiffies_64 += ticks;	// ticks = 1.
		}

		update_process_times
			…
			rcu_check_callbacks(cpu, user_tick);		// Check to see if this CPU is in a non-context-switch quiescent state
				if (rcu_pending(cpu))
					invoke_rcu_core();



* softirq handler, rcu_kthread
	rcu_process_callbacks	/* Do RCU core processing for the current CPU. */
		for_each_rcu_flavor(rsp)
			__rcu_process_callbacks
				rcu_process_gp_end
					__rcu_process_gp_end
				rcu_check_quiescent_state

rcu_check_quiescent_state -> rcu_report_qs_rdp -> rcu_report_qs_rnp -> rcu_report_qs_rsp


rcu_check_quiescent_state
	…
	if (!rdp->passed_quiesce)
		return;

	rcu_report_qs_rdp(rdp->cpu, rsp, rdp, rdp->passed_quiesce_gpnum);



rcu_report_qs_rsp
	rsp->completed = rsp->gpnum;
	rsp->fqs_state = RCU_GP_IDLE;





* read-side critical section
	"As mentioned earlier, the semantics are that code must never block or sleep between rcu_read_lock and rcu_read_unlock.
	 Therefore, once a CPU performs a context switch, every RCU read-side critical section that was running on that CPU before
	 is guaranteed to have finished completely. Applying this rule to all CPUs, that is, having every CPU context switch once…" - barrios document

	rcu_read_lock	// when CONFIG_PREEMPT_RCU is not set, implemented as preempt_disable.
	{ critical section }
	rcu_read_unlock

	schedule() is not called with preemption disabled. (Preemption happens at irq exit and syscall exit.)
		- this is the basis for treating the user and idle states as a QS.
	Conversely, if a context switch has happened, the cpu is not inside an rcu read-side critical section, so it is noted as a QS.
	So when schedule() is called it calls rcu_note_context_switch(), and rcu_sched_qs() records this in passed_quiesce.

	timer interrupt

===================================================================== [PREEMPT] When does preemption actually happen?

A. on return from irq
B. ret_to_user after a syscall etc.

__irq_svc:			// CPSR 0x200001d3
	svc_entry
	irq_handler

#ifdef CONFIG_PREEMPT
	/** 20140816
	 * Load thread_info into tsk and extract the preempt count and the flags.
	 * If the preempt count is not 0, i.e. preemption is not allowed, force flags to 0.
	 * If TIF_NEED_RESCHED is set in flags, run the svc_preempt routine and come back.
	 **/
	get_thread_info tsk
	ldr	r8, [tsk, #TI_PREEMPT]		@ get preempt count
	ldr	r0, [tsk, #TI_FLAGS]		@ get flags
	teq	r8, #0				@ if preempt count != 0
	movne	r0, #0				@ force flags to 0
	tst	r0, #_TIF_NEED_RESCHED
	blne	svc_preempt
#endif
	…
	svc_exit r5

#ifdef CONFIG_PREEMPT
	/** 20140816
	 * Save lr before bl overwrites it.
	 * Call preempt_schedule_irq to preempt, then load the flags again;
	 * if _TIF_NEED_RESCHED is set, call the preemption function again,
	 * otherwise return.
	 **/
svc_preempt:
	mov	r8, lr
1:	bl	preempt_schedule_irq		@ irq en/disable is done inside
	ldr	r0, [tsk, #TI_FLAGS]		@ get new tasks TI_FLAGS
	tst	r0, #_TIF_NEED_RESCHED
	moveq	pc, r8				@ go again
	b	1b
#endif

/** 20140824
 * Entry point to schedule() from kernel preemption off of irq context.
 * It is called and returns with interrupts disabled,
 * which protects against recursive calls from irq.
 **/
asmlinkage void __sched preempt_schedule_irq(void)
{
	struct thread_info *ti = current_thread_info();

	/* Catch callers which need to be fixed */
	BUG_ON(ti->preempt_count || !irqs_disabled());

	do {
		/** 20140824
		 * Mark the task as being preempted, then call the schedule
		 * function with interrupts enabled so the preemption happens.
		 **/
		add_preempt_count(PREEMPT_ACTIVE);
		local_irq_enable();
		__schedule();		// <- runs __schedule()
		local_irq_disable();
		sub_preempt_count(PREEMPT_ACTIVE);
		…

static void __sched __schedule(void)
{
	struct task_struct *prev, *next;
	unsigned long *switch_count;
	struct rq *rq;
	int cpu;

need_resched:
	preempt_disable();
	cpu = smp_processor_id();
	rq = cpu_rq(cpu);
	rcu_note_context_switch(cpu);	// <- calls rcu_note_context_switch
	prev = rq->curr;
	…

/** 20140824
 * Records that a context switch has occurred.
 **/
void rcu_note_context_switch(int cpu)
{
	trace_rcu_utilization("Start context switch");
	rcu_sched_qs(cpu);
	rcu_preempt_note_context_switch(cpu);
	trace_rcu_utilization("End context switch");
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);

/** 20140823
 * Records a qs caused by scheduler activity.
 * All that matters is whether at least one qs has occurred since the gp started,
 * so the current gpnum and the passed_quiesce state are updated on every call.
 **/
void rcu_sched_qs(int cpu)
{
	struct rcu_data *rdp = &per_cpu(rcu_sched_data, cpu);

	rdp->passed_quiesce_gpnum = rdp->gpnum;
	barrier();
	if (rdp->passed_quiesce == 0)
		trace_rcu_grace_period("rcu_sched", rdp->gpnum, "cpuqs");
	rdp->passed_quiesce = 1;
}

2014.09.06

rcu_check_callbacks analysis

GICv2 Spec. v2.0
	* Distributor
	* CPU interface

	ID0  -   ID15 : SGI (Software-generated interrupt)

	ID16 -   ID31 : PPI (Private Peripheral Interrupt)
		This is a peripheral interrupt that is specific to a single processor.
	ID32 - ID1019 : SPI (Shared Peripheral Interrupt)
		This is a peripheral interrupt that the Distributor can route to any of a specified combination of processors.
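The ID ranges can be captured in a small classifier. The enum and function names are made up for this sketch; the ranges follow the GICv2 layout listed above, and IDs 1020-1023 are treated as special because the architecture reserves them (1023 is the spurious interrupt ID).

```c
/* Hypothetical names; only the numeric ranges come from the GIC layout. */
enum gic_irq_type {
	GIC_SGI,	/* ID0-ID15: software-generated, used for IPIs */
	GIC_PPI,	/* ID16-ID31: private to one processor */
	GIC_SPI,	/* ID32-ID1019: shared, routed by the Distributor */
	GIC_SPECIAL	/* ID1020-ID1023: reserved for special purposes */
};

static enum gic_irq_type gic_classify(unsigned int id)
{
	if (id <= 15)
		return GIC_SGI;
	if (id <= 31)
		return GIC_PPI;
	if (id <= 1019)
		return GIC_SPI;
	return GIC_SPECIAL;
}
```

For example, a per-core local timer on ID 29 falls in the PPI range, while a board-level timer on ID 34 is an SPI.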


kernel/irq/irqdomain.c : what is irq_domain?
kernel/irq/chip.c

[from] https://www.kernel.org/doc/Documentation/IRQ-domain.txt

irq_domain interrupt number mapping library

The Linux kernel design uses a single large number space in which each IRQ source is given a different number.
This is simple with one interrupt controller, but when several interrupt controllers are in use,
the kernel has to hand out Linux IRQ numbers so that they do not overlap.

The number of interrupt controllers registered as unique irqchips shows a rising tendency:
for example, subdrivers of other kinds such as GPIO controllers do not reimplement their own callback mechanisms;
they model themselves as irqchips and use the IRQ core system.

…


irq_desc : descriptor for an irq
irq_chip : hardware interrupt chip descriptor
irq_data : holds, per irq, the information related to irq_chip and irq_domain.

2014.09.13 architecture

kernel
* request api
	request_irq
		=> request_threaded_irq
	request_any_context_irq
	request_percpu_irq		


wfi
wfe <-> sev
	ex) spin_lock

IPI
	handle_IPI : handler
	smp_cross_call : caller; for the GIC this is gic_raise_softirq


/* registering the tick notifier */
tick_init
	clockevents_register_notifier(&tick_notifier);

/* tick notify */
clockevents_config_and_register
	clockevents_register_device
		clockevents_do_notify
			raw_notifier_call_chain()
// callers : v2m_timer_init, percpu_timer_setup (called from smp_prepare_cpus or secondary_start_kernel)

clockevents_register_device
	clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);


/* what the notify handler does */
tick_notify
	case CLOCK_EVT_NOTIFY_ADD:
		tick_check_new_device	/* percpu_timer_setup -> twd_timer_setup, called once per core */
			tick_setup_device
				if (td->mode == TICKDEV_MODE_PERIODIC)
					tick_setup_periodic(newdev, 0);







Checked with ddd: tick_periodic is driven from twd

Callers of __setup_irq (the function that registers an irqaction)
	setup_irq, setup_percpu_irq, request_threaded_irq, request_percpu_irq
	The setup_* variants are used to statically set up interrupts in the early boot process.

=====================================================================

/* kernel/softirq.c */

/*
 * preempt_count and SOFTIRQ_OFFSET usage:
 * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
 *   softirq processing.
 * - preempt_count is changed by SOFTIRQ_DISABLE_OFFSET (= 2 * SOFTIRQ_OFFSET)
 *   on local_bh_disable or local_bh_enable.
 * This lets us distinguish between whether we are currently processing
 * softirq and whether we just have bh disabled.
 */



/* include/linux/hardirq.h */

thread_info's preempt_count is split into several fields.
#define preempt_count() (current_thread_info()->preempt_count)


#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))

#define in_interrupt()      (irq_count())		// covers both HARDIRQ and SOFTIRQ.

In the IPI handler
irq_enter
irq_exit

Only the low 8 bits of preempt_count are used to record the preemption count.
The bits above them are used for softirq, hardirq, nmi and so on.
In particular, 0x40000000 is PREEMPT_ACTIVE, which marks that the current thread has been preempted.
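The field split can be tried out in user space. The bit widths below (8 preempt bits, 8 softirq bits, 10 hardirq bits) are what I believe kernels of this period used, so treat them as assumptions; the NMI bit is left out, and the helpers take the counter as an argument instead of reading thread_info.

```c
/* Assumed field widths; the real ones live in include/linux/hardirq.h. */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	10

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define FIELD_MASK(bits, shift)	(((1UL << (bits)) - 1) << (shift))

#define PREEMPT_MASK	FIELD_MASK(PREEMPT_BITS, PREEMPT_SHIFT)
#define SOFTIRQ_MASK	FIELD_MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)

#define SOFTIRQ_OFFSET		(1UL << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)
#define HARDIRQ_OFFSET		(1UL << HARDIRQ_SHIFT)
#define PREEMPT_ACTIVE		0x40000000UL

static unsigned long hardirq_count(unsigned long pc)
{
	return pc & HARDIRQ_MASK;
}

static unsigned long softirq_count(unsigned long pc)
{
	return pc & SOFTIRQ_MASK;
}

/* True in hardirq context, while serving a softirq, or with bh disabled. */
static int in_interrupt(unsigned long pc)
{
	return (pc & (HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/* True only while a softirq handler is actually running. */
static int in_serving_softirq(unsigned long pc)
{
	return (softirq_count(pc) & SOFTIRQ_OFFSET) != 0;
}
```

Adding SOFTIRQ_DISABLE_OFFSET (local_bh_disable) leaves the lowest softirq bit clear, which is how in_serving_softirq tells "bh disabled" apart from "running a softirq".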

__schedule() needs analysis.
svc_preempt ; entry-armv.S
	preempt_schedule_irq

set_tsk_need_resched
preempt_check_resched



#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)

2014.09.20

===================================================================== vexpress run output

twd is registered as irq 29 and timer as irq 34.
twd is the local timer of each core.
timer fired only 35 times on core 0 and has not increased since.
Timer broadcast interrupts never fired at all.

cat /proc/interrupts

           CPU0       CPU1       CPU2       CPU3
 29:        967        960        942        940       GIC  twd			// timer watchdog registered in ct_ca9x4_init_irq
 34:         35          0          0          0       GIC  timer		// v2m_timer_init … sp804_clockevents_init
 36:          0          0          0          0       GIC  rtc-pl031
 37:         55          0          0          0       GIC  uart-pl011
 41:          0          0          0          0       GIC  mmci-pl18x (cmd)
 42:          0          0          0          0       GIC  mmci-pl18x (pio)
 44:          9          0          0          0       GIC  kmi-pl050
 45:        101          0          0          0       GIC  kmi-pl050
 47:          0          0          0          0       GIC  eth0
IPI0:          0          0          0          0  Timer broadcast interrupts
IPI1:        634        457        133        291  Rescheduling interrupts
IPI2:          1          2          1          2  Function call interrupts
IPI3:          0          0          0          0  Single function call interrupts
IPI4:          0          0          0          0  CPU stop interrupts
Err:          0
/ #

gic_notifier


IPI_RESCHEDULE usage example
	load_balance()
		resched_cpu(env.dst_cpu);
			resched_task(cpu_curr(cpu));
				smp_send_reschedule(cpu);
					smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);


IPI_CALL_FUNC_SINGLE usage example
	flush_tlb_page()
		smp_call_function_many
			smp_call_function_single
				generic_exec_single	// when the target is a different cpu
					IPI_CALL_FUNC_SINGLE

clockevents_config_and_register



start_kernel
------------
prio_tree_init() : initializes the prio tree.
	http://studyfoss.egloos.com/5194196
	http://studyfoss.egloos.com/5194452
	http://studyfoss.egloos.com/5199543

init_timers() : initializes tvec_base by sending a notify.

softirq_init()
timekeeping_init() : initializes the timekeeper-related variables.
time_init() : sets up system_timer. v2m_timer.init = v2m_timer_init
	static struct sys_timer v2m_timer = {
	    .init   = v2m_timer_init,
	};

2014.09.27

* softirq

Softirqs and tasklets are both bottom-half mechanisms. Sleeping is not allowed because they run in interrupt context, not process context.
	
Whenever a system call is about to return to userspace, or a hardware interrupt handler exits, any 'software interrupts' which are marked pending
 (usually by hardware interrupts) are run (kernel/softirq.c).


The softirq and tasklet sources live together in softirq.c.
Writing 0 to online under /sys/devices/system/cpu/cpu0 runs the store_online function in drivers/base/cpu.c.
If '0' is written: cpu_down
		__cpu_die

	cpu_idle
		cpu_die
			platform_cpu_die
				platform_do_lowpower
					wfi();
				"b   secondary_start_kernel" // after waking up from wfi


If '1' is written: cpu_up


vector_irq
	// switch to svc
	// pick the routine to call based on the mode at the time of the exception.
	//   for svc, __irq_svc

__irq_svc:
	svc_entry
	irq_handler	// gic_handle_irq -> handle_IRQ -> irq_enter, generic_handle_irq, irq_exit
	svc_exit	// macro that returns from svc to the code interrupted by the exception


__do_softirq
	runs the softirq actions with irqs enabled.

Kernel code can tell if it is running in interrupt context by calling the function in_interrupt( ), which takes no parameters and returns nonzero if the processor is currently running in interrupt context, either hardware interrupt or software interrupt.

A function related to in_interrupt( ) is in_atomic( ). Its return value is nonzero whenever scheduling is not allowed; this includes hardware and software interrupt contexts as well as any time when a spinlock is held. In the latter case, current may be valid, but access to user space is forbidden, since it can cause scheduling to happen. Whenever you are using in_interrupt( ), you should really consider whether in_atomic( ) is what you actually mean. Both functions are declared in <asm/hardirq.h>

interrupt_context
process_context

In the kernel, in_interrupt() is true for both hardirq and softirq, so both are treated as interrupt context.

2014.10.04

In the idle loop, RCU ignores read-side critical sections.

Idle is entered only after exiting as many times as the nesting count

rcu_idle_enter
	
rcu_idle_exit



NO_HZ & SMP
	RCU_FAST_NO_HZ

2014.10.11

* top-half / bottom-half
	: all interrupts are enabled while a bottom half runs.
	  Usually the top half stores the device data in a device-specific buffer, schedules the bottom half to run, and exits.

  register the top half - request_irq()

  initialize a tasklet - tasklet_init()
  schedule a tasklet - tasklet_schedule()


* softirq and tasklet
	: bottom halves added in kernel 2.3 to replace the old BH mechanism.
softirq - defined statically at compile time; softirqs of the same type can run on several processors at once. Speed-sensitive work such as networking runs as a softirq.
tasklet - built on softirq and defined dynamically; tasklets of the same type cannot run on several processors at once. They therefore trade some speed for ease of use. A tasklet is usually sufficient.

When __do_softirq runs a handler, it calls local_irq_enable() first.



rcu_check_callbacks   : called from update_process_times <- tick handler
rcu_process_callbacks : called from rcu_kthread


run_timer_softirq
	hrtimer_run_pending
	if (time_after_eq(jiffies, base->timer_jiffies))
		__run_timers()


* kernel/time/Kconfig
	: hrtimer and NO_HZ (dynticks) operate in oneshot mode.
	  Neither is set for vexpress, but exynos enables the two configs below.

HIGH_RES_TIMERS
	select TICK_ONESHOT

NO_HZ
	select TICK_ONESHOT

2014.10.18

grace period : the point at which every read-side critical section that started before the removal phase completed has finished.

rcu_start_gp
	May be called right after one gp ends, or started after a removal phase.
	Reinitializes the hierarchy in preparation for detecting the next gp.

synchronize_rcu
	1) with PREEMPT_RCU
	wait_rcu_gp
	2) without PREEMPT_RCU
	synchronize_sched


cpu_notify
clockevents_notify



void wake_up_new_task(struct task_struct *p)
	set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));

.select_task_rq     = select_task_rq_fair,

2014.10.25

Starting from the bookmarked Chinese-language page…

Professional Linux Kernel Architecture
	15.2.4 Dynamic Timers

The kernel needs data structures to manage all timers registered in the system.
The check has to be performed on every timer interrupt.


tick_nohz_stop_sched_tick
	get_next_timer_interrupt : analyzes the timer wheel and discovers
				   the jiffy value at which the next event is due. 

2014.11.01

tick_do_update_jiffies64

do_timer(ticks) {
	jiffies_64 += ticks;
	…
	http://www.hanbit.co.kr/network/view.html?bi_id=1443
}


struct timekeeper - clocksource


high-res timer & dynticks
http://studyfoss.egloos.com/viewer/5268468
	clocksource : an object (device) for reading the current time.
	clock_event_device : a device that raises an event (interrupt) at a given time.
- clock events are classified by whether they fire periodically:
	CLOCK_EVT_FEAT_PERIODIC
	CLOCK_EVT_FEAT_ONESHOT   ; the mode used by hrtimer and dynticks.


Professional Linux Kernel Architecture. Ch.15
	CLOCK_MONOTONIC : starts from 0 when the system boots
	CLOCK_REALTIME  : the system's actual (wall-clock) time

https://www.kernel.org/doc/Documentation/timers/highres.txt
High resolution timers and dynamic ticks design notes

* request_percpu_irq in kernel/irq/manage.c still needs to be analyzed.
	Read it together with the twd_init part of arch/arm/mach-vexpress/ct-ca9x4.c.


http://stackoverflow.com/questions/3523442/difference-between-clock-realtime-and-clock-monotonic
==================================================================================================================
CLOCK_REALTIME represents the machine's best-guess as to the current wall-clock, time-of-day time.
As Ignacio and MarkR say, this means that CLOCK_REALTIME can jump forwards and backwards as the system
time-of-day clock is changed, including by NTP.

CLOCK_MONOTONIC represents the absolute elapsed wall-clock time since some arbitrary, fixed point in the past.
It isn't affected by changes in the system time-of-day clock.

If you want to compute the elapsed time between two events observed on the one machine without an intervening reboot,
CLOCK_MONOTONIC is the best option.
==================================================================================================================
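
The advice quoted above can be tried from userspace with the POSIX clock_gettime() call. A minimal sketch follows; the helper names demo_now_ns and demo_elapsed_since are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current reading of the given clock in nanoseconds, or -1 on failure. */
int64_t demo_now_ns(clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Nanoseconds elapsed since start_ns, measured on CLOCK_MONOTONIC so that
 * a change to the time-of-day clock cannot shift the interval. */
int64_t demo_elapsed_since(int64_t start_ns)
{
	return demo_now_ns(CLOCK_MONOTONIC) - start_ns;
}
```

Take a start value with demo_now_ns(CLOCK_MONOTONIC) before the work and pass it to demo_elapsed_since() afterwards. CLOCK_REALTIME remains the right choice when the value has to be shown as a calendar time.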


hrtimer.c is always built.
	CONFIG_TICK_ONESHOT is selected when CONFIG_NO_HZ or CONFIG_HIGH_RES_TIMERS is set.

2014.11.08

Current analysis flow
tick_nohz_idle_enter
	__tick_nohz_idle_enter
		tick_nohz_stop_sched_tick
			…
			get_next_timer_interrupt


What does "RCU's qs transition" in the comment on for_each_domain mean?
http://sunjinyang.wordpress.com/2010/10/06/linux-kernel-scheduling-domains-and-classes/


Greedy hrtimer expiration
http://lwn.net/Articles/461592/

_softexpires (soft) : the earliest time at which the timer may expire.
node.expires(hard) : _softexpires + slack (or the value given as delta)

2014.11.15

kernel/time/Kconfig
	TICK_ONESHOT

	NO_HZ
		select TICK_ONESHOT
	HIGH_RES_TIMERS
		select TICK_ONESHOT




tick_check_new_device
	tick_setup_device
		tick_next_period = ktime_get();	// this is where it is initialized




* init_timers
	timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE, (void *)(long)smp_processor_id());
	register_cpu_notifier(&timers_nb);
	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);

* hrtimers_init
	hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE, (void *)(long)smp_processor_id());
	register_cpu_notifier(&hrtimers_nb);
	open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq);



run_timer_softirq
	hrtimer_run_pending();
	if (time_after_eq(jiffies, base->timer_jiffies))
		__run_timers(base);
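
run_timer_softirq compares the two counters with time_after_eq() rather than a plain >=. A sketch of why, using a macro written in the same spirit as the kernel one (type checking omitted; the demo_ names are mine, not kernel symbols):

```c
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a is at or after b" for free-running unsigned counters:
 * the difference is taken modulo the word size and read as signed. */
#define demo_time_after_eq(a, b) \
	((long)((unsigned long)(a) - (unsigned long)(b)) >= 0)

/* True when timers queued up to timer_jiffies are due at time now. */
bool demo_timers_due(unsigned long now, unsigned long timer_jiffies)
{
	return demo_time_after_eq(now, timer_jiffies);
}
```

Once the counter wraps, a numerically small now is still logically later than a timer_jiffies close to ULONG_MAX. A plain comparison gets that case wrong; the signed difference does not, as long as the two values are less than half the counter range apart.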

hrtimer_run_pending
	if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
		hrtimer_switch_to_hres();

hrtimer_switch_to_hres
	if (tick_init_highres()) {…}
		tick_switch_to_oneshot(hrtimer_interrupt)

	tick_setup_sched_timer	// sets up the tick emulation timer.
		struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
		hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
		ts->sched_timer.function = tick_sched_timer;				// set the hrtimer callback function
		hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());	// set expires

		hrtimer_start_expires
			hrtimer_start_range_ns
				__hrtimer_start_range_ns
					switch_hrtimer_base : if necessary
					hrtimer_set_expires_range_ns
					timer_stats_hrtimer_set_start_info
					enqueue_hrtimer
					hrtimer_enqueue_reprogram

tick_sched_timer	/* CONFIG_HIGH_RES_TIMERS */	
	if (tick_do_timer_cpu == cpu)
		tick_do_update_jiffies64
			do_timer(++ticks)


==============================================================================================================

clockevents_config_and_register
	clockevents_config(dev, freq);
	clockevents_register_device(dev);
		clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
		clockevents_notify_released();

tick_notify
	case CLOCK_EVT_NOTIFY_ADD:
		tick_check_new_device // releases the previous device (clock_event_device) and registers the new one.





ktime_get() : returns a ktime_t (ns resolution). wall_to_monotonic

timerqueue : an rb_tree ordered by expires.

struct hrtimer_cpu_base - the per cpu clock bases

	A per-CPU instance of this structure exists under the variable name hrtimer_bases.
	    DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases)
	The declaration of this per-CPU structure contains an array of struct hrtimer_clock_base.

	hrtimer_cpu_base
	++++++++++++++++++++++++++++++++
	+ lock                     <-- + - lock protecting the base and associated clock bases and timers
	+ active_bases                 +
	+ …                            +
	+   +------------------+   <-- + - struct hrtimer_clock_base [HRTIMER_MAX_CLOCK_BASES]
	+   +     MONOTONIC    +       +
	+   +------------------+       +
	+   +     REALTIME     +       +
	+   +------------------+       +
	+   +     BOOTTIME     +       +
	+   +------------------+       +
	++++++++++++++++++++++++++++++++

	hrtimer_cpu_base		// repeated once per CPU
	++++++++++++++++++++++++++++++++
	…



hrtimer_clock_base - the timer base for a specific clock


Q. When raise_softirq_irqoff is called while ksoftirqd is asleep, the softirq is marked pending and ksoftirqd is woken up; what happens if it is already awake?
	It finishes the pass it is on and then runs once more.




tick_setup_periodic
	tick_set_periodic_handler(dev, broadcast);

tick_handle_periodic
	tick_periodic
		if (tick_do_timer_cpu == cpu) {
			do_timer(1)

2014.11.22 static struct clock_event_device sp804_clockevent

* Callers of putback_lru_page
	putback_lru_pages
	shrink_page_list
	putback_inactive_pages
	shrink_active_list
	
*
arm_dma_alloc
	dma_alloc_from_coherent
	__dma_alloc
		__alloc_from_contiguous
			dma_alloc_from_contiguous
				alloc_contig_range
					__alloc_contig_migrate_range
						putback_lru_pages


http://linux-mm.org/PageReplacementDesign
http://studyfoss.egloos.com/viewer/5512112

2014.11.29

* tick_setup_device : registers a new clock event device with the tick device.
	if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {	: if it still holds its initial value
		tick_next_period = ktime_get();
		tick_period = ktime_set(0, NSEC_PER_SEC / HZ);

* tick_periodic : tick_next_period is accumulated in steps of tick_period from its initial value 0 (set when tick_setup_device is first called).
	if (tick_do_timer_cpu == cpu) {
		tick_next_period = ktime_add(tick_next_period, tick_period);

* tick_init_jiffy_update : the initial value of last_jiffies_update is tick_next_period
	last_jiffies_update = tick_next_period;
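
How tick_period and last_jiffies_update work together can be sketched as a small userspace model. It follows the idea behind tick_do_update_jiffies64 (account every whole period that has elapsed), but the struct, the demo_ names and the HZ value are assumptions for the example, and locking is left out.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL
#define DEMO_HZ 100	/* assumed tick rate for the example */

struct demo_tick {
	int64_t tick_period;		/* ns per tick: NSEC_PER_SEC / HZ */
	int64_t last_jiffies_update;	/* ns timestamp of the last accounted tick */
	uint64_t jiffies_64;
};

void demo_tick_init(struct demo_tick *t, int64_t now_ns)
{
	t->tick_period = DEMO_NSEC_PER_SEC / DEMO_HZ;
	t->last_jiffies_update = now_ns;	/* plays the role of tick_next_period */
	t->jiffies_64 = 0;
}

/* Account every whole tick period that elapsed since the last update.
 * Returns the number of ticks added (0 if less than one period passed). */
uint64_t demo_update_jiffies(struct demo_tick *t, int64_t now_ns)
{
	int64_t delta = now_ns - t->last_jiffies_update;
	uint64_t ticks;

	if (delta < t->tick_period)
		return 0;
	ticks = (uint64_t)(delta / t->tick_period);
	t->last_jiffies_update += (int64_t)ticks * t->tick_period;
	t->jiffies_64 += ticks;		/* the kernel calls do_timer(ticks) here */
	return ticks;
}
```

With a periodic tick the function adds one jiffy per call. After a long idle period with the tick stopped, a single call catches up on several jiffies at once.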



timekeeping_init();	// timekeeper init
time_init();		// v2m_timer_init


tick_setup_device
	tick_device_uses_broadcast



Analysis…
hrtimer_switch_to_hres
	tick_setup_sched_timer
		hrtimer_start_expires
			hrtimer_start_range_ns

2014.12.06

NO_HZ => the tick does not always fire periodically.
HRTIMER => 

Which part of no_hz changes the next period?
Even with no_hz, something must still be updated at a regular period.

oneshot

If every core has been taken down, then the last update of jiffies …

* One CPU stays behind and updates jiffies.
One of the cores must be able to wake the others from the idle (WFI) state.

With CPU hotplug online/offline, CPU 0 cannot be turned off on ARM.

* cpu_idle: there is a lot to analyze.


hrtimer_interrupt
tick_nohz_handler	; nohz timer interrupt handler for low res.


* softirq
	raise_softirq only marks the softirq as pending.
	in __do_softirq 

irq_exit()
	if (!in_interrupt() && local_softirq_pending())
		invoke_softirq();


* Where does the message below come from? (arch/arm/mm/fault.c)
"Unable to handle kernel paging request at virtual address …"

2014.12.13 persistent_clock? boot_clock?

PTP : The Precision Time Protocol is a protocol used to synchronize clocks throughout a computer network.

CLOCK_REALTIME : used for xtime. The xtime variable stores the current time and date (wall-clock time: the number of seconds elapsed since 1970).
CLOCK_MONOTONIC : monotonically increasing


* The common Clk Framework
  https://www.kernel.org/doc/Documentation/clk.txt
  http://elinux.org/images/b/b8/Elc2013_Clement.pdf


struct fixed_rate
+-----------------------------+
|                             |
| struct clk_hw               |
| +------------------------+  |
| | struct clk *           |  |
| | struct clk_init_data * |  |
| +------------------------+  |
| fixed_rate                  |
|                             |
| flags                       |
+-----------------------------+

v2m_clk_init
	clk = clk_register_fixed_rate(..)
	clk_register_clkdev(clk, …)


drivers/clk/clkdev.c
include/linux/clk-private.h


arch/arm/common/timer-sp.c: sp804_get_clock_rate
	clk_get_sys("sp804", name);
arch/arm/kernel/smp_twd.c : twd_get_clock
	clk_get_sys("smp_twd", NULL);

v2m_clk_init
	clk_register_fixed_rate
	v2m_osc_register

2014.12.20

Requesting size 0 from get_slab returns ZERO_SIZE_PTR, not NULL.

clk_register_fixed_rate : creates the clk structure and builds the hierarchy. When a parent changes, the change propagates to its children.
clk_register_clkdev     : takes a clk, creates the clk_lookup structure for it, and adds it to the global list.
	clk_find matches on two strings, dev_id and con_id.
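
The dev_id/con_id matching can be sketched as a scored table search. This is a simplified model, not the code in drivers/clk/clkdev.c: it assumes a NULL field in an entry acts as a wildcard and that a dev_id match outweighs a con_id match. The demo_ names and the "timclk" connection name are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct demo_clk_lookup {
	const char *dev_id;	/* device name, or NULL for "any device" */
	const char *con_id;	/* connection name, or NULL for "any connection" */
	int clk;		/* stand-in for struct clk * */
};

/* Return the best-matching entry, or NULL. A dev_id match scores 2 and a
 * con_id match scores 1; a non-NULL field that differs rejects the entry. */
const struct demo_clk_lookup *
demo_clk_find(const struct demo_clk_lookup *tbl, size_t n,
	      const char *dev_id, const char *con_id)
{
	const struct demo_clk_lookup *best = NULL;
	int best_score = 0;

	for (size_t i = 0; i < n; i++) {
		int score = 0;

		if (tbl[i].dev_id) {
			if (!dev_id || strcmp(tbl[i].dev_id, dev_id) != 0)
				continue;
			score += 2;
		}
		if (tbl[i].con_id) {
			if (!con_id || strcmp(tbl[i].con_id, con_id) != 0)
				continue;
			score += 1;
		}
		if (score > best_score) {
			best = &tbl[i];
			best_score = score;
		}
	}
	return best;
}
```

A lookup such as clk_get_sys("smp_twd", NULL) corresponds to demo_clk_find(tbl, n, "smp_twd", NULL) in this model: it succeeds only if some entry was registered with that dev_id and no con_id.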

2014.12.27

Documentation/timers/timekeeping.txt

clocksource
clockevent : Clock events manages and distributes clock events and coordinates the use of clock event handling functions.
clock_event_device

sched_clock


*** Hrtimers and Beyond: Transforming the Linux Time Subsystems
https://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf

Clock-related services in an OS
- time keeping
- clock synchronization
- time-of-day representation
- next event interrupt scheduling
- process and in-kernel timers
- process accounting
- process profiling

hardware device : can supply clock sources; the characteristics differ from one piece of hardware to another.



*** The common clock framework

clk_get
clk_get_sys



What is the bus_lock that appears partway through setup_irq?

2015.01.03

Checking under qemu:
# dmesg | grep -i clock
sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956ms
smp_twd: clock not found: -2
Switching to clocksource v2m-timer1		// clocksource registered by sp804_clocksource_init, called from v2m_sp804_init (jiffies is later registered as a clocksource too)
rtc-pl031 mb:rtc: setting system clock to 2015-01-03 06:18:37 UTC (1420265917)

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
v2m-timer1


* Documentation/timers/timekeeping.txt

     To provide timekeeping for your platform, the "clock source" provides
     the basic timeline, whereas "clock events" shoot interrupts on certain points
     on this timeline, providing facilities such as high-resolution timers.
     "sched_clock()" is used for scheduling and timestamping, and delay timers
     provide an accurate delay source using hardware counters.


* v2m clocksource registration : the rating is passed along, and the clocksource is added to the list (clocksource_list) in descending rating order.
	clocksource_mmio_init
		clocksource_register_hz
			__clocksource_register_scale
				__clocksource_updatefreq_scale(cs, scale, freq);

				MUTEX_LOCK
				clocksource_enqueue
				clocksource_enqueue_watchdog
				clocksource_select
				MUTEX_UNLOCK
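
The rating-ordered insertion done by clocksource_enqueue can be sketched with a plain singly linked list. The demo_ types are stand-ins for the example; the kernel uses its own list type and holds the mutex shown above around the operation.

```c
#include <stddef.h>

struct demo_clocksource {
	const char *name;
	int rating;
	struct demo_clocksource *next;
};

/* Insert cs so the list stays sorted by descending rating; entries with an
 * equal rating keep their registration order. Returns the new list head. */
struct demo_clocksource *
demo_enqueue(struct demo_clocksource *head, struct demo_clocksource *cs)
{
	struct demo_clocksource **pp = &head;

	/* Skip every entry rated at least as high as the new one. */
	while (*pp && (*pp)->rating >= cs->rating)
		pp = &(*pp)->next;
	cs->next = *pp;
	*pp = cs;
	return head;
}
```

Selecting the best clocksource then amounts to taking the head of the list, which fits the dmesg output above where v2m-timer1 wins over the low-rated jiffies clocksource.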

2015.01.10

HOTPLUG_CPU in arch/arm/Kconfig => /sys/devices/system/cpu

#define get_cpu()       ({ preempt_disable(); smp_processor_id(); })
#define put_cpu()       preempt_enable()


kswapd
	balance_pgdat


zone
	lruvec
		lists[NR_LRU_LISTS]		// add_page_to_lru_list. PG_lru in the page flags is set only when page->lru is added here.

	pageset (struct per_cpu_pageset)	// __percpu
		pcp (struct per_cpu_pages)
			lists[MIGRATE_PCPTYPES]	// head of the list is hot, tail is cold. Filled by free_hot_cold_page. Order-0 pages only.



* lru_add_pvecs exists separately, as a per-CPU variable.
  Among the pagevecs, this is the cache for LRU insertion/removal.
static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);

	Pages are put into the lru cache by __lru_cache_add.



buffered_rmqueue
	if (order == 0)
		list = pcp->lists[migratetype];
		rmqueue_bulk (order = 0)	// pages come from pcp->lists[migratetype]; if that list is empty, it is refilled from the buddy allocator first.
	else
		__rmqueue (order)
	

* Try declaring a percpu variable and adding/removing entries
* waitqueue


* Adding to the LRU lists of a lruvec
lru_add_drain					// drains the pages in the CPU's pagevec and adds them to the zone's lruvec.
	lru_add_drain_cpu
		__pagevec_lru_add
			__pagevec_lru_add_fn
				add_page_to_lru_list
		activate_page_drain(cpu)	// activates the pages and adds them to the zone's lru list.
			__activate_page
				add_page_to_lru_list

2015.01.17 vexpress CONFIG_PLAT_VERSATILE=y

* task_struct and thread_info are in a 1:1 relationship
	task_struct is declared in : include/linux/sched.h
		allocated with kmem_cache_alloc
	thread_info is declared in : arch/arm/include/asm/thread_info.h
		space is allocated for .stack, and thread_info is overlaid on it.

* When is the number of the currently running CPU written into thread_info's cpu field?
	smp_processor_id reads the CPU number from thread_info, which is where the question came from.

	# define smp_processor_id() raw_smp_processor_id()
	#define raw_smp_processor_id() (current_thread_info()->cpu)

	In kernel/sched/sched.h:
	set_task_cpu
	init_idle
		__set_task_cpu
			task_thread_info(p)->cpu = cpu;
* Each CPU's rq holds a pointer to its idle task.


platform_smp_prepare_cpus
	v2m_flags_set(virt_to_phys(versatile_secondary_startup));	// address of the code the other cores execute when they wake up.
		// when pen released, 
		secondary_startup


start_kernel
	sched_init
		init_idle(current, smp_processor_id());
			rq->curr = rq->idle = idle;
		#ifdef CONFIG_SMP
		idle_thread_set_boot_cpu();			// this is where it is stored in the percpu variable
		#endif
			per_cpu(idle_threads, smp_processor_id()) = current;

kernel_init
	smp_init
		idle_threads_init				// initializes idle_threads.
			idle_init(cpu);				// only for CPUs other than the boot CPU.
				tsk = fork_idle(cpu);
					copy_process( …, init_struct_pid, …)	// copies the task structure
					init_idle_pids(task->pids);
					init_idle(task, cpu);	// 
				per_cpu(idle_threads, cpu) = tsk; // store tsk in the percpu idle_threads
		cpu_up(cpu)
			_cpu_up(cpu, tasks_frozen = 0)
				idle = idle_thread_get(cpu)	// reads back the idle_threads entry stored by idle_init.
					tsk = per_cpu(idle_threads, cpu);	// fetch the tsk stored in idle_threads
					init_idle (tsk, cpu);	// and make it the idle task of that CPU.
						__set_task_cpu	// writes the CPU number into the cpu field on the stack.
				__cpu_up(cpu, idle)		// initializes secondary_data (.stack, .pgdir, .swapper_pg_dir)
					boot_secondary(cpu, idle)	// writes the pen_release value.
						write_pen_release(cpu_logical_map(cpu))
						gic_raise_softirq(cpumask_of(cpu), 0)


copy_process
	p = dup_task_struct(current);
		ti = alloc_thread_info_node(tsk, node);		// memory for thread_info is allocated here. task_struct is allocated with kmem_cache_alloc; why the difference?
		err = arch_dup_task_struct(tsk, orig);
		tsk->stack = ti;
		setup_thread_stack(tsk, orig);
	…
	copy_thread()


* pen_release

	platsmp.c: set up in platform_smp_prepare_cpus; a core that receives the soft interrupt jumps to this function.
	versatile_secondary_startup				// waits until pen_release holds its own CPU number.
		secondary_startup				// reads __secondary_data.
			#PROCINFO_INITFUNC			// __v7_ca9mp_setup
			__enable_mmu				// r4 => TTBR0 ; __secondary_data.pgdir
				__turn_mmu_on
					__secondary_switched
						secondary_start_kernel
							…
							cpu_switch_mm(mm->pgd, mm);
							…
							cpu_init
							platform_secondary_init
								gic_secondary_init(0);
								write_pen_release(-1);	// releases the pen.
							…
							cpu_idle




* TLS
	In a multithreaded application, the threads of a process share one memory space (global variables, mostly),
	yet sometimes each thread needs to keep data that is unique to it. errno is the classic example.
	Until the Cortex-A series, ARM did not support TLS, so Linux emulated it.

	arch/arm/asm

* thread_info-related functions/macros
	arch/arm/include/asm/thread_info.h

	current_thread_info

	thread_saved_pc(tsk)	<- used by stacktrace and similar code
	thread_saved_sp(tsk)
	thread_saved_fp(tsk)

	Used when unwind_backtrace prints the stack contents of earlier frames.
	frame.sp = thread_saved_sp(tsk);

* How the current thread is obtained through sp.
	thread_info is overlaid at the low end of the stack that sp points into,
	and this thread_info points to the task_struct through its task pointer.

	#define current (get_current())
	get_current
		return current_thread_info()->task;
			// current_thread_info is:
			// register unsigned long sp asm ("sp");
			// return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));

	sp is switched on a context switch as well.
	__schedule
		context_switch
			switch_to
				__switch_to		// the function that switches context between two tasks. It swaps the registers, sp included.
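
The sp-masking trick can be reproduced in userspace with an aligned allocation standing in for the kernel stack. The demo_ types and the 8 KiB size are assumptions for the sketch; the real THREAD_SIZE depends on the architecture and configuration.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_THREAD_SIZE 8192u	/* assumed stack block size */

struct demo_task { int pid; };
struct demo_thread_info { struct demo_task *task; int cpu; };

/* Mask any address inside the stack block down to the block base,
 * which is where thread_info is overlaid. */
struct demo_thread_info *demo_thread_info_of(const void *sp)
{
	uintptr_t base = (uintptr_t)sp & ~(uintptr_t)(DEMO_THREAD_SIZE - 1);

	return (struct demo_thread_info *)base;
}

/* Allocate a size-aligned stack block and place thread_info at its base.
 * The caller releases it with free(). */
struct demo_thread_info *demo_alloc_stack(struct demo_task *tsk, int cpu)
{
	struct demo_thread_info *ti =
		aligned_alloc(DEMO_THREAD_SIZE, DEMO_THREAD_SIZE);

	if (ti) {
		ti->task = tsk;
		ti->cpu = cpu;
	}
	return ti;
}
```

The scheme works only because the block is aligned to its own size: every address inside it shares the same upper bits, so no per-CPU "current" variable is needed and switching sp switches the current task implicitly.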

* arm_syscall

* APCS
	Table 6.1. APCS registers
	Register	APCS name	APCS role
	r0	a1	argument 1/scratch register/result
	r1	a2	argument 2/scratch register/result
	r2	a3	argument 3/scratch register/result
	r3	a4	argument 4/scratch register/result
	r4	v1	register variable
	r5	v2	register variable
	r6	v3	register variable
	r7	v4	register variable
	r8	v5	register variable
	r9	sb/v6	static base/register variable
	r10	sl/v7	stack limit/stack chunk handle/register variable
	r11	fp/v8	frame pointer/register variable
	r12	ip	scratch register/new -sb in inter-link-unit calls
	r13	sp	lower end of the current stack frame
	r14	lr	link register/scratch register
	r15	pc	program counter

2015.01.24

2015.01.31

* BogoMIPS (calculated from lpj)

calibrate_delay_converge


* We need to see how much work (here, delay) fits within one jiffy, so first wait until the next jiffy begins.
	tick = jiffies;
	while (tick == jiffies) ;
	tick = jiffies;


[Step 1] Run delays one trial at a time and find the last trial position at which the tick has still not changed.

* Each time the band advances, the number of trials doubles, and the delay inside a trial follows the band.
	o  : one delay (in lpj units)
	[] : one trial

band
    0    1    2    3    4 ..
        [o]  [oo] [ooo]
        [o]  [oo] [ooo]
         [oo] [ooo]
         [oo] [ooo]
              [ooo]
              [ooo] <- when the tick rolls over, execution has reached some trial within the band.
              [ooo]
              [ooo]


* ๋‚˜๋จธ์ง€ ๋ถ€๋ถ„์„ ์ฐพ๊ธฐ ์œ„ํ•ด, ์ตœ๊ณ ์ •๋ฐ€๋„์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ ์ •๋ฐ€๋„๋ฅผ 2๋ฐฐ์”ฉ ๋†’์—ฌ๊ฐ€๋ฉฐ lpj์— ๋ˆ„์ ์‹œํ‚จ๋‹ค.

2015.02.07

mm_rmap

mmap_init : initializes vm_committed_as to 0

overcommit : Documentation/vm/overcommit-accounting
	0 - (Default) Heuristic overcommit handling.
	1 - Always overcommit.
	2 - Don't overcommit.

2015.02.14

sector => the smallest unit that can be physically addressed.
sector : block = 1~N : 1

block <= PAGE
block : buffer = 1 : 1

buffer head => buffer descriptor

fs/block_dev.c defines two operations tables.


* address_space_operation : 

static const struct address_space_operations def_blk_aops = {
	.readpage	= blkdev_readpage,
	.writepage	= blkdev_writepage,
	.write_begin	= blkdev_write_begin,
	.write_end	= blkdev_write_end,
	.writepages	= generic_writepages,
	.releasepage	= blkdev_releasepage,
	.direct_IO	= blkdev_direct_IO,
};

const struct file_operations def_blk_fops = {
	.open		= blkdev_open,
	.release	= blkdev_close,
	.llseek		= block_llseek,
	.read		= do_sync_read,
	.write		= do_sync_write,
	.aio_read	= generic_file_aio_read,
	.aio_write	= blkdev_aio_write,
	.mmap		= generic_file_mmap,
	.fsync		= blkdev_fsync,
	.unlocked_ioctl	= block_ioctl,
#ifdef CONFIG_COMPAT
	.compat_ioctl	= compat_blkdev_ioctl,
#endif
	.splice_read	= generic_file_splice_read,
	.splice_write	= generic_file_splice_write,
};

Filesystem mount
https://www.linux.co.kr/home2/board/subbs/board.php?bo_table=lecture&wr_id=1644

address_space is a very important structure: it implements the page cache.
It also contains the address_space_operations structure, which implements the functions that actually read blocks from the filesystem.

The file_operations structure holds the functions behind the calls application programmers commonly use: open, read, write, close, ioctl, mmap, and so on.
In general the file_operations functions end up calling the address_space_operations functions, which are what actually read pages from the block device.


Professional Linux Kernel Architecture



config_keys, config_security
credentials


Deferred work
	* tasklet   : 
	* workqueue : 

tasklet : a bottom-half mechanism built on softirq. See include/linux/interrupt.h.

	DECLARE_TASKLET()
	tasklet_handler : runs in softirq context and does not sleep.
	tasklet_schedule : the function that schedules (registers) a tasklet
		__tasklet_schedule
			calls raise_softirq_irqoff with local interrupts disabled.
			raise_softirq_irqoff marks the softirq as pending and, when not in interrupt context, calls wakeup_softirqd.
		ksoftirqd is a thread that runs one per core.

	tasklet_action : the action invoked when the tasklet softirq is pending

softirq
	* one ksoftirqd is created per core.
	* each softirq is numbered by priority, and tasklet is one of them.
	* 

	* invoke_softirq
		call site : irq_exit

	* callers of __do_softirq : do_softirq, invoke_softirq, run_ksoftirqd
		call sites of do_softirq : 

	__do_softirq runs with __local_bh_disable in effect and with local interrupts enabled.


/* This order is the priority; __do_softirq runs the handlers in priority order. */
enum
{
    HI_SOFTIRQ=0,
    TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ,
    BLOCK_IOPOLL_SOFTIRQ,
    TASKLET_SOFTIRQ,
    SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ,
    RCU_SOFTIRQ,    /* Preferable RCU should always be the last softirq */

    NR_SOFTIRQS
};
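
The priority order follows from how the pending bitmask is scanned: lowest bit first. A minimal model of that loop is sketched below; the demo_ names are assumptions, and instead of calling handlers it records the order in which they would run.

```c
/* Mark softirq number nr as pending; nothing runs yet. */
void demo_raise(unsigned int *pending, unsigned int nr)
{
	*pending |= 1u << nr;
}

/* Take a snapshot of *pending, clear it, and walk the bits from the
 * lowest number upward. out[] must have room for 32 entries.
 * Returns how many softirqs were "run". */
int demo_run_pending(unsigned int *pending, int out[32])
{
	unsigned int bits = *pending;
	int n = 0;

	*pending = 0;
	for (int nr = 0; bits != 0; nr++, bits >>= 1) {
		if (bits & 1u)
			out[n++] = nr;	/* the kernel would call the handler here */
	}
	return n;
}
```

Raising numbers 6, 1 and 3 in that order still runs them as 1, 3, 6. With the enum above that is TIMER_SOFTIRQ, then NET_RX_SOFTIRQ, then TASKLET_SOFTIRQ.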

2015.02.21

tasklet_trylock(), tasklet_unlock()
	: test TASKLET_STATE_RUN.


tasklet_disable(), tasklet_enable()
	: keep count, i.e. a disable count, to block the tasklet from running.


tasklet_vec is per-CPU.
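
How the SCHED bit, the RUN bit and the disable count interact can be modelled with C11 atomics. This is a simplified sketch with made-up demo_ names: there is no per-CPU list, and a disabled tasklet is simply skipped rather than requeued as the kernel would do.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_STATE_SCHED 0x1u	/* queued for execution */
#define DEMO_STATE_RUN   0x2u	/* handler is running on some CPU */

struct demo_tasklet {
	atomic_uint state;
	atomic_int count;	/* disable depth; the handler runs only at 0 */
	int runs;		/* how many times the "handler" ran */
};

/* Returns true if this call queued the tasklet, false if it already was. */
bool demo_tasklet_schedule(struct demo_tasklet *t)
{
	return !(atomic_fetch_or(&t->state, DEMO_STATE_SCHED) & DEMO_STATE_SCHED);
}

bool demo_tasklet_trylock(struct demo_tasklet *t)
{
	return !(atomic_fetch_or(&t->state, DEMO_STATE_RUN) & DEMO_STATE_RUN);
}

void demo_tasklet_unlock(struct demo_tasklet *t)
{
	atomic_fetch_and(&t->state, ~DEMO_STATE_RUN);
}

/* One pass of the softirq action over a single tasklet.
 * Returns true if the handler ran. */
bool demo_tasklet_action(struct demo_tasklet *t)
{
	bool ran = false;

	if (!demo_tasklet_trylock(t))
		return false;		/* already running elsewhere */
	if (atomic_load(&t->count) == 0) {
		atomic_fetch_and(&t->state, ~DEMO_STATE_SCHED);
		t->runs++;		/* stands in for calling the handler */
		ran = true;
	}
	demo_tasklet_unlock(t);
	return ran;
}
```

The RUN bit is what prevents the same tasklet from running on two processors at once, and the SCHED bit makes repeated scheduling before the handler runs collapse into one execution.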



example) fs/proc/cpuinfo.c

file_operations
	.open
	.read
	.llseek
	.release

fs/seq_file.c
	seq_open
	seq_read
	seq_lseek
	seq_printf

seq_open is given a seq_operations structure.
	seq_operations is the set of operations the driver developer has to implement.





GFP_ATOMIC	!(__GFP_WAIT) && (__GFP_HIGH)
	The flag used to allocate memory in interrupt context. The request is made with ALLOC_HIGH, and even if the allocation cannot be satisfied it returns immediately instead of sleeping.

	(__GFP_HIGH) == (ALLOC_HIGH)
	=> __zone_watermark_ok applies a lower watermark threshold, so the check is more likely to pass.





struct percpu_counter {
	raw_spinlock_t lock;
	s64 count;
	s32 __percpu *counters;
};
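
The idea behind struct percpu_counter (cheap per-CPU deltas that are folded into a shared count only occasionally) can be sketched in single-threaded form. The demo_ names, the CPU count and the batch size are assumptions for the example, and the lock is omitted.

```c
#include <stdint.h>

#define DEMO_NR_CPUS 4
#define DEMO_BATCH   32	/* assumed fold threshold */

struct demo_percpu_counter {
	int64_t count;				/* shared, approximately current */
	int32_t counters[DEMO_NR_CPUS];		/* per-CPU deltas not yet folded */
};

/* Add amount on behalf of cpu; fold into the shared count only when the
 * local delta reaches the batch size. */
void demo_counter_add(struct demo_percpu_counter *c, int cpu, int32_t amount)
{
	int64_t local = (int64_t)c->counters[cpu] + amount;

	if (local >= DEMO_BATCH || local <= -DEMO_BATCH) {
		c->count += local;	/* the kernel takes the lock around this */
		c->counters[cpu] = 0;
	} else {
		c->counters[cpu] = (int32_t)local;
	}
}

/* Cheap read that may lag behind by the unfolded deltas. */
int64_t demo_counter_read(const struct demo_percpu_counter *c)
{
	return c->count;
}

/* Exact read: shared count plus every CPU's pending delta. */
int64_t demo_counter_sum(const struct demo_percpu_counter *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < DEMO_NR_CPUS; i++)
		sum += c->counters[i];
	return sum;
}
```

The trade-off is accuracy for scalability: most updates touch only CPU-local data, and a reader that needs the exact value pays for walking every CPU's slot.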


ida & idr
http://studyfoss.egloos.com/5187192
 : ida only allocates ids and does not map them to pointers. ida stores a pointer to its own bitmap in the ary of layer[0].

Where is mnt_id used? fhandle

fhandle
	uses a handle instead of a name.
	name_to_handle_at : convert name to handle

	Open by handle
	http://lwn.net/Articles/375888/


Starting from sysfs_init … the call depth gets deep.




[address space]
	page -> the unit the kernel uses to manage physical memory


inode <-> address_space 1:1 mapping

Here, address_space is 


- The page cache is managed through struct address_space.
- It normally lives in the inode, as the i_mapping field.
- Following the page_tree of struct address_space leads to each page currently held in the page cache. (See the Linux source for how to walk it.)
- That gives the struct page, from which the actual physical address can probably be worked out.

2015.02.28

VFS
 - The kernel component used for handling filesystems (directory and file access).
 - A layer that abstracts the operations common to many filesystems.
 - Gives users a single interface through the file-related system calls.

	Internally it maintains the dcache and the inode cache.
 : http://haifux.org/lectures/119/linux-2.4-vfs/linux-2.4-vfs.html


* page cache
 : The page cache caches pages of files to optimize file I/O.

* buffer cache
 : The buffer cache caches disk blocks to optimize block I/O.


* the superblock object
 : Stores information about a mounted filesystem. For disk-based filesystems, this object usually corresponds to the filesystem control block stored on disk.

* the inode object
 : Stores general information about a specific file.
   For disk-based filesystems (as opposed to virtual filesystems, network filesystems and so on), this object usually corresponds to the file control block stored on disk.
   Each inode object is associated with an inode number, which uniquely identifies the file within the filesystem.

   It is created when the file is created. Unlike a dentry, it exists on disk.

* inode cache
 : cache of "inode" objects, used to represent files and directories on the file systems.

* dcache
 : cache of "dentry" objects, used to translate paths to inodes.
   Stores information about the link between a directory entry (that is, a particular name of the file) and the corresponding file. Each disk-based filesystem stores this information on disk in its own particular way.

   File names within a directory are stored on disk, but they are kept in memory to improve performance.

   The dcache is implemented as a hash table keyed by parent dentry and file name,
   and it keeps a separate LRU list so that its size can be adjusted dynamically to the amount of available memory.

   The dcache has a large effect on the performance of system calls that reach a file by name (usually called path lookup or path walk): open, access, stat, chmod, ...


http://www.quora.com/Linux-Kernel/What-is-the-major-difference-between-the-buffer-cache-and-the-page-cache

* Relationship between dentry and inode cache
* Relationship between file table and file descriptor

2015.03.07

* file_system_type is the data structure for a filesystem such as "sysfs" or "ext4"; there is one per filesystem type in the system.
* One superblock exists per partition.

* A superblock points to its file_system_type through a pointer named type.
  The fs_supers list of file_system_type links the superblocks that belong to it.

per-sb shrinker???



vfs_caches_init
	dcache_init
	inode_init
	files_init
	mnt_init
		init_rwsem
		kmem_cache_create
		__get_free_page
		sysfs_init				// 
		kobject_create_and_add
		init_rootfs
		init_mount_tree				// mounts rootfs
	bdev_cache_init
		kmem_cache_create
		register_filesystem
		kern_mount = kern_mount_data
		blockdev_superblock = bd_mnt->mnt_sb;	// for writeback
	chrdev_init


sysfs_init
	kern_mount = kern_mount_data
		vfs_kern_mount
			mount_fs
				sysfs_mount		// the .mount callback of struct file_system_type is invoked
					sget		// here
					sysfs_fill_super
					dget

alloc_super

2015.03.14

wait_table is part of the zone.
Why was bit_waitqueue designed to depend on this wait_table?

zone_wait_table_init
	initializes the zone's wait_table

kernel/wait.c

io_schedule()

virt_to_page
page_zone


wait_table, wait_table_bits, and wait_table_hash_nr_entries implement a wait queue for processes waiting for a page to become available. While the details of this mechanism are shown in Chapter 14, the intuitive notion holds pretty well: Processes queue up in a line to wait for some condition. When this condition becomes true, they are notified by the kernel and can resume their work.

2015.03.21

sysfs_mount
	sysfs_fill_super		// if the sb was newly created by sget, fills in the sb's information.
		sysfs_get_inode
			iget_locked	// looks up the inode by ino and returns the object; if none exists, allocates an inode with the given inode number and returns it with I_NEW set in i_state. The caller must fill in the rest.
		d_make_root		// takes root_inode, allocates and initializes a dentry for it, and returns the dentry.
			__d_alloc()	// 
			d_instantiate()	// if __d_alloc succeeded, fills the inode information into the dentry.
			



Step by Step Kernel Programming Course ⑤: Filesystem Mount
Step by Step Kernel Programming Course ⑥: Filesystem Mount

Professional Linux Kernel Architecture. Ch.8 VFS

/usr/bin/emacs => '/' -> 'usr' -> 'bin' -> 'emacs'

1. The search starts from the root directory.


mapping of struct page
	file->f_dentry->d_inode->i_mapping

2015.03.28

[freestyle@f17 sys]$ stat /sys
  File: `/sys'
  Size: 0	Blocks: 0	IO Block: 4096	directory
Device: eh/14d	Inode: 1	Links: 13
Access: (0555/dr-xr-xr-x)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-28 15:43:09.877767524 +0900
Modify: 2014-11-22 16:04:58.755999998 +0900
Change: 2014-11-22 16:04:58.755999998 +0900
 Birth: -

__LITTLE_ENDIAN <- when is it defined and added?

seqcount in include/linux/seqlock.h


inode <-> dentry is a 1:n relationship.
One inode is created per file. Several dentries (paths) can be linked to the same inode.


Analyzing d_make_root


http://egloos.zum.com/studyfoss/v/5469672
Documentation/filesystems/path-lookup.txt

Dcache scalability and RCU-walk
	https://lwn.net/Articles/419811/

rcu-walk
	


* dcache means the cache of directory entries (= dentries).
* The dcache is implemented as a hash table keyed by '(parent dentry, file name)', and
  it keeps a separate LRU list so that its size can be adjusted dynamically according to the memory available in the system.

* At every step of a path lookup toward the final component, the dentry is looked up in the dcache. To keep the dentries of the
  path already walked from being freed during the walk, the ref-count of each dentry used to be incremented (ref-walk).

  However, incrementing the ref-count hurts performance considerably, for example by bouncing cache lines between cpus.
  Path lookup was therefore reworked so that it can run without modifying the dentry itself.
  This cannot always be used; it is allowed only on the fast path.

  rcu-walk : assuming that every dentry on the path is already in the dcache, the whole walk is wrapped in rcu_read_lock/rcu_read_unlock.
  

lookup_fast
	__d_lookup_rcu


atomic_dec_and_lock
lib/dec_and_lock.c	<- include/linux/spinlock.h

The dcache table is initialized in dcache_init_early.
It is a table in the form of an array of hlist_bl_head. The first-node pointer carries a bit lock, and the list can be protected with the rcu helpers (include/linux/rculist_bl.h).

2015.04.04

from Documentation/filesystems/vfs.txt
------------------------------------------------


The pathname passed to a system call is used by the VFS to search the directory entry cache.
The dcache is a fast look-up mechanism that translates a pathname (filename) into a specific dentry.
Dentries exist only in memory and are used purely for performance.

To resolve a pathname into a dentry, the VFS may have to resort to creating dentries along the way,
and then loading the inode.





from Linux Kernel Development
------------------------------------------------


* Primary VFS object types
  - superblock : represents a specific mounted filesystem
  - inode : represents a specific file
  - dentry : represents a directory entry, i.e. a single component of a path (mount points and the final file included)
  - file : represents an open file as associated with a process

* VFS operations objects
  - super_operations : the methods the kernel can invoke on a specific filesystem, e.g. write_inode(), sync_fs()
  - inode_operations : the methods the kernel can invoke on a specific file, e.g. create(), link()
  - dentry_operations : the methods the kernel can invoke on a specific directory entry, e.g. d_compare(), d_delete()
  - file_operations : the methods a process can invoke on an open file, e.g. read(), write()

* Other VFS objects
  - file_system_type : describes a registered filesystem and its capabilities.
  - vfsmount : describes a mount point: mount location, mount flags, and so on.
  - fs_struct, file : per-process structures describing the filesystem and the files associated with a process, respectively.

* dcache
  - A dentry exists for each component of a path.
    Resolving every component from scratch on each lookup would be extremely inefficient,
    so the kernel caches dentry objects in the dcache.
  - The dcache consists of three parts.
    Lists of "used" dentries - linked from the inode's i_dentry field (through the dentry's d_alias).
    A doubly linked "least recently used" list of unused and negative dentry objects. Linked from the superblock's s_dentry_lru (through the dentry's d_lru).
    A hash table and hashing function used to quickly resolve a given path into the associated dentry object. Linked from dentry_hashtable
        (through the dentry's d_hash).









dentry_string_cmp
	load_unaligned_zeropad


do_filp_open
	path_openat
		link_path_walk
			walk_component
				lookup_fast
					__d_lookup_rcu


up_write analysis
	include/linux/rwsem-spinlock.h



... When does sysfs use the ida?

2015.04.11

rootfs comes up first; after init is invoked, busybox's init runs the rcS script.

// used when building the busybox image
$ cat etc/init.d/rcS
#!/bin/sh
mount -t proc none /proc
mount -t sysfs none /sys
/sbin/mdev -s

parse_tag_initrd2 is called because of the initrd option given to qemu.

* img mount command
bzcat /boot/initramfs-2.6.32-279.el6.i686.img > $HOME/initramfs-2.6.32-279.el6.i686.cpio
cd $HOME
mkdir initramfs
cd initramfs
cpio -vid < ../initramfs-2.6.32-279.el6.i686.cpio


timekeeper analysis..
current_kernel_time

2015.04.18

#define CURRENT_TIME        (current_kernel_time())
#define CURRENT_TIME_SEC    ((struct timespec) { get_seconds(), 0 })

See the notes from 2014.11.1 and 2014.12.27.

struct timekeeper {
}

ktime_get
	timekeeping_get_ns


do_timer
	update_wall_time
		accumulate_nsecs_to_secs



sysfs_alloc_ino : allocates an integer from the ida used for sysfs.

s_bin_attr of sysfs_dirent: what is it for?

case SYSFS_KOBJ_BIN_ATTR:
sysfs_create_bin_file


kobject interface functions
kobject_create_and_add(name, parent)
	in mnt_init(): kobject_create_and_add("fs", NULL);


* Study The Linux Device Model.
	kobject_uevent  - notify userspace by sending an uevent


file_system_type instances examined so far. Each defines .name, .mount, and .kill_sb.
	- sysfs
	- rootfs
	- ramfs (not examined yet, but it sits right above rootfs)

* Annotate ida_remove

lib/parser.c : match_token, match_int

include/linux/stat.h
	S_ISUID

2015.04.25

get_next_ino

When is init_cred initialized?
	https://www.kernel.org/doc/Documentation/security/credentials.txt

Write up vmscan.

Annotate bdi_destroy

fs->owner is a module *; how does it get initialized?

Annotate mntput

nsproxy - namespace



alloc_vfsmnt() creates and initializes a struct mount, because struct vfsmount is embedded in struct mount.

// mnt_pcp is a percpu pointer
create: mnt->mnt_pcp = alloc_percpu(struct mnt_pcp);
access: per_cpu_ptr(mnt->mnt_pcp, cpu)

#ifdef CONFIG_SMP
this_cpu_add(mnt->mnt_pcp->mnt_count, n);
#endif

2015.05.02

big-reader lock. In the kernel this is lglock.c


init_mount_tree
	create_mnt_ns

	set_fs_pwd
	set_fs_root

d_make_root ; takes the root inode, allocates and sets up a dentry, and returns it.
	__d_alloc	; dentry alloc
	d_instantiate	; fills the inode information into the dentry to instantiate it.

sysfs -> sysfs_mount
ramfs/rootfs -> ramfs_mount -> mount_nodev
bdev -> bd_mount -> mount_pseudo


flex_proportion

2015.05.09

Each filesystem has its own MAGIC NUMBER.
	The common VFS ops information must be filled in.

sysfs
	sysfs_mount - sysfs_fill_super

rootfs/ramfs
	rootfs_mount/ramfs_mount - mount_nodev - ramfs_fill_super

procfs
	proc_mount - proc_fill_super


XXX_fill_super-style functions : fill in the superblock's own fields, then obtain and store the dentry for the root inode.
	This function also parses the options supplied at mount time, splitting on ','.

To handle the options supplied at mount time, a match_table_t is defined together with an option-parsing function.
procfs defines match_table and proc_parse_options.
ramfs defines match_table and ramfs_parse_options.



pid_namespace

struct upid {
	/* Try to keep pid_chain in the same cacheline as nr for find_vpid */
	int nr;
	struct pid_namespace *ns;
	struct hlist_node pid_chain;	// the node that links this structure into the hash list.
};

* A hash table is an array of hlist_head. Entries are chained off each hlist_head through hlist_node, and the structure that embeds the hlist_node is the data actually being managed.



* Explain struct proc_dir_entry and struct proc_inode.

* struct pernet ???


#################
  pid_namespace
#################

* Professional Linux Kernel Architecture
    Figure 2-5: Overview of data structures used to implement a namespace-aware representation of IDs.
* [Linux] PID management
    http://studyfoss.egloos.com/5242243

/* pid_namespaces form a tree-shaped hierarchy. Each namespace carries a level number telling which level it belongs to. The related APIs are defined in pid_namespace.c. */
pidmap_init

struct pid_namespace
	struct pidmap pidmap[PIDMAP_ENTRIES];	// manages the pids belonging to the namespace as a bitmap. Initialized in pidmap_init.
	struct kmem_cache *pid_cachep;
	unsigned int level;
	struct pid_namespace *parent;

struct pid_namespace init_pid_ns;




* /proc/sys ==> sysctl
  struct ctl_table         : one sysctl entry. A sysctl table is an array of this structure.
  struct ctl_node          : consists of an rb_node and a ctl_table_header pointer.

  struct ctl_table_header  : structure used to maintain the dynamic list of struct ctl_table trees.
  struct ctl_dir	   : consists of a ctl_table_header and an rb root.
  struct ctl_table_set     : Sysctl tree	
  struct ctl_table_root    : root of the sysctl table hierarchy

  struct ctl_path          :

	Registering with register_sysctl(path, table) returns a ctl_table_header.


Professional Linux Kernel Architecture
    Ch. 10 Filesystems without Persistent Storage

2015.05.16

* Looking up files by absolute and relative path
* How is a dentry path such as "bdev:" looked up in a pseudo filesystem like the bdev filesystem?

* Schedule : review / progress
	- Architecture
		arm, MMU
	- Memory allocation/free (memblock, buddy, slab)

	- Synchronization: RCU
	- timer
	- Data structures (RB-Tree, radix-tree, idr/ida, hash table)
	- VFS

	- cgroup: analyze later, with v2.
	- Collect the chapters and material related to that week's study topic

* Methodology
	- Analysis centered on functions and APIs

2015.05.23

include/linux/moduleparam.h

core_param
module_param


Shareable Device
The TCM region overrides memory type attributes of the MPU and all addresses within the TCM space are treated as Normal, Non-Shared memory.

Cortex-A series PG
	* Table 9-2 Memory attributes
	  Memory Type		Shareable/Non-shareable		Cacheable

	  Normal		Shareable/Non-shareable		Yes
	  Device		-				No
	  Strongly-ordered	-				No

	* Table 9-3 Memory type and cacheable properties encoding in translation table entry

The only difference between Device and Strongly-ordered memory is that:
- A write to Strongly-ordered memory can complete only when it reaches the peripheral or memory component accessed by the write.
- A write to Device memory is permitted to complete before it reaches the peripheral or memory component accessed by the write.

System peripherals will almost always be mapped as Device memory.

Regions of Device memory type can be described using the Shareable attribute.

On some ARMv6 processors, the Shareable attribute of Device accesses is used to determine which memory interface will be used for the access,
 with memory accesses to areas marked as Device, Non-Shareable performed using a dedicated interface, the private peripheral port.
 This mechanism is not used on ARMv7 processors.



vmap : maps pages into a virtually contiguous address range. Data structures: struct vm_struct, struct vm_map. How are they linked together, and how are they looked up?
get_vm_area_caller() obtains the vm area, and map_vm_area() does the mapping.
map_vm_area calls vmap_page_range() internally.



sched_class

2015.05.30

cfs scheduler analysis
http://iloveriver.egloos.com/6117181

Why run as the cpu_stopper thread
	=> it runs in stop_sched_class and cannot be preempted while it runs.

2015.06.06

clockevents_config_and_register


enable_percpu_irq
	irq_percpu_enable



* tick_init registers tick_notify as tick_notifier.
* clockevents_register_device : registers a clock event device.
	It adds the device to the clockevent_devices list and
	sends a CLOCK_EVT_NOTIFY_ADD notification,
	which invokes the tick_notify above.

  It is called from functions such as broadcast_timer_setup and timer_init.

* The key part is tick_notify: for CLOCK_EVT_NOTIFY_ADD
  it calls tick_check_new_device,
  which decides whether the newly registered device should be used.

  If the new device is chosen, clockevents_exchange_device(curdev, newdev); is called, 
  and the currently registered device is put on the list with list_add(&old->list, &clockevents_released).

* What is odd is that the old device is parked on clockevents_released, 

2015.06.13

start_kernel
	time_init
		v2m_timer_init
			v2m_sp804_init
				sp804_clockevents_init
					clockevents_config_and_register
						clockevents_register_device
							clockevents_notify_released

v2m_timer is registered first, in the order above.
Next, in percpu_timer_setup, twd_timer_setup registers "local_timer" 4 times.

v2m_timer_init



include/asm-generic/vmlinux.lds.h

do_early_param
	__setup_start ~ __setup_end		// ์„น์…˜์˜ ํ•จ์ˆ˜ ์‹คํ–‰		
do_pre_smp_initcalls
	__initcall_start ~ __initcall0_start	// // early_initcall

2015.06.20

platform_smp_prepare_cpus
	v2m_flags_set(virt_to_phys(versatile_secondary_startup));	// (headsmp.S) writes the entry function for the secondary CPUs into the v2m register.


identity mapping
	http://lists.kernelnewbies.org/pipermail/kernelnewbies/2012-June/005552.html
	http://www.spinics.net/lists/newbies/msg46709.html

	โ€ฆ
	The kernel sets up a small identity mapping (virtual == physical) as well as the kernel direct mapping.

In System.map the symbols between __idmap_text_start and __idmap_text_end are visible.
	Among them, __turn_mmu_on is the key one.



1. What is identity mapping?
	physical address == virtual address mapping
2. Why must __turn_mmu_on be placed in the identity mapping, when the kernel space has already been set up?
	The kernel space is not a 1:1 mapping between p and v.
3. When the identity mapping is created, could it overwrite a region that is already kernel-mapped?
  If so, would the address of the code that runs until the page table is changed not be overridden?
	

secondary_start_kernel
	cpu_switch_mm(mm->pgd, mm); 	// mm is init_mm; switches the page table. (= cpu_v7_switch_mm)
		// sets the TTBR0 register.



task_struct, thread_info, switch_to, and the stack in the arm kernel
http://forum.falinux.com/zbxe/index.php?mid=lecture_tip&listStyle=list&page=10&document_srl=808363

2015.06.27

kthread_bind


update_process_times
	run_local_timers();
		raise_softirq(TIMER_SOFTIRQ);

init_timers
	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
		struct tvec_base *base = __this_cpu_read(tvec_bases);
		if (time_after_eq(jiffies, base->timer_jiffies))

2015.07.04

cascade(base, &base->tv2, INDEX(0))
#define INDEX(N) ((base->timer_jiffies >> (TVR_BITS + (N) * TVN_BITS)) & TVN_MASK)

	The part I do not understand is INDEX, because I have not yet understood what this routine does.
	timer_jiffies is the nearest expiry jiffies value that the timer routine currently knows about.
	Because this value is smaller than the current jiffies, the routine searches for timers that must be processed, ..

	    Q. tvec is an array; does an INDEX derived from timer_jiffies not end up representing an or-ing rather than one specific array slot?
	    Q. Once cascade finds timers in a tvec and re-queues them, the next tvec is not processed; the remaining timers then sit in slots that do not match the current timer_jiffies.


ida_init
ida_pre_get
	ida_get_new

2015.07.11

gcwq exists as [percpu and unbound_global_cwq] (if UNBOUND is counted as a separate cpu, the relationship is 1:1)
	pool exists as [normal and highpri]

wq
	cwq ; cwq is the per-cpu wq and exists for each cpu.
		pool ; points to a specific pool of the gcwq, but what decides which one..?
	rescuer ; a worker independent of the gcwq; it runs as a separate thread.

gcwq


cwq ; created when the workqueue is created.

worker
	scheduled ; list of works to be processed
	task ; the thread structure to run.

2015.07.25

http://criticalblue.com/news/wp-content/uploads/2013/12/linux_scheduler.pdf


cpu_stopper_thread
	migration/%d


hrtimer_cpu_notify
	calls clockevents_notify.

tick_init
	clockevents_register_notifier(&tick_notifier);

clockevents_do_notify

softirq
	.notifier_call = cpu_callback,

	.notifier_call = remote_softirq_cpu_notify,

timer_cpu_notify

2015.08.01

kthread_create_on_node <-> kthreadd processes the create requests queued on the global list and signals completion with complete

sched_class
http://criticalblue.com/news/wp-content/uploads/2013/12/linux_scheduler.pdf

stop	=> rt	=> fair	=> idle
(percpu)                   (percpu)


* copy_process needs to be analyzed.
* Write up cmwq



secondary_startup			// arch/arm/kernel/head.S
	secondary_start_kernel		// 


arch/arm/mm/proc-v7-2level.S
	#define TTB_FLAGS_SMP   TTB_IRGN_WBWA|TTB_S|TTB_NOS|TTB_RGN_OC_WBWA
	#define PMD_FLAGS_SMP   PMD_SECT_WBWA|PMD_SECT_S

CONFIG_CPU_HAS_ASID
mcr p15, 0, r1, c13, c0, 1
mcr p15, 0, r0, c2, c0, 0


==========================================================================================

* In cpu_init(), the CPU switches to irq, abt, and und mode in turn and stores the stacks array as each mode's sp.

* The vector_stub assembly macro shows that, whichever mode the stub was entered from, it always switches to svc mode.

* So why always switch to svc mode?
	The same question has been asked on stackoverflow; handling it this way has two advantages.

	(The explanation focuses on irq -> svc.)
	1. No separate stack is needed for IRQ mode.
	2. By switching to SVC mode, a new interrupt can be handled.
	http://stackoverflow.com/questions/7915255/why-does-linux-arm-always-switch-to-supervisor-mode-during-exception-handling
	http://stackoverflow.com/questions/22928904/linux-kernel-arm-exception-stack-init

	[Reference] arm exception
		http://www.ic.unicamp.br/~celio/mc404-2013/arm-manuals/ARM_exception_slides.pdf
	
* References
	Interrupt handling in ARM
	http://venkateshabbarapu.blogspot.kr/2012/09/interrupt-handling-in-arm.html

	How ARM Supervisor Call Exception Handled in Linux
	http://pr1mary.blogspot.kr/2013/11/how-arm-supervisor-call-exception.html

	http://arm-linux-interview-questions.blogspot.kr/2013/04/arm-linux-abort-handler-implementation.html

* In which of the 7 processor modes does the arm linux kernel run? (USR, IRQ, FIQ, SVC, ABT, UND, SYS ; sys is the privileged mode for OS tasks)
	It runs in SVC mode.
	The arm kernel boot requirements state that the CPU must be in SVC mode.
	https://www.kernel.org/doc/Documentation/arm/Booting

	Does the arm64 document say nothing about this?
	https://www.kernel.org/doc/Documentation/arm64/booting.txt


__stubs_start:
	vector_stub irq, IRQ_MODE, 4				// declares vector_irq. The correction value adjusts lr so that execution returns to the right place once exception handling is done.
	.long	...
	.long   __irq_svc           @  3  (SVC_26 / SVC_32)

	vector_stub dabt, ABT_MODE, 8
	.long	...

	vector_stub pabt, ABT_MODE, 4
	.long	...
	
__stubs_end:

	.equ    stubs_offset, __vectors_start + 0x200 - __stubs_start


__vectors_start:
	...
	W(b)    vector_irq + stubs_offset
	...
__vectors_end:

2015.08.08

* cpu_switch_mm(mm->pgd, mm): to be analyzed later

2015.08.15

* root_domain
* sched_domain

Where sd_data is set up : __sdt_alloc

2015.08.22

Used for load balancing during scheduling. A tree-shaped hierarchy in which every level except the top points to the upper-level cpu through parent (implemented as an array).

* sched_domain :
* sched_group : built per cpu, or by grouping several cpus. It exists as a substructure inside a sched_domain, and the sched_groups inside a domain are linked in a circular linked list.

CONFIG_TMPFS
CONFIG_DEVTMPFS is not defined, so devtmpfs_init does nothing.


shmem_init

* Shared Memory Virtual Filesystem
https://www.kernel.org/doc/gorman/html/understand/understand015.html
	Memory backed by a file or device can be shared by passing the MAP_SHARED flag to mmap(). There are, however, two cases where sharing of an anonymous region is needed:
	when MAP_SHARED is given to mmap() but there is no file backing, and when memory is allocated with shmget() and attached to a virtual address with shmat().



[Separate topic] CMA, GCMA
	http://gurugio.blogspot.kr/2014/05/cma-contiguous-memory-allocation.html
	https://lwn.net/Articles/480055/
	https://lwn.net/Articles/486301/
	https://events.linuxfoundation.org/images/stories/pdf/lceu2012_nazarwicz.pdf
	http://events.linuxfoundation.org/sites/events/files/slides/gcma-guaranteed_contiguous_memory_allocator-lfklf2014_0.pdf

2015.08.29

uevent

* Linux Device Driver Chapter 14.
	kobjects
		- Reference counting of objects
		- Sysfs representation
		- Data structure glue
		- Hotplug event handling : covers the generation of events that notify user space when hardware is attached or removed.

	kset
	ktype

* Where struct class, struct bus_type, struct device, and struct device_driver are declared
	include/linux/device.h

* Where struct subsys_private is declared
	drivers/base/base.h

* struct subsys_private priv;
  struct bus_type bus;

	They are linked to each other when the bus is registered.
	priv.bus = bus;
	bus.p = priv;

2015.09.05

subsys_private	; a structure common to both bus_type and class; it holds the driver core's private data (mostly bookkeeping structures).
	struct kset subsys;

driver_private

klist_node ; klist ?


struct device_private
	.device points to the corresponding struct device.
struct device


kobject_uevent_env()

bus_register(subsys)
	in __bus_register: drivers_autoprobe = 1;


subsys_system_register
	bus_register


* klist iterate
	klist_iter_init_node(&klist, &klist_iter, ...)
		; fills in the klist_iter used to walk the klist.
	while ((dev = next_device(&i)))
		; walks the list with the klist_iter.
		  next_device() uses the iterator through klist_next.
		  struct klist_node *klist_next(struct klist_iter *i)
	klist_iter_exit(&i)


device_release_driver

topology_init
	register_cpu


gic_init
	gic_init_bases
		calls alloc_irq_descs? to allocate the irq_desc


prepare_to_wait			<- woken up by wake_up().
schedule_timeout(timeout);
finish_wait

2015.09.12

Analyzing initcall...

initcall level
	0 - early
	1 - core
	2 - postcore
	3 - arch
	4 - subsys
	5 - fs
	6 - device
	7 - late


* namespace
	CONFIG_NAMESPACES

	When namespaces are used, 
	the UTS, IPC, PID, NET, and MNT files are created under /proc/<PID>/ns.

* ptrace
	Process Tracer
	

[Cortex-A series PG]	Table 11-3 Link Register Adjustments
Exception		Adjustment		Return instruction		Instruction Returned to
SVC			0			movs pc, r14			Next instruction
Undef			0			movs pc, r14			Next instruction
Prefetch Abort		-4			subs pc, r14, #4		Aborting instruction
Data abort		-8			subs pc, r14, #8		Aborting instruction if precise
FIQ			-4			subs pc, r14, #4		Next instruction
IRQ			-4			subs pc, r14, #4		Next instruction


* undefined instruction (UND exception)
	: an instruction that is not defined reaches the execute stage
	  For example, if a coprocessor instruction is executed but the coprocessor is not present, an undefined instruction exception is raised.
	

	__und_svc:
		svc_entry


	__und_svc_fault:

	__und_fault
		b do_undefinstr
			call_undef_hook		// calls the function specified in a hook registered with register_undef_hook.


------------------------------------------------
add r7, sp, #S_SP - 4   @ here for interlock avoidance
mov r6, #-1         @  ""  ""      ""       ""
add r2, sp, #(S_FRAME_SIZE + \stack_hole - 4)

#define S_SP 52 /* offsetof(struct pt_regs, ARM_sp) @ */




ldmia   sp, {r0 - pc}^          @ load r0 - pc, cpsr

ARM condition codes
--------+------------------------------------------------------------------------
Suffix	|	Flags				Meaning
--------+------------------------------------------------------------------------
EQ	|	Z = 1				Equal
NE	|	Z = 0				Not equal
CS/HS	|	C = 1				Higher or same (unsigned >= )
CC/LO	|	C = 0				Lower (unsigned < )
MI	|	N = 1				Negative
PL	|	N = 0				Positive or zero
VS	|	V = 1				Overflow
VC	|	V = 0				No overflow
HI	|	C = 1 and Z = 0			Higher (unsigned > )
LS	|	C = 0 or Z = 1			Lower or same (unsigned <= )
GE	|	N == V				Signed >=
LT	|	N != V				Signed <
GT	|	Z = 0 and N == V		Signed >
LE	|	Z = 1 or N != V			Signed <=
AL	|	Any				Always (usually omitted)
--------+------------------------------------------------------------------------

svc_entry
...
svc_exit



The first code executed in head.S
	: it runs in SVC mode.
	setmode PSR_F_BIT | PSR_I_BIT | SVC_MODE, r9 @ ensure svc mode and irqs disabled


early_trap_init
	copies into the page allocated for the vectors
	0x0		: __vectors_start ~ __vectors_end	// vector table
	0x200 		: __stubs_start ~ __stubs_end		// stub macro
	0x1000-kuser_sz	: __kuser_helper_start ~ (0x1000-1)	// kuser_helper

2015.10.03 nwfpe

2015.10.10

"sys/kernel"
kernel_kobj = kobject_create_and_add("kernel", NULL);
error = sysfs_create_group(kernel_kobj, &kernel_attr_group);

kexec - kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel.
	It is like a reboot but it is independent of the system firmware.   And like a reboot you can start any kernel with it, not just Linux.

	kexec used to bypass "UEFI secure boot"; in 3.17 a patch was applied so that only signed kernels can be loaded.
	http://kernelnewbies.org/Linux_3.17
	http://lwn.net/Articles/603116/

	Use kernel 3.18 or later in order to run a secure OS that uses ARM TrustZone...

	http://elinux.org/images/2/2f/ELC-2010-Damm-Kexec.pdf

kexec/kdump : http://www.openseed.co.kr/1



case SYSFS_KOBJ_BIN_ATTR:

fs/sysfs/bin.c
const struct file_operations bin_fops = {
	...
};


"notes"
Why does a .notes section show up even though nothing was specified with the ELFNOTE macro?
	http://kernelnewbies.org/vmlinux/asm-notes
	https://lwn.net/Articles/531148/


fsnotify
flock
binfmt_XXX : binfmt_script (default), binfmt_elf, ..got

mount_single
Filesystems such as debugfs share a single instance. (the test always returns true)
https://www.kernel.org/doc/Documentation/filesystems/vfs.txt


What does allocating a dentry mean? (d_alloc)
d_add


Continue from random32_init...

2015.10.17

RCU_INIT_POINTER
rcu_dereference_protected(p, c)
// include/linux/rcupdate.h

2015.10.24

lookup_dcache(name, dentry *dir, flags, *need_lookup) : calls d_lookup(); if the dentry is found, runs d_revalidate() and returns it; if not, returns the result of d_alloc.
lookup_real

// d_lookup uses the (parent, name) pair.
struct dentry *d_lookup(struct dentry *parent, struct qstr *name)
	d_lookup searches the children of the parent dentry for the name in question
	calls __d_lookup internally

__d_lookup_rcu ; why does a separate rcu version exist?


lru : managed on the superblock's s_dentry_lru list.


Functions that call hlist_bl_add_head_rcu: __d_rehash, d_obtain_alias



struct net : include/net/net_namespace.h
struct net_generic : include/net/netns/generic.h

2015.10.31

include/net/sock.h
struct sock - network layer representation of sockets
struct proto - Networking protocol blocks we attach to sockets.

include/linux/skbuff.h
struct sk_buff

include/linux/net.h
struct socket
	struct file *file
	struct sock *sk
	struct proto_ops *ops


netlink_create
netlink_kernel_create

2015.11.07

uevent_net_init
	netlink_kernel_create
		sock_create_lite
			sock_alloc


struct dentry *mount_pseudo(struct file_system_type *, char *name, super_operations, dentry_operations, magic)
 : sockfs passes sockfs_ops, sockfs_dentry_operations
	

/proc/net/netlink
	meaning of group

meaning of listeners
	http://softengcrunch.blogspot.kr/2010/12/communicating-with-kernel-via-netlink.html


================================================================================

bus_register(amba)

struct bus_type
	struct subsys_private

struct class
	struct subsys_private

2015.11.14

What is the role of struct bus_type in the driver model?

* http://lxr.free-electrons.com/source/Documentation/driver-model/bus.txt


subsys_system_register (registers a bus_type) "/sys/bus/cpu"
subsys_interface_register(&s3c2440_clk_interface);		// with the bus already registered, registers on bus->subsys_private->interface.

device_create(struct class, ...)


/proc/<PID>/maps : vma information
/proc/<PID>/pagemap : pagetable

https://www.kernel.org/doc/Documentation/filesystems/proc.txt
https://www.kernel.org/doc/Documentation/vm/pagemap.txt


The ioports and iomem resources specified when a platform device is registered can be checked through /proc/ioports and /proc/iomem.

2015.11.21

device_init_wakeup

fsr_info (this table corresponds to ARM ARM 'Table B3-23 Short-descriptor format FSR encodings', p.1406)
	How is it used in do_DataAbort and do_PrefetchAbort?

hook_fault_code and hook_ifault_code are used to update fsr_info.
	=> when the entry differs by architecture version
	=> when trap handling is needed for hw_breakpoint


.macro define_processor_functions name:req, dabort:req, pabort:req, ...

define_processor_functions v7, dabort=v7_early_abort, pabort=v7_pabort, ...

	=> How is dabort used? There should be a structure in the kernel that points to processor_functions.
		struct processor in arch/arm/include/asm/proc-fns.h contains _data_abort

		struct proc_info_list in arch/arm/include/asm/procinfo.h contains a struct processor pointer

arch/arm/include/asm/glue-df.h : Data Abort Model
	v7_early  - ARMv7 generic early abort handler

	define CPU_DABORT_HANDLER v7_early_abort
	// 	uses arch/arm/mm/abort-ev7.S (not MULTI_DABORT)

arch/arm/include/asm/glue-pf.h : Prefetch Abort Model
	v7        - ARMv7: IFSR and IFAR


arch/arm/kernel/entry-armv.S
	.macro dabt_helper

__dabt_svc:
	svc_entry
	..
	dabt_helper	-> CPU_DABORT_HANDLER // v7_early_abort -> do_DataAbort




/sys/bus/amba/drivers/mmci-pl18x # ls
bind
mb:mmci -> ../../../../devices/mb:mmci
uevent
unbind

/sys/bus/amba/drivers/mmci-pl18x # echo "mb:mmci" > unbind
/sys/bus/amba/drivers/mmci-pl18x # ls
bind
uevent
unbind

/sys/bus/amba/drivers/mmci-pl18x # echo "mb:mmci" > bind
mmci-pl18x mb:mmci: mmc0: PL181 manf 41 rev0 at 0x10005000 irq 41,r2 (pio)
/sys/bus/amba/drivers/mmci-pl18x # ls
bind
mb:mmci -> ../../../../devices/mb:mmci
uevent
unbind



uevent…
kobject_uevent
	kobject_uevent_env

2015.11.28

alignment_init
clocksource_done_booting

anon_inode_init
populate_rootfs

sched_clock_syscore_init
timer_init_syscore_ops
proc_execdomains_init
ioresource_init
uid_cache_init



/proc/kallsyms has the symbols of dynamically loaded modules as well as static code, whereas System.map is the symbol table of static code only.


Modules that use MODULE_VERSION(..)
	xz_dev, libata, smsc911x, tcp_cubic


stop_sched_class
	cpu_stopper_thread



http://criticalblue.com/news/wp-content/uploads/2013/12/linux_scheduler.pdf

#define for_each_class(class) \
	for (class = sched_class_highest; class; class = class->next)
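
A small userspace sketch of the same linked-list walk. It assumes the classes are chained in the order stop, rt, fair, idle; the toy_ names are hypothetical stand-ins, not the kernel's structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sched_class: just a name and a next pointer. */
struct toy_class {
	const char *name;
	const struct toy_class *next;	/* next lower-priority class */
};

const struct toy_class toy_idle = { "idle", NULL };
const struct toy_class toy_fair = { "fair", &toy_idle };
const struct toy_class toy_rt   = { "rt",   &toy_fair };
const struct toy_class toy_stop = { "stop", &toy_rt };

#define toy_class_highest (&toy_stop)
#define for_each_toy_class(class) \
	for (class = toy_class_highest; class; class = class->next)

/* Return the 0-based position of a class in priority order, or -1 if absent. */
int toy_class_rank(const char *name)
{
	const struct toy_class *class;
	int rank = 0;

	for_each_toy_class(class) {
		if (strcmp(class->name, name) == 0)
			return rank;
		rank++;
	}
	return -1;
}
```

The scheduler uses the same kind of walk to ask each class, highest priority first, whether it has a runnable task.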


struct sched_class
struct sched_entity

2015.12.12

timekeeping_init
clocksource_change (v2m_timer1) log output
rtc driver initialization

Displaying the timekeeper value after boot shows 1449xxxxxx.
Converted to years, that is about 45 years.
UTC 1970 + 45 => 2015
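
The arithmetic above can be checked with a short helper. This is only a sketch for non-negative second counts; epoch_to_year is a made-up name, not a kernel function.

```c
#include <stdint.h>

#define SECS_PER_DAY 86400LL

static int is_leap(int y)
{
	return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* UTC calendar year that contains `secs` seconds since 1970-01-01 (secs >= 0). */
int epoch_to_year(int64_t secs)
{
	int64_t days = secs / SECS_PER_DAY;
	int year = 1970;

	for (;;) {
		int ydays = is_leap(year) ? 366 : 365;

		if (days < ydays)
			return year;
		days -= ydays;
		year++;
	}
}
```

For a value in the 1449xxxxxx range, such as 1449000000, this gives 2015, which matches the date of the note.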



stop_machine -> stop_cpus

2015.12.19 /proc/<PID>/fd

eventfd example
3 -> anon_inode:[eventfd]

unlink (delete) example
Before deletion: /home/freestyle/kernel/iamroot9C/debug_with_qemu/test_code/anon_inode/file
After deletion: /home/freestyle/kernel/iamroot9C/debug_with_qemu/test_code/anon_inode/file (deleted)

2015.12.26

1. The part that passes the arguments
2. The part that registers parsing functions in the argument pool


parse_args("early options", …, do_early_param)
parse_args("Booting kernel", …, unknown_bootoption)
	calls parse_one
		- early_init


Placed near the end by the lds macro RO_DATA_SECTION, as follows:
	__start___param
	*(__param)	<- __level_param_cb() / core_param(), module_param() -> ultimately __module_param_call
	__stop___param

Placed by the lds macro INIT_SETUP, as follows:
	__setup_start
	*(.init.setup)	<- the macro that registers entries here is __setup()
	__setup_end

__setup registration
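
A rough userspace sketch of how a table of __setup()-style entries can be walked and matched by prefix. The kernel collects the entries in the .init.setup section; here an ordinary array stands in for that section, and every toy_ name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simplified shape of one __setup() entry: an option prefix and its handler. */
struct toy_setup {
	const char *str;		/* option prefix, e.g. "console=" */
	int (*setup_func)(char *);	/* called with the text after the prefix */
};

int toy_seen_len;			/* hypothetical: length of the last argument seen */

static int toy_console_setup(char *arg)
{
	toy_seen_len = (int)strlen(arg);
	return 1;			/* 1 means "handled" */
}

/* Stand-in for the entries between __setup_start and __setup_end. */
static const struct toy_setup toy_setup_table[] = {
	{ "console=", toy_console_setup },
};

/* Walk the table and dispatch on the first matching prefix. */
int toy_checksetup(char *line)
{
	size_t i, n;

	for (i = 0; i < sizeof(toy_setup_table) / sizeof(toy_setup_table[0]); i++) {
		n = strlen(toy_setup_table[i].str);
		if (strncmp(line, toy_setup_table[i].str, n) == 0)
			return toy_setup_table[i].setup_func(line + n);
	}
	return 0;			/* not handled */
}
```

The point of the linker section is that each __setup() use can add its entry from any source file without a central list.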





initrd
	=> initial ram disk. The bootloader loads the required files into memory and hands control to the kernel. The kernel later switches over to the real root filesystem.
	gzipped filesystem image. Created in a filesystem format such as ext2.
initramfs => cpio

rootfs

CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
	=> when initramfs is used

CONFIG_BLK_DEV_RAM
	=> block device : option for mounting a rootfs built as ext2 or similar.


android
	CMDLINE root=/dev/ram init=/init
	rootfs.img
	/system
	/data
	/cache


ramfs analysis after the meal break
https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt

"ramfs"
	Ramfs is a very simple filesystem that exports Linux's disk caching mechanisms (the page cache and the dentry cache)
	as a dynamically resizable RAM-based filesystem.

	Pages used by ramfs cannot be written to a backing device, so they are never marked clean.

	=> It can be mounted on Linux as follows. Data can keep being written beyond the given size.
	mount -t ramfs -o size=5m tmpfs /mnt/ram1

	=> If it is mounted as follows, will block device operations be called?
	mount -t ramfs /dev/ram0 /mnt/ram2

"ram disk"
	The older ram disk mechanism creates a synthetic block device out of an area of RAM and uses it as the backing store for a filesystem.
	The block device has a fixed size, so the filesystem mounted on it has a fixed size too.
	Unnecessary memory copies happen between the fake block device and the page cache, on top of creating and destroying dentries.
	A filesystem driver such as ext2 is also needed to format it and interpret the data.

	ramdisk became obsolete with the arrival of the loopback device, which offers a more flexible and convenient interface.
	A loopback device creates a synthetic block device out of a file rather than a chunk of memory.

"tmpfs"
	ramfs can keep being written until all memory is used up, and the VM cannot free it because ramfs has no backing store.
	Only root or a trusted user should be allowed write access to a ramfs mount.

	tmpfs, derived from ramfs, can have a size limit and can write data to swap space. Normal users can therefore use it too.
	See Documentation/filesystems/tmpfs.txt.

"rootfs"
	A special instance of ramfs (or of tmpfs, if that is selected). rootfs cannot be unmounted, for the same reason the init process cannot be killed.
	Instead of handling an empty list, the kernel makes sure the list can never become empty.

	Most systems mount another filesystem over rootfs and ignore rootfs.

"initramfs"
	Since 2.6 the kernel contains a gzipped "cpio" format archive, which is extracted into rootfs at boot.
	After extraction, the kernel checks whether rootfs contains an init that can be run as PID 1; if it does, that init locates and mounts the real root device.
	If init is not found, the older code locates and mounts the root partition and runs some variant of /sbin/init.

2016.01.02 Continued analyzing unpack_to_rootfs. Tracked down the code that mounts rootfs, which had been forgotten.

https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt
https://www.kernel.org/doc/Documentation/blockdev/ramdisk.txt

scripts/gen_initramfs_list.sh

2016.01.09 Started device_initcall(6). Things registered as syscore : sched_clock, timer_init,

struct task_struct -> struct cred -> struct user_struct

/sys/kernel/slab : information on all slabs. Merged caches are shown as aliases.
/proc/slabinfo : shows only the slab caches that actually exist after merging.


Let's analyze __copy_to_user and __copy_from_user…

2016.01.16
timer_create : timer
clock_gettime : clock

gettimeofday : same characteristics as CLOCK_REALTIME. deprecated.

process_cpu_timer_create : was not called even with a breakpoint set.



suspend_devices_and_enter
	dpm_suspend_start



tick_suspend
	struct tick_device *td = &__get_cpu_var(tick_cpu_device);

	struct tick_device {
		struct clock_event_device *evtdev;	// set in tick_setup_device
		enum tick_device_mode mode;
	};


percpu_timer_setup
	struct clock_event_device *evt = &per_cpu(percpu_clockevent, cpu);




qemu execution order trace
1. schedule_preempt_disabled()
2. __schedule()
3. __schedule()
4. __schedule()
5. kthreadd
6. kernel_init



==============================================================================================================================
- twd_handler registration path
	MACHINE VEXPRESS
		.init_irq = v2m_init_irq
			ct_desc->init_irq() => ct_ca9x4_init_irq => gic_init, ca9x4_twd_init => twd_local_timer_register => twd_local_timer_common_register
				request_percpu_irq(twd_ppi, twd_handler, "twd", twd_evt)

- twd_handler call order
	gic_handle_irq
		handle_IRQ
			generic_handle_irq
				generic_handle_irq_desc
					handle_percpu_devid_irq (irq=29) ; PPI
						twd_handler

- twd_handler behavior
	if twd_timer_ack()
		evt->event_handler(evt);	// evt is a clock_event_device

- where event_handler is registered
	tick_set_periodic_handler		// sets the event handler for periodic clock events (such as the tick)
		if (!broadcast)
			dev->event_handler = tick_handle_periodic;	// called on every core
		else
			dev->event_handler = tick_handle_periodic_broadcast;

- tick_handle_periodic
	tick_periodic(cpu);
		if (tick_do_timer_cpu == cpu) {
			tick_next_period = ktime_add(tick_next_period, tick_period);
			do_timer(1);
				jiffies_64 += ticks;
				update_wall_time();
				calc_global_load(ticks);
		update_process_times(user_mode(get_irq_regs()));
		profile_tick(CPU_PROFILING);
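
A toy model of the dispatch above: the timer interrupt calls whatever event_handler is installed, and only the cpu that owns the timekeeping duty advances jiffies, while per-cpu accounting runs on every cpu. All toy_ names are hypothetical and the model assumes at most 4 cpus.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down clock_event_device. */
struct toy_clock_event_device {
	void (*event_handler)(struct toy_clock_event_device *);
	int cpu;			/* must be 0..3 in this sketch */
};

unsigned long toy_jiffies;		/* stands in for jiffies_64 */
int toy_do_timer_cpu;			/* stands in for tick_do_timer_cpu */
unsigned long toy_local_ticks[4];	/* per-cpu work done on every tick */

void toy_tick_handle_periodic(struct toy_clock_event_device *dev)
{
	if (dev->cpu == toy_do_timer_cpu)
		toy_jiffies += 1;	/* do_timer(1): one cpu advances jiffies */
	toy_local_ticks[dev->cpu]++;	/* update_process_times(): every cpu */
}

/* What the timer interrupt handler does after acknowledging the timer. */
void toy_timer_irq(struct toy_clock_event_device *evt)
{
	evt->event_handler(evt);
}
```

With two devices sharing the same handler, three interrupts (one on cpu 0, two on cpu 1) advance toy_jiffies once and the per-cpu counters by one and two.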


Q. Does the scheduler run on each cpu?
	__schedule is just a function. It is called on whichever core reaches a point where it can be called. It picks one of the tasks registered on the runqueue and runs it.
Q. What is scheduler_tick?
Q. Does the tick occur on each core?
==============================================================================================================================

* TODO: analyze request_percpu_irq and enable_percpu_irq

	twd_timer_setup			// called from local_timer_register, after twd_local_timer_common_register has registered twd with request_percpu_irq
		clockevents_config_and_register(clk, twd_timer_rate, 0xf, 0xffffffff); // configures and registers the clock_event_device
		enable_percpu_irq(clk->irq, 0)

==============================================================================================================================

Interrupt (irq) call order

__irq_svc		// setup_arch stored the machine's handle_irq in handle_arch_irq
	pc jumps through =handle_arch_irq

MACHINE VEXPRESS
	.handle_irq = gic_handle_irq
		handle_IRQ

handle_IRQ(irq, regs)
	old_regs = set_irq_regs(regs)
	irq_enter()				// add_preempt_count(HARDIRQ_OFFSET)
	generic_handle_irq(irq)
	irq_exit()				// 
	set_irq_regs(old_regs)

generic_handle_irq
	struct irq_desc *desc = irq_to_desc(irq)
	generic_handle_irq_desc(irq, desc)
		desc->handle_irq(irq, desc)

2016.01.23

vim: commenting out several lines
	Select the lines to comment out with shift + v, then type this exactly: :norm i//
		colon + norm + i + / + /
		The colon enters command-line mode. norm is short for normal (it runs the keys as if they were typed in normal mode). i is insert, and what gets inserted is the two / characters, which completes the // comment.
	To uncomment: put the cursor at the very start of the commented lines, select them the same way, then run :norm xx (x deletes one character).

http://briancarper.net/blog/165.html

2016.01.30 ps axjf <- also prints the parent process. pstree <- kthreads are not shown.

static int __init init_workqueues(void)
early_initcall(init_workqueues);
	create_worker
		kthread_create_on_node
		kthread_create

schedule_work
	queue_work
		queue_work_on

2016.02.06 kernel_init init_post run_init_process

2016.02.13 busybox: ps -o pid,ppid,comm to see the parent

2016.02.20
* Where the need_resched flag is set (resched_task())
	- in scheduler_tick(), when the process has used up its timeslice

	- in try_to_wake_up(), when a process with higher priority than the current one wakes up
		check_preempt_curr()


* Where NEED_RESCHED is written into the TIF flags
tick_handle_periodic
	tick_periodic		// in the periodic tick case
		update_process_times
			scheduler_tick
				curr->sched_class->task_tick()



.task_tick = task_tick_fair;	// scheduler tick hitting a task of our scheduling class:

for_each_sched_entity(se)
	entity_tick

		check_preempt_tick	// fair.c: Preempt the current task with a newly woken task if needed:
			resched_task	// sets need_resched in TIF



* Another example that calls resched_task

ttwu_do_wakeup
	check_preempt_curr()



* Where NEED_RESCHED in the TIF flags is checked and the scheduler is called
	__irq_svc
		svc_entry
		irq_handler
		svc_preempt after the TIF check
			preempt_schedule_irq
				local_irq_enable
				__schedule
				local_irq_disable
		svc_exit
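
The two halves traced above (the tick sets a flag, the interrupt exit path tests it) can be modelled in a few lines. This is a sketch only: the fixed timeslice counter is an invented simplification, the flag bit position is illustrative, and the toy_ names are not kernel symbols.

```c
#include <stdbool.h>

#define TOY_TIF_NEED_RESCHED (1u << 1)	/* bit position is illustrative */

struct toy_task {
	unsigned int flags;	/* stands in for thread_info->flags */
	int slice_left;		/* hypothetical remaining timeslice, in ticks */
};

int toy_schedule_calls;		/* how many times the scheduler was entered */

void toy_resched_task(struct toy_task *t)
{
	t->flags |= TOY_TIF_NEED_RESCHED;
}

/* Tick path: mark the task once its slice has run out. */
void toy_scheduler_tick(struct toy_task *curr)
{
	if (curr->slice_left > 0 && --curr->slice_left == 0)
		toy_resched_task(curr);
}

/* IRQ exit path: test the flag and enter the scheduler only if it is set. */
void toy_irq_exit_check(struct toy_task *curr)
{
	if (curr->flags & TOY_TIF_NEED_RESCHED) {
		curr->flags &= ~TOY_TIF_NEED_RESCHED;	/* cleared when scheduling */
		toy_schedule_calls++;
	}
}
```

Setting a flag in the tick and acting on it later keeps the tick handler short and defers the actual switch to a point where it is safe.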


	ret_to_user
		_TIF_WORK_MASK
		work_pending



* Timer-driven behavior
	run_local_timers
		raise_softirq(TIMER_SOFTIRQ)
			raise_softirq_irqoff
				wakeup_softirqd
					wake_up_process
						try_to_wake_up
	
		

If pc changes at the end of __switch_to, doesn't that mean the code after it never runs?

2016.02.27

2016.03.05
vector_stub : irq, dabt, pabt, und
vector_swi
	sys_call_table / sys_oabi_call_table
	arm_syscall ; special syscalls

sys_syscall
	sys_fork_wrapper
	sys_clone_wrapper


sys_fork_wrapper
	add r0, sp, #S_OFF	@ r4/r5 register


(entry is through an exception, not a function call, so r4 and r5 are passed in registers)

do_fork…

2016.03.12 rtmutex

2016.03.19

2016.03.26

2016.04.02

task_struct.prio:
	0-99 -> Realtime
	100-139 -> Normal priority

ps/stat "prio" field: task_struct.prio - MAX_RT_PRIO (100)
	(-100)-(-1) -> Realtime
	0-39 -> Normal priority

stat "rt_priority" field:
	0 -> normal
	1-99 -> realtime

stat "policy" field:
	0 -> SCHED_OTHER (normal)
	1 -> SCHED_FIFO (realtime)
	2 -> SCHED_RR (realtime)
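
The mapping between these fields is plain arithmetic. A small sketch, assuming MAX_RT_PRIO is 100 and nice values run from -20 to 19; the function names are made up for illustration.

```c
#define TOY_MAX_RT_PRIO 100	/* same value as MAX_RT_PRIO */

/* Value shown in the "prio" field of /proc/<PID>/stat. */
int stat_prio(int task_prio)
{
	return task_prio - TOY_MAX_RT_PRIO;
}

/* 1 if task_struct.prio lies in the realtime range 0..99. */
int prio_is_realtime(int task_prio)
{
	return task_prio >= 0 && task_prio < TOY_MAX_RT_PRIO;
}

/* task_struct.prio of a normal task with the given nice value (-20..19). */
int nice_to_task_prio(int nice)
{
	return TOY_MAX_RT_PRIO + nice + 20;
}
```

A task at nice 0 therefore has task_struct.prio 120 and shows 20 in the stat "prio" field.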

2016.04.16

switch_new_context
	__new_context
	cpu_switch_mm



http://www.2cto.com/os/201411/349997.html
https://lwn.net/Articles/383162/
http://barriosstory.blogspot.kr/2010/02/improve-scalability-of-rmap-scanning-of.html

anon_vma_chain_link

2016.04.23

__do_page_fault
	vma = find_vma(mm, addr)
	handle_mm_fault(mm, vma, …)
		handle_pte_fault
			do_swap_page
			do_wp_page

2016.04.30

* vm_file
	

* What is the role of struct address_space?

Page cache : a technique that keeps data read from disk in physical pages, because disk access is slow.
	- Is only data accessed from disk placed in the page cache?
When data that was accessed once is accessed again, the page cache is searched and the page is returned; finding the page that caches a given physical block goes through the structure called address_space.

When a page is used as page cache, its pointer named mapping points to the address_space.

task_struct->file->f_dentry->d_inode->i_mapping
	
	find_get_page(mapping, index);		


pgoff_t index; /* Our offset within mapping. */


When the vfs reads a file
1) without using the page cache
2) using the page cache


* address_space_operations
	Each fs provides its own implementation.


address_space is the structure that implements the page cache, and it plays a very important role.
It also refers to the address_space_operations structure, which implements the functions that actually read blocks from the filesystem.

The file_operations structure holds the functions behind open, read, write, close, ioctl, mmap and so on, which application programmers commonly use.
In general the file_operations functions end up calling the address_space_operations functions, which actually read the pages from the block device.




inode <-> address_space 1:1 mapping

- The page cache is managed through struct address_space.
- It normally lives in the inode, as the i_mapping field.
- Following page_tree in struct address_space shows each page currently held in the page cache. (See the linux source code for how the tree is followed.)
- That gives the struct page, and the actual physical address can presumably be worked out from the struct page.





Add comments for raw_spin_is_contended

2016.05.14

Future direction
	- after analyzing copy_process, analyze the source following "Understanding the Linux Kernel"
	- study aarch64
Books

2016.05.21 Break

2016.05.28 - read aarch64 up to chapter 3

check_and_switch_context
	if (!((mm->context.id ^ cpu_last_asid) >> ASID_BITS))
	else if (irqs_disabled())
	else
		switch_new_context(mm);		
			__new_context(mm);
				…
				flush_context();
				smp_call_function(reset_context, NULL, 1);
			cpu_switch_mm(mm->pgd, mm);
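
The first test in check_and_switch_context compares the generation part of the context id, that is, the bits above the ASID field. A minimal sketch of that bit test, assuming ASID_BITS is 8; the function names are hypothetical.

```c
#include <stdbool.h>

#define TOY_ASID_BITS 8	/* assumed width of the ASID field */

/* True when id and last_asid share the same generation (bits above the ASID). */
bool asid_generation_matches(unsigned int id, unsigned int last_asid)
{
	return !((id ^ last_asid) >> TOY_ASID_BITS);
}

/* The hardware ASID is the low TOY_ASID_BITS bits of the context id. */
unsigned int asid_of(unsigned int id)
{
	return id & ((1u << TOY_ASID_BITS) - 1);
}
```

When the generations differ, the ASID is stale and a new one has to be allocated, which is the switch_new_context path in the trace.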


contextidr p.1532

Only when config PID_IN_CONTEXTIDR is set. (It is not set for vexpress.)
	Say Y here only if you are planning to use hardware trace tools with this kernel.

contextidr_notifier
	asm volatile(
	"   mrc p15, 0, %0, c13, c0, 1\n"	// read contextidr
	"   bfi %1, %0, #0, %2\n"
	"   mcr p15, 0, %1, c13, c0, 1\n"
	: "=r" (contextidr), "+r" (pid)
	: "I" (ASID_BITS));
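
A C model of what the bfi instruction does in the sequence above, assuming ASID_BITS is 8 and that pid has already been shifted left by ASID_BITS before the asm block (that shift is an assumption here, not shown in the note). The helper names are invented.

```c
#include <stdint.h>

/* Model of "BFI Rd, Rn, #lsb, #width": copy the low `width` bits of rn
 * into rd starting at bit `lsb`, leaving the other bits of rd alone. */
uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
	uint32_t field = (width >= 32) ? 0xffffffffu : ((1u << width) - 1);
	uint32_t mask = field << lsb;

	return (rd & ~mask) | ((rn << lsb) & mask);
}

/* New CONTEXTIDR value: PROCID taken from pid, ASID kept from the old value.
 * Assumes pid fits in 24 bits. */
uint32_t contextidr_with_pid(uint32_t old_contextidr, uint32_t pid)
{
	return bfi(pid << 8, old_contextidr, 0, 8);
}
```

So the ASID in the low byte survives while the upper 24 bits carry the pid for hardware trace tools.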







mmap, vm_area_struct

https://www.kernel.org/pub/linux/kernel/people/mochel/doc/papers/ols-2005/mochel.pdf

http://hoonycream.tistory.com/entry/udev
http://iamhjoo.tistory.com/17
https://www.google.co.kr/search?q=uevent&biw=1276&bih=705&tbm=isch&imgil=bk36wyBJFm-jqM%253A%253BedAaDp9ntgpLsM%253Bhttp%25253A%25252F%25252Fshunzi610747304.blog.163.com%25252Fblog%25252Fstatic%25252F509997342011101441223936%25252F&source=iu&pf=m&fir=bk36wyBJFm-jqM%253A%252CedAaDp9ntgpLsM%252C_&usg=__SjFnjwDdQuKIGTJbNBsO-gLXiaA%3D#imgrc=bk36wyBJFm-jqM%3A&usg=__SjFnjwDdQuKIGTJbNBsO-gLXiaA%3D
https://www.kernel.org/doc/pending/hotplug.txt
http://com.odroid.com/sigong/nf_board/nboard_view.php?brd_id=odroidseven&bid=415
http://stackoverflow.com/questions/22803469/uevent-sent-from-kernel-to-user-space-udev
http://blog.daum.net/baramjin/16011051
https://www.google.co.kr/?gws_rd=ssl#q=sysfs+attribute+groups
http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/

zram, zcache?

=================================================================== Things to study

Half aarch64 analysis / half kernel source analysis


1) AArch64 analysis
	: ARMv8 Architecture PG (296 pages)   expected (5~6 chapters)
		It has many uses.
		- kernel
		- llvm backend
		- hypervisor

2) Kernel source analysis
	: code currently being analyzed
		do_fork analysis is done. A few more functions remain to look at.

	: OS book
		study Understanding the Linux Kernel, 3rd edition (2.6) & confirm in the code of the current kernel (20 weeks)

	: an embedded os that can be seen as a whole / a small code base that is easy to analyze (전원찬)
	  aarch64, lots of documentation, an os with many developers working on it (me)

	  secure os - Trusty TEE (to be released later, os?), tlk, OP-TEE/optee-os, OPEN-TEE
	  micro kernel - freeRTOS, uCOS
	  hypervisor(kvm) -
	  IoT os - 
	  automotive os - 

	: new version linux
		Needs a lot of time. It would amount to starting over.

* To look at
	do_fork
		copy_process
			copy_mm
				dup_mm
					dup_mmap
						copy_page_range
							copy_pud_range
								copy_pmd_range
									copy_pte_range

	전원찬
	* analyze the source of something like a TEE OS
	* how to approach aarch64, and the scope
		memory management?


	do_execve
	the transition from user mode to the kernel <-> ret_to_user

	scheduler
	context_switch

	the remaining initcalls

	vma management

	on smp, when the other cores leave idle, get scheduled, and run tasks


* cpuset
* cgroup

* RCU

* page fault
* interrupt handling

* scheduling
	sched_class / sched_domain, sched_group

* context switching

* vm_area_struct
	.vm_ops

=================================================================== SIDE MENU

o Compiler

[llvm]
	http://www.slideshare.net/Hybrid0/llvm-28276305
	http://www.aosabook.org/en/llvm.html
	http://en.wikipedia.org/wiki/LLVM
	http://clang.llvm.org/
	http://llvm.linuxfoundation.org/index.php/Main_Page

[gcc]

	http://gcc.gnu.org/onlinedocs/gccint
	http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html

GCC Intermediate Representations (1) - GENERIC (tree)
	http://studyfoss.egloos.com/5117056
GCC Intermediate Representations (2) - GIMPLE
	http://studyfoss.egloos.com/5167039

Using spec files
	http://studyfoss.egloos.com/5037962
gcc execution process
	http://studyfoss.egloos.com/5273402
Compiler execution process
	http://studyfoss.egloos.com/5274269
Compile optimization process - overview
	http://studyfoss.egloos.com/5276500

o c++

A look at the LLVM C++ coding guidelines
	http://minjang.egloos.com/2794928

ACCESS_ONCE() cpu_relax() asm volatile

=================================================================== INITCALL

arch_initcall(customize_machine);
	machine_desc->init_machine();	// .init_machine = v2m_init,
		v2m_init	// .init_tile = ct_ca9x4_init,
			ct_ca9x4_init	// l2x0_init
				outer_cache.inv_range = l2x0_inv_range;

include/linux/init.h

Redefinitions on top of __define_initcall(). They fix the call order and the section.
Enable initcall_debug and take a look.


kernel_init
	do_basic_setup
		do_initcalls // loops over the registered functions and runs them in order
			do_initcall_level
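
A userspace sketch of the level-ordered loop. The kernel finds the functions through per-level linker sections; here a table with an explicit level number stands in for those sections, and the toy_ names and the three sample functions are hypothetical.

```c
typedef int (*toy_initcall_t)(void);

int toy_order[8];	/* records the level of each call, in call order */
int toy_count;

static int toy_pure(void)   { toy_order[toy_count++] = 0; return 0; }
static int toy_arch(void)   { toy_order[toy_count++] = 3; return 0; }
static int toy_device(void) { toy_order[toy_count++] = 6; return 0; }

/* Deliberately listed out of order: level, then function. */
static const struct {
	int level;
	toy_initcall_t fn;
} toy_initcalls[] = {
	{ 6, toy_device },	/* device_initcall level */
	{ 0, toy_pure },	/* pure_initcall level */
	{ 3, toy_arch },	/* arch_initcall level */
};

/* Run every registered function, lowest level first. */
void toy_do_initcalls(void)
{
	int level;
	unsigned int i;

	for (level = 0; level <= 7; level++)
		for (i = 0; i < sizeof(toy_initcalls) / sizeof(toy_initcalls[0]); i++)
			if (toy_initcalls[i].level == level)
				toy_initcalls[i].fn();
}
```

Whatever order the entries are registered in, the level decides the order in which they run.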

valgrind/duma ftrace

aarch64 https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

iometer - let's use gdb, strace.

kthreadd (2) ksoftirqd (13) // opening /proc/13/status shows PPid is 2.

pstree -lnps

init -> systemd

strace : system call trace
ltrace : library call trace

Explanation of return addresses: when the exception is taken, and where to return to depending on its type. http://forum.falinux.com/zbxe/?mid=lecture_tip&sort_index=regdate&order_type=desc&document_srl=552369

From the Cortex-A Series Programming Guide:

The steps taken to handle an interrupt are as follows:

  1. An IRQ exception is raised by external hardware. The processor performs several steps automatically. The contents of the PC in the current execution mode are stored in LR_IRQ (a banked register). The CPSR register is copied to SPSR_IRQ. The bottom byte of the CPSR is updated to change to IRQ mode (the 5-bit mode field) and to disable IRQ, which prevents further exceptions from occurring. The PC is set to the IRQ entry in the vector table.

  2. The instruction at the IRQ entry in the vector table (a branch to the interrupt handler) is executed.

  3. The interrupt handler saves the context of the interrupted program (that is, it pushes onto the stack any registers which will be corrupted by the handler).

  4. The interrupt handler determines which interrupt source needs to be processed and calls the appropriate device driver.

  5. Finally, the SPSR_IRQ is copied back into the CPSR, which switches the system back to the previous execution mode. At the same time, the PC is restored from the LR_IRQ.

__irq_svc: svc_entry irq_handler

svc_preempt

svc_exit




.macro

UNWIND(.fnstart)
UNWIND(.save {r0 - pc}) <-> stmfd sp!, {r0 - pc}
UNWIND(.fnend)

http://www.univ-orleans.fr/SCIENCES/INFO/RESSOURCES/webada/doc/gnat/as_9.html
http://sourceware.org/ml/libc-ports/2013-04/msg00094.html

Stack type		Push		Pop
Full descending		STMFD (STMDB)	LDMFD (LDMIA)	; push decrements first, then stores; pop loads first, then increments
Full ascending		STMFA (STMIB)	LDMFA (LDMDA)
Empty descending	STMED (STMDA)	LDMED (LDMIB)
Empty ascending		STMEA (STMIA)	LDMEA (LDMDB)

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/Cacbgchh.html
http://www.davespace.co.uk/arm/introduction-to-arm/stack.html
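
A toy model of the full descending case, the one used by stmfd/ldmfd: the stack pointer always points at the last pushed word, push decrements before storing, and pop loads before incrementing. The struct and function names are made up, and bounds checks are left out to keep it short.

```c
#include <stdint.h>

/* Toy full descending stack: sp indexes the last pushed word. */
struct fd_stack {
	uint32_t mem[16];
	int sp;		/* index into mem; 16 means empty */
};

void fd_init(struct fd_stack *s)
{
	s->sp = 16;
}

/* STMFD / STMDB for one register: decrement before, then store. */
void fd_push(struct fd_stack *s, uint32_t v)
{
	s->mem[--s->sp] = v;
}

/* LDMFD / LDMIA for one register: load, then increment after. */
uint32_t fd_pop(struct fd_stack *s)
{
	return s->mem[s->sp++];
}
```

The other three stack types differ only in whether the pointer moves before or after the access and in which direction.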

Moving interrupts to threads http://lwn.net/Articles/302043/

=================================================================== CCI-400

  • Additional reading

This section lists publications by ARM and by third parties. See Infocenter, http://infocenter.arm.com, for access to ARM documentation. See onARM, http://www.onarm.com, for embedded software development resources including the Cortex Microcontroller Software Interface Standard (CMSIS).

  • ARM publications
    • AMBA® AXI™ and ACE™ Protocol Specification, AXI3™, AXI4™, AXI4-Lite™, ACE and ACE-Lite™ (ARM IHI 0022).
    • CoreLink CCI-400 Cache Coherent Interconnect Integration Manual (ARM DII 0264).
    • CoreLink CCI-400 Cache Coherent Interconnect Implementation Guide (ARM DII 0263).
    • CoreLink QVN Protocol Specification (ARM IHI 0063).

=================================================================== Cache

* http://www.cse.unsw.edu.au/~cs9242/08/lectures/04-cachex6.pdf
* http://en.wikipedia.org/wiki/CPU_cache
* What every programmer should know about memory, Part 1
	http://lwn.net/Articles/250967/
	http://lwn.net/Articles/252125/
	http://lwn.net/Articles/253361/

=================================================================== raspberry pi

- They say the kernel can be built locally; what does that mean?

NO-HZ.txt summary

Three main ways of managing scheduling-clock interrupts (also called scheduling-clock ticks, or simply ticks)

  1. Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or CONFIG_NO_HZ=n)
  2. Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or CONFIG_NO_HZ=y)
  3. Omit scheduling-clock ticks on CPUs that are idle or have only one runnable task (CONFIG_NO_HZ_FULL=y)

Option 3 is only for HPC-type workloads or realtime applications; option 2 is the usual choice.

tmux

#! /bin/sh
# Yubin totally wrote this one herself
# Run this script outside of tmux!
for name in `tmux ls -F '#{session_name}'`; do
	tmux setenv -g -t $name DISPLAY $DISPLAY	#set display for all sessions
done

tmux setenv -g -t DISPLAY $DISPLAY

https://www.linux.co.kr/home2/board/bbs/board.php?bo_table=lecture&wr_id=1642&sca=1&sca2=32

Concurrency Problem

  • interrupt
  • preemptive
  • smp
  • delayed function

Kernel locking

  • semaphore
  • mutex
  • semaphore (changed across kernel versions) include/linux/semaphore.h kernel/semaphore.c

    • DEFINE_SEMAPHORE(name) … static initialization
    • sema_init(&sem, init_val) … dynamic initialization
    • down_interruptible(), down_killable(), up()
      Looking at down_interruptible():
        raw_spin_lock_irqsave()
        if (sem->count > 0)	// enter if greater than 0
            sem->count--;
        else
            __down_interruptible();
        raw_spin_unlock_irqrestore()
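
A single-threaded model of the counting logic only. The spinlock and the actual sleeping are left out, a waiter is represented by a counter, and the toy_ names are hypothetical.

```c
/* Hypothetical model of the semaphore fast path; no locking, no sleeping. */
struct toy_sem {
	unsigned int count;	/* free slots */
	unsigned int waiters;	/* callers that would be asleep in the kernel */
};

void toy_sema_init(struct toy_sem *s, unsigned int val)
{
	s->count = val;
	s->waiters = 0;
}

/* Returns 0 if acquired at once, 1 if the caller would have to sleep. */
int toy_down(struct toy_sem *s)
{
	if (s->count > 0) {
		s->count--;
		return 0;
	}
	s->waiters++;		/* the slow path would put the caller to sleep here */
	return 1;
}

/* Release: hand over to a waiter if there is one, else give the slot back. */
void toy_up(struct toy_sem *s)
{
	if (s->waiters > 0)
		s->waiters--;
	else
		s->count++;
}
```

With an initial value of 1 the second toy_down() has to wait, and the following toy_up() passes the semaphore straight to that waiter without raising count.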

$ tree arch/arm/
arch/arm/
|-- Kconfig
|-- Kconfig-nommu
|-- Kconfig.debug
|-- Makefile
|-- boot
|-- common
|-- configs
|-- crypto
|-- firmware
|-- include
|-- kernel
|-- kvm
|-- lib
…
|-- mach-vexpress
…
|-- mm
|-- net
|-- nwfpe
|-- oprofile
|-- plat-iop
|-- plat-omap
|-- plat-orion
|-- plat-pxa
|-- plat-samsung
|-- plat-versatile
|-- probes
|-- tools
|-- vdso
|-- vfp
`-- xen

$ tree arch/arm/boot
arch/arm/boot
|-- Image
|-- Makefile
|-- bootp
|   |-- Makefile
|   |-- bootp.lds
|   |-- init.S
|   |-- initrd.S
|   `-- kernel.S
|-- compressed
|   |-- Makefile
|   |-- ashldi3.S
|   |-- atags_to_fdt.c
|   |-- big-endian.S
|   |-- bswapsdi2.S
|   |-- debug.S
|   |-- decompress.c
|   |-- efi-header.S
|   |-- head-sa1100.S
|   |-- head-sharpsl.S
|   |-- head-xscale.S
|   |-- head.S // compressed/head.S
|   |-- hyp-stub.S
|   |-- lib1funcs.S
|   |-- libfdt_env.h
|   |-- ll_char_wr.S
|   |-- misc.c
|   |-- piggy.S
|   |-- piggy.gzip
|   |-- string.c
|   |-- vmlinux
|   |-- vmlinux.lds
|   `-- vmlinux.lds.S
|-- dts
|-- install.sh
`-- zImage

arch/arm/kernel
|-- Makefile
|-- arch_timer.c
|-- armksyms.c
|-- asm-offsets.c
|-- atags.h
|-- atags_compat.c
|-- atags_parse.c
|-- atags_proc.c
|-- bios32.c
|-- calls.S
|-- cpuidle.c
|-- crash_dump.c
|-- debug.S
|-- devtree.c
|-- dma-isa.c
|-- dma.c
|-- early_printk.c
|-- efi.c
|-- elf.c
|-- entry-armv.S
|-- entry-common.S
|-- entry-ftrace.S
|-- entry-header.S
|-- entry-v7m.S
|-- fiq.c
|-- fiqasm.S
|-- ftrace.c
|-- head-common.S
|-- head-nommu.S
|-- head.S // kernel/head.S
|-- hibernate.c
|-- hw_breakpoint.c
|-- hyp-stub.S
|-- insn.c
|-- io.c
|-- irq.c
|-- isa.c
|-- iwmmxt.S
|-- jump_label.c
|-- kgdb.c
|-- machine_kexec.c
|-- module-plts.c
|-- module.c
|-- module.lds
|-- opcodes.c
|-- paravirt.c
|-- patch.c
|-- perf_callchain.c
|-- perf_event_v6.c
|-- perf_event_v7.c
|-- perf_event_xscale.c
|-- perf_regs.c
|-- pj4-cp0.c
|-- process.c
|-- psci_smp.c
|-- ptrace.c
|-- reboot.c
|-- reboot.h
|-- relocate_kernel.S
|-- return_address.c
|-- setup.c
|-- signal.c
|-- sigreturn_codes.S
|-- sleep.S
|-- smccc-call.S
|-- smp.c
|-- smp_scu.c
|-- smp_tlb.c
|-- smp_twd.c
|-- stacktrace.c
|-- suspend.c
|-- swp_emulate.c
|-- sys_arm.c
|-- sys_oabi-compat.c
|-- tcm.c
|-- thumbee.c
|-- time.c
|-- topology.c
|-- traps.c
|-- unwind.c
|-- v7m.c
|-- vdso.c
|-- vmlinux-xip.lds.S
|-- vmlinux.lds
|-- vmlinux.lds.S
`-- xscale-cp0.c

arch/arm/common
|-- Kconfig
|-- Makefile
|-- bL_switcher.c
|-- bL_switcher_dummy_if.c
|-- dmabounce.c
|-- firmware.c
|-- icst.c
|-- it8152.c
|-- locomo.c
|-- mcpm_entry.c
|-- mcpm_head.S
|-- mcpm_platsmp.c
|-- sa1111.c
|-- scoop.c
|-- sharpsl_param.c
|-- vlock.S
`-- vlock.h

$ tree arch/arm64 -L 1 | less
arch/arm64
|-- Kconfig
|-- Kconfig.debug
|-- Kconfig.platforms
|-- Makefile
|-- boot
|-- configs
|-- crypto
|-- include
|-- kernel
|-- kvm
|-- lib
|-- mm
|-- net
`-- xen

pcp acts as a cache.

For page caching: active -> inactive

lru ratio?

What is the criterion for evictable?

putback_lru_page

shrink_lruvec

  1. What is the zone's lruvec for?
  2. Does every allocated page go onto the lruvec? => No. Only page cache pages and anonymous pages (see lru_cache_add_anon below) go onto it.

Function analysis

shrink_zone
	shrink_list
		shrink_inactive_list	// shrink is the cleanup side; when do pages get onto the zone's lruvec?

lru_add_drain	// moves the pages sitting on the current cpu's lru list onto the matching lru list of the zone. Why? When?
	lru_add_drain_cpu(cpu)	// Drain pages out of the cpu's pagevecs.
		activate_page_drain(cpu)	// removes them from the cpu's lru list, activates them and adds them to the zone's lru list
			pagevec_lru_move_fn(pvec, __activate_page, NULL)

add_to_page_cache_lru
	add_to_page_cache
	lru_cache_add_file

lru_cache_add_anon
lru_cache_add_file
	lru_cache_add_lru
		__lru_cache_add

__lru_cache_add(page, lru)	// the function that first adds a page to the lru
	pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];	// pagevec is a structure for handling pages as a vector (batch). (cold is used only in lruvec)
	if (!pagevec_add(pvec, page))
		__pagevec_lru_add(pvec, lru)
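
A toy model of the batching idea behind the pagevec: pages collect in a small per-cpu array and are moved to the shared list only when the array fills up. The batch size of 14 is an assumption about PAGEVEC_SIZE for kernels of this era, pages are plain ints, and all toy_ names are hypothetical.

```c
#define TOY_PAGEVEC_SIZE 14	/* assumed batch size */

struct toy_pagevec {
	unsigned int nr;
	int pages[TOY_PAGEVEC_SIZE];
};

unsigned int toy_lru_len;	/* pages on the shared (zone) list */
unsigned int toy_drains;	/* number of batch flushes */

/* Add a page; returns the number of free slots left in the batch. */
unsigned int toy_pagevec_add(struct toy_pagevec *pvec, int page)
{
	pvec->pages[pvec->nr++] = page;
	return TOY_PAGEVEC_SIZE - pvec->nr;
}

/* Move the whole batch to the shared list in one go. */
void toy_pagevec_lru_add(struct toy_pagevec *pvec)
{
	toy_lru_len += pvec->nr;
	pvec->nr = 0;
	toy_drains++;
}

/* Same shape as the if (!pagevec_add(...)) test above. */
void toy_lru_cache_add(struct toy_pagevec *pvec, int page)
{
	if (!toy_pagevec_add(pvec, page))
		toy_pagevec_lru_add(pvec);
}
```

Batching means the shared list and its lock are touched once per full batch rather than once per page, which is also why an explicit drain is needed before code that must see every page on the shared list.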

vm_map_ram() instead of vmap() <- better suited to short-lived mappings. per-cpu

purge_fragmented_blocks_allcpus: not the purpose, but the structure…

Why must I enable the MMU to use the D-Cache but not for the I-Cache? http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13835.html
