Linux kernel 분석 QnA

1 linux 3.2 커널 부팅과정 순서 (파일)
2 리눅스는 파라미터를 어떻게 전달받는가?
3 GAS 어셈에서 1f, 1b가 의미하는 바는?
4 상수 뒤에 붙는 ULL과 LL은 무엇인가?
5 x86 커널이 함수를 호출할때 사용하는 함수 호출 규약은?
6 (rdmsr) MSR의 0xc0010015의 15번 비트를 0으로 하는건 어떤 의미인가?
7 int 0x15 AX=0xec00, BX=2로 Long mode Taget을 지정하는건 어떠한 의미인가?
8 inline 어셈에서 output, input쪽에 있는 "=" "+" "&"는 무엇인가?
9 Int 0x15 AX=0xe980, Intel speed step 인터럽트는 어떤 기능을 하는가?
10 query_MCA 에서 MCA(micro channel architecture) 정보는 무엇이고 왜 가져오는가?
11 return a=x; 이 리턴하는 것은 x인가 true 인가?
12 비디오카드에서 DAC란 무엇인가?
13 *%eax에서 *를 왜 넣는가?
14 offsetof란??????
15 kbuild.h에서 ->의 의미
16 세그먼트에 32비트값을 넣을수 있는가?
17 32모드에서 call할때의 크기
18 MSR_IA32_MISC_ENABLE
19 PAE의 크기가 어떻게 36비트를 쓰는가?
20 RIP 상대 주소 지정방법 (RIP-relative addressing)
21 GOT란?
22 __builtin_constant_p
23 MSR_GS_BASE
24 do { } while (0) 이 많이 쓰이는 이유
25 inline 어셈에서 세번째 : 필드(list of clobbered registers)에서 "memory"와 "cc"의 의미
26 inline 어셈에서 %h0, %b0 오퍼렌드의 의미
27 inline 어셈에서 output, input 필드에 붙는 제약 ex) "=a"
28 #define __percpu_arg(x) __percpu_prefix "%P" #x
29 #define # ##
30 cgroup이란?
31 css_set
32 응용프로그램에서 LMA와 VMA를 다르게 하면?
33 매크로 함수의 리턴값
34 RCU
35 sparse란 무엇인가?
36 __force_order
37 as의 .pushsection, .popsection, .previous
38 const struct cpu_dev *const *cdev
39 const int와 int *const의 차이
40 intel vt 에서 ldtr을 세팅하면 intel vt가 happy한 이유는?
41 cpu family 값
42 native_read_msr 의 EAX_EDX_VAL 매크로에서 32/64비트를 "a" "d", "A"를 나눠놓은 이유
43 atomic_set 에서 mov에는 왜 lock prefix가 안붙는가?
44 __this_fixmap_does_not_exist 함수는 선언이 안되있는 이유?
45 Write combining이란?

1 linux 3.2 커널 부팅과정 순서 (파일)

SETUP : boot/header.S - boot/main.c - boot/pm.c - boot/pmjump.S
COMPRESSED : compressed/head₆₄.S
KERNEL : kernel/head₆₄.S - kernel/head64.c - init/main.c

2 리눅스는 파라미터를 어떻게 전달받는가?

boot loader로부터 커널의 header에 위치한 BOOT PROTOCOL 정보를 통해 전달받는다.

header.s

boot protocol

3 GAS 어셈에서 1f, 1b가 의미하는 바는?

앞의 숫자는 임시 레이블, f는 앞, b는 뒤쪽에서 가장 가까운 레이블 주소를 나타낸다.

GAS Symbol names

4 상수 뒤에 붙는 ULL과 LL은 무엇인가?

unsigned long long integer, long long integer (최소 64bit) 로 데이터형을 명시한다.

C FAQS 한글 번역-Basic type

5 x86 커널이 함수를 호출할때 사용하는 함수 호출 규약은?

regparm으로 정의한다. gcc 32비트에서 regparm=3 이면 EAX, EDX, ECX를 인자전달에 사용한다.

calling convention

6 (rdmsr) MSR의 0xc0010015의 15번 비트를 0으로 하는건 어떤 의미인가?

MSR은 CPU에 따른 정보를 변경/저장하는 레지스터로 AMD의 0xC0010015는 HWCR(Hardware Configuration Register)이다. 15번 비트가 0이면 SSE instruction을 켠다.

BIOS and Kernel Developer's Guid for AMD Athlon 64

7 int 0x15 AX=0xec00, BX=2로 Long mode Taget을 지정하는건 어떠한 의미인가?

BIOS에 앞으로 Long mode(64bits)로 동작할것을 알려준다. Long mode로 변환하기 전에 BSP에 의해 한번만 실행된다.

8 inline 어셈에서 output, input쪽에 있는 "=" "+" "&"는 무엇인가?

"="는 write-only, "+"는 read-write, "&"는 early clobber(값이 바뀔수 있음을 알려준다.)

<info:gcc:Modifiers>

9 Int 0x15 AX=0xe980, Intel speed step 인터럽트는 어떤 기능을 하는가?

eax, ebx, ecx, edx에 값을 리턴한다. bx의 입력값은 0은 ownership, 0은 get, 1은 set 기능이다. (GET_SPEEDSTEP_STATE)

10 `query_MCA` 에서 MCA(micro channel architecture) 정보는 무엇이고 왜 가져오는가?

IBM의 ISA 후속 규격. EISA에 밀려 대중적이지는 않았던걸로 보인다. ah=0xc0, int 0x15는 시스템의 정보를 읽어오는것으로 보인다.

11 return a=x; 이 리턴하는 것은 x인가 true 인가?

12 비디오카드에서 DAC란 무엇인가?

digital-to-analog 변환. D-sub등 아날로그 출력에서 필요한 과정으로 보인다.

13 %eax에서 를 왜 넣는가?

절대주소를 쓸때 * 접두사를 붙여야한다. 기본은 상대주소를 사용한다.

Opcode syntax

14 offsetof란??????

offsetof(a,b) 일때 a 구조체 내부의 b의 오프셋을 구하는 매크로

15 kbuild.h에서 ->의 의미

->로 시작하는 부분을 sed를 사용해 define으로 치환한다. (ex. -> a b c 는 #define a b * c * 로 변환된다.)

kbuild

16 세그먼트에 32비트값을 넣을수 있는가?

mov ds,cx와 mov ds,ecx를 컴파일 했을때의 기계어 코드는 같다. segment selector의 크기는 16비트기 때문에 같은 결과가 들어가는 것으로 보인다.

17 32모드에서 call할때의 크기

기본적으로 memory operand는 32비트다. 일반 호출시 스택에 증감되는 값도 32비트일 것이다.

18 `MSR_IA32_MISC_ENABLE`

x87 FPU 명령어 지원여부 P4에서 지원한다. 최근에는 지원하지 않는것 같다.

Intel manual Vol3, Table B-13

19 PAE의 크기가 어떻게 36비트를 쓰는가?

PAE의 엔트리의 물리 메모리 크기 제한(bits)은 아키텍쳐에 따라 가변적이다. Pentium pro에서부터 36비트를 지원하고 최대 52비트다. (테이블 크기가 4K라면 정렬로 엔트리에는 40비트 사용) CPUID.80000008H:EAX[7:0]의 MAXPHYADDR로 크기를 얻을수 있다.

msdn

Intel manual Vol.3 4.1.4, 4.4

20 RIP 상대 주소 지정방법 (RIP-relative addressing)

64비트 모드는 기본 오퍼렌드 크기가 32, 어드레스 크기는 64이기 때문에 다음 명령어 위치에 상대적인 주소지정법인 RIP 지정법이 생겼다.

64비트 멀티코어 OS 원리와 구조, p.87

21 GOT란?

ELF포맷의 영역중 하나. global offset table. 자매품 plt(procedure linkage table)도 있다.

22 `__builtin_constant_p`

상수면 1을 반환한다. (define…)

23 `MSR_GS_BASE`

Long mode에서 세그먼트 레지스터의 base, limit는 무시된다. 하지만 예외적으로 MSR을 통해 fs와 gs의 base 주소를 변경할수 있다. (FS.base (C000_0100h), GS.base (C000_0101h)

osdev - x86-64

24 do { } while (0) 이 많이 쓰이는 이유

복잡한 형태의 매크로를 사용가능하게 해주고 if else 문에서 ;이 와도 깨지는걸 방지한다.
링크의 예제에서 gcc에서 사용가능한 ({..})의 마지막 라인의 lcl; 은 리턴값이 된다.
#define FOO(arg) ({ \ typeof(arg) lcl; \ lcl = bar(arg); \ lcl; \ })

리눅스 커널에서 do while(0)을 쓰는 이유

Statements and Declarations in Expressions

25 inline 어셈에서 세번째 : 필드(list of clobbered registers)에서 "memory"와 "cc"의 의미

"memory"는 메모리, "cc"는 condition code register(FLAGS register)가 변경되었음을 뜻한다.

GCC:Extended ASM

GCC-inline ASM HOWTO

26 inline 어셈에서 %h0, %b0 오퍼렌드의 의미

%a0 - memory addressed by register operand 0
%A0 - operand 0 with a "*" prefix
%b0 - 8bit form of register operand 0 (al)
%B0 - gives "b"
%c0 - operand 0, without $ prefix
%h0 - high 8 bit form of register operand 0 (ah)
%k0 - 32bit form of register operand 0 (eax)
%l0 - operand 0 as label
%L0 - gives "l"
%n0 - negate operand 0 without $ prefix
%O0 - nothing
%P0 - same as %c0
%q0 - 64bit form of register operand 0 (rax)
%Q0 - gives "l"
%s0 - operand 0 with a comma appended
%S0 - gives "s"
%t0 - only usable on immediate operands, does nothing?
%T0 - gives "t"
%w0 - 16 bit form of register operand 0 (ax)
%W0 - gives "w"
%x0 - same as %w0
%y0 - same as %k0
%z0 - Opcode suffix based on operand 0 size (b, w, l), example asm ("mov%z1 %1, %0" : "=r"(ret) : "r"(val));

Operands in gcc inline assembly

A brief tutorial on GCC inline asm (x86 biased)

27 inline 어셈에서 output, input 필드에 붙는 제약 ex) "=a"

"m" : A memory operand is allowed, with any kind of address that the machine supports in general.
"o" : A memory operand is allowed, but only if the address is offsettable. ie, adding a small offset to the address gives a valid address.
"V" : A memory operand that is not offsettable. In other words, anything that would fit the `m’ constraint but not the `o’constraint.
"i" : An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.
"n" : An immediate integer operand with a known numeric value is allowed. Many systems cannot support assembly-time constants for operands less than a word wide. Constraints for these operands should use ’n’ rather than ’i’.
"g" : Any register, memory or immediate integer operand is allowed, except for registers that are not general registers.
"r" : Register operand constraint, look table given above.
"q" : Registers a, b, c or d.
"I" : Constant in range 0 to 31 (for 32-bit shifts).
"J" : Constant in range 0 to 63 (for 64-bit shifts).
"K" : 0xff.
"L" : 0xffff.
"M" : 0, 1, 2, or 3 (shifts for lea instruction).
"N" : Constant in range 0 to 255 (for out instruction).
"f" : Floating point register
"t" : First (top of stack) floating point register
"u" : Second floating point register
"A" : Specifies the `a’ or `d’ registers. This is primarily useful for 64-bit integer values intended to be returned with the `d’ register holding the most significant bits and the `a’ register holding the least significant bits.

GCC-inline asm HOWTO

<info:gcc:Simple Constraints>

<info:gcc:Machine Constraints>

28 #define `__percpu_arg(x)` `__percpu_prefix` "%P" #x

#define __percpu_arg(x)               __percpu_prefix "%P" #x

__percpu_prefix 는 percpu 자료구조가 있는 gs의 segment prefix다. "%P" #x 는 operand 숫자를 뜻한다. %P는 i386에 종속적인 지시자로 추측한다.

실제 예를 들면

#define percpu_to_op(op, var, val) 의 경우는 오퍼랜드 크기에 따라 바이트,워드등으로 변환하는데

              asm(op "b %1,"__percpu_arg(0)           \
                  : "+m" (var)                        \
                  : "qi" ((pto_T__)(val)));           \

op는 mov등의 명령어가 넘어오고 __percpu_arg(0) 은 인자 %0과 같다.

percpu_xx_op 의 인자는 (명령어, dest,src)로 인텔 어셈과 유사하다.

case로 처리를 해줘서 오퍼랜드 크기등에 신경쓸 필요 없다.

29 #define # ##

#define onesharp(x,y) x #y
#define twosharp(x,y) x ##y

#는 string으로 결합한다. onesharp("hello",world) == "helloworld"
##는 변수명으로 결합한다. twosharp(my,precious) == myprecious

30 cgroup이란?

cpu, 메모리, 네트워크등 다양한 자원을 마운트해서 그룹별로 제한 가능하다. (Control groups)
init/main.c에서 root cgroups와 =css_set=을 초기화한다.

cgroups kernel document cgroups - wikipedia

31 `css_set`

cgroups subsystem state
cgroups에는 자원별 subsystem이 있는데 이를 관리하기 위한 자료구조로 추측. (좀 더 봐야함)

32 응용프로그램에서 LMA와 VMA를 다르게 하면?

LMA(물리메모리)는 무시될 것이다.

33 매크로 함수의 리턴값

링크를 요약하면 마지막 문장이 일반 함수에서의 리턴값과 의미가 비슷하나 C++에서는 사용을 자제하는게 좋다. ({ … })는 gcc의 확장기능이다.

http://kldp.org/node/58409

34 RCU

RCU(read-copy-update)란 리스트나 트리구조에서 자료를 보호하기 위한 락킹이다.
자료구조를 읽는동안 쓰려고 하면 복제및 링크를 변경해 보호하고 복사본이 원본이 되고 원본은 적절한 시점에 제거한다.

RCU wiki

http://onestep.tistory.com/32

35 sparse란 무엇인가?

sparse는 리눅스 커널을 위해 만든 코드 체크용 툴이다. 아래와 같이 사용한다.

__attribute__((address_space(num)))

compiler.h

sparse는 메모리 모델중 하나이다. 메모리 섹션이 나누어져 있어 특정 섹션을 online, offline 시킬수 있다.

iamroot sparse memory 관련 질답

36 `__force_order`

clobber의 "memory" 표시는 성능을 저해하고 volatile만으로 컴파일러 reordering을 막기엔 부족하다.
arch/x86/include/asm/system.h 주석 참조

force order 사용이유

37 as의 .pushsection, .popsection, .previous

.pushsection은 현재 섹션을 스택에 넣고 현재 섹션을 뒤에오는 section,subsection으로 바꾼다.
.popsection은 스택에서 마지막 섹션을 빼서 현재 섹션에 넣는다.
.previous는 가장 최근의 section/subsection으로 바꾼다.

GAS - assembler directive

38 const struct `cpu_dev` const cdev

a pointer to const pointer to const struct
cdev 포인터만 바꿀수 있다. *cdev와 **cdev등은 const

39 const int와 int *const의 차이

  +---------------------------------------------------------------+
  |Const usage |Meaning          |Description                     |
  |------------+-----------------+--------------------------------|
  |const int   |Pointer to a     |Value pointed to by x can’t     |
  |*x;         |const int        |change                          |
  |------------+-----------------+--------------------------------|
  |int * const |Const pointer to |x cannot point to a different   |
  |x;          |an int           |location.                       |
  |------------+-----------------+--------------------------------|
  |const int   |Const pointer to |Both the pointer and the value  |
  |*const x;   |a const int      |pointed to cannot change.       |
  +---------------------------------------------------------------+

MSDN : What are the differences between const int*, int * const, and const int * const?

40 intel vt 에서 ldtr을 세팅하면 intel vt가 happy한 이유는?

intel vt는 인텔의 가상화 기술이다. 보호모드 일때 intel vt는 완전한 초기화가 이루어지지 않는 상황을 싫어한다. 그래서 ldt와 tr는 잘 사용하지 않는데도 0과 더미값으로 초기화시켜준다.

Subject: x86 setup: initialize LDTR and TR to make life easier to Intel VT

pmjump.S

41 cpu family 값

CPUID EAX=1은 cpu의 tfms(type, family, model, stepping)값을 반환한다.
Intel은 486=4, Pentium=5, Pentium Pro/II/III=6, P4(netburst)=15, 이후 core microarchitecture부터는 6값으로 회귀했다. 아이테니엄은 7, 16, 17이다.
AMD는 Am5x86=4, K5/6=5, Athlon(K7)=6, Athlon64(K8)=15, K10=16, Bobcat=20, Bulldozer=21

42 `native_read_msr` 의 `EAX_EDX_VAL` 매크로에서 32/64비트를 "a" "d", "A"를 나눠놓은 이유

x86의 일부 명령어는 edx:eax 레지스터를 사용한다. gcc 인라인 어셈의 output/input 필드에서 "A" 는 32비트에서는 edx:eax를 나타내지만 64비트에서는 rdx 또는 rax를 나타내기 때문에 제대로 동작하지 않는다.
"a"와 "d"를 나눠서 처리하는 것보다 "A"로 알려주는게 변수할당이나 소스 길이(속도)등에서 이득이다.

msr.h

43 `atomic_set` 에서 mov에는 왜 lock prefix가 안붙는가?

아키텍쳐에서 load/store와 레지스터끼리의 연산은 원자성을 보장한다.

44 `__this_fixmap_does_not_exist` 함수는 선언이 안되있는 이유?

인자가 상수값으로 들어오면 최적화 옵션을 주면 에러를 내지 않는다. 혹은 함수 내부에서 둘 다 상수를 사용하면 최적화 옵션을 주지 않아도 에러를 내지 않는다.

45 Write combining이란?

x86의 캐시정책중 하나이다. x86_64 아키텍쳐에는 UC, WT, WP, WC, WB의 다섯가지 캐시정책이 있고 MTRR(memory type range registers)와 PAT(Page attribute table)로 메모리의 캐시정책을 결정한다.

이중 WC는 순차적인 쓰기에서 뛰어난 성능을 보인다.