Linux Kernel Stack Overflow

Published: January 13, 2024

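Unlike the stack of a Linux user-space application, which can grow dynamically, the Linux kernel stack is fixed in size and fairly small. In Linux 2.6.x kernels, for example, it is typically 4K or 8K on 32-bit x86 (configurable under Kernel hacking at kernel build time, with 8K as the default) and fixed at 8K on 64-bit x86. The kernel allocates one page (4K stack) or two contiguous pages (8K stack) of non-swappable memory for use as the kernel stack. In Linux 2.4.x kernels on x86, the kernel stack is fixed at 8K.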

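When a process runs in kernel mode (for example, through a system call), it starts using its own kernel stack. If the kernel stack is 8K, interrupt handlers triggered at that point also run on this stack; if the kernel stack is 4K, interrupt handlers run on a separate kernel stack.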

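Because the kernel stack is fixed and small, it overflows easily. Kernel code should therefore avoid recursion (unless you know the recursion depth exactly, and even then it is better to rewrite the recursion as a loop, since nobody knows whether that depth will change someday). Large or variable-size stack variables, such as variable-length arrays, are discouraged as well.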
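To make the recursion advice concrete, here is a minimal user-space sketch, not kernel code. The `struct node` type and all function names are hypothetical and exist only for this illustration. The recursive form needs one stack frame per list element, while the loop uses a constant amount of stack however long the list is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, used only for this illustration. */
struct node {
    struct node *next;
    int value;
};

/* Recursive form: one stack frame per element, so stack use
 * grows with the length of the list. */
static int sum_recursive(const struct node *n)
{
    if (n == NULL)
        return 0;
    return n->value + sum_recursive(n->next);
}

/* Iterative form: a fixed set of locals regardless of list length. */
static int sum_iterative(const struct node *n)
{
    int total = 0;

    for (; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* Sanity check that the rewrite preserves the result. */
static void check_equivalence(const struct node *head)
{
    assert(sum_recursive(head) == sum_iterative(head));
}
```

On an 8K stack the recursive version is only safe while the list stays short; the iterative version has no such limit.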

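In kernel 2.4.x, task_struct and the kernel stack share one memory region, so the most direct consequence of a kernel stack overflow is a corrupted task_struct. This structure is critical on Linux: every process is defined by a task_struct, which is what is usually called the PCB (process control block), and it is the only, and the most effective, means of controlling a process. The following code is from kernel 2.4.37.11: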

union task_union {
    struct task_struct task;
    unsigned long stack[INIT_TASK_SIZE/sizeof(long)];
};

struct task_struct {
    /*
     * offsets of these are hardcoded elsewhere - touch with care
     */
    volatile long state;        /* -1 unrunnable, 0 runnable, >0 stopped */
    unsigned long flags;        /* per process flags, defined below */
    int sigpending;
    mm_segment_t addr_limit;    /* thread address space:
                                   0-0xBFFFFFFF for user-thead
                                   0-0xFFFFFFFF for kernel-thread
                                 */
    struct exec_domain *exec_domain;
    volatile long need_resched;
    unsigned long ptrace;
    int lock_depth;             /* Lock depth */
...
};

From kernel 2.6.x onward, it is no longer task_struct that shares a memory region with the kernel stack but the thread_info structure. The following code is from kernel 2.6.38.8:

union thread_union {
    struct thread_info thread_info;
    unsigned long stack[THREAD_SIZE/sizeof(long)];
};

struct thread_info {
    struct task_struct      *task;          /* main task structure */
    struct exec_domain      *exec_domain;   /* execution domain */
    __u32                   flags;          /* low level flags */
    __u32                   status;         /* thread synchronous flags */
    __u32                   cpu;            /* current CPU */
    int                     preempt_count;  /* 0 => preemptable,
                                               <0 => BUG */
    mm_segment_t            addr_limit;
    struct restart_block    restart_block;
    void __user             *sysenter_return;
#ifdef CONFIG_X86_32
    unsigned long           previous_esp;   /* ESP of the previous stack in
                                               case of nested (IRQ) stacks
                                            */
    __u8                    supervisor_stack[0];
#endif
    int                     uaccess_err;
};

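However, because the first field of thread_info is the pointer to task_struct, a kernel stack overflow still damages task_struct in effect: the task pointer now holds an unpredictable address, so every field read through it is garbage.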
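The overlap can be reproduced in user space with a simplified stand-in for the union. This is only a sketch: the `sim_` types and `overflow_clobbers_task` are hypothetical names, the structure is cut down to two fields, and an 8K stack is assumed. Since the stack grows downward toward the start of the union, the deepest write lands on `stack[0]`, which occupies the same bytes as the `task` pointer.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_THREAD_SIZE 8192    /* assumed 8K kernel stack */

struct sim_task {               /* stand-in for task_struct */
    int pid;
};

struct sim_thread_info {        /* cut-down stand-in for thread_info */
    struct sim_task *task;      /* first field, as in the real structure */
    unsigned int flags;
};

union sim_thread_union {
    struct sim_thread_info thread_info;
    unsigned long stack[SIM_THREAD_SIZE / sizeof(long)];
};

/* Simulate a stack that has grown all the way down to the lowest
 * slot of the region. Returns 1 if that write changed the task
 * pointer, 0 otherwise. */
static int overflow_clobbers_task(union sim_thread_union *tu,
                                  struct sim_task *t)
{
    assert(tu != NULL && t != NULL);

    tu->thread_info.task = t;
    tu->stack[0] = 0xdeadbeefUL;    /* overflow reaches the bottom */
    return tu->thread_info.task != t;
}
```

After the simulated overflow, `thread_info.task` no longer points at the original structure, which is the situation described above.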

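A diagram makes this clearer:

[Figure: the kernel stack and thread_info sharing one memory region; image not preserved]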


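Once the kernel stack overflows and the data in task_struct becomes invalid as a result, the system is left in an unstable state (in practice, this usually means a crash).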


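When investigating a crash, if the cause turns out to be bad data in the thread_info or task_struct structure (for example, the faulting instruction accesses a field of a task_struct variable), a kernel stack overflow should be the first suspect. The kernel provides several stack-protection options, such as DEBUG_STACKOVERFLOW and CC_STACKPROTECTOR, but these are only aids. How can the problem be narrowed down to a more specific location? Kernel functions reserve stack space the same way user-space code does, by moving esp or rsp. So we can first find every place where that reservation happens and see which one reserves the most; the function that uses the most stack space is the most likely source of the overflow.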
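The search for stack reservations can be sketched as a small user-space helper that classifies one line of AT&T-syntax disassembly text. This is an illustrative sketch only: `classify_line` and the `reserve_kind` enum are hypothetical names, and the matching is a plain substring test, so it is no more precise than grepping for `sub` and `rsp`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum reserve_kind {
    RESERVE_NONE,       /* line does not reserve stack space */
    RESERVE_FIXED,      /* "sub $imm,%rsp": size known at build time */
    RESERVE_DYNAMIC     /* "sub %reg,%rsp": size known only at run time */
};

/* Classify one disassembly line. For RESERVE_FIXED, the number of
 * bytes reserved is stored in *bytes. */
static enum reserve_kind classify_line(const char *line, unsigned long *bytes)
{
    const char *op, *arg;
    char *end;

    assert(line != NULL && bytes != NULL);

    op = strstr(line, "sub");
    if (op == NULL)
        return RESERVE_NONE;

    /* Only a subtraction from the stack pointer is a reservation. */
    if (strstr(op, ",%rsp") == NULL && strstr(op, ",%esp") == NULL)
        return RESERVE_NONE;

    arg = strchr(op, '$');
    if (arg == NULL)
        return RESERVE_DYNAMIC;

    /* Base 0 lets strtoul accept the 0x prefix that objdump prints. */
    *bytes = strtoul(arg + 1, &end, 0);
    return (end != arg + 1) ? RESERVE_FIXED : RESERVE_NONE;
}
```

Running this over every line of a disassembly and keeping the largest fixed value, while separately listing every dynamic case, gives a first list of functions to inspect.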
Example:

/**
 * kernel_stack.c
 */
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/module.h>

#define NUM 256

static int just_copy_half(int *array, int count)
{
    int i;
    int another_array[count];

    for (i = 0; i < count; i ++) {
        another_array[i] = array[i];
    }

    return 0;
}

static int __init kernel_stack_init(void)
{
    int ret = 0;
    int total = 0;
    int array[NUM];
    int i;

    for (i = 0; i < NUM; i ++)
        array[i] = i;

    for (i = 0; i < NUM; i ++)
        total += array[i];

    just_copy_half(array, NUM/2);

    printk("Total: %d\n", total);

    return ret;
}

static void __exit kernel_stack_fini(void)
{
    //Do Nothing
    return;
}

module_init(kernel_stack_init);
module_exit(kernel_stack_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("lenky0401 at gmail dot com");

# Makefile
MDIR = $(shell pwd)

ifeq (, $(KSRC))
    KSRC := /usr/src/linux-2.6.36
endif

ifeq (, $(PROJECT_DIR))
    PROJECT_DIR := $(PWD)/../
endif

module := kernel_stack
obj-m := $(module).o

srcs = $(wildcard, *.c)

$(module)-objs := $(addsuffix .o, $(basename $(srcs)))

EXTRA_CFLAGS += $(FLAG) -I$(PROJECT_DIR)/inc -I${SHAREDHDR} -I$(KERNELHDR) -O2 -D__KERNEL__ -DMODULE $(INCLUDE) -DEXPORT_SYMTAB

TARGET = $(module).ko

all:
	make -C $(KSRC) M=$(MDIR) modules

debug:
	make EXTRA_FLAGS="${EXTRA_CFLAGS} -DDEBUG" -C $(KSRC) M=$(MDIR) modules

clean:
	make -C $(KSRC) M=$(MDIR) clean

install: all
	cp -f $(TARGET) $(INSTALL_DIR)

[root@localhost kernel_stack]# ls
kernel_stack.c  Makefile
[root@localhost kernel_stack]# make
make -C /usr/src/linux-2.6.36 M=/home/lenky/modules/kernel_stack modules
make[1]: Entering directory `/usr/src/linux-2.6.36'
  CC [M]  /home/lenky/modules/kernel_stack/kernel_stack.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/lenky/modules/kernel_stack/kernel_stack.mod.o
  LD [M]  /home/lenky/modules/kernel_stack/kernel_stack.ko
make[1]: Leaving directory `/usr/src/linux-2.6.36'
[root@localhost kernel_stack]# ls
kernel_stack.c   kernel_stack.mod.c  kernel_stack.o  modules.order
kernel_stack.ko  kernel_stack.mod.o  Makefile        Module.symvers
[root@localhost kernel_stack]# objdump --source kernel_stack.ko > kernel_stack.s
[root@localhost kernel_stack]# cat kernel_stack.s | grep sub | grep rsp
  13:   48 29 c4                sub    %rax,%rsp
   3:   48 81 ec 00 04 00 00    sub    $0x400,%rsp

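As shown, one place (the kernel_stack_init function) takes $0x400 of stack (a few local variables live directly in registers, which is why they consume no stack). The other (the just_copy_half function) subtracts %rax, a value that is not fixed at build time because the array is variable-length, so the corresponding assembly has to be read to analyze it. This is only an example, but a function like this in real kernel code would be quite dangerous. Without going into further detail, here is the relevant assembly: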

kernel_stack.ko:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <just_copy_half>:
static int just_copy_half(int *array, int count)
{
    int i;
    int another_array[count];
   0:   48 63 c6                movslq %esi,%rax
   3:   55                      push   %rbp
   4:   48 8d 04 85 1e 00 00    lea    0x1e(,%rax,4),%rax
   b:   00
   c:   48 89 e5                mov    %rsp,%rbp
   f:   48 83 e0 f0             and    $0xfffffffffffffff0,%rax
  13:   48 29 c4                sub    %rax,%rsp
  16:   4c 8d 44 24 0f          lea    0xf(%rsp),%r8
  1b:   49 83 e0 f0             and    $0xfffffffffffffff0,%r8
    for (i = 0; i < count; i ++) {
  1f:   85 f6                   test   %esi,%esi
  21:   7e 16                   jle    39 <just_copy_half+0x39>
  23:   31 c9                   xor    %ecx,%ecx
  25:   31 d2                   xor    %edx,%edx
        another_array[i] = array[i];
  27:   8b 04 97                mov    (%rdi,%rdx,4),%eax
  2a:   83 c1 01                add    $0x1,%ecx
  2d:   41 89 04 90             mov    %eax,(%r8,%rdx,4)
  31:   48 83 c2 01             add    $0x1,%rdx
  35:   39 f1                   cmp    %esi,%ecx
  37:   75 ee                   jne    27 <just_copy_half+0x27>
    }
    return 0;
}
  39:   c9                      leaveq
  3a:   31 c0                   xor    %eax,%eax
  3c:   c3                      retq
  3d:   00 00                   add    %al,(%rax)
    ...

Disassembly of section .exit.text:

0000000000000000 <cleanup_module>:
static int __init kernel_stack_init(void)
{
    int ret = 0;
    int total = 0;
    int array[NUM];
    int i;
    for (i = 0; i < NUM; i ++)
        array[i] = i;

    for (i = 0; i < NUM; i ++)
        total += array[i];

    just_copy_half(array, NUM/2);

    printk("Total: %d\n", total);
    return ret;
}
static void __exit kernel_stack_fini(void)
{
    //Do Nothing
    return;
}
   0:   f3 c3                   repz retq

Disassembly of section .init.text:

0000000000000000 <init_module>:
   0:   53                      push   %rbx
   1:   31 c0                   xor    %eax,%eax
   3:   48 81 ec 00 04 00 00    sub    $0x400,%rsp
   a:   48 89 e7                mov    %rsp,%rdi
   d:   0f 1f 00                nopl   (%rax)
  10:   89 04 87                mov    %eax,(%rdi,%rax,4)
  13:   48 83 c0 01             add    $0x1,%rax
  17:   48 3d 00 01 00 00       cmp    $0x100,%rax
  1d:   75 f1                   jne    10 <init_module+0x10>
  1f:   31 db                   xor    %ebx,%ebx
  21:   66 31 c0                xor    %ax,%ax
  24:   03 1c 87                add    (%rdi,%rax,4),%ebx
  27:   48 83 c0 01             add    $0x1,%rax
  2b:   48 3d 00 01 00 00       cmp    $0x100,%rax
  31:   75 f1                   jne    24 <init_module+0x24>
  33:   48 89 e7                mov    %rsp,%rdi
  36:   be 80 00 00 00          mov    $0x80,%esi
  3b:   e8 00 00 00 00          callq  40 <init_module+0x40>
  40:   89 de                   mov    %ebx,%esi
  42:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
  49:   31 c0                   xor    %eax,%eax
  4b:   e8 00 00 00 00          callq  50 <init_module+0x50>
  50:   48 81 c4 00 04 00 00    add    $0x400,%rsp
  57:   31 c0                   xor    %eax,%eax
  59:   5b                      pop    %rbx
  5a:   c3                      retq
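The variable-size reservation in just_copy_half can be worked out from the first instructions of the listing: `movslq` loads count, `lea 0x1e(,%rax,4)` computes 4 * count + 0x1e, `and` rounds the result down to a multiple of 16, and `sub` moves rsp by that amount. The sketch below repeats that arithmetic in user-space C. The function name `vla_reservation` is hypothetical, and the formula describes this particular build only; another compiler or option set may pad differently.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes subtracted from rsp for "int another_array[count]" in the
 * disassembly above: (4 * count + 0x1e) rounded down to a multiple
 * of 16. Valid for this build only. */
static size_t vla_reservation(int count)
{
    assert(count >= 0);
    return ((size_t)count * 4 + 0x1e) & ~(size_t)0xf;
}
```

For the call in the example, where 0x80 (128) is passed as count, this gives 528 bytes, about half of the 0x400 (1024) bytes that kernel_stack_init reserves. With a count of 2048 the same formula gives 8208 bytes, more than an entire 8K kernel stack.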

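As mentioned above, Linux 2.6.x kernels on 32-bit x86 allow the kernel stack size to be configured (under Kernel hacking at kernel build time; the default is 8K, and the corresponding macro is CONFIG_4KSTACKS). The code that applies the setting can be found on LXR (The Linux Cross Reference):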

#ifdef CONFIG_4KSTACKS
#define THREAD_ORDER    0
#else
#define THREAD_ORDER    1
#endif
#define THREAD_SIZE     (PAGE_SIZE << THREAD_ORDER)
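The shift in that macro can be checked with a few lines of user-space C. This is a sketch that assumes 4K pages; `SIM_PAGE_SIZE` and `thread_size` are hypothetical names standing in for the kernel's PAGE_SIZE and THREAD_SIZE.

```c
#include <assert.h>

#define SIM_PAGE_SIZE 4096UL    /* assumed 4K page */

/* Mirrors THREAD_SIZE = PAGE_SIZE << THREAD_ORDER for the two
 * orders used above: 0 (CONFIG_4KSTACKS set) and 1 (not set). */
static unsigned long thread_size(unsigned int thread_order)
{
    assert(thread_order <= 1);
    return SIM_PAGE_SIZE << thread_order;
}
```

Order 0 yields one page (4096 bytes) and order 1 yields two pages (8192 bytes), matching the 4K and 8K stacks described at the start of the article.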

THREAD_SIZE is defined by way of the CONFIG_4KSTACKS macro, but this definition can no longer be found in kernel source after 2.6.37 (checked on LXR, The Linux Cross Reference). It turns out that CONFIG_4KSTACKS has been removed: https://lkml.org/lkml/2010/6/29/107. So it is back to 8K kernel stacks.

Source: https://blog.csdn.net/lenky0401/article/details/135567504