0. Preface

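Kernel stability problems are complex and varied. The most common is the “kernel panic”: the kernel hits a state it does not know how to handle. The system can no longer run normally, so it ends its own life and leaves a death message behind.
For example: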

“Unable to handle kernel XXX at virtual address XXX”
“undefined instruction XXX”
“Bad mode in Error handler detected on CPUX, code 0xbe000011 -- SError”
......

In what state does the system produce these death messages? How are they produced? And how should they be handled?

This article covers these three questions. Before reading it, make sure you have read aarch64 Exception Model and Linux arm64 Interrupt Handling.

1. Exception handling flow

The case in this section is the exception from [Android Stability] Part 015 [Issues] Unable to handle kernel NULL pointer dereference.

The panic log is as follows:

[    9.188060][  T175] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000102
[    9.188065][  T175] Mem abort info:
[    9.188067][  T175]   ESR = 0x0000000096000005
[    9.188069][  T175]   EC = 0x25: DABT (current EL), IL = 32 bits
[    9.188072][  T175]   SET = 0, FnV = 0
[    9.188074][  T175]   EA = 0, S1PTW = 0
[    9.188075][  T175]   FSC = 0x05: level 1 translation fault
[    9.188078][  T175] Data abort info:
[    9.188079][  T175]   ISV = 0, ISS = 0x00000005
[    9.188080][  T175]   CM = 0, WnR = 0
[    9.188083][  T175] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000c850e000
[    9.188086][  T175] [0000000000000102] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[    9.188095][  T175] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[    9.188188][  T175] Dumping ftrace buffer:
[    9.188199][  T175]    (ftrace buffer empty)

[    9.188845][  T175] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[    9.188849][  T175] Workqueue: events power_supply_changed_work
[    9.188863][  T175] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    9.188868][  T175] pc : __queue_work+0x28/0x550
[    9.188876][  T175] lr : queue_work_on+0x3c/0x80
[    9.188880][  T175] sp : ffffffc00b473ca0
[    9.188882][  T175] x29: ffffffc00b473ca0 x28: ffffff804531dbc8 x27: ffffff82f2740fa8
[    9.188890][  T175] x26: ffffff800b791f10 x25: 0000000000000000 x24: 0000000000000007
[    9.188896][  T175] x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
[    9.188902][  T175] x20: 0000000000000000 x19: ffffff806d0f9148 x18: ffffffc00ac0d040
[    9.188908][  T175] x17: 000000002a4cec24 x16: 000000002a4cec24 x15: 0000000000000046
[    9.188914][  T175] x14: 0000000000000000 x13: 0000000000000ef0 x12: 0000000000000002
[    9.188920][  T175] x11: 0000000000000000 x10: ffffffffffffd240 x9 : 000000000000001b
[    9.188926][  T175] x8 : 0000000000000001 x7 : ffffff806baa9380 x6 : 000000161b03f216
[    9.188932][  T175] x5 : 1672031b16000000 x4 : 0080000000000000 x3 : 1b430b9338000000
[    9.188939][  T175] x2 : ffffff806d0f9148 x1 : 0000000000000000 x0 : 0000000000000020
[    9.188946][  T175] Call trace:
[    9.188948][  T175]  __queue_work+0x28/0x550
[    9.188953][  T175]  queue_work_on+0x3c/0x80
[    9.188957][  T175]  fts_power_usb_notifier_callback+0x2c/0x40 [focaltech_spi]
[    9.189037][  T175]  blocking_notifier_call_chain+0x70/0xbc
[    9.189047][  T175]  power_supply_changed_work+0x7c/0xc8
[    9.189054][  T175]  process_one_work+0x1e4/0x43c
[    9.189060][  T175]  worker_thread+0x25c/0x430
[    9.189065][  T175]  kthread+0x104/0x1d4
[    9.189069][  T175]  ret_from_fork+0x10/0x20
[    9.189079][  T175] Code: a9054ff4 910003fd aa0203f3 aa0103f7 (39440828) 

The panic message is: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000102
Let's see where this line comes from.

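From aarch64 Exception Model and Linux arm64 Interrupt Handling we know that one register records the cause of an exception: the ESR (Exception Syndrome Register).
The log above shows that the ESR held 0x0000000096000005 when this exception occurred.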

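1.1 ESR register field definitions

The screenshots in this section come from the official Armv8-A manual.

We need to look at two fields of this register: EC, bits [31:26], and ISS, bits [24:0]. The official description of these fields follows.

For the ESR value in this case, 0x0000000096000005, EC is taken from bits [31:26].

This gives EC==0b100101, which the official documentation explains as follows:

The corresponding ISS==0b101

So we can tentatively conclude that this exception is a Data Abort.

The ISS encoding for EC==0b100101 is explained as follows:

Bits [5:0], DFSC (Data Fault Status Code), give the status information of the data abort:

From this we know the following:

  1. The exception class 0b100101 means Data Abort taken without a change in Exception level.
  2. The fault status 0b000101 means Translation fault, level 1.

This matches the following part of the log: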

[    9.188065][  T175] Mem abort info:
[    9.188067][  T175]   ESR = 0x0000000096000005
[    9.188069][  T175]   EC = 0x25: DABT (current EL), IL = 32 bits
[    9.188072][  T175]   SET = 0, FnV = 0
[    9.188074][  T175]   EA = 0, S1PTW = 0
[    9.188075][  T175]   FSC = 0x05: level 1 translation fault
[    9.188078][  T175] Data abort info:
[    9.188079][  T175]   ISV = 0, ISS = 0x00000005
[    9.188080][  T175]   CM = 0, WnR = 0
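
The field extraction above can be reproduced with a few shifts and masks. The following is a minimal sketch in plain user-space C, not kernel code. The names esr_fields and decode_esr are made up for illustration; the EC, ISS and DFSC positions are the ones given above, and the IL (bit 25) and WnR (ISS bit 6) positions are taken from the architectural definition of ESR_ELx.

```c
#include <stdint.h>

/* Fields of ESR_ELx used in this section (illustrative names). */
struct esr_fields {
	unsigned int ec;   /* Exception Class, bits [31:26] */
	unsigned int il;   /* Instruction Length, bit [25] */
	uint32_t iss;      /* Instruction Specific Syndrome, bits [24:0] */
	unsigned int dfsc; /* Data Fault Status Code, ISS bits [5:0] */
	unsigned int wnr;  /* Write-not-Read, ISS bit [6] */
};

/* Split a raw ESR value into the fields listed above. */
static struct esr_fields decode_esr(uint64_t esr)
{
	struct esr_fields f;

	f.ec   = (unsigned int)((esr >> 26) & 0x3f);
	f.il   = (unsigned int)((esr >> 25) & 0x1);
	f.iss  = (uint32_t)(esr & 0x1ffffff);
	f.dfsc = f.iss & 0x3f;
	f.wnr  = (f.iss >> 6) & 0x1;
	return f;
}
```

Passing in the value from the log, 0x0000000096000005, gives EC 0x25, IL 1, ISS 0x5, DFSC 0x05 and WnR 0, which agrees with the "Mem abort info" and "Data abort info" lines.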

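1.2 Exception entry

Every exception is taken to a specific target exception level, which is determined either by software configuration or by the nature of the exception itself. Taking an exception never moves execution to a lower exception level. On exception entry the processor does the following:

  • The processor state (PSTATE) is saved to SPSR_ELx of the target exception level.
  • The return address is saved to ELR_ELx of the target exception level.
  • If the exception is a synchronous exception or an SError interrupt, its syndrome information is saved to ESR_ELx of the target exception level.
  • For an Instruction Abort exception, a Data Abort exception, or a PC alignment fault exception, the faulting virtual address is saved to FAR_ELx.
  • The stack pointer switches to SP_ELx, the dedicated stack pointer register of the target exception level.
  • Execution moves to the target exception level and starts from the address defined by the exception vector.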

1.3 Exception vector table

SYM_CODE_START(vectors)
	// vectors is the exception vector table
	kernel_ventry	1, t, 64, sync		// Synchronous EL1t
	kernel_ventry	1, t, 64, irq		// IRQ EL1t
	kernel_ventry	1, t, 64, fiq		// FIQ EL1t
	kernel_ventry	1, t, 64, error		// Error EL1t

	/// Exception entries used by the Linux kernel itself (EL1h); for the synchronous entry below, the kernel_ventry macro expands to el1h_64_sync
	kernel_ventry	1, h, 64, sync		// Synchronous EL1h
	kernel_ventry	1, h, 64, irq		// IRQ EL1h
	kernel_ventry	1, h, 64, fiq		// FIQ EL1h
	kernel_ventry	1, h, 64, error		// Error EL1h

	/// Exception entries for 64-bit (AArch64) EL0; for the synchronous entry, the kernel_ventry macro expands to el0t_64_sync
	kernel_ventry	0, t, 64, sync		// Synchronous 64-bit EL0
	kernel_ventry	0, t, 64, irq		// IRQ 64-bit EL0
	kernel_ventry	0, t, 64, fiq		// FIQ 64-bit EL0
	kernel_ventry	0, t, 64, error		// Error 64-bit EL0

	/// Exception entries for 32-bit (AArch32) EL0
	kernel_ventry	0, t, 32, sync		// Synchronous 32-bit EL0
	kernel_ventry	0, t, 32, irq		// IRQ 32-bit EL0
	kernel_ventry	0, t, 32, fiq		// FIQ 32-bit EL0
	kernel_ventry	0, t, 32, error		// Error 32-bit EL0
SYM_CODE_END(vectors)

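A separate table makes the entries of this exception vector table easier to understand.

The Data Abort in this case enters through the entry at offset 0x200 from the vector base.
It ends up in the corresponding handler, el1h_64_sync_handler (the macros involved in the call path are explained in section 3.1 of aarch64 Exception Model and Linux arm64 Interrupt Handling).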
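
The 0x200 offset follows from the layout of the AArch64 vector table: four groups of four entries, in the same order as the kernel_ventry lines above, with each entry 0x80 bytes long. The sketch below only illustrates that arithmetic; the enum and function names are hypothetical and do not exist in the kernel.

```c
/* Which of the four groups of the vector table the exception came from. */
enum vec_group {
	VEC_CUR_EL_SP0   = 0, /* current EL using SP_EL0 (EL1t) */
	VEC_CUR_EL_SPX   = 1, /* current EL using SP_ELx (EL1h) */
	VEC_LOWER_EL_A64 = 2, /* lower EL running AArch64 */
	VEC_LOWER_EL_A32 = 3, /* lower EL running AArch32 */
};

/* Exception type within a group, in table order. */
enum vec_type { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

/* Each group spans 0x200 bytes and each entry within it 0x80 bytes. */
static unsigned int vector_offset(enum vec_group group, enum vec_type type)
{
	return (unsigned int)group * 0x200 + (unsigned int)type * 0x80;
}
```

A synchronous exception taken from EL1h, such as the Data Abort in this case, is vector_offset(VEC_CUR_EL_SPX, VEC_SYNC), which is 0x200.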

1.4 el1h_64_sync_handler

asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
{
	unsigned long esr = read_sysreg(esr_el1);

//printk("---esr:0x%x, at line-%d\n",ESR_ELx_EC(esr),__LINE__);
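	switch (ESR_ELx_EC(esr)) {   /// Read the EC field of esr_el1 to determine the exception type
	case ESR_ELx_EC_DABT_CUR:  /// 0x25: Data Abort taken from the current exception level
	case ESR_ELx_EC_IABT_CUR:
		el1_abort(regs, esr);  /// Entry point for data and instruction aborts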
		break;
	/*
	 * We don't handle ESR_ELx_EC_SP_ALIGN, since we will have hit a
	 * recursive exception when trying to push the initial pt_regs.
	 */
	case ESR_ELx_EC_PC_ALIGN:
		el1_pc(regs, esr);
		break;
	case ESR_ELx_EC_SYS64:
	case ESR_ELx_EC_UNKNOWN:
		el1_undef(regs);
		break;
	case ESR_ELx_EC_BREAKPT_CUR:
	case ESR_ELx_EC_SOFTSTP_CUR:
	case ESR_ELx_EC_WATCHPT_CUR:
	case ESR_ELx_EC_BRK64:
		el1_dbg(regs, esr);
		break;
	case ESR_ELx_EC_FPAC:
		el1_fpac(regs, esr);
		break;
	default:
		__panic_unhandled(regs, "64-bit el1h sync", esr);
	}
}

EC==0b100101 is 0x25, which is the macro ESR_ELx_EC_DABT_CUR, so execution enters el1_abort.

1.5 el1_abort

static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
{
	unsigned long far = read_sysreg(far_el1); /// Read the faulting virtual address from far_el1

	enter_from_kernel_mode(regs);
	local_daif_inherit(regs);
	do_mem_abort(far, esr, regs); /// The abort handler
	local_daif_mask();
	exit_to_kernel_mode(regs);
}

1.6 do_mem_abort

static inline const struct fault_info *esr_to_fault_info(unsigned int esr)
{
	return fault_info + (esr & ESR_ELx_FSC); /// Index the fault_info table with the ESR_ELx_FSC field to select the handler
}

/************************************************************************************
 * Memory abort (page fault) handler
 * Parameters:
 * far:  faulting virtual address
 * esr:  value of ESR_EL1
 * regs: register context (struct pt_regs) saved when the exception was taken
 ************************************************************************************/
void do_mem_abort(unsigned long far, unsigned int esr, struct pt_regs *regs)
{
	const struct fault_info *inf = esr_to_fault_info(esr);  /// Look up the fault_info table by the DFSC field to get the handler
	unsigned long addr = untagged_addr(far);

	if (!inf->fn(far, esr, regs))  /// Run the handler returned by esr_to_fault_info; 0 means the fault was handled
		return;

	if (!user_mode(regs)) {
		pr_alert("Unhandled fault at 0x%016lx\n", addr);
		mem_abort_decode(esr);
		show_pte(addr);
	}

	/*
	 * At this point we have an unrecognized fault type whose tag bits may
	 * have been defined as UNKNOWN. Therefore we only expose the untagged
	 * address to the signal handler.
	 */
	arm64_notify_die(inf->name, regs, inf->sig, inf->code, addr, esr);  /// The handler could not resolve the fault: report the error
}

fault_info is defined as follows:

static const struct fault_info fault_info[] = {
	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
	{ do_bad,		SIGKILL, SI_KERNEL,	"level 2 address size fault"	},
	{ do_bad,		SIGKILL, SI_KERNEL,	"level 3 address size fault"	},
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 0 translation fault"	},
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	}, // the entry for FSC 0x5
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 2 translation fault"	},
	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 3 translation fault"	},
	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 8"			},
	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 access flag fault"	},
	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 access flag fault"	},
    //...
};

ESR_EL1.FSC == 0x5, so the matching entry is

{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	}

Execution therefore continues in do_translation_fault.
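
The lookup is just an array index taken from the low six bits of the ESR. The sketch below is a user-space illustration that keeps only the names of the first eight entries quoted above; fault_names and fault_name are hypothetical names, and FSC_MASK is assumed to carry the same value (0x3f) as the kernel's ESR_ELx_FSC.

```c
#include <stddef.h>
#include <stdint.h>

/* Names of the first eight fault_info entries quoted above. */
static const char *const fault_names[] = {
	"ttbr address size fault",
	"level 1 address size fault",
	"level 2 address size fault",
	"level 3 address size fault",
	"level 0 translation fault",
	"level 1 translation fault",
	"level 2 translation fault",
	"level 3 translation fault",
};

#define FSC_MASK 0x3f /* assumed equal to the kernel's ESR_ELx_FSC */

/* Return the fault name for an ESR value, or NULL if this sketch lacks the entry. */
static const char *fault_name(uint64_t esr)
{
	size_t idx = (size_t)(esr & FSC_MASK);

	if (idx >= sizeof(fault_names) / sizeof(fault_names[0]))
		return NULL; /* entry not reproduced in this sketch */
	return fault_names[idx];
}
```

For the ESR in this case, fault_name(0x96000005) returns "level 1 translation fault", the string that appears on the FSC line of the log.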

1.7 do_translation_fault

static int __kprobes do_translation_fault(unsigned long far,
					  unsigned int esr,
					  struct pt_regs *regs)
{
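	unsigned long addr = untagged_addr(far); // On ARMv8 the top byte of an address may carry tag bits (for example for the Memory Tagging Extension, MTE); strip them to get the actual virtual address

	// TTBR0 translates user-space (low) addresses, TTBR1 translates kernel-space (high) addresses.
	// The faulting address 0x102 in this case is a TTBR0 address (the log prints "user pgtable"), so this branch is taken.
	// The faulting task is a kernel worker thread without a user mm, so do_page_fault goes straight to __do_kernel_fault.
	if (is_ttbr0_addr(addr))
		return do_page_fault(far, esr, regs);

	do_bad_area(far, esr, regs); // TTBR1 (kernel) addresses; a fault taken in kernel mode also ends up in __do_kernel_fault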
	return 0;
}
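
Whether an address belongs to TTBR0 or TTBR1 comes down to a range check against the size of the user address space. The following sketch assumes 39-bit virtual addresses, as printed in the log ("39-bit VAs"); the SKETCH_ names and sketch_is_ttbr0_addr are hypothetical and only mirror the idea behind the kernel's is_ttbr0_addr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 39-bit VAs, as printed in the log. */
#define SKETCH_VA_BITS   39
#define SKETCH_TASK_SIZE (UINT64_C(1) << SKETCH_VA_BITS)

/* Low addresses are translated through TTBR0, everything else through TTBR1. */
static bool sketch_is_ttbr0_addr(uint64_t addr)
{
	return addr < SKETCH_TASK_SIZE;
}
```

Under that assumption the faulting address 0x102 is a TTBR0 address, while a kernel address such as the sp value ffffffc00b473ca0 from the log is not.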

1.8 do_bad_area

static void do_bad_area(unsigned long far, unsigned int esr,
			struct pt_regs *regs)
{
	unsigned long addr = untagged_addr(far);

	/*
	 * If we are in kernel mode at this point, we have no context to
	 * handle this fault with.
	 */
	if (user_mode(regs)) { // Fault taken from user mode
		const struct fault_info *inf = esr_to_fault_info(esr);  // Get the fault_info entry

		set_thread_esr(addr, esr);  // Save the fault information in the current thread's context
		arm64_force_sig_fault(inf->sig, inf->code, far, inf->name); // Send the signal to the user process
	} else {
		__do_kernel_fault(addr, esr, regs);  // Faults taken in kernel mode end up here
	}
}

1.9 __do_kernel_fault

static void __do_kernel_fault(unsigned long addr, unsigned int esr,
			      struct pt_regs *regs)
{
	const char *msg;

	/*
	 * Are we prepared to handle this kernel fault?
	 * We are almost certainly not prepared to handle instruction faults.
	 */
	 /// For anything other than an instruction abort, search the exception table and try to fix up the fault
	if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
		return;
    
    // A spurious translation fault can be caused by a hardware or software issue and needs no further handling
	if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
	    "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
		return;

    // For an MTE (Memory Tagging Extension) synchronous tag check fault, call do_tag_recovery to recover.
	if (is_el1_mte_sync_tag_check_fault(esr)) {
		do_tag_recovery(addr, esr, regs);

		return;
	}

    // For a permission fault (writing read-only memory, executing non-executable memory, reading unreadable memory), set the message msg according to esr.
	if (is_el1_permission_fault(addr, esr, regs)) {
		if (esr & ESR_ELx_WNR)
			msg = "write to read-only memory";
		else if (is_el1_instruction_abort(esr))
			msg = "execute from non-executable memory";
		else
			msg = "read from unreadable memory";
    // An address below PAGE_SIZE is treated as a NULL pointer dereference; set the message accordingly
	} else if (addr < PAGE_SIZE) {
		msg = "NULL pointer dereference";
	} else {
		if (kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
			return;

		msg = "paging request";
	}

	die_kernel_fault(msg, addr, esr, regs);
}
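
The message selection in __do_kernel_fault can be condensed into a small pure function. This is a simplified sketch, not the kernel's code: the permission-fault, write and instruction-abort checks are passed in as flags instead of being decoded from esr, the fixup, spurious-fault, MTE and KFENCE paths are left out, and SKETCH_PAGE_SIZE assumes the 4k pages printed in the log. The name kernel_fault_msg is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4k pages, as printed in the log */

/* Pick the text that fills "Unable to handle kernel %s at virtual address". */
static const char *kernel_fault_msg(uint64_t addr, bool permission_fault,
				    bool is_write, bool is_insn_abort)
{
	if (permission_fault) {
		if (is_write)
			return "write to read-only memory";
		if (is_insn_abort)
			return "execute from non-executable memory";
		return "read from unreadable memory";
	}
	if (addr < SKETCH_PAGE_SIZE)
		return "NULL pointer dereference";
	return "paging request";
}
```

In this case the fault is a translation fault, not a permission fault, and 0x102 is below the page size, so the result is "NULL pointer dereference".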

1.10 die_kernel_fault

This function prints the error messages seen in the log. It is explained line by line below.

static void die_kernel_fault(const char *msg, unsigned long addr,
			     unsigned int esr, struct pt_regs *regs)
{
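	bust_spinlocks(1); // Raise oops_in_progress so that console output can get past locks that may already be held and the debug information can be printed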
// -----------------------------------------------------------------------------------------------------------//
// [    9.188060][  T175] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000102
	pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
		 addr);         // The error line in the log
// -----------------------------------------------------------------------------------------------------------//

// -----------------------------------------------------------------------------------------------------------//
// [    9.188065][  T175] Mem abort info:
// [    9.188067][  T175]   ESR = 0x0000000096000005
// [    9.188069][  T175]   EC = 0x25: DABT (current EL), IL = 32 bits
// [    9.188072][  T175]   SET = 0, FnV = 0
// [    9.188074][  T175]   EA = 0, S1PTW = 0
// [    9.188075][  T175]   FSC = 0x05: level 1 translation fault
// [    9.188078][  T175] Data abort info:
// [    9.188079][  T175]   ISV = 0, ISS = 0x00000005
// [    9.188080][  T175]   CM = 0, WnR = 0
	mem_abort_decode(esr);   // Decode the ESR register
// -----------------------------------------------------------------------------------------------------------//

// -----------------------------------------------------------------------------------------------------------//
// [    9.188083][  T175] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000c850e000
// [    9.188086][  T175] [0000000000000102] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
	show_pte(addr);   // Show the page table entries for the faulting address addr
// -----------------------------------------------------------------------------------------------------------//

	die("Oops", regs, esr);
	bust_spinlocks(0);
	do_exit(SIGKILL);
}

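2. The die function

die may end in panic, but not always. It first runs the oops flow to report the exception. If the exception happened in interrupt context, it panics. If panic_on_oops is set (for example by building with CONFIG_PANIC_ON_OOPS=y, which sets CONFIG_PANIC_ON_OOPS_VALUE to 1), it panics regardless of the context.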

void die(const char *str, struct pt_regs *regs, int err)
{
	int ret;
	unsigned long flags;

	raw_spin_lock_irqsave(&die_lock, flags);

	oops_enter();

	console_verbose();
	bust_spinlocks(1);
	ret = __die(str, err, regs);

	if (regs && kexec_should_crash(current))
		crash_kexec(regs);

	bust_spinlocks(0);
	add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
	oops_exit();

	if (in_interrupt())
		panic("%s: Fatal exception in interrupt", str);
	if (panic_on_oops)
		panic("%s: Fatal exception", str);

	raw_spin_unlock_irqrestore(&die_lock, flags);

	if (ret != NOTIFY_STOP)
		do_exit(SIGSEGV);
}
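
The tail of die therefore reduces to a two-input decision. The sketch below models only that decision in user-space C; the enum and function names are hypothetical, and the NOTIFY_STOP case, in which the task is not killed, is deliberately left out.

```c
#include <stdbool.h>

/* What happens after the oops has been printed. */
enum oops_outcome {
	OOPS_KILL_TASK, /* only the faulting task is killed */
	OOPS_PANIC,     /* the whole system panics */
};

/* Mirrors the two panic checks at the end of die(). */
static enum oops_outcome decide_oops_outcome(bool in_interrupt, bool panic_on_oops)
{
	if (in_interrupt || panic_on_oops)
		return OOPS_PANIC;
	return OOPS_KILL_TASK;
}
```

An oops in process context with panic_on_oops cleared kills only the current task; either of the two conditions alone is enough to turn it into a panic.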

2.1 oops_enter

void oops_enter(void)
{
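	tracing_off();          // Turn off kernel tracing (ftrace and similar) so the trace buffer is not overwritten with unreliable data
	/* can't trust the integrity of the kernel anymore: */
	debug_locks_off();      // Disable the kernel's lock dependency checking (lockdep)
	do_oops_enter_exit();   // Implements pause_on_oops: when it is set, hold off so that the first oops stays visible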

	if (sysctl_oops_all_cpu_backtrace)
		trigger_all_cpu_backtrace();
}

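Note: there is an important node here: /proc/sys/kernel/oops_all_cpu_backtrace
When it is enabled, oops_all_cpu_backtrace:

  • Records the current execution state of every CPU (call stack, registers and so on).
  • On multi-core systems this is very important for debugging synchronization problems such as deadlocks or race conditions.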

2.2 console_verbose

Switches the console to the most verbose log level when needed.


static bool printk_console_no_auto_verbose;

void console_verbose(void)
{
	if (console_loglevel && !printk_console_no_auto_verbose)
		console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
}
EXPORT_SYMBOL_GPL(console_verbose);

module_param_named(console_no_auto_verbose, printk_console_no_auto_verbose, bool, 0644);
MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to highest on oops/panic/etc");

2.3 __die

static int __die(const char *str, int err, struct pt_regs *regs)
{
	static int die_counter;
	int ret;
// -----------------------------------------------------------------------------------------------------------//
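// [    9.188095][  T175] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
// (the 16-digit value in this log suggests a newer kernel that prints err as a 64-bit value; the %x below would print 96000005)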
	pr_emerg("Internal error: %s: %x [#%d]" S_PREEMPT S_SMP "\n",
		 str, err, ++die_counter);
// -----------------------------------------------------------------------------------------------------------//

	/* trap and error numbers are mostly meaningless on ARM */
	ret = notify_die(DIE_OOPS, str, regs, err, 0, SIGSEGV);     // Call the die notifier chain
	if (ret == NOTIFY_STOP)
		return ret;

	print_modules();        // Print the names of all loaded modules
	show_regs(regs);        // Print the register information

	dump_kernel_instr(KERN_EMERG, regs);

	return ret;
}

show_regs is made up of two functions: __show_regs and dump_backtrace.

2.3.1 __show_regs

void __show_regs(struct pt_regs *regs)
{
	int i, top_reg;
	u64 lr, sp;

	if (compat_user_mode(regs)) {
		lr = regs->compat_lr;
		sp = regs->compat_sp;
		top_reg = 12;
	} else {
		lr = regs->regs[30];
		sp = regs->sp;
		top_reg = 29;
	}

	show_regs_print_info(KERN_DEFAULT);
	print_pstate(regs);

// -----------------------------------------------------------------------------------------------------------//
// [    9.188868][  T175] pc : __queue_work+0x28/0x550
// [    9.188876][  T175] lr : queue_work_on+0x3c/0x80
// [    9.188880][  T175] sp : ffffffc00b473ca0
// [    9.188882][  T175] x29: ffffffc00b473ca0 x28: ffffff804531dbc8 x27: ffffff82f2740fa8
// [    9.188890][  T175] x26: ffffff800b791f10 x25: 0000000000000000 x24: 0000000000000007
// [    9.188896][  T175] x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
// [    9.188902][  T175] x20: 0000000000000000 x19: ffffff806d0f9148 x18: ffffffc00ac0d040
// [    9.188908][  T175] x17: 000000002a4cec24 x16: 000000002a4cec24 x15: 0000000000000046
// [    9.188914][  T175] x14: 0000000000000000 x13: 0000000000000ef0 x12: 0000000000000002
// [    9.188920][  T175] x11: 0000000000000000 x10: ffffffffffffd240 x9 : 000000000000001b
// [    9.188926][  T175] x8 : 0000000000000001 x7 : ffffff806baa9380 x6 : 000000161b03f216
// [    9.188932][  T175] x5 : 1672031b16000000 x4 : 0080000000000000 x3 : 1b430b9338000000
// [    9.188939][  T175] x2 : ffffff806d0f9148 x1 : 0000000000000000 x0 : 0000000000000020

	if (!user_mode(regs)) {
		printk("pc : %pS\n", (void *)regs->pc);
		printk("lr : %pS\n", (void *)ptrauth_strip_insn_pac(lr));
	} else {
		printk("pc : %016llx\n", regs->pc);
		printk("lr : %016llx\n", lr);
	}

	printk("sp : %016llx\n", sp);

	if (system_uses_irq_prio_masking())
		printk("pmr_save: %08llx\n", regs->pmr_save);

	i = top_reg;

	while (i >= 0) {
		printk("x%-2d: %016llx", i, regs->regs[i]);

		while (i-- % 3)
			pr_cont(" x%-2d: %016llx", i, regs->regs[i]);

		pr_cont("\n");
	}
}
// -----------------------------------------------------------------------------------------------------------//
  1. The show_regs_print_info function
void show_regs_print_info(const char *log_lvl)
{
	dump_stack_print_info(log_lvl);
}

void dump_stack_print_info(const char *log_lvl)
{
	printk("%sCPU: %d PID: %d Comm: %.20s %s%s %s %.*s" BUILD_ID_FMT "\n",
	       log_lvl, raw_smp_processor_id(), current->pid, current->comm,
	       kexec_crash_loaded() ? "Kdump: loaded " : "",
	       print_tainted(),
	       init_utsname()->release,
	       (int)strcspn(init_utsname()->version, " "),
	       init_utsname()->version, BUILD_ID_VAL);
// -----------------------------------------------------------------------------------------------------------//
// [    9.188845][  T175] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
	if (dump_stack_arch_desc_str[0] != '\0')
		printk("%sHardware name: %s\n",
		       log_lvl, dump_stack_arch_desc_str);
// -----------------------------------------------------------------------------------------------------------//

// -----------------------------------------------------------------------------------------------------------//
// [    9.188849][  T175] Workqueue: events power_supply_changed_work
	print_worker_info(log_lvl, current);
// -----------------------------------------------------------------------------------------------------------//
	print_stop_info(log_lvl, current);
}
  2. The print_pstate function
static void print_pstate(struct pt_regs *regs)
{
	u64 pstate = regs->pstate;

	if (compat_user_mode(regs)) {
		printk("pstate: %08llx (%c%c%c%c %c %s %s %c%c%c %cDIT %cSSBS)\n",
			pstate,
			pstate & PSR_AA32_N_BIT ? 'N' : 'n',
			pstate & PSR_AA32_Z_BIT ? 'Z' : 'z',
			pstate & PSR_AA32_C_BIT ? 'C' : 'c',
			pstate & PSR_AA32_V_BIT ? 'V' : 'v',
			pstate & PSR_AA32_Q_BIT ? 'Q' : 'q',
			pstate & PSR_AA32_T_BIT ? "T32" : "A32",
			pstate & PSR_AA32_E_BIT ? "BE" : "LE",
			pstate & PSR_AA32_A_BIT ? 'A' : 'a',
			pstate & PSR_AA32_I_BIT ? 'I' : 'i',
			pstate & PSR_AA32_F_BIT ? 'F' : 'f',
			pstate & PSR_AA32_DIT_BIT ? '+' : '-',
			pstate & PSR_AA32_SSBS_BIT ? '+' : '-');
	} else {
		const char *btype_str = btypes[(pstate & PSR_BTYPE_MASK) >>
					       PSR_BTYPE_SHIFT];

		printk("pstate: %08llx (%c%c%c%c %c%c%c%c %cPAN %cUAO %cTCO %cDIT %cSSBS BTYPE=%s)\n",
			pstate,
			pstate & PSR_N_BIT ? 'N' : 'n',
			pstate & PSR_Z_BIT ? 'Z' : 'z',
			pstate & PSR_C_BIT ? 'C' : 'c',
			pstate & PSR_V_BIT ? 'V' : 'v',
			pstate & PSR_D_BIT ? 'D' : 'd',
			pstate & PSR_A_BIT ? 'A' : 'a',
			pstate & PSR_I_BIT ? 'I' : 'i',
			pstate & PSR_F_BIT ? 'F' : 'f',
			pstate & PSR_PAN_BIT ? '+' : '-',
			pstate & PSR_UAO_BIT ? '+' : '-',
			pstate & PSR_TCO_BIT ? '+' : '-',
			pstate & PSR_DIT_BIT ? '+' : '-',
			pstate & PSR_SSBS_BIT ? '+' : '-',
			btype_str);
	}
}

The corresponding log line is:

[    9.188863][  T175] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
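
To check the flag string against the raw value 604000c5, the AArch64 branch of print_pstate can be replayed in user space. This is a sketch under assumptions: the SK_PSR_ macros are hypothetical names whose values are meant to match the kernel's PSR_*_BIT definitions, and format_pstate writes into a caller-supplied buffer instead of calling printk.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* AArch64 PSTATE bits, assumed to match the kernel's PSR_*_BIT values. */
#define SK_PSR_F           0x00000040u
#define SK_PSR_I           0x00000080u
#define SK_PSR_A           0x00000100u
#define SK_PSR_D           0x00000200u
#define SK_PSR_BTYPE_MASK  0x00000c00u
#define SK_PSR_BTYPE_SHIFT 10
#define SK_PSR_SSBS        0x00001000u
#define SK_PSR_PAN         0x00400000u
#define SK_PSR_UAO         0x00800000u
#define SK_PSR_DIT         0x01000000u
#define SK_PSR_TCO         0x02000000u
#define SK_PSR_V           0x10000000u
#define SK_PSR_C           0x20000000u
#define SK_PSR_Z           0x40000000u
#define SK_PSR_N           0x80000000u

/* Format the flag part of the "pstate:" line (the text inside the parentheses). */
static void format_pstate(uint64_t pstate, char *buf, size_t len)
{
	static const char *const btypes[] = { "--", "-c", "j-", "jc" };
	const char *btype = btypes[(pstate & SK_PSR_BTYPE_MASK) >> SK_PSR_BTYPE_SHIFT];

	snprintf(buf, len,
		 "%c%c%c%c %c%c%c%c %cPAN %cUAO %cTCO %cDIT %cSSBS BTYPE=%s",
		 (pstate & SK_PSR_N) ? 'N' : 'n',
		 (pstate & SK_PSR_Z) ? 'Z' : 'z',
		 (pstate & SK_PSR_C) ? 'C' : 'c',
		 (pstate & SK_PSR_V) ? 'V' : 'v',
		 (pstate & SK_PSR_D) ? 'D' : 'd',
		 (pstate & SK_PSR_A) ? 'A' : 'a',
		 (pstate & SK_PSR_I) ? 'I' : 'i',
		 (pstate & SK_PSR_F) ? 'F' : 'f',
		 (pstate & SK_PSR_PAN) ? '+' : '-',
		 (pstate & SK_PSR_UAO) ? '+' : '-',
		 (pstate & SK_PSR_TCO) ? '+' : '-',
		 (pstate & SK_PSR_DIT) ? '+' : '-',
		 (pstate & SK_PSR_SSBS) ? '+' : '-',
		 btype);
}
```

For 0x604000c5 this produces the same flag string as the log: Z and C set, IRQ and FIQ masked (I and F upper case), and PAN enabled.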

2.3.2 dump_backtrace

void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
		    const char *loglvl)
{
	struct stackframe frame;        // Holds the current stack frame while walking the call stack
	int skip = 0;

	pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);

	if (regs) {
		if (user_mode(regs))
			return;
		skip = 1;
	}

	if (!tsk)
		tsk = current;

	if (!try_get_task_stack(tsk))
		return;
    
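    // Initialize the stack frame.
    // For the current task:
    // use the compiler builtin __builtin_frame_address(0) to get the current frame pointer
    // and set the program counter to the address of this function, dump_backtrace.
	if (tsk == current) {
		start_backtrace(&frame,
				(unsigned long)__builtin_frame_address(0),
				(unsigned long)dump_backtrace);
    // For any other task:
    // use the frame pointer (thread_saved_fp) and program counter (thread_saved_pc) saved in its thread context.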
	} else {
		/*
		 * task blocked in __switch_to
		 */
		start_backtrace(&frame,
				thread_saved_fp(tsk),
				thread_saved_pc(tsk));
	}
    // Print the call trace header
// -----------------------------------------------------------------------------------------------------------//
// [    9.188946][  T175] Call trace:
	printk("%sCall trace:\n", loglvl);
// -----------------------------------------------------------------------------------------------------------//
	do {
		/* skip until specified stack frame */
		if (!skip) {
			dump_backtrace_entry(frame.pc, loglvl);
		} else if (frame.fp == regs->regs[29]) {
			skip = 0;
			/*
			 * Mostly, this is the case where this function is
			 * called in panic/abort. As exception handler's
			 * stack frame does not contain the corresponding pc
			 * at which an exception has taken place, use regs->pc
			 * instead.
			 */
            // Key step: record the faulting location using the program counter saved in the registers (regs->pc).
			dump_backtrace_entry(regs->pc, loglvl);
		}
    // Move on to the previous (caller's) stack frame
	} while (!unwind_frame(tsk, &frame));

	put_task_stack(tsk); // Drop the reference to the task stack
}

This part corresponds to the following in the log:

[    9.188946][  T175] Call trace:
[    9.188948][  T175]  __queue_work+0x28/0x550
[    9.188953][  T175]  queue_work_on+0x3c/0x80
[    9.188957][  T175]  fts_power_usb_notifier_callback+0x2c/0x40 [focaltech_spi]
[    9.189037][  T175]  blocking_notifier_call_chain+0x70/0xbc
[    9.189047][  T175]  power_supply_changed_work+0x7c/0xc8
[    9.189054][  T175]  process_one_work+0x1e4/0x43c
[    9.189060][  T175]  worker_thread+0x25c/0x430
[    9.189065][  T175]  kthread+0x104/0x1d4
[    9.189069][  T175]  ret_from_fork+0x10/0x20

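In summary, dump_backtrace:

  • Initializes the starting stack frame and walks every frame.
  • Prints each call stack entry (the symbolized address of each frame).
  • Works either from a register context supplied by the caller (as when an exception occurs) or from a given task's saved stack.
  • Handles the exception case (skipping the exception handler's own frames) so that the call stack is recorded precisely.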
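
The frame walk at the heart of dump_backtrace relies on the AArch64 frame record: x29 points at a pair holding the caller's frame pointer and the return address. The sketch below walks such a chain in user space. It is an illustration only: frame_record and walk_frames are hypothetical names, and the real unwind_frame additionally checks that each frame pointer lies on a valid stack before following it.

```c
#include <stddef.h>
#include <stdint.h>

/* AArch64 frame record: the frame pointer points at {caller's fp, return address}. */
struct frame_record {
	const struct frame_record *prev_fp;
	uintptr_t lr;
};

/* Follow the chain starting at fp, storing up to max return addresses in out. */
static size_t walk_frames(const struct frame_record *fp, uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (fp != NULL && n < max) {
		out[n++] = fp->lr;
		fp = fp->prev_fp;
	}
	return n;
}
```

Each stored address corresponds to one line of the "Call trace:" output once it has been symbolized.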

3. The panic function

void die(const char *str, struct pt_regs *regs, int err)
{
    //...
	if (panic_on_oops)
		panic("%s: Fatal exception", str);
    //...
}

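This involves a kernel parameter node: /proc/sys/kernel/panic_on_oops
Outside interrupt context, an oops triggers the panic flow only when this parameter is set to 1.
Android projects set this parameter in init.rc: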

on init 
//...
    write /proc/sys/kernel/panic_on_oops 1

The panic flow itself is not covered in this article. A later article will walk through it.