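1. Problem Background

[Reproduction rate] 10/10

[Preconditions] Normal testing

[Reproduction steps] Battery temperature reaches 35 °C

[Expected result] The phone works normally

[Actual result] The phone crashes into dump mode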

2. Problem Analysis

2.1 dmesg_TZ.txt

[  492.250281][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: usbpd_pm_fc2_charge_algo: ctrl val: smart chg = 0, night chg = 0, endurance pro = 0
[  492.250304][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: vbat:4468,lmt:4470; ibat:2072,lmt:2800; ibus:1102,lmt:1400
[  492.250736][ T1879] [usbpd-pm]: usbpd_pm_check_cp_enabled: cp charging is 1, ret=0
[  492.250764][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: charge pump taper charging done
[  492.250796][ T1879] [usbpd-pm]: usbpd_pm_sm: move to PD_PM_STATE_FC2_EXIT
[  492.250808][ T1879] Unexpected kernel BRK exception at EL1
[  492.250819][ T1879] Internal error: BRK handler: 00000000f2005512 [#1] PREEMPT SMP

[  492.254096][ T1879] CPU: 0 PID: 1879 Comm: kworker/0:13 Tainted: G         C OE      6.1.118-android14-11-g06896949dc87-ab13257279 #1
[  492.254116][ T1879] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[  492.254128][ T1879] Workqueue: events usbpd_pm_workfunc [pd_policy_manager]
[  492.254189][ T1879] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  492.254209][ T1879] pc : usbpd_pm_workfunc+0x1058/0x1060 [pd_policy_manager]
[  492.254253][ T1879] lr : usbpd_pm_workfunc+0x1058/0x1060 [pd_policy_manager]
[  492.254292][ T1879] sp : ffffffc01402bd30
[  492.254300][ T1879] x29: ffffffc01402bd40 x28: ffffffc0024a0000 x27: ffffffc0024a0000
[  492.254331][ T1879] x26: 0000006e978a79c5 x25: 0000000000000000 x24: 0000000000000001
[  492.254356][ T1879] x23: 0000000000000001 x22: 0000000000000000 x21: 000000729bdb9809
[  492.254380][ T1879] x20: ffffffc0024a0000 x19: ffffff805fb34070 x18: ffffffc0132cd020
[  492.254407][ T1879] x17: 0000000000000015 x16: ffffffffffffffff x15: 0000000000000004
[  492.254433][ T1879] x14: ffffff80fae10000 x13: 000000000000ffff x12: 0000000000000003
[  492.254460][ T1879] x11: 00000000fffeffff x10: c0000000fffeffff x9 : 714a26d5a1c0e600
[  492.254486][ T1879] x8 : 714a26d5a1c0e600 x7 : 70627375203a5d6d x6 : 702d64706273755b
[  492.254511][ T1879] x5 : ffffffc00a1e988d x4 : ffffff80fb5128d5 x3 : 0000000000000000
[  492.254536][ T1879] x2 : 0000000000000000 x1 : ffffffc01402baf0 x0 : 0000000000000035
[  492.254563][ T1879] Call trace:
[  492.254570][ T1879]  usbpd_pm_workfunc+0x1058/0x1060 [pd_policy_manager]
[  492.254610][ T1879]  process_one_work+0x1e4/0x43c
[  492.254639][ T1879]  worker_thread+0x25c/0x430
[  492.254659][ T1879]  kthread+0x104/0x1d4
[  492.254675][ T1879]  ret_from_fork+0x10/0x20
[  492.254705][ T1879] Code: 9104fc00 f0000001 91021821 95ad2ff4 (d42aa240) 
[  492.254729][ T1879] ---[ end trace 0000000000000000 ]---

Key points:

  • Faulting module: pd_policy_manager
  • Faulting location: usbpd_pm_workfunc+0x1058
  • Faulting CPU: CPU 0
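
As a cross-check on the dump above, the BRK immediate can be recovered both from the ESR value in the "Internal error" line (0xf2005512) and from the faulting instruction word in the Code line (0xd42aa240). The following is a minimal sketch in plain user-space C, not code from the driver; the helper names are made up for illustration. Both values decode to 0x5512. Immediates of the form 0x55xx are what Clang emits for trapping UBSAN checks on AArch64, and the low byte 0x12 is assumed here to denote the array-bounds check; that mapping should be confirmed against the toolchain in use.

```c
#include <assert.h>
#include <stdint.h>

/* ESR_EL1 exception class lives in bits 31:26; 0x3c is "BRK in AArch64 state". */
static inline uint32_t esr_exception_class(uint32_t esr)
{
    return (esr >> 26) & 0x3f;
}

/* For a BRK exception, the low 16 bits of ESR_EL1 hold the BRK immediate. */
static inline uint32_t esr_brk_imm(uint32_t esr)
{
    return esr & 0xffff;
}

/* An A64 BRK instruction is encoded as 0xd4200000 | (imm16 << 5). */
static inline int is_brk_insn(uint32_t insn)
{
    return (insn & 0xffe0001f) == 0xd4200000;
}

static inline uint32_t brk_insn_imm(uint32_t insn)
{
    return (insn >> 5) & 0xffff;
}

/* Hypothetical self-check using the two values taken from the dump. */
static inline void brk_decode_selfcheck(void)
{
    const uint32_t esr  = 0xf2005512u; /* "Internal error: BRK handler" line */
    const uint32_t insn = 0xd42aa240u; /* bracketed word in the "Code:" line */

    assert(esr_exception_class(esr) == 0x3c);
    assert(is_brk_insn(insn));
    assert(esr_brk_imm(esr) == brk_insn_imm(insn)); /* both are 0x5512 */
}
```

If the bounds-check reading is right, the trap was planted by the compiler at an array access rather than raised by a bad pointer, which is consistent with the conclusion reached later in this analysis.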

2.2 trace32 analysis

Use the work pointer to recover the address of struct usbpd_pm chip, then dump the structure.

v.v &((struct usbpd_pm *)0x0)->pm_work  

The offset of pm_work is 0x70, so the structure address is 0xFFFFFF805FB34070 - 0x70 = 0xFFFFFF805FB34000.
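
The subtraction above is the same computation the kernel's container_of() performs. A minimal user-space sketch follows, assuming a stand-in structure: only the 0x70 offset of pm_work comes from the analysis, while the struct layout, the stub type names, and the helper are hypothetical. The work pointer 0xFFFFFF805FB34070 is also the value held in x19 in the register dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct work_struct; the real layout is not needed here. */
struct work_stub {
    void *func;
};

/* Stand-in for struct usbpd_pm: the members before pm_work are unknown,
 * so they are modelled as 0x70 opaque bytes. */
struct usbpd_pm_stub {
    unsigned char leading_members[0x70];
    struct work_stub pm_work;
};

/* The stub must reproduce the offset reported by trace32. */
static_assert(offsetof(struct usbpd_pm_stub, pm_work) == 0x70,
              "pm_work must sit at offset 0x70");

/* Same idea as the kernel's container_of(): step back from the member
 * to the start of the enclosing structure. */
#define container_of_stub(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The same computation on raw addresses, as done by hand in trace32. */
static inline uint64_t member_to_base(uint64_t member_addr, uint64_t member_offset)
{
    return member_addr - member_offset;
}
```

Calling member_to_base(0xFFFFFF805FB34070, 0x70) gives 0xFFFFFF805FB34000, the address used for the rest of the analysis.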

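Next, look at the faulting location, usbpd_pm_workfunc+0x1058.
usbpd_pm_workfunc is located at 0xFFFFFFC00249A374,
so the faulting address is 0xFFFFFFC00249A374 + 0x1058 = 0xFFFFFFC00249B3CC.

Nothing at this address gives an obvious lead on its own. What we do know is that the fault occurs in usbpd_pm_sm, the last function to log before the exception (it does not appear in the call trace, presumably because it is inlined into usbpd_pm_workfunc), and we have the address of its only argument, struct usbpd_pm chip. We can therefore follow the code path step by step using the values of the chip's members.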


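usbpd_pm_sm switches on the pdpm->state member.

The current value is pdpm->state = PD_PM_STATE_FC2_TUNE.

After entering this case, the function follows the logic shown in the figure. Skipping a number of statements, we start from usbpd_pm_fc2_charge_algo. That function is long, so instead of checking it line by line we use the printed logs to locate the problem.

The log lines printed in dmesg_TZ.txt just before the kernel exception (KE) are: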

[  492.250281][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: usbpd_pm_fc2_charge_algo: ctrl val: smart chg = 0, night chg = 0, endurance pro = 0
[  492.250304][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: vbat:4468,lmt:4470; ibat:2072,lmt:2800; ibus:1102,lmt:1400
[  492.250736][ T1879] [usbpd-pm]: usbpd_pm_check_cp_enabled: cp charging is 1, ret=0
[  492.250764][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: charge pump taper charging done
[  492.250796][ T1879] [usbpd-pm]: usbpd_pm_sm: move to PD_PM_STATE_FC2_EXIT
[  492.250808][ T1879] Unexpected kernel BRK exception at EL1

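The fourth log line ("charge pump taper charging done") shows that usbpd_pm_fc2_charge_algo returned PM_ALGO_RET_TAPER_DONE.

usbpd_pm_sm then takes the branch that moves to the exit state, which matches this log line: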

[  492.250796][ T1879] [usbpd-pm]: usbpd_pm_sm: move to PD_PM_STATE_FC2_EXIT

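Execution continues into usbpd_pm_move_state with PD_PM_STATE_FC2_EXIT as its argument. Comparing the definition of the state type with usbpd_pm_move_state, which uses the state value as an index into pm_str, shows where the kernel panics. The values involved are:

state = PD_PM_STATE_FC2_EXIT = 7
pdpm->state = PD_PM_STATE_FC2_TUNE = 5

The pm_str array has only 7 elements, so its largest valid index is 6. Indexing it with state 7 is an out-of-bounds array access.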

3. Solution

Add the missing PD_PM_STATE_FC2_HOLD entry to the pm_str array so that it has 8 elements and index 7 is valid.