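1. Background
[Reproduction rate] 10/10
[Precondition] During normal testing
[Steps to reproduce] Battery temperature reaches 35 °C
[Expected result] The phone works normally
[Actual result] The phone enters dump mode
2. Analysis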
2.1 dmesg_TZ.txt
[ 492.250281][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: usbpd_pm_fc2_charge_algo: ctrl val: smart chg = 0, night chg = 0, endurance pro = 0
[ 492.250304][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: vbat:4468,lmt:4470; ibat:2072,lmt:2800; ibus:1102,lmt:1400
[ 492.250736][ T1879] [usbpd-pm]: usbpd_pm_check_cp_enabled: cp charging is 1, ret=0
[ 492.250764][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: charge pump taper charging done
[ 492.250796][ T1879] [usbpd-pm]: usbpd_pm_sm: move to PD_PM_STATE_FC2_EXIT
[ 492.250808][ T1879] Unexpected kernel BRK exception at EL1
[ 492.250819][ T1879] Internal error: BRK handler: 00000000f2005512 [#1] PREEMPT SMP
[ 492.254096][ T1879] CPU: 0 PID: 1879 Comm: kworker/0:13 Tainted: G C OE 6.1.118-android14-11-g06896949dc87-ab13257279 #1
[ 492.254116][ T1879] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[ 492.254128][ T1879] Workqueue: events usbpd_pm_workfunc [pd_policy_manager]
[ 492.254189][ T1879] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 492.254209][ T1879] pc : usbpd_pm_workfunc+0x1058/0x1060 [pd_policy_manager]
[ 492.254253][ T1879] lr : usbpd_pm_workfunc+0x1058/0x1060 [pd_policy_manager]
[ 492.254292][ T1879] sp : ffffffc01402bd30
[ 492.254300][ T1879] x29: ffffffc01402bd40 x28: ffffffc0024a0000 x27: ffffffc0024a0000
[ 492.254331][ T1879] x26: 0000006e978a79c5 x25: 0000000000000000 x24: 0000000000000001
[ 492.254356][ T1879] x23: 0000000000000001 x22: 0000000000000000 x21: 000000729bdb9809
[ 492.254380][ T1879] x20: ffffffc0024a0000 x19: ffffff805fb34070 x18: ffffffc0132cd020
[ 492.254407][ T1879] x17: 0000000000000015 x16: ffffffffffffffff x15: 0000000000000004
[ 492.254433][ T1879] x14: ffffff80fae10000 x13: 000000000000ffff x12: 0000000000000003
[ 492.254460][ T1879] x11: 00000000fffeffff x10: c0000000fffeffff x9 : 714a26d5a1c0e600
[ 492.254486][ T1879] x8 : 714a26d5a1c0e600 x7 : 70627375203a5d6d x6 : 702d64706273755b
[ 492.254511][ T1879] x5 : ffffffc00a1e988d x4 : ffffff80fb5128d5 x3 : 0000000000000000
[ 492.254536][ T1879] x2 : 0000000000000000 x1 : ffffffc01402baf0 x0 : 0000000000000035
[ 492.254563][ T1879] Call trace:
[ 492.254570][ T1879] usbpd_pm_workfunc+0x1058/0x1060 [pd_policy_manager]
[ 492.254610][ T1879] process_one_work+0x1e4/0x43c
[ 492.254639][ T1879] worker_thread+0x25c/0x430
[ 492.254659][ T1879] kthread+0x104/0x1d4
[ 492.254675][ T1879] ret_from_fork+0x10/0x20
[ 492.254705][ T1879] Code: 9104fc00 f0000001 91021821 95ad2ff4 (d42aa240)
[ 492.254729][ T1879] ---[ end trace 0000000000000000 ]---
Key points:
- Faulting module: pd_policy_manager
- Faulting location: usbpd_pm_workfunc+0x1058
- Faulting CPU: CPU 0
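The `Code:` line and the `Internal error` line in the trace can be cross-checked against each other. The sketch below decodes the faulting instruction word `d42aa240` as an A64 `BRK` and extracts its 16-bit immediate, which should equal the low 16 bits of the reported ESR value `0xf2005512`. The helper names are illustrative only. If this build uses Clang's trapping UBSAN mode (an assumption, the log does not confirm it), an immediate of the form 0x5500 | kind with kind 0x12 would be consistent with an array-bounds trap.

```c
#include <stdint.h>

/* A64 BRK #imm16 is encoded as 0xD4200000 | (imm16 << 5). */
static int is_brk(uint32_t insn)
{
    return (insn & 0xFFE0001Fu) == 0xD4200000u;
}

/* Extract the 16-bit immediate from a BRK instruction word. */
static uint16_t brk_imm(uint32_t insn)
{
    return (uint16_t)((insn >> 5) & 0xFFFFu);
}

/* For a BRK exception the low 16 bits of the ESR carry the same immediate. */
static uint16_t esr_brk_imm(uint32_t esr)
{
    return (uint16_t)(esr & 0xFFFFu);
}
```

Calling `brk_imm(0xd42aa240)` and `esr_brk_imm(0xf2005512)` yields the same immediate, 0x5512, so the trap is a compiler-inserted `BRK` rather than a bad memory access.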
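2.2 TRACE32 analysis
Use the work item to recover the address of struct usbpd_pm chip and dump the structure. The work pointer is 0xFFFFFF805FB34070, which matches x19 in the register dump.
v.v &((struct usbpd_pm *)0x0)->pm_work
This gives an offset of 0x70 for pm_work, so the structure address is 0xFFFFFF805FB34070 - 0x70 = 0xFFFFFF805FB34000.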
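The subtraction above is the same calculation the kernel's `container_of` performs. A minimal sketch, assuming a stand-in structure: `usbpd_pm_stub` and `work_stub` are hypothetical, and only the 0x70 offset of `pm_work` is taken from the TRACE32 output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct work_struct. */
struct work_stub {
    unsigned long data;
};

/* Hypothetical stand-in for struct usbpd_pm: only the offset is real. */
struct usbpd_pm_stub {
    unsigned char before[0x70];   /* members preceding pm_work */
    struct work_stub pm_work;     /* at offset 0x70, per TRACE32 */
};

/* Recover the address of the enclosing structure from the member address. */
static uint64_t chip_from_work(uint64_t work_addr)
{
    return work_addr - offsetof(struct usbpd_pm_stub, pm_work);
}
```

`chip_from_work(0xFFFFFF805FB34070)` returns 0xFFFFFF805FB34000, the address used for the rest of the analysis.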
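Next, look at the faulting location, usbpd_pm_workfunc+0x1058.
The address of usbpd_pm_workfunc is 0xFFFFFFC00249A374,
so the faulting address is 0xFFFFFFC00249A374 + 0x1058 = 0xFFFFFFC00249B3CC.
The view at this address offers no obvious lead. What we do know is that the fault occurs in usbpd_pm_sm (the last log line before the BRK is printed by it), and we know the address of its only argument, struct usbpd_pm chip. We can therefore follow the code path line by line, using the values of chip's members to decide which way each branch goes.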
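usbpd_pm_sm switches on the pdpm->state member, and the current value is pdpm->state = PD_PM_STATE_FC2_TUNE.
The logic of this case, with a series of statements omitted, leads to usbpd_pm_fc2_charge_algo, which is where we start. That function is quite long, so instead of checking it line by line we use the printed log to locate the problem.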
The log lines in dmesg_TZ.txt just before the kernel exception (KE) are:
[ 492.250281][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: usbpd_pm_fc2_charge_algo: ctrl val: smart chg = 0, night chg = 0, endurance pro = 0
[ 492.250304][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: vbat:4468,lmt:4470; ibat:2072,lmt:2800; ibus:1102,lmt:1400
[ 492.250736][ T1879] [usbpd-pm]: usbpd_pm_check_cp_enabled: cp charging is 1, ret=0
[ 492.250764][ T1879] [usbpd-pm]: usbpd_pm_fc2_charge_algo: charge pump taper charging done
[ 492.250796][ T1879] [usbpd-pm]: usbpd_pm_sm: move to PD_PM_STATE_FC2_EXIT
[ 492.250808][ T1879] Unexpected kernel BRK exception at EL1
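The fourth log line ("charge pump taper charging done") is printed by usbpd_pm_fc2_charge_algo, so that function returns PM_ALGO_RET_TAPER_DONE.
usbpd_pm_sm then takes the branch that handles this return value, which corresponds to the following log line: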
[ 492.250796][ T1879] [usbpd-pm]: usbpd_pm_sm: move to PD_PM_STATE_FC2_EXIT
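The control flow described so far can be modelled in a few lines. This is a sketch reconstructed from the log only, not the driver source: the enum values 5 and 7 and the name PM_ALGO_RET_TAPER_DONE come from this analysis, while `usbpd_pm_model`, `move_state_model`, `sm_step_model` and `PM_ALGO_RET_CONTINUE_SKETCH` are hypothetical.

```c
/* Values 5 and 7 come from the analysis; other states are left out. */
enum pm_state {
    PD_PM_STATE_FC2_TUNE = 5,
    PD_PM_STATE_FC2_EXIT = 7,
};

enum pm_algo_ret {
    PM_ALGO_RET_CONTINUE_SKETCH,  /* hypothetical: keep tuning */
    PM_ALGO_RET_TAPER_DONE,       /* named in the analysis */
};

/* Hypothetical, reduced stand-in for struct usbpd_pm. */
struct usbpd_pm_model {
    enum pm_state state;
};

static void move_state_model(struct usbpd_pm_model *pdpm, enum pm_state next)
{
    pdpm->state = next;
}

/* One pass of the state machine, FC2_TUNE case only. */
static void sm_step_model(struct usbpd_pm_model *pdpm, enum pm_algo_ret ret)
{
    switch (pdpm->state) {
    case PD_PM_STATE_FC2_TUNE:
        if (ret == PM_ALGO_RET_TAPER_DONE)
            move_state_model(pdpm, PD_PM_STATE_FC2_EXIT);
        break;
    default:
        break;
    }
}
```

With the state at PD_PM_STATE_FC2_TUNE and a return value of PM_ALGO_RET_TAPER_DONE, the model requests a transition to PD_PM_STATE_FC2_EXIT, which is the transition the last log line reports.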
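Execution continues into usbpd_pm_move_state with the argument PD_PM_STATE_FC2_EXIT.
Going by the definition of this state type and the body of usbpd_pm_move_state, this is where the kernel panics, and the cause becomes clear from the two values involved: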
state = PD_PM_STATE_FC2_EXIT = 7
pdpm->state = PD_PM_STATE_FC2_TUNE = 5
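The pm_str array has only 7 elements, so its largest valid index is 6. Indexing it with state = 7 is therefore an out-of-bounds access.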
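The index check can be written out explicitly. A minimal sketch, assuming placeholder strings: the analysis gives only the element count of pm_str, so `pm_str_model` and its contents are hypothetical, as is `pm_str_index_ok`.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Seven placeholder entries; only the count is taken from the analysis. */
static const char *const pm_str_model[] = {
    "state0", "state1", "state2", "state3",
    "state4", "state5", "state6",
};

/* Returns 1 when idx is a valid index into pm_str_model, 0 otherwise. */
static int pm_str_index_ok(unsigned int idx)
{
    return idx < ARRAY_SIZE(pm_str_model);
}
```

Index 5 (PD_PM_STATE_FC2_TUNE) passes the check, while index 7 (PD_PM_STATE_FC2_EXIT) fails it.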
3. Solution
Add the missing PD_PM_STATE_FC2_HOLD entry to the pm_str array, so that the array has one string for every state value.
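One way to keep the enum and the string table from drifting apart again is to build the table with designated initializers and check its size at compile time. This is a sketch under assumptions, not the actual patch: the names for values 0 to 4, the position of PD_PM_STATE_FC2_HOLD at 6, the `PD_PM_STATE_MAX` sentinel and the `pm_state_name` helper are all hypothetical. Only TUNE = 5 and EXIT = 7 come from the analysis.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum pm_state {
    PD_PM_STATE_ENTRY,          /* 0..4: names assumed */
    PD_PM_STATE_FC2_ENTRY,
    PD_PM_STATE_FC2_ENTRY_1,
    PD_PM_STATE_FC2_ENTRY_2,
    PD_PM_STATE_FC2_ENTRY_3,
    PD_PM_STATE_FC2_TUNE,       /* 5, from the analysis */
    PD_PM_STATE_FC2_HOLD,       /* 6, position assumed */
    PD_PM_STATE_FC2_EXIT,       /* 7, from the analysis */
    PD_PM_STATE_MAX,            /* hypothetical sentinel */
};

/* Each string sits at the index of its own enum value. */
#define PM_STR(s) [s] = #s
static const char *const pm_str[] = {
    PM_STR(PD_PM_STATE_ENTRY),
    PM_STR(PD_PM_STATE_FC2_ENTRY),
    PM_STR(PD_PM_STATE_FC2_ENTRY_1),
    PM_STR(PD_PM_STATE_FC2_ENTRY_2),
    PM_STR(PD_PM_STATE_FC2_ENTRY_3),
    PM_STR(PD_PM_STATE_FC2_TUNE),
    PM_STR(PD_PM_STATE_FC2_HOLD),
    PM_STR(PD_PM_STATE_FC2_EXIT),
};

/* Fails the build if a state is added without a matching string. */
_Static_assert(ARRAY_SIZE(pm_str) == PD_PM_STATE_MAX,
               "pm_str must have one entry per pm_state");

/* Bounds-checked lookup for use in log messages. */
static const char *pm_state_name(unsigned int state)
{
    if (state >= ARRAY_SIZE(pm_str) || !pm_str[state])
        return "PD_PM_STATE_UNKNOWN";
    return pm_str[state];
}
```

The bounds-checked lookup is optional once the table is complete, but it keeps a logging statement from bringing down the kernel if the two definitions diverge again.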