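AI summary
Stable-branch soak testing of test build V816.0.24.8.26.UGUCNXM reported a large number of NULL pointer dereference crashes. Offline analysis of the dump showed that the crash happens in `mutex_lock` while taking `m->lock`. Here `m` comes from `iocb->ki_filp->private_data`, which is read from a `struct file` and is NULL. The file involved is the `/proc/hwinfo` node, and reading it crashes the phone. The node was created for an old fingerprint requirement and no longer serves any purpose, so the proposed fix is to remove it.
This summary was generated by AI from the article content and is for reference only.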
1. Background
https://wayawbott0.f.mioffice.cn/sheets/shtk4qr1GSkUjvozmsj0OWi0tGe
Test build: V816.0.24.8.26.UGUCNXM
Stable-branch MTBF soak testing reported a large number of NULL pointer dereference crashes.
2. Analysis
2.1 Dump parsing
Parse the dump offline with the Linux Ramdump Parser and open dmesg_tz.txt:
[51222.768793][T13540] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
[51222.768825][T13540] Mem abort info:
[51222.768836][T13540] ESR = 0x96000007
[51222.768848][T13540] EC = 0x25: DABT (current EL), IL = 32 bits
[51222.768858][T13540] SET = 0, FnV = 0
[51222.768868][T13540] EA = 0, S1PTW = 0
[51222.768877][T13540] Data abort info:
[51222.768887][T13540] ISV = 0, ISS = 0x00000007
[51222.768896][T13540] CM = 0, WnR = 0
[51222.768909][T13540] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000bd874000
[51222.768919][T13540] [0000000000000038] pgd=00000000e7355003, p4d=00000000e7355003, pud=00000000e7355003, pmd=000000084d2de003, pte=0000000000000000
[51222.768955][T13540] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[51222.768996][T13540] Skip md ftrace buffer dump for: 0x1609e0
//...
[51222.770472][T13540] CPU: 1 PID: 13540 Comm: pool-10-thread- Tainted: G WC O 5.10.198-android12-9-00085-g226a9632f13d-ab11136126 #1
[51222.770483][T13540] Hardware name: Qualcomm Technologies, Inc. Flame QRD (DT)
[51222.770498][T13540] pstate: 00400005 (nzcv daif +PAN -UAO -TCO BTYPE=--)
[51222.770518][T13540] pc : mutex_lock+0x34/0x184
[51222.770535][T13540] lr : seq_read_iter+0x4c/0x640
[51222.770545][T13540] sp : ffffffc031873bb0
[51222.770555][T13540] x29: ffffffc031873bc0 x28: ffffff883b6d5c80
[51222.770572][T13540] x27: 0000000000000000 x26: 0000000000000000
[51222.770589][T13540] x25: 0000000000000000 x24: ffffff884e36b478
[51222.770606][T13540] x23: ffffffc031873c50 x22: 0000000000000400
[51222.770622][T13540] x21: ffffffc031873c78 x20: 0000000000000000
[51222.770638][T13540] x19: 0000000000000038 x18: ffffffc01b6ad050
[51222.770654][T13540] x17: 0000000000000000 x16: 0000000000000000
[51222.770670][T13540] x15: 0000000000000000 x14: 0000000000000008
[51222.770686][T13540] x13: ffffffc031873ca8 x12: 0000000000000004
[51222.770703][T13540] x11: ffffff883b6d5c80 x10: 0000000000000000
[51222.770719][T13540] x9 : 0000000000000000 x8 : 0000000000000038
[51222.770735][T13540] x7 : 0000000000000000 x6 : 0000000000000000
[51222.770751][T13540] x5 : ffffff805abde818 x4 : 0000000000000000
[51222.770768][T13540] x3 : ffffffc031873de0 x2 : ffffff883b6d5c80
[51222.770784][T13540] x1 : 0000000000000000 x0 : 0000000000000038
[51222.770801][T13540] Call trace:
[51222.770815][T13540] mutex_lock+0x34/0x184
[51222.770828][T13540] seq_read_iter+0x4c/0x640
[51222.770841][T13540] seq_read+0xfc/0x134
[51222.770856][T13540] proc_reg_read+0x104/0x1fc
[51222.770871][T13540] vfs_read+0xf4/0x368
[51222.770884][T13540] ksys_read+0x7c/0xf0
[51222.770897][T13540] __arm64_sys_read+0x20/0x30
[51222.770911][T13540] el0_svc_common+0xd4/0x270
[51222.770926][T13540] el0_svc+0x28/0x98
[51222.770939][T13540] el0_sync_handler+0x8c/0xf0
[51222.770952][T13540] el0_sync+0x1b8/0x1c0
[51222.770968][T13540] Code: d503201f aa0803e0 aa1f03e1 aa0103e9 (c8e97d02)
[51222.770982][T13540] ---[ end trace a7da2251c6cbb391 ]---
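The ESR value in the log can be decoded by hand to confirm what kind of abort this is. The sketch below follows the ESR_ELx field layout from the Arm Architecture Reference Manual. The struct and function names (`esr_dabt`, `decode_esr_dabt`, `is_translation_fault`) are illustrative and are not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Data-abort fields of an AArch64 ESR_ELx value. */
struct esr_dabt {
    unsigned ec;   /* Exception Class, bits [31:26]; 0x25 = data abort, same EL */
    bool il;       /* Instruction Length, bit [25]; 1 = 32-bit instruction */
    unsigned iss;  /* Instruction Specific Syndrome, bits [24:0] */
    bool isv;      /* ISS[24]: instruction syndrome valid */
    bool wnr;      /* ISS[6]: 1 = write, 0 = read */
    unsigned dfsc; /* ISS[5:0]: Data Fault Status Code */
};

static struct esr_dabt decode_esr_dabt(uint32_t esr)
{
    struct esr_dabt d;
    d.ec = (esr >> 26) & 0x3fu;
    d.il = (esr >> 25) & 1u;
    d.iss = esr & 0x1ffffffu;
    d.isv = (d.iss >> 24) & 1u;
    d.wnr = (d.iss >> 6) & 1u;
    d.dfsc = d.iss & 0x3fu;
    return d;
}

/* DFSC 0b0001LL is a translation fault at lookup level LL (0..3). */
static bool is_translation_fault(unsigned dfsc, int *level)
{
    assert(level != NULL);
    if ((dfsc & 0x3cu) != 0x04u)
        return false;
    *level = (int)(dfsc & 0x3u);
    return true;
}
```

For ESR = 0x96000007 this gives EC 0x25 (data abort at the current EL), IL = 1, ISS 0x7, WnR = 0 and DFSC 0x07, which is a level 3 translation fault. That agrees with pte=0000000000000000 in the page-table walk line: nothing is mapped at virtual address 0x38.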
2.2 Restoring the crash context in Trace32
r.s pc mutex_lock+0x34
r.s lr seq_read_iter+0x4c
r.s sp 0xffffffc031873bb0
r.s x29 0xffffffc031873bc0
r.s x28 0xffffff883b6d5c80
r.s x27 0x0000000000000000
r.s x26 0x0000000000000000
r.s x25 0x0000000000000000
r.s x24 0xffffff884e36b478
r.s x23 0xffffffc031873c50
r.s x22 0x0000000000000400
r.s x21 0xffffffc031873c78
r.s x20 0x0000000000000000
r.s x19 0x0000000000000038
r.s x18 0xffffffc01b6ad050
r.s x17 0x0000000000000000
r.s x16 0x0000000000000000
r.s x15 0x0000000000000000
r.s x14 0x0000000000000008
r.s x13 0xffffffc031873ca8
r.s x12 0x0000000000000004
r.s x11 0xffffff883b6d5c80
r.s x10 0x0000000000000000
r.s x9 0x0000000000000000
r.s x8 0x0000000000000038
r.s x7 0x0000000000000000
r.s x6 0x0000000000000000
r.s x5 0xffffff805abde818
r.s x4 0x0000000000000000
r.s x3 0xffffffc031873de0
r.s x2 0xffffff883b6d5c80
r.s x1 0x0000000000000000
r.s x0 0x0000000000000038
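The r.s commands above are a line-by-line transcription of the register dump in dmesg. A hypothetical C helper can generate them from the log so they do not have to be typed by hand. `oops_line_to_rs` is not part of any existing tool. It handles the xN and sp lines and skips symbolic values such as those on the pc and lr lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn one register line of a dmesg oops, e.g.
 *   "[51222.770555][T13540] x29: ffffffc031873bc0 x28: ffffff883b6d5c80"
 * into Trace32 commands "r.s x29 0x...\nr.s x28 0x...\n" written to out.
 * Returns the number of registers converted (0..2), or -1 if out is too small.
 */
static int oops_line_to_rs(const char *line, char *out, size_t out_len)
{
    char name[2][8];
    char val[2][17];
    const char *bracket;
    const char *p;
    size_t used = 0;
    int n, regs, i;

    assert(line != NULL && out != NULL && out_len > 0);
    out[0] = '\0';

    /* Skip the "[timestamp][Tpid]" prefix, if present. */
    bracket = strrchr(line, ']');
    p = bracket ? bracket + 1 : line;

    /* Accepts both "x29: value" and "x9 : value" spellings. */
    n = sscanf(p, " %7[^: ] : %16s %7[^: ] : %16s",
               name[0], val[0], name[1], val[1]);
    regs = n >= 4 ? 2 : (n >= 2 ? 1 : 0);

    for (i = 0; i < regs; i++) {
        /* Stop at non-hex values such as "mutex_lock+0x34/0x184". */
        if (val[i][strspn(val[i], "0123456789abcdefABCDEF")] != '\0')
            break;
        int w = snprintf(out + used, out_len - used, "r.s %s 0x%s\n",
                         name[i], val[i]);
        if (w < 0 || (size_t)w >= out_len - used)
            return -1;
        used += (size_t)w;
    }
    return i;
}
```

Running each register line of dmesg_tz.txt through this helper and concatenating the output yields the command list. The pc and lr lines still have to be entered by hand as `r.s pc mutex_lock+0x34` and `r.s lr seq_read_iter+0x4c`.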
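After entering the registers, open the stack view. The instruction at the PC is acquiring the lock m->lock, where m is taken from iocb->ki_filp->private_data, and that value is NULL at this point. This is consistent with the fault address 0x38 (also held in x0 and x19): with m == NULL, &m->lock reduces to the offset of lock within struct seq_file.
Moving the PC back through the preceding instructions shows that this value is loaded from a struct file.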
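The reason the fault address is exactly 0x38 can be checked with a small user-space model. This is a sketch under stated assumptions: `seq_file_stub` and `mutex_stub` are hypothetical stand-ins that mirror only the fields preceding `lock` in `struct seq_file` for this 5.10 arm64 kernel, on an LP64 target where pointers, `size_t` and `loff_t` are 8 bytes. The real struct has more fields after `lock`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mutex; only its position matters here. */
struct mutex_stub {
    long owner;
};

/*
 * Hypothetical mirror of the fields that precede `lock` in struct seq_file
 * (include/linux/seq_file.h), assuming the 5.10 layout on an LP64 target.
 */
struct seq_file_stub {
    char *buf;
    size_t size;
    size_t from;
    size_t count;
    size_t pad_until;
    long long index;    /* loff_t */
    long long read_pos; /* loff_t */
    struct mutex_stub lock;
};

/* Seven 8-byte fields precede lock, so it sits at 7 * 8 = 0x38. */
static_assert(offsetof(struct seq_file_stub, lock) == 0x38,
              "lock is expected at offset 0x38 on LP64");

/*
 * Address that would be passed to mutex_lock() for &m->lock, computed
 * arithmetically so that no NULL pointer is dereferenced here.
 */
static uintptr_t lock_addr(uintptr_t m)
{
    return m + offsetof(struct seq_file_stub, lock);
}
```

`lock_addr(0)` evaluates to 0x38. This is the same value seen in x0, x8, x19 and the reported fault address, which is what a NULL `m` would produce.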
Inspect the struct file at that address:
v.v %s %t %o (struct file *)0xFFFFFF884E36B400
This shows that the file involved in the crash is hwinfo.
2.3 The /proc/hwinfo node
Searching the device turned up the /proc/hwinfo node. Running cat on it manually crashes the phone, and the resulting dump matches the one from the MTBF run.
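2.4 Code review
The hwinfo node is created by the fingerprint module. The code shows that the node exists only to print a single log line. Some fingerprint drivers have this code commented out, while others still keep it.
We also verified that every affected device has the /proc/hwinfo node, that running cat on it crashes each of them, and that the resulting call stacks match the one from the earlier MTBF crash.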
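3. Solution
This node was added for a fingerprint requirement from long ago. In 2022 it was confirmed that the requirement is no longer needed, so the node can be removed.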