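Background
A device returned from after-sales service reboots frequently. Initial triage points to an abnormal kernel crash. Because the fulldump dp (debug policy) had not been flashed, the crash shows up as a reboot loop.
Analysis
With the mainboard in hand, flash apdp and capture a fulldump for analysis: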
[ 30.226854][ T580] Unable to handle kernel paging request at virtual address 0000ff0f05628f52
[ 30.227303][ T580] CPU: 7 PID: 580 Comm: kworker/7:3H Tainted: G C OE 6.1.118-android14-11-ga3b9c44908dd-ab13320413 #1
[ 30.227306][ T580] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[ 30.227308][ T580] Workqueue: kverityd verity_work
[ 30.227318][ T580] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 30.227320][ T580] pc : z_erofs_decompress_queue+0x958/0xcb8
[ 30.227325][ T580] lr : z_erofs_decompress_queue+0x694/0xcb8
[ 30.227328][ T580] sp : ffffffc00f3cb870
[ 30.227328][ T580] x29: ffffffc00f3cba60 x28: ffffff803acee800 x27: ffffffc00f3cba30
[ 30.227331][ T580] x26: 0000000000000000 x25: ffffffc00f3cba30 x24: ffffff80a4792bc0
[ 30.227334][ T580] x23: ffffff80a4792bd0 x22: 0000000000000000 x21: ffffffc00f3cba30
[ 30.227337][ T580] x20: 0000000000fe09fc x19: 00000000000009c4 x18: ffffffc00bfcb060
[ 30.227339][ T580] x17: ffffffc01cd0f02c x16: ffffffc01cd0fffb x15: ffffffc01cd0fff8
[ 30.227341][ T580] x14: ffffffc01cd0f027 x13: 0000000000000000 x12: 0000000000fe09fc
[ 30.227344][ T580] x11: 0000000000001002 x10: 00000000ff01f604 x9 : ff00000000000000
[ 30.227346][ T580] x8 : ff00ff0f05628f52 x7 : 1f0001f9e420717f x6 : 000000000000266f
[ 30.227349][ T580] x5 : ffffffc01cd0f033 x4 : ffffffc01cd10000 x3 : ffffffc01cd0f02e
[ 30.227351][ T580] x2 : 0000000000000001 x1 : 0000000000000000 x0 : ff00ff0f05628f52
[ 30.227354][ T580] Call trace:
[ 30.227355][ T580] z_erofs_decompress_queue+0x958/0xcb8
[ 30.227358][ T580] z_erofs_decompressqueue_work+0x34/0x90
[ 30.227360][ T580] z_erofs_decompress_kickoff+0x120/0x170
[ 30.227362][ T580] z_erofs_submissionqueue_endio+0x13c/0x160
[ 30.227365][ T580] bio_endio+0x1a0/0x1c4
[ 30.227367][ T580] __dm_io_complete+0x224/0x274
[ 30.227371][ T580] clone_endio+0xe0/0x228
[ 30.227373][ T580] bio_endio+0x1a0/0x1c4
[ 30.227374][ T580] verity_work+0x658/0x6a4
[ 30.227375][ T580] process_one_work+0x1e4/0x43c
[ 30.227379][ T580] worker_thread+0x25c/0x430
[ 30.227381][ T580] kthread+0x104/0x1d4
[ 30.227383][ T580] ret_from_fork+0x10/0x20
[ 30.227387][ T580] Code: d5033bbf 6b01001f 54fffdc1 f9400300 (f940001f)
[ 30.227392][ T580] ---[ end trace 0000000000000000 ]---
The logs were captured several times, and every capture shows the same call trace.
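The faulting instruction can be read directly from the `Code:` line of the log: the word in parentheses is the instruction at `pc`. The following is a minimal sketch that decodes the last two words. The helper names `faulting_word` and `decode_ldr_x_uimm` are made up for this illustration, and the decoder only handles the single encoding that appears here (64-bit LDR with an unsigned immediate offset); it is not a general disassembler.

```python
import re

# Copied from the kernel log above.
CODE_LINE = "Code: d5033bbf 6b01001f 54fffdc1 f9400300 (f940001f)"


def faulting_word(code_line):
    """Return the instruction word in parentheses (the one at pc), or None."""
    match = re.search(r"\(([0-9a-f]{8})\)", code_line)
    return int(match.group(1), 16) if match else None


def decode_ldr_x_uimm(word):
    """Decode a 64-bit LDR (immediate, unsigned offset); None for anything else."""
    # Bits 31:22 of this encoding are fixed to 0b1111100101.
    if (word & 0xFFC00000) != 0xF9400000:
        return None
    imm12 = (word >> 10) & 0xFFF   # offset, scaled by 8 bytes
    rn = (word >> 5) & 0x1F        # base register (31 means sp)
    rt = word & 0x1F               # destination register (31 means xzr)
    base = "sp" if rn == 31 else f"x{rn}"
    target = "xzr" if rt == 31 else f"x{rt}"
    offset = imm12 * 8
    if offset == 0:
        return f"ldr {target}, [{base}]"
    return f"ldr {target}, [{base}, #{offset}]"


print(decode_ldr_x_uimm(0xF9400300))                # ldr x0, [x24]
print(decode_ldr_x_uimm(faulting_word(CODE_LINE)))  # ldr xzr, [x0]
```

Read this way, x0 is loaded from [x24] and then dereferenced. In the register dump x0 is ff00ff0f05628f52, which matches the faulting address 0000ff0f05628f52 except for the top byte and does not look like a valid kernel pointer, so the value read from memory appears to be bad rather than the code path itself.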
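The failure is on dm-10, which is the product partition.
Reading back the super partition
After reading back the super partition, unpack product from it with lpunpack and verify it with sha256sum.
product unpacked from the super image in the flashing package: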
ubuntu@sh-liuqiN:~/test1/images$ sha256sum product_a.img
f839586ebd9ba23ecfaf728134507e49bc7e95632c4fb4715a10f047334fc9ed product_a.img
product unpacked from the super image read back from the faulty device:
ubuntu@sh-liuqiN:~/test1/images$ cd ../readback/
ubuntu@sh-liuqiN:~/test1/readback$ sha256sum product_a.img
f839586ebd9ba23ecfaf728134507e49bc7e95632c4fb4715a10f047334fc9ed product_a.img
Conclusion: the product partition is not corrupted.
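The comparison above can also be scripted, which helps when several partitions need checking. This is a sketch only: `sha256_of` and `first_difference` are hypothetical helper names, and the paths in the trailing comment are inferred from the shell prompts above. If the hashes ever differ, `first_difference` reports the byte offset where the two images diverge.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended at the same point
            offset += len(a)


# Example call (paths assumed from the shell session above):
#   sha256_of("images/product_a.img") == sha256_of("readback/product_a.img")
```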
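Flashing super only
On the faulty device, flash only the super image from the current software version and check whether the device boots.
Conclusion: still failing; the call trace is identical to the earlier ones.
Flashing the full software package
On the faulty device, flash the full software package with fastboot and check whether the device boots.
Conclusion: still failing; the call trace is identical to the earlier ones.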
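Each of the checks above depends on deciding whether a new crash is "the same" as the earlier ones. A rough way to make that comparison repeatable is to reduce every captured log to a small signature (the `pc` value and the faulting instruction word) and compare signatures. The function names below are invented for this sketch, and the regular expressions assume the log layout shown earlier in this post.

```python
import re

PC_RE = re.compile(r"\bpc : (\S+)")
CODE_RE = re.compile(r"Code: .*\(([0-9a-f]{8})\)")


def crash_signature(log_text):
    """Return (pc symbol+offset, faulting instruction word) from one kernel log."""
    pc = PC_RE.search(log_text)
    code = CODE_RE.search(log_text)
    return (pc.group(1) if pc else None, code.group(1) if code else None)


def same_crash_site(logs):
    """True only if every log yields one identical, fully parsed signature."""
    signatures = {crash_signature(text) for text in logs}
    return len(signatures) == 1 and None not in next(iter(signatures))


sample = (
    "[ 30.227320][ T580] pc : z_erofs_decompress_queue+0x958/0xcb8\n"
    "[ 30.227387][ T580] Code: d5033bbf 6b01001f 54fffdc1 f9400300 (f940001f)\n"
)
print(crash_signature(sample))
# ('z_erofs_decompress_queue+0x958/0xcb8', 'f940001f')
```

Comparing only `pc` and the instruction word is a deliberate simplification; comparing the full call trace as well would be stricter.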
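Conclusion
Every crash on the faulty device dies at the same instruction: f940001f, at z_erofs_decompress_queue+0x958
Flashing only super on the faulty device still reproduces the issue
Flashing the full software package on the faulty device still reproduces the issue
This essentially confirms that the current issue is storage corruption
Next steps:
Have the hardware team cross-validate the UFS
Run further UFS diagnostics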