
Background

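A device returned from after-sales service reboots frequently. Initial triage points to an abnormal kernel crash; because the fulldump dp had not been flashed, the device simply rebooted over and over.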

Analysis

After receiving the mainboard, we flashed apdp and captured a fulldump for analysis:

[   30.226854][  T580] Unable to handle kernel paging request at virtual address 0000ff0f05628f52
[   30.227303][  T580] CPU: 7 PID: 580 Comm: kworker/7:3H Tainted: G         C OE      6.1.118-android14-11-ga3b9c44908dd-ab13320413 #1
[   30.227306][  T580] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[   30.227308][  T580] Workqueue: kverityd verity_work
[   30.227318][  T580] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   30.227320][  T580] pc : z_erofs_decompress_queue+0x958/0xcb8
[   30.227325][  T580] lr : z_erofs_decompress_queue+0x694/0xcb8
[   30.227328][  T580] sp : ffffffc00f3cb870
[   30.227328][  T580] x29: ffffffc00f3cba60 x28: ffffff803acee800 x27: ffffffc00f3cba30
[   30.227331][  T580] x26: 0000000000000000 x25: ffffffc00f3cba30 x24: ffffff80a4792bc0
[   30.227334][  T580] x23: ffffff80a4792bd0 x22: 0000000000000000 x21: ffffffc00f3cba30
[   30.227337][  T580] x20: 0000000000fe09fc x19: 00000000000009c4 x18: ffffffc00bfcb060
[   30.227339][  T580] x17: ffffffc01cd0f02c x16: ffffffc01cd0fffb x15: ffffffc01cd0fff8
[   30.227341][  T580] x14: ffffffc01cd0f027 x13: 0000000000000000 x12: 0000000000fe09fc
[   30.227344][  T580] x11: 0000000000001002 x10: 00000000ff01f604 x9 : ff00000000000000
[   30.227346][  T580] x8 : ff00ff0f05628f52 x7 : 1f0001f9e420717f x6 : 000000000000266f
[   30.227349][  T580] x5 : ffffffc01cd0f033 x4 : ffffffc01cd10000 x3 : ffffffc01cd0f02e
[   30.227351][  T580] x2 : 0000000000000001 x1 : 0000000000000000 x0 : ff00ff0f05628f52
[   30.227354][  T580] Call trace:
[   30.227355][  T580]  z_erofs_decompress_queue+0x958/0xcb8
[   30.227358][  T580]  z_erofs_decompressqueue_work+0x34/0x90
[   30.227360][  T580]  z_erofs_decompress_kickoff+0x120/0x170
[   30.227362][  T580]  z_erofs_submissionqueue_endio+0x13c/0x160
[   30.227365][  T580]  bio_endio+0x1a0/0x1c4
[   30.227367][  T580]  __dm_io_complete+0x224/0x274
[   30.227371][  T580]  clone_endio+0xe0/0x228
[   30.227373][  T580]  bio_endio+0x1a0/0x1c4
[   30.227374][  T580]  verity_work+0x658/0x6a4
[   30.227375][  T580]  process_one_work+0x1e4/0x43c
[   30.227379][  T580]  worker_thread+0x25c/0x430
[   30.227381][  T580]  kthread+0x104/0x1d4
[   30.227383][  T580]  ret_from_fork+0x10/0x20
[   30.227387][  T580] Code: d5033bbf 6b01001f 54fffdc1 f9400300 (f940001f) 
[   30.227392][  T580] ---[ end trace 0000000000000000 ]---
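
The word in parentheses on the `Code:` line is the instruction at `pc`. As a reading aid, here is a minimal Python sketch that decodes it by hand. It only understands the 64-bit `LDR Xt, [Xn, #imm]` unsigned-offset encoding, and the helper names are illustrative, not part of any kernel tool.

```python
def decode_ldr_x_uimm(word):
    """Decode an AArch64 64-bit 'LDR Xt, [Xn, #imm]' (unsigned offset) word.

    Returns (rt, rn, byte_offset), or None if the word is another encoding.
    """
    # Bits [31:22] are fixed at 0b1111100101 for this encoding.
    if (word >> 22) != 0b1111100101:
        return None
    imm12 = (word >> 10) & 0xFFF   # offset, scaled by 8 for 64-bit loads
    rn = (word >> 5) & 0x1F        # base register
    rt = word & 0x1F               # destination register
    return rt, rn, imm12 * 8


def disasm(word):
    """Render the decoded load as assembly text, or None if unsupported."""
    decoded = decode_ldr_x_uimm(word)
    if decoded is None:
        return None
    rt, rn, offset = decoded
    target = "xzr" if rt == 31 else f"x{rt}"
    base = "sp" if rn == 31 else f"x{rn}"
    address = f"[{base}]" if offset == 0 else f"[{base}, #{offset}]"
    return f"ldr {target}, {address}"


# The last two words of the Code: line in the log above.
print(disasm(0xF9400300))  # ldr x0, [x24]
print(disasm(0xF940001F))  # ldr xzr, [x0]

# Values copied from the register dump and the first line of the oops.
X0 = 0xFF00FF0F05628F52
FAULT_ADDRESS = 0x0000FF0F05628F52
print((X0 & ((1 << 56) - 1)) == FAULT_ADDRESS)  # True
```

Read this way, the crash is a load through `x0`, whose value had just been loaded from the memory that `x24` points to. The reported fault address is `x0` with its top byte cleared. The value in `x0` does not look like the `ffffff...` kernel addresses seen in the other registers.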

The logs were captured several times, and every capture shows the same call trace.
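
Comparing traces by eye is error-prone because timestamps differ on every boot. The sketch below shows one possible way to automate the comparison in Python: it keeps only the `symbol+offset/size` part of each frame. The function names are hypothetical, and the regular expression assumes the log layout shown above.

```python
import re

# A frame line looks like:
# "[   30.227358][  T580]  z_erofs_decompressqueue_work+0x34/0x90"
FRAME_RE = re.compile(r"\]\s+(\S+\+0x[0-9a-f]+/0x[0-9a-f]+)\s*$")


def extract_call_trace(log_text):
    """Return the frames of the first 'Call trace:' block as a tuple."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if not in_trace:
            in_trace = "Call trace:" in line
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line ends the block
        frames.append(match.group(1))
    return tuple(frames)


def traces_identical(logs):
    """True if every log yields the same non-empty call trace."""
    traces = {extract_call_trace(text) for text in logs}
    return len(traces) == 1 and () not in traces
```

Passing the text of each captured log to `traces_identical` returns `True` only when every frame, including its offset, matches across all captures.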

The failure occurs on dm-10, which is the product partition.

Reading back the super partition

After reading back the super partition, we unpacked product from it with lpunpack and checked the image with sha256sum.

product unpacked from the super image in the flashing package
 ubuntu@sh-liuqiN:~/test1/images$ sha256sum product_a.img
 f839586ebd9ba23ecfaf728134507e49bc7e95632c4fb4715a10f047334fc9ed product_a.img
  
 product unpacked from the super image read back from the faulty device
 ubuntu@sh-liuqiN:~/test1/images$ cd ../readback/
 ubuntu@sh-liuqiN:~/test1/readback$ sha256sum product_a.img
 f839586ebd9ba23ecfaf728134507e49bc7e95632c4fb4715a10f047334fc9ed product_a.img

Conclusion: the two hashes match, so the product partition is not corrupted.
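
The same check can be scripted, which helps when several partitions need comparing. This is a minimal Python sketch, not the procedure used above; the function names and the 4096-byte block size are assumptions. If the hashes ever differ, `first_diff_block` reports where the two images start to diverge.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large images need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_diff_block(path_a, path_b, block_size=4096):
    """Return the index of the first differing block, or None if identical.

    A size mismatch counts as a difference at the block where one file ends.
    """
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        index = 0
        while True:
            block_a = file_a.read(block_size)
            block_b = file_b.read(block_size)
            if block_a != block_b:
                return index
            if not block_a:  # both files ended at the same point
                return None
            index += 1
```

For example, `sha256_of("images/product_a.img") == sha256_of("readback/product_a.img")` reproduces the comparison done with `sha256sum`.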

Flashing super only

On the faulty device, flash only the super image from the current software version and check whether the device boots.

Result: the device still fails to boot, and the call trace is the same as before.

Flashing the full software package

On the faulty device, flash the full software package via fastboot and check whether the device boots.

Result: the device still fails to boot, and the call trace is the same as before.

Conclusion

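  1. Every crash on the faulty device dies at the same instruction in the call trace: f940001f

  2. After flashing only super on the faulty device, the problem still reproduces

  3. After flashing the full software package on the faulty device, the problem still reproduces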

We can be fairly confident that the issue is storage corruption.

Next steps:

  1. Have the hardware team cross-verify the UFS

  2. Run some UFS checks
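
One simple check that could fit under step 2 is a repeated-read test: read the same region several times and see whether the data ever changes. The Python sketch below is only an illustration under assumptions, not the check the hardware team will actually run. The function names are hypothetical, and `path` can be any readable file or block device node.

```python
import hashlib


def read_region_digest(path, offset, length, chunk_size=1 << 20):
    """Hash `length` bytes starting at `offset` (stops early at end of file)."""
    digest = hashlib.sha256()
    with open(path, "rb", buffering=0) as source:
        source.seek(offset)
        remaining = length
        while remaining > 0:
            data = source.read(min(chunk_size, remaining))
            if not data:
                break
            digest.update(data)
            remaining -= len(data)
    return digest.hexdigest()


def distinct_read_digests(path, offset, length, rounds=5):
    """Read the same region `rounds` times and return the set of digests seen.

    More than one digest means the reads were not stable.
    Caveat: the OS page cache may serve repeated reads from RAM, so caches
    need to be dropped between rounds for the reads to reach the storage.
    """
    return {read_region_digest(path, offset, length) for _ in range(rounds)}
```

A result with a single digest only says that the reads were consistent during this run; it does not prove the storage is healthy.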