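Background
After a baseline upgrade (which merged in code from the Qualcomm baseline), rebooting the device drops it into crash mode.
The software built before the baseline upgrade does not have this problem.
Analysis
dmesg log analysis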
[ 397.719194][ T1] mtdoops: mtdoops_do_dump start , count = 279 , page = 6, reason = 5, dump_count = 1
[ 397.733201][ T1] mtdoops: mtdoops_do_dump pmsg paddr = 0x000000000677a9c7
[ 397.733228][ T1] Unable to handle kernel paging request at virtual address ffffff8000e00000
[ 397.733234][ T1] Mem abort info:
[ 397.733240][ T1] ESR = 0x0000000096000007
[ 397.733246][ T1] EC = 0x25: DABT (current EL), IL = 32 bits
[ 397.733253][ T1] SET = 0, FnV = 0
[ 397.733260][ T1] EA = 0, S1PTW = 0
[ 397.733265][ T1] FSC = 0x07: level 3 translation fault
[ 397.733271][ T1] Data abort info:
[ 397.733276][ T1] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 397.733283][ T1] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 397.733289][ T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 397.733295][ T1] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000a9c9f000
[ 397.733302][ T1] [ffffff8000e00000] pgd=180000097ff78003, p4d=180000097ff78003, pud=180000097ff78003, pmd=180000097ff73003, pte=0000000000000000
[ 397.733323][ T1] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[ 397.734277][ T1] Hardware name: Qualcomm Technologies, Inc. Kunzite QRD (DT)
[ 397.734283][ T1] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 397.734291][ T1] pc : mtdoops_do_dump+0x1a0/0x2cc [mtdoops]
[ 397.734323][ T1] lr : mtdoops_do_dump+0x1a0/0x2cc [mtdoops]
[ 397.734351][ T1] sp : ffffffc0823abba0
[ 397.734358][ T1] x29: ffffffc0823abbc0 x28: ffffff883b9c8000 x27: ffffff879a608c20
[ 397.734370][ T1] x26: ffffffc082262000 x25: ffffff883b9c8000 x24: ffffffc0820bae18
[ 397.734382][ T1] x23: ffffffc07c898000 x22: 0000000000000001 x21: ffffffc0823abcd8
[ 397.734393][ T1] x20: ffffffc07c8981a8 x19: ffffff8000e00000 x18: ffffffc082389058
[ 397.734404][ T1] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000004
[ 397.734415][ T1] x14: ffffff88f5300000 x13: 0000000000000003 x12: 0000000000000003
[ 397.734426][ T1] x11: 00000000fffeffff x10: c0000000fffeffff x9 : c92068df22ab7800
[ 397.734438][ T1] x8 : c92068df22ab7800 x7 : 205b5d3130323333 x6 : 372e37393320205b
[ 397.734449][ T1] x5 : ffffffc0822e579f x4 : ffffffc081580cfc x3 : 0000000000000000
[ 397.734460][ T1] x2 : 0000000000000000 x1 : ffffffc0823ab950 x0 : 0000000000000039
[ 397.734471][ T1] Call trace:
[ 397.734478][ T1] mtdoops_do_dump+0x1a0/0x2cc [mtdoops 0e68315fd17942ad7985f5125ce422a1f47288bf]
[ 397.734517][ T1] mtdoops_reboot_nb_handle+0x2c/0x40 [mtdoops 0e68315fd17942ad7985f5125ce422a1f47288bf]
[ 397.734534][ T1] notifier_call_chain+0x90/0x174
[ 397.734549][ T1] blocking_notifier_call_chain+0x48/0x78
[ 397.734559][ T1] kernel_restart+0x28/0x114
[ 397.734570][ T1] __arm64_sys_reboot+0x1b0/0x288
[ 397.734580][ T1] invoke_syscall+0x58/0x114
[ 397.734593][ T1] el0_svc_common+0xac/0xe0
[ 397.734603][ T1] do_el0_svc+0x1c/0x28
[ 397.734613][ T1] el0_svc+0x38/0x68
[ 397.734625][ T1] el0t_64_sync_handler+0x68/0xbc
[ 397.734635][ T1] el0t_64_sync+0x1a8/0x1ac
[ 397.734645][ T1] Code: cb080128 b2596113 aa1303e1 951e72aa (b9400261)
[ 397.734652][ T1] ---[ end trace 0000000000000000 ]---
What is certain is that the code died at mtdoops_do_dump+0x1a0, on a read (WnR = 0) that hit a level 3 translation fault.
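The "Mem abort info" lines in the log can be reproduced from the raw ESR value 0x96000007. The following is a minimal user-space sketch of that decoding, not kernel code; the struct and function names are made up for this example, and only the fields printed in the log are handled.

```c
#include <stdint.h>

/* Subset of the AArch64 ESR_ELx fields that matter for a data abort. */
struct esr_fields {
    unsigned ec;   /* exception class, bits [31:26] */
    unsigned il;   /* bit 25: 1 = trapped instruction is 32 bits wide */
    unsigned wnr;  /* bit 6: 1 = write access, 0 = read access */
    unsigned dfsc; /* data fault status code, bits [5:0] */
};

/* Split a raw ESR value into the fields above (hypothetical helper). */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.wnr  = (esr >> 6) & 0x1;
    f.dfsc = esr & 0x3f;
    return f;
}

/*
 * DFSC values 0b000100..0b000111 are translation faults at levels 0..3.
 * Returns the level, or -1 if the code is not one of those four values.
 */
static int esr_translation_fault_level(uint32_t esr)
{
    unsigned dfsc = esr & 0x3f;

    return ((dfsc & 0x3c) == 0x04) ? (int)(dfsc & 0x3) : -1;
}
```

For 0x96000007 this gives EC = 0x25 (data abort from the current EL), WnR = 0 (a read) and a translation fault at level 3, matching the lines the kernel printed.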
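Restoring the crash context with TRACE32

The PC points at the instruction encoded as b9400261 (the bracketed word in the Code: line of the oops), which is ldr w1, [x19].
x19 holds p_hdr, and its value ffffff8000e00000 is exactly the faulting virtual address.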
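The encoding b9400261 can be checked by hand against the A64 "LDR (immediate, unsigned offset)" layout: size in bits [31:30], imm12 in bits [21:10], Rn in bits [9:5], Rt in bits [4:0]. The sketch below is an illustrative decoder for just that one instruction form; the names are hypothetical and it is not a general disassembler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded operands of an integer LDR with an unsigned immediate offset. */
struct ldr_imm {
    unsigned size;   /* access size in bytes: 4 (Wt) or 8 (Xt) */
    unsigned rt;     /* destination register number */
    unsigned rn;     /* base register number */
    unsigned offset; /* byte offset added to the base register */
};

/*
 * Decode "LDR Wt/Xt, [Xn, #imm]" (unsigned offset form).
 * Returns false for any other instruction, including LDRB/LDRH.
 */
static bool decode_ldr_uimm(uint32_t insn, struct ldr_imm *out)
{
    unsigned size_bits = insn >> 30;

    /* Bits [29:22] must be 0b11100101 for this load form. */
    if ((insn & 0x3fc00000u) != 0x39400000u)
        return false;
    /* Only the 32-bit and 64-bit variants are called plain LDR. */
    if (size_bits < 2)
        return false;

    out->size   = 1u << size_bits;
    out->rt     = insn & 0x1f;
    out->rn     = (insn >> 5) & 0x1f;
    /* imm12 is scaled by the access size. */
    out->offset = ((insn >> 10) & 0xfff) << size_bits;
    return true;
}
```

Feeding it 0xb9400261 yields a 4-byte load into register 1 from base register 19 with offset 0, i.e. ldr w1, [x19], so the faulting access is the first read through p_hdr.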

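dmesg shows that the PTE for this address is empty:
[ 397.733302][ T1] [ffffff8000e00000] pgd=180000097ff78003, p4d=180000097ff78003, pud=180000097ff78003, pmd=180000097ff73003, pte=0000000000000000
In the ARM64 page table scheme:
A PTE of 0 means the page table entry is empty
No virtual-to-physical mapping has been established for that page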
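To see which table entries the walk consulted, the virtual address can be split by hand. The sketch below assumes the configuration printed in the oops (4k pages, 39-bit VAs, hence three lookup levels of 9 bits each); the macro and function names are illustrative only.

```c
#include <stdint.h>

#define PAGE_SHIFT     12 /* 4 KiB pages */
#define BITS_PER_LEVEL 9  /* 512 entries per table */

/*
 * Index into the translation table at the given level (1, 2 or 3)
 * for a 39-bit VA with 4 KiB pages:
 *   level 1 uses VA bits [38:30], level 2 bits [29:21], level 3 bits [20:12].
 */
static unsigned pt_index(uint64_t va, int level)
{
    unsigned shift = PAGE_SHIFT + BITS_PER_LEVEL * (3 - level);

    return (unsigned)((va >> shift) & 0x1ff);
}

/* Byte offset of the address inside its 4 KiB page. */
static unsigned page_offset(uint64_t va)
{
    return (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
}
```

For ffffff8000e00000 the indices are 0, 7 and 0 with page offset 0: the level 1 and level 2 entries are valid (the pgd and pmd values in the log are non-zero), and the walk stops at entry 0 of the level 3 table, which is the empty PTE.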
Querying the address in TRACE32 shows the same thing:

Tracing the code
We need to find out where p_hdr comes from and check whether that memory was ever mapped.
static void mtdoops_do_dump(struct kmsg_dumper *dumper,
                            enum mtd_dump_reason reason)
{
    //...
    pmsg_buffer_start = phys_to_virt(
        (cxt->pmsg_data.mem_address + cxt->pmsg_data.mem_size) -
        cxt->pmsg_data.pmsg_size);
    p_hdr = (struct pmsg_buffer_hdr *)pmsg_buffer_start;
    pr_err("mtdoops_do_dump pmsg paddr = 0x%p \n",
           pmsg_buffer_start);
    //...
From the code, p_hdr is a struct pmsg_buffer_hdr pointer that points at pmsg_buffer_start.
pmsg_buffer_start is in turn the physical address mem_address + mem_size - pmsg_size, converted to a virtual address with phys_to_virt.
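The arithmetic can be replayed with the ramoops values from the device tree node quoted later in this write-up (base 0x80D00000, size 0x200000, pmsg-size 0x100000). The sketch below is a user-space model of the arm64 linear-map formula, not the kernel macro itself. Two things are assumptions: that the driver's pmsg_size comes from the pmsg-size property, and that the start of RAM (PHYS_OFFSET) is 0x80000000, a value chosen because it reproduces the faulting address in the log.

```c
#include <stdint.h>

#define VA_BITS     39
/* Start of the linear map for 39-bit VAs: 0xffffff8000000000. */
#define PAGE_OFFSET (0xffffffffffffffffULL << VA_BITS)

/* Physical address of the pmsg header, as computed in mtdoops_do_dump(). */
static uint64_t pmsg_hdr_phys(uint64_t mem_address, uint64_t mem_size,
                              uint64_t pmsg_size)
{
    return mem_address + mem_size - pmsg_size;
}

/*
 * Model of the linear-map conversion: pure arithmetic, no page-table
 * lookup. phys_offset is the assumed start of RAM, passed in explicitly.
 */
static uint64_t model_phys_to_virt(uint64_t pa, uint64_t phys_offset)
{
    return (pa - phys_offset) | PAGE_OFFSET;
}
```

With these inputs the header sits at physical 0x80E00000, which converts to 0xffffff8000e00000, the address in x19. The conversion never consults the page tables, so it succeeds whether or not the page is mapped. The value 0x000000000677a9c7 printed by the driver does not match either address, most likely because %p prints a hashed pointer.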
static int mtdoops_pmsg_probe(struct platform_device *pdev)
{
    struct mtdoops_context *cxt = &oops_cxt;
    struct resource *res;
    u32 value;
    int ret;

    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    if (!res) {
        pr_err("failed to locate DT /reserved-memory resource\n");
        return -EINVAL;
    }
    cxt->pmsg_data.mem_size = resource_size(res);
    cxt->pmsg_data.mem_address = res->start;
    pr_err( "pares mtd_dt, mem_address =0x%llx, mem_size =0x%lx \n",
        cxt->pmsg_data.mem_address, cxt->pmsg_data.mem_size);
    pr_err( "pares mtd_dt, pmsg_size =0x%lx, console-size =0x%lx \n",
        cxt->pmsg_data.pmsg_size, cxt->pmsg_data.console_size);
    //...
The two lines that matter are:
cxt->pmsg_data.mem_address = res->start;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
In principle there is nothing wrong here: once the driver binds, it fetches the memory resource from the device tree and later converts it to a virtual address with phys_to_virt.
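There is also one key piece of information:
The code worked before the baseline upgrade
The mtdoops.c source was not touched by the baseline upgrade
So at this stage the only suspect is the device-tree memory region, because phys_to_virt is only valid for memory that lies in the kernel's linear mapping.
The memory mtdoops uses is reserved memory, namely the ramoops node, and this range is supposed to be in the linear region: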
&reserved_memory {
    //...
    ramoops_mem: ramoops@80D00000 {
        compatible = "ramoops";
        reg = <0x0 0x80D00000 0x0 0x200000>;
        record-size = <0x40000>;
        pmsg-size = <0x100000>;
        console-size = <0x80000>;
    };
    //...
};
Keep going through the dmesg log, since the kernel prints every reserved_mem region at boot:
[ 0.000000][ T0] OF: reserved mem: OVERLAP DETECTED!
[ 0.000000][ T0] pvm_fw_region@80c01000 (0x0000000080c01000--0x0000000080e01000) overlaps with ramoops@80D00000 (0x0000000080d00000--0x0000000080f00000)
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000ff800000, size 4 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node vm_comm_mem_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000ff800000..0x00000000ffbfffff (4096 KiB) map reusable vm_comm_mem_region
[ 0.000000][ T0] OF: reserved mem: 0x00000000fec00000..0x00000000ff7fffff (12288 KiB) map non-reusable mem_dump_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000fcc00000, size 32 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000fcc00000..0x00000000febfffff (32768 KiB) map reusable linux,cma
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000fc000000, size 12 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node adsp_heap_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000fc000000..0x00000000fcbfffff (12288 KiB) map reusable adsp_heap_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000fa400000, size 28 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node audio_cma_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000fa400000..0x00000000fbffffff (28672 KiB) map reusable audio_cma_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000fa000000, size 4 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node sdsp_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000fa000000..0x00000000fa3fffff (4096 KiB) map reusable sdsp_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000f7800000, size 40 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node secure_cdsp_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000f7800000..0x00000000f9ffffff (40960 KiB) map reusable secure_cdsp_region
[ 0.000000][ T0] OF: reserved mem: 0x000000097ffff000..0x000000097fffffff (4 KiB) nomap non-reusable debug_kinfo_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x000000097ec00000, size 16 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node va_md_mem_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x000000097ec00000..0x000000097fbfffff (16384 KiB) map reusable va_md_mem_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000f1c00000, size 92 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node non_secure_display_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000f1c00000..0x00000000f77fffff (94208 KiB) map reusable non_secure_display_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000f0800000, size 20 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node qseecom_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000f0800000..0x00000000f1bfffff (20480 KiB) map reusable qseecom_region
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000e7800000, size 16 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node qseecom_ta_region, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000e7800000..0x00000000e87fffff (16384 KiB) map reusable qseecom_ta_region
[ 0.000000][ T0] OF: reserved mem: 0x0000000080600000..0x000000008063ffff (256 KiB) nomap non-reusable xbl_dtlog_region@80600000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080640000..0x00000000807fffff (1792 KiB) nomap non-reusable xbl_ramdump_region@80640000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080800000..0x000000008085ffff (384 KiB) nomap non-reusable aop_image_region@80800000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080860000..0x000000008087ffff (128 KiB) nomap non-reusable aop_cmd_db_region@80860000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080880000..0x000000008089ffff (128 KiB) nomap non-reusable aop_config_region@80880000
[ 0.000000][ T0] OF: reserved mem: 0x00000000808a0000..0x00000000808dffff (256 KiB) nomap non-reusable tme_crash_dump_region@808a0000
[ 0.000000][ T0] OF: reserved mem: 0x00000000808e0000..0x00000000808e3fff (16 KiB) nomap non-reusable tme_log_region@808e0000
[ 0.000000][ T0] OF: reserved mem: 0x00000000808e4000..0x00000000808f3fff (64 KiB) nomap non-reusable uefi_log_region@808e4000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080900000..0x0000000080afffff (2048 KiB) nomap non-reusable smem_region@80900000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080b00000..0x0000000080bfffff (1024 KiB) nomap non-reusable cpucp_fw_region@80b00000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080c00000..0x0000000080c00fff (4 KiB) nomap non-reusable chipinfo_region@80c00000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080c01000..0x0000000080e00fff (2048 KiB) nomap non-reusable pvm_fw_region@80c01000
[ 0.000000][ T0] OF: reserved mem: 0x0000000080d00000..0x0000000080efffff (2048 KiB) map non-reusable ramoops@80D00000
[ 0.000000][ T0] OF: reserved mem: 0x0000000082a00000..0x0000000082cfffff (3072 KiB) nomap non-reusable wlan_fw_region@82a00000
[ 0.000000][ T0] OF: reserved mem: 0x0000000083600000..0x00000000843fffff (14336 KiB) nomap non-reusable hyp_region@0x83600000
[ 0.000000][ T0] OF: reserved mem: 0x0000000084b00000..0x00000000852fffff (8192 KiB) nomap non-reusable camera_region@84b00000
[ 0.000000][ T0] OF: reserved mem: 0x0000000085300000..0x0000000086bfffff (25600 KiB) nomap non-reusable wpss_region@85300000
[ 0.000000][ T0] OF: reserved mem: 0x0000000086c00000..0x00000000893fffff (40960 KiB) nomap non-reusable adsp_region@86c00000
[ 0.000000][ T0] OF: reserved mem: 0x0000000089400000..0x0000000089dfffff (10240 KiB) nomap non-reusable cdsp_region@89400000
[ 0.000000][ T0] OF: reserved mem: 0x0000000089e00000..0x0000000089e0ffff (64 KiB) nomap non-reusable ipa_fw_region@89e00000
[ 0.000000][ T0] OF: reserved mem: 0x0000000089e10000..0x0000000089e19fff (40 KiB) nomap non-reusable ipa_gsi_region@89e10000
[ 0.000000][ T0] OF: reserved mem: 0x0000000089e1a000..0x0000000089e1bfff (8 KiB) nomap non-reusable gpu_microcode_region@89e1a000
[ 0.000000][ T0] OF: reserved mem: 0x000000008bc00000..0x000000009a1fffff (235520 KiB) nomap non-reusable mpss_region@8bc00000
[ 0.000000][ T0] OF: reserved mem: 0x000000009a200000..0x000000009a8fffff (7168 KiB) nomap non-reusable video_region@9a200000
[ 0.000000][ T0] OF: reserved mem: 0x00000000a6e00000..0x00000000a6e3ffff (256 KiB) nomap non-reusable xbl_sc_region@a6e00000
[ 0.000000][ T0] OF: reserved mem: 0x00000000a6f00000..0x00000000a6ffffff (1024 KiB) nomap non-reusable global_sync_region@a6f00000
[ 0.000000][ T0] OF: reserved mem: 0x00000000b8000000..0x00000000baafffff (44032 KiB) map non-reusable splash_region
[ 0.000000][ T0] OF: reserved mem: 0x00000000e0600000..0x00000000e09fffff (4096 KiB) nomap non-reusable cpusys_vm_region@e0600000
[ 0.000000][ T0] Reserved memory: created CMA memory pool at 0x00000000e0c00000, size 76 MiB
[ 0.000000][ T0] OF: reserved mem: initialized node trust_ui_vm_region@e0c00000, compatible id shared-dma-pool
[ 0.000000][ T0] OF: reserved mem: 0x00000000e0c00000..0x00000000e57fffff (77824 KiB) map reusable trust_ui_vm_region@e0c00000
[ 0.000000][ T0] OF: reserved mem: 0x00000000e8800000..0x00000000e88fffff (1024 KiB) nomap non-reusable tz_stat_region@e8800000
[ 0.000000][ T0] OF: reserved mem: 0x00000000e8900000..0x00000000e8f7ffff (6656 KiB) nomap non-reusable tags_region@e8900000
[ 0.000000][ T0] OF: reserved mem: 0x00000000e8f80000..0x00000000e947ffff (5120 KiB) nomap non-reusable qtee_region@e8f80000
[ 0.000000][ T0] OF: reserved mem: 0x00000000e9480000..0x00000000efc7ffff (106496 KiB) nomap non-reusable trusted_apps_region@e9480000
The anomaly is right here:
[ 0.000000][ T0] OF: reserved mem: OVERLAP DETECTED!
[ 0.000000][ T0] pvm_fw_region@80c01000 (0x0000000080c01000--0x0000000080e01000) overlaps with ramoops@80D00000 (0x0000000080d00000--0x0000000080f00000)
The pvm_fw_region memory region overlaps the ramoops memory region!
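The overlap reported by the kernel can be worked out from the two ranges in the log. The sketch below is an illustrative user-space check, not the kernel's own implementation; the struct and helper names are made up for this example, and the base and size values are taken from the OVERLAP DETECTED message.

```c
#include <stdbool.h>
#include <stdint.h>

/* A physical memory range [base, base + size). */
struct region {
    uint64_t base;
    uint64_t size;
};

/* True if the two half-open ranges share at least one byte. */
static bool regions_overlap(struct region a, struct region b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Intersection of two ranges; size is 0 if they do not overlap. */
static struct region region_intersection(struct region a, struct region b)
{
    struct region r = { 0, 0 };
    uint64_t start = a.base > b.base ? a.base : b.base;
    uint64_t a_end = a.base + a.size;
    uint64_t b_end = b.base + b.size;
    uint64_t end = a_end < b_end ? a_end : b_end;

    if (start < end) {
        r.base = start;
        r.size = end - start;
    }
    return r;
}

/* True if addr lies inside the range. */
static bool region_contains(struct region r, uint64_t addr)
{
    return addr >= r.base && addr - r.base < r.size;
}
```

With pvm_fw_region = {0x80c01000, 0x200000} and ramoops = {0x80d00000, 0x200000}, the intersection is 0x80d00000..0x80e00fff (0x101000 bytes). The physical address 0x80E00000 that results from mem_address + mem_size - pmsg_size with the device tree values falls inside it, on the last page of pvm_fw_region.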
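Checking the change

pvm_fw_region is marked no-map, which the boot log also shows (nomap non-reusable pvm_fw_region@80c01000).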
Root cause
Direct cause
Level 3 translation fault
The page table entry is empty, so no virtual-to-physical mapping exists
Any access to that address triggers a page fault
Underlying cause
[ 0.000000][ T0] OF: reserved mem: OVERLAP DETECTED!
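[ 0.000000][ T0] pvm_fw_region@80c01000 (0x0000000080c01000--0x0000000080e01000) overlaps with ramoops@80D00000 (0x0000000080d00000--0x0000000080f00000)
Two reserved-memory nodes in the device tree declare overlapping physical memory regions
The overlapping part belongs to the no-map pvm_fw_region, so the kernel has no linear mapping for it, and the address that phys_to_virt returns for it has no PTE behind it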
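Solution
Re-plan the pvm_fw_region and ramoops memory regions so that they no longer overlap
Key technical takeaways
Physical and virtual address mapping
Direct mapping (linear mapping)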
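// Typical ARM64 configuration (48-bit VAs); with the 39-bit VAs used on this device it is 0xffffff8000000000
#define PAGE_OFFSET 0xffff000000000000
virtual address = PAGE_OFFSET + (physical address - PHYS_OFFSET), where PHYS_OFFSET is the start of RAM
Use case: ordinary kernel memory access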
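Dynamic mapping
// Reserved memory, device memory and other special regions
void __iomem *ioremap(phys_addr_t offset, size_t size);
void *memremap(resource_size_t offset, size_t size, unsigned long flags);
Use cases:
Device register access
Reserved memory regions
Non-contiguous physical memory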
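Device tree memory management
reserved memory declaration
reserved-memory {
    region@address {
        reg = <0x0 base_address 0x0 size>;
        no-map;   // the kernel creates no mapping for the region
        reusable; // the kernel may use the region temporarily (cannot be combined with no-map)
    };
};
Memory attributes
no-map: the kernel does not create a linear mapping for the region; it must be mapped explicitly (ioremap or memremap) before it is accessed
reusable: the kernel may use the memory while the owning driver does not need it
alignment: alignment requirement for the region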