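Background
- After a baseline upgrade (which merged code from the Qualcomm baseline), rebooting a running device drops it into crash mode.
- Software built before the baseline upgrade does not show the problem.
Analysis
dmesg log analysis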
[  397.719194][    T1] mtdoops: mtdoops_do_dump start , count = 279 , page = 6, reason = 5, dump_count = 1
[  397.733201][    T1] mtdoops: mtdoops_do_dump pmsg paddr = 0x000000000677a9c7 
[  397.733228][    T1] Unable to handle kernel paging request at virtual address ffffff8000e00000
[  397.733234][    T1] Mem abort info:
[  397.733240][    T1]   ESR = 0x0000000096000007
[  397.733246][    T1]   EC = 0x25: DABT (current EL), IL = 32 bits
[  397.733253][    T1]   SET = 0, FnV = 0
[  397.733260][    T1]   EA = 0, S1PTW = 0
[  397.733265][    T1]   FSC = 0x07: level 3 translation fault
[  397.733271][    T1] Data abort info:
[  397.733276][    T1]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[  397.733283][    T1]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  397.733289][    T1]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  397.733295][    T1] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000a9c9f000
[  397.733302][    T1] [ffffff8000e00000] pgd=180000097ff78003, p4d=180000097ff78003, pud=180000097ff78003, pmd=180000097ff73003, pte=0000000000000000
[  397.733323][    T1] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[  397.734277][    T1] Hardware name: Qualcomm Technologies, Inc. Kunzite QRD (DT)
[  397.734283][    T1] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  397.734291][    T1] pc : mtdoops_do_dump+0x1a0/0x2cc [mtdoops]
[  397.734323][    T1] lr : mtdoops_do_dump+0x1a0/0x2cc [mtdoops]
[  397.734351][    T1] sp : ffffffc0823abba0
[  397.734358][    T1] x29: ffffffc0823abbc0 x28: ffffff883b9c8000 x27: ffffff879a608c20
[  397.734370][    T1] x26: ffffffc082262000 x25: ffffff883b9c8000 x24: ffffffc0820bae18
[  397.734382][    T1] x23: ffffffc07c898000 x22: 0000000000000001 x21: ffffffc0823abcd8
[  397.734393][    T1] x20: ffffffc07c8981a8 x19: ffffff8000e00000 x18: ffffffc082389058
[  397.734404][    T1] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000004
[  397.734415][    T1] x14: ffffff88f5300000 x13: 0000000000000003 x12: 0000000000000003
[  397.734426][    T1] x11: 00000000fffeffff x10: c0000000fffeffff x9 : c92068df22ab7800
[  397.734438][    T1] x8 : c92068df22ab7800 x7 : 205b5d3130323333 x6 : 372e37393320205b
[  397.734449][    T1] x5 : ffffffc0822e579f x4 : ffffffc081580cfc x3 : 0000000000000000
[  397.734460][    T1] x2 : 0000000000000000 x1 : ffffffc0823ab950 x0 : 0000000000000039
[  397.734471][    T1] Call trace:
[  397.734478][    T1]  mtdoops_do_dump+0x1a0/0x2cc [mtdoops 0e68315fd17942ad7985f5125ce422a1f47288bf]
[  397.734517][    T1]  mtdoops_reboot_nb_handle+0x2c/0x40 [mtdoops 0e68315fd17942ad7985f5125ce422a1f47288bf]
[  397.734534][    T1]  notifier_call_chain+0x90/0x174
[  397.734549][    T1]  blocking_notifier_call_chain+0x48/0x78
[  397.734559][    T1]  kernel_restart+0x28/0x114
[  397.734570][    T1]  __arm64_sys_reboot+0x1b0/0x288
[  397.734580][    T1]  invoke_syscall+0x58/0x114
[  397.734593][    T1]  el0_svc_common+0xac/0xe0
[  397.734603][    T1]  do_el0_svc+0x1c/0x28
[  397.734613][    T1]  el0_svc+0x38/0x68
[  397.734625][    T1]  el0t_64_sync_handler+0x68/0xbc
[  397.734635][    T1]  el0t_64_sync+0x1a8/0x1ac
[  397.734645][    T1] Code: cb080128 b2596113 aa1303e1 951e72aa (b9400261) 
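[  397.734652][    T1] ---[ end trace 0000000000000000 ]---
What we can say for certain is that the code died at mtdoops_do_dump+0x1a0, with ESR 0x96000007 reporting a level 3 translation fault on a data access.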
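The fields that the kernel prints under "Mem abort info" can be recomputed from the raw ESR value. The following is a minimal sketch that assumes the standard AArch64 ESR layout (EC in bits 31:26, IL in bit 25, WnR in bit 6, DFSC in bits 5:0); the struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx value that matter for a data abort. */
struct esr_fields {
    unsigned ec;    /* exception class, bits [31:26] */
    unsigned il;    /* instruction length bit, bit [25] */
    unsigned wnr;   /* write-not-read, bit [6] of the ISS */
    unsigned dfsc;  /* data fault status code, bits [5:0] of the ISS */
};

/* Split the low 32 bits of ESR into the fields listed above. */
struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.wnr  = (esr >> 6) & 0x1;
    f.dfsc = esr & 0x3f;
    return f;
}

/*
 * Translation faults use DFSC 0b0001LL, where LL is the table level.
 * Returns the level (0..3), or -1 for any other fault type.
 */
int esr_translation_fault_level(uint32_t esr)
{
    unsigned dfsc = esr & 0x3f;

    return ((dfsc & 0x3c) == 0x04) ? (int)(dfsc & 0x3) : -1;
}
```

For the value in the log, esr_decode(0x96000007) gives EC = 0x25 (data abort taken from the current EL), WnR = 0 (a read) and DFSC = 0x07, and esr_translation_fault_level() returns 3, which agrees with the kernel's own decoding.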
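Restoring the crash scene with TRACE32

The PC points at the instruction encoded as B9400261 (the word in parentheses on the "Code:" line of the log), which is ldr w1, [x19].
x19 holds the address of p_hdr. Its value in the register dump, ffffff8000e00000, is exactly the faulting virtual address.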
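The instruction word can also be decoded by hand to confirm what the debugger shows. This sketch assumes the A64 encoding of LDR (immediate, unsigned offset, 32-bit variant), whose top ten bits are 1011100101, with imm12 in bits 21:10, Rn in bits 9:5 and Rt in bits 4:0. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Result of decoding "ldr wT, [xN, #offset]". */
struct ldr_w_uimm {
    int valid;        /* 1 if the word is this LDR form, else 0 */
    unsigned rt;      /* destination register number (wT) */
    unsigned rn;      /* base register number (xN) */
    unsigned offset;  /* byte offset, imm12 scaled by 4 */
};

/* Decode one A64 instruction word; only the LDR Wt form is recognised. */
struct ldr_w_uimm decode_ldr_w_uimm(uint32_t insn)
{
    struct ldr_w_uimm d = { 0, 0, 0, 0 };

    if ((insn & 0xffc00000u) != 0xb9400000u)
        return d;               /* some other instruction */

    d.valid  = 1;
    d.rt     = insn & 0x1f;
    d.rn     = (insn >> 5) & 0x1f;
    d.offset = ((insn >> 10) & 0xfff) * 4;
    return d;
}
```

Decoding 0xb9400261 yields rt = 1, rn = 19 and offset = 0, i.e. ldr w1, [x19]: a 32-bit read from the address held in x19.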

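For this address, dmesg shows that the PTE is empty:
[  397.733302][    T1] [ffffff8000e00000] pgd=180000097ff78003, p4d=180000097ff78003, pud=180000097ff78003, pmd=180000097ff73003, pte=0000000000000000
In the ARM64 page-table scheme:
- A PTE of 0 means the entry is empty (invalid).
- No virtual-to-physical mapping has been established for that page.
Querying the address in TRACE32 shows the same thing.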
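To relate the page-table line in the log to the faulting address, the table indices can be computed directly. The sketch below assumes the configuration the kernel reports ("4k pages, 39-bit VAs"), which gives three levels of 9 index bits each; with this layout the pgd, p4d and pud values in the log are the same folded level-1 entry. Function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into the translation table at the given level (1..3),
 * for 4 KiB pages and a 39-bit virtual address space.
 * Level 3 uses VA[20:12], level 2 VA[29:21], level 1 VA[38:30].
 */
unsigned pt_index(uint64_t va, int level)
{
    unsigned shift;

    assert(level >= 1 && level <= 3);
    shift = 12 + 9 * (unsigned)(3 - level);
    return (unsigned)((va >> shift) & 0x1ff);
}

/* Bit 0 of any descriptor is the valid bit. */
int desc_is_valid(uint64_t desc)
{
    return (int)(desc & 1);
}
```

For ffffff8000e00000 the indices are 0 (level 1), 7 (level 2) and 0 (level 3). The pmd value 180000097ff73003 has its valid bit set, while the pte value 0 does not, which is why the walk stops with a level 3 translation fault.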

Tracing the code
We need to find out where p_hdr comes from and check whether that address was ever mapped.
static void mtdoops_do_dump(struct kmsg_dumper *dumper,
			    enum mtd_dump_reason reason)
{
//...
	pmsg_buffer_start = phys_to_virt(
		(cxt->pmsg_data.mem_address + cxt->pmsg_data.mem_size)-
		cxt->pmsg_data.pmsg_size);
	p_hdr = (struct pmsg_buffer_hdr *)pmsg_buffer_start;
	pr_err("mtdoops_do_dump pmsg paddr = 0x%p \n",
			pmsg_buffer_start);
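//...
The code shows that p_hdr is a pointer to struct pmsg_buffer_hdr and points at pmsg_buffer_start.
pmsg_buffer_start is obtained by first computing a physical address, cxt->pmsg_data.mem_address + mem_size - pmsg_size, and then converting it to a virtual address with phys_to_virt.
Note that the pr_err above prints the pointer with %p, which the kernel hashes by default. The value 0x000000000677a9c7 in the log is therefore not the real address, and despite the "paddr" label the argument is a virtual address.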
static int mtdoops_pmsg_probe(struct platform_device *pdev)
{
	struct mtdoops_context *cxt = &oops_cxt;
	struct resource *res;
	u32 value;
	int ret;
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	if (!res) {
		pr_err("failed to locate DT /reserved-memory resource\n");
		return -EINVAL;
	}
	cxt->pmsg_data.mem_size = resource_size(res);
	cxt->pmsg_data.mem_address = res->start;
	pr_err( "pares mtd_dt, mem_address =0x%llx, mem_size =0x%lx \n",
			cxt->pmsg_data.mem_address, cxt->pmsg_data.mem_size);
	pr_err( "pares mtd_dt, pmsg_size =0x%lx, console-size =0x%lx \n",
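			cxt->pmsg_data.pmsg_size, cxt->pmsg_data.console_size);
//...
The two lines that matter are:
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
cxt->pmsg_data.mem_address = res->start;
In principle this is fine: once the driver matches, it fetches the memory resource from the device tree, and the address is later converted to a virtual address with phys_to_virt.
There is also one key piece of information:
- The code worked before the baseline upgrade.
- mtdoops.c itself was not changed by the baseline upgrade.
So at this stage the only suspect is the device-tree memory region, because phys_to_virt is only valid for memory that lies inside the kernel's linear mapping.
The memory that mtdoops uses is reserved memory, namely the ramoops region, and this region is expected to be part of the linear map: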
&reserved_memory {
	//...
	ramoops_mem: ramoops@80D00000 {
		compatible = "ramoops";
		reg = <0x0 0x80D00000 0x0 0x200000>;
		record-size = <0x40000>;
		pmsg-size = <0x100000>;
		console-size = <0x80000>;
	};
	//...
};
Next, go back to the dmesg log, because every reserved_mem region is printed at boot:
[    0.000000][    T0] OF: reserved mem: OVERLAP DETECTED!
[    0.000000][    T0] pvm_fw_region@80c01000 (0x0000000080c01000--0x0000000080e01000) overlaps with ramoops@80D00000 (0x0000000080d00000--0x0000000080f00000)
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000ff800000, size 4 MiB
[    0.000000][    T0] OF: reserved mem: initialized node vm_comm_mem_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000ff800000..0x00000000ffbfffff (4096 KiB) map reusable vm_comm_mem_region
[    0.000000][    T0] OF: reserved mem: 0x00000000fec00000..0x00000000ff7fffff (12288 KiB) map non-reusable mem_dump_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000fcc00000, size 32 MiB
[    0.000000][    T0] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000fcc00000..0x00000000febfffff (32768 KiB) map reusable linux,cma
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000fc000000, size 12 MiB
[    0.000000][    T0] OF: reserved mem: initialized node adsp_heap_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000fc000000..0x00000000fcbfffff (12288 KiB) map reusable adsp_heap_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000fa400000, size 28 MiB
[    0.000000][    T0] OF: reserved mem: initialized node audio_cma_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000fa400000..0x00000000fbffffff (28672 KiB) map reusable audio_cma_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000fa000000, size 4 MiB
[    0.000000][    T0] OF: reserved mem: initialized node sdsp_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000fa000000..0x00000000fa3fffff (4096 KiB) map reusable sdsp_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000f7800000, size 40 MiB
[    0.000000][    T0] OF: reserved mem: initialized node secure_cdsp_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000f7800000..0x00000000f9ffffff (40960 KiB) map reusable secure_cdsp_region
[    0.000000][    T0] OF: reserved mem: 0x000000097ffff000..0x000000097fffffff (4 KiB) nomap non-reusable debug_kinfo_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x000000097ec00000, size 16 MiB
[    0.000000][    T0] OF: reserved mem: initialized node va_md_mem_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x000000097ec00000..0x000000097fbfffff (16384 KiB) map reusable va_md_mem_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000f1c00000, size 92 MiB
[    0.000000][    T0] OF: reserved mem: initialized node non_secure_display_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000f1c00000..0x00000000f77fffff (94208 KiB) map reusable non_secure_display_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000f0800000, size 20 MiB
[    0.000000][    T0] OF: reserved mem: initialized node qseecom_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000f0800000..0x00000000f1bfffff (20480 KiB) map reusable qseecom_region
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000e7800000, size 16 MiB
[    0.000000][    T0] OF: reserved mem: initialized node qseecom_ta_region, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000e7800000..0x00000000e87fffff (16384 KiB) map reusable qseecom_ta_region
[    0.000000][    T0] OF: reserved mem: 0x0000000080600000..0x000000008063ffff (256 KiB) nomap non-reusable xbl_dtlog_region@80600000
[    0.000000][    T0] OF: reserved mem: 0x0000000080640000..0x00000000807fffff (1792 KiB) nomap non-reusable xbl_ramdump_region@80640000
[    0.000000][    T0] OF: reserved mem: 0x0000000080800000..0x000000008085ffff (384 KiB) nomap non-reusable aop_image_region@80800000
[    0.000000][    T0] OF: reserved mem: 0x0000000080860000..0x000000008087ffff (128 KiB) nomap non-reusable aop_cmd_db_region@80860000
[    0.000000][    T0] OF: reserved mem: 0x0000000080880000..0x000000008089ffff (128 KiB) nomap non-reusable aop_config_region@80880000
[    0.000000][    T0] OF: reserved mem: 0x00000000808a0000..0x00000000808dffff (256 KiB) nomap non-reusable tme_crash_dump_region@808a0000
[    0.000000][    T0] OF: reserved mem: 0x00000000808e0000..0x00000000808e3fff (16 KiB) nomap non-reusable tme_log_region@808e0000
[    0.000000][    T0] OF: reserved mem: 0x00000000808e4000..0x00000000808f3fff (64 KiB) nomap non-reusable uefi_log_region@808e4000
[    0.000000][    T0] OF: reserved mem: 0x0000000080900000..0x0000000080afffff (2048 KiB) nomap non-reusable smem_region@80900000
[    0.000000][    T0] OF: reserved mem: 0x0000000080b00000..0x0000000080bfffff (1024 KiB) nomap non-reusable cpucp_fw_region@80b00000
[    0.000000][    T0] OF: reserved mem: 0x0000000080c00000..0x0000000080c00fff (4 KiB) nomap non-reusable chipinfo_region@80c00000
[    0.000000][    T0] OF: reserved mem: 0x0000000080c01000..0x0000000080e00fff (2048 KiB) nomap non-reusable pvm_fw_region@80c01000
[    0.000000][    T0] OF: reserved mem: 0x0000000080d00000..0x0000000080efffff (2048 KiB) map non-reusable ramoops@80D00000
[    0.000000][    T0] OF: reserved mem: 0x0000000082a00000..0x0000000082cfffff (3072 KiB) nomap non-reusable wlan_fw_region@82a00000
[    0.000000][    T0] OF: reserved mem: 0x0000000083600000..0x00000000843fffff (14336 KiB) nomap non-reusable hyp_region@0x83600000
[    0.000000][    T0] OF: reserved mem: 0x0000000084b00000..0x00000000852fffff (8192 KiB) nomap non-reusable camera_region@84b00000
[    0.000000][    T0] OF: reserved mem: 0x0000000085300000..0x0000000086bfffff (25600 KiB) nomap non-reusable wpss_region@85300000
[    0.000000][    T0] OF: reserved mem: 0x0000000086c00000..0x00000000893fffff (40960 KiB) nomap non-reusable adsp_region@86c00000
[    0.000000][    T0] OF: reserved mem: 0x0000000089400000..0x0000000089dfffff (10240 KiB) nomap non-reusable cdsp_region@89400000
[    0.000000][    T0] OF: reserved mem: 0x0000000089e00000..0x0000000089e0ffff (64 KiB) nomap non-reusable ipa_fw_region@89e00000
[    0.000000][    T0] OF: reserved mem: 0x0000000089e10000..0x0000000089e19fff (40 KiB) nomap non-reusable ipa_gsi_region@89e10000
[    0.000000][    T0] OF: reserved mem: 0x0000000089e1a000..0x0000000089e1bfff (8 KiB) nomap non-reusable gpu_microcode_region@89e1a000
[    0.000000][    T0] OF: reserved mem: 0x000000008bc00000..0x000000009a1fffff (235520 KiB) nomap non-reusable mpss_region@8bc00000
[    0.000000][    T0] OF: reserved mem: 0x000000009a200000..0x000000009a8fffff (7168 KiB) nomap non-reusable video_region@9a200000
[    0.000000][    T0] OF: reserved mem: 0x00000000a6e00000..0x00000000a6e3ffff (256 KiB) nomap non-reusable xbl_sc_region@a6e00000
[    0.000000][    T0] OF: reserved mem: 0x00000000a6f00000..0x00000000a6ffffff (1024 KiB) nomap non-reusable global_sync_region@a6f00000
[    0.000000][    T0] OF: reserved mem: 0x00000000b8000000..0x00000000baafffff (44032 KiB) map non-reusable splash_region
[    0.000000][    T0] OF: reserved mem: 0x00000000e0600000..0x00000000e09fffff (4096 KiB) nomap non-reusable cpusys_vm_region@e0600000
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x00000000e0c00000, size 76 MiB
[    0.000000][    T0] OF: reserved mem: initialized node trust_ui_vm_region@e0c00000, compatible id shared-dma-pool
[    0.000000][    T0] OF: reserved mem: 0x00000000e0c00000..0x00000000e57fffff (77824 KiB) map reusable trust_ui_vm_region@e0c00000
[    0.000000][    T0] OF: reserved mem: 0x00000000e8800000..0x00000000e88fffff (1024 KiB) nomap non-reusable tz_stat_region@e8800000
[    0.000000][    T0] OF: reserved mem: 0x00000000e8900000..0x00000000e8f7ffff (6656 KiB) nomap non-reusable tags_region@e8900000
[    0.000000][    T0] OF: reserved mem: 0x00000000e8f80000..0x00000000e947ffff (5120 KiB) nomap non-reusable qtee_region@e8f80000
[    0.000000][    T0] OF: reserved mem: 0x00000000e9480000..0x00000000efc7ffff (106496 KiB) nomap non-reusable trusted_apps_region@e9480000
The anomaly is right at the top of this output:
[    0.000000][    T0] OF: reserved mem: OVERLAP DETECTED!
[    0.000000][    T0] pvm_fw_region@80c01000 (0x0000000080c01000--0x0000000080e01000) overlaps with ramoops@80D00000 (0x0000000080d00000--0x0000000080f00000)
The pvm_fw_region memory range overlaps the ramoops memory range.
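The size of the overlap follows from the two ranges in the warning. A minimal sketch, treating each range as half-open [start, end) the way the warning prints them; overlap_bytes is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Number of bytes shared by the half-open ranges [a_start, a_end)
 * and [b_start, b_end); 0 when they do not overlap.
 */
uint64_t overlap_bytes(uint64_t a_start, uint64_t a_end,
                       uint64_t b_start, uint64_t b_end)
{
    uint64_t lo = a_start > b_start ? a_start : b_start;
    uint64_t hi = a_end < b_end ? a_end : b_end;

    return hi > lo ? hi - lo : 0;
}
```

With pvm_fw_region (0x80c01000 to 0x80e01000) and ramoops (0x80d00000 to 0x80f00000) the shared range is 0x80d00000 to 0x80e01000, so overlap_bytes returns 0x101000: the first 1 MiB plus one 4 KiB page of ramoops lies inside pvm_fw_region.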
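Checking the change

pvm_fw_region is marked as not mapped (it is listed as "nomap non-reusable" in the boot log above), so the part of ramoops that falls inside it has no linear mapping.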
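Whether the address that mtdoops dereferences actually falls inside the nomap region can be checked with the numbers already quoted. The sketch below repeats the driver's arithmetic (mem_address + mem_size - pmsg_size) with the device-tree values, assuming that the driver's pmsg_size equals the pmsg-size property (0x100000); the function names are illustrative.

```c
#include <stdint.h>

/* Values from the ramoops node in the device tree quoted earlier. */
#define RAMOOPS_BASE  0x80D00000ULL
#define RAMOOPS_SIZE  0x00200000ULL
#define PMSG_SIZE     0x00100000ULL  /* assumed equal to pmsg-size */

/* pvm_fw_region bounds from the boot log (nomap), half-open. */
#define PVM_FW_START  0x80C01000ULL
#define PVM_FW_END    0x80E01000ULL

/* Physical start of the pmsg buffer, as computed in mtdoops_do_dump. */
uint64_t pmsg_buffer_phys(uint64_t mem_address, uint64_t mem_size,
                          uint64_t pmsg_size)
{
    return mem_address + mem_size - pmsg_size;
}

/* 1 if addr lies in the half-open range [start, end). */
int in_range(uint64_t addr, uint64_t start, uint64_t end)
{
    return addr >= start && addr < end;
}
```

The pmsg buffer starts at physical 0x80E00000, and in_range(0x80E00000, PVM_FW_START, PVM_FW_END) is true. The header that p_hdr points at therefore sits in the last page of the nomap region, which is consistent with the empty PTE seen in the oops.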
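Root cause
Direct cause
- A level 3 translation fault.
- The page table entry is empty, so no virtual-to-physical mapping exists for the page.
- Any access to that address triggers a page fault.
Underlying cause
[    0.000000][    T0] OF: reserved mem: OVERLAP DETECTED!
[    0.000000][    T0] pvm_fw_region@80c01000 (0x0000000080c01000--0x0000000080e01000) overlaps with ramoops@80D00000 (0x0000000080d00000--0x0000000080f00000)
- Two reserved-memory nodes, pvm_fw_region and ramoops, claim overlapping physical memory.
- The overlapping part belongs to a nomap region, so the kernel creates no linear mapping for it. The start of the pmsg buffer (physical 0x80E00000) lies in that part, and the address returned by phys_to_virt has no PTE behind it.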
Solution
Re-plan the memory ranges of pvm_fw_region and ramoops so that they no longer overlap.
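A candidate placement can be sanity-checked against the neighbouring regions before the device tree is changed. The sketch below is only an illustration: the neighbour list contains just the two regions that surround ramoops in the boot log, and the helper name is hypothetical. A real check would need the full memory map of the platform.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
};

/* Regions adjacent to ramoops, taken from the boot log quoted earlier. */
const struct region neighbours_from_log[] = {
    { 0x80c01000ULL, 0x80e01000ULL }, /* pvm_fw_region, nomap */
    { 0x82a00000ULL, 0x82d00000ULL }, /* wlan_fw_region, nomap */
};

/* Returns 1 when [base, base + size) touches none of the busy regions. */
int placement_is_free(uint64_t base, uint64_t size,
                      const struct region *busy, size_t n)
{
    uint64_t end = base + size;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Two half-open ranges overlap when each starts before the other ends. */
        if (base < busy[i].end && busy[i].start < end)
            return 0;
    }
    return 1;
}
```

The current placement (base 0x80D00000, size 0x200000) is rejected. A purely hypothetical base such as 0x81000000 passes this particular check, but that says nothing about whether the range is otherwise unused; the log alone cannot establish that, so the value is not a recommendation.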
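Key technical takeaways
Physical-to-virtual address mapping
Direct mapping (linear mapping)
// Typical ARM64 configuration with 48-bit VAs.
// With the 39-bit VAs reported in the log above, the value is 0xffffff8000000000.
#define PAGE_OFFSET     0xffff000000000000
virtual address = PAGE_OFFSET + (physical address - PHYS_OFFSET), where PHYS_OFFSET is the physical address at which RAM starts
Use case: regular kernel memory access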
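The linear-map arithmetic can be modelled in user space to check it against the oops. This sketch assumes the arm64 definitions PAGE_OFFSET = -(1 << VA_BITS) and phys_to_virt(x) = (x - PHYS_OFFSET) | PAGE_OFFSET. The PHYS_OFFSET value of 0x80000000 is not printed anywhere in the log; it is inferred here from the faulting address, so treat it as an assumption. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the linear map for a given virtual address width. */
uint64_t page_offset(unsigned va_bits)
{
    assert(va_bits > 0 && va_bits < 64);
    return -(UINT64_C(1) << va_bits);   /* unsigned wrap-around is intended */
}

/*
 * User-space model of the arm64 linear-map translation.
 * phys_offset is the physical address where RAM begins.
 */
uint64_t linear_phys_to_virt(uint64_t pa, uint64_t phys_offset,
                             unsigned va_bits)
{
    assert(pa >= phys_offset);
    return (pa - phys_offset) | page_offset(va_bits);
}
```

With 39-bit VAs and the assumed PHYS_OFFSET, physical 0x80E00000 (mem_address + mem_size - pmsg_size for the ramoops node) translates to ffffff8000e00000, the same address that appears in the "Unable to handle kernel paging request" line.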
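Dynamic mapping
// For special regions such as reserved memory and device memory
void __iomem *ioremap(phys_addr_t offset, size_t size);
void *memremap(resource_size_t offset, size_t size, unsigned long flags);
Use cases:
- Device register access
- Reserved memory regions
- Non-contiguous physical memory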
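Device-tree memory management
reserved-memory declaration
reserved-memory {
    region@address {
        reg = <0x0 base_address 0x0 size>;
        no-map;          // the kernel creates no mapping for the region
        reusable;        // the kernel may use the region temporarily
    };
};
Memory attributes
- no-map: the kernel does not create a linear mapping; the region must be mapped explicitly, for example with ioremap or memremap.
- reusable: the kernel may use the memory as long as the owning driver can reclaim it. The listing shows both properties only to name them; a real node sets at most one of no-map and reusable.
- alignment: the address alignment required for a dynamically allocated region (one declared with size rather than reg).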
 
             林渡