
1. Problem Background

An ANR (Application Not Responding) issue occurred during testing.

2. Problem Analysis

2.1 bugreport Log Analysis

01-01 00:01:36.727  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:39.287  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:41.335  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x12, ret:-110
01-01 00:01:41.847  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:44.407  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:46.455  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:01:46.967  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:49.527  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:51.575  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:01:52.091  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:54.647  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:56.695  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x06, ret:-110
01-01 00:01:57.207  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:59.767  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:01.815  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x06, ret:-110
01-01 00:02:02.327  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:04.887  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:06.935  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x08, ret:-110
01-01 00:02:07.447  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:10.007  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:12.055  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x08, ret:-110
01-01 00:02:12.062  root   150   150 E fg_read_volt: [FG_UNKNOWN] failed to read VBAT
01-01 00:02:12.567  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:13.807  1000  2454  2454 I sysrq   : Show Blocked State
01-01 00:02:13.812  1000  2454  2454 I task    : g_reclaim_threa state:D stack:0     pid:122   ppid:2      flags:0x00000008
01-01 00:02:13.812  1000  2454  2454 I task    : kworker/4:2     state:D stack:0     pid:150   ppid:2      flags:0x00000008
01-01 00:02:13.812  1000  2454  2454 I task    : kworker/6:8     state:D stack:0     pid:702   ppid:2      flags:0x00000008
01-01 00:02:13.812  1000  2454  2454 I         : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.812  1000  2454  2454 I task    : kworker/6:9     state:D stack:0     pid:703   ppid:2      flags:0x00000008
01-01 00:02:13.812  1000  2454  2454 I task    : kworker/6:11    state:D stack:0     pid:705   ppid:2      flags:0x00000008
01-01 00:02:13.812  1000  2454  2454 I         : iio_read_channel_processed_scale+0x158/0x20c
01-01 00:02:13.812  1000  2454  2454 I task    : kworker/6:12    state:D stack:0     pid:706   ppid:2      flags:0x00000008
01-01 00:02:13.812  1000  2454  2454 I         : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.813  1000  2454  2454 I task    : android.hardwar state:D stack:0     pid:1335  ppid:1      flags:0x04000001
01-01 00:02:13.813  1000  2454  2454 I         : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.813  1000  2454  2454 I task    : binder:1350_2   state:D stack:0     pid:1350  ppid:1      flags:0x04000008
01-01 00:02:13.814  1000  2454  2454 I task    : vendor.xiaomi.h state:D stack:0     pid:3067  ppid:1      flags:0x04000000
01-01 00:02:13.814  1000  2454  2454 I         : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.814  1000  2454  2454 I task    : hvdcp_opti      state:D stack:0     pid:2606  ppid:1      flags:0x04000000
01-01 00:02:13.814  1000  2454  2454 I         : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:15.127  root   150   150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:17.175  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0a, ret:-110
01-01 00:02:17.687  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:20.247  root   705   705 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:22.295  root   705   705 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:02:22.807  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:25.367  root   705   705 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:27.415  root   705   705 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:02:27.927  1047  1350  1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:30.487  root   703   703 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126

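The log shows a large number of threads in the D state, i.e. uninterruptible sleep. They include kernel workers (kworker/*) as well as user-space processes (android.hardware.* and others). A thread in this state is usually waiting for an I/O operation (such as a hardware read or write) or for a kernel lock, and it cannot be interrupted until that wait completes.

This situation is commonly caused by:

  • a deadlock in a hardware driver
  • an I²C device that times out instead of responding
  • contention on a mutex
  • a bug in a device driver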
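
When a dump is longer than the excerpt above, it helps to tally the blocked tasks mechanically. The following is a minimal sketch of such a helper, not part of any driver or Android tool; `parse_blocked_task` and `count_d_state` are hypothetical names, and the parser assumes the exact "task : <name> state:<X> ... pid:<N>" layout seen in the log above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry from a "Show Blocked State" dump. */
struct blocked_task {
	char comm[32];	/* task name as printed, e.g. "kworker/4:2" */
	char state;	/* scheduler state letter; 'D' = uninterruptible sleep */
	int pid;
};

/*
 * Parse a line such as
 *   "... I task    : kworker/4:2     state:D stack:0     pid:150   ppid:2 ..."
 * Returns 1 and fills *out on a match, 0 for any other line
 * (for example the stack-frame lines that follow some tasks).
 */
int parse_blocked_task(const char *line, struct blocked_task *out)
{
	const char *tag, *name, *state, *pid;
	size_t len;

	assert(line != NULL && out != NULL);

	tag = strstr(line, "task");
	if (!tag)
		return 0;
	name = strchr(tag, ':');		/* separator after "task" */
	state = strstr(tag, " state:");
	if (!name || !state)
		return 0;
	pid = strstr(state, " pid:");		/* does not match " ppid:" */
	if (!pid)
		return 0;

	name++;
	while (*name == ' ')
		name++;
	if (name >= state)
		return 0;

	/* Trim the padding between the task name and "state:". */
	len = (size_t)(state - name);
	while (len > 0 && name[len - 1] == ' ')
		len--;
	if (len == 0 || len >= sizeof(out->comm))
		return 0;

	memcpy(out->comm, name, len);
	out->comm[len] = '\0';
	out->state = state[7];			/* character after " state:" */
	out->pid = atoi(pid + 5);		/* digits after " pid:" */
	return 1;
}

/* Count how many of the given log lines describe a task in the D state. */
size_t count_d_state(const char *const *lines, size_t n)
{
	struct blocked_task t;
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (parse_blocked_task(lines[i], &t) && t.state == 'D')
			count++;
	return count;
}
```

The excerpt in section 2.1 contains ten such task lines, all in state D. Feeding a full bugreport through the same helper shows whether the set of blocked tasks grows over time.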

2.2 Key Findings

  1. Many threads are stuck in __mutex_lock / rt_mutex_lock

    __mutex_lock+0x414/0xdec
    rt_mutex_lock+0x84/0xf8
    

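    These are kernel lock primitives. Each thread is waiting for the lock holder to release the resource, and it stays blocked for as long as the lock is not released.

  2. I²C bus drivers are involved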

    i2c_adapter_lock_bus
    i2c_transfer
    geni_i2c_xfer [i2c_msm_geni]
    

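    The threads are blocked while accessing the I²C bus. Either an I²C device is not responding correctly, or a deadlock occurred under heavy concurrent access.

  3. Related modules

    The log mentions several related modules (driver module name in parentheses):

    • fg_monitor_workfunc, fg_get_property (bq28z610, battery monitoring)
    • xm_charge_work (xm_smart_chg, smart charging)
    • qg_status_change_work (gauge_iio, battery gauge management)
    • status_change_work (qpnp_smb5_main, Qualcomm power management)

    All of these modules are closely tied to power management and I²C communication, which suggests the problem lies in the battery/charging control drivers.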

  4. Where the problem starts

    01-01 00:02:12.055  root   150   150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x08, ret:-110
    01-01 00:02:12.062  root   150   150 E fg_read_volt: [FG_UNKNOWN] failed to read VBAT
    

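    After the I²C read fails, the thread does not bail out. It stays stuck waiting for a lock or retrying, so later threads queue up behind the same lock or resource, and eventually kworker, android.hardware.* and other threads all end up in the D state.

2.3 The fg_read_volt function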
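
The timestamps in the log give a rough idea of how long each failed read keeps the worker busy. The sketch below converts the "HH:MM:SS.mmm" field of a log line to milliseconds and subtracts two of them; `logtime_to_ms` and `logtime_delta_ms` are hypothetical helper names, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert the "HH:MM:SS.mmm" part of a log line to milliseconds
 * since midnight. Returns -1 if the string does not match.
 */
long logtime_to_ms(const char *ts)
{
	int h, m, s, ms;

	/* %d always parses decimal, so ".062" yields 62, not an octal value. */
	if (sscanf(ts, "%2d:%2d:%2d.%3d", &h, &m, &s, &ms) != 4)
		return -1;
	return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Milliseconds elapsed between two timestamps on the same day. */
long logtime_delta_ms(const char *earlier, const char *later)
{
	long a = logtime_to_ms(earlier);
	long b = logtime_to_ms(later);

	assert(a >= 0 && b >= 0);
	return b - a;
}
```

Applied to the log in section 2.1, two consecutive fg_read_word failures from pid 150 (00:01:41.335 and 00:01:46.455) are 5120 ms apart, and the span from the last 0x06 failure (00:02:01.815) to "failed to read VBAT" (00:02:12.062) is 10247 ms. Reading the second figure as the duration of one fg_read_volt call assumes the worker entered fg_read_volt right after the previous read failed, which the log does not prove.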

static int fg_read_volt(struct bq_fg_chip *bq)
{
	u16 vbat = 0;
	bool retry = false;
	int ret = 0;

retry:
	ret = fg_read_word(bq, bq->regs[BQ_FG_REG_VOLT], &vbat);
	if (ret < 0) {
		if (!retry) {
			retry = true;
			msleep(10);
			goto retry;
		} else {
			fg_err("%s failed to read VBAT\n", bq->log_tag);
			vbat = 4000;
			if (bq->i2c_error_count < 10)
				bq->i2c_error_count++;
		}
	} else {
		if (bq->i2c_error_count > 0)
			bq->i2c_error_count = 0;
	}

	bq->vbat = (int)vbat;

	if (bq->device_name == BQ_FG_BQ28Z610)
		fg_read_cell_voltage(bq);
	else
		bq->cell_voltage[0] = bq->cell_voltage[1] =
			bq->cell_voltage[2] = bq->vbat;

	return ret;
}
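  • When the I²C bus misbehaves (for example the device hangs, loses power, or SCL is held low), fg_read_word() returns -110 (-ETIMEDOUT).
  • The code then retries once, which is reasonable, but the problem is:
    • fg_read_word() may itself block inside the I²C transfer, or wait on a mutex that is not being released;
    • if the driver has no proper timeout protection, or a lock is left held, the call can block here permanently;
    • the whole workqueue then stalls and later threads keep piling up behind it, producing a "thread storm".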
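
To make the retry behaviour concrete, here is a small user-space model of the control flow in fg_read_volt. It is only a sketch: `struct fake_fg`, `fake_read_word`, `fake_read_volt` and `worst_case_block_ms` are hypothetical stand-ins, the 3850 mV success value is invented for illustration, and the real driver sleeps 10 ms (msleep(10)) between the two attempts where the model only leaves a comment.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct bq_fg_chip, reduced to what the model needs. */
struct fake_fg {
	int reads_attempted;	/* how many times the bus was touched */
	int i2c_error_count;
	int vbat;
	int fail;		/* nonzero: every read times out */
};

/* Stand-in for fg_read_word(): fails with -ETIMEDOUT (-110 on Linux) on demand. */
int fake_read_word(struct fake_fg *fg, unsigned short *val)
{
	assert(fg != 0 && val != 0);
	fg->reads_attempted++;
	if (fg->fail)
		return -ETIMEDOUT;
	*val = 3850;		/* illustrative reading in mV, not from the log */
	return 0;
}

/* Same control flow as fg_read_volt(): one attempt plus one retry. */
int fake_read_volt(struct fake_fg *fg)
{
	unsigned short vbat = 0;
	int ret = 0;
	int attempt;

	for (attempt = 0; attempt < 2; attempt++) {
		ret = fake_read_word(fg, &vbat);
		if (ret == 0)
			break;
		/* After the first failure the driver calls msleep(10) and retries. */
	}

	if (ret < 0) {
		vbat = 4000;	/* fallback value used by the driver */
		if (fg->i2c_error_count < 10)
			fg->i2c_error_count++;
	} else {
		fg->i2c_error_count = 0;
	}

	fg->vbat = vbat;
	return ret;
}

/* Time one fully failing call blocks: two timed-out reads plus the 10 ms sleep. */
long worst_case_block_ms(long per_read_ms)
{
	return 2 * per_read_ms + 10;
}
```

With the roughly 5120 ms between consecutive fg_read_word failures in the log taken as the cost of one failed read, `worst_case_block_ms(5120)` gives 10250 ms for a single voltage read. So even without a permanent block, the worker is tied up for about ten seconds per register while the bus is unhealthy.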

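3. Root Cause

i2c_geni I²C transfer times out (xfer timeout)
  ↓
regmap_raw_read() blocks, or fails with -110
  ↓
fg_read_word() returns failure
  ↓
fg_read_volt() blocks, or fails after retrying
  ↓
the work item (fg_monitor_workfunc) stalls
  ↓
other modules that depend on battery/power information queue up behind it
  ↓
many threads enter the D state; the system becomes sluggish or soft-hangs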
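
The pile-up at the end of this chain can be approximated with a simple queue model: every caller needs the same bus lock, and each transfer holds it until the transfer times out. The sketch below is a single-threaded simulation under stated assumptions, not kernel code. `simulate_bus_queue` is a hypothetical name, callers are assumed to be served in arrival order (a real rt_mutex hands the lock over by priority), and the hold time is a free parameter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model a single lock that serializes all bus transfers.
 *
 * arrival_ms: time each caller asks for the bus, sorted ascending
 * hold_ms:    how long one (timed-out) transfer keeps the lock
 * wait_ms:    output, how long each caller sleeps before it gets the bus
 */
void simulate_bus_queue(const long *arrival_ms, size_t n, long hold_ms,
			long *wait_ms)
{
	long bus_free_at = 0;
	size_t i;

	assert(hold_ms >= 0);

	for (i = 0; i < n; i++) {
		/* A caller starts when it has arrived AND the bus is free. */
		long start = arrival_ms[i] > bus_free_at ?
			     arrival_ms[i] : bus_free_at;

		wait_ms[i] = start - arrival_ms[i];
		bus_free_at = start + hold_ms;
	}
}
```

For example, four callers arriving together with a 2560 ms hold time (the spacing between consecutive "xfer timeout" lines in the log, used here purely as an illustrative figure) wait 0, 2560, 5120 and 7680 ms. The wait grows linearly with queue position, which is why user-space callers further back in the queue can exceed ANR thresholds even though each individual transfer does eventually time out.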

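Possible causes (hardware and software factors may coexist)

✅ Hardware:

  • the I²C bus is locked up (for example SDA/SCL held low, common when a device loses power or hangs);
  • the fuel gauge chip (e.g. bq28z610) is unpowered or malfunctioning;
  • a loose or poor battery connection leaves the device unresponsive;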
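
For the "SDA held low" case, the usual remedy is the bus-clear procedure from the I²C specification: the master toggles SCL up to nine times until the stuck slave releases SDA. The following is a user-space simulation of that loop, offered as a sketch only. `struct fake_bus` and all function names are hypothetical, real hardware would need direct control of the SCL/SDA lines, and nothing in the log establishes whether the controller on this platform is set up for such recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated bus with a slave that holds SDA low until it has seen enough clocks. */
struct fake_bus {
	int clocks_needed;	/* SCL pulses the slave needs to finish its byte */
	int clocks_seen;
	bool scl_stuck_low;	/* SCL itself is held low: clocking is impossible */
};

bool fake_sda_is_high(const struct fake_bus *bus)
{
	return bus->clocks_seen >= bus->clocks_needed;
}

/* Try to generate one SCL pulse; fails if SCL cannot be driven. */
bool fake_pulse_scl(struct fake_bus *bus)
{
	if (bus->scl_stuck_low)
		return false;
	bus->clocks_seen++;
	return true;
}

/*
 * Bus-clear loop: send at most nine clock pulses, stopping as soon as
 * SDA is released. Returns the number of pulses used, or -1 if the bus
 * could not be recovered. On real hardware a STOP condition would
 * follow a successful recovery.
 */
int recover_bus(struct fake_bus *bus)
{
	int pulses;

	assert(bus != NULL);

	for (pulses = 0; pulses < 9; pulses++) {
		if (fake_sda_is_high(bus))
			return pulses;
		if (!fake_pulse_scl(bus))
			return -1;
	}
	return fake_sda_is_high(bus) ? pulses : -1;
}
```

The model also shows the limit of the technique: when SCL itself is stuck low, or the slave never lets go of SDA, clocking cannot help and only a power cycle or reset of the offending device clears the bus.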

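✅ Software:

  • the i2c_geni driver does not fully clean up the channel after an I²C timeout;
  • concurrent I²C access from multiple threads causes scheduling anomalies or lost interrupts;
  • the controller is not reset after certain I²C errors, so all subsequent I²C transfers fail.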