1. Background
An ANR occurred during testing.
2. Analysis
2.1 bugreport log analysis
01-01 00:01:36.727 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:39.287 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:41.335 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x12, ret:-110
01-01 00:01:41.847 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:44.407 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:46.455 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:01:46.967 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:49.527 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:51.575 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:01:52.091 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:54.647 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:56.695 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x06, ret:-110
01-01 00:01:57.207 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:01:59.767 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:01.815 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x06, ret:-110
01-01 00:02:02.327 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:04.887 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:06.935 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x08, ret:-110
01-01 00:02:07.447 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:10.007 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:12.055 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x08, ret:-110
01-01 00:02:12.062 root 150 150 E fg_read_volt: [FG_UNKNOWN] failed to read VBAT
01-01 00:02:12.567 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:13.807 1000 2454 2454 I sysrq : Show Blocked State
01-01 00:02:13.812 1000 2454 2454 I task : g_reclaim_threa state:D stack:0 pid:122 ppid:2 flags:0x00000008
01-01 00:02:13.812 1000 2454 2454 I task : kworker/4:2 state:D stack:0 pid:150 ppid:2 flags:0x00000008
01-01 00:02:13.812 1000 2454 2454 I task : kworker/6:8 state:D stack:0 pid:702 ppid:2 flags:0x00000008
01-01 00:02:13.812 1000 2454 2454 I : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.812 1000 2454 2454 I task : kworker/6:9 state:D stack:0 pid:703 ppid:2 flags:0x00000008
01-01 00:02:13.812 1000 2454 2454 I task : kworker/6:11 state:D stack:0 pid:705 ppid:2 flags:0x00000008
01-01 00:02:13.812 1000 2454 2454 I : iio_read_channel_processed_scale+0x158/0x20c
01-01 00:02:13.812 1000 2454 2454 I task : kworker/6:12 state:D stack:0 pid:706 ppid:2 flags:0x00000008
01-01 00:02:13.812 1000 2454 2454 I : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.813 1000 2454 2454 I task : android.hardwar state:D stack:0 pid:1335 ppid:1 flags:0x04000001
01-01 00:02:13.813 1000 2454 2454 I : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.813 1000 2454 2454 I task : binder:1350_2 state:D stack:0 pid:1350 ppid:1 flags:0x04000008
01-01 00:02:13.814 1000 2454 2454 I task : vendor.xiaomi.h state:D stack:0 pid:3067 ppid:1 flags:0x04000000
01-01 00:02:13.814 1000 2454 2454 I : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:13.814 1000 2454 2454 I task : hvdcp_opti state:D stack:0 pid:2606 ppid:1 flags:0x04000000
01-01 00:02:13.814 1000 2454 2454 I : iio_read_channel_processed_scale+0x44/0x20c
01-01 00:02:15.127 root 150 150 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:17.175 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0a, ret:-110
01-01 00:02:17.687 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:20.247 root 705 705 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:22.295 root 705 705 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:02:22.807 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:25.367 root 705 705 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:27.415 root 705 705 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x0c, ret:-110
01-01 00:02:27.927 1047 1350 1350 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
01-01 00:02:30.487 root 703 703 E i2c_geni 4c88000.i2c: I2C xfer timeout: 126
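The log shows many tasks in state D, i.e. uninterruptible sleep. They include kernel threads (kworker/*) and user-space processes (android.hardware.* and others). A task in this state is usually waiting on I/O (for example a hardware read or write) or on a kernel lock, and signals cannot interrupt it.
Common causes are:
- a deadlock in a hardware driver;
- an I²C device that times out instead of responding;
- contention on a mutex;
- a bug in a device driver.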
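To go from "many tasks" to an actual count, the "Show Blocked State" lines in the log can be parsed mechanically. The following is a minimal user-space sketch, not part of any driver; the struct and function names are made up for illustration, and it assumes task names contain no spaces (true for the lines in this log, not guaranteed in general).

```c
#include <stdio.h>
#include <string.h>

/* One entry from a "Show Blocked State" dump (illustrative type). */
struct blocked_task {
    char name[32];
    char state;
    int pid;
};

/*
 * Parse a line such as
 *   "... I task : kworker/4:2 state:D stack:0 pid:150 ppid:2 flags:0x00000008"
 * Returns 1 and fills *out on success, 0 if the line is not a task line.
 */
int parse_blocked_task(const char *line, struct blocked_task *out)
{
    const char *p = strstr(line, "task : ");

    if (!p)
        return 0;
    p += strlen("task : ");
    /* %31s stops at whitespace, so names with spaces are not handled. */
    if (sscanf(p, "%31s state:%c stack:%*d pid:%d",
               out->name, &out->state, &out->pid) != 3)
        return 0;
    return 1;
}

/* Count how many of the given lines describe a task in state D. */
int count_d_state(const char *const *lines, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        struct blocked_task t;

        if (parse_blocked_task(lines[i], &t) && t.state == 'D')
            count++;
    }
    return count;
}
```

Stack-frame lines such as `iio_read_channel_processed_scale+0x44/0x20c` do not contain `task : ` and are skipped, so feeding the whole excerpt to `count_d_state` counts each blocked task once.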
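2.2 Key findings
- Many tasks are stuck in __mutex_lock or rt_mutex_lock:
  __mutex_lock+0x414/0xdec
  rt_mutex_lock+0x84/0xf8
  These are the kernel's mutex paths. The tasks are waiting for a resource to be released, and they stay blocked for as long as the lock is held.
- I²C bus drivers are involved:
  i2c_adapter_lock_bus
  i2c_transfer
  geni_i2c_xfer [i2c_msm_geni]
  The tasks are blocked while accessing the I²C bus. Either an I²C device is not responding correctly, or a deadlock occurred under heavy concurrent access.
- Related modules
  The log mentions several related work functions (driver module and role in parentheses):
  fg_monitor_workfunc and fg_get_property (bq28z610, battery monitoring)
  xm_charge_work (xm_smart_chg, smart charging)
  qg_status_change_work (gauge_iio, battery gauge management)
  status_change_work (qpnp_smb5_main, Qualcomm power management)
  All of these are closely tied to power management and I²C communication, which suggests the problem lies in the battery/charging control drivers.
- Starting point
  01-01 00:02:12.055 root 150 150 E fg_read_word: [FG_UNKNOWN] I2C failed to read 0x08, ret:-110
  01-01 00:02:12.062 root 150 150 E fg_read_volt: [FG_UNKNOWN] failed to read VBAT
  After the I²C read fails, the thread does not bail out. It stays stuck waiting for a lock or retrying, so later threads queue up behind the same lock or resource. Eventually kworker, android.hardware.*, and other tasks all end up in state D.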
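The timestamps in the log also show how long each failed read occupies the bus. The sketch below converts logcat-style timestamps to milliseconds so the spacing can be computed. It is a small user-space helper with illustrative names, and it assumes both lines fall on the same day and always carry three fractional digits.

```c
#include <stdio.h>

/*
 * Convert a logcat-style timestamp "MM-DD HH:MM:SS.mmm" at the start of
 * a line to milliseconds since the start of that day.
 * Returns -1 if the text does not match.
 */
long log_time_ms(const char *line)
{
    int mon, day, h, m, s, ms;

    if (sscanf(line, "%2d-%2d %2d:%2d:%2d.%3d",
               &mon, &day, &h, &m, &s, &ms) != 6)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Interval in ms between two same-day log lines, or -1 on parse error. */
long log_interval_ms(const char *earlier, const char *later)
{
    long a = log_time_ms(earlier);
    long b = log_time_ms(later);

    if (a < 0 || b < 0)
        return -1;
    return b - a;
}
```

Applied to the excerpt, consecutive `fg_read_word` failures from pid 150 (for example 00:01:41.335 and 00:01:46.455) are 5120 ms apart, and each register address appears twice in a row, which is consistent with one retry per read. Under that reading, a single failed register read keeps its caller busy for roughly 10 seconds.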
2.3 The fg_read_volt function
static int fg_read_volt(struct bq_fg_chip *bq)
{
    u16 vbat = 0;
    bool retry = false;
    int ret = 0;

retry:
    ret = fg_read_word(bq, bq->regs[BQ_FG_REG_VOLT], &vbat);
    if (ret < 0) {
        if (!retry) {
            /* First failure: wait 10 ms, then retry once. */
            retry = true;
            msleep(10);
            goto retry;
        } else {
            /* Second failure: log it and fall back to a default value. */
            fg_err("%s failed to read VBAT\n", bq->log_tag);
            vbat = 4000;
            if (bq->i2c_error_count < 10)
                bq->i2c_error_count++;
        }
    } else {
        /* A successful read clears the error counter. */
        if (bq->i2c_error_count > 0)
            bq->i2c_error_count = 0;
    }

    bq->vbat = (int)vbat;
    if (bq->device_name == BQ_FG_BQ28Z610)
        fg_read_cell_voltage(bq);
    else
        bq->cell_voltage[0] = bq->cell_voltage[1] =
            bq->cell_voltage[2] = bq->vbat;

    return ret;
}
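- When the I²C bus misbehaves (device hung, powered down, or SCL held low), fg_read_word() returns -110 (timeout).
- The code then retries once, which is reasonable. The problem is inside fg_read_word(): the call can sit waiting on the I²C bus, or on a mutex that is never released.
- If the driver has no effective timeout protection, or leaves a lock held, the call may block here indefinitely.
- In the end the whole workqueue stalls and later threads keep piling up behind it, forming a "thread storm".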
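One common way to limit this kind of pile-up is to stop touching the bus once it is known to be bad, and to probe it only occasionally. The sketch below is a user-space simulation of that idea, not the actual driver or a confirmed fix. `struct fg_sim`, the mock read functions, and both thresholds are hypothetical. Only the fallback value 4000 and the error code -110 come from the source.

```c
#define SIM_ETIMEDOUT     110   /* ETIMEDOUT on Linux, matches ret:-110 in the log */
#define FG_ERR_THRESHOLD  3     /* hypothetical: failures before we fail fast */
#define FG_PROBE_EVERY    5     /* hypothetical: probe the bus on every 5th call */
#define FG_VBAT_FALLBACK  4000  /* same fallback value as fg_read_volt() */

/* Hypothetical, simplified stand-in for the driver state. */
struct fg_sim {
    int i2c_error_count;
    int skipped;        /* calls answered without touching the bus */
    int vbat;
    int bus_reads;      /* how many times the bus was actually used */
    int (*read_word)(struct fg_sim *fg, unsigned short *val);
};

/* Mock transfer that always times out, like the transfers in the log. */
int fg_sim_read_timeout(struct fg_sim *fg, unsigned short *val)
{
    (void)val;
    fg->bus_reads++;
    return -SIM_ETIMEDOUT;
}

/* Mock transfer that succeeds with an arbitrary reading. */
int fg_sim_read_ok(struct fg_sim *fg, unsigned short *val)
{
    fg->bus_reads++;
    *val = 3850;
    return 0;
}

int fg_sim_read_volt(struct fg_sim *fg)
{
    unsigned short vbat = 0;
    int ret;

    if (fg->i2c_error_count >= FG_ERR_THRESHOLD) {
        /* Bus is known bad: answer from the fallback, skip the transfer. */
        if (++fg->skipped < FG_PROBE_EVERY) {
            fg->vbat = FG_VBAT_FALLBACK;
            return -SIM_ETIMEDOUT;
        }
        fg->skipped = 0;    /* time to probe the bus once */
    }

    ret = fg->read_word(fg, &vbat);
    if (ret < 0) {
        fg->i2c_error_count++;
        fg->vbat = FG_VBAT_FALLBACK;
        return ret;
    }

    fg->i2c_error_count = 0;
    fg->vbat = vbat;
    return 0;
}
```

With a bus that always times out, ten calls touch the bus only four times instead of ten (or twenty with a retry per call), so callers queued on the same lock wait for far fewer timeouts. Once the mock bus recovers, the next probe succeeds and clears the error counter.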
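3. Root cause
i2c_geni I²C transfer times out (xfer timeout)
↓
regmap_raw_read() blocks, or fails with -110
↓
fg_read_word() returns an error
↓
fg_read_volt() blocks, or fails after retrying
↓
The workqueue item (fg_monitor_workfunc) stalls
↓
Other modules that depend on battery and power data start queuing up behind it
↓
Many tasks enter state D; the system becomes sluggish or soft-hangs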
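A back-of-the-envelope estimate shows why this chain is enough to produce an ANR. The function below is only arithmetic under stated assumptions: every transfer times out, callers are served strictly one after another on the same lock, and each caller performs a fixed number of reads. The function name and parameters are illustrative. The 5120 ms spacing between consecutive fg_read_word failures is the one figure taken from the log.

```c
/*
 * Rough worst-case time in ms that a caller waits before its own first
 * transfer starts, when `ahead` callers are queued in front of it.
 * Assumes every transfer times out and callers are strictly serialized.
 */
long queue_wait_ms(int ahead, int reads_per_caller,
                   int attempts_per_read, long timeout_ms)
{
    return (long)ahead * reads_per_caller * attempts_per_read * timeout_ms;
}
```

For example, with 9 callers ahead, one register read each, two attempts per read, and 5120 ms per failed attempt, the tenth caller waits about 92 seconds. The real queue depth and reads per caller are not known from the log, so this is an order-of-magnitude illustration only.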
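Possible causes (hardware and software may both contribute):
✅ Hardware:
- The I²C bus is locked up (for example SDA or SCL held low, which is common when a device loses power or hangs);
- The fuel gauge chip (e.g. bq28z610) is unpowered or malfunctioning;
- A loose or poorly seated battery connector leaves the device unresponsive.
✅ Software:
- The i2c_geni driver does not fully clean up the channel after an I²C timeout;
- Concurrent I²C access from multiple threads causes scheduling anomalies or lost interrupts;
- The controller is not reset after certain I²C errors, so every later I²C transfer fails.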
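For the "SDA held low" case, the I²C specification describes a bus clear procedure: the controller sends up to nine clock pulses on SCL so that a target stuck mid-byte can finish and release SDA. The sketch below simulates that procedure in user space. The `sim_bus` model and all function names are hypothetical, and whether the i2c_geni driver on this platform performs any such recovery is not established by the source.

```c
#include <stdbool.h>

/* Hypothetical simulated target that holds SDA low until it has seen
 * enough SCL pulses to finish the byte it was sending. */
struct sim_bus {
    int pulses_needed;
    int pulses_seen;
};

bool sim_sda_is_high(const struct sim_bus *bus)
{
    return bus->pulses_seen >= bus->pulses_needed;
}

void sim_scl_pulse(struct sim_bus *bus)
{
    bus->pulses_seen++;
}

/*
 * Bus clear: pulse SCL up to nine times, stopping as soon as SDA is
 * released. Returns the number of pulses sent, or -1 if SDA is still
 * low afterwards.
 */
int i2c_bus_clear(struct sim_bus *bus)
{
    int sent = 0;

    while (!sim_sda_is_high(bus) && sent < 9) {
        sim_scl_pulse(bus);
        sent++;
    }
    return sim_sda_is_high(bus) ? sent : -1;
}
```

On real hardware the controller would follow a successful clear with a STOP condition. The procedure does not help when SCL itself is held low or the chip is unpowered. Those cases need a hardware reset or power cycle, which is why the hardware causes listed above have to be ruled out first.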