ARM Linux

Update: Friday 8 January 2021 - Fixed!

Since kernel version 5.4, my AArch64 systems have become very unreliable,

requiring regular reboots to keep them working. Worryingly, symptoms have

so far pointed towards filesystem data corruption, which results in the

root filesystem being marked read-only. This normally results in something

like one of these messages:

EXT4-fs error (device nvme0n1p2): ext4_lookup:1707: inode #271688: comm mandb: iget: checksum invalid

[7478798.720368] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #157096: comm mandb: iget: checksum invalid

EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #173544: comm mandb: iget: checksum invalid

[365750.234472] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode #166384: comm mandb: iget: checksum invalid

[4175456.231948] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:1004: inode #396582: comm find: Directory block failed checksum

The result is that the journal is aborted and the rootfs is marked read-only.
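
When collecting these messages from several machines, it can help to split them into fields. The following is a minimal sketch that assumes the message shape shown above; the field names (`device`, `function`, `inode`, `comm`, `message`) are my own labels, and other EXT4-fs error formats will simply not match.

```python
import re

# Pattern for the message shape quoted above; the group names are illustrative.
EXT4_ERROR_RE = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): comm (?P<comm>.+?): (?P<message>.*)"
)


def parse_ext4_error(log_line):
    """Return the fields of an EXT4-fs error line as a dict, or None if no match."""
    match = EXT4_ERROR_RE.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["inode"] = int(fields["inode"])
    return fields


# One of the messages quoted above, with its dmesg timestamp prefix.
sample = (
    "[4175456.231948] EXT4-fs error (device mmcblk0p1): "
    "htree_dirblock_to_tree:1004: inode #396582: comm find: "
    "Directory block failed checksum"
)
print(parse_ext4_error(sample))
```

The optional `[timestamp]` prefix is ignored because the pattern is applied with `search` rather than `match`.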

The known facts so far:

- It has not been seen on kernel 5.2 on Armada 8040 hardware (with an
  uptime of 560 days).
- It has been seen on all mainline kernel versions from 5.4 to 5.9.
- It occurs on several of my Armada 8040 and NXP LX2160A based systems,
  which are both Cortex-A72 based. I have all the errata enabled in the
  kernel.
- It seems independent of the media; it has been seen on the rootfs of
  two different NVMes on two different platforms, uSD, and eMMC.
- It takes between a week and three months to occur, which makes
  attempting a bisection of the changes between 5.2 and 5.4 infeasible.
- I've run xfstests (as suggested by tytso) on the LX2160A, and
  generic/531 triggered the inode checksum error.
- Investigation with debugfs sometimes shows that the inode checksum
  is invalid, but if the block device is flushed (via hdparm) and re-read
  from the media, the inode checksum is then correct. This implies that
  the data in memory/CPU caches does not match the data on the media,
  especially when the inode has not changed for days.
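
The flush-and-re-read check from the last point can be sketched roughly as follows. This is an illustration under assumptions, not the exact procedure used: it assumes `hdparm -f` is the flush in question (the text only says "via hdparm"), it uses the debugfs `id` command seen later in this post to dump the raw inode, and the helper names are hypothetical. It needs root and the real device path to run.

```python
import subprocess


def dump_inode(device, inode):
    """Return the lines printed by debugfs's `id <inode>` command (needs root)."""
    result = subprocess.run(
        ["debugfs", "-R", f"id <{inode}>", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def flush_block_device(device):
    # Assumption: `hdparm -f` (flush the device's buffer cache) is the flush meant above.
    subprocess.run(["hdparm", "-f", device], check=True)


def differing_lines(before, after):
    """Pair up dump lines, keyed by their offset column, that differ between two reads."""
    def by_offset(lines):
        # Skip blank lines and the "*" lines that stand for repeated rows.
        return {ln.split()[0]: ln for ln in lines if ln.strip() and ln.split()[0] != "*"}

    b, a = by_offset(before), by_offset(after)
    return [(b.get(k), a.get(k)) for k in sorted(set(b) | set(a)) if b.get(k) != a.get(k)]


def compare_cached_and_media(device, inode):
    """Dump the inode, flush the block device, dump again, and report what changed."""
    cached = dump_inode(device, inode)
    flush_block_device(device)
    from_media = dump_inode(device, inode)
    return differing_lines(cached, from_media)
```

An empty result from `compare_cached_and_media` would mean the cached copy and the re-read copy agree; any pairs returned show which 16-byte rows of the inode differed between the two reads.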

Below is a log of some of the recent instances:

29th February 2020

Error: [73729.556544] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm rm: iget: checksum invalid

Platform: NXP LX2160A

Media: XPG SX8200PNP NVMe

Kernel: 5.5

Uptime: 20 hours

Inode #917524 was /var/backups/dpkg.status.6.gz.

Running e2fsck -n /dev/nvme0n1p2 without rebooting showed that the

checksum was incorrect, so further investigation with debugfs was

warranted:

debugfs: id <917524>

0000 a481 0000 30ff 0300 3d3d 465e bd77 4f5e ....0...==F^.wO^

0020 29ca 345e 0000 0000 0000 0100 0002 0000 ).4^............

0040 0000 0800 0100 0000 0af3 0100 0400 0000 ................

0060 0000 0000 0000 0000 4000 0000 c088 3800 ........@.....8.

0100 0000 0000 0000 0000 0000 0000 0000 0000 ................

*

0140 0000 0000 5fc4 cfb4 0000 0000 0000 0000 ...._...........

0160 0000 0000 0000 0000 0000 0000 af23 0000 .............#..

0200 2000 1cc3 ac95 c9c8 a4d2 9883 583e addf ...........X>..

0220 3de0 485e b04d 7151 0000 0000 0000 0000 =.H^.MqQ........

0240 0000 0000 0000 0000 0000 0000 0000 0000 ................

*

debugfs: stat <917524>

Inode: 917524 Type: regular Mode: 0644 Flags: 0x80000

Generation: 3033515103 Version: 0x00000000:00000001

User: 0 Group: 0 Project: 0 Size: 261936

File ACL: 0

Links: 1 Blockcount: 512

Fragment: Address: 0 Number: 0 Size: 0

ctime: 0x5e4f77bd:c8c995ac -- Fri Feb 21 06:25:01 2020

atime: 0x5e463d3d:dfad3e58 -- Fri Feb 14 06:25:01 2020

mtime: 0x5e34ca29:8398d2a4 -- Sat Feb 1 00:45:29 2020

crtime: 0x5e48e03d:51714db0 -- Sun Feb 16 06:25:01 2020

Size of extra inode fields: 32

Inode checksum: 0xc31c23af

EXTENTS:

(0-63):3705024-3705087

This is, as I remember, operating on the in-memory data rather than

the on-disk data, and the inode checksum of 0xc31c23af was incorrect.

I corrected the checksum using the debugfs "sif" command, which wrote a

corrected checksum. This resulted in:

debugfs: id <917524>

0000 a481 0000 30ff 0300 3d3d 465e bd77 4f5e ....0...==F^.wO^

0020 29ca 345e 0000 0000 0000 0100 0002 0000 ).4^............

0040 0000 0800 0100 0000 0af3 0100 0400 0000 ................

0060 0000 0000 0000 0000 4000 0000 c088 3800 ........@.....8.

0100 0000 0000 0000 0000 0000 0000 0000 0000 ................

*

0140 0000 0000 5fc4 cfb4 0000 0000 0000 0000 ...._...........

0160 0000 0000 0000 0000 0000 0000 b61f 0000 ................

                                   ^^^^

0200 2000 aa15 ac95 c9c8 a4d2 9883 583e addf ...........X>..

          ^^^^

0220 3de0 485e b04d 7151 0000 0000 0000 0000 =.H^.MqQ........

0240 0000 0000 0000 0000 0000 0000 0000 0000 ................

*
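
The two marked fields are the low and high halves of the inode checksum. As a small sketch, the checksum can be reassembled from the dump rows, assuming the ext4 on-disk layout as I understand it: the low 16 bits at byte offset 0x7C and the high 16 bits at 0x82, both little-endian, with the dump's offset column in octal. The first result matches the value reported by `stat` earlier, which supports those assumptions; the function names are illustrative.

```python
CHECKSUM_LO_OFFSET = 0x7C  # assumed offset of the low 16 bits of the inode checksum
CHECKSUM_HI_OFFSET = 0x82  # assumed offset of the high 16 bits


def parse_dump_line(line):
    """Return (offset, data) for one row of the dump; the offset column is octal."""
    tokens = line.split()
    offset = int(tokens[0], 8)
    # The next eight tokens are 16 bytes in on-disk order; the rest is the ASCII column.
    data = bytes.fromhex("".join(tokens[1:9]))
    return offset, data


def inode_checksum(dump_lines):
    """Reassemble the 32-bit inode checksum from the rows that contain it."""
    raw = {}
    for line in dump_lines:
        offset, data = parse_dump_line(line)
        for i, byte in enumerate(data):
            raw[offset + i] = byte
    lo = raw[CHECKSUM_LO_OFFSET] | (raw[CHECKSUM_LO_OFFSET + 1] << 8)
    hi = raw[CHECKSUM_HI_OFFSET] | (raw[CHECKSUM_HI_OFFSET + 1] << 8)
    return (hi << 16) | lo


# Rows 0160 and 0200 (octal) from the dump before the fix and from the dump after it.
before = [
    "0160 0000 0000 0000 0000 0000 0000 af23 0000 .............#..",
    "0200 2000 1cc3 ac95 c9c8 a4d2 9883 583e addf ...........X>..",
]
after = [
    "0160 0000 0000 0000 0000 0000 0000 b61f 0000 ................",
    "0200 2000 aa15 ac95 c9c8 a4d2 9883 583e addf ...........X>..",
]

print(hex(inode_checksum(before)))  # 0xc31c23af
print(hex(inode_checksum(after)))   # 0x15aa1fb6
```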

With only that change, e2fsck then passed:

e2fsck -n /dev/nvme0n
