XFS not recovering space on read-only filesystems following a system crash on RHEL
Environment
- Red Hat Enterprise Linux 7
- kernel versions prior to kernel-3.10.0-862.el7
- XFS filesystems
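To check whether a given system falls inside this environment, the kernel release string can be compared against the fixed build. The following is a minimal sketch, not an official tool: the function name `kernel_is_affected` and the parsing approach are assumptions made for illustration, and only the build number 862 comes from this article.

```shell
#!/bin/bash
# Sketch: decide whether a RHEL 7 kernel release string predates the fix.
# FIXED_BUILD is taken from the article (kernel-3.10.0-862.el7); the
# function name and parsing approach are illustrative assumptions.
FIXED_BUILD=862

# Returns 0 if the release is older than the fixed build, 1 if it is the
# fixed build or newer, and 2 if the string is not a RHEL 7 style release.
kernel_is_affected() {
    local release=$1 build
    case $release in
        3.10.0-*el7*) ;;
        *) return 2 ;;
    esac
    build=${release#3.10.0-}   # drop the base version
    build=${build%%.*}         # keep only the leading build number
    case $build in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$build" -lt "$FIXED_BUILD" ]
}
```

A typical call would be `kernel_is_affected "$(uname -r)" && echo "kernel predates the fix"`.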
Issue
- After a system crash, allocated space is not being reclaimed on my root filesystem.
- After a system crash, xfs_repair -nv exits with status 1, indicating that corruption was detected. Shouldn't log recovery prevent this?
[root@rhel7 ~]# xfs_repair -nv $IMAGE
Phase 1 - find and verify superblock...
- block cache size set to 89000 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 101 tail block 101
- scan filesystem freespace and inode maps...
agi unlinked bucket 3 is 67 in ag 0 (inode=67)
- found root inode chunk
Phase 3 - for each AG...
...
Phase 6 - check inode connectivity...
- traversing filesystem ...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 67, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 67 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
Resolution
This issue was resolved in kernel-3.10.0-862.el7 via errata RHSA-2018:1062. Update to that kernel version or later.
Root Cause
XFS log recovery was not processing unlinked inodes on read-only mounts.
The boot process initially mounts the root filesystem read-only before switching it to read-write, so root filesystems are the most affected.
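Because root filesystems are the most exposed, it can help to confirm that the root filesystem is actually XFS. The sketch below reads text in the /proc/mounts format on standard input and prints the filesystem type of a mount point. The helper name `fstype_of_mountpoint` is hypothetical.

```shell
#!/bin/bash
# Sketch: print the filesystem type of a mount point, given text in the
# /proc/mounts format on stdin. The helper name is hypothetical.
# When a mount point is listed more than once, the last entry wins,
# since later mounts shadow earlier ones.
fstype_of_mountpoint() {
    local mountpoint=$1
    awk -v mp="$mountpoint" '
        $2 == mp { type = $3 }
        END { if (type != "") print type }
    '
}
```

For the running system this could be called as `fstype_of_mountpoint / < /proc/mounts`, which would be expected to print `xfs` on a system matching the environment above.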
Diagnostic Steps
[root@rhel7 ~]# uname -r
3.10.0-514.el7.x86_64
[root@rhel7 ~]# IMAGE=$(mktemp /tmp/xfs.image.XXX)
[root@rhel7 ~]# truncate -s 1G $IMAGE
[root@rhel7 ~]# mkfs.xfs $IMAGE
meta-data=/tmp/xfs.image.8as isize=512 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@rhel7 ~]# mount $IMAGE /mnt
[root@rhel7 ~]# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 1014M 33M 982M 4% /mnt
[root@rhel7 ~]# fallocate -l 800M /mnt/a_file
[root@rhel7 ~]# df -h /mnt/a_file
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 1014M 833M 182M 83% /mnt
[root@rhel7 ~]# tail -f /mnt/a_file > /dev/null &
[1] 3488
[root@rhel7 ~]# jobs
[1]+ Running tail -f /mnt/a_file > /dev/null &
[root@rhel7 ~]# rm -f /mnt/a_file
[root@rhel7 ~]# ls /mnt
[root@rhel7 ~]#
If lsof is installed, the file can be seen as still open but in a deleted state.
[root@rhel7 ~]# lsof /mnt
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tail 3686 root 3r REG 7,0 838860800 67 /mnt/a_file (deleted)
[root@rhel7 ~]#
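When many files are open, the deleted ones can be filtered out of the lsof listing. The following is a minimal sketch that assumes the column layout shown in the output above (PID in column 2, SIZE/OFF in column 7, NODE in column 8, and a trailing "(deleted)" marker). The function name `list_deleted_open_files` is hypothetical.

```shell
#!/bin/bash
# Sketch: list open-but-deleted files from lsof output read on stdin.
# The function name is hypothetical; it relies only on the column layout
# shown in the lsof output above.
list_deleted_open_files() {
    awk '$NF == "(deleted)" {
        # Print the PID, inode number and size in bytes of each match.
        printf "pid=%s inode=%s bytes=%s\n", $2, $8, $7
    }'
}
```

It would be used as `lsof /mnt | list_deleted_open_files`. The inode number it reports is the one that xfs_repair later refers to.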
Force a shutdown of the filesystem. This simulates a crash or other unclean shutdown.
[root@rhel7 ~]# xfs_io -x -c "shutdown -f" /mnt
[root@rhel7 ~]# umount /mnt
umount: /mnt: target is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
[root@rhel7 ~]# kill %1
[root@rhel7 ~]# umount /mnt
[1]+ Terminated tail -f /mnt/a_file > /dev/null
Mount the image read-only.
Note: mount can usually create the loop device automatically, but when the read-only option is used the loop device itself is created read-only, which prevents log recovery from taking place. Create the loop device explicitly with losetup first.
[root@rhel7 ~]# losetup --find --show $IMAGE
/dev/loop0
[root@rhel7 ~]# mount -o ro /dev/loop0 /mnt
[root@rhel7 ~]# mount -o remount,rw /dev/loop0 /mnt
[root@rhel7 ~]# dmesg | tail -3
[ 490.130228] XFS (loop0): Mounting V5 Filesystem
[ 490.158399] XFS (loop0): Starting recovery (logdev: internal)
[ 490.158866] XFS (loop0): Ending recovery (logdev: internal)
Log recovery ran, but the space allocated to the deleted file has not been reclaimed.
[root@rhel7 ~]# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 1014M 801M 214M 79% /mnt
[root@rhel7 ~]# ls /mnt
[root@rhel7 ~]# umount /mnt
[root@rhel7 ~]# xfs_repair -nv $IMAGE
Phase 1 - find and verify superblock...
- block cache size set to 89000 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 101 tail block 101
- scan filesystem freespace and inode maps...
agi unlinked bucket 3 is 67 in ag 0 (inode=67)
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 67, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 67 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
XFS_REPAIR Summary Thu Jan 10 00:33:42 2019
Phase Start End Duration
Phase 1: 01/10 00:33:42 01/10 00:33:42
Phase 2: 01/10 00:33:42 01/10 00:33:42
Phase 3: 01/10 00:33:42 01/10 00:33:42
Phase 4: 01/10 00:33:42 01/10 00:33:42
Phase 5: Skipped
Phase 6: 01/10 00:33:42 01/10 00:33:42
Phase 7: 01/10 00:33:42 01/10 00:33:42
Total run time:
[root@rhel7 ~]# echo $?
1
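The inode numbers left on the unlinked lists can be pulled out of a saved xfs_repair -n log. This is a rough sketch based only on the "agi unlinked bucket" message format shown above. The function name `unlinked_inodes_from_repair_log` is hypothetical, and other xfs_repair versions may word the message differently.

```shell
#!/bin/bash
# Sketch: extract inode numbers from the "agi unlinked bucket" lines
# that xfs_repair -n prints. The function name is hypothetical and the
# pattern is based only on the message format shown in this article.
unlinked_inodes_from_repair_log() {
    sed -n 's/^.*agi unlinked bucket [0-9]* is [0-9]* in ag [0-9]* (inode=\([0-9]*\)).*$/\1/p'
}
```

It would be used as `xfs_repair -nv $IMAGE 2>&1 | unlinked_inodes_from_repair_log`.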
[root@rhel7 ~]# man xfs_repair
...
xfs_repair -n (no modify mode) will return a status of 1 if
filesystem corruption was detected and 0 if no filesystem
corruption was detected. xfs_repair run without the -n option
will always return a status code of 0.
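The exit status behaviour described in the man page excerpt can be wrapped in a small check for scripted use. This is a sketch only: `check_xfs_image` is a hypothetical helper name, and the target is assumed to be an unmounted device or image.

```shell
#!/bin/bash
# Sketch: run xfs_repair in no-modify mode and interpret its exit status
# as described in the man page excerpt above. check_xfs_image is a
# hypothetical helper name; the target must be unmounted.
check_xfs_image() {
    local target=$1 status=0
    xfs_repair -n "$target" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "clean: no corruption detected on $target" ;;
        1) echo "corruption detected on $target" ;;
        *) echo "unexpected xfs_repair status $status for $target" ;;
    esac
    return "$status"
}
```

Any status other than 0 or 1 (for example, when xfs_repair is not installed) is reported as unexpected rather than treated as a clean result.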
Running xfs_repair without -n moves the unlinked file to lost+found, allowing it to be removed manually after mounting.
[root@rhel7 ~]# xfs_repair $IMAGE
...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 67, moving to lost+found
Phase 7 - verify and correct link counts...
done
[root@rhel7 ~]# mount $IMAGE /mnt
[root@rhel7 ~]# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/loop1 1014M 833M 182M 83% /mnt
[root@rhel7 ~]# ls -lh /mnt/lost+found/
total 800M
-rw-r--r--. 1 root root 800M Jan 10 00:32 67
[root@rhel7 ~]# rm -f /mnt/lost+found/67