Metadata Corruption detected

Post by **jlrosssc** » 2024/03/23 14:54:04

I know this is several years old, but I've run into the same issue, except that CentOS 7 is installed on a RAID. This says "against your root filesystem device (may /dev/mapper/vg-root or some such name)." but I'm not sure what the root filesystem device is on the RAID. I have /dev/mapper/control listed, but running xfs_repair -f against that does not work. Any advice on identifying the root filesystem device with a RAID?

Post by **TrevorH** » 2024/03/23 17:44:29

I split this away from its parent topic and moved it to the CentOS 7 section. The original was posted as a reply to viewtopic.php?f=54&t=73935

Your root filesystem is whatever gets mounted on / when you boot normally. If you only have /dev/mapper/control in your /dev/mapper directory then try running `vgchange -ay` to activate all VGs and LVs before you look. If you did not install using LVM and you have a plain partition holding your root filesystem then you need to run it against that instead. Also, "RAID" can be hardware, in which case the resulting RAID array is presented to the operating system as a standard device, usually /dev/sda, or it can be software RAID using mdadm. In the latter case it's quite unlikely that you have / on a separate device, as it needs special steps done at install time to make that happen.
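As a rough illustration of "whatever gets mounted on /", here is a minimal sketch that reads a mounts table in /proc/mounts format and prints the device mounted on /. The function name `root_device` is made up for this example, and it only gives a useful answer on the normally booted system, not from a live CD or rescue environment.

```shell
# Print the device mounted on "/" from a mounts table in /proc/mounts format
# (fields: device mountpoint fstype options ...).
# root_device is a hypothetical helper name, not a standard tool.
root_device() {
    # Ignore the "rootfs" pseudo-entry that some kernels list first and
    # keep the last remaining entry for "/", which is the one in effect.
    awk '$2 == "/" && $1 != "rootfs" { dev = $1 }
         END { if (dev != "") print dev }' "${1:-/proc/mounts}"
}
```

Called with no argument it reads /proc/mounts. On a system that has util-linux, `findmnt -n -o SOURCE /` should report the same thing.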

Post by **jlrosssc** » 2024/03/24 00:44:43

This is on an out-of-town server and I had some screenshots sent. Does this give any information that lets you offer some guidance on how to proceed? I've tried xfs_repair -f but am not sure where to point it. It references dm-0, but I'm not familiar with what that designation means on one of the RAID drives. It is a CentOS software RAID. The message is below. A screenshot was too large to attach:

XFS (dm-0): Metadata corruption detected at _xfs_dir3_data_check+0xxxxxxxx
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 128 bytes of corrupted metadata buffer:

xxxxxx

Does that offer anything to provide some guidance?

Post by **TrevorH** » 2024/03/24 01:18:16

XFS (dm-0): Unmount and run xfs_repair

So it's currently mounted, and you don't want to fsck it while it is mounted or it may be damaged. It's likely that dm-0 is your root filesystem. You need to boot in either single user mode with it mounted ro, or you need to boot a rescue environment and run it from there with it not mounted at all. This is not something that can be done remotely unless this is a server with a remote console facility (e.g. Dell iDRAC, HP iLO). And on CentOS 7 you cannot make it run at boot time using /.forcefsck, as fsck.xfs is a shell script that does nothing. So you'll either need a pair of remote hands to do the typing and reading for you or you'll need to be in front of it. If it's in a data center then it's possible that the DC staff may be able to set up a remote IP KVM connected to the hardware.

Code: Select all

[trevor@centos7 ~]$ ls -la /dev/mapper/
total 0
drwxr-xr-x.  2 root root     100 Feb 25 15:03 .
drwxr-xr-x. 19 root root    3160 Feb 25 15:03 ..
lrwxrwxrwx.  1 root root       7 Feb 25 15:03 centos-root -> ../dm-0
lrwxrwxrwx.  1 root root       7 Feb 25 15:03 centos-swap -> ../dm-1
crw-------.  1 root root 10, 236 Feb 25 15:03 control
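The listing above shows how a kernel name like dm-0 maps back to a /dev/mapper entry through the symlinks. A small sketch of that lookup follows; `mapper_name_for` is a hypothetical helper name, and the directory argument exists only so it can be tried against a copy of the layout.

```shell
# Print the /dev/mapper entry (or entries) whose symlink points at a dm node.
# Usage: mapper_name_for dm-0 [mapper_dir]
# mapper_name_for is a hypothetical helper name for illustration.
mapper_name_for() {
    node=$1
    dir=${2:-/dev/mapper}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue        # skip "control" and other non-links
        target=$(readlink "$link")        # e.g. ../dm-0
        if [ "${target##*/}" = "$node" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

With the listing above, `mapper_name_for dm-0` would be expected to print /dev/mapper/centos-root.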

Post by **jlrosssc** » 2024/03/24 01:52:33

I have remote hands there. It is currently booted from a Centos Live CD in troubleshooting mode. Can I safely run fsck from there?

Post by **jlrosssc** » 2024/03/24 12:43:52

I am by no means an expert on this. This server is very important as a personal file server that I set up and keep updated, but you are not replying to a Linux admin at all, so this may seem somewhat elementary. The 4 SCSI drives are /dev/sg0-sg3. I've had the person at the remote location boot from CentOS 7 Live and go to Troubleshooting. The drives are visible when listing /dev, and I had them start by running xfs_repair /dev/sg0. After many hours (2TB drives), it came back and said cannot open /dev/sg0: no such device.

Any advice?

Post by **TrevorH** » 2024/03/24 13:34:25

Yes, to fsck it you will need to assemble the mdadm array first. Something like

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

You will probably need to change the device names to the ones in use by your disk drives and partitions (/dev/sda1 being the first partition on the first physical disk drive, /dev/sda2 being the 2nd partition on the same drive etc). The /dev/sg* device names are not the correct ones to use. Those are special device names for the disks, but they are used for sending SCSI commands directly to them. You need to look for /dev/sd* devices.
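One way to narrow down which /dev/sd* partitions matter is to look at the filesystem type that lsblk reports, since mdadm members normally show up as linux_raid_member. The sketch below filters that output; `raid_members` is a hypothetical helper name and it assumes the two-column raw format produced by `lsblk -rno NAME,FSTYPE`.

```shell
# Read "NAME FSTYPE" pairs on stdin (as printed by: lsblk -rno NAME,FSTYPE)
# and print the /dev path of every partition tagged as an md RAID member.
# raid_members is a hypothetical helper name for illustration.
raid_members() {
    awk '$2 == "linux_raid_member" { print "/dev/" $1 }'
}

# On the rescue system this would be used as:
#   lsblk -rno NAME,FSTYPE | raid_members
```

Whole disks and partitions holding other filesystems are skipped, so what remains is a candidate list for the mdadm --assemble command.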

Once you have the mdadm device assembled it will then show up in /proc/mdstat, which is a kernel file that shows you the status of RAID arrays. Now you will need to run pvscan and vgchange -ay to get the booted system to see and activate the LVM devices. Now you should have /dev/mapper/$volumegroup-$lvname entries that you can use to run fsck against.
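Put together, the order of operations described above might look like the sketch below. It only prints the commands unless DRY_RUN=0 is set, so it can be read over the phone before anything is typed. The names `run` and `recover_steps` are made up for this example, the device names are placeholders, and the final step uses xfs_repair -n (check only, no changes) as a cautious first pass, which is an addition of this sketch rather than something from the thread.

```shell
# Print each command by default; set DRY_RUN=0 to actually execute them.
# run and recover_steps are hypothetical helper names for illustration.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: recover_steps MD_DEVICE LV_PATH MEMBER_PARTITION...
recover_steps() {
    md=$1; lv=$2; shift 2
    run mdadm --assemble "$md" "$@"   # build the array from its member partitions
    run cat /proc/mdstat              # confirm the array shows up as active
    run pvscan                        # let LVM find the physical volume on the array
    run vgchange -ay                  # activate all volume groups and logical volumes
    run xfs_repair -n "$lv"           # -n: report problems only, modify nothing
}
```

For example, `recover_steps /dev/md0 /dev/mapper/centos-root /dev/sda1 /dev/sdb1` prints the five commands in order. If the -n pass reports problems, the real repair is the same xfs_repair command without -n, run while the filesystem is not mounted.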

Post by **jlrosssc** » 2024/03/24 14:26:07

OK. I'll work through that advice. Thanks!

Post by **TrevorH** » 2024/03/24 16:16:17

Please keep this in the forum where it may be of use to others at a later date.

In the mdadm --assemble command, /dev/md0 is the new mdadm device that you wish it to create and the devices that follow that in the command line are the individual disk partitions that go to make up that array. If you do not know which devices are part of the array then I would suggest running

mdadm --examine /dev/sda1

against each possible partition on the disks that you think make up the array. That should return output similar to

Code: Select all

[root@trevor4 ~]# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b2dbfe57:67c00875:a7ed581d:c31b856e
           Name : trevor4.trevor.local:0  (local to host trevor4.trevor.local)
  Creation Time : Wed May  4 12:09:12 2022
     Raid Level : raid10
   Raid Devices : 2

 Avail Dev Size : 11720777728 sectors (5.46 TiB 6.00 TB)
     Array Size : 5860388864 KiB (5.46 TiB 6.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : 18b4675f:8ee44e89:491663dc:13c1941e

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Mar 24 09:34:53 2024
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : 17081a3d - correct
         Events : 14316

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

If that partition is not an array member that command will tell you

mdadm: No md superblock detected on /dev/sdc1.

From the mdadm --examine command output you can also take the "Array UUID" and use that in a revised `mdadm --assemble /dev/md0 -u $uuid` (plugging the correct UUID in there). That should detect the component devices and assemble them all into the newly created /dev/md0 device.
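Pulling the UUID out of the --examine output by hand is error-prone over a remote session, so a small filter may help. This is a sketch only; `array_uuid` is a hypothetical helper name and it assumes the "Array UUID : ..." line format shown in the output above.

```shell
# Read `mdadm --examine` output on stdin and print the Array UUID, if present.
# Prints nothing when the partition has no md superblock.
# array_uuid is a hypothetical helper name for illustration.
array_uuid() {
    sed -n 's/^ *Array UUID : //p'
}

# On the rescue system this would be used as:
#   uuid=$(mdadm --examine /dev/sda1 | array_uuid)
```

Running it against each candidate partition and comparing the results shows which partitions belong to the same array, since members of one array share the same Array UUID.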

Post by **jlrosssc** » 2024/03/24 16:45:13

Again, much appreciated. I'll update after we work through this.
