RAID6 using Linux Logical Volume Manager
Build a RAID6 LVM volume, then test failure and replacement of one (or two) of the drives
While learning about different RAID levels and their configurations, I wanted to mess with the RAID capabilities of the Linux Logical Volume Manager (LVM), in my case using the Debian lvm2 package in SparkyLinux on a virtual machine set up with VirtualBox.
In real-life scenarios you should also (or rather) consider hardware-based RAID, MD-based RAID, or even DRBD setups!
Before you jump into following the steps described here, please make sure you understand the LVM basics (PVs, VGs, LVs, and the related basic commands).
Most commands in this experiment need root privileges.
Build the logical RAID6
Prepare all physical disks
- cfdisk {device}, e.g. cfdisk /dev/sdb
  - create a partition filling the whole drive
  - define its type as “Linux LVM”
- pvcreate {partition/device}, e.g. pvcreate /dev/sdb1 (a scripted variant covering all disks follows below)
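Instead of running cfdisk and pvcreate by hand for each drive, the preparation can be scripted. This is a minimal sketch, assuming the five empty disks are /dev/sdb through /dev/sdf and that GPT labels are acceptable; adjust the device list (and partition suffix, e.g. p1 for NVMe) to your own setup.
# Hedged, non-interactive variant of the cfdisk + pvcreate steps above.
# Assumption: the five empty disks are /dev/sdb .. /dev/sdf.
for disk in /dev/sd{b,c,d,e,f}; do
    # one partition spanning the whole disk, flagged for LVM use
    parted -s "$disk" mklabel gpt mkpart primary 0% 100% set 1 lvm on
    # turn the new partition into an LVM physical volume
    pvcreate "${disk}1"
done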
Create a volume group and the logical volume
- vgcreate {volume group} {list of PVs}, e.g. vgcreate vg1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
- lvcreate --type raid6 -l100%vg -n{name} {volume group}, e.g. lvcreate --type raid6 -l100%vg -n data6 vg1
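To see how LVM laid out the RAID6 (the hidden sub-LVs holding the stripes) and how much usable space you ended up with, the following checks are a hedged sketch; with five equal PVs, roughly 3/5 of the raw capacity is usable, since two disks' worth of space holds the parity.
# show the RAID6 layout, including the hidden rimage/rmeta sub-LVs
lvs -a -o name,segtype,stripes,devices vg1
# usable LV size vs. raw VG size (expect roughly a 3:5 ratio with five PVs)
vgs -o vg_name,vg_size,vg_free vg1
lvs -o lv_name,lv_size vg1/data6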
Create a filesystem on the logical volume
- mke2fs -t ext4 -L {FS label} {device}, e.g. mke2fs -t ext4 -L data /dev/vg1/data6
Mount & use the logical volume:
mkdir -p /mnt/data
mount /dev/mapper/vg1-data6 /mnt/data
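If the volume should come back automatically after a reboot, an /etc/fstab entry along these lines does the job; the mount point and options here are assumptions, so adapt them to your setup.
# hypothetical /etc/fstab line for the new volume (options are an assumption)
/dev/mapper/vg1-data6  /mnt/data  ext4  defaults,nofail  0  2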
Optional: Change ownership and privileges inside the new file system:
chown -R root:users /mnt/data
chmod -R 770 /mnt/data
Check the perfect health of your new system:
lvs -o name,lv_health_status,sync_percent
Start writing some data to your file system:
dd if=/dev/urandom of=/mnt/data/random.dat bs=1M
(You might cancel / stop this process at some point, unless you really need to fill the whole file system)
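If you would rather not babysit the dd, a bounded variant writes a fixed amount and reports progress; the 4 GiB size here is just an assumption to keep the run short.
# write ~4 GiB of random data and show progress, instead of filling the whole FS
dd if=/dev/urandom of=/mnt/data/random.dat bs=1M count=4096 status=progress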
In parallel, you can observe I/O performance (in another shell), if you like:
iotop
Take down one of the physical disks
You should consider doing this while the random data write is still running!
echo 1 > /sys/block/{device}/device/delete, e.g.
echo 1 > /sys/block/sde/device/delete
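Before deleting a device node it is worth double-checking which disk actually backs which PV, so you do not pull the wrong leg; a hedged sketch:
# map block devices to PVs/VGs before picking a victim
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
pvs -o pv_name,vg_name,pv_size
# after the echo above, the chosen disk should have vanished from lsblk
lsblk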
Check the not-so-perfect system health at this point:
lvs -o name,lv_health_status
The health of the logical volume should now be “partial”, as described in the lvmraid man page.
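With the PV gone, the device list of the RAID6 sub-LVs usually shows the missing leg as an unknown device, and LVM prints warnings about it; a hedged check:
# the missing leg typically shows up as an [unknown] device entry
lvs -a -o name,segtype,devices vg1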
At this time, kill the dd process and shut down the system.
Replace the disk you had taken down with an empty / fresh / new one.
Rebuild the logical RAID6 system
Re-activate the logical volume manually after the reboot (it may not be activated automatically while in the “partial” state):
- lvchange -ay {logical volume}, e.g. lvchange -ay /dev/vg1/data6 (if activation is refused because of the missing PV, see the sketch below)
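If plain activation is refused due to the missing PV, the activation mode can be relaxed. This is a sketch under the assumption that only one or two RAID6 legs are missing, in which case degraded mode should be enough:
# degraded: activate RAID LVs that can still serve all data (RAID6 minus
# one or two legs); "partial" would be the last resort for other LV types
lvchange -ay --activationmode degraded /dev/vg1/data6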
Prepare the replacement drive
- cfdisk /dev/sdf (see above)
- pvcreate /dev/sdf1
- Add it to the volume group:
vgextend {volume group} {physical volume}, e.g. vgextend vg1 /dev/sdf1
Show the current configuration:
pvscan / vgscan / vgdisplay {volume group}, e.g. vgdisplay vg1
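As a hedged verification: the new PV should now show up in vg1 with all of its space free, while the failed one is typically still listed as an unknown/missing device until the repair and vgreduce below.
# new PV: member of vg1, fully free; failed PV: listed as unknown/missing
pvs -o pv_name,vg_name,pv_size,pv_free
vgdisplay vg1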
Repair the RAID6 system
- lvconvert --repair {logical volume} {physical volume}, e.g. lvconvert --repair /dev/vg1/data6 /dev/sdf1
- Confirm the prompt “Attempt to replace failed RAID images (requires full drive sync)?” with yes
This starts a background rebuild of the RAID6, which you can watch using:
lvs -o name,lv_health_status,sync_percent
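To avoid re-running that command by hand, wrapping it in watch is a convenient (optional) alternative:
# refresh the rebuild status every 5 seconds; Ctrl-C to stop
watch -n 5 'lvs -o name,lv_health_status,sync_percent vg1'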
- Finally, remove the missing (failed) PV from the volume group: vgreduce --removemissing {volume group}, e.g. vgreduce --removemissing vg1
Congrats, after rebuild you are back to a fully functioning RAID6 system.
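A final hedged sanity check: the health field should be empty again, the sync at 100%, and only the current five PVs left in the volume group.
# health column empty, sync at 100%, and no missing/unknown PVs left
lvs -o name,lv_health_status,sync_percent vg1
pvs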