I wanted to create a script that would e-mail me when it detected a failure in a ZFS storage pool, but to test it I needed a failing zpool. You could create a zpool with a USB hub and some USB memory sticks, and then pull them out as you like, but that involves getting up, something I try to reserve for important things like getting a new cup of tea, or a biscuit. Luckily ZFS has the ability to create a pool just from files.
I'm doing this on FreeBSD 14.1-RELEASE, but it should work on any good operating system where ZFS1 is available. All these commands are being run as root.
ZFS lets you create a pool using files, as shown in the zpool-create(8) manual page examples; I think this exists purely for testing and experimenting, and I would never suggest doing this for anything important. I want to create various failures that look like the errors I'd seen on my NAS with mirrored drives, so that's the setup I'm going to simulate.
Create A ZPool
First create two files using truncate(1), each 1GB in size, somewhere temporary (I'll use /mnt/poolfiles).
# mkdir /mnt/poolfiles
# cd /mnt/poolfiles/
# truncate -s 1G file0
# truncate -s 1G file1
# ls -lh
total 1045721
-rw-r--r-- 1 root wheel 1.0G Sep 23 21:21 file0
-rw-r--r-- 1 root wheel 1.0G Sep 23 21:22 file1
Then create a new mirrored zpool called 'testpool' and check the status. Make sure you use the full path to the files, otherwise you might get complaints that it can't find those devices; I think it defaults to looking in /dev rather than the directory the command is run from. Also check that the pool is running as expected with zpool-status(8) before we start our experiments.
# zpool create testpool mirror /mnt/poolfiles/file0 /mnt/poolfiles/file1
# zpool status testpool
pool: testpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file0 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
We can then create a filesystem in that pool like any other,
# zfs create testpool/testfs
# zfs list -d 2 testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 504K 832M 96K /testpool
testpool/testfs 96K 832M 96K /testpool/testfs
and merrily create files in there too, which we can hash for fun just to prove it's OK afterwards.
# head -c 1M /dev/urandom | uuencode -m test > /testpool/testfs/testfile.txt
# md5 /testpool/testfs/testfile.txt > md5.txt
Creating Chaos
The cleanest way to create a degraded status is to remove one of the 'disks' from the pool, using the zpool-offline(8) command:
# zpool offline testpool /mnt/poolfiles/file0
zpool status then recognises the issue straight away:
# zpool status testpool
pool: testpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
config:
NAME STATE READ WRITE CKSUM
testpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
/mnt/poolfiles/file0 OFFLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
We can bring it back again by using the zpool-online(8) command:
# zpool online testpool /mnt/poolfiles/file0
# zpool status testpool
pool: testpool
state: ONLINE
scan: resilvered 144K in 00:00:00 with 0 errors on Tue Sep 24 22:52:33 2024
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file0 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
This is already enough for my original goal of having an example of a pool that is not "online" that I can test a script against. But now we're learning; let's see what else we can do.
Let's try to 'corrupt' a disk by writing random data into one of the two disk files. For this we'll use dd(1) to write 10 (count) 1M-sized blocks (bs), offset from the start of the file by 2 (seek) block sizes2. In the same directory as the files, run:
# dd if=/dev/urandom of=file0 bs=1M count=10 seek=2
This doesn't appear to be detected automatically for me, so run zpool-scrub(8), to make sure that our wanton destruction has been noticed, and check the status:
# zpool scrub testpool
# zpool status testpool
pool: testpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue Sep 24 23:07:59 2024
config:
NAME STATE READ WRITE CKSUM
testpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
/mnt/poolfiles/file0 UNAVAIL 0 0 0 corrupted data
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
That doesn't look so happy. dd will also have now replaced file0 with a much smaller one, as that is all that we wrote.
# ls -l
total 13617
-rw-r--r-- 1 root wheel 12582912 Sep 24 23:07 file0
-rw-r--r-- 1 root wheel 1073741824 Sep 24 23:06 file1
But we can fix that too, with a new file to zpool-replace(8) the badly damaged one:
# truncate -s 1G file2
# ls -l
total 13674
-rw-r--r-- 1 root wheel 12582912 Sep 24 23:07 file0
-rw-r--r-- 1 root wheel 1073741824 Sep 24 23:07 file1
-rw-r--r-- 1 root wheel 1073741824 Sep 24 23:10 file2
# zpool replace testpool /mnt/poolfiles/file0 /mnt/poolfiles/file2
# zpool status testpool
pool: testpool
state: ONLINE
scan: resilvered 2.99M in 00:00:00 with 0 errors on Tue Sep 24 23:11:20 2024
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file2 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
The mirror will then automatically resilver itself, which should be quick given the tiny amount of data on it. (Apparently the term 'resilvering' comes from old mirrors, which over time needed the silver coating that made the glass reflective to be restored.)
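Since we hashed the test file earlier, this is also a good moment to check it survived the corruption and the replacement. Assuming md5.txt is still sitting in /mnt/poolfiles where it was written, comparing a fresh hash against it should report no differences:
# md5 /testpool/testfs/testfile.txt | diff - /mnt/poolfiles/md5.txt && echo "file intact"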
Now that I have enough ways to generate a failing pool, I can look at adapting the 404.status-zfs script to email me only if it finds a problem.
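I haven't written that adaptation yet, but the core of it would be something like this rough sketch (the recipient and subject line are placeholders of mine, and the real 404.status-zfs script is structured rather differently):
#!/bin/sh
# Rough sketch: mail root if zpool status has anything to complain about.
# 'zpool status -x' prints "all pools are healthy" when there is nothing to report.
status=$(zpool status -x)
if [ "$status" != "all pools are healthy" ]; then
    echo "$status" | mail -s "ZFS pool problem on $(hostname)" root
fi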
But just one more thing…
Creating Confusion
I ended up spending much more time with ZFS than I expected, which often happens to me when I'm learning. There are a few things that I haven't really understood, and couldn't find an explanation for.
Where's the (z)pool?
The first is that the 'testpool' above, even when in full working order, wouldn't automatically re-appear after a reboot; listing the pools only shows the zroot pool, which occupies the whole disk of my laptop:
# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 460G 23.6G 436G - - 1% 5% 1.00x ONLINE -
It can be brought back online using zpool-import(8), but you have to find it first, using the -d flag to specify the directory /mnt/poolfiles where we have the two files.
Just running zpool import should list available pools, but it finds nothing; adding the directory does find it, and if you then run the command again with the pool name, the pool is 'available' once more3
# zpool import
no pools available to import
# zpool import -d /mnt/poolfiles
pool: testpool
id: 1173309968352596375
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
testpool ONLINE
mirror-0 ONLINE
/mnt/poolfiles/file2 ONLINE
/mnt/poolfiles/file1 ONLINE
# zpool import -d /mnt/poolfiles testpool
# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
testpool 960M 2.14M 958M - - 0% 0% 1.00x ONLINE -
zroot 460G 23.6G 436G - - 1% 5% 1.00x ONLINE -
Once it's imported, the ZFS filesystem is also mounted, and we can find our test file in the directory /testpool/testfs.
I'm not really sure why it's not automatically imported and mounted on boot, but I suspect it's because the backing files live on a ZFS filesystem that itself has to be imported and mounted first, and the import/mount process doesn't then go back and search newly mounted filesystems for more pools. I don't have this problem with a pool created from real disks; they appear automatically.
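If I just wanted the pool to come back after every boot, one workaround (untested, and only a sketch) would be to import it again late in the boot process, for example by adding this to /etc/rc.local, which runs after the standard rc scripts have already mounted zroot:
# /etc/rc.local: re-import the file-backed test pool once /mnt/poolfiles is available
zpool import -d /mnt/poolfiles testpool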
Schrödinger's File
The other odd behaviour is that I couldn't get the pool to react to one of the 'disk files' being deleted. To simulate one of the disks having a catastrophic event, I delete file1, and then check the pool status:
# rm /mnt/poolfiles/file1
# zpool status testpool
pool: testpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file2 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
Also, kicking off a scrub doesn't seem to worry it. A scrub was needed for it to notice the damage to the disk above, but with the file completely gone, it seems to go ahead without problem:
# zpool scrub testpool
# zpool status testpool
pool: testpool
state: ONLINE
scan: scrub repaired 0B in 00:00:00 with 0 errors on Thu Sep 26 22:30:46 2024
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file2 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
I also tried running zpool-sync(8), but that doesn't cause it to realise something is wrong either; the status still comes back as "online". If anyone knows why this is, please let me know. I can get it to start complaining if I offline the missing file, but then I can't online it again because it's not there, though that's a different problem.
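For completeness, that dead end is just these two commands; the offline is accepted and the pool goes degraded, but the online then fails because the backing file no longer exists (I'll spare you the exact error output):
# zpool offline testpool /mnt/poolfiles/file1
# zpool online testpool /mnt/poolfiles/file1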
One way of making it realise something is wrong is to export the pool, and then try to import it again:
# zpool export testpool
# zpool import -d /mnt/poolfiles/
pool: testpool
id: 1173309968352596375
state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices. The
fault tolerance of the pool may be compromised if imported.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
config:
testpool DEGRADED
mirror-0 DEGRADED
/mnt/poolfiles/file1 UNAVAIL cannot open
/mnt/poolfiles/file2 ONLINE
From there, importing the pool, creating a new 1GB file and running the replace command works as expected, and once it's resilvered, the pool is happy again.
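Spelled out, that recovery is just a repeat of the earlier steps (file3 is simply the next name in the sequence, my choice rather than anything from the original session):
# zpool import -d /mnt/poolfiles testpool
# truncate -s 1G /mnt/poolfiles/file3
# zpool replace testpool /mnt/poolfiles/file1 /mnt/poolfiles/file3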
For this behaviour I'm less able to think of a good explanation. Perhaps it's something to do with the file being hosted on another ZFS filesystem that somehow preserves it, so that it appears deleted but the zpool can still access it? (I know snapshots work by just recording changes in the filesystem, which is why you can return to an earlier state, as the files aren't really gone; maybe it's something related to that?)
I ended up going far beyond what I'd intended to do just to create an example for a script; but I've learnt much more about ZFS, and the fact you can use files to do this kind of experimentation is great.
Now it really is time to stand up and get another cup of tea, and maybe even a biscuit.
References
- A Crash Course on ZFS [Internet Archive] 2013-12-04 - Many more examples of what you can do with ZFS, and experimenting with files as the vdevs. Interestingly this also contains an example where he deletes one of the files that were used to create the pool, and after a scrub it sees the issue, which I couldn't replicate.
- Demonstrating ZFS Pool with truncated files [Reddit] 2019-07-03
- Are there any ZFS disk failure simulations? [Reddit] 2020-10-31
- How to Corrupt an archive file in a controlled way [Stack Exchange] 2015-08-10
For those in the audience who speak American English, please read ZFS as "zed eff ess" for the rest of this post, to give it a more international flavour. ↩