I wanted to create a script that would e-mail me when it detected a failure in a ZFS storage pool, but to test it I needed a failing zpool. You could create a zpool with a USB hub and some USB memory sticks, and then pull them out as you like, but that involves getting up, something I try to reserve for important things like getting a new cup of tea, or a biscuit. Luckily ZFS has the ability to create a pool just from files.
I'm doing this on FreeBSD 14.1-RELEASE, but it should work on any good operating system where ZFS1 is available. All these commands are being run as root.
ZFS lets you create a pool using files, as shown in the zpool-create(8) manual page examples; I think this exists purely for testing and experimenting, and I would never suggest doing this for anything important. I want to create various failures that look like the errors I'd seen on my NAS with mirrored drives, so that's the setup I'm going to simulate.
Create A ZPool
First create two files using truncate(1), each 1GB in size, somewhere temporary (I'll use /mnt/poolfiles).
# mkdir /mnt/poolfiles
# cd /mnt/poolfiles/
# truncate -s 1G file0
# truncate -s 1G file1
# ls -lh
total 1045721
-rw-r--r-- 1 root wheel 1.0G Sep 23 21:21 file0
-rw-r--r-- 1 root wheel 1.0G Sep 23 21:22 file1
Then create a new mirrored zpool called 'testpool' and check the status. Make sure you use the full path to the files, otherwise you might get complaints that it can't find those devices; I think it defaults to looking in /dev rather than the directory the command is run from. Also check that the pool is running as expected with zpool-status(8) before we start our experiments.
# zpool create testpool mirror /mnt/poolfiles/file0 /mnt/poolfiles/file1
# zpool status testpool
pool: testpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file0 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
We can then create a filesystem in that pool like any other,
# zfs create testpool/testfs
# zfs list -d 2 testpool
NAME USED AVAIL REFER MOUNTPOINT
testpool 504K 832M 96K /testpool
testpool/testfs 96K 832M 96K /testpool/testfs
and merrily create files in there too, which we can hash for fun just to prove it's OK afterwards.
# head -c 1M /dev/urandom | uuencode -m test > /testpool/testfs/testfile.txt
# md5 /testpool/testfs/testfile.txt > md5.txt
Creating Chaos
The cleanest way to create a degraded status is to remove one of the 'disks' from the pool, using the zpool-offline(8) command:
# zpool offline testpool /mnt/poolfiles/file0
zpool status then recognises the issue straight away:
# zpool status testpool
pool: testpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
config:
NAME STATE READ WRITE CKSUM
testpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
/mnt/poolfiles/file0 OFFLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
We can bring it back again by using the zpool-online(8) command:
# zpool online testpool /mnt/poolfiles/file0
# zpool status testpool
pool: testpool
state: ONLINE
scan: resilvered 144K in 00:00:00 with 0 errors on Tue Sep 24 22:52:33 2024
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file0 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
This is already enough for my original goal of having an example of a pool that is not "online" that I can test a script against. But now we're learning; let's see what else we can do.
Let's try to 'corrupt' a disk by writing random data into one of the two disk files. For this we'll use dd(1) to write 10 (count) 1M-sized blocks (bs), offset from the start of the file by 2 (seek) block sizes2. In the same directory as the files, run:
# dd if=/dev/urandom of=file0 bs=1M count=10 seek=2
This doesn't appear to be detected automatically for me, so run zpool-scrub(8), to make sure that our wanton destruction has been noticed, and check the status:
# zpool scrub testpool
# zpool status testpool
pool: testpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
scan: scrub repaired 0B in 00:00:00 with 0 errors on Tue Sep 24 23:07:59 2024
config:
NAME STATE READ WRITE CKSUM
testpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
/mnt/poolfiles/file0 UNAVAIL 0 0 0 corrupted data
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
That doesn't look so happy. dd will also have now replaced file0 with a much smaller one, as that is all that we wrote.
# ls -l
total 13617
-rw-r--r-- 1 root wheel 12582912 Sep 24 23:07 file0
-rw-r--r-- 1 root wheel 1073741824 Sep 24 23:06 file1
But we can fix that too, with a new file to zpool-replace(8) the badly damaged one:
# truncate -s 1G file2
# ls -l
total 13674
-rw-r--r-- 1 root wheel 12582912 Sep 24 23:07 file0
-rw-r--r-- 1 root wheel 1073741824 Sep 24 23:07 file1
-rw-r--r-- 1 root wheel 1073741824 Sep 24 23:10 file2
# zpool replace testpool /mnt/poolfiles/file0 /mnt/poolfiles/file2
# zpool status testpool
pool: testpool
state: ONLINE
scan: resilvered 2.99M in 00:00:00 with 0 errors on Tue Sep 24 23:11:20 2024
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file2 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
The mirror will then automatically resilver itself, which should be quick given the tiny amount of data on it. (Apparently the term 'resilvering' comes from old mirrors, which over time needed the silver coating that made the glass reflective to be restored.)
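Since we hashed the test file earlier, this is also a good moment to check it survived the corruption and the replacement. Assuming md5.txt is still sitting in /mnt/poolfiles where it was written, comparing a fresh hash against it should report no differences:
# md5 /testpool/testfs/testfile.txt | diff - /mnt/poolfiles/md5.txt && echo "file intact"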
Now that I have enough ways to generate a failing pool, I can look at adapting the 404.status-zfs script to email me only if it finds a problem.
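I haven't written that adaptation yet, but the core of it would be something like this rough sketch (the recipient and subject line are placeholders of mine, and the real 404.status-zfs script is structured rather differently):
#!/bin/sh
# Rough sketch: mail root if zpool status has anything to complain about.
# 'zpool status -x' prints "all pools are healthy" when there is nothing to report.
status=$(zpool status -x)
if [ "$status" != "all pools are healthy" ]; then
    echo "$status" | mail -s "ZFS pool problem on $(hostname)" root
fi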
But just one more thing…
Creating Confusion
I ended up spending much more time with ZFS than I expected, which often happens to me when I'm learning. There are a few things that I haven't really understood, and couldn't find an explanation for.
Where's the (z)pool?
The first is that the 'testpool' above, even when in full working order, wouldn't automatically re-appear after a reboot; listing the pools only shows the zroot pool, which occupies the whole disk of my laptop:
# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 460G 23.6G 436G - - 1% 5% 1.00x ONLINE -
It can be brought back online using zpool-import(8), but you have to find it first, using the -d flag to specify the directory /mnt/poolfiles where we have the two files.
Just running zpool import should list available pools, but it finds nothing; adding the directory does find it, and if you then run the command again with the pool name, the pool is 'available' once more3
# zpool import
no pools available to import
# zpool import -d /mnt/poolfiles
pool: testpool
id: 1173309968352596375
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
testpool ONLINE
mirror-0 ONLINE
/mnt/poolfiles/file2 ONLINE
/mnt/poolfiles/file1 ONLINE
# zpool import -d /mnt/poolfiles testpool
# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
testpool 960M 2.14M 958M - - 0% 0% 1.00x ONLINE -
zroot 460G 23.6G 436G - - 1% 5% 1.00x ONLINE -
Once it's imported, the ZFS filesystem is also mounted, and we can find our test file in the directory /testpool/testfs.
I'm not really sure why it's not automatically imported and mounted on boot, but I suspect it's because the backing files live on a ZFS filesystem that itself has to be imported and mounted first, and the import/mount process doesn't then go back and search newly mounted filesystems for more pools. I don't have this problem with a pool created from real disks; they appear automatically.
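If I just wanted the pool to come back after every boot, one workaround (untested, and only a sketch) would be to import it again late in the boot process, for example by adding this to /etc/rc.local, which runs after the standard rc scripts have already mounted zroot:
# /etc/rc.local: re-import the file-backed test pool once /mnt/poolfiles is available
zpool import -d /mnt/poolfiles testpool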
Schrödinger's File
The other odd behaviour is that I couldn't get the pool to react to one of the 'disk files' being deleted. To simulate one of the disks having a catastrophic event, I delete file1, and then check the pool status:
# rm /mnt/poolfiles/file1
# zpool status testpool
pool: testpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file2 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
Also, kicking off a scrub doesn't seem to worry it. A scrub was needed for it to notice the damage to the disk above, but with the file completely gone, it seems to go ahead without problem:
# zpool scrub testpool
# zpool status testpool
pool: testpool
state: ONLINE
scan: scrub repaired 0B in 00:00:00 with 0 errors on Thu Sep 26 22:30:46 2024
config:
NAME STATE READ WRITE CKSUM
testpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
/mnt/poolfiles/file2 ONLINE 0 0 0
/mnt/poolfiles/file1 ONLINE 0 0 0
errors: No known data errors
I also tried running zpool-sync(8), but that doesn't cause it to realise something is wrong either; the status still comes back as "online". If anyone knows why this is, please let me know. I can get it to start complaining if I offline the missing file, but then I can't online it again because it's not there, though that's a different problem.
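For completeness, that dead end is just these two commands; the offline is accepted and the pool goes degraded, but the online then fails because the backing file no longer exists (I'll spare you the exact error output):
# zpool offline testpool /mnt/poolfiles/file1
# zpool online testpool /mnt/poolfiles/file1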
One way of making it realise something is wrong is to export the pool, and then try to import it again:
# zpool export testpool
# zpool import -d /mnt/poolfiles/
pool: testpool
id: 1173309968352596375
state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices. The
fault tolerance of the pool may be compromised if imported.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
config:
testpool DEGRADED
mirror-0 DEGRADED
/mnt/poolfiles/file1 UNAVAIL cannot open
/mnt/poolfiles/file2 ONLINE
From there, importing the pool, creating a new 1GB file and running the replace command works as expected, and once it's resilvered, the pool is happy again.
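Spelled out, that recovery is just a repeat of the earlier steps (file3 is simply the next name in the sequence, my choice rather than anything from the original session):
# zpool import -d /mnt/poolfiles testpool
# truncate -s 1G /mnt/poolfiles/file3
# zpool replace testpool /mnt/poolfiles/file1 /mnt/poolfiles/file3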
For this behaviour I'm less able to think of a good explanation. Perhaps it's something to do with the file being hosted on another ZFS filesystem that somehow preserves it, so that it appears deleted but the zpool can still access it? (I know snapshots work by just recording changes in the filesystem, which is why you can return to an earlier state, as the files aren't really gone; maybe it's something related to that?)
I ended up going far beyond what I'd intended to do just to create an example for a script; but I've learnt much more about ZFS, and the fact you can use files to do this kind of experimentation is great.
Now it really is time to stand up and get another cup of tea, and maybe even a biscuit.
References
- A Crash Course on ZFS [Internet Archive] 2013-12-04 - Many more examples of what you can do with ZFS, and experimenting with files as the vdevs. Interestingly this also contains an example where he deletes one of the files that were used to create the pool, and after a scrub it sees the issue, which I couldn't replicate.
- Demonstrating ZFS Pool with truncated files [Reddit] 2019-07-03
- Are there any ZFS disk failure simulations? [Reddit] 2020-10-31
- How to Corrupt an archive file in a controlled way [Stack Exchange] 2015-08-10
For those in the audience who speak American English, please read ZFS as "zed eff ess" for the rest of this post, to give it a more international flavour. ↩