System unresponsive during "zfs destroy" of clone


JosB

Dabbler
Joined
Jun 14, 2013
Messages
15
Today I tried to destroy a clone that has about 7,000 snapshots. After issuing the command everything seemed to be going fine until, at a certain point, the machine partially locked up. Eventually I had to hard reboot the machine.
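The command was roughly the following (the pool and dataset names here are placeholders, not our real ones):

# placeholder names; -r is needed because the clone itself has snapshots
zfs destroy -r tank/office/restore-tmp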

What keeps working:
Ping to the machine
Screen output (no kernel panics or errors are shown)
The disks remain heavily active, as expected.

What stops working:
Current ssh sessions stop responding completely, but are not disconnected
Not possible to establish a new ssh session (asks for password, but the server closes the connection before showing a shell)
Plugging in a keyboard gives no response on the keyboard (LEDs don't light up) nor on the screen; it does not respond to keystrokes.
Web interface not available

Configuration:
FreeNAS 9.2.1.7
Dell R515
Opteron 2.2 GHz
32 GB RAM (ECC)
4 x 2 TB SATA in RAID-Z
PERC H700 controller, each disk configured as a single-drive RAID-0, caching disabled (JBOD is not available :( )

Hourly snapshots are set up. I guess that the combination of destroying the clone and the snapshot routine kicking in later might have brought the server to its knees, but I'm not sure. I have attempted this before and got the same result.

Any ideas what might be the problem?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Why hard reboot? That's the worst thing you can do to a heavily loaded system; just let it finish what it's currently doing.

What do you mean by a clone that has 7k snapshots?
 

JosB

Dabbler
Joined
Jun 14, 2013
Messages
15
It's the main file server and everyone at the office depends on it. The previous time I tried this, I let the server crunch for four full days (a weekend plus holidays). Knowing that, I rebooted it as a last resort. An act of desperation, really, since I can't stop the office for the rest of the week.

What I mean by a clone with 7k snapshots is the following: when recovering a deleted file, instead of going into the snapshots dir and copying the file, someone created a clone of the snapshot that held the file, placed it inside the main dataset, and then copied the file out of it. After that, he forgot to delete the clone. Since snapshotting is set to be recursive, the system also started making hourly snapshots of the clone. As if that weren't bad enough, this happened four times... So, now I'm stuck with several clones holding almost 20k snapshots in total :(
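To illustrate roughly how we got here (the dataset and snapshot names below are made up, not our real ones):

# a clone was created from an hourly snapshot to recover one file (placeholder names)
zfs clone tank/office@auto-20140601.1000-1m tank/office/restore-tmp
cp /mnt/tank/office/restore-tmp/lost-file.doc /mnt/tank/office/
# the clone was never destroyed, and because the periodic snapshot task is recursive
# it kept taking hourly snapshots of tank/office/restore-tmp as well:
#   tank/office/restore-tmp@auto-20140601.1100-1m, @auto-20140601.1200-1m, ...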

I obviously want to get rid of this mess, but there seems to be no easy way out.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Hum, ok.

Ok, so this is just poor administration of the server if everyone has access to it to make clones... In theory only the admin of the server has access to it, and if someone needs a clone they ask the admin to do it.

Now, I don't know whether clones are snapshotted or not. You'll need to wait for an answer from @cyberjock, for example, who knows this subject far better than I do.

Question: why do you have so many snapshots if you only have one per hour?

And just to give the usual warning about HW RAID: this is bad, you shouldn't use a RAID card with ZFS, even with each drive as a single-drive RAID-0.
 

JosB

Dabbler
Joined
Jun 14, 2013
Messages
15
Well, to be honest, there are only two admins, me and another guy. We are the only ones who have root access to it. I admit that I don't remember who made the mistake of forgetting those clones; it could have been me :D :oops:

The clones do seem to be snapshotted, at least that's what ZFS tells me.
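(I checked with something like the following; the names are placeholders for our actual datasets:)

# list the snapshots under one of the offending clones (placeholder name)
zfs list -t snapshot -r tank/office/restore-tmp
# the 'origin' property shows which datasets are clones and which snapshot they came from
zfs list -o name,origin -r tank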

The reason for having so many snapshots is that we have hourly snapshots (only Mon-Fri during office hours) AND remote replication. As you probably know, having different snapshot schedules doesn't play nice with remote replication. I'd happily retain the hourlies for only a week or so and keep weeklies, but in the end the replication stops working. We experimented with retaining the hourlies for quite a long time, which generated this absurd amount of snapshots. We have since changed that to one month, although that still generates quite a lot of snapshots.
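(To get an idea of how much has piled up, something like this counts snapshots per dataset; it's plain zfs plus standard shell tools, nothing FreeNAS-specific:)

# count snapshots per dataset, busiest datasets last
zfs list -H -o name -t snapshot | cut -d@ -f1 | sort | uniq -c | sort -n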

I'm well aware of the risk of using HW RAID, but other than flashing the RAID card with LSI firmware (which apparently is an option) and losing the warranty, this is our only option. I read that configuring the disks as single-drive RAID-0 and completely disabling the cache on the card would basically be the same as JBOD. Is that information incorrect?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ah, ok, so there's also replication; it's getting better and better... Definitely wait for an answer from a more knowledgeable member :)

Yep, it's incorrect. For example, one of the problems is that you can't access SMART data on the drives, and that's a very bad thing. You still have the option to put in a proper HBA (like the M1015, for example) instead of the PERC card. See the Hardware Recommendations thread (link is in my signature) for more info ;)
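To give a concrete idea (the device names are just FreeBSD examples, not necessarily what your box shows):

# with a proper HBA the disks show up as plain da(4) devices and this just works:
smartctl -a /dev/da0
# behind a PERC the OS mostly sees the controller's virtual RAID-0 volumes
# (e.g. mfid0, mfid1, ...), so the usual SMART queries and the FreeNAS SMART
# service can't reach the physical drives directly.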
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm well aware of the risk of using HW RAID, but other than flashing the RAID card with LSI firmware (which apparently is an option) and losing the warranty, this is our only option. I read that configuring the disks as single-drive RAID-0 and completely disabling the cache on the card would basically be the same as JBOD. Is that information incorrect?

To be blunt, totally 100% incorrect. You'll find no documentation from iXsystems, and nobody with any kind of experience with FreeNAS, that recommends RAID controllers in any way, shape, or form. They are bad, bad, bad. I'm sorry, but you cannot "be aware of the risks of using HW RAID" if you are using it with ZFS. In my experience, they are the number one reason why people lose their data.

Frankly, if that really was your only option, you would have been much better off using another OS that doesn't use ZFS. This may sound harsh, but it's also the unfortunate truth. Your data is at much higher risk of being lost because you are using ZFS with hardware RAID. Just read the hardware recommendations thread and you'll get another blunt answer.

You are also mistaken about other things:

1. Remote replication doesn't care if you have multiple snapshot schedules. I know people that have 5 different schedules, and they all work just fine with replication.
2. Snapshots should expire at whatever schedule you set for them, except when they have NOT been replicated to the destination.

Both of the above statements mean that you do NOT need to keep an absurd number of snapshots, nor do you need to limit yourself to just one schedule for snapshots.

To be honest (and I'm just trying to be honest), I get the impression that someone isn't really fully understanding how FreeNAS, replication, etc. work and has drawn quite a few false conclusions about how it all works.

My advice would be to pretend you know nothing about FreeNAS and ZFS and start reading up on the forums. There's no doubt more that you don't know, but I'm betting once you learn about it you will be much happier with FreeNAS and ZFS than you are right now.

We all start somewhere. I didn't know what ZFS was 5 years ago. You just need to get some knowledge on the basics and you'll probably find that you too can be a pro at ZFS. ;)

Edit: Also, to touch back on your problem with the system going unresponsive, it's almost certainly because ZFS is trying to intelligently schedule reads and writes to your disks while your RAID controller is also trying to intelligently schedule reads and writes, so they conflict and you lose. You should be able to destroy 50 TB of snapshots with no performance impact like the one you are experiencing. In fact, I've seen 20-40 TB of snapshots destroyed in production environments with no consequences. ;)
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
1. Remote replication doesn't care if you have multiple snapshot schedules. I know people that have 5 different schedules, and they all work just fine with replication.
2. Snapshots should expire at whatever schedule you set for them, except when they have NOT been replicated to the destination.

Both of the above statements mean that you do NOT need to keep an absurd number of snapshots, nor do you need to limit yourself to just one schedule for snapshots.

Ah, that's what I told myself, but I wasn't sure. Thanks for the confirmation ;)

And just for my general knowledge: are the clones snapshotted?
 

JosB

Dabbler
Joined
Jun 14, 2013
Messages
15
Bidule0hm and cyberjock, thanks for all the advice!

We have actually been quite happy with FreeNAS for years now; we installed it in 2012. So far nothing nasty has happened (except for this hiccup, and even that wasn't a disaster), but it seems we have just been lucky. We went with this configuration because it was suggested by several people, admittedly not in the documentation. We know that ZFS needs full control over the disks to work properly, and unfortunately the advice gave us the impression this configuration would guarantee that.

Knowing now that this is bound to go wrong somewhere down the road, I'll change the controller ASAP. Any idea if the M1015 would work with the backplane/expander of the R515 (12 disks)? I suppose it should work, but I just want to be sure.

On the replication issue: we did set up a normal schedule (hourlies, dailies, weeklies and monthlies) in the beginning, but after some months the replication would get stuck, requiring a reinitialization of the remote side. According to
https://bugs.freenas.org/issues/3115
and
https://forums.freenas.org/index.php?threads/replication-frequently-getting-stuck.17882/
this is an issue (among other similar ones) that hasn't been resolved yet. That is the only reason we are doing it this way, but we would be glad to revert to the original schedule if that is guaranteed to work.

About the snapshot expiration: as far as I understand, this is determined solely at the moment the snapshot is taken, and changing the schedule afterwards doesn't retroactively modify existing snapshots. Although this can be fixed using the command line, we refrained from doing so, because fiddling manually with the snapshots is not recommended as it can break replication. Breaking the replication in our case means a long trip by car to fetch the replication server, sync it locally and bring it back again.
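(For what it's worth, on our box the retention appears to be encoded in the snapshot name itself, so the command-line "fix" would be a bulk rename; the names below are examples, not our real ones:)

# FreeNAS 9.x periodic snapshots carry the lifetime in their name, e.g.:
#   tank/office@auto-20140801.1000-1m   (keep for 1 month)
# shortening the retention of an already-taken snapshot would mean renaming it:
zfs rename tank/office@auto-20140801.1000-1m tank/office@auto-20140801.1000-2w
# ...which is exactly the kind of manual fiddling that could confuse the replication task.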

Thanks!
 