FreeNAS extremely slow

Status
Not open for further replies.

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Wondering though if any of the Pools are at 90% Capacity. Looks like you have several of them and while I have never had a Pool get above 70%, it does make me wonder if FreeNAS goes from "performance" to "space-based optimization" when any of the Pools reach that threshold?

For sure "Volume2" is hurting and trying to recover...
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The performance problem at 90% (or was it 80) should only be about free space block finding.

That should only be affecting writes. And I don't think resilvers count.

Don't know though :)
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
I do have a bunch of volumes over 90%... but already had them when the performance dipped.

I can unmount or remove the disks of the volumes over 80%, but it would be annoying if that turned out to be the problem.
Disks are fairly expensive, and losing 20% of each volume's space (plus who knows how much more from the difference between the volume and dataset size) would make the waste prohibitive for a home NAS.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
TBH, I am not sure. I have never hit those numbers and am just pondering: if one Pool gets that high, does it affect other Pools and/or the entire system?

Referencing the ZFS Primer:
While ZFS provides many benefits, there are some caveats to be aware of:
  • At 90% capacity, ZFS switches from performance- to space-based optimization, which has massive performance implications. For maximum write performance and to prevent problems with drive replacement, add more capacity before a pool reaches 80%. If you are using iSCSI, it is recommended to not let the pool go over 50% capacity to prevent fragmentation issues.
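
As a rough illustration of the thresholds quoted above, here is a minimal Python sketch that sorts pools into "ok", "warn" (80% or more) and "critical" (90% or more). It assumes the input looks like the tab-separated output of `zpool list -H -o name,capacity`; the sample text and the function name `classify_pools` are made up for the example.

```python
# Thresholds taken from the ZFS Primer text quoted above.
WARN_PCT = 80   # add capacity before a pool reaches this
CRIT_PCT = 90   # space-based optimization is said to start here


def classify_pools(zpool_list_output):
    """Map each pool name to 'ok', 'warn' (>= 80%) or 'critical' (>= 90%)."""
    result = {}
    for line in zpool_list_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, cap = fields
        digits = cap.rstrip("%")
        if not digits.isdigit():
            continue  # capacity not reported (for example "-")
        pct = int(digits)
        if pct >= CRIT_PCT:
            result[name] = "critical"
        elif pct >= WARN_PCT:
            result[name] = "warn"
        else:
            result[name] = "ok"
    return result


# Hypothetical sample in the assumed `zpool list -H -o name,capacity` format.
SAMPLE = "Volume1\t45%\nVolume2\t92%\nVolume3\t83%\n"
print(classify_pools(SAMPLE))
```

This only reports where each pool stands; it says nothing about whether a full pool slows down other pools, which is the open question here.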
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
Yes, I had seen that, but what I read from it was that it would impact write speed on the specific volume.
But maybe it means that it impacts performance on the entire box (and quite an extreme impact at that).

All the volumes I have over 80% are single-drive volumes.

I'll try to remove them to see what happens, it's worth the try.
Thanks for the hint
 
Joined
Jan 7, 2015
Messages
1,155
FWIW, I once had issues similar to yours, specifically drives dropping out, pending sectors and such. It was driving me mad. I finally figured out it was the power supply, specifically the 12v rail, on which I had about 10 or 12 drives connected via SATA splitters and such. I found this out via IPMI by looking at the event logs: around the times I was dropping drives, the log showed a dip or loss on the +12v rail of the power supply (possibly 5v?). I bought an EVGA 850 80+ Platinum (10 year warranty!), and I have not had a single problem since. Not sure about your dips in speed, but I suspect that with the number of drives you are running, it could be a power supply issue. The drives I had that were dropping out were brand new and had been burn-in tested using Solnet among others. I got them in two batches of 6, so all twelve (16 with SSDs) were not all connected during testing, but once they were vetted and all connected, the issues started. Something for you to ponder. Good luck.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I have 22 disks connected at the moment if that is relevant
Sorry I just noticed this. What Power Supply do you have? Might as well provide the rest of the HW specs as well.
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
Ok... over the weekend I filled myself with liquid courage and decided to replace the PSU (Corsair Gold 550w) with the one from my "gaming" PC (generic 850w). "Gaming" is in quotes as I am little more than a casual nowadays.

It did improve the problem, went from 500kbps to 1.5mbps on volume2. The rest of the volumes are now running at reasonable speeds.
I tried to disconnect a few disks from other volumes but that didn't change anything.

I also got a good disk and started the replace of the one that dropped off.

Now the transfer speeds are marginally better, but the resilver is stuck in a loop. It goes for a while and then restarts.

This is driving me insane.


The full specs for the machine are:
An ASRock C2550D4I motherboard with on-board CPU (4 cores; I have never seen it use more than 50% of a single core).
32gb of RAM. I think it is Corsair, but I might have bought Kingston instead.
The PSU was a Corsair Gold 550w, but it now has an 850w PSU.
An LSI 16-port HBA.
16 disks are connected to the HBA,
7 to the motherboard.

The disks are mostly old 500gb drives that I bought over the years; I also have about 7 2tb HDDs and 5 8tb ones.

The ram is non ecc but the motherboard supports ecc ram and that is going to be the next upgrade as soon as i get this to work correctly and get the rest of the disks (6 more 8tb disks to go).

PS: also whenever i touch any PC part something stops working. In this case it was my "gaming" PC that stopped and i had to replace the CPU :(
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
"Generic" PSU? Oh God, that can't be good.
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
It's what i had at hand. The new Corsair Gold 850w will only arrive tomorrow. It was just a stopgap solution

I am cutting some corners but if it's going to be running 24/7 i'm spending some on a decent PSU
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
There's a development on my slow and painful journey through the world of crappy HDDs!

Replacing the PSU improved things a bit but doesn't seem to be the problem.

Yesterday I managed to run the script that Stux suggested, and after 2 hours I had the results. One of the disks (none of the ones I suspected) had a read speed of 0 (that's a zero, although it looks like an O).
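
For anyone wanting to automate that kind of check, here is a minimal Python sketch that flags disks reading far slower than the rest. It is not the script Stux suggested (that one is not shown in this thread); it only assumes you already have a per-disk read speed in MB/s from some benchmark. The device names, the numbers and the 50% cutoff are all made up for illustration.

```python
from statistics import median


def find_slow_disks(read_mb_s, fraction=0.5):
    """Return devices whose read speed is below `fraction` of the median."""
    if not read_mb_s:
        return []
    cutoff = median(read_mb_s.values()) * fraction
    return sorted(dev for dev, speed in read_mb_s.items() if speed < cutoff)


# Hypothetical measurements in MB/s; device names are made up.
SPEEDS = {"da0": 112.0, "da1": 108.5, "da2": 0.0, "da3": 115.2, "ada0": 98.7}
print(find_slow_disks(SPEEDS))  # prints ['da2']
```

Comparing against the median rather than a fixed number keeps the check usable on a box that mixes old 500gb disks with new 8tb ones, although very mixed hardware may need a lower cutoff.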

I went into the volume and offlined that disk and immediately the speed went up to reasonable speeds. I'm not sure if it was going at the full speed but looked substantially better.

Unfortunately something happened and the resilver restarted (as I mentioned before, that keeps happening and never allows the resilver to complete), the disk was back in the volume, and this time it won't let me offline it, stating that there are not enough valid replicas.

That makes me think that it shouldn't have allowed me to offline it in the first place but it did anyway.

At this point i am thinking of just shutting down the NAS and removing that disk.
It's the second missing disk of a RAIDZ2 volume, and although it should be fine, I can't be 100% sure the rest of the disks will be reliable.

If I were to remove the disk and remove as much data as I can while it's offline, and then plug it back in again, would FreeNAS be able to figure out that I removed a bunch of data and proceed correctly? Or would I be unable to make that disk join the volume again?
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Yes, seems like a faulty disk is causing your low performance. That would have been my guess as it happened to me previously. (Note, the fact it's dropping in/out could be related to cabling or something else in the data path vs. the disk itself.)

The data is already at risk. I agree with your next step. Shut down, remove the faulty disk from the system, put in the fresh disk that you want to be the replacement, then reboot and see if the pool comes up. If it doesn't come up, then you need that disk for redundancy, and it sounds like you're in hot water. If it does come up, then use the fresh disk to replace the one you took out. The pool should resilver at a reasonable speed. (Don't throw the faulty disk away or reformat it, etc. until the resilver is complete.)

I don't know what you mean by "if I were to remove the disk and remove as much data as I can while it's offline." What does "remove...data" mean? Deleting data, or copying it somewhere else? And are you talking about removing data from the pool, or the disk you removed?

Even if you remove the disk, it thinks it's part of the pool. If you alter the state of the pool while the disk is removed, then put the disk back in, I'm not sure what will happen. Might work, might not as the metadata will be wrong. Perhaps someone else on the thread will know. I am assuming if you remove it and the pool comes up you're going to replace it anyway, so it's probably a moot point?
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
So... i removed the offending disk leaving me with absolutely no redundancy.
It did in fact start resilvering at a very decent speed. It was at 25% when I went to bed yesterday.

What i wanted to know was if the rest of the disks have problems if i can reconnect this faulty disk back in and have freenas calculate whatever missing data from there.
The plan was to remove the data from the volume to have less data to resilver (making it faster) but it seems fairly fast and reliable now (not restarting all the time).

At about 15% the volume got 4 disks in a degraded state. I'm not sure what's going on there, but I'll wait for it to finish before I look into that any further.

I have only had the NAS for 2 or 3 months. I use it mainly for cold storage, and I have lots of disks in it, of multiple brands and multiple ages.
I've noted that:
I had 4 Western Digital disks that all died.
I had 6 Toshibas (4 x 2tb brand new, 2 x 1tb older); the old ones simply failed and the new ones are all failing and throwing lots of SMART errors.
I have lots of 500gb Seagate disks that are over 10 years old and 4 new 8tb Seagate disks. All of them are working fine so far.

I know lots of people don't like Seagate, but in my case they simply seem to work, unlike most others.
(I have some other brands, but only 1 disk of each, so it's hard to see any trend in that.)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Good luck
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
What i wanted to know was if the rest of the disks have problems if i can reconnect this faulty disk back in and have freenas calculate whatever missing data from there.
The plan was to remove the data from the volume to have less data to resilver (making it faster) but it seems fairly fast and reliable now (not restarting all the time).
If you haven't "removed" the drive from the pool, then yes, it should help with a resilver in the future. In fact that is the preferred way to replace a disk (you can tell freenas to replace disk A with B, and it resilvers to the new drive with the old one still providing redundancy).

The fact that you've had so many drives fail over a short time makes me think there is another issue going on (too much heat, bad power, excessive head parking issues, etc).
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
The fact that you've had so many drives fail over a short time makes me think there is another issue going on (too much heat, bad power, excessive head parking issues, etc).


This is the UK. we have no heat at all.
We all have penguins as pets.
The greatest killer other than heart attack due to excess fat is mauling by polar bears.
Others when bored watch the grass grow, we watch the icicles grow.


Jokes aside... I'll have to look into that. As soon as the NAS is back and working I'll be reading up on excessive head parking issues.
Bad power would be moot as i just replaced the PSU... but i believe that the PSU could not provide enough power, although all online power consumption calculators said that i had a 40% headroom.
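
One possible reason a calculator can show headroom while the PSU still struggles is spin-up: average draw and the peak on the 12v rail are different things. The arithmetic below is only a sketch. The 2 A per disk spin-up figure is an assumed ballpark for 3.5" drives (check the datasheets for real values), and treating the whole 550w rating as available on 12v is an optimistic simplification.

```python
def peak_12v_watts(num_disks, spinup_amps=2.0, volts=12.0):
    """Worst-case 12 V draw if every disk spins up at the same moment.

    spinup_amps is an assumed per-disk figure, not a measured one.
    """
    return num_disks * spinup_amps * volts


def headroom(psu_12v_watts, load_watts):
    """Fraction of the 12 V budget left over (negative means overloaded)."""
    return (psu_12v_watts - load_watts) / psu_12v_watts


peak = peak_12v_watts(22)  # 22 disks, the count mentioned earlier in the thread
print(peak)                              # prints 528.0
print(round(headroom(550.0, peak), 2))   # prints 0.04
```

Under those assumptions the disks alone would use nearly the whole budget at power-on, before the motherboard and HBA are counted. Staggered spin-up, if the HBA supports it, would lower the peak.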

As i was building the nas i had 4 western digital disks that died... they all used to work and simply didn't when it was time to connect to the NAS
I also have 3 disks that are not recognised when connected to the NAS, even though the crappy consumer-grade NAS I have is happy to work with them.

I only have these disks because i had to create the volume with the final number of disks, because the last time i read, freenas is unable to add extra disks to a volume correctly.
All these disks will eventually be replaced with fairly expensive 8tb hdds
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
smartctl -a /dev/[drive device name] will give you some information about temperatures and drive start stop counts. @Bidule0hm has some great scripts which will put all the info in an easy to read file/email. As for scheduling SMART tests, please follow the directions in the Docs: http://doc.freenas.org/9.10/tasks.html#s-m-a-r-t-tests
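
If you want to pull just the interesting numbers out of that output, here is a minimal Python sketch. It assumes the usual ATA attribute table that `smartctl -a` prints (ten columns, raw value last); the sample text is invented, and the set of watched attributes is only a suggestion since attribute names vary between drive vendors.

```python
WATCHED = {
    "Start_Stop_Count",
    "Load_Cycle_Count",
    "Temperature_Celsius",
    "Current_Pending_Sector",
}


def parse_smart_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for the watched SMART attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit():
            found[name] = int(raw)
    return found


# Invented sample in the shape of the smartctl attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       312
193 Load_Cycle_Count        0x0032   071   071   000    Old_age   Always       -       291045
194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       34 (Min/Max 18/45)
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       8
"""
print(parse_smart_attributes(SAMPLE))
```

A high Load_Cycle_Count relative to power-on time is the usual sign of aggressive head parking, and a non-zero Current_Pending_Sector is worth watching.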
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
What i wanted to know was if the rest of the disks have problems if i can reconnect this faulty disk back in and have freenas calculate whatever missing data from there.

I don't know the answer with certainty, but I think the answer is "probably not." Given that the pool is running well now (resilver speed) it looks like the disk you pulled is a bad disk. (It's possible the cabling or something is bad vs. the disk, but you'd have to test that part.) So I don't think a bad disk is going to help much.

BTW, you mentioned you were at no redundancy. Are you replacing the bad disk with a new one so you get back to a full RAIDZ or RAIDZ2 (can't remember which way the pool is configured). I assume so, yes?
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
smartctl -a /dev/[drive device name] will give you some information about temperatures and drive start stop counts. @Bidule0hm has some great scripts which will put all the info in an easy to read file/email. As for scheduling SMART tests, please follow the directions in the Docs: http://doc.freenas.org/9.10/tasks.html#s-m-a-r-t-tests

I'll look into scheduling the tests and read more about SMART, but i must first create a burn in test. I had never even heard of it, but now that it has been pointed out i see that i really need one :)

BTW, you mentioned you were at no redundancy. Are you replacing the bad disk with a new one so you get back to a full RAIDZ or RAIDZ2 (can't remember which way the pool is configured). I assume so, yes?


I'm running RAIDZ2. I have a disk that dropped off the volume (for some unknown reason, it probably simply died), and the disk I pulled out was the second one, leaving me with no redundancy.

The resilver finished, but I ended up with 4 disks degraded and it is complaining that I had lost data.

so... what happened was:
I started replacing disks when i started having problems. The original volume is 11 disks and it's up to 14 right now with a few disks being replaced.
The resilver finished but a couple of pairs of disks ended up degraded. I think it was because one of the disks was unavailable, I removed a second one (leaving me with no redundancy), and a third disk ended up in a faulty state.

The resilver did not release the 3 extra disks... i'm not sure why, but probably because of the degraded state of a few of the disks.

As a last effort I have connected the disk I pulled out back in, and it is resilvering again. The disk seems a bit more cooperative right now, so I am trying to move the data out of the volume in case it decides to stop working again.
Hopefully i didn't lose any data, or (worse) get any corrupted data.

I expect that at the end of this resilver the extra disks (added as replacements of some of the disks) get released and i can replace some of these failing ones.

I'll see how that's going when i get home.

Till then... fingers crossed.
 

MR. T.

Explorer
Joined
Jan 17, 2016
Messages
59
An update on this:

The first resilver didn't work. I have connected the bad disk back but the performance issue came back.

I've disconnected the disk again and it somehow managed to finish the resilver.

And then another disk started to create problems (another toshiba). I've replaced that one and now the NAS works fine.

One thing i had not understood when i looked into FreeNAS is that the volume is only as fast as the slowest disk.
This would make sense if you were reading from that disk but makes no sense at all when reading data that shouldn't even be on that "slow disk".

I'm pretty sure this is not FreeNAS's fault, but some (in my opinion) bad design choice by the developers of ZFS.
In my head it would be much better if the data block were to be read from the disk, the checksum verified and if it checks out just returned. That only requires the usage of the disk where the data is stored and the slowest disk would probably not be triggered.
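
A possible explanation, sketched as a toy model: in RAIDZ a block is normally split into pieces spread across the disks of the vdev rather than stored whole on one disk, so reading it means waiting for every disk holding a piece. The Python below only models that "wait for the slowest" effect. The latencies and device names are invented, and real ZFS behaviour (caching, small blocks, reconstruction from parity) is more involved.

```python
def striped_block_read_ms(disk_latency_ms, data_disks):
    """Time to read one block whose pieces live on every disk in data_disks.

    The reads are issued in parallel, so the block is complete only when
    the slowest of them returns.
    """
    return max(disk_latency_ms[d] for d in data_disks)


# Invented per-request latencies in milliseconds; da3 is the failing disk.
LATENCY = {"da0": 9.0, "da1": 10.0, "da2": 11.0, "da3": 4000.0}

healthy = striped_block_read_ms(LATENCY, ["da0", "da1", "da2"])
with_bad = striped_block_read_ms(LATENCY, ["da0", "da1", "da2", "da3"])
print(healthy)   # prints 11.0
print(with_bad)  # prints 4000.0
```

In this model a single struggling disk sets the pace for every block that has a piece on it, which would match the speed jumping back up as soon as the bad disk was offlined.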

Anyway, i am now ordering new disks (seagate this time) and will get a burn in rig to test every disk from now onwards.

Thank you all for your precious help.
 