Choppy performance since pool reached 80% utilization

Status
Not open for further replies.

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
Hello,

So recently (about a month ago), one of my volumes reached 80% utilization. I did not get a warning email from FreeNAS, but I noticed that access to the pool was slower than usual, so I logged in to FreeNAS and saw the yellow warning light indicating that the pool had reached 80%.

I did not get around to deleting content to free up space; since I had reached 85% in the past without noticeable issues (other than degraded performance), being at 80% did not worry me much.

Then a few days later, I went back to FreeNAS and the light was green again. I checked the volumes and the most filled one was at 73%. I then cleaned up and deleted some stuff. Right now, it is at 71%.

The problem is that since the pool reached 80%, performance during movie playback has been horrendous. The playback is fine in itself, but every 15 to 20 minutes it locks up (stops) for several seconds (up to a minute) while buffering, and the hard drive caddies on the server light up either in sequence (HDD1, HDD2, HDD3, and so on through all 8 drives) or all flash heavily at the same time until playback resumes. Needless to say, this kills the experience. I did not have this issue in the past, and nothing has changed in the way the server is used (clients connected via NFS, etc.). In other words, the workload has remained similar over the last two years or so.
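In case it helps with diagnosis, here is roughly how the per-disk activity could be watched from the FreeNAS shell while one of these stalls is happening ("tank" is just a placeholder for the actual pool name):

# Per-vdev / per-disk I/O, refreshed every second, while a stall is in progress
zpool iostat -v tank 1

# FreeBSD's per-disk view: %busy and queue length for each da* device
gstat -f '^da[0-9]+$'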

Attached are some screenshots captured during a period when I experienced a lockup so significant that Kodi gave up and quit playback!

Some server data:

Supermicro X7DBE+
48GB DDR-667 RAM
2x Xeon L5420 quad cores CPUs
2x IBM M1015 HBA
Mixture of 8 SATA2 or SATA3 drives
Gigabit connection
No LAGG

I hope someone can shed some light on this. Is it caused by fragmentation?
Thanks
 

Attachments

  • 1.png (30.6 KB)
  • 2.png (46.7 KB)
  • 3.png (46.6 KB)
  • 4.png (43 KB)
  • 5.png (27.5 KB)
  • 6.png (38.8 KB)
  • 7.png (12.5 KB)
  • dmesg.txt (14 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The 80% is probably a red herring here. It is likely that you're suffering from fragmentation, which isn't _just_ a function of percent-occupancy, but also of the number of times you've freed and reallocated space. You can create a hell of a mess and then go back down to 60% and it's still bad-"ish".

You seem to be getting hit by some large bursts of writes around the time you're seeing a problem. This could also add to the performance issues, because on a fragmented pool, it is possible for the system to overestimate the capabilities of your pool, and instead get locked into a crummy situation where it has committed to writing a large quantity of data but takes forever to do it (variation on bug 1531) during which time other writers block (and maybe other readers).

See if you can reduce or eliminate the writes. While you're at it, see if you've got atime updates disabled, which would be a good optimization to cut down on unnecessary writes.
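If you want to look at both from the shell, something like this should do it ("tank" is a placeholder for your pool name, and the FRAG column only shows up on reasonably recent pool versions):

# FRAG = free-space fragmentation; CAP = percent full
zpool list
zpool get fragmentation tank

# Check whether atime is still enabled on the pool and its datasets
zfs get -r atime tank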
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Don't rule out the possibility of a hard drive starting to fail, particularly this one:
da6 at mps1 bus 0 scbus1 target 4 lun 0
da6: <ATA ST3000DM001-1CH1 CC26> Fixed Direct Access SCSI-6 device
da6: Serial Number Z1F31JSY
da6: 600.000MB/s transfers
da6: Command Queueing enabled
da6: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
da6: quirks=0x8<4K>
 

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
See if you can reduce or eliminate the writes.
You mean reduce the actual amount of writes? Well, this is going to be difficult because these are probably client backups or other automated jobs. Plus, what would be the point of this server if not to use it? Are you saying the server is overwhelmed?

For the atime thing, I am not aware of what it does. Perhaps I need to read up on it? In any case, I have posted some screenshots of the volume parameters from the webGUI. The first is from the problematic volume (media), and the other is for the parent pool.
 

Attachments

  • 1.png (40.3 KB)
  • 2.png (30.4 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Don't rule out the possibility of a hard drive starting to fail, particularly this one:

Yeah, that's pretty much taking the "overestimate the capabilities of the pool" issue and looking at it from the other end. Definitely take a look and see if some of the individual drives are just being crappy or throwing errors.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
How did you isolate this specific drive from the 7 other ones?
It's a model that's notorious for early failure. Try a web search for that model number with "problems" or "failure" or similar terms. A failing drive can lead to stalling while the system recovers from a failed read. I don't know if the pattern of drive LED activity indicates this, but I suggest running some SMART tests and checking smartctl output.
For the atime thing, I am not aware of what it does
atime is short for 'access time', and when enabled, the filesystem records the time when any file was last accessed, even if only for reading. This adds a pointless[*] write load, and is something I turn off immediately on the top level dataset whenever I create a pool.
[*] except in a few specific cases.
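A rough way to do that from the shell, assuming the suspect disk is still da6 and the eight data disks show up as da0 through da7:

# Long self-test on the suspect drive, then read back attributes and logs once it completes
smartctl -t long /dev/da6
smartctl -a /dev/da6

# Quick health verdict and error log for all eight drives
for d in /dev/da[0-7]; do
    smartctl -H -l error "$d"
done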
 

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
If I go back to the volume settings, and select OFF (it is INHERITED for now), will it destroy my pool or volume?

I will run SMART tests on all drives and post back what I find!

Thanks!!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If I go back to the volume settings, and select OFF (it is INHERITED for now), will it destroy my pool or volume

No. It just tells ZFS to stop writing out the date files were last accessed. That's a UNIX behaviour that makes sense for certain situations, and wasn't a problem "back in the day," but kinda sucks now. The setting, as with most other settings, can be changed at any time.
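For reference, the GUI setting maps onto an ordinary ZFS property, so the shell equivalent would be roughly this ("tank" standing in for your pool, "tank/media" for the dataset):

# Turn atime off at the top level; datasets left on "inherit" pick it up automatically
zfs set atime=off tank

# Confirm what each dataset ended up with
zfs get -r atime tank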
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Also keep in mind that ZFS is heavily limited in performance when using systems that have a FSB. Yes, yours is so old it still has a FSB, so it's not something that we recommend you use for FreeNAS. ZFS pushes lots of data to and from the FSB to various components, which turns the FSB into a bottleneck. The catch is that there is no good "FSB meter" to tell you when your FSB is saturated. CPU usage won't reflect this limitation. :/
 

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
Also keep in mind that ZFS is heavily limited in performance when using systems that have a FSB. Yes, yours is so old it still has a FSB, so it's not something that we recommend you use for FreeNAS. ZFS pushes lots of data to and from the FSB to various components, which turns the FSB into a bottleneck. The catch is that there is no good "FSB meter" to tell you when your FSB is saturated. CPU usage won't reflect this limitation. :/

I'm discussing energy consumption for this server in another thread here, and I'm tempted to upgrade to modern hardware, but the savings ($) are just not worth it yet, unless I can find a decent ZFS-ready setup (mobo, RAM, CPU) for less than $200...

I must say, for a week or so now, the pool has been back to behaving normally. I haven't seen any performance issues; it's certainly not high performance by any means, but the choppy playback issue I was having seems to be gone now.
 

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
I'd rather upgrade my existing chassis to minimize the expense, or at least go for a rackmount to fit with my other hardware.

I guess a decent CPU, 32GB of ECC RAM, and a simple server-grade mobo could be had used for less than $200? If I can cut my energy usage by 150W from the current 340W, it would take about 2 years to recover this expense... The logic being that the 8 hard drives use around 80W and the HBA perhaps 20W, so everything else uses 240W. That assumes a serious usage reduction of around 60% (and I doubt a decent CPU, 6 or 8 memory sticks, a mobo, and some chassis fans would use only 100W max...).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
An idle E3-1230 of the five year ago era
Five years old and still almost $200 just for the CPU? There is no chance this is financially viable at 8.5 cents per kWh. The upgrade will never pay for itself before it dies... unless 44W is the entire server's usage, and even using that as an assumption, the payback time is more than 2 years.

What is the electricity cost where you live?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Five years old and still almost $200 just for the CPU? There is no chance this is financially viable at 8.5 cents per kWh. The upgrade will never pay for itself before it dies... unless 44W is the entire server's usage, and even using that as an assumption, the payback time is more than 2 years.

What is the electricity cost where you live?

The point here is that even five years ago, it was completely reasonable to build a Xeon-class computing platform that idled at 44 watts and, when pushed full tilt, consumed 85 watts. Now add eight hard disks to that for a FreeNAS box and you're probably idling at around 111 watts and pulling 176 watts at full tilt. Evidence suggests that today's platforms might be (read: "are") even cheaper.

So if you're currently idling at 340 watts and you could be idling at 111 watts, that means you could be saving 229 watts in opex utility costs, or around 500 watts if you're also paying for cooling.

Just assuming 229 watts saved, that works out to .229 * 24 * 365 * 5 = 10030 kilowatt-hours, which at 8.5c per kilowatt-hour works out to $852.55 in savings. But what the fsck do I know. Not like I've been trying to build low-power platforms for decades or anything like that.

In the meantime, please feel free to enjoy that space heater of yours. Hope it's cold outside.
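For anyone who wants to replay the arithmetic above, a quick one-liner (numbers are the ones from this post; the result matches the ~$852 figure to within rounding):

# watts saved -> kWh and dollars over 5 years at 8.5c/kWh
awk 'BEGIN { w = 229; rate = 0.085; yrs = 5;
             kwh = w / 1000 * 24 * 365 * yrs;             # ~10030 kWh
             printf "%.0f kWh, $%.2f\n", kwh, kwh * rate  # ~$852.57
           }'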
 

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
Your math is correct, but I doubt the numbers a bit. I did the calculation on my side, and while I have not spent much time researching it yet, I ended up with these numbers. Tell me, based on your experience, whether they make sense!

Cost to upgrade (I would keep the chassis, PSUs, M1015, case fans, and HDDs): approx. $470 CAD
Mobo (Supermicro X9 class for ECC & Xeon support): $150
RAM (3x 8GB DDR3-1333 ECC Kingston): $85
CPU: Xeon E3-1230: $235

That would idle at around 210W. I know this is much higher than the numbers you provided, but the math works out like this:

Hard drives: 8x 6W idle each = 48W
Mobo: 75W (idle) (includes SAS backplane and all the other controller crap in the chassis)
RAM: 3 sticks at 3W ea = 9W (idle)
Fans: 4 fans at 4W ea. = 16W (at 1800RPM)
CPU: 1x Xeon E3 = 20W (idle)
HBA: 1x M1015 at 11W (idle)
PSUs: 2 redundant PSUs with small cooling fans = 2x 5W = 10W

"ROI" would be about 2 years which is OK if I assume the energy saved wont need to be cooled. I might actually pull the trigger considering the performance upgrade in the equation, which if we recall, was the original reason for posting here in the first place...

Anyways, this is getting more interesting. And to answer you, yes it does get cold here in the winter, sometimes around -25C... I save on heating costs with my "space heater" ;)
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
So ROI of 2 years and you have more computing power available. Seems like an easy kill if you have the money now. Also all that extra heat gets cooled by the A/C in the summer.


Sent from my iPhone using Tapatalk
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Your math is correct, but I doubt the numbers a bit. I did the calculation on my side, and while I have not spent much time researching it yet, I ended up with these numbers. Tell me, based on your experience, whether they make sense!

Cost to upgrade (I would keep the chassis, PSUs, M1015, case fans, and HDDs): approx. $470 CAD
Mobo (Supermicro X9 class for ECC & Xeon support): $150
RAM (3x 8GB DDR3-1333 ECC Kingston): $85
CPU: Xeon E3-1230: $235

That would idle at around 210W. I know this is much higher than the numbers you provided, but the math works out like this:

Hard drives: 8x 6W idle each = 48W
Mobo: 75W (idle) (includes SAS backplane and all the other controller crap in the chassis)
RAM: 3 sticks at 3W ea = 9W (idle)
Fans: 4 fans at 4W ea. = 16W (at 1800RPM)
CPU: 1x Xeon E3 = 20W (idle)
HBA: 1x M1015 at 11W (idle)
PSUs: 2 redundant PSUs with small cooling fans = 2x 5W = 10W

"ROI" would be about 2 years which is OK if I assume the energy saved wont need to be cooled. I might actually pull the trigger considering the performance upgrade in the equation, which if we recall, was the original reason for posting here in the first place...

Anyways, this is getting more interesting. And to answer you, yes it does get cold here in the winter, sometimes around -25C... I save on heating costs with my "space heater" ;)

Your numbers are way off. I gave you examples of actual measured idle power of a well designed system at 44 watts. That *is* the sum total of all the parts of the system inventory provided. Adding hard drives and an HBA will add 11 watts plus ~7 watts per drive. There's no need for an SAS expander, but that'd be 11-12 watts if you did it. Otherwise, a backplane only takes a very tiny amount of power for the LEDs and base logic. It's likely that ancient Supermicro PSU's could chew through a significant amount of power, but that'd be more of an argument to slot one out and keep it as a cold spare, or to actually replace them entirely.
 

freenas-supero

Contributor
Joined
Jul 27, 2014
Messages
128
Your numbers are way off. I gave you examples of actual measured idle power of a well designed system at 44 watts.

Probably. That's what I thought when I was typing this, but the math didn't quite check out, because out of the actual measured usage of 330W for that dinosaur, nothing seems to justify the bulk of the power being used. Anyway, instead of trying to determine what uses so much energy out of the 330W I measured at the outlet, I have decided to upgrade that thing. Now I have a few questions for the gurus here!

  1. CPU: Since I mostly use my FreeNAS server for general file storage (serving files over NFS) and for backups with rsync, there is no high-performance need at this point (no VM storage other than backups, and no iSCSI or anything like that). Other uses are mount points for clients to save files to, and general multimedia storage (movies, music, family pics, etc.). I assume any Xeon E3-12XX would do? Care to recommend a specific model? Is there any major benefit to going for a newer edition (v5) compared to an older v2 or v3?
  2. What about the socket? LGA 1150 or 1155?
  3. Am I right to assume clock speed is more important than core count, especially with FreeNAS and my usage pattern/needs? I do NO transcoding or any other CPU-intensive tasks on my FreeNAS; I have a dual Xeon E5 server for that. If I am right to assume so, then a fast 4-core CPU would be ideal?
  4. Should I consider an E5 instead, given that E3s are limited to 32GB of RAM? Right now I have 48GB of RAM, which is all used by ZFS. My current pool consists of a mixture of 2 and 3TB drives, so I have around 18TB raw and 10.5TB usable (RAIDZ3). With the rule of "1GB per TB of pool", and assuming that I will never use more than 8 drives (to minimize energy usage) and that the drives I will be able to afford will never exceed 4TB each, 32GB of RAM would still meet the guideline.
  5. Supermicro X9 or X10 (or newer)? My quick research showed no benefit to going for an X10 or X11 unless I buy a really new CPU.
  6. Supermicro backplane: Should I keep it, or connect the drives directly to the M1015 with a SAS-to-SATA breakout cable? I like the hot-swap functionality for replacing a failed drive without shutting down the server, but I guess I can shut it down once in a while to replace a failed drive... Is there any major performance or reliability hit with using a backplane?
  7. PSU: I planned to reuse my current Supermicro 16-bay chassis, but I am wondering if I am not taking a significant risk by keeping the hot-swap PSUs in use. They are pushing 8 years old now, so... And they are probably not Platinum rated or anywhere close to that.
Looking for your input, guys! Cheers and thanks in advance!
 
Last edited: