I've recently started using fio (in addition to SMART, badblocks/nwipe, and diskinfo) to do a better job of screening hard drives before adding them to a pool. In the past, disks that skipped the 24-hour fio run would pass everything, yet still start throwing errors and sometimes even get kicked hours or days after being added to a pool. I kept those disks around out of curiosity, but even they pass fio: no threads crash and there's no change in SMART wear parameters afterwards. So I feel like this burn-in is still lacking. What else can I add?
I've based my process on
https://www.truenas.com/docs/core/gettingstarted/corehardwareguide/ and
https://www.truenas.com/community/threads/how-do-you-burn-in-new-disks.67774/
specifically:
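- SMART extended test, with the report saved as a baseline to compare against ( smartctl -t long /dev/adaX to start the test, then smartctl -a /dev/adaX > adaX_1baseline.txt once it completes )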
- - reject condition: any non-zero pending or uncorrectable sectors (unless I'm interested in whether the subsequent writing helps 'heal' the drive through sector reallocation)
- 24 hours of small-block random read/write with fio to stress the drive ( fio --name=randrw --time_based --runtime=86400 --iodepth=64 --rw=randrw --bs=512 --direct=1 --numjobs=4 --filename=/dev/adaX )
- - reject condition: if any threads die
- another SMART extended ( smartctl -a /dev/adaX > adaX_3post_fio.txt )
- - this only takes a few minutes for such small SSDs
- - reject condition: any wear parameters increase
- write latency consistency (diskinfo -wS)
- - reject condition: unclear. informational.
- badblocks to look for bad regions ( time badblocks -b 4096 -c 16384 -p 1 -svw /dev/adaX )
- - timing it also gives a feel of net R&W sequential performance
- - reject condition: if any bad blocks are detected
- - ideally the -b parameter equals the physical sector size (512 bytes in this case)
- - -b times -c should roughly match the drive cache size to maximize speed/stress (for HDDs I use -b 4096 -c 16384, i.e. 4096 x 16384 bytes = 64 MiB)
- one final SMART extended ( smartctl -a /dev/adaX > adaX_6final.txt )
- - reject condition: if any wear parameters increase
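
For the "wear parameters increase" checks, here is a rough sketch of how two saved dumps could be compared automatically instead of by eye. The helper name smart_increases is hypothetical, and the watch list of attributes is only my assumption about what counts as a wear parameter. It also assumes the ATA attribute table layout printed by smartctl -a, with the attribute name in column 2 and the raw value in column 10; NVMe or SAS output would need different parsing.

```sh
#!/bin/sh
# smart_increases BASELINE FINAL
# Prints every watched SMART attribute whose raw value grew between two
# saved `smartctl -a` dumps, and returns non-zero if any of them grew.
# The watch list below is an assumption; adjust it for your drives.
smart_increases() {
    awk -v watch="Reallocated_Sector_Ct Reallocated_Event_Count Current_Pending_Sector Offline_Uncorrectable UDMA_CRC_Error_Count" '
        BEGIN {
            # Turn the space-separated watch list into a lookup table.
            n = split(watch, names, " ")
            for (i = 1; i <= n; i++) watched[names[i]] = 1
        }
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 && ($2 in watched) {
            raw = $10 + 0
            # First file: remember the baseline raw value.
            if (FILENAME == ARGV[1]) { before[$2] = raw; next }
            # Second file: report attributes that increased.
            if (($2 in before) && raw > before[$2]) {
                printf "%s: %d -> %d\n", $2, before[$2], raw
                grew = 1
            }
        }
        END { exit grew + 0 }
    ' "$1" "$2"
}
```

Usage would look something like: smart_increases adaX_1baseline.txt adaX_6final.txt || echo "reject adaX". Attributes that are expected to climb (power-on hours, temperature) are deliberately left off the watch list.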