Strange boot device problems

snaptec

Guru
Joined
Nov 30, 2015
Messages
502
Hello everybody.

I've got really strange problems with an HP Gen8 box (G1610, 16 GB ECC) running FN 9.10.2.

From the beginning:
I installed the system with 4x 2TB in RAIDZ2 early last year.
No problems until last week.
It boots from two mirrored Kingston DataTraveler sticks.
One gave up without any notice.

I swapped out the faulty one and ran a replace for it.

During the resilver the second one threw read errors.

I decided it was best to replace both.

I put in a new Intenso and a new Toshiba 16GB stick, installed FN from scratch, and restored the config from a backup.
FN was back up and everyone was happy.


Now one week later (last night) I got a mail:
The boot volume state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

The Toshiba stick has gone bad.
OK, a faulty new stick can happen.
Some lines from /var/log/messages for the dying stick:
Code:
Jan 19 06:36:04 fnnas (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 01 cd ee 38 00 00 10 00
Jan 19 06:36:04 fnnas (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Jan 19 06:36:04 fnnas (da0:umass-sim0:0:0:0): Retrying command
Jan 19 06:36:04 fnnas (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 01 cd ee 38 00 00 10 00
Jan 19 06:36:04 fnnas (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Jan 19 06:36:04 fnnas (da0:umass-sim0:0:0:0): Retrying command
Jan 19 06:36:05 fnnas (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 01 cd ee 38 00 00 10 00
Jan 19 06:36:05 fnnas (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Jan 19 06:36:05 fnnas (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Jan 19 06:36:07 fnnas (da0:umass-sim0:0:0:0): got CAM status 0x44
Jan 19 06:36:07 fnnas (da0:umass-sim0:0:0:0): fatal error, failed to attach to device
Jan 19 06:36:07 fnnas da0 at umass-sim0 bus 0 scbus7 target 0 lun 0
Jan 19 06:36:07 fnnas da0: <TOSHIBA TransMemory 1.00> s/n 7427EA2C3BC7C0216ABA520B detached
Jan 19 06:36:08 fnnas (da0:umass-sim0:0:0:0): Periph destroyed
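
For anyone who has to read such logs remotely, counting the errors per device can make it easier to see which stick is actually misbehaving. The following is only a minimal sketch, assuming the line format shown in the log above; the function name `summarize_cam_errors` and the counter keys are made up for illustration and are not part of FreeNAS.

```python
import re
from collections import Counter

# Matches the "(da0:umass-sim0:0:0:0): message" part of a log line.
# The pattern is an assumption based on the lines quoted in this thread.
PERIPH_RE = re.compile(r"\((?P<dev>[a-z]+\d+):[^)]*\):\s*(?P<msg>.*)$")


def summarize_cam_errors(lines):
    """Count CAM errors and exhausted retries per device name (e.g. 'da0')."""
    counts = {}
    for line in lines:
        match = PERIPH_RE.search(line)
        if not match:
            continue  # not a per-peripheral message
        dev, msg = match.group("dev"), match.group("msg")
        per_dev = counts.setdefault(dev, Counter())
        if msg.startswith("CAM status:"):
            per_dev["cam_errors"] += 1
        elif "Retries exhausted" in msg:
            per_dev["retries_exhausted"] += 1
    return counts
```

It could be fed with `open("/var/log/messages")` on the NAS itself, or with a copy of the log fetched over SSH.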


Added another one and started a replace.
/var/log/messages:
Code:
Jan 19 12:23:20 fnnas da0 at umass-sim2 bus 2 scbus9 target 0 lun 0
Jan 19 12:23:20 fnnas da0: <Intenso Rainbow Line 8.07> Removable Direct Access SPC-2 SCSI device
Jan 19 12:23:20 fnnas da0: Serial Number C8EDFA52
Jan 19 12:23:20 fnnas da0: 40.000MB/s transfers
Jan 19 12:23:20 fnnas da0: 30000MB (61440000 512 byte sectors)
Jan 19 12:23:20 fnnas da0: quirks=0x2<NO_6_BYTE>
Jan 19 12:40:11 fnnas notifier: 32+0 records in
Jan 19 12:40:11 fnnas notifier: 32+0 records out
Jan 19 12:40:11 fnnas notifier: 33554432 bytes transferred in 3.492055 secs (9608793 bytes/sec)
Jan 19 12:40:15 fnnas notifier: dd: /dev/da0: end of device
Jan 19 12:40:15 fnnas notifier: 33+0 records in
Jan 19 12:40:15 fnnas notifier: 32+0 records out
Jan 19 12:40:15 fnnas notifier: 33554432 bytes transferred in 3.628615 secs (9247173 bytes/sec)
Jan 19 12:40:16 fnnas ZFS: vdev state changed, pool_guid=964699875610815041 vdev_guid=13863602393974146011
Jan 19 12:41:15 fnnas notifier: /usr/local/sbin/grub-install: Input/output error
Jan 19 12:41:21 fnnas manage.py: [py.warnings:206] /usr/local/www/freenasUI/../freenasUI/freeadmin/middleware.py:206: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  else unicode(excp.message)
Jan 19 16:38:52 fnnas ZFS: vdev state changed, pool_guid=964699875610815041 vdev_guid=6409044869017274353

Some output from zpool status

Code:
  pool: freenas-boot
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 19 12:40:25 2017
        574M scanned out of 650M at 75.7K/s, 0h17m to go
        568M resilvered, 88.29% done
config:

        NAME             STATE     READ WRITE CKSUM
        freenas-boot     DEGRADED     0     0   445
          mirror-0       DEGRADED     0     0   929
            replacing-0  FAULTED      0     0     0
              da0p2/old  FAULTED     18   301    44  too many errors
              da0p2      ONLINE       0     0     0  (resilvering)
            da1p2        DEGRADED     0     0   975  too many errors


The resilver is damn slow, and the "ok" USB stick seems to be going bad too.

A little later:

Code:
  pool: freenas-boot
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 19 12:40:25 2017
        676M scanned out of 650M at 73.0K/s, (scan is slow, no estimated time)
        668M resilvered, 103.93% done
config:

        NAME             STATE     READ WRITE CKSUM
        freenas-boot     DEGRADED     0     0   495
          mirror-0       DEGRADED     0     0 1.01K
            replacing-0  FAULTED      0     0     0
              da0p2/old  FAULTED     18   301    44  too many errors
              da0p2      ONLINE       0     0     0  (resilvering)
            da1p2        DEGRADED     0     0 1.05K  too many errors  (resilvering)

errors: 495 data errors, use '-v' for a list


After it finished I started a scrub; now both are resilvering:

Code:
  pool: freenas-boot
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 19 16:39:14 2017
        270M scanned out of 652M at 100K/s, 1h4m to go
        269M resilvered, 41.34% done
config:

        NAME                       STATE     READ WRITE CKSUM
        freenas-boot               DEGRADED     0     0     3
          mirror-0                 DEGRADED     0     0     6
            replacing-0            DEGRADED     6     0     0
              4788658346873365335  UNAVAIL      0     0     0  was /dev/da0p2/old
              da0p2                ONLINE       0     0     6  (resilvering)
            da1p2                  ONLINE       0     0     9  (resilvering)

errors: 273 data errors, use '-v' for a list
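
Comparing the READ/WRITE/CKSUM counters between successive `zpool status` runs by eye gets tedious. Below is a rough sketch of how the config table could be turned into rows for comparison. The helper names `parse_count` and `parse_zpool_config` are hypothetical, and treating the `K`/`M` suffixes as powers of 1024 is an assumption; abbreviated values like `1.01K` are rounded anyway, so the result is approximate.

```python
# Assumed multipliers for abbreviated counters such as "1.01K".
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_count(text):
    """Convert a counter like '445' or '1.01K' to an (approximate) integer."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def parse_zpool_config(status_text):
    """Return one dict per row of the NAME/STATE/READ/WRITE/CKSUM table."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not in_table:
            # The table starts right after the header row.
            if fields[:2] == ["NAME", "STATE"]:
                in_table = True
            continue
        if not fields:
            if rows:
                break  # blank line after the table ends it
            continue
        if fields[0] == "errors:":
            break
        if len(fields) < 5:
            continue  # not a complete device row
        name, state, read, write, cksum = fields[:5]
        rows.append({
            "name": name,
            "state": state,
            "read": parse_count(read),
            "write": parse_count(write),
            "cksum": parse_count(cksum),
            "note": " ".join(fields[5:]),
        })
    return rows
```

Running it on the saved output of each `zpool status freenas-boot` call would show whether the checksum counters on a given stick keep climbing.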

Even the brand-new one already has checksum errors!?
Every stick is in a different USB port.
So the internal, rear and front ports have all been in use.
What the hell am I doing wrong?
What shall I do now?
The problem is that the FN box is a 1.5-hour drive away from me.
If the resilver works, should I add a third stick and make a 3-way mirror?
Since the grub install failed, will it boot?


Thanks in advance
 

mauorrizze

Cadet
Joined
Jan 19, 2017
Messages
4
I may have the same problem with my new setup and two SanDisk thumb drives that test really well on the same machine in Linux. In FreeNAS I get the same
CAM status: CCB request completed with an error after a random time. "Luckily" in my case it started right after installation and not days or weeks later.
Supermicro board, ECC, FreeNAS-9.10.2-U1

As long as the system is running I'd suggest you keep it on, as a reboot shouldn't fix anything (or only temporarily). A reboot should work as long as your mainboard finds one drive with a valid MBR/EFI, depending on the boot order.
Do you have suggestions for what I should test? Another version? Older, or a 10 nightly?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@snaptec
From what people have found out, USB flash drives are not very reliable.

Since FreeNAS 9.3 (released early 2015), we have ZFS boot. This gives us several
nice options: easy boot device mirroring, update rollback (because of snapshots), and
verification of data. This last one seems to have revealed that many USB flash drives are
either fake in size (meaning they lie, and are really MUCH smaller than they say they are), or
are simply not reliable.

My suggestion is to run a size test against all new flash drives. Basically, a program writes
data to the entire flash drive in a specific pattern, then reads it back, verifying that it was
written correctly. I use the Unix F3 program from:

https://github.com/AltraMayor/f3
http://oss.digirati.com.br/f3/

There is also an MS-Windows program that does the same.
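
To illustrate the write-then-verify idea described above, here is a minimal Python sketch. It is not how F3 itself is implemented; the function names, the 1 MiB block size and the SHA-256 based pattern are all assumptions chosen for illustration. Each block gets content derived from its own index, so a drive that silently wraps around or drops writes would return the wrong block on read-back.

```python
import hashlib
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block; an arbitrary choice for this sketch


def block_pattern(index, size=BLOCK_SIZE):
    """Build a deterministic block whose content depends on its block index."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{index}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def write_pattern(path, blocks, size=BLOCK_SIZE):
    """Write `blocks` patterned blocks to `path` and ask the OS to flush them."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(block_pattern(i, size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks, size=BLOCK_SIZE):
    """Read the file back and return the indices of blocks that do not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(size) != block_pattern(i, size):
                bad.append(i)
    return bad
```

A hypothetical run would call `write_pattern("/mnt/stick/test.bin", n)` with `n` chosen to fill the stick, then `verify_pattern` with the same arguments. Note that a read straight after the write may be served from the operating system's cache rather than the flash itself, so unplugging and re-inserting the stick between the two steps gives a more honest result.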
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Flash drives die. The SanDisk Cruzer is the go-to device around here. Mine have lasted 3 years so far. If you don't like the failure rate, get a small SSD; it will be more reliable and a little faster at updating.

 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
I average about one flash drive failure per year. I see this as annoying, but not really a big problem. Just be certain to maintain regular backups of your configuration. I only update FreeNAS once per month at the most, and always make certain to run a configuration backup before an update.

The SanDisk Cruzer is a good choice. However, even the most reliable flash drives are not very reliable.
 

snaptec

Guru
Joined
Nov 30, 2015
Messages
502
@SweetAndLow
Sure, USB drives are not that reliable, but 5 drives in a week is really unreliable, even for flash drives!?
There is no free slot in the HP Gen8.
On my other systems I always use mirrored 20 GB SSDs without problems.

@Arwen
I will try that with new ones. Unix is fine; I don't have M$ systems at all.

@pschatz100
One per year would be OK.

Any hint what I should do now with that system? The state is still degraded, but it runs.
 

mauorrizze

Cadet
Joined
Jan 19, 2017
Messages
4
@snaptec have you actually tested whether your drives were faulty after FN complained? E.g. with F3, which @Arwen mentioned? Because my two SanDisk Cruzer drives really are not faulty. I used badblocks in Linux and it found zero errors, directly after FreeNAS complained with the CAM/CCB errors. After some testing and reinstalling (different versions) the errors occurred less frequently and then disappeared completely, also on the current version. I still have one of the "faulty" Cruzers in use with two other drives (mirror of 3) and my boot volume is still healthy.

Perhaps there were some solar winds the USB controller didn't like o_O ... at least in my case.
In your case you first have to get a clean boot volume state; the easiest way would be a re-installation and uploading your settings, I guess. But then please test your drives on another system!

Starting with FreeNAS 10 I'm gonna use SSDs, as I've already noticed more boot volume access in the current beta (which looks very nice).
 

snaptec

Guru
Joined
Nov 30, 2015
Messages
502
Not yet. As mentioned at the beginning, the FreeNAS box is physically 170 km away from me.

At the moment there is one online and one degraded device, both with "just" checksum errors.
FreeNAS is running as normal.

Sure, a reinstall would be best, but it would cost me over 4 hours of time -.-
 

scrappy

Patron
Joined
Mar 16, 2017
Messages
347
I had the same CAM/CCB write errors while trying to install FreeNAS on my 1U Supermicro X8DTU-F server. I spent five hours pulling my hair out, continually failing to even finish creating a boot drive installation. When I finally got it installed without error, after booting into FreeNAS my system would start to lock up after 5-10 minutes until I forced a hard reset. I ended up fixing this by booting from a separate laptop hard drive mounted inside an eSATA enclosure.

Today I took the same "failed" 16GB SanDisk Cruzer Fit drives and tested them in my laptop as a makeshift FreeNAS server. Installation went perfectly with zero errors. Both Cruzer flash drives are set up as mirrored boot drives. I have not had any problems yet. This makes me wonder if I have a server motherboard incompatibility with those SanDisk Cruzer Fits, or if something in the BIOS is improperly configured.
 

push

Cadet
Joined
Jun 15, 2019
Messages
2
We have the same issue on our X8DT3. It is not caused by a USB disk. AAAH, what should we do?

Best,
A
 

Chilternburt

Cadet
Joined
Apr 30, 2014
Messages
6
Sorry to resurrect an old thread, but I've suddenly started having these same issues, except mine has been up and running fine since 2016, never a day's issue.
Currently my setup is inside a Gen 7 HP MicroServer with 8GB of RAM, running FreeNAS 9.10.2-U6, which was the final release for 9.10 as far as I know.
I have 4 x 2TB drives running in ZFS RAID, again never a day's issue, but this morning I did a reboot and got the following...
If anyone has any suggestions, I'm all ears.
[Attached image: FreeNAS.jpg]
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
@Chilternburt If you're booting from a USB memory stick, it looks like this is failing.
 
