Stuck after reboot

Status
Not open for further replies.

carlosmp

Cadet
Joined
May 1, 2012
Messages
3
Having a problem with our FreeNAS box.

Our server is a Supermicro X7DBE+ with dual L5420 CPUs, 16GB RAM, and an LSI 9211-8i HBA. We are using 8x Seagate 2TB Barracuda ST2000DM001 drives. The boot process reports FreeBSD-RELEASE-p4 #0 r262572+17a4d3d.

On boot it says:
Code:
Trying to mount root from ufs:/dev/ufs/FreeNASs1a [ro]...
WARNING: /data was not properly dismounted
Loading early kernel modules:
GEOM_RAID5: Module loaded, version 1.1.20130907.44 (rev 5c6d2a159411)
/dev/ufs/FreeNASs4: 12 files, 144459 used, 26068 free (36 frags, 3254 blocks, 0% fragmentation)
** /dev/ufs/FreeNASs4
** Last Mounted on /data
** Phase 1 - Check Blocks and sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
12 files, 14459 used, 26068 free (36 frags, 3254 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
savecore: /dev/dumpdev: No such file or directory
Setting hostuuid: 53d19f64-d663-xxxxxx
Setting hostid: 0x7396de3a
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
No suitable dump device was found.
Entropy harvesting: interrupts ethernet point_to_point kickstart.
Starting file system checks:
/dev/ufs/FreeNASs1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/FreeNASs1a: clean, 200300 free (
/dev/ufs/FreeNASs3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/FreeNASs3: clean, 2829 free
/dev/ufs/FreeNASs4: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/FreeNASs4: clean, 26068 free
Mounting local file systems:.
...
scrolls too fast
...
File "/usr/local/lib/python2.7/site-packages/django/db/backends/util.py:. line 53, in execute
  return self.cursor.execute(sql, params)
File "/usr/local/lib/python2.7/site-packages/django/db/utils.py:. line 99, in _exit_
  six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/usr/local/lib/python2.7/site-packages/django/db/backends/util.py:. line 53, in execute
  return self.cursor.execute(sql, params)
File "/usr/local/lib/python2.7/site-packages/django/db/backends/sqlite3/base.py:. line 450, in execute
  return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: no such column: system_advanced.adv_system_pl (scrolled off screen)
net.inet.tcp.sendbuf_max: 2097152 -> 2097152
kern.ipc.maxsockbuf: 2097152 -> 2097152
net.inet.tcp.recvbuf_max: 2097152 -> 2097152
vfs.zfs.l2arc_headroom: 2 -> 16
vfs.zfs.l2arc_noprefetch: 1 -> 0
vfs.zfs.l2arc_write_boost: 8388608 -> 400000000
net.inet.tcp.delayed_ack: 0 -> 0
vfs.zfs.l2arc_write_max: 8388608 -> 400000000
vfs.zfs.l2arc_norw: 1 -> 0



After sitting there for a few minutes, it reboots.

We had an issue a few weeks back where the free space ran out. I was able to clear that up and the server was working fine. The other day I happened to be on the server and noticed the alerts indicated one of the drives was bad. It had no errors, just some retries reported on it, from what I can remember. I went into the GUI to see if we could bring the disk offline and use the spare disk to rebuild while the "bad" disk was replaced (roughly the CLI equivalent is sketched below). I remember being able to pull the information up and seeing the option to edit/offline the disk. I wasn't quite sure, and decided I would handle it when I was back at the office. When I came back to the office, I was no longer able to see any of those options in the GUI. After a reboot, the system gets stuck at the point where it is setting l2arc_norw. If I disconnect the drives, the system boots, but of course complains about missing volume1.
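For reference, what I was trying to do from the GUI would, as far as I understand it, correspond to something like the following from the shell. The pool and device names are placeholders, and I never actually ran any of this:

Code:
# Take the failing disk offline (pool and device names are placeholders)
zpool offline volume1 gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Resilver onto the hot spare / replacement disk
zpool replace volume1 gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx gptid/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy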

I was wondering if there's a way of clearing the volume from the FreeNAS system through the shell and then attempting to reimport the pool. I've come to grips with the fact that our pool may be done. However, I'd like to see if anything is possible to attempt to recover the data. It's mostly off-site backups, so it's just a matter of re-seeding the backups. We'd prefer not to, but if we must, we must.
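What I have in mind, roughly, is something like this from the shell. The read-only import is an assumption on my part (I haven't tried it on this box), and /mnt is just where FreeNAS normally keeps its mountpoints:

Code:
# List pools that ZFS can see but that are not currently imported
zpool import

# Attempt a read-only import under an alternate root so nothing gets
# written to the pool (pool name is whatever zpool import reports)
zpool import -o readonly=on -f -R /mnt volume1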

Thanks,

Carlos.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You can't really do anything with the volume without importing the pool first, but the pool won't mount because something is broken. That's the catch-22. It's also why I clearly mention in my noobie guide that ZFS doesn't offer recovery tools. ZFS is one of those filesystems where you either have 100% assured data or you are fscked; there is only a very thin line in between.

Now, if you are a hardcore programmer, zdb (the ZFS debugger) will let you do amazing things, if you are a pro at it. But I can tell you that you aren't a pro (and aren't likely to figure out what the problem is and then fix it yourself), since you are asking for help. ;)
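For the curious, a purely read-only poke at the on-disk state with zdb might look something like the following. The pool and device names are placeholders, and none of this writes to the pool:

Code:
# Show the configuration of a pool that is not currently imported
zdb -e -C volume1

# Print the ZFS labels from one of the member disks
zdb -l /dev/da0p2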

To be honest, if they're just backups I'd just remake the pool and redo the backups. For the time and effort you'll put in, you are extremely unlikely to recover the pool, and the data doesn't sound uniquely important. Cut the losses and rebuild it.
 

carlosmp

Cadet
Joined
May 1, 2012
Messages
3
That's the exciting part... trying to recover. I think I've made a bit of progress. I had another box with OmniOS and decided to see what would happen. I'm able to import the pool as read-only, and it pulls up the directory structure as well as the actual sizes. If I import the pool like a normal pool, it crashes and reboots. If I do it read-only, it imports OK. Which leads me to wonder if I can export that pool out of FreeNAS in an attempt to import it again, and move/copy the files to another disk/pool...
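Roughly what I'm picturing if it comes to copying the data off; the read-only import is the only part I've actually tried so far, and the destination path is hypothetical:

Code:
# Read-only import under an alternate root so nothing is written to the pool
zpool import -o readonly=on -f -R /a volume01-fn18

# With the datasets mounted read-only under /a, copy them off to other storage
rsync -a /a/volume01-fn18/backup/ /mnt/rescue/backup/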

I'm building another box with 11x 3TB drives in RAIDZ3, since we were running out of space with only 3TB free.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I actually just explained that exact scenario like 2 posts before yours. Look at my history and you'll probably find it. ;)
 

carlosmp

Cadet
Joined
May 1, 2012
Messages
3
cyberjock - I just saw it, and figured I'd continue from that point...

I pulled the drives out, rebooted, and was able to export the pool. I told it to save the shares so I could remap them if they came back. I then rebooted the box with the disks back in, and it came up normally.

Here are the results of the zpool status and zpool import from the first round:
Code:
[root@freenas18] ~# zpool import
   pool: volume01-fn18
     id: 17270216442001154287
  state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
config:

        volume01-fn18                                   ONLINE
          raidz1-0                                      ONLINE
            gptid/60382671-61d9-11e2-9407-003048307606  ONLINE
            gptid/60eff638-61d9-11e2-9407-003048307606  ONLINE
            gptid/61a9ad8c-61d9-11e2-9407-003048307606  ONLINE
            gptid/6267c82f-61d9-11e2-9407-003048307606  ONLINE
            gptid/630ecc32-61d9-11e2-9407-003048307606  ONLINE
            gptid/63b9c5ca-61d9-11e2-9407-003048307606  ONLINE
            gptid/64650bb0-61d9-11e2-9407-003048307606  ONLINE
        spares
          gptid/65a31102-61d9-11e2-9407-003048307606
[root@freenas18] ~# zpool import -f volume01-fn18
cannot mount '/volume01-fn18': failed to create mountpoint
cannot mount '/volume01-fn18/.system': failed to create mountpoint
cannot mount '/volume01-fn18/.system/cores': failed to create mountpoint
cannot mount '/volume01-fn18/.system/samba4': failed to create mountpoint
cannot mount '/volume01-fn18/.system/syslog': failed to create mountpoint
cannot mount '/volume01-fn18/backup': failed to create mountpoint
cannot mount '/volume01-fn18/cmp-datavault': failed to create mountpoint
cannot mount '/volume01-fn18/cmp-repository': failed to create mountpoint
cannot mount '/volume01-fn18/iso': failed to create mountpoint
cannot mount '/volume01-fn18/preappletter': failed to create mountpoint
cannot mount '/volume01-fn18/pve-iscsi': failed to create mountpoint
cannot mount '/volume01-fn18/pvebackup': failed to create mountpoint
cannot mount '/volume01-fn18/pvestorage': failed to create mountpoint
cannot mount '/volume01-fn18/spim': failed to create mountpoint
cannot mount '/volume01-fn18/spim/cridenlove': failed to create mountpoint
cannot mount '/volume01-fn18/spim/genstair': failed to create mountpoint
cannot mount '/volume01-fn18/spim/jmhlaw': failed to create mountpoint
cannot mount '/volume01-fn18/spim/mhhcpa': failed to create mountpoint
cannot mount '/volume01-fn18/spim/roadspeds': failed to create mountpoint
cannot mount '/volume01-fn18/spim/runcentral': failed to create mountpoint
cannot mount '/volume01-fn18/spim/spimiami': failed to create mountpoint
cannot mount '/volume01-fn18/storegrid': failed to create mountpoint
cannot mount '/volume01-fn18/wcsicm': failed to create mountpoint
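
The "failed to create mountpoint" lines look like they're just because the FreeNAS root filesystem is mounted read-only (it comes up as [ro] in the boot messages), so ZFS can't create /volume01-fn18 at the root. If I understand it right, importing under an alternate root should sidestep that without changing anything stored on the pool, though that's my assumption:

Code:
# Import under an alternate root so the mountpoints are created under /mnt
# (writable on FreeNAS) instead of directly under /
zpool import -f -R /mnt volume01-fn18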



Found an article - http://docs.oracle.com/cd/E19253-01/819-5461/ghnoq/index.html - that seemed to address the issue:

Code:
[root@freenas18] ~# zfs list -r
NAME                            USED  AVAIL  REFER  MOUNTPOINT
volume01-fn18                  6.58T  3.88T  10.0G  /volume01-fn18
volume01-fn18/.system           494M  3.88T   283K  /volume01-fn18/.system
volume01-fn18/.system/cores     242K  3.88T   242K  /volume01-fn18/.system/cores
volume01-fn18/.system/samba4   16.5M  3.88T  16.5M  /volume01-fn18/.system/samba4
volume01-fn18/.system/syslog    477M  3.88T   477M  /volume01-fn18/.system/syslog
volume01-fn18/backup            242K  3.88T   242K  /volume01-fn18/backup
volume01-fn18/cmp-datavault     242K  3.88T   242K  /volume01-fn18/cmp-datavault
volume01-fn18/cmp-repository    242K  3.88T   242K  /volume01-fn18/cmp-repository
volume01-fn18/iso               635G  3.88T   635G  /volume01-fn18/iso
volume01-fn18/preappletter      330K  1024G   330K  /volume01-fn18/preappletter
volume01-fn18/pve-iscsi         242K  3.88T   242K  /volume01-fn18/pve-iscsi
volume01-fn18/pvebackup        1016G  3.88T  1016G  /volume01-fn18/pvebackup
volume01-fn18/pvestorage       1.94G  3.88T  1.94G  /volume01-fn18/pvestorage
volume01-fn18/spim             3.13T  3.88T   494G  /volume01-fn18/spim
volume01-fn18/spim/cridenlove   222G  3.88T   222G  /volume01-fn18/spim/cridenlove
volume01-fn18/spim/genstair     790G  3.88T   790G  /volume01-fn18/spim/genstair
volume01-fn18/spim/jmhlaw       213G  3.88T   213G  /volume01-fn18/spim/jmhlaw
volume01-fn18/spim/mhhcpa       434G  3.88T   434G  /volume01-fn18/spim/mhhcpa
volume01-fn18/spim/roadspeds   44.8G  3.88T  44.8G  /volume01-fn18/spim/roadspeds
volume01-fn18/spim/runcentral   509G  3.88T   509G  /volume01-fn18/spim/runcentral
volume01-fn18/spim/spimiami     501G  3.88T   501G  /volume01-fn18/spim/spimiami
volume01-fn18/storegrid         488G  3.88T   488G  /volume01-fn18/storegrid
volume01-fn18/wcsicm           1.34T  3.88T  1.13T  /volume01-fn18/wcsicm
[root@freenas18] ~# zfs inherit -r mountpoint volume01-fn18


I rebooted and then had:

Code:
[root@freenas18] ~# zpool status
no pools available
[root@freenas18] ~# zpool import
   pool: volume01-fn18
     id: 17270216442001154287
  state: ONLINE
status: The pool is formatted using a legacy on-disk version.
action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
config:

        volume01-fn18                                   ONLINE
          raidz1-0                                      ONLINE
            gptid/60382671-61d9-11e2-9407-003048307606  ONLINE
            gptid/60eff638-61d9-11e2-9407-003048307606  ONLINE
            gptid/61a9ad8c-61d9-11e2-9407-003048307606  ONLINE
            gptid/6267c82f-61d9-11e2-9407-003048307606  ONLINE
            gptid/630ecc32-61d9-11e2-9407-003048307606  ONLINE
            gptid/63b9c5ca-61d9-11e2-9407-003048307606  ONLINE
            gptid/64650bb0-61d9-11e2-9407-003048307606  ONLINE
        spares
          gptid/65a31102-61d9-11e2-9407-003048307606


I went ahead and ran the import, and it seems to be processing. It looks somewhat stuck, but I'm assuming the watchdog would have rebooted the box by now if it were truly hung, and I'm seeing plenty of blinking lights. Going to give it a while and see what happens, but it definitely looks pretty good right now.
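To check whether it's actually doing something rather than just sitting there, I'm watching disk activity from another session; these are just stock FreeBSD tools, so take the exact invocations as a sketch:

Code:
# Per-disk I/O statistics, refreshed every second
gstat -I 1s

# Once the pool is visible, pool-level I/O every 5 seconds
zpool iostat volume01-fn18 5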

Or am I being too optimistic?

Thanks,
Carlos.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You're being a bit optimistic, in my opinion. If you do an import from the WebGUI and it doesn't import within about 20 seconds (could be a minute for systems with 50+ drives), then something is probably broken and stuck in a loop.

Even *if* you were to get the pool back, I'd never trust it with data again. I'd still recommend you destroy it and rebuild it (and as a RAIDZ2). So you'd gain nothing but the time spent trying to fix something that wasn't really trustworthy anyway. That's why the practical side of me said to just make a new pool and let life go on.

Do NOT try to change the mountpoints using that link. That's NOT for FreeNAS, and you'll be in even worse shape if you try it. That link is Oracle documentation; some of it will work, and some of it will seriously fubar things. This is why we tell people not to go to the CLI and do things willy-nilly with pools that hold uniquely important data. They google, find some well-written link from something that looks professional, and figure "I can't lose anything." Well, you can... your data. And we've seen it many times. Of course, in your case it's backups, so who cares.

But I would still never trust that pool ever again.
 