11.3 RC1 broke 3 of my 5 jails

jon_2112

Explorer
Joined
Feb 8, 2016
Messages
52
I upgraded from 11.2-U7 to 11.3-RC1 tonight. All of my jails were on 11.2-RELEASE and all were created manually (no plugins). After the upgrade, two of my jails are listed as CORRUPT and a third fails to start with a DHCP error.

iocage list
Code:
+-----+---------+---------+--------------+------+
| JID |  NAME   |  STATE  |   RELEASE    | IP4  |
+=====+=========+=========+==============+======+
| -   | plex    | CORRUPT | N/A          | N/A  |
+-----+---------+---------+--------------+------+
| -   | radarr  | CORRUPT | N/A          | N/A  |
+-----+---------+---------+--------------+------+
| 15  | sabnzbd | up      | 11.2-RELEASE | DHCP |
+-----+---------+---------+--------------+------+
| -   | sonarr  | down    | 11.3-RELEASE | DHCP |
+-----+---------+---------+--------------+------+
| 3   | unifi   | up      | 11.2-RELEASE | DHCP |
+-----+---------+---------+--------------+------+


iocage start sonarr
Code:
No default gateway found for ipv6.
* Starting sonarr
  + Started OK
  + Using devfs_ruleset: 8
  + Configuring VNET OK
  + Using IP options: vnet
  + Starting services OK
  + Executing poststart OK
  + Acquiring DHCP address: FAILED, address received: ERROR, check jail logs

Stopped sonarr due to DHCP failure


If I manually set a static IP on the sonarr jail, it starts. The jail log is uninteresting; the entries are all identical to what they were before the upgrade, when everything worked fine:

Code:
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/perl5/5.30/mach/CORE
32-bit compatibility ldconfig path: /usr/lib32
32-bit compatibility ldconfig path: /usr/lib32
Starting Network: lo0.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
add host 127.0.0.1: gateway lo0 fib 0: route already in table
add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net ::ffff:0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Creating and/or trimming log files.
Starting syslogd.
Clearing /tmp (X related).
Updating motd:.
Starting sonarr.
Starting cron.

Tue Dec 10 21:49:40 CST 2019


As for plex and radarr, their root/ directories have subdirectories (bin, dev, boot, etc.) but they're all empty. I suppose I'll have to create new jails, but I'm not sure what happened here.
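For reference, this is roughly what I used in the meantime to give the sonarr jail a static address (example values; adjust the interface name and addresses for your network):
Code:
iocage stop sonarr
# for a VNET jail the ip4_addr format is "interface|address/prefix" (these are example values)
iocage set ip4_addr="vnet0|192.168.1.50/24" sonarr
iocage set defaultrouter="192.168.1.1" sonarr
iocage start sonarr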
 

waqarahmed

iXsystems
iXsystems
Joined
Aug 28, 2019
Messages
136
Hi @jon_2112, can you please create a ticket at https://jira.ixsystems.com/projects/NAS and attach a debug of your system (without deleting the affected jails)?
I have been looking at the reddit thread. Your jail should have freebsd-version in its bin directory, but you mention that the directory is empty; that's why it comes up as CORRUPT, since iocage is not able to determine the jail's version. Can you also post the fstab for the affected jails? (If you create a debug, which I would highly recommend since you have another jail with networking issues, likely just a property setting on that jail, then attaching your fstab here is not necessary.)
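To double-check from a shell, something like this will show whether the jail root still has its userland (the path assumes the default iocage layout, so substitute your pool name for "tank"):
Code:
# the directory jon reports as empty
ls /mnt/tank/iocage/jails/plex/root/bin/
# freebsd-version is a plain sh script, so the jail's copy can be read from the host
sh /mnt/tank/iocage/jails/plex/root/bin/freebsd-version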

Looking forward to hearing from you. Thank you
 

jon_2112

Explorer
Joined
Feb 8, 2016
Messages
52
Sure, I'll do that now. I've been playing with the plex jail a little bit but haven't touched the radarr one. Which kind of debug should I run? Everything, or just the jail subsystem?
 

waqarahmed

iXsystems
iXsystems
Joined
Aug 28, 2019
Messages
136
Please go to System → Advanced and click Save Debug. That will generate a debug of the whole system; then please attach it to the Jira ticket.
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
I have had problems with always-busy pools in BETA1, a TCG SSD drive not loading, and regular reboots, so I will try the RC and check my jails too.
The update isn't going well:
Error: [ENOENT] Invalid train name.
Also, the combobox in the version selection is broken: nothing is selected, but below it there is some text (four lines) about version 11.3. -> I see the note about switching trains now. Once the train is chosen, it takes some time before the buttons become available. Now it's working, though it was confusing.

Result: all 10 jails intact. I will monitor for other issues.
 
Last edited:

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
Nothing is wrong with my configs or jails, except that the replication logs disappeared. Even right after a replication, the UI says "No logs are available for this task." and shows the wrong status (FAIL instead of the last SUCCESS). Could it be the encrypted drives? I checked the status after unlocking them.
Note: I didn't mix baselines; everything is on 11.3-RELEASE only, so it's simpler for me.

The same problems as in BETA1, not related to your problem: the server crashes at the start of replication ("NAS had an unscheduled system reboot"). I just ran it for you and finally saw the kernel panic, but it isn't stored and my camera was dead, so I didn't get a picture. So I started the replication again, another crash, camera on:
get_dsize_sync() at bp_get_dsize_sync+0x11b/frame ........
enter panic
thread pid 0 tid 101497
stopped at kdb_enter+0x3b: movq $0.kdb_why
kdb.enter.default>write cn_mute 1


An improvement: no more blank SSD drive after reboot. That was terrible, since it blocked all jails and SSH.

A minor annoyance keeps popping up on the machine console, and has all last month:
rrdcached plugin: Failed to connect to RRDCacheD at unix:/var/run/rrdcached.sock: Unable to connect to rrdcached: No such file or directory (status=2)
 

jon_2112

Explorer
Joined
Feb 8, 2016
Messages
52
Wow you guys are quick. Thanks for linking to the ticket @waqarahmed.

For the sonarr DHCP issue: I was looking through the debug output that I uploaded and noticed that the rc.conf for the sonarr jail had a few extra lines that the working jails didn't have:
Code:
ifconfig_epair0="DHCP"
network_interfaces="lo0"


I took those lines out and that seemed to fix it.
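In case it helps anyone else hitting this, I removed them from the host rather than from inside the jail; something along these lines (the path assumes the default iocage layout, so adjust the pool name), then restart the jail:
Code:
# show the offending lines
grep -n -e ifconfig_epair0 -e network_interfaces /mnt/tank/iocage/jails/sonarr/root/etc/rc.conf
# sysrc -R points at the jail's root; -x removes the variables from its etc/rc.conf
sysrc -R /mnt/tank/iocage/jails/sonarr/root -x ifconfig_epair0 network_interfaces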
 
Joined
Mar 3, 2019
Messages
17
replication logs disappeared. even after a replication, "No logs are available for this task." and wrong status (FAIL instead of last SUCCESS).
Are you using the new replication engine? (We didn't have logs for the one we were using before 11.2.) Also, logs are lost after a reboot (but the status is persisted); could that be the case here?

Please post here or DM me what the
Code:
midclt call replication.query

command prints, and what you have in /var/log/middlewared.log and /var/log/zettarepl.log. Or even better, you can create an issue with debug information attached in our JIRA (https://jira.ixsystems.com/) using System → Support. Thank you.
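P.S. The replication.query output comes back as one long JSON document; if you want it more readable you can pipe it through Python's JSON formatter, e.g.:
Code:
midclt call replication.query | python3 -m json.tool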
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
I can send it. It just crashed when I looked at this thread :) Pool1 and Pool2 are the drives being snapshotted at 6:05 and then automatically replicated to the RAID pool. The Pool2 replication crashes; it's the one with the system dataset and the iocage jails.
That command's output is not timestamped. I think this is from after the crash, since it shows pools that are not yet unlocked:
Code:
[{"id": 1, "name": "SourcePool1 - RAID/Replication", "direction": "PUSH", "transport": "LOCAL", "ssh_credentials": null, "netcat_active_side": null, "netcat_active_side_listen_address": null, "netcat_active_side_port_min": null, "netcat_active_side_port_max": null, "netcat_passive_side_connect_address": null, "source_datasets": ["SourcePool1/Dataset3", "SourcePool1/Dataset1", "SourcePool1/Dataset2"], "target_dataset": "RAID/Replication", "recursive": true, "exclude": [], "properties": true, "naming_schema": [], "auto": true, "only_matching_schedule": false, "allow_from_scratch": false, "hold_pending_snapshots": false, "retention_policy": "CUSTOM", "lifetime_value": 2, "lifetime_unit": "MONTH", "compression": null, "speed_limit": null, "dedup": false, "large_block": true, "embed": false, "compressed": true, "retries": 5, "logging_level": null, "enabled": true, "state": {"state": "ERROR", "datetime": {"$date": 1576127400518}, "error": "cannot open 'SourcePool1/Dataset1': dataset does not exist.", "last_snapshot": null}, "periodic_snapshot_tasks": [{"id": 15, "dataset": "SourcePool1/Dataset1", "recursive": true, "exclude": [], "lifetime_value": 2, "lifetime_unit": "WEEK", "naming_schema": "auto-%Y-%m-%d_%H-%M", "allow_empty": false, "enabled": true, "schedule": {"minute": "10", "hour": "6", "dom": "*", "month": "*", "dow": "*", "begin": "06:00", "end": "06:15"}}, {"id": 2, "dataset": "SourcePool1/Dataset2", "recursive": true, "exclude": [], "lifetime_value": 2, "lifetime_unit": "WEEK", "naming_schema": "auto-%Y-%m-%d_%H-%M", "allow_empty": false, "enabled": true, "schedule": {"minute": "10", "hour": "6", "dom": "*", "month": "*", "dow": "*", "begin": "06:00", "end": "06:15"}}, {"id": 4, "dataset": "SourcePool1/Dataset3", "recursive": true, "exclude": [], "lifetime_value": 2, "lifetime_unit": "WEEK", "naming_schema": "auto-%Y-%m-%d_%H-%M", "allow_empty": false, "enabled": true, "schedule": {"minute": "10", "hour": "6", "dom": "*", "month": "*", "dow": "*", "begin": "06:00", "end": "06:15"}}], "also_include_naming_schema": [], "schedule": null, "restrict_schedule": null, "job": {"id": 22, "method": "replication.run", "arguments": [1, false], "logs_path": "/tmp/middlewared/jobs/22.log", "logs_excerpt": "[2019/12/12 06:10:00] ERROR    [replication_task__task_1] [zettarepl.replication.run] For task 'task_1' unhandled replication error ExecException(1, \"cannot open 'SourcePool1/Dataset1': dataset does not exist\\n\")\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py\", line 117, in run_replication_tasks\n    run_replication_task_part(replication_task, source_dataset, src_context, dst_context, observer)\n  File \"/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py\", line 169, in run_replication_task_part\n    src_context, dst_context)\n  File \"/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py\", line 182, in calculate_replication_step_templates\n    replication_task.recursive)\n  File \"/usr/local/lib/python3.7/site-packages/zettarepl/replication/run.py\", line 206, in list_datasets_with_snapshots\n    datasets = list_datasets(shell, dataset, recursive)\n  File \"/usr/local/lib/python3.7/site-packages/zettarepl/dataset/list.py\", line 19, in list_datasets\n    return list(filter(None, shell.exec(args).split(\"\\n\")))\n  File \"/usr/local/lib/python3.7/site-packages/zettarepl/transport/interface.py\", line 83, in exec\n    return self.exec_async(args, encoding, stdout).wait()\n  File 
\"/usr/local/lib/python3.7/site-packages/zettarepl/transport/local.py\", line 40, in wait\n    raise ExecException(self.process.returncode, stdout)\nzettarepl.transport.interface.ExecException: cannot open 'SourcePool1/Dataset1': dataset does not exist\n", "progress": {"percent": null, "description": null, "extra": null}, "result": null, "error": "[EFAULT] cannot open 'SourcePool1/Dataset1': dataset does not exist.", "exception": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/job.py\", line 349, in run\n    await self.future\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/job.py\", line 386, in __run_body\n    rv = await self.method(*([self] + args))\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/schema.py\", line 949, in nf\n    return await f(*args, **kwargs)\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/plugins/replication.py\", line 398, in run\n    await self.middleware.call(\"zettarepl.run_replication_task\", id, really_run, job)\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/main.py\", line 1122, in call\n    app=app, pipes=pipes, job_on_progress_cb=job_on_progress_cb, io_thread=True,\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/main.py\", line 1075, in _call\n    return await run_method(methodobj, *args)\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/utils/run_in_thread.py\", line 10, in run_in_thread\n    return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/utils/io_thread_pool_executor.py\", line 25, in run\n    result = self.fn(*self.args, **self.kwargs)\n  File \"/usr/local/lib/python3.7/site-packages/middlewared/plugins/zettarepl.py\", line 345, in run_replication_task\n    raise CallError(make_sentence(message.error))\nmiddlewared.service_exception.CallError: [EFAULT] cannot open 'SourcePool1/Dataset1': dataset does not exist.\n", "exc_info": {"type": "CallError", "extra": null}, "state": "FAILED", "time_started": {"$date": 1576131000488}, "time_finished": {"$date": 1576131000520}}}, {"id": 4, "name": "SourcePool2 - RAID/Replication", "direction": "PUSH", "transport": "LOCAL", "ssh_credentials": null, "netcat_active_side": null, "netcat_active_side_listen_address": null, "netcat_active_side_port_min": null, "netcat_active_side_port_max": null, "netcat_passive_side_connect_address": null, "source_datasets": ["SourcePool2/iocage", "SourcePool2/System"], "target_dataset": "RAID/Replication", "recursive": true, "exclude": [], "properties": true, "naming_schema": [], "auto": true, "only_matching_schedule": false, "allow_from_scratch": false, "hold_pending_snapshots": false, "retention_policy": "CUSTOM", "lifetime_value": 2, "lifetime_unit": "MONTH", "compression": null, "speed_limit": null, "dedup": false, "large_block": true, "embed": false, "compressed": true, "retries": 5, "logging_level": null, "enabled": true, "state": {"state": "FINISHED", "datetime": {"$date": 1575781601056}, "last_snapshot": "SourcePool2/iocage/releases/11.3-RELEASE/root@auto-2019-12-08_06-05"}, "periodic_snapshot_tasks": [{"id": 1, "dataset": "SourcePool2/System", "recursive": true, "exclude": [], "lifetime_value": 2, "lifetime_unit": "WEEK", "naming_schema": "auto-%Y-%m-%d_%H-%M", "allow_empty": false, "enabled": true, "schedule": {"minute": "5", "hour": "6", "dom": "*", "month": "*", "dow": "*", "begin": "06:00", "end": "06:15"}}, {"id": 8, "dataset": 
"SourcePool2/iocage", "recursive": true, "exclude": [], "lifetime_value": 2, "lifetime_unit": "WEEK", "naming_schema": "auto-%Y-%m-%d_%H-%M", "allow_empty": false, "enabled": true, "schedule": {"minute": "5", "hour": "6", "dom": "*", "month": "*", "dow": "*", "begin": "06:00", "end": "06:15"}}], "also_include_naming_schema": [], "schedule": null, "restrict_schedule": null, "job": null}]


middlewared.log shows zettarepl starting the replication at 6:05, and after a few seconds the system crashes. I wonder why MailService complains here; I do get emails otherwise.
Code:
[2019/12/12 06:05:00] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <zettarepl.observer.PeriodicSnapshotTaskStart object at 0x83524ba50>
[2019/12/12 06:05:00] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <zettarepl.observer.PeriodicSnapshotTaskSuccess object at 0x83428fa50>
[2019/12/12 06:05:00] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <zettarepl.observer.PeriodicSnapshotTaskStart object at 0x83428f150>
[2019/12/12 06:05:00] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <zettarepl.observer.PeriodicSnapshotTaskSuccess object at 0x835271550>
[2019/12/12 06:05:00] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <zettarepl.observer.ReplicationTaskStart object at 0x8352719d0>
[2019/12/12 06:05:00] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <middlewared.plugins.zettarepl.ReplicationTaskLog object at 0x835271490>
[2019/12/12 06:05:13] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <zettarepl.observer.ReplicationTaskSnapshotSuccess object at 0x83524b610>
[2019/12/12 06:05:14] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <middlewared.plugins.zettarepl.ReplicationTaskLog object at 0x83524ba50>
[2019/12/12 06:05:14] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <middlewared.plugins.zettarepl.ReplicationTaskLog object at 0x83524bcd0>
[2019/12/12 05:07:39] (DEBUG) raven.base.Client.set_dsn():272 - Configuring Raven for host: https://sentry.ixsystems.com
[2019/12/12 06:07:51] (DEBUG) middlewared.setup():1242 - Timezone set to Europe/Germany
[2019/12/12 06:07:54] (DEBUG) middlewared.setup():2173 - Certificate setup for System complete
[2019/12/12 06:07:55] (DEBUG) middlewared.__plugins_load():818 - All plugins loaded
[2019/12/12 06:07:56] (DEBUG) middlewared.__initialize():1343 - Accepting connections
[2019/12/12 06:07:58] (WARNING) MailService.send_raw():375 - Failed to send email: [Errno 8] hostname nor servname provided, or not known
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/mail.py", line 353, in send_raw
    server = self._get_smtp_server(config, message['timeout'], local_hostname=local_hostname)
  File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/mail.py", line 395, in _get_smtp_server
    local_hostname=local_hostname)
  File "/usr/local/lib/python3.7/smtplib.py", line 1031, in __init__
    source_address)
  File "/usr/local/lib/python3.7/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/local/lib/python3.7/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/local/lib/python3.7/smtplib.py", line 1037, in _get_socket
    self.source_address)
  File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):


zettarepl.log has been 0 bytes since installation (old timestamp).
I'd like to know how to trigger the scheduled snapshots manually, using the same "auto-%Y-%m-%d_%H-%M" naming schema; that'd be good for testing. But even when I run the replication from 'Replication Tasks', so without new snapshots (nothing to do), Pool2 crashes.
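I guess something like this from a shell would create a snapshot matching that schema (just a sketch; I don't know whether the middleware's retention and replication treat a hand-made snapshot the same way as a scheduled one):
Code:
# recursive snapshot named after the periodic task's "auto-%Y-%m-%d_%H-%M" schema
zfs snapshot -r "SourcePool1/Dataset1@auto-$(date +%Y-%m-%d_%H-%M)"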
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
Replication misbehaves so much in 11.3:
Replication "RAID/Replication" failed: No incremental base on dataset 'Pool1/Dataset1' and replication from scratch is not allowed.
Then I enabled the 'Replication from scratch' setting temporarily, and the next run gave 'mount is busy' errors.
Both errors were repeated quite a few times. Finally, the kernel panic and the wrong status (with the last snapshot unknown).
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
In the RC it's now a kernel-panic fest for me: I'm getting crashes several times a day, plus one disconnection from the NIC, so I had to hard-reboot. None of this happened at all in the first weeks of BETA1; then after setting up replication I got daily crashes, and after the upgrade to the RC, Emby background transcoding also throws a kernel panic.
The replication button no longer shows any logs ("No logs are available for this task."). This is a new error in replication:
Code:
[2019/12/19 06:10:06] (DEBUG) ZettareplService._observer_queue_reader():597 - Observer queue got <zettarepl.observer.ReplicationTaskSuccess object at 0x8357c53d0>
[2019/12/19 06:18:31] (WARNING) middlewared.__periodic_task_wrapper():849 - Exception while calling periodic task
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 847, in __periodic_task_wrapper
    await self._call(method_name, service_obj, method)
  File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1061, in _call
    return await methodobj(*args)
  File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/smb.py", line 1146, in check_share_info_tdb
    await self.middleware.call('smb.sharesec.synchronize_acls')
  File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1122, in call
    app=app, pipes=pipes, job_on_progress_cb=job_on_progress_cb, io_thread=True,
  File "/usr/local/lib/python3.7/site-packages/middlewared/main.py", line 1061, in _call
    return await methodobj(*args)
  File "/usr/local/lib/python3.7/site-packages/middlewared/schema.py", line 949, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/middlewared/plugins/smb.py", line 1177, in synchronize_acls
    rc_info = (list(filter(lambda x: s['name'] == x['share_name'], rc)))[0]
IndexError: list index out of range

Also, the SSD pool showing as 'UNKNOWN' is back, but not as frequently.
 
