Cannot import 'mypool': no such pool or dataset

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
Hello guys,

I set up this FreeNAS system that I have been using to replicate another system.
Hardware: a Dell PowerEdge T420
8 disks attached to a SATA controller.
4 TB each.
I created a Z2 pool using all disks.

Two things: one of the disks had had SMART errors for a while.
I saw this recently, because email monitoring was not set up.

The other: I noticed the other day that there was a message that more than 80% of the storage was used.
80% seems low to me, but I removed some files. Still, usage was around 80%. I wanted to correct this yesterday, but:

At some point (I think yesterday) replication stopped. The system was very slow. I rebooted, but afterwards could not log into this system. I went to the location, and had to reboot the system "the brute force way": power off.

Afterwards, messages like this appeared:

Shortening read at 656867807 from 16 to 10
gptzfsboot: error 16 lba 49

And after a lot of messages:

BIOS drive M: is disk10

read 350 from 34 to 0x59680, error: 0x10
( a few like these)
panic: free: guard1 fail @ 0x1 from unknown:0

The system would not boot with the "guilty" drive attached.
So I removed that disk, and after that the system booted.

It showed this error: pool data0 status Unknown.

I tried to import it:
zpool import -fF data0
cannot import 'data0': no such pool or dataset
Destroy and re-create the pool from a backup source.

I thought that with RAIDZ2, TWO disks could fail and data would still be recoverable....
No other disks have SMART errors.

One of the things that I saw was that the disk that was missing (/dev/da2) is now there (again), and new disks that I attach get higher numbers.
Is that important? Will FreeBSD re-enumerate disks at boot and still know how to recreate a Z2 dataset?

Losing this data is not serious, but I want to find out what went wrong.
Yes, I know I should have replaced the drive immediately.
Yes, keeping less than 20% free space is not best practice.
BUT... why would this result in total data loss?
 
PhiloEpisteme

Joined
Oct 18, 2018
Messages
969
One of the things that I saw was that the disk that was missing (/dev/da2) is now there (again), and new disks that I attach get higher numbers.
Is that important? Will FreeBSD re-enumerate disks at boot and still know how to recreate a Z2 dataset?
ZFS doesn't rely on those device numbers to know which drives belong to a pool, so you don't have to worry about them changing. In fact, because they can change, they aren't the most reliable way to identify your specific disks; the serial numbers and gptid/UUID labels are a bit more reliable.
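For example, you can cross-reference the gptid labels against the daX device nodes and the drive serial numbers (just a sketch; da2 below stands for whichever device you want to check):

Code:
# map gptid labels to daX device nodes
glabel status | grep gptid
# show the serial number of one specific device
smartctl -i /dev/da2 | grep -i serial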

The other: I noticed the other day that there was a message that more than 80% of the storage was used.
80% seems low to me, but I removed some files. Still, usage was around 80%. I wanted to correct this yesterday, but:
ZFS is copy-on-write. When you change a file, it writes the modified blocks to a new location on disk before reporting that the data has been written. This enables many nice features but also contributes to the 80% "rule". If you don't have enough contiguous free space in your pool, ZFS may have a hard time writing new data and may fragment it, and this can slow your system down quite a lot for both reads and writes as your disks' heads move frantically trying to keep up. Many folks consider 80% the maximum to fill a pool to avoid these performance consequences. There are cases where folks have filled their pool completely and had a very difficult time accessing their data, and in some cases were unable to do so at all. If a pool is ~99-100% full it can actually result in failures importing the pool.
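You can keep an eye on that from the shell; for example (assuming your pool is named data0):

Code:
# allocated space, free space, fragmentation and capacity for the pool
zpool list -o name,size,allocated,free,fragmentation,capacity data0
# or just the two relevant properties
zpool get capacity,fragmentation data0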

Two things: one of the disks had had SMART errors for a while.
I saw this recently, because email monitoring was not set up.
Depending on the errors you'll probably want to replace that drive once you get the import and storage space issues sorted.
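To see how bad it actually is, a full SMART report is worth a look (da2 here is just a placeholder for the suspect disk):

Code:
# full SMART report, including reallocated/pending sector counts
smartctl -a /dev/da2
# optionally start a long self-test; check the result later with -a again
smartctl -t long /dev/da2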

The system would not boot with the "guilty" drive attached.
So I removed that disk, and after that the system booted.
Is this the same drive as the one with the SMART errors?

I thought that with RAIDZ2, TWO disks could fail and data would still be recoverable....
No other disks have SMART errors.
That is true. If your pool isn't so full that it can't be imported, I expect you'll be able to restore access to your data.
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
Thanks for the replies... but what can I still do?
The drive I removed is the (only) one with SMART errors.

The volume was about 24 TB in total, with around 2 TB free at its fullest, I guess (because it is a replication target, data is sent to it almost all the time).

I see this when booting up:
($import, config trusted): current settings allow for maximum 0 missing

Can I import again setting "maximum 2 missing"?

I tried zpool import -m:


root@remote[~]# zpool import -m

pool: data0
id: 3194361063939189996
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://illumos.org/msg/ZFS-8000-3C
config:

data0 UNAVAIL insufficient replicas
gptid/4964fb72-2a32-11e9-a4af-e0db5508b658 ONLINE
gptid/4e963477-2a32-11e9-a4af-e0db5508b658 ONLINE
9395040751123775912 UNAVAIL cannot open
gptid/5722cde5-2a32-11e9-a4af-e0db5508b658 ONLINE
gptid/5c3eeb89-2a32-11e9-a4af-e0db5508b658 ONLINE
gptid/61cd995a-2a32-11e9-a4af-e0db5508b658 ONLINE
gptid/67a3e605-2a32-11e9-a4af-e0db5508b658 ONLINE
gptid/6df3676e-2a32-11e9-a4af-e0db5508b658 ONLINE


And then:
root@remote[~]# zpool import -m data0
cannot import 'data0': no such pool or dataset
Destroy and re-create the pool from
a backup source.

Tricky, huh? I am sure I used RAIDZ2.
https://docs.oracle.com/cd/E36784_01/html/E36835/gazuf.html - I am not sure there are options left now.

I could do this:
Notifying ZFS of Device Availability
After a device is reattached to the system, ZFS might or might not automatically detect its availability. If the pool was previously UNAVAIL or SUSPENDED, or the system was rebooted as part of the attach procedure, then ZFS automatically rescans all devices when it tries to open the pool. If the pool was degraded and the device was replaced while the system was running, you must notify ZFS that the device is now available and ready to be reopened by using the zpool online command. For example:
# zpool online tank c0t1d0

... but only if the original disk were still healthy. It is not; it is unreadable.
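Perhaps a read-only recovery import is still worth a try; roughly like this (readonly=on, -F and -R are standard zpool import options):

Code:
# recovery-mode import, read-only so nothing is written to the damaged pool
zpool import -o readonly=on -fF -R /mnt data0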

Tried again with the faulty drive attached.

# zpool import data0

I see a lot of errors on the console:
CAM Status: scsi status error
unretryable error

How can I:
1. Replace the drive
2. Import the pool

I think I can only attach a new drive to an EXISTING pool. Besides, if it does not want to import using 7 out of 8 drives, it is not redundant in the first place. Or is it generally impossible to import a pool with a missing drive?
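(If I understand it correctly, the usual order would be import first, replace second; roughly like this, where 9395040751123775912 is the GUID of the missing disk from the output above and da8 is a hypothetical replacement disk:)

Code:
# import degraded (if redundancy allows), then replace the missing member by GUID
zpool import -R /mnt data0
zpool replace data0 9395040751123775912 /dev/da8
zpool status data0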

# zpool import data0
This is still waiting, although the console says: da2(etc) invalidating pack

The system is very unresponsive.

# zpool status
(no response)

Output of top:

66 processes: 1 running, 65 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 975M Active, 179M Inact, 1399M Wired, 29G Free
ARC: 435M Total, 100M MFU, 317M MRU, 96K Anon, 3314K Header, 15M Other
116M Compressed, 346M Uncompressed, 2.98:1 Ratio
Swap: 64G Total, 64G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
6058 root 1 20 0 7940K 3840K CPU10 10 0:00 0.07% top
3310 root 1 20 0 77980K 62796K select 4 0:00 0.01% winbindd
249 root 25 20 0 214M 169M kqread 11 0:11 0.01% python3.6
3285 root 1 20 0 29132K 16280K select 11 0:00 0.01% nmbd
2375 nobody 1 20 0 6928K 2944K select 11 0:00 0.01% mdnsd
3293 root 1 20 0 37140K 21140K select 9 0:00 0.01% winbindd
3426 nobody 1 20 0 6928K 3020K select 9 0:00 0.00% mdnsd
3184 uucp 1 20 0 6744K 2848K select 15 0:00 0.00% usbhid-up
5942 root 1 20 0 12916K 8064K select 8 0:00 0.00% sshd
6028 root 1 20 0 17496K 11328K select 3 0:00 0.00% mc
3082 root 1 20 0 12488K 12596K select 10 0:00 0.00% ntpd
3324 root 8 20 0 30908K 11076K select 0 0:02 0.00% rrdcached
3480 root 1 20 0 115M 100M kqread 14 0:02 0.00% uwsgi-3.6
3186 uucp 1 20 0 31156K 2628K select 5 0:00 0.00% upsd

Is it actually importing (or trying to), or is it just plain STUCK?
Why is it not showing any status messages?


ZFS is not doing a lot:

5949 root 1 20 0 12916K 8056K select 13 0:00 0.00% sshd
6029 root 1 20 0 8176K 4372K pause 13 0:00 0.00% zsh
4060 root 1 20 0 9012K 5120K spa_na 9 0:00 0.00% zfsd
4070 root 1 20 0 6488K 2532K nanslp 5 0:00 0.00% cron
5944 root 1 20 0 8176K 4436K pause 0 0:00 0.00% zsh
6050 root 1 20 0 7804K 3796K spa_na 13 0:00 0.00% zpool
6026 root 1 22 0 7832K 3816K spa_na 6 0:00 0.00% zfs
6233 root 1 20 0 8176K 4364K pause 4 0:00 0.00% zsh
3189 uucp 1 20 0 6556K 2596K nanslp 0 0:00 0.00% upslog
5951 root 1 20 0 7832K 3816K spa_na 4 0:00 0.00% zfs

I am puzzled.
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
Any thoughts?

I have run out of options.
At the moment I am trying to add a new disk instead of the defective drive, but I do not know how to ADD it to a non-imported ZFS pool.

And it does not bode well for the supposed robustness of ZFS.

Of course, in all tests it works as expected, except in the real world :)

Edit:
I "exported" the still offline zpool, reattached the disk and imported again (-m).. and it imported.

data0 ONLINE 0 0 0
gptid/4964fb72-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/4e963477-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/5450ecad-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/5722cde5-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/5c3eeb89-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/61cd995a-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/67a3e605-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/6df3676e-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0


errors: 137 data errors, use '-v' for a list

# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 29T 26.4T 2.56T - - 18% 91% 1.00x UNAVAIL -
freenas-boot 14G 4.46G 9.54G - - - 31% 1.00x ONLINE -
systemdata_VRAM 168G 71.0M 168G - - 1% 0% 1.00x ONLINE /mnt

So it had 91% allocated data. Too much, OK. But I still think it should not become unavailable when 1 disk is faulty.
After a while, the disk became unavailable again.

I tried to replace the disk by taking it offline. It reported that it could not take it offline because of lack of redundancy. Hello, I am removing an unreadable disk.
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
OK. I took the advice. Wiped the disks, created a new zpool. Too bad. This should not have happened with just one defective disk.

IF it is true that 91% usage is TOO MUCH and that this is what caused the failure, then why does ZFS not stop writing?
What would be simpler than just refusing to WRITE above, say, 85% usage?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
91% will impact performance, but should not have caused your pool to go unavailable.

My greater concern is for your pool's overall configuration. A single faulted disk will not harm a RAIDZ2. Can you please run zpool status and post the results inside of the CODE tags?

Based on the results of your import/list commands I suspect you actually had no redundancy there and were striping data across all disks. In which case yes, one bad disk will absolutely make the whole pool UNAVAIL.

Code:
data0 ONLINE 0 0 0
gptid/4964fb72-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/4e963477-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/5450ecad-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/5722cde5-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/5c3eeb89-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/61cd995a-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/67a3e605-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0
gptid/6df3676e-2a32-11e9-a4af-e0db5508b658 ONLINE 0 0 0 


It reported that it could not take it offline because of lack of redundancy.

Code:
# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 29T 26.4T 2.56T - - 18% 91% 1.00x UNAVAIL -
freenas-boot 14G 4.46G 9.54G - - - 31% 1.00x ONLINE -
systemdata_VRAM 168G 71.0M 168G - - 1% 0% 1.00x ONLINE /mnt 

Based on the output of these commands I'm fairly confident you had a fully striped pool.
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
A single faulted disk SHOULD not have harmed a RAIDZ2, but it did. Theoretically it should not, but you know Murphy's law; it applied here.

I would NEVER create a striped pool in a production system.
I am very sure I had not. Of course not. Seriously. RAIDZ2.

I do not know if I can tell from the logs what RAIDZ level the old zpool had.
And I have not seen a GUI that shows the RAIDZ level of a given pool.

To recreate the pool I destroyed the old UNAVAILABLE pool:
# zpool destroy data0
Thank god it worked.
Then wiped the disks one by one (legacy interface, that option was not in the new interface).

I then created a new RAIDZ2 pool with 7 disks (out of 8), and one spare using the (new) Web interface. Just to be sure.

Current pool:

# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 25.2T 2.99G 25.2T - - 0% 0% 1.00x ONLINE /mnt
freenas-boot 14G 4.46G 9.54G - - - 31% 1.00x ONLINE -
systemdata_VRAM 174G 75.1M 174G - - 1% 0% 1.00x ONLINE /mnt

You see the total size is 25.2 T instead of 29 T. It counts all disk sizes and adds them together: 8 x ~3.6 TiB ≈ 29 T, and 7 x ~3.6 TiB ≈ 25.2 T. The total "size" is not the usable space.

In this case it shows 16.29 TiB of usable free space: roughly 25.2 T minus two disks' worth of parity (2 x ~3.6 TiB), less some overhead.

I do not know why the RAIDZ level was not there in the previous zpool status command. I think because the pool was unavailable.

Current pool:

# zpool status
pool: data0
state: ONLINE
scan: none requested
config:

NAME STATE READ WRITE CKSUM
data0 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/169f396e-d53b-11e9-95ec-0007430408d0 ONLINE 0 0 0
gptid/18e2cfb6-d53b-11e9-95ec-0007430408d0 ONLINE 0 0 0
gptid/1ac17f2c-d53b-11e9-95ec-0007430408d0 ONLINE 0 0 0
gptid/1d67263c-d53b-11e9-95ec-0007430408d0 ONLINE 0 0 0
gptid/20ad07a0-d53b-11e9-95ec-0007430408d0 ONLINE 0 0 0
gptid/23fb7e2e-d53b-11e9-95ec-0007430408d0 ONLINE 0 0 0
gptid/27568d66-d53b-11e9-95ec-0007430408d0 ONLINE 0 0 0
spares
gptid/2aa640fa-d53b-11e9-95ec-0007430408d0 AVAIL

The other system:
NAME STATE READ WRITE CKSUM
data0 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/937e2942-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0
gptid/95937542-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0
gptid/979bf0b0-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0
gptid/99c1e04e-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0
gptid/9ccb8d91-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0
gptid/9ee02eb5-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0
gptid/a11008ba-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0
gptid/a369c1b3-19b9-11e9-a8e3-fc4dd4f27e05 ONLINE 0 0 0


NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 58T 14.6T 43.4T - - 4% 25% 1.00x ONLINE /mnt
freenas-boot 14G 2.98G 11.0G - - - 21% 1.00x ONLINE -
systemdata_VRAM 55G 74.1M 54.9G - - 3% 0% 1.00x ONLINE /mnt

The 43.4T raw free space is not all usable; it has 29.58 TiB of usable free space. No striping there either.

I will now drive the replication system to the other location to replicate using a 10 Gbit connection.

Oh, and it is a bit disappointing that I could only use 80% of a RAIDZ2 pool.

It means:
8 disks of ~3.6 TiB (total say 29 T raw, ~16 TiB usable, of which 80% = 12.8 TiB, or 44% of the 29 T raw). Tell me this is not true.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Unfortunately zpool history does not show information for exported or destroyed pools.

But an error of: cannot offline /dev/your/bad/device: no valid replicas is exactly what comes up if you try to offline a disk from a striped pool. I quickly made a Z2 pool from sparse files and it was more than happy to mount with two devices (files) missing. Creating a striped pool and knocking out a device (file) resulted in failure to import.
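Roughly what that test looks like, in case you want to reproduce it yourself (sparse files under /tmp, names are arbitrary):

Code:
# eight sparse "disks" in a RAIDZ2 pool
truncate -s 4G /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4 /tmp/zd5 /tmp/zd6 /tmp/zd7 /tmp/zd8
zpool create testz2 raidz2 /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4 /tmp/zd5 /tmp/zd6 /tmp/zd7 /tmp/zd8
# knock out two "disks" and re-import: the pool still comes up, DEGRADED
zpool export testz2
rm /tmp/zd1 /tmp/zd2
zpool import -d /tmp testz2
zpool status testz2
# repeating the same with a plain striped pool (no "raidz2" keyword) and removing
# even one file makes the import fail, just like your pool did
zpool destroy testz2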

All I can further offer on that front is that if the loss of a single device damaged your RAIDZ2 pool, you must have done something very upsetting to Mr. Murphy for him to apply his law to you so harshly; and watch out for lightning strikes or other sources of bad luck. ;)

Oh, and it is a bit disappointing that I could only use 80% of a RAIDZ2 pool.

It means:
8 disks of ~3.6 TiB (total say 29 T raw, ~16 TiB usable, of which 80% = 12.8 TiB, or 44% of the 29 T raw). Tell me this is not true.

You can certainly use more; ZFS will quite happily fill itself to the brim (and that is the point where you start to have serious, pool-breaking issues). 80% is a rule of thumb that is about sustained write performance. ZFS tries its best to write into sequential chunks of free space, but the fuller a pool gets, the more fragmented its free space becomes. 80% is approximately the point where most casual users begin to see major impact; users who run block storage protocols (iSCSI/FC) or generate heavy random I/O may notice the effects much earlier, at only 50% or even 25% utilization.

There's a longer explanation involving "metaslabs" and other fun aspects of ZFS, but that's deeper theory than most folks need. The short answer is "performance goes down as percentage used goes up."
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
Thanks for the extra interesting info :) The cause of this failure will likely remain a mystery forever.

I have the two systems now at the same location, each with a Chelsio 10 Gbit NIC, connected with a DAC. It is linked at 10 Gbit.

iperf shows more than 6.5 Gbit/s of spare bandwidth, even during replication, which is quite slow at around 47 MB/s (roughly 400 Mbit/s). So total bandwidth is close to 7 Gbit/s.
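(For reference, something along these lines, assuming iperf3 on both boxes; 10.0.0.2 stands in for the receiver's 10 Gbit address, and -P 4 checks whether a single TCP stream is the limit:)

Code:
# on the receiving box
iperf3 -s
# on the sending box: four parallel streams for 30 seconds
iperf3 -c 10.0.0.2 -P 4 -t 30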

Transfer speed is slower than I expected. I bought those 10 Gbit cards specially for moments like these, and I expected transfer speeds well above Gbit speed. If not, I could have used the Gbit connection without loss of speed...

The sending system is the one in my signature.

Any thoughts about this? Is this transfer speed normal for this kind of setup?

A second thing - which is more alarming: a spontaneous reboot. This is the first time it happened. This system (see sig) has been running since the beginning of this year, sometimes 2 months without reboot.

Last night, during replication, I think around or after 4:00 in the morning, the (sending) system rebooted and became offline.
The system is connected to a UPS. The other (receiving) system was still online.
I drove to the location, and saw "this is a FreeNAS data disk" or something like that.
It seems the box rebooted spontaneously and tried to boot off the first SATA disk. I changed all boot sequences to boot from the FreeNAS USB disks. (Lenovo system: it has several boot sequences, like "error boot sequence" and "normal boot sequence".)

BTW, one of them was not visible any more in the BIOS.
This may be the cause of the reboot as well. One of the USB disks is on USB3, the other on a USB2 connection. (I cannot change this.)
I reseated it and it came back online; it had to be resilvered (mirrored boot).
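(Checking the boot mirror afterwards is just a matter of:)

Code:
# verify both USB sticks are back in the mirror and the resilver finished
zpool status freenas-boot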

Could the expiry of the snapshot that was being replicated be a cause as well?

Has anyone ever tried replicating a snapshot that was about to be deleted because it had expired?
The snapshot that was being replicated was one of August 29th, 14 days ago.

If the snapshot cannot be replicated before it expires, it will never replicate. As this is more than 6 TB of data, it will take around 2 days to replicate (ca. 2% per hour).

Seems like Murphy's law is doing its thing again... but hey, it IS Friday the 13th...

I changed the snapshot maximum age from 2 to 4 weeks.
I also connected the 2 systems' dual power supplies, one each to a UPS and the other to the wall power. Fingers crossed...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Any thoughts about this? Is this transfer speed normal for this kind of setup?

If both the sending and receiving pools are similarly full, you may be at the point where the reads have become effectively fully random as @PhiloEpisteme suggested earlier. If your receiving pool is also full you may be experiencing delay from that end; but you mentioned you just rebuilt the pool.

Does copying a file directly from the sender, or to the receiver, using a third system show the same slow speeds? Make sure you don't re-test with the same file over and over (ARC will cache it on the sending side) and that the file is big enough to overwhelm the "write buffer" on the receiver.
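Something like this takes the network out of the picture entirely and shows the raw sequential read speed of the sending pool (the path is a placeholder; pick a large file that hasn't been read recently so it isn't already sitting in ARC):

Code:
# sequential read straight to /dev/null; dd prints the throughput when it finishes
dd if=/mnt/data0/some/large/file of=/dev/null bs=1M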

Question: do you have any supplemental cooling on your H200 in the TS440? I have a TS430 and find that without additional cooling aimed directly at the HBA it gets very toasty in there, and the H200 is known to run hot.
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
The pools are both far from full as you correctly supposed.
Sending pool is as specified above:

NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 58T 14.6T 43.4T - - 4% 25% 1.00x ONLINE /mnt

14.6TB used, 43.4 TB free.

Receiving pool:
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
data0 25.2T 1.81T 23.4T - - 0% 7% 1.00x ONLINE /mnt

So the receiving system is almost empty as it was set up again after the crash.

I have a third system - a laptop - attached to the Gbit interface via a Gbit switch; unfortunately this laptop seems to connect with only 100 Mbit/s speed and it just pulls an additional 10 MB/s. (Using an iSCSI target.) When I try this, the transfer speed between the two syncing systems is not affected. So, it seems the sending system has some headroom left.

I have no third 10 Gbit card/system to test it and I don't want to take that NIC out of the sending server.

Cooling:
Well, no additional cooling, but it has the extra fan fitted that is mandatory when fitting a second 4-bay disk cage in the TS440 server.
The system is running very cool. Total power use with 8 disks is less than 100 watts.
It has been running very reliably, even when the air conditioning failed this summer and temperatures in the server room were above 40 °C for a few hours. The temperature is now around 23 °C.
I still suspect the USB disk or the syncing of a - suddenly deleted - snapshot.

I am not sure whether "old" snapshots made 2 weeks ago will still get deleted when their original 2 weeks are up, now that I have changed the snapshot lifetime to 4 weeks; I don't know if the new setting applies to those existing snapshots as well.
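(One thing that should work regardless is a plain ZFS hold, which pins a snapshot so it cannot be destroyed until the hold is released; the snapshot name below is just the one being sent here, as an example:)

Code:
# place a hold so retention cannot destroy the snapshot mid-replication
zfs hold repl_keep data0/miniodata0@auto-20190830.1100-2w
# list holds, and release the hold once replication has finished
zfs holds data0/miniodata0@auto-20190830.1100-2w
zfs release repl_keep data0/miniodata0@auto-20190830.1100-2w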


EDIT

I found out what the bottleneck is: to my surprise, it is the CPU.

RECEIVING:

hw.model: Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz
hw.machine: amd64
hw.ncpu: 16

This is a dual E5-2450 machine (hyperthreading is off). Come on... Just to pull 47 MB/s it is puffing like a steam locomotive.

last pid: 63785; load averages: 1.41, 1.55, 1.56 up 1+02:27:04 19:31:15
53 processes: 2 running, 51 sleeping
CPU: 3.4% user, 0.0% nice, 4.5% system, 1.9% interrupt, 90.2% idle
Mem: 2000K Active, 295M Inact, 947M Laundry, 29G Wired, 1027M Free
ARC: 23G Total, 73M MFU, 23G MRU, 51M Anon, 172M Header, 93M Other
22G Compressed, 35G Uncompressed, 1.57:1 Ratio
Swap: 64G Total, 64G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
46516 root 1 103 0 21108K 15292K CPU12 12 578:19 99.57% sshd
46519 root 1 29 0 7832K 3932K piperd 2 91:26 14.43% zfs
249 root 21 20 0 267M 211M kqread 10 9:50 0.63% python3.6
63785 root 1 20 0 7940K 3784K CPU7 7 0:00 0.07% top
17736 www 1 20 0 14344K 8496K kqread 11 0:02 0.01% nginx

SENDING:

root@minio[~]# sysctl hw.model hw.machine hw.ncpu
hw.model: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
hw.machine: amd64
hw.ncpu: 8


last pid: 43682; load averages: 1.51, 1.58, 1.94 up 0+09:51:35 19:33:20
60 processes: 2 running, 58 sleeping
CPU: 6.1% user, 0.0% nice, 9.4% system, 0.9% interrupt, 83.6% idle
Mem: 6736K Active, 291M Inact, 994M Laundry, 21G Wired, 627M Free
ARC: 18G Total, 13G MFU, 4683M MRU, 428K Anon, 254M Header, 15M Other
17G Compressed, 31G Uncompressed, 1.87:1 Ratio
Swap: 72G Total, 72G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
4550 root 1 101 0 16528K 12176K CPU6 6 405:11 69.49% ssh
4549 root 1 32 0 5228K 2072K select 5 152:08 26.00% pipewatcher
4548 root 1 30 0 6256K 2056K nanslp 2 123:41 20.99% throttle
251 root 22 52 0 239M 189M kqread 2 6:30 1.69% python3.6
4546 root 2 20 0 9936K 4852K pipewr 7 10:36 1.59% zfs
3292 root 8 52 0 35644K 13288K select 2 1:03 0.35% rrdcached
42670 root 1 20 0 7940K 3804K CPU1 1 0:00 0.07% top
5497 www 1 20 0 12296K 8172K kqread 2 0:02 0.01% nginx

Not sure what the pipewatcher thing is.
Hmm....
https://www.ixsystems.com/community/threads/tip-making-remote-replications-faster.55945/

Is there a way to replicate without using SSH? The waste of CPU cycles seems ludicrous. I mean... the machines are standing next to each other. FTP would be perfect.
As far as I can see, one problem is that SSH is single-threaded. It only uses one CPU, and maxes it out (99.7%). If it could use 16 CPUs it would be way faster.
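A quick way to confirm that SSH itself is the ceiling is to push junk data through the same SSH path with different ciphers (192.168.10.2 stands in for the other box; aes128-gcm is one of the cheaper OpenSSH ciphers on CPUs with AES-NI):

Code:
# throughput with the default cipher
dd if=/dev/zero bs=1M count=4096 | ssh 192.168.10.2 'cat > /dev/null'
# same test with AES-GCM
dd if=/dev/zero bs=1M count=4096 | ssh -c aes128-gcm@openssh.com 192.168.10.2 'cat > /dev/null'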

https://www.ixsystems.com/community/threads/multithread-rsync-transfers.71642/
Not conclusive.

I want to set up replication, not a regular copy, and I cannot hook up all disks to the same machine either.
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
Sending data0/miniodata0@auto-20190830.1100-2w (59%)
At least it isn't rebooting this time... fingers crossed ;)
 

hansch

Explorer
Joined
Jan 8, 2019
Messages
52
Replicated all right, moved to the second location; all is OK now.
 