Concerning notification/error

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I got two series of notifications in an email from my server that I've never seen before, don't understand, and that worry me a bit.

The first I found in /var/log/system:
Code:
Aug 11 19:10:44 Tabernacle kernel: arp: 192.168.0.102 moved from 0c:c4:7a:03:aa:46 to 0c:c4:7a:31:9a:ec on epair0b
Aug 12 09:17:39 Tabernacle kernel: arp: 192.168.0.102 moved from 0c:c4:7a:03:aa:46 to 0c:c4:7a:31:9a:ec on epair0b
Aug 12 10:02:43 Tabernacle kernel: arp: 192.168.0.102 moved from 0c:c4:7a:03:aa:46 to 0c:c4:7a:31:9a:ec on epair0b
Aug 12 10:30:46 Tabernacle kernel: arp: 192.168.0.102 moved from 0c:c4:7a:03:aa:46 to 0c:c4:7a:31:9a:ec on epair0b
Aug 12 10:53:39 Tabernacle kernel: arp: 192.168.0.102 moved from 0c:c4:7a:03:aa:46 to 0c:c4:7a:31:9a:ec on epair0b

I've seen these kinds of messages frequently when it's reporting on other devices in the network, especially when I have my laptop on both WiFi and ethernet. But this IP address is the server itself. There's only one ethernet cable going in. And it switches one way and doesn't switch back. Puzzles me.

The second is maybe more concerning because it suggests some kind of memory error, but it looks like it involves the CPU rather than RAM? From /var/log/messages:
Code:
Aug 12 10:46:44 Tabernacle MCA: Bank 5, Status 0xd40000c000910091
Aug 12 10:46:44 Tabernacle MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
Aug 12 10:46:44 Tabernacle MCA: Vendor "GenuineIntel", ID 0x406d8, APIC ID 0
Aug 12 10:46:44 Tabernacle MCA: CPU 0 COR OVER RD channel 1 memory error
Aug 12 10:46:44 Tabernacle MCA: Address 0x354267b98


What do these mean? Should I be concerned?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
ARP moving happens when jails start or stop, not much to be concerned about there.
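
One way to check whether the moves line up with jail starts and stops is to tally the "moved" lines and compare the timestamps with when jails were restarted. The following is only a rough sketch: the regular expression is written against the log lines quoted above, and the function name `summarize_arp_moves` is made up for illustration.

```python
import re
from collections import Counter

# Pattern written against the kernel lines quoted in this thread (an assumption;
# other FreeBSD versions may word the message differently).
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>\S+) to (?P<new>\S+) on (?P<iface>\S+)"
)

def summarize_arp_moves(lines):
    """Count 'arp: ... moved from ... to ...' events per (ip, old, new, iface)."""
    counts = Counter()
    for line in lines:
        match = ARP_MOVED.search(line)
        if match:
            key = (match["ip"], match["old"], match["new"], match["iface"])
            counts[key] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "Aug 11 19:10:44 Tabernacle kernel: arp: 192.168.0.102 moved from "
        "0c:c4:7a:03:aa:46 to 0c:c4:7a:31:9a:ec on epair0b",
        "Aug 12 09:17:39 Tabernacle kernel: arp: 192.168.0.102 moved from "
        "0c:c4:7a:03:aa:46 to 0c:c4:7a:31:9a:ec on epair0b",
    ]
    for key, n in summarize_arp_moves(sample).items():
        print(n, *key)
```

If every event shows the same pair of MAC addresses on an epair interface, as in the lines above, that is consistent with the jail explanation rather than with a second physical device claiming the address.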

Memory errors are more concerning. Consider running memtest to see exactly how worried you need to be.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
192.168.0.102
I find it unusual that your server is on the same IP address as one of my servers...

That memory error is telling you that there was one error from bank five, which is not necessarily related to anything we would understand as bank 5. If possible, I would test one module at a time to try and isolate the offender so it can be replaced. You can narrow the test to the modules connected to CPU 0, if that helps.
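
For anyone who wants to read the status word by hand, here is a minimal sketch of decoding it. It assumes the standard x86 machine-check layout for the per-bank status register (flag bits 63 down to 57, MCA error code in bits 15:0, and the memory-controller code pattern `000F 0000 1MMM CCCC`). The function name and dictionary keys are my own labels, not anything the kernel prints.

```python
# Sketch only: decodes a few architectural fields of an x86 machine-check
# bank status value. Model-specific bits are ignored.

MEMORY_OPS = {
    0b000: "generic",
    0b001: "read",
    0b010: "write",
    0b011: "address/command",
    0b100: "scrub",
}

def decode_mca_status(status: int) -> dict:
    code = status & 0xFFFF  # bits 15:0, MCA error code
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),      # "OVER" in the log
        "uncorrected": bool((status >> 61) & 1),   # 0 means corrected ("COR")
        "enabled": bool((status >> 60) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),    # an "Address" line follows
        "context_corrupt": bool((status >> 57) & 1),
        "mca_error_code": code,
    }
    # Memory-controller errors: 000F 0000 1MMM CCCC (F is a filter bit).
    if (code & 0xEF80) == 0x0080:
        op = (code >> 4) & 0b111
        channel = code & 0xF
        fields["memory_op"] = MEMORY_OPS.get(op, "reserved")
        fields["channel"] = None if channel == 0xF else channel
    return fields

if __name__ == "__main__":
    for name, value in decode_mca_status(0xD40000C000910091).items():
        print(f"{name}: {value}")
```

Run against the status from the log, this reports a valid, corrected error with the overflow flag set, on a memory read on channel 1, which matches the kernel's own "COR OVER RD channel 1 memory error" summary. The overflow flag means at least one further error was logged in that bank before the first was read out.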
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Thanks for the replies!

Ha! Probably not so surprising if, like me, you give certain devices IPs starting at 100 so as not to get mixed in with the DHCP addresses.

So by 'modules' are we talking about RAM? I'm not aware they are connected to any particular core (I have only one CPU with 8 cores). I assumed that it was something on the CPU itself since the message refers to CPU.

I did memtest after I built the server. I'll have to dig back and figure out how to do it again.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I did memtest on the two RAM sticks separately. One of them had repeated 'ECC errors', all at the same single address. They were all 'ECC correctable errors', and the final tally of plain 'Errors' was 0. I'm not sure what this means as far as needing to replace it.

This is during the test. I forgot to get a screen grab of the final summary.
[Attachment: ecc errors.jpg]
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
It would mean that no damage was done to your pools as a result of scrubbing or writing, etc., since the ECC did its job.

It's probably a reasonable idea not to make your system work to correct errors all the time, so replace the faulty module when you can.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
Thanks. Too bad there isn't a way to 'mask' an address so it can't be used - seems like a waste of a whole lot of good bits. I guess it's like they say, one bad apple spoils the barrel.
 