The volume "Volume1" (ZFS) state is UNKNOWN:


Seub

Dabbler
Joined
May 27, 2015
Messages
16
Hello everyone,

This morning, when I powered on my FreeNAS, I received an error message by email:
The volume "Volume1" (ZFS) state is UNKNOWN:

The volume shows an error when I look at the web interface:
Volume 1 / Used 0 (error) / Error getting available space / Status Unknown

Here is my FreeNAS configuration:
System information
FreeNAS-9.3-STABLE-201412090314, installed on an 8 GB USB stick.
Intel(R) Pentium(R) CPU G6950 @ 2.80GHz
RAM: 8029MB
5 SATA disks of 500 GB each, forming a single software RAID-Z volume.
ada0
ada1
ada2
ada3
ada4


The web interface works perfectly, and I have never had any problem other than this one.
While browsing the forums, I picked up a few commands; their output is listed below.

Thank you all for your attention and for your help.



pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Wed May 27 03:45:56 2015
config:

NAME STATE READ WRITE CKSUM
freenas-boot ONLINE 0 0 0
gptid/2b308150-d48a-11e4-980c-f4ce46327c28 ONLINE 0 0 0

errors: No known data errors

<ST3500630AS 3.AAK> at scbus0 target 0 lun 0 (pass0,ada0)
<ST3500620AS HP12> at scbus1 target 0 lun 0 (pass1,ada1)
<SAMSUNG HD502HI 1AG01113> at scbus2 target 0 lun 0 (pass2,ada2)
<ST3500620AS SD15> at scbus3 target 0 lun 0 (pass3,ada3)
<ST3500418AS CC37> at scbus4 target 0 lun 0 (pass4,ada4)
<Generic- Compact Flash 1.00> at scbus6 target 0 lun 0 (da0,pass5)
<Generic- SM/xD-Picture 1.00> at scbus6 target 0 lun 1 (da1,pass6)
<Generic- SD/MMC 1.00> at scbus6 target 0 lun 2 (da2,pass7)
<Generic- MS/MS-Pro/HG 1.00> at scbus6 target 0 lun 3 (da3,pass8)
<Generic- SD/MMC/MS/MSPRO 1.00> at scbus6 target 0 lun 4 (da4,pass9)
<Kingston DataTraveler 2.0 1.00> at scbus7 target 0 lun 0 (da5,pass10)

dT: 1.001s w: 1.000s filter: gptid
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0| gptid/ba66b2fb-d4c2-11e4-9a59-f4ce46327c28
0 0 0 0 0.0 0 0 0.0 0.0| gptid/ba7466f0-d4c2-11e4-9a59-f4ce46327c28
0 0 0 0 0.0 0 0 0.0 0.0| gptid/bab7df8e-d4c2-11e4-9a59-f4ce46327c28
0 0 0 0 0.0 0 0 0.0 0.0| gptid/bac0ec25-d4c2-11e4-9a59-f4ce46327c28
0 0 0 0 0.0 0 0 0.0 0.0| gptid/b9fff6ed-d4c2-11e4-9a59-f4ce46327c28
0 0 0 0 0.0 0 0 0.0 0.0| gptid/ba0d0080-d4c2-11e4-9a59-f4ce46327c28
0 0 0 0 0.0 0 0 0.0 0.0| gptid/2b292e4c-d48a-11e4-980c-f4ce46327c28
0 0 0 0 0.0 0 0 0.0 0.0| gptid/2b308150-d48a-11e4-980c-f4ce46327c28
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Hmm... the pool isn't even mounted, which doesn't bode well.

Have you tried rebooting the server?
 

Bidule0hm

Pools are supposed to be imported automatically at boot. You would need to check the log to see whether there is an error message and, if so, what it says.

Yes, zpool import pool_name, but before anything else, be aware that you alone are responsible for what you do in the terminal. If you don't know what you're doing, you can easily destroy quite a lot (including your data).
 

Seub

Good evening! Sorry for the late reply, I've had a very busy week.
As far as the data goes, we can go all out! I have everything backed up on my desktop PC.

I tried:

unrecognized command 'import'

cannot import 'Volume1': no such pool or dataset
Destroy and re-create the pool from
a backup source.

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
freenas-boot 7.19G 941M 6.27G - - 12% 1.00x ONLINE -

total 9
drwxr-xr-x 2 root wheel 4 May 30 20:35 ./
drwxr-xr-x 6 root wheel 12 May 30 21:00 ../
-rw-r--r-- 1 root wheel 2488 Mar 27 21:49 zpool.cache
-rw-r--r-- 1 root wheel 2488 May 30 20:35 zpool.cache.saved

17172466663771251580|Volume1||ZFS|0|1

Configuration for import:
vdev_children: 1
version: 5000
pool_guid: 17172466663771251580
name: 'Volume1'
state: 0
hostid: 482783774
hostname: 'freenas.local'
vdev_tree:
type: 'root'
id: 0
guid: 17172466663771251580
children[0]:
type: 'raidz'
id: 0
guid: 1041077096399642800
nparity: 1
metaslab_array: 35
metaslab_shift: 34
ashift: 12
asize: 2489771622400
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 7953183372317378875
whole_disk: 1
create_txg: 4
path: '/dev/gptid/ba0d0080-d4c2-11e4-9a59-f4ce46327c28'
children[1]:
type: 'disk'
id: 1
guid: 3919283512253717986
whole_disk: 1
create_txg: 4
path: '/dev/gptid/ba7466f0-d4c2-11e4-9a59-f4ce46327c28'
children[2]:
type: 'disk'
id: 2
guid: 5719028957203345124
whole_disk: 1
create_txg: 4
path: '/dev/gptid/bac0ec25-d4c2-11e4-9a59-f4ce46327c28'
children[3]:
type: 'disk'
id: 3
guid: 2977581537849333751
path: '/dev/gptid/bb0a1bd5-d4c2-11e4-9a59-f4ce46327c28'
whole_disk: 1
create_txg: 4
children[4]:
type: 'disk'
id: 4
guid: 10590421017981154462
path: '/dev/gptid/bb6bafe7-d4c2-11e4-9a59-f4ce46327c28'
whole_disk: 1
create_txg: 4
zdb: can't open 'Volume1': File exists

pool: Volume1
id: 17172466663771251580
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://illumos.org/msg/ZFS-8000-3C
config:

Volume1 UNAVAIL insufficient replicas
raidz1-0 UNAVAIL insufficient replicas
gptid/ba0d0080-d4c2-11e4-9a59-f4ce46327c28 ONLINE
gptid/ba7466f0-d4c2-11e4-9a59-f4ce46327c28 ONLINE
gptid/bac0ec25-d4c2-11e4-9a59-f4ce46327c28 ONLINE
2977581537849333751 UNAVAIL cannot open
10590421017981154462 UNAVAIL cannot open

Looking at zpool import, I believe 2 disks are offline. I tried unplugging and replugging them, but it made no difference.
I can't find a way to bring them back online or reactivate them.
2977581537849333751 UNAVAIL cannot open
10590421017981154462 UNAVAIL cannot open
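
One way to cross-check this is to compare the member paths recorded in the pool configuration with the gptid providers the system actually listed. The sketch below is only illustrative: it hard-codes the paths from the "Configuration for import" dump and the gptid listing from the first post, and the names `POOL_CONFIG`, `VISIBLE` and `gptids` are made up for the example. The two listings come from different posts and may have been captured at different times, so the result is an indication rather than a diagnosis.

```python
import re

# Matches the UUID part of a /dev/gptid/... path or a gptid/... label.
GPTID_RE = re.compile(r"gptid/([0-9a-f-]{36})")


def gptids(text):
    """Return the set of gptid UUIDs mentioned anywhere in `text`."""
    return set(GPTID_RE.findall(text))


# Member paths copied from the pool configuration dump in this post.
POOL_CONFIG = """
path: '/dev/gptid/ba0d0080-d4c2-11e4-9a59-f4ce46327c28'
path: '/dev/gptid/ba7466f0-d4c2-11e4-9a59-f4ce46327c28'
path: '/dev/gptid/bac0ec25-d4c2-11e4-9a59-f4ce46327c28'
path: '/dev/gptid/bb0a1bd5-d4c2-11e4-9a59-f4ce46327c28'
path: '/dev/gptid/bb6bafe7-d4c2-11e4-9a59-f4ce46327c28'
"""

# gptid labels copied from the listing in the first post.
VISIBLE = """
gptid/ba66b2fb-d4c2-11e4-9a59-f4ce46327c28
gptid/ba7466f0-d4c2-11e4-9a59-f4ce46327c28
gptid/bab7df8e-d4c2-11e4-9a59-f4ce46327c28
gptid/bac0ec25-d4c2-11e4-9a59-f4ce46327c28
gptid/b9fff6ed-d4c2-11e4-9a59-f4ce46327c28
gptid/ba0d0080-d4c2-11e4-9a59-f4ce46327c28
gptid/2b292e4c-d48a-11e4-980c-f4ce46327c28
gptid/2b308150-d48a-11e4-980c-f4ce46327c28
"""

# Paths the pool expects but that do not appear among the visible labels.
missing = sorted(gptids(POOL_CONFIG) - gptids(VISIBLE))
for uuid in missing:
    print("expected by pool but not listed: gptid/" + uuid)
```

With the data pasted in this thread, the two paths that come out as missing are the ones recorded for children[3] and children[4], the same two members that zpool import reports as UNAVAIL.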
 

Bidule0hm

Ah, sorry, I wrote zfs instead of zpool in the command; I've corrected it ;)

Yes, two disks are not recognized, and since this is a RAID-Z1 you can only tolerate a single offline disk. If you have a backup, the simplest thing is to just wipe the disks and recreate the volume ;)

I can't recommend RAID-Z2 strongly enough, unless you know what you're getting into with RAID-Z1.

It would also be worth finding out why these two disks are not accessible. Are they dead? If so, why? (check the SMART data), etc.
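
The parity rule mentioned above can be written down as a one-line check. This is just a sketch of the arithmetic, not anything ZFS exposes: the function name `raidz_survives` is invented for the example, and the parity value 1 is the `nparity: 1` shown in the pool configuration.

```python
def raidz_survives(nparity, missing_disks):
    """A RAID-Z vdev remains usable while missing members <= parity level."""
    return missing_disks <= nparity


# The pool in this thread: RAID-Z1 (nparity = 1) with 2 members unavailable.
print("RAID-Z1, 2 disks missing:", raidz_survives(1, 2))

# The same failure on a RAID-Z2 vdev (nparity = 2) would still be tolerated.
print("RAID-Z2, 2 disks missing:", raidz_survives(2, 2))
```

This is why zpool import reports "insufficient replicas" here: two missing members exceed what single parity can reconstruct.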
 

Seub

zpool import:
Code:
pool: Volume1
id: 17172466663771251580
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://illumos.org/msg/ZFS-8000-3C
config:

Volume1 UNAVAIL insufficient replicas
raidz1-0 UNAVAIL insufficient replicas
gptid/ba0d0080-d4c2-11e4-9a59-f4ce46327c28 ONLINE
gptid/ba7466f0-d4c2-11e4-9a59-f4ce46327c28 ONLINE
gptid/bac0ec25-d4c2-11e4-9a59-f4ce46327c28 ONLINE
2977581537849333751 UNAVAIL cannot open
10590421017981154462 UNAVAIL cannot open
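
For anyone who wants to pull the failing members out of this kind of output automatically, here is a small parsing sketch. It assumes the layout shown above (name, state, optional reason per line) and uses a hand-picked list of state keywords; `parse_members` and `ZPOOL_IMPORT` are names made up for the example, and the text is simply the output pasted in this post.

```python
# State keywords looked for in the second column (an assumed, non-exhaustive list).
STATES = {"ONLINE", "UNAVAIL", "DEGRADED", "FAULTED", "OFFLINE", "REMOVED"}

ZPOOL_IMPORT = """\
pool: Volume1
id: 17172466663771251580
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://illumos.org/msg/ZFS-8000-3C
config:

Volume1 UNAVAIL insufficient replicas
raidz1-0 UNAVAIL insufficient replicas
gptid/ba0d0080-d4c2-11e4-9a59-f4ce46327c28 ONLINE
gptid/ba7466f0-d4c2-11e4-9a59-f4ce46327c28 ONLINE
gptid/bac0ec25-d4c2-11e4-9a59-f4ce46327c28 ONLINE
2977581537849333751 UNAVAIL cannot open
10590421017981154462 UNAVAIL cannot open
"""


def parse_members(text):
    """Return (name, state, reason) for every config line of the listing."""
    members = []
    for line in text.splitlines():
        fields = line.split()
        # Skip short lines and "key: value" header lines such as "state: UNAVAIL".
        if len(fields) < 2 or fields[0].endswith(":"):
            continue
        if fields[1] in STATES:
            members.append((fields[0], fields[1], " ".join(fields[2:])))
    return members


members = parse_members(ZPOOL_IMPORT)
unopened = [name for name, state, reason in members if reason == "cannot open"]
print("members that could not be opened:", unopened)
```

The two names it reports are vdev GUIDs rather than gptid paths, which is how zpool import labels a member whose device it could not find.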
 

Bidule0hm

Yep, sorry again, I hadn't seen the last spoiler. I edited my post, but it was evidently too late... I need to wake up :p
 

Seub

They don't look dead. Can you guide me through checking the state of the disks? Via the console or the web interface, either is fine.
 

Bidule0hm

What does camcontrol devlist show?
 

Seub

camcontrol devlist
Code:
<ST3500630AS 3.AAK>                at scbus0 target 0 lun 0 (pass0,ada0)
<ST3500620AS HP12>                 at scbus1 target 0 lun 0 (pass1,ada1)
<SAMSUNG HD502HI 1AG01113>         at scbus2 target 0 lun 0 (pass2,ada2)
<ST3500620AS SD15>                 at scbus3 target 0 lun 0 (pass3,ada3)
<ST3500418AS CC37>                 at scbus4 target 0 lun 0 (pass4,ada4)
<Generic- Compact Flash 1.00>      at scbus6 target 0 lun 0 (da0,pass5)
<Generic- SM/xD-Picture 1.00>      at scbus6 target 0 lun 1 (da1,pass6)
<Generic- SD/MMC 1.00>             at scbus6 target 0 lun 2 (da2,pass7)
<Generic- MS/MS-Pro/HG 1.00>       at scbus6 target 0 lun 3 (da3,pass8)
<Generic- SD/MMC/MS/MSPRO 1.00>    at scbus6 target 0 lun 4 (da4,pass9)
<Kingston DataTraveler 2.0 1.00>   at scbus7 target 0 lun 0 (da5,pass10)
 

Seub

Hi, since everything was lost (hope as much as the data), I followed your advice and built a RAID-Z2, which tolerates the failure of 2 disks.
About the disks, can you tell me which ones were faulty?
I managed to format them, rebuild the RAID, and recreate the share.

I added a cron task that runs a short self-test every hour on all the disks (without knowing whether it's of any use xD). With your help, I'd like to be sure my disks will survive.

Thanks again for all the help you've given me.
 

Bidule0hm

It was probably ST3500620AS SD15 (ada3) and ST3500418AS CC37 (ada4), but it's impossible to be sure now because you have recreated the volume.

That said, you can post the output of smartctl -a /dev/adaX with X = 0 to 4. Please post the result between code tags (very important to preserve the text formatting).
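
If typing the command five times is tedious, the list can be generated. A minimal sketch, assuming the disks are ada0 to ada4 as in this thread; `smartctl_commands` is a made-up helper name, and the script only prints the commands rather than running them, since smartctl normally needs root.

```python
def smartctl_commands(first=0, last=4):
    """Build one `smartctl -a` command per disk, from ada<first> to ada<last>."""
    return [["smartctl", "-a", "/dev/ada%d" % n] for n in range(first, last + 1)]


commands = smartctl_commands()
for cmd in commands:
    # Print each command as it would be typed in the FreeNAS shell.
    print(" ".join(cmd))
```

Each printed line can be pasted into the shell as is, and the output of each run posted in its own code block.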

Every hour is far too often. I recommend a long test at least once a month but no more often than once a week (personally I run 2 per month), and optionally a short test every 3 to 5 days. But this isn't a cron job; it's supposed to be a S.M.A.R.T. task. If you're not sure, take a screenshot of the interface and post it ;)
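
To make that cadence concrete, here is a sketch of one schedule that fits the recommendation. The specific days are assumptions picked for illustration (long test on the 1st and 15th, short test on days divisible by 5); in FreeNAS the real schedule is set in the S.M.A.R.T. test task, not in a script, and `smart_test_for` is a hypothetical name.

```python
def smart_test_for(day_of_month, long_days=(1, 15), short_every=5):
    """Return "long", "short" or None for a given day of the month.

    Long tests take precedence when both would fall on the same day.
    """
    if day_of_month in long_days:
        return "long"
    if day_of_month % short_every == 0:
        return "short"
    return None


# Summarise what this example schedule would do over a 30-day month.
plan = {day: smart_test_for(day) for day in range(1, 31)}
long_days = [day for day, test in plan.items() if test == "long"]
short_days = [day for day, test in plan.items() if test == "short"]
print("long tests on days:", long_days)
print("short tests on days:", short_days)
```

That gives two long tests a month and a short test every five days or so, compared with the 24 short tests per day of the hourly cron job.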

No problem :)
 

Seub

smartctl -a /dev/ada0
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3500630AS
Serial Number:    9QG88NVH
Firmware Version: 3.AAK
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Tue Jun  2 21:40:32 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   091   006    Pre-fail  Always       -       160389294
  3 Spin_Up_Time            0x0003   095   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always       -       4567
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   055   050   030    Pre-fail  Always       -       3878710616473
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14190
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       3044
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   042   040   045    Old_age   Always   FAILING_NOW 58 (Min/Max 40/60 #17)
194 Temperature_Celsius     0x0022   058   060   000    Old_age   Always       -       58 (0 13 0 0 0)
195 Hardware_ECC_Recovered  0x001a   060   052   000    Old_age   Always       -       12073608
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13728         -
# 2  Short offline       Completed without error       00%      2244         -
# 3  Short offline       Completed without error       00%      2242         -
# 4  Short offline       Completed without error       00%      2193         -
# 5  Short offline       Self-test routine in progress 90%     14188         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
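
The attribute table in a report like the one above can be scanned for anything the drive itself marks as failing. The sketch below assumes the ten-column layout smartctl printed here (ID#, name, flag, value, worst, thresh, type, updated, when_failed, raw) and uses four rows copied from the ada0 report; `parse_attributes` and `SAMPLE` are names invented for the example.

```python
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   042   040   045    Old_age   Always   FAILING_NOW 58 (Min/Max 40/60 #17)
194 Temperature_Celsius     0x0022   058   060   000    Old_age   Always       -       58 (0 13 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Map attribute name -> normalized value, threshold, WHEN_FAILED and raw value."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry a 0x... flag in column 3.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            rows[fields[1]] = {
                "value": int(fields[3]),
                "thresh": int(fields[5]),
                "when_failed": fields[8],
                "raw": " ".join(fields[9:]),
            }
    return rows


attributes = parse_attributes(SAMPLE)
# WHEN_FAILED is "-" for attributes that have never crossed their threshold.
failing = [name for name, row in attributes.items() if row["when_failed"] != "-"]
print("attributes flagged by the drive:", failing)
```

On the ada0 rows used here, the only flagged attribute is Airflow_Temperature_Cel, whose normalized value of 42 is below its threshold of 45 at a reported 58 degrees Celsius.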

smartctl -a /dev/ada1
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST3500620AS
Serial Number:    9QM64SEF
LU WWN Device Id: 5 000c50 00db36090
Firmware Version: HP12
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Jun  2 21:45:20 2015 CEST

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/207951en
http://knowledge.seagate.com/articles/en_US/FAQ/207957en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  634) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 112) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       169609571
  3 Spin_Up_Time            0x0002   094   094   000    Old_age   Always       -       0
  4 Start_Stop_Count        0x0033   098   098   020    Pre-fail  Always       -       2770
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       26019030971
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11682
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       14
 12 Power_Cycle_Count       0x0033   098   037   020    Pre-fail  Always       -       2770
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       2
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   060   045    Old_age   Always       -       35 (Min/Max 32/35)
194 Temperature_Celsius     0x0022   035   040   000    Old_age   Always       -       35 (0 5 0 0 0)
195 Hardware_ECC_Recovered  0x001a   036   031   000    Old_age   Always       -       169609571
196 Reallocated_Event_Count 0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11682         -
# 2  Short offline       Completed without error       00%     11681         -
# 3  Short offline       Completed without error       00%     11680         -
# 4  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl -a /dev/ada2
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD502HI
Serial Number:    S1VZJDWS412833
LU WWN Device Id: 5 0024e9 001410db1
Firmware Version: 1AG01113
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Tue Jun  2 21:46:35 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 6589) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 111) minutes.
Conveyance self-test routine
recommended polling time:        (  12) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   093   093   011    Pre-fail  Always       -       3050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       68
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   099   051    Pre-fail  Always       -       5
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10144
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       33676
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       10
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       67
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   062   000    Old_age   Always       -       38 (Min/Max 26/38)
194 Temperature_Celsius     0x0022   061   058   000    Old_age   Always       -       39 (Min/Max 26/42)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1196078
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   099   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     33675         -
# 2  Short offline       Completed without error       00%     33674         -
# 3  Short offline       Completed without error       00%     33674         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Seub

smartctl -a /dev/ada3
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST3500620AS
Serial Number:    9QM2DVM9
LU WWN Device Id: 5 000c50 00b924de1
Firmware Version: SD15
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Jun  2 21:48:00 2015 CEST

==> WARNING: There are known problems with these drives,
THIS DRIVE MAY OR MAY NOT BE AFFECTED,
see the following web pages for details:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/207951en
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632758

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  634) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 118) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   099   006    Pre-fail  Always       -       22073669
  3 Spin_Up_Time            0x0003   095   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       588
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       4382368383
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       4222
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       2
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       589
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       1467
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   060   058   045    Old_age   Always       -       40 (Min/Max 31/41)
194 Temperature_Celsius     0x0022   040   042   000    Old_age   Always       -       40 (0 12 0 0 0)
195 Hardware_ECC_Recovered  0x001a   029   028   000    Old_age   Always       -       22073669
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4221         -
# 2  Short offline       Completed without error       00%      4220         -
# 3  Short offline       Completed without error       00%      4220         -
# 4  Short offline       Completed without error       00%      4219         -
# 5  Short offline       Completed without error       00%      3880         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl -a /dev/ada4
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST3500418AS
Serial Number:    5VM2YWQB
LU WWN Device Id: 5 000c50 01934d508
Firmware Version: CC37
User Capacity:    500,106,780,160 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Jun  2 21:49:09 2015 CEST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213891en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  609) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  90) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       186656906
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   095   095   020    Old_age   Always       -       5757
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       45872398
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9831
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       2385
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   094   000    Old_age   Always       -       17180328568
189 High_Fly_Writes         0x003a   059   059   000    Old_age   Always       -       41
190 Airflow_Temperature_Cel 0x0022   065   057   045    Old_age   Always       -       35 (Min/Max 32/36)
194 Temperature_Celsius     0x0022   035   043   000    Old_age   Always       -       35 (0 11 0 0 0)
195 Hardware_ECC_Recovered  0x001a   032   017   000    Old_age   Always       -       186656906
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       15683 (118 187 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       595751791
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       328230902

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      9830         -
# 2  Short offline       Completed without error       00%      9829         -
# 3  Short offline       Completed without error       00%      9829         -
# 4  Short offline       Completed without error       00%      9384         -
# 5  Short offline       Completed without error       00%      3955         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ada0 has a serious temperature problem: it is currently at 58 °C, which is very high (a disk should stay at or below 40 °C, otherwise its lifespan is shortened very significantly). Something needs to be done as soon as possible to fix this. Along the same lines, ada2 and ada3 are borderline.

Ada1 has failed 14 times to reach the correct spin speed. Given how many times it has been powered on and off, I wouldn't worry too much, but if I were you I would keep an eye on the value to check that it doesn't increase significantly.

Ada3 has a very high number of command timeouts. This usually means there is a problem with the cable: check that everything is connected properly and keep an eye on the value to see whether it increases.

Since no long test has ever been run, there is no reliable way to know whether any sectors are causing problems. So I recommend launching a long test on each disk (see the useful commands link in my signature), waiting for them to finish (it takes a while; the best approach is to start them in the evening and wait until the next day to be sure), and then reposting the smartctl -a output for each disk ;)
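
The checks described above can be sketched as a small script over the text of a `smartctl -a` report. This is only an illustration, not a diagnostic tool: the function names and the sample text are hypothetical, the 40 °C limit is the one quoted in this post, and treating any non-zero Spin_Retry_Count or Command_Timeout raw value as worth a note is an assumption. On some drives the Command_Timeout raw value appears to pack several counters into one number, so a huge figure should be read with care.

```python
# Sketch only: apart from the 40 C limit quoted in the post, the rules below are assumptions.

SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       2
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       1467
194 Temperature_Celsius     0x0022   040   042   000    Old_age   Always       -       40 (0 12 0 0 0)

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4221         -
"""


def parse_attributes(report):
    """Map attribute name -> leading integer of the RAW_VALUE column."""
    attrs = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG in column 3.
        if len(parts) >= 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            attrs[parts[1]] = int(parts[9])
    return attrs


def review(report, max_temp_c=40):
    """Return human-readable notes for the points raised in the post."""
    attrs = parse_attributes(report)
    notes = []
    temp = attrs.get("Temperature_Celsius")
    if temp is not None and temp > max_temp_c:
        notes.append("temperature %d C is above %d C" % (temp, max_temp_c))
    for name in ("Spin_Retry_Count", "Command_Timeout"):
        if attrs.get(name, 0) > 0:
            notes.append("%s raw value is %d, watch whether it grows" % (name, attrs[name]))
    # smartctl lists long tests as "Extended offline" in the self-test log.
    if "Extended offline" not in report:
        notes.append("no long (extended) self-test found in the log")
    return notes


for note in review(SAMPLE_REPORT):
    print(note)
```

With the sample rows, which are copied from one of the reports above, the script notes the spin retries, the command timeouts, and the missing long test, but not the temperature, since 40 °C does not exceed the limit.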
 

Seub

Dabbler
Joined
May 27, 2015
Messages
16
Hi!
Me again, as always! I'm not giving up xD
I followed your advice: I changed the cable, checked the temperatures, added a fan, and checked the temperatures again. The disks are all at 25-30 °C on average.

I said to myself, "Seb, you handled that like a boss" (Seb is me).
I made my little backups, rebooted, and BAM! Same thing all over again!

"CRITICAL: The volume Volume1 (ZFS) state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state"

At that point I let rip like in a Bigard sketch: "OOOOOOh you son of a ..., you nasty thing!" (minus the rabid bats)

zpool status
Code:
        NAME                                            STATE     READ WRITE CKSUM
        Volume1                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/fb18bdd9-094a-11e5-b06a-f4ce46327c28  ONLINE       0     0     0
            gptid/fb8454f8-094a-11e5-b06a-f4ce46327c28  ONLINE       0     0     0
            gptid/fc360be2-094a-11e5-b06a-f4ce46327c28  ONLINE       0     0     0
            18311434589177619763                        UNAVAIL      0     0     0  was /dev/gptid/fc82b4a4-094a-11e5-b06a-f4ce46327c28
            17203167417674596954                        UNAVAIL      0     0 
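
As a rough aid for reading output like this, the unavailable members can be pulled out of the `zpool status` text with a few lines of Python. This is a sketch under assumptions: the function name `unavailable_devices` is hypothetical, the sample is a cut-down copy of the listing above (including its truncated last row), and the parsing relies only on whitespace-separated columns with the state in the second column.

```python
# Sketch only: assumes "NAME STATE READ WRITE CKSUM" columns separated by whitespace.

SAMPLE_STATUS = """\
        NAME                                            STATE     READ WRITE CKSUM
        Volume1                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/fb18bdd9-094a-11e5-b06a-f4ce46327c28  ONLINE       0     0     0
            18311434589177619763                        UNAVAIL      0     0     0  was /dev/gptid/fc82b4a4-094a-11e5-b06a-f4ce46327c28
            17203167417674596954                        UNAVAIL      0     0
"""


def unavailable_devices(status_text):
    """Return (name, previous_path_or_None) for every row whose state is UNAVAIL."""
    found = []
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "UNAVAIL":
            # ZFS appends "was <path>" when it remembers where the device used to be.
            previous = None
            if "was" in parts and parts.index("was") + 1 < len(parts):
                previous = parts[parts.index("was") + 1]
            found.append((parts[0], previous))
    return found


for name, previous in unavailable_devices(SAMPLE_STATUS):
    print(name, "previously at", previous)
```

The "was" path, when present, gives the gptid label the missing member used to have, which helps match it to a physical disk before replacing it.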

I went through the whole procedure again.
Children 3 and Children 4 are at fault again.

I am currently running a long test on ada3 with the command "smartctl -t long /dev/ada3".

Do you have a tutorial for replacing the damaged disks? Can it be done cold, with the machine powered off and without commands?
 

SmallGuy

Guru
Joined
Jun 7, 2013
Messages
560
Two of your five disks (ada3 and ada4) cannot be imported.
And two missing disks in a RAIDZ1 pool means you're out of luck.
The SMART tests show that they are not in good health, and that you apparently had not configured periodic tests, which would have warned you at the first symptoms. It would then probably have been possible to replace the faulty disk without any trouble for your pool.
Unfortunately, in the current situation I don't see any way to recover your pool.
Maybe someone will have a bright idea, but I remain skeptical.
To replace a faulty disk, just follow to the letter the official documentation available at the top of the page.

[Edit] I have just realized that you rebuilt your pool as RAIDZ2. I was still going by the zpool status from the beginning of the thread... In that case you just need to replace the two disks.
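
The reasoning in this post comes down to comparing the number of missing disks with the parity level of the vdev: RAIDZ1 tolerates one missing disk, RAIDZ2 two, RAIDZ3 three. A minimal sketch of that arithmetic follows; the names `PARITY` and `remaining_redundancy` are made up for illustration.

```python
# Sketch only: models a single raidz vdev, as in the pool discussed in this thread.

PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def remaining_redundancy(vdev_type, missing_disks):
    """Disks that can still be lost; a negative result means the vdev cannot be opened."""
    return PARITY[vdev_type] - missing_disks


for vdev_type in ("raidz1", "raidz2"):
    margin = remaining_redundancy(vdev_type, 2)
    state = "lost" if margin < 0 else "degraded, %d more failure(s) tolerated" % margin
    print(vdev_type, "with 2 disks missing:", state)
```

For RAIDZ2 with two disks missing the margin is zero: the data is still readable, but any further failure before the disks are replaced would take the pool down.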
 

Seub

Dabbler
Joined
May 27, 2015
Messages
16
Hello SmallGuy,
It's a RAIDZ2 ;) I still have my data. The first time I lost everything; I won't get caught out again.
OK, thanks for the link, I'll take a look.
I'll post the results of the long test and look into setting up occasional SMART tests.
 

SmallGuy

Guru
Joined
Jun 7, 2013
Messages
560
Also remember to configure the SMART service with periodic tests, along with email notification (SMART service and root account).
 