Replacing Hard Drive In Ambrosia

Ambrosia Disk Failure

	NAME        STATE     READ WRITE CKSUM
	zhome       ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    mfid24  ONLINE       0     0     0
	    mfid25  ONLINE       0     0     3
	    mfid26  ONLINE       0     0     0
	    mfid27  ONLINE       0     0     0
	    mfid28  ONLINE       0     0     0
	    mfid35  ONLINE       0     0     0
	    mfid29  ONLINE       0     0     0
	    mfid30  ONLINE       0     0     0
	    mfid31  ONLINE       0     0     0
	    mfid32  ONLINE       0     0    11 ***
	    mfid33  ONLINE       0     0     0
	    mfid34  ONLINE       0     0     0
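The non-zero CKSUM column is what singles out mfid25 and mfid32 in the listing above. As a minimal sketch, assuming the `zpool status` text has been pasted or captured into a string, the same check can be done mechanically. The function name `devices_with_checksum_errors` is made up for this example, and it only handles plain integer counters (zpool may abbreviate large counts with a suffix, which this sketch ignores).

```python
def devices_with_checksum_errors(status_text):
    """Return {device: cksum_count} for rows whose CKSUM column is non-zero."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        # A device row has at least: NAME STATE READ WRITE CKSUM
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, cksum = fields[0], fields[4]
        # Only plain integers are handled; anything else is skipped.
        if cksum.isdigit() and int(cksum) > 0:
            flagged[name] = int(cksum)
    return flagged


# Abbreviated copy of the pool listing above.
sample = """\
    NAME        STATE     READ WRITE CKSUM
    zhome       ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        mfid24  ONLINE       0     0     0
        mfid25  ONLINE       0     0     3
        mfid32  ONLINE       0     0    11
"""

print(devices_with_checksum_errors(sample))
```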

mfi0: 898253 (508163725s/0x0002/info) - Unexpected sense: PD 2c(e0x09/s9) Path 50000c0f01d2cbfe, CDB: 8f 00 00 00 00 00 7b a0 c8 08 00 00 10 00 00 00, Sense: 3/11/00

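The log entry names PD 2c (hex for device ID 44) at enclosure 0x09, slot 9, so mfid32 is the drive that MegaCli addresses as [9:9] and mfiutil as E2:S9.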
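The controller log prints the physical drive ID and enclosure in hex, which makes it easy to misread the slot. A small sketch that decodes the line, assuming the `PD xx(e0xNN/sN) Path ...` layout seen in this one log entry (the regular expression and the function name `decode_pd` are illustrative, and treating the slot number as decimal is an assumption):

```python
import re

# Log line copied from the controller event log above.
LOG_LINE = (
    "mfi0: 898253 (508163725s/0x0002/info) - Unexpected sense: "
    "PD 2c(e0x09/s9) Path 50000c0f01d2cbfe, "
    "CDB: 8f 00 00 00 00 00 7b a0 c8 08 00 00 10 00 00 00, Sense: 3/11/00"
)

PD_PATTERN = re.compile(
    r"PD (?P<pd>[0-9a-f]+)\(e0x(?P<encl>[0-9a-f]+)/s(?P<slot>\d+)\) "
    r"Path (?P<path>[0-9a-f]+)"
)


def decode_pd(line):
    """Extract device ID, enclosure, slot and SAS path from an mfi log line."""
    m = PD_PATTERN.search(line)
    if m is None:
        return None
    return {
        "device_id": int(m.group("pd"), 16),    # 0x2c -> 44
        "enclosure": int(m.group("encl"), 16),  # 0x09 -> 9
        "slot": int(m.group("slot")),           # assumed to be decimal
        "path": m.group("path"),
    }


print(decode_pd(LOG_LINE))
```

The decoded device ID 44 and the path 50000c0f01d2cbfe agree with the `Device Id` and `SAS Address(0)` fields that MegaCli reports for `[9:9]` in the next listing.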

MegaCLI:

# MegaCli -pdInfo -PhysDrv '[9:1]' -a0 
                                     
Enclosure Device ID: 9
Slot Number: 1
Drive's position: DiskGroup: 25, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 19
WWN: 50000C0F01D2BAD5
Sequence Number: 2
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Last Predictive Failure Event Seq Number: 895317
PD Type: SAS

Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: VR07
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x50000c0f01d2bad6
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: WD      WD4001FYYG-01SL3VR07WD-WMC1F1253802     
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :37C (98.60 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : Yes
# MegaCli -pdInfo -PhysDrv '[9:9]' -a0
                                     
Enclosure Device ID: 9
Slot Number: 9
Drive's position: DiskGroup: 32, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 44
WWN: 50000C0F01D2CBFD
Sequence Number: 2
Media Error Count: 45972
Other Error Count: 4619
Predictive Failure Count: 15
Last Predictive Failure Event Seq Number: 895318
PD Type: SAS

Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  512
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: VR07
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x50000c0f01d2cbfe
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: WD      WD4001FYYG-01SL3VR07WD-WMC1F1273270     
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
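With more than a few drives, reading the `-pdInfo` dumps by eye gets tedious. The sketch below splits such a dump into one record per drive and lists those with a non-zero `Predictive Failure Count`. It assumes every drive record starts with an `Enclosure Device ID` line, as in the two listings above. The function names are invented for this example.

```python
def parse_pdinfo(text):
    """Split MegaCli -pdInfo output into one dict of 'key: value' fields per drive."""
    drives, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            # Assumption: this field opens each drive record.
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def predictive_failures(drives):
    """Return (enclosure, slot, count) for drives with predictive failures."""
    result = []
    for d in drives:
        count = int(d.get("Predictive Failure Count", "0"))
        if count > 0:
            result.append((d["Enclosure Device ID"], d["Slot Number"], count))
    return result


# Abbreviated copy of the two listings above.
sample = """\
Enclosure Device ID: 9
Slot Number: 1
Media Error Count: 21080
Predictive Failure Count: 23

Enclosure Device ID: 9
Slot Number: 9
Media Error Count: 45972
Predictive Failure Count: 15
"""

print(predictive_failures(parse_pdinfo(sample)))
```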

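So both [9:1] (DiskGroup 25, mfid25) and [9:9] (DiskGroup 32, mfid32) report predictive failures. The drive in slot 9 was swapped first (the new disk shows a different serial number), which leaves the pool degraded until the replacement is added: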

root@ambrosia:~ # zpool status zhome
  pool: zhome
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 381G in 3h18m with 0 errors on Fri Jun  6 14:48:34 2014
config:

	NAME            STATE     READ WRITE CKSUM
	zhome           DEGRADED     0     0     0
	  raidz2-0      DEGRADED     0     0     0
	    mfid24      ONLINE       0     0     0
	    mfid25      ONLINE       0     0     0
	    mfid26      ONLINE       0     0     0
	    mfid27      ONLINE       0     0     0
	    mfid28      ONLINE       0     0     0
	    mfid34      ONLINE       0     0     0
	    mfid29      ONLINE       0     0     0
	    mfid30      ONLINE       0     0     0
	    mfid31      ONLINE       0     0     0
	    2125743150  UNAVAIL      0     0     0  was /dev/mfid32
	    mfid32      ONLINE       0     0     0
	    mfid33      ONLINE       0     0     0
root@ambrosia:~ # mfiutil show drives | grep E2:S9
47 ( 3726G) UNCONFIGURED GOOD <WD WD4001FYYG-01SL3 VR07 serial=WD-WMC1F0073850> SCSI-6 E2:S9
root@ambrosia:~ # mfiutil create jbod -v E2:S9
Adding drive 47 to array 35
Adding array 35 to volume 33
mfiutil: Command failed: Status: 0x54
mfiutil: Failed to add volume: Input/output error
root@ambrosia:~ # MegaCli -CfgForeign -Clear -a0
                                     
There is no foreign configuration on controller 0.

Exit Code: 0x00
root@ambrosia:~ # MegaCli -GetPreservedCacheList -a0
                                     
Adapter #0

Virtual Drive(Target ID 33): Missing.

Exit Code: 0x00
# MegaCli -DiscardPreservedCache -Lall -a0
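The failed `mfiutil create jbod` tried to create volume 33, and `-GetPreservedCacheList` reports preserved cache for Target ID 33, so the leftover cache of the removed volume appears to be what blocked the new one. A minimal sketch for spotting this condition in captured output, assuming the `Virtual Drive(Target ID N)` wording shown above (the function name is hypothetical):

```python
import re


def preserved_cache_targets(output):
    """Return the target IDs listed by MegaCli -GetPreservedCacheList."""
    return [int(t) for t in re.findall(r"Virtual Drive\(Target ID (\d+)\)", output)]


# Copy of the output shown above.
sample = """\
Adapter #0

Virtual Drive(Target ID 33): Missing.

Exit Code: 0x00
"""

targets = preserved_cache_targets(sample)
if targets:
    print("preserved cache held for target IDs:", targets)
    print("discard it before retrying 'mfiutil create jbod'")
```

Discarding preserved cache throws away whatever unwritten data the controller was still holding for the missing volume, so it is worth confirming that the listed target really is the drive that was pulled.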

With the preserved cache discarded, creating the JBOD volume now works:

# mfiutil create jbod -v E2:S9
Adding drive 47 to array 35
Adding array 35 to volume 33
root@ambrosia:~ # mfiutil show volumes
mfi0 Volumes:
  Id     Size    Level   Stripe  State   Cache   Name
 mfid0 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid1 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid2 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid3 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid4 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid5 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid6 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid7 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid8 ( 3725G) RAID-0      64k OPTIMAL Writes  
 mfid9 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid10 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid11 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid12 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid13 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid14 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid15 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid16 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid17 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid18 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid19 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid20 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid21 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid22 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid23 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid24 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid25 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid26 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid27 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid28 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid29 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid30 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid31 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid32 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid33 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid34 ( 3725G) RAID-0      64k OPTIMAL Writes  
mfid35 ( 3725G) RAID-0      64k OPTIMAL Writes  
# zpool replace zhome 2125743150 mfid35
root@ambrosia:~ # zpool status zhome
  pool: zhome
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb  8 12:41:27 2016
        203M scanned out of 20.5T at 5.08M/s, (scan is slow, no estimated time)
        15.8M resilvered, 0.00% done
config:

	NAME              STATE     READ WRITE CKSUM
	zhome             DEGRADED     0     0     0
	  raidz2-0        DEGRADED     0     0     0
	    mfid24        ONLINE       0     0     0
	    mfid25        ONLINE       0     0     0
	    mfid26        ONLINE       0     0     0
	    mfid27        ONLINE       0     0     0
	    mfid28        ONLINE       0     0     0
	    mfid34        ONLINE       0     0     0
	    mfid29        ONLINE       0     0     0
	    mfid30        ONLINE       0     0     0
	    mfid31        ONLINE       0     0     0
	    replacing-9   UNAVAIL      0     0     0
	      2125743150  UNAVAIL      0     0     0  was /dev/mfid32
	      mfid35      ONLINE       0     0     0  (resilvering)
	    mfid32        ONLINE       0     0     0
	    mfid33        ONLINE       0     0     0

errors: No known data errors
root@ambrosia:~ # zpool status zhome
  pool: zhome
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb  8 12:41:27 2016
        138G scanned out of 20.5T at 3.71M/s, (scan is slow, no estimated time)
        11.4G resilvered, 0.66% done
config:

	NAME              STATE     READ WRITE CKSUM
	zhome             DEGRADED     0     0     0
	  raidz2-0        DEGRADED     0     0     0
	    mfid24        ONLINE       0     0     0
	    mfid25        ONLINE       0     0     3  (resilvering)
	    mfid26        ONLINE       0     0     0
	    mfid27        ONLINE       0     0     0
	    mfid28        ONLINE       0     0     0
	    mfid34        ONLINE       0     0     0
	    mfid29        ONLINE       0     0     0
	    mfid30        ONLINE       0     0     0
	    mfid31        ONLINE       0     0     0
	    replacing-9   DEGRADED     0     0     3
	      2125743150  UNAVAIL      0     0     0  was /dev/mfid32
	      mfid35      ONLINE       0     0     0  (resilvering)
	    mfid32        ONLINE       0     0     0
	    mfid33        ONLINE       0     0     0

errors: No known data errors
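zpool declines to print an estimate because the scan is slow, but the figures in the second status output are enough for a rough one. The sketch below assumes the sizes use binary units (K, M, G, T as powers of 1024) and that the reported rate stays constant, which it usually does not, so treat the result as an upper bound rather than a prediction. The helper names are made up for this example.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a zpool size such as '138G' or '20.5T' to bytes (binary units assumed)."""
    return float(size[:-1]) * UNITS[size[-1]]


def resilver_estimate(scanned, total, rate_per_s):
    """Return (percent scanned, seconds remaining) at a constant scan rate."""
    done = to_bytes(scanned)
    whole = to_bytes(total)
    percent = 100 * done / whole
    seconds_left = (whole - done) / to_bytes(rate_per_s)
    return percent, seconds_left


# Figures from the second status output above.
percent, seconds_left = resilver_estimate("138G", "20.5T", "3.71M")
print(f"{percent:.2f}% scanned, about {seconds_left / 86400:.0f} days left at this rate")
```

The computed percentage matches the `0.66% done` that zpool prints, and at 3.71M/s the remaining scan would take on the order of two months, which is why the slow rate is worth watching.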

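mfid25 needs to be replaced soon too. It picked up 3 checksum errors during the resilver, and its drive in [9:1] still reports a predictive failure count of 23: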

Enclosure Device ID: 9
Slot Number: 1
Drive's position: DiskGroup: 25, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 19
WWN: 50000C0F01D2BAD5
Sequence Number: 2
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Last Predictive Failure Event Seq Number: 895317
PD Type: SAS

We need to change the controller BIOS boot mode from the default PE (Pause on errors) to IE (Ignore errors):

root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None


Controller Properties :

-------------------------------------------------
Ctrl_Prop                        Value           
-------------------------------------------------
Basic Input/Output System (BIOS) ON              
Auto Boot Select(ABS)            OFF             
BIOS Boot Mode                   Pause on errors 
Device Exposure                  Expose All      
-------------------------------------------------


root@ambrosia:~ # storcli64 /c1 show bios
Controller = 1
Status = Failure
Description = Controller 1 not found

root@ambrosia:~ # storcli64 /c0 set BIOSMode help

syntax error, unexpected TOKEN_BIOSMODE

     Storage Command Line Tool  Ver 1.13.06 Sep 03, 2014

     (c)Copyright 2014, LSI Corporation, All Rights Reserved.


help - lists all the commands with their usage. E.g. storcli help
<command> help - gives details about a particular command. E.g. storcli add help

List of commands:

Commands   Description
-------------------------------------------------------------------
add        Adds/creates a new element to controller like VD,Spare..etc
delete     Deletes an element like VD,Spare
show       Displays information about an element
set        Set a particular value to a property 
get        Get a particular value to a property 
compare    Compares particular value to a property
start      Start background operation
stop       Stop background operation
pause      Pause background operation
resume     Resume background operation
download   Downloads file to given device
expand     expands size of given drive
insert     inserts new drive for missing
transform  downgrades the controller
/cx        Controller specific commands
/ex        Enclosure specific commands
/sx        Slot/PD specific commands
/vx        Virtual drive specific commands
/dx        Disk group specific commands
/fall      Foreign configuration specific commands
/px        Phy specific commands
/[bbu|cv]  Battery Backup Unit, Cachevault commands

Other aliases : cachecade, freespace, sysinfo

Use a combination of commands to filter the output of help further.
E.g. 'storcli cx show help' displays all the show operations on cx.
Use verbose for detailed description E.g. 'storcli add  verbose help'
Use 'page=[x]' as the last option in all the commands to set the page break.
X=lines per page. E.g. 'storcli help page=10'


Command options must be entered in the same order as displayed in the help of 
the respective commands.
root@ambrosia:~ # storcli64 /c0 set BIOS help
     Storage Command Line Tool  Ver 1.13.06 Sep 03, 2014

     (c)Copyright 2014, LSI Corporation, All Rights Reserved.


NAME: Set bios on controller

SYNTAX: storcli /cx set bios [state=<on|off>] [Mode=<SOE|PE|IE|SME>]
		[abs=<on|off>] [DeviceExposure=<value>]

  Only the following combinations are supported 
    a) storcli /cx set bios state=<on|off> 
    b) storcli /cx set bios Mode=<SOE|PE|IE|SME> 
    c) storcli /cx set bios abs=<on|off> 
    d) storcli /cx set bios DeviceExposure=<value> 

DESCRIPTION: Set bios controller property to on or off. 
   Mode - Sets the BIOS Boot mode.

     OPTIONS:
       SOE - Stop on errors 
       PE  - Pause on errors 
       IE  - Ignore errors 
       SME - Safe mode on errors 
   abs - Enables|Disables  the auto boot select.

   DeviceExposure - Number of devices to be exposed.
       value range is 0-255 
       Value 0 and 1: Expose all
       Value 2 - 255: Actual number of devices to be exposed

CONVENTION:
/cx - specifies the controller where X is the controller index


root@ambrosia:~ # storcli64 /c0 set BIOS Mode=IE
Controller = 0
Status = Success
Description = None


Controller Properties :

----------------
Ctrl_Prop Value 
----------------
BIOS Mode IE    
----------------


root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None


Controller Properties :

-----------------------------------------------
Ctrl_Prop                        Value         
-----------------------------------------------
Basic Input/Output System (BIOS) ON            
Auto Boot Select(ABS)            OFF           
BIOS Boot Mode                   Ignore errors 
Device Exposure                  Expose All    
-----------------------------------------------

Or, equivalently, with MegaCli, using -BE (Bypass Error):

root@ambrosia:~ # MegaCli -help | grep -i adpbios
MegaCli -AdpBIOS -Enbl |-Dsbl | -SOE | -BE |  -HCOE | - HSM | EnblAutoSelectBootLd | DsblAutoSelectBootLd | -Dsply -aN|-a0,1,2|-aALL 
root@ambrosia:~ # MegaCli -AdpBIOS -BE -a0
                                     
BIOS is set to Bypass Error on Adapter 0.

Exit Code: 0x00
root@ambrosia:~ # MegaCli -AdpBIOS -Dsply -a0
                                     
BIOS on Adapter 0 is Enabled.
    BIOS will Bypass error.
Auto select Boot on Adapter 0 is Disabled.

Exit Code: 0x00
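Both tools report the resulting mode in their own wording ("Ignore errors" in storcli, "Bypass error" in MegaCli). A small sketch that checks captured output from either one, assuming the exact phrases shown in the transcripts above (the function name `bios_ignores_errors` is invented for this example):

```python
def bios_ignores_errors(output):
    """Return True/False for the BIOS error mode, or None if no mode line is found."""
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("BIOS Boot Mode"):      # storcli64 /c0 show bios
            return text.endswith("Ignore errors")
        if text.startswith("BIOS will"):           # MegaCli -AdpBIOS -Dsply -a0
            return "Bypass error" in text
    return None


# Lines copied from the transcripts above.
storcli_before = "BIOS Boot Mode                   Pause on errors "
storcli_after = "BIOS Boot Mode                   Ignore errors "
megacli_after = "BIOS on Adapter 0 is Enabled.\n    BIOS will Bypass error.\n"

print(bios_ignores_errors(storcli_before))
print(bios_ignores_errors(storcli_after))
print(bios_ignores_errors(megacli_after))
```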
