Replacing Hard Drive In Ambrosia
NAME STATE READ WRITE CKSUM
zhome ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
mfid24 ONLINE 0 0 0
mfid25 ONLINE 0 0 3
mfid26 ONLINE 0 0 0
mfid27 ONLINE 0 0 0
mfid28 ONLINE 0 0 0
mfid35 ONLINE 0 0 0
mfid29 ONLINE 0 0 0
mfid30 ONLINE 0 0 0
mfid31 ONLINE 0 0 0
mfid32 ONLINE 0 0 11 ***
mfid33 ONLINE 0 0 0
mfid34 ONLINE 0 0 0
mfi0: 898253 (508163725s/0x0002/info) - Unexpected sense: PD 2c(e0x09/s9) Path 50000c0f01d2cbfe, CDB: 8f 00 00 00 00 00 7b a0 c8 08 00 00 10 00 00 00, Sense: 3/11/00
mfid32 - PD 2c - e2:s8?
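To tie the kernel message to a physical drive, it helps to decode its fields. A minimal sketch, assuming the `PD` and enclosure values are hexadecimal and the slot is decimal (the regex and the `parse_sense_line` helper are illustrative, not part of any tool): `PD 2c` is 0x2c = 44, which matches the `Device Id: 44` and `SAS Address(0)` that MegaCli reports for `[9:9]` further down. Sense `3/11/00` should be SCSI sense key 3 (medium error) with ASC/ASCQ 11h/00h (unrecovered read error).

```python
import re

# Kernel message copied from the log above.
LOG_LINE = (
    "mfi0: 898253 (508163725s/0x0002/info) - Unexpected sense: "
    "PD 2c(e0x09/s9) Path 50000c0f01d2cbfe, CDB: 8f 00 00 00 00 00 7b a0 "
    "c8 08 00 00 10 00 00 00, Sense: 3/11/00"
)

# Assumed layout: PD <hex id>(e0x<hex enclosure>/s<slot>) Path <SAS address>, ... Sense: key/asc/ascq
PATTERN = re.compile(
    r"PD (?P<pd>[0-9a-f]+)\(e0x(?P<encl>[0-9a-f]+)/s(?P<slot>\d+)\) "
    r"Path (?P<path>[0-9a-f]+), .*"
    r"Sense: (?P<key>[0-9a-f]+)/(?P<asc>[0-9a-f]+)/(?P<ascq>[0-9a-f]+)"
)


def parse_sense_line(line):
    """Return the drive identifiers and sense triple found in one log line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not an 'Unexpected sense' line")
    return {
        "device_id": int(m["pd"], 16),      # compare with MegaCli "Device Id"
        "enclosure": int(m["encl"], 16),    # compare with "Enclosure Device ID"
        "slot": int(m["slot"]),             # compare with "Slot Number"
        "sas_path": m["path"],              # compare with "SAS Address(0)"
        "sense": (int(m["key"], 16), int(m["asc"], 16), int(m["ascq"], 16)),
    }


info = parse_sense_line(LOG_LINE)
print(info)
```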
# MegaCli -pdInfo -PhysDrv '[9:1]' -a0
Enclosure Device ID: 9
Slot Number: 1
Drive's position: DiskGroup: 25, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 19
WWN: 50000C0F01D2BAD5
Sequence Number: 2
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Last Predictive Failure Event Seq Number: 895317
PD Type: SAS
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: VR07
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x50000c0f01d2bad6
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: WD WD4001FYYG-01SL3VR07WD-WMC1F1253802
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :37C (98.60 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : Yes
# MegaCli -pdInfo -PhysDrv '[9:9]' -a0
Enclosure Device ID: 9
Slot Number: 9
Drive's position: DiskGroup: 32, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 44
WWN: 50000C0F01D2CBFD
Sequence Number: 2
Media Error Count: 45972
Other Error Count: 4619
Predictive Failure Count: 15
Last Predictive Failure Event Seq Number: 895318
PD Type: SAS
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: VR07
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x50000c0f01d2cbfe
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: WD WD4001FYYG-01SL3VR07WD-WMC1F1273270
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
So both [9:1] and [9:9] report predictive failures!
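With 36 drives behind the controller, reading `-pdInfo` output by eye gets tedious. A rough sketch of how the check could be scripted, assuming the output has been captured as text and that each drive record starts with `Enclosure Device ID` (the helper names are made up for illustration; the sample is abridged from the two outputs above):

```python
# Abridged from the two MegaCli -pdInfo outputs above.
SAMPLE = """\
Enclosure Device ID: 9
Slot Number: 1
Device Id: 19
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Enclosure Device ID: 9
Slot Number: 9
Device Id: 44
Media Error Count: 45972
Other Error Count: 4619
Predictive Failure Count: 15
"""


def parse_pdinfo(text):
    """Split pdInfo-style 'key: value' output into one dict per drive."""
    drives, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            # Assumption: this field opens each drive record.
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def predictive_failures(drives):
    """List (position, predictive failure count, media error count) for suspect drives."""
    report = []
    for d in drives:
        count = int(d.get("Predictive Failure Count", "0"))
        if count > 0:
            label = f"[{d['Enclosure Device ID']}:{d['Slot Number']}]"
            report.append((label, count, int(d.get("Media Error Count", "0"))))
    return report


flagged = predictive_failures(parse_pdinfo(SAMPLE))
for label, pf_count, media_errors in flagged:
    print(label, "predictive failures:", pf_count, "media errors:", media_errors)
```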
root@ambrosia:~ # zpool status zhome
pool: zhome
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: resilvered 381G in 3h18m with 0 errors on Fri Jun 6 14:48:34 2014
config:
NAME STATE READ WRITE CKSUM
zhome DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
mfid24 ONLINE 0 0 0
mfid25 ONLINE 0 0 0
mfid26 ONLINE 0 0 0
mfid27 ONLINE 0 0 0
mfid28 ONLINE 0 0 0
mfid34 ONLINE 0 0 0
mfid29 ONLINE 0 0 0
mfid30 ONLINE 0 0 0
mfid31 ONLINE 0 0 0
2125743150 UNAVAIL 0 0 0 was /dev/mfid32
mfid32 ONLINE 0 0 0
mfid33 ONLINE 0 0 0
root@ambrosia:~ # mfiutil show drives | grep E2:S9
47 ( 3726G) UNCONFIGURED GOOD <WD WD4001FYYG-01SL3 VR07 serial=WD-WMC1F0073850> SCSI-6 E2:S9
root@ambrosia:~ # mfiutil create jbod -v E2:S9
Adding drive 47 to array 35
Adding array 35 to volume 33
mfiutil: Command failed: Status: 0x54
mfiutil: Failed to add volume: Input/output error
root@ambrosia:~ # MegaCli -CfgForeign -Clear -a0
There is no foreign configuration on controller 0.
Exit Code: 0x00
root@ambrosia:~ # MegaCli -GetPreservedCacheList -a0
Adapter #0
Virtual Drive(Target ID 33): Missing.
Exit Code: 0x00
# MegaCli -DiscardPreservedCache -Lall -a0
With the preserved cache for the missing virtual drive discarded, creating the JBOD volume now works:
# mfiutil create jbod -v E2:S9
Adding drive 47 to array 35
Adding array 35 to volume 33
root@ambrosia:~ # mfiutil show volumes
mfi0 Volumes:
Id Size Level Stripe State Cache Name
mfid0 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid1 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid2 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid3 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid4 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid5 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid6 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid7 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid8 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid9 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid10 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid11 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid12 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid13 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid14 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid15 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid16 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid17 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid18 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid19 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid20 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid21 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid22 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid23 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid24 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid25 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid26 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid27 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid28 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid29 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid30 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid31 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid32 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid33 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid34 ( 3725G) RAID-0 64k OPTIMAL Writes
mfid35 ( 3725G) RAID-0 64k OPTIMAL Writes
# zpool replace zhome 2125743150 mfid35
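Note that the replace command names the old device by its numeric GUID rather than by `mfid32`: in the status output the missing vdev is listed as `was /dev/mfid32`, while a different, healthy disk currently answers to `mfid32`. A small sketch of how the GUID could be pulled out of captured `zpool status` text and the command assembled (the helper name is hypothetical; the sample is abridged from the output above):

```python
# Abridged from the 'zpool status zhome' output above.
STATUS = """\
NAME STATE READ WRITE CKSUM
zhome DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
mfid31 ONLINE 0 0 0
2125743150 UNAVAIL 0 0 0 was /dev/mfid32
mfid32 ONLINE 0 0 0
mfid33 ONLINE 0 0 0
"""


def unavailable_vdevs(status_text):
    """Return (name_or_guid, previous_device_path) for every UNAVAIL row."""
    found = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "UNAVAIL":
            # 'was /dev/...' is only present when ZFS remembers the old path.
            old = fields[fields.index("was") + 1] if "was" in fields else None
            found.append((fields[0], old))
    return found


missing = unavailable_vdevs(STATUS)
new_device = "mfid35"  # the volume created with 'mfiutil create jbod' above
commands = [f"zpool replace zhome {guid} {new_device}" for guid, _ in missing]
for cmd in commands:
    print(cmd)  # printed for review only, nothing is executed
```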
root@ambrosia:~ # zpool status zhome
pool: zhome
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Feb 8 12:41:27 2016
203M scanned out of 20.5T at 5.08M/s, (scan is slow, no estimated time)
15.8M resilvered, 0.00% done
config:
NAME STATE READ WRITE CKSUM
zhome DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
mfid24 ONLINE 0 0 0
mfid25 ONLINE 0 0 0
mfid26 ONLINE 0 0 0
mfid27 ONLINE 0 0 0
mfid28 ONLINE 0 0 0
mfid34 ONLINE 0 0 0
mfid29 ONLINE 0 0 0
mfid30 ONLINE 0 0 0
mfid31 ONLINE 0 0 0
replacing-9 UNAVAIL 0 0 0
2125743150 UNAVAIL 0 0 0 was /dev/mfid32
mfid35 ONLINE 0 0 0 (resilvering)
mfid32 ONLINE 0 0 0
mfid33 ONLINE 0 0 0
errors: No known data errors
root@ambrosia:~ # zpool status zhome
pool: zhome
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Feb 8 12:41:27 2016
138G scanned out of 20.5T at 3.71M/s, (scan is slow, no estimated time)
11.4G resilvered, 0.66% done
config:
NAME STATE READ WRITE CKSUM
zhome DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
mfid24 ONLINE 0 0 0
mfid25 ONLINE 0 0 3 (resilvering)
mfid26 ONLINE 0 0 0
mfid27 ONLINE 0 0 0
mfid28 ONLINE 0 0 0
mfid34 ONLINE 0 0 0
mfid29 ONLINE 0 0 0
mfid30 ONLINE 0 0 0
mfid31 ONLINE 0 0 0
replacing-9 DEGRADED 0 0 3
2125743150 UNAVAIL 0 0 0 was /dev/mfid32
mfid35 ONLINE 0 0 0 (resilvering)
mfid32 ONLINE 0 0 0
mfid33 ONLINE 0 0 0
errors: No known data errors
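zpool gives no time estimate here ("scan is slow"), but a crude one can be worked out from the scan line: remaining bytes divided by the reported rate. A sketch, assuming the K/M/G/T suffixes are powers of 1024 and that the rate stays constant, which early in a resilver it probably will not:

```python
import re

# Assumption: zpool's size suffixes are binary (powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

SCAN_RE = re.compile(
    r"([\d.]+)([KMGTP]) scanned out of ([\d.]+)([KMGTP]) at ([\d.]+)([KMGTP])/s"
)


def resilver_eta_days(scan_line):
    """Naive estimate: (total - scanned) / rate, expressed in days."""
    m = SCAN_RE.search(scan_line)
    if m is None:
        raise ValueError("no scan progress found")
    scanned = float(m.group(1)) * UNITS[m.group(2)]
    total = float(m.group(3)) * UNITS[m.group(4)]
    rate = float(m.group(5)) * UNITS[m.group(6)]
    return (total - scanned) / rate / 86400


# The two scan lines from the status outputs above.
first = resilver_eta_days("203M scanned out of 20.5T at 5.08M/s, (scan is slow, no estimated time)")
second = resilver_eta_days("138G scanned out of 20.5T at 3.71M/s, (scan is slow, no estimated time)")
print(f"first sample: {first:.1f} days, second sample: {second:.1f} days")
```

At the rates shown, that works out to roughly seven to ten weeks, so the figure is only useful as a sanity check on whether the resilver is speeding up between samples.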
mfid25 needs to be replaced soon too:
Enclosure Device ID: 9
Slot Number: 1
Drive's position: DiskGroup: 25, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 19
WWN: 50000C0F01D2BAD5
Sequence Number: 2
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Last Predictive Failure Event Seq Number: 895317
PD Type: SAS
We need to change the controller BIOS boot mode from the default PE (Pause on errors) to IE (Ignore errors):
root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None
Controller Properties :
-------------------------------------------------
Ctrl_Prop Value
-------------------------------------------------
Basic Input/Output System (BIOS) ON
Auto Boot Select(ABS) OFF
BIOS Boot Mode Pause on errors
Device Exposure Expose All
-------------------------------------------------
root@ambrosia:~ # storcli64 /c1 show bios
Controller = 1
Status = Failure
Description = Controller 1 not found
root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None
Controller Properties :
-------------------------------------------------
Ctrl_Prop Value
-------------------------------------------------
Basic Input/Output System (BIOS) ON
Auto Boot Select(ABS) OFF
BIOS Boot Mode Pause on errors
Device Exposure Expose All
-------------------------------------------------
root@ambrosia:~ # storcli64 /c0 set BIOSMode help
syntax error, unexpected TOKEN_BIOSMODE
Storage Command Line Tool Ver 1.13.06 Sep 03, 2014
(c)Copyright 2014, LSI Corporation, All Rights Reserved.
help - lists all the commands with their usage. E.g. storcli help
<command> help - gives details about a particular command. E.g. storcli add help
List of commands:
Commands Description
-------------------------------------------------------------------
add Adds/creates a new element to controller like VD,Spare..etc
delete Deletes an element like VD,Spare
show Displays information about an element
set Set a particular value to a property
get Get a particular value to a property
compare Compares particular value to a property
start Start background operation
stop Stop background operation
pause Pause background operation
resume Resume background operation
download Downloads file to given device
expand expands size of given drive
insert inserts new drive for missing
transform downgrades the controller
/cx Controller specific commands
/ex Enclosure specific commands
/sx Slot/PD specific commands
/vx Virtual drive specific commands
/dx Disk group specific commands
/fall Foreign configuration specific commands
/px Phy specific commands
/[bbu|cv] Battery Backup Unit, Cachevault commands
Other aliases : cachecade, freespace, sysinfo
Use a combination of commands to filter the output of help further.
E.g. 'storcli cx show help' displays all the show operations on cx.
Use verbose for detailed description E.g. 'storcli add verbose help'
Use 'page=[x]' as the last option in all the commands to set the page break.
X=lines per page. E.g. 'storcli help page=10'
Command options must be entered in the same order as displayed in the help of
the respective commands.
root@ambrosia:~ # storcli64 /c0 set BIOS help
Storage Command Line Tool Ver 1.13.06 Sep 03, 2014
(c)Copyright 2014, LSI Corporation, All Rights Reserved.
NAME: Set bios on controller
SYNTAX: storcli /cx set bios [state=<on|off>] [Mode=<SOE|PE|IE|SME>]
[abs=<on|off>] [DeviceExposure=<value>]
Only the following combinations are supported
a) storcli /cx set bios state=<on|off>
b) storcli /cx set bios Mode=<SOE|PE|IE|SME>
c) storcli /cx set bios abs=<on|off>
d) storcli /cx set bios DeviceExposure=<value>
DESCRIPTION: Set bios controller property to on or off.
Mode - Sets the BIOS Boot mode.
OPTIONS:
SOE - Stop on errors
PE - Pause on errors
IE - Ignore errors
SME - Safe mode on errors
abs - Enables|Disables the auto boot select.
DeviceExposure - Number of devices to be exposed.
value range is 0-255
Value 0 and 1: Expose all
Value 2 - 255: Actual number of devices to be exposed
CONVENTION:
/cx - specifies the controller where X is the controller index
root@ambrosia:~ # storcli64 /c0 set BIOS Mode=IE
Controller = 0
Status = Success
Description = None
Controller Properties :
----------------
Ctrl_Prop Value
----------------
BIOS Mode IE
----------------
root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None
Controller Properties :
-----------------------------------------------
Ctrl_Prop Value
-----------------------------------------------
Basic Input/Output System (BIOS) ON
Auto Boot Select(ABS) OFF
BIOS Boot Mode Ignore errors
Device Exposure Expose All
-----------------------------------------------
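To confirm the change without reading the table by eye, the `show bios` output can be checked for the boot mode row. A minimal sketch, assuming the output is captured as text; the code-to-description mapping is taken from the `set BIOS help` text above, and `bios_boot_mode` is a made-up helper name:

```python
# Mode codes and descriptions as listed by 'storcli64 /c0 set BIOS help'.
MODES = {
    "SOE": "Stop on errors",
    "PE": "Pause on errors",
    "IE": "Ignore errors",
    "SME": "Safe mode on errors",
}

BEFORE = """\
Basic Input/Output System (BIOS) ON
Auto Boot Select(ABS) OFF
BIOS Boot Mode Pause on errors
Device Exposure Expose All
"""

AFTER = """\
Basic Input/Output System (BIOS) ON
Auto Boot Select(ABS) OFF
BIOS Boot Mode Ignore errors
Device Exposure Expose All
"""


def bios_boot_mode(show_bios_output):
    """Return the mode code (e.g. 'IE') from 'show bios' text, or None if absent."""
    prefix = "BIOS Boot Mode"
    for line in show_bios_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            description = line[len(prefix):].strip()
            for code, text in MODES.items():
                if text == description:
                    return code
    return None


before_mode = bios_boot_mode(BEFORE)
after_mode = bios_boot_mode(AFTER)
print(before_mode, "->", after_mode)
```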
Or, with MegaCli instead of storcli:
root@ambrosia:~ # MegaCli -help | grep -i adpbios
MegaCli -AdpBIOS -Enbl |-Dsbl | -SOE | -BE | -HCOE | - HSM | EnblAutoSelectBootLd | DsblAutoSelectBootLd | -Dsply -aN|-a0,1,2|-aALL
root@ambrosia:~ # MegaCli -AdpBIOS -BE -a0
BIOS is set to Bypass Error on Adapter 0.
Exit Code: 0x00
root@ambrosia:~ # MegaCli -AdpBIOS -Dsply -a0
BIOS on Adapter 0 is Enabled.
BIOS will Bypass error.
Auto select Boot on Adapter 0 is Disabled.
Exit Code: 0x00