Replacing Hard Drive In Ambrosia - shawfdong/hyades GitHub Wiki
NAME          STATE     READ WRITE CKSUM
zhome         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    mfid24    ONLINE       0     0     0
    mfid25    ONLINE       0     0     3
    mfid26    ONLINE       0     0     0
    mfid27    ONLINE       0     0     0
    mfid28    ONLINE       0     0     0
    mfid35    ONLINE       0     0     0
    mfid29    ONLINE       0     0     0
    mfid30    ONLINE       0     0     0
    mfid31    ONLINE       0     0     0
    mfid32    ONLINE       0     0    11  ***
    mfid33    ONLINE       0     0     0
    mfid34    ONLINE       0     0     0

mfi0: 898253 (508163725s/0x0002/info) - Unexpected sense: PD 2c(e0x09/s9) Path 50000c0f01d2cbfe, CDB: 8f 00 00 00 00 00 7b a0 c8 08 00 00 10 00 00 00, Sense: 3/11/00

mfid32 - PD 2c - e2:s8?
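The `PD 2c(e0x09/s9)` token in the log line can be lined up with MegaCli's numbering. Below is a minimal Python sketch, assuming the PD id is hexadecimal and the slot is decimal (both are assumptions, not documented behaviour). The helper name `decode_pd` is hypothetical. Under that reading, 0x2c = 44, which agrees with the `Device Id: 44` that MegaCli reports for `[9:9]`, whose SAS address also matches the `Path` in the log.

```python
import re

def decode_pd(log_line):
    """Extract (device_id, enclosure, slot) from a 'PD xx(e0xNN/sN)' token.

    Assumption: the PD id and enclosure are hexadecimal, the slot is decimal.
    """
    m = re.search(r"PD ([0-9a-fA-F]+)\(e0x([0-9a-fA-F]+)/s(\d+)\)", log_line)
    if m is None:
        return None
    device_id = int(m.group(1), 16)  # "2c" -> 44
    enclosure = int(m.group(2), 16)  # "09" -> 9
    slot = int(m.group(3))           # "9"  -> 9
    return device_id, enclosure, slot

line = ("mfi0: 898253 (508163725s/0x0002/info) - Unexpected sense: "
        "PD 2c(e0x09/s9) Path 50000c0f01d2cbfe")
print(decode_pd(line))  # (44, 9, 9)
```

If the assumption holds, the failing disk behind mfid32 is the one MegaCli addresses as `[9:9]`.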
# MegaCli -pdInfo -PhysDrv '[9:1]' -a0
Enclosure Device ID: 9
Slot Number: 1
Drive's position: DiskGroup: 25, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 19
WWN: 50000C0F01D2BAD5
Sequence Number: 2
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Last Predictive Failure Event Seq Number: 895317
PD Type: SAS
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: VR07
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x50000c0f01d2bad6
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: WD WD4001FYYG-01SL3VR07WD-WMC1F1253802
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :37C (98.60 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : Yes

# MegaCli -pdInfo -PhysDrv '[9:9]' -a0
Enclosure Device ID: 9
Slot Number: 9
Drive's position: DiskGroup: 32, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 44
WWN: 50000C0F01D2CBFD
Sequence Number: 2
Media Error Count: 45972
Other Error Count: 4619
Predictive Failure Count: 15
Last Predictive Failure Event Seq Number: 895318
PD Type: SAS
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: VR07
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x50000c0f01d2cbfe
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: WD WD4001FYYG-01SL3VR07WD-WMC1F1273270
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
So both [9:1] and [9:9] report predictive failures. Judging by their DiskGroup numbers (25 and 32), these are the disks behind mfid25 and mfid32.
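Reading these counters by eye across many drives is error-prone. Here is a minimal Python sketch that turns `MegaCli -pdInfo` text into a dictionary and flags a drive. The function names `parse_pdinfo` and `needs_replacement` are hypothetical. Treating any nonzero predictive failure count or a S.M.A.R.T alert as grounds for replacement is an assumption, not a vendor rule.

```python
def parse_pdinfo(text):
    """Parse 'Key: value' lines from MegaCli -pdInfo output into a dict.

    Lines are split at the first colon; lines with no value (e.g. 'Port-0 :')
    are skipped, and repeated keys keep the last value seen.
    """
    info = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info

def needs_replacement(info):
    """Assumed policy: flag on any predictive failure or a S.M.A.R.T alert."""
    predictive = int(info.get("Predictive Failure Count", 0))
    smart_alert = info.get("Drive has flagged a S.M.A.R.T alert") == "Yes"
    return predictive > 0 or smart_alert

# Abbreviated copy of the [9:1] output shown above.
sample = """Enclosure Device ID: 9
Slot Number: 1
Drive's position: DiskGroup: 25, Span: 0, Arm: 0
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Drive has flagged a S.M.A.R.T alert : Yes"""

info = parse_pdinfo(sample)
print(info["Media Error Count"], needs_replacement(info))
```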
root@ambrosia:~ # zpool status zhome
  pool: zhome
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 381G in 3h18m with 0 errors on Fri Jun 6 14:48:34 2014
config:

        NAME            STATE     READ WRITE CKSUM
        zhome           DEGRADED     0     0     0
          raidz2-0      DEGRADED     0     0     0
            mfid24      ONLINE       0     0     0
            mfid25      ONLINE       0     0     0
            mfid26      ONLINE       0     0     0
            mfid27      ONLINE       0     0     0
            mfid28      ONLINE       0     0     0
            mfid34      ONLINE       0     0     0
            mfid29      ONLINE       0     0     0
            mfid30      ONLINE       0     0     0
            mfid31      ONLINE       0     0     0
            2125743150  UNAVAIL      0     0     0  was /dev/mfid32
            mfid32      ONLINE       0     0     0
            mfid33      ONLINE       0     0     0

root@ambrosia:~ # mfiutil show drives | grep E2:S9
47 ( 3726G) UNCONFIGURED GOOD <WD WD4001FYYG-01SL3 VR07 serial=WD-WMC1F0073850> SCSI-6 E2:S9

root@ambrosia:~ # mfiutil create jbod -v E2:S9
Adding drive 47 to array 35
Adding array 35 to volume 33
mfiutil: Command failed: Status: 0x54
mfiutil: Failed to add volume: Input/output error

root@ambrosia:~ # MegaCli -CfgForeign -Clear -a0
There is no foreign configuration on controller 0.
Exit Code: 0x00

root@ambrosia:~ # MegaCli -GetPreservedCacheList -a0
Adapter #0
Virtual Drive(Target ID 33): Missing.
Exit Code: 0x00
# MegaCli -DiscardPreservedCache -Lall -a0
With the preserved cache discarded, creating the JBOD volume now works:

# mfiutil create jbod -v E2:S9
Adding drive 47 to array 35
Adding array 35 to volume 33

root@ambrosia:~ # mfiutil show volumes
mfi0 Volumes:
  Id     Size     Level   Stripe  State    Cache   Name
 mfid0  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid1  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid2  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid3  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid4  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid5  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid6  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid7  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid8  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
 mfid9  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid10  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid11  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid12  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid13  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid14  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid15  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid16  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid17  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid18  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid19  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid20  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid21  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid22  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid23  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid24  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid25  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid26  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid27  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid28  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid29  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid30  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid31  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid32  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid33  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid34  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
mfid35  ( 3725G)  RAID-0     64k  OPTIMAL  Writes
# zpool replace zhome 2125743150 mfid35
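The GUID passed to `zpool replace` is the numeric name that `zpool status` shows for the missing vdev. Here is a minimal Python sketch that pulls that GUID out of status text and assembles the command as an argument list without running it. The function names `unavailable_vdevs` and `replace_command` are hypothetical, and the regular expression assumes the `was /dev/...` layout shown in the status output above.

```python
import re

# Matches lines like: "2125743150  UNAVAIL  0  0  0  was /dev/mfid32"
_UNAVAIL = re.compile(
    r"^[ \t]*(\d+)[ \t]+UNAVAIL[ \t]+\d+[ \t]+\d+[ \t]+\d+[ \t]+was[ \t]+(\S+)",
    re.MULTILINE,
)

def unavailable_vdevs(status_text):
    """Return a list of (guid, old_device_path) for UNAVAIL vdevs."""
    return _UNAVAIL.findall(status_text)

def replace_command(pool, guid, new_device):
    """Build (but do not execute) the zpool replace argument list."""
    return ["zpool", "replace", pool, guid, new_device]

# Abbreviated copy of the config table from the status output.
status = """        zhome           DEGRADED     0     0     0
          raidz2-0      DEGRADED     0     0     0
            mfid31      ONLINE       0     0     0
            2125743150  UNAVAIL      0     0     0  was /dev/mfid32
            mfid32      ONLINE       0     0     0"""

for guid, old in unavailable_vdevs(status):
    print(" ".join(replace_command("zhome", guid, "mfid35")))
```

Building the command as a list rather than a string keeps it safe to hand to `subprocess.run` later, should the step ever be scripted.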
root@ambrosia:~ # zpool status zhome
  pool: zhome
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb 8 12:41:27 2016
        203M scanned out of 20.5T at 5.08M/s, (scan is slow, no estimated time)
        15.8M resilvered, 0.00% done
config:

        NAME              STATE     READ WRITE CKSUM
        zhome             DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            mfid24        ONLINE       0     0     0
            mfid25        ONLINE       0     0     0
            mfid26        ONLINE       0     0     0
            mfid27        ONLINE       0     0     0
            mfid28        ONLINE       0     0     0
            mfid34        ONLINE       0     0     0
            mfid29        ONLINE       0     0     0
            mfid30        ONLINE       0     0     0
            mfid31        ONLINE       0     0     0
            replacing-9   UNAVAIL      0     0     0
              2125743150  UNAVAIL      0     0     0  was /dev/mfid32
              mfid35      ONLINE       0     0     0  (resilvering)
            mfid32        ONLINE       0     0     0
            mfid33        ONLINE       0     0     0

errors: No known data errors

root@ambrosia:~ # zpool status zhome
  pool: zhome
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb 8 12:41:27 2016
        138G scanned out of 20.5T at 3.71M/s, (scan is slow, no estimated time)
        11.4G resilvered, 0.66% done
config:

        NAME              STATE     READ WRITE CKSUM
        zhome             DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            mfid24        ONLINE       0     0     0
            mfid25        ONLINE       0     0     3  (resilvering)
            mfid26        ONLINE       0     0     0
            mfid27        ONLINE       0     0     0
            mfid28        ONLINE       0     0     0
            mfid34        ONLINE       0     0     0
            mfid29        ONLINE       0     0     0
            mfid30        ONLINE       0     0     0
            mfid31        ONLINE       0     0     0
            replacing-9   DEGRADED     0     0     3
              2125743150  UNAVAIL      0     0     0  was /dev/mfid32
              mfid35      ONLINE       0     0     0  (resilvering)
            mfid32        ONLINE       0     0     0
            mfid33        ONLINE       0     0     0

errors: No known data errors
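ZFS declines to print an estimate here ("scan is slow"), but the scan line holds enough to compute a rough one. Below is a minimal Python sketch, assuming the K/M/G/T suffixes are powers of 1024 and that the reported rate stays constant. That second assumption is shaky early in a resilver, so treat the result as an order of magnitude. The function names are hypothetical.

```python
import re

# Assumption: zpool's size suffixes are binary (powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(size):
    """Convert a string such as '20.5T' or '3.71M' to a byte count."""
    return float(size[:-1]) * UNITS[size[-1]]

def resilver_eta_days(scan_line):
    """Estimate remaining days from an 'X scanned out of Y at Z/s' line."""
    m = re.search(
        r"([\d.]+[KMGT]) scanned out of ([\d.]+[KMGT]) at ([\d.]+[KMGT])/s",
        scan_line,
    )
    if m is None:
        return None
    scanned, total, rate = (to_bytes(g) for g in m.groups())
    return (total - scanned) / rate / 86400  # 86400 seconds per day

line = "138G scanned out of 20.5T at 3.71M/s, (scan is slow, no estimated time)"
print(f"{resilver_eta_days(line):.1f} days")
```

At the rate in the second status output this works out to roughly two months, which explains why ZFS reports the scan as slow.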
mfid25 needs to be replaced soon too:
Enclosure Device ID: 9
Slot Number: 1
Drive's position: DiskGroup: 25, Span: 0, Arm: 0
Enclosure position: 1
Device Id: 19
WWN: 50000C0F01D2BAD5
Sequence Number: 2
Media Error Count: 21080
Other Error Count: 2594
Predictive Failure Count: 23
Last Predictive Failure Event Seq Number: 895317
PD Type: SAS
We need to change the controller BIOS boot mode from the default PE (Pause on errors) to IE (Ignore errors):
root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None

Controller Properties :
-------------------------------------------------
Ctrl_Prop                         Value
-------------------------------------------------
Basic Input/Output System (BIOS)  ON
Auto Boot Select(ABS)             OFF
BIOS Boot Mode                    Pause on errors
Device Exposure                   Expose All
-------------------------------------------------

root@ambrosia:~ # storcli64 /c1 show bios
Controller = 1
Status = Failure
Description = Controller 1 not found

root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None

Controller Properties :
-------------------------------------------------
Ctrl_Prop                         Value
-------------------------------------------------
Basic Input/Output System (BIOS)  ON
Auto Boot Select(ABS)             OFF
BIOS Boot Mode                    Pause on errors
Device Exposure                   Expose All
-------------------------------------------------

root@ambrosia:~ # storcli64 /c0 set BIOSMode help
syntax error, unexpected TOKEN_BIOSMODE

Storage Command Line Tool Ver 1.13.06 Sep 03, 2014
(c)Copyright 2014, LSI Corporation, All Rights Reserved.

help - lists all the commands with their usage. E.g. storcli help
<command> help - gives details about a particular command. E.g. storcli add help

List of commands:

Commands    Description
-------------------------------------------------------------------
add         Adds/creates a new element to controller like VD,Spare..etc
delete      Deletes an element like VD,Spare
show        Displays information about an element
set         Set a particular value to a property
get         Get a particular value to a property
compare     Compares particular value to a property
start       Start background operation
stop        Stop background operation
pause       Pause background operation
resume      Resume background operation
download    Downloads file to given device
expand      expands size of given drive
insert      inserts new drive for missing
transform   downgrades the controller
/cx         Controller specific commands
/ex         Enclosure specific commands
/sx         Slot/PD specific commands
/vx         Virtual drive specific commands
/dx         Disk group specific commands
/fall       Foreign configuration specific commands
/px         Phy specific commands
/[bbu|cv]   Battery Backup Unit, Cachevault commands

Other aliases : cachecade, freespace, sysinfo

Use a combination of commands to filter the output of help further.
E.g. 'storcli cx show help' displays all the show operations on cx.
Use verbose for detailed description E.g. 'storcli add verbose help'
Use 'page=[x]' as the last option in all the commands to set the page break. X=lines per page. E.g. 'storcli help page=10'

Command options must be entered in the same order as displayed in the help of the respective commands.

root@ambrosia:~ # storcli64 /c0 set BIOS help

Storage Command Line Tool Ver 1.13.06 Sep 03, 2014
(c)Copyright 2014, LSI Corporation, All Rights Reserved.

NAME: Set bios on controller

SYNTAX: storcli /cx set bios [state=<on|off>] [Mode=<SOE|PE|IE|SME>] [abs=<on|off>] [DeviceExposure=<value>]
Only the following combinations are supported
a) storcli /cx set bios state=<on|off>
b) storcli /cx set bios Mode=<SOE|PE|IE|SME>
c) storcli /cx set bios abs=<on|off>
d) storcli /cx set bios DeviceExposure=<value>

DESCRIPTION: Set bios controller property to on or off.
Mode - Sets the BIOS Boot mode.

OPTIONS:
SOE - Stop on errors
PE - Pause on errors
IE - Ignore errors
SME - Safe mode on errors
abs - Enables|Disables the auto boot select.
DeviceExposure - Number of devices to be exposed. value range is 0-255
Value 0 and 1: Expose all
Value 2 - 255: Actual number of devices to be exposed

CONVENTION:
/cx - specifies the controller where X is the controller index

root@ambrosia:~ # storcli64 /c0 set BIOS Mode=IE
Controller = 0
Status = Success
Description = None

Controller Properties :
----------------
Ctrl_Prop Value
----------------
BIOS Mode IE
----------------

root@ambrosia:~ # storcli64 /c0 show bios
Controller = 0
Status = Success
Description = None

Controller Properties :
-----------------------------------------------
Ctrl_Prop                         Value
-----------------------------------------------
Basic Input/Output System (BIOS)  ON
Auto Boot Select(ABS)             OFF
BIOS Boot Mode                    Ignore errors
Device Exposure                   Expose All
-----------------------------------------------
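To confirm the setting from a script rather than by eye, the `BIOS Boot Mode` row can be extracted from the `show bios` text. This is a minimal Python sketch. The function name `bios_boot_mode` is hypothetical, and it assumes the row layout seen in the storcli 1.13.06 output above, which may differ in other versions.

```python
import re

def bios_boot_mode(show_bios_output):
    """Return the value of the 'BIOS Boot Mode' row, or None if absent."""
    m = re.search(
        r"^[ \t]*BIOS Boot Mode[ \t]+(.+?)[ \t]*$",
        show_bios_output,
        re.MULTILINE,
    )
    return m.group(1) if m else None

# Property rows copied from the storcli64 /c0 show bios output.
after_change = """Basic Input/Output System (BIOS)  ON
Auto Boot Select(ABS)             OFF
BIOS Boot Mode                    Ignore errors
Device Exposure                   Expose All"""

print(bios_boot_mode(after_change))  # Ignore errors
```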
Or, with MegaCli:
root@ambrosia:~ # MegaCli -help | grep -i adpbios
MegaCli -AdpBIOS -Enbl |-Dsbl | -SOE | -BE | -HCOE | - HSM | EnblAutoSelectBootLd | DsblAutoSelectBootLd | -Dsply -aN|-a0,1,2|-aALL

root@ambrosia:~ # MegaCli -AdpBIOS -BE -a0
BIOS is set to Bypass Error on Adapter 0.
Exit Code: 0x00

root@ambrosia:~ # MegaCli -AdpBIOS -Dsply -a0
BIOS on Adapter 0 is Enabled.
BIOS will Bypass error.
Auto select Boot on Adapter 0 is Disabled.
Exit Code: 0x00