Routine Maintenance
Revision as of 11:56, 1 November 2012
Free up space on backup1
backup1 is our primary customer backup system. As usage grows over time, it needs to be regularly purged of old files. The easiest way to do this is by removing deprecated files. These mostly consist of files from cancelled customers or temporary dump/storage files (created during dump/restores). Our standard policy is to hang onto a cancelled customer's files for 6 months, after which we remove them. (As far as customers know, their data is purged immediately, but we hang onto it just in case. In some cases we cancel a server due to non-payment, so this makes it easy to restore their system.) To find files to remove:
[root@backup1 ~]# cd /data/deprecated/
[root@backup1 /data/deprecated]# ls
2101-migrated-20120317.tgz                old-683-cxld-20121021.tgz
69.55.230.2-wwwbackup                     old-744-cxld-20120708.tgz
991-DONTDELETE.tgz                        old-809-cxld-20120609.tgz
archive-col02050-mdfile-cxld-20120409.gz  old-854-cxld-20120621.tgz
col01371.tgz                              old-931-cxld-20060513.tgz
deleteme_ubuntu-10.10-x86_20111205        old-col00123-mdfile-noarchive-20120417.gz
jail10_old                                old-col00147-vnfile-cxld-20120828.gz
jail14_rsync_old                          old-col00419-dump-cxld-20120224.gz
jail15_old                                old-col01098-vnfile-cxld-20120827.gz
jail3_old                                 old-col01278-dump-cxld-20120822
jail4_old                                 old-col01517-dump-cxld-20120828
jail5_old                                 old-col01669-dump-cxld-20120203.gz
old-1009-cxld-20120608.tgz                old-col01687-dump-cxld-20120909
old-1012-cxld-20120411.tgz                old-col01790-dump-cxld-20120828
old-1052-cxld-20120721.tgz                old-col01812-dump-cxld-20120820
old-10631-cxld-20120622.tgz               old-col01938-mdfile-cxld-20120619.gz
old-10632-cxld-20120622.tgz               old-col02095-mdfile-noarchive-20120523.gz
old-10633-cxld-20120622.tgz               olddebian-3.0-v15-20110610.tgz
old-1236-cxld-20120621.tgz                oldmod_frontpage-deb30-v15-20110610.tgz
old-1381-cxld-20120404.tgz                oldmod_perl-deb30-v15-20110610.tgz
old-1422-cxld-20120721.tgz                oldmod_ssl-deb30-v15-20110610.tgz
old-14681-cxld-20120619.tgz               oldmysql-deb30-v15-20110610.tgz
old-1544-cxld-20120626.tgz                oldproftpd-deb30-v15-20110610.tgz
old-18351-cxld-20120605.tgz               old_virt14
old-1853-cxld-20120910.tgz                old_virt18
old-1963-cxld-20120206.tgz                oldwebmin-deb30-v15-20110610.tgz
old-1967-cxld-20120605.tgz                suse.virt11.20120421.tgz
old-1981-noarchive-20120729.tgz           virt11
old-2030-migrated-noarchive-20120727.tgz  virt12_old
old-2037-cxld-20120716.tgz                virt13_old
old-2065-cxld-20120727.tgz                virt16_old
old-2068-cxld-20120424.tgz                virt4_old
old-2085-cxld-20120531.tgz                virt5_old
old-364-cxld-20120904.tgz                 virt6_old
old-446-cxld-20120512.tgz                 virt7_old
old-613-cxld-20120601.tgz                 virt8_old
[root@backup1 /data/deprecated]#
virtX_old and jailX_old are permanently archived, so ignore those, as well as anything else marked not to delete or otherwise suspicious. Likewise, it's probably a good idea to hang onto oldTEMPLATE.gz as long as we can. Most of the stuff we want to delete carries the date it was deprecated, which makes this easy. So, to remove files from 6 months ago (running this in October):
[root@backup1 /data/deprecated]# ls old*201204*
old-1012-cxld-20120411.tgz  old-2068-cxld-20120424.tgz
old-1381-cxld-20120404.tgz  old-col00123-mdfile-noarchive-20120417.gz
[root@backup1 /data/deprecated]# rm old*201204*
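The month pattern used above (201204 when running in October 2012) can be worked out mechanically. Here's a minimal sketch, assuming the deprecation date is embedded in the filename as YYYYMMDD like in the listing. The function names months_ago and list_purge_candidates are made up for illustration and are not part of any existing script. Nothing here deletes anything; it only lists candidates so you can eyeball them first.

```shell
#!/bin/sh
# Hypothetical helpers (not an existing script on backup1).

# months_ago YYYYMM N: print the YYYYMM value N months before the given month.
months_ago() {
    ym=$1
    n=$2
    year=${ym%??}        # drop the last two characters (the month)
    month=${ym#????}     # drop the first four characters (the year)
    month=${month#0}     # strip a leading zero so 08 and 09 are not read as octal
    total=$(( year * 12 + month - 1 - n ))
    printf '%04d%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# list_purge_candidates DIR YYYYMM: list old* entries in DIR whose name
# contains that month string. This only lists; removal stays a manual step.
list_purge_candidates() {
    dir=$1
    ym=$2
    ls -d "$dir"/old*"$ym"* 2>/dev/null
}
```

For example, `list_purge_candidates /data/deprecated "$(months_ago 201210 6)"` should list the same files as the `ls old*201204*` above. The glob could in principle also match a customer ID that happens to contain the month string, so check the list before running rm.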
Monthly RAID checks
Every month we check the health of, and verify the parity on, all our RAID-based systems. To facilitate this, we've created a simple script to start the process:
sh /root/verify.sh
Adaptec-based servers
Here's some sample output:
mail /usr/local/www/scripts# sh /root/verify.sh
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> container list /f
Executing: container list /full=TRUE
Num          Total  Oth Chunk          Scsi   Partition                                      Creation        System
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size   State   RO Lk Task    Done%  Ent Date   Time     Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ --- ------ -------- ------
 0    Mirror 33.9GB            Open    0:01:0 64.0KB:33.9GB Normal                        0  071002 05:39:32
 /dev/aacd0         mirror0            0:00:0 64.0KB:33.9GB Normal                        1  071002 05:39:32
 1    Mirror 33.9GB            Open    0:02:0 64.0KB:33.9GB Normal                        0  071002 05:39:50
 /dev/aacd1         mirror1            0:03:0 64.0KB:33.9GB Normal                        1  071002 05:39:50
AAC0> disk list /f
Executing: disk list /full=TRUE
B:ID:L Device Type    Removable media Vendor-ID Product-ID       Rev   Blocks    Bytes/Block Usage            Shared Rate
------ -------------- --------------- --------- ---------------- ----- --------- ----------- ---------------- ------ ----
0:00:0 Disk           N               FUJITSU   MAJ3364MC        3702  71390320  512         Initialized      NO     160
0:01:0 Disk           N               FUJITSU   MAJ3364MC        3702  71390320  512         Initialized      NO     160
0:02:0 Disk           N               FUJITSU   MAJ3364MC        3702  71390320  512         Initialized      NO     160
0:03:0 Disk           N               FUJITSU   MAJ3364MC        3702  71390320  512         Initialized      NO     160
AAC0> disk show smart
Executing: disk show smart
       Smart   Method of        Enable
       Capable Informational    Exception Performance Error
B:ID:L Device  Exceptions(MRIE) Control   Enabled     Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y       6                Y         N           0
0:01:0 Y       6                Y         N           0
0:02:0 Y       6                Y         N           0
0:03:0 Y       6                Y         N           0
0:06:0 N
AAC0> task list
Executing: task list
Controller Tasks
TaskId Function Done%   Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
No tasks currently running on controller
AAC0> dia sh hi
Executing: diagnostic show history
No switches specified, defaulting to "/current".
*** HISTORY BUFFER FROM CURRENT CONTROLLER RUN ***
[00]: GetDiskLogEntry: container - 1, entry return 0
[01]: Container 1 started SCRUB task
[02]: Starting Mirror:1 scrub
[03]: Master disk: 2, start sector: 128, sector count = 71286784
[04]: Slave disk: 3, start sector: 128, sector count = 71286784
[05]: UpdateDiskLogIndex - Set - container 0, index 1
[06]: GetDiskLogEntry: container - 0, entry return 1
[07]: Container 0 started SCRUB task
[08]: Starting Mirror:0 scrub
[09]: Master disk: 1, start sector: 128, sector count = 71286784
[10]: Slave disk: 0, start sector: 128, sector count = 71286784
[11]: Mirror Scrub Container:1 ErrorsFound:0
[12]: Clear disk log: sector - 80, driveno 2
[13]: Clear disk log: sector - 80, driveno 3
[14]: Container 1 completed SCRUB task:
[15]: Mirror Scrub Container:0 ErrorsFound:0
[16]: Clear disk log: sector - 81, driveno 1
[17]: Clear disk log: sector - 81, driveno 0
[18]: Container 0 completed SCRUB task:
[19]: UpdateDiskLogIndex - Set - container 0, index 0
[20]: GetDiskLogEntry: container - 0, entry return 0
[21]: Container 0 started SCRUB task
[22]: Starting Mirror:0 scrub
[23]: Master disk: 1, start sector: 128, sector count = 71286784
[24]: Slave disk: 0, start sector: 128, sector count = 71286784
[25]: UpdateDiskLogIndex - Set - container 1, index 1
[26]: GetDiskLogEntry: container - 1, entry return 1
[27]: Container 1 started SCRUB task
[28]: Starting Mirror:1 scrub
[29]: Master disk: 2, start sector: 128, sector count = 71286784
[30]: Slave disk: 3, start sector: 128, sector count = 71286784
[31]: Mirror Scrub Container:1 ErrorsFound:0
[32]: Clear disk log: sector - 81, driveno 2
[33]: Clear disk log: sector - 81, driveno 3
[34]: Container 1 completed SCRUB task:
[35]: Mirror Scrub Container:0 ErrorsFound:0
[36]: Clear disk log: sector - 80, driveno 1
[37]: Clear disk log: sector - 80, driveno 0
[38]: Container 0 completed SCRUB task:
[39]: UpdateDiskLogIndex - Set - container 0, index 0
[40]: GetDiskLogEntry: container - 0, entry return 0
[41]: Container 0 started SCRUB task
[42]: Starting Mirror:0 scrub
[43]: Master disk: 1, start sector: 128, sector count = 71286784
[44]: Slave disk: 0, start sector: 128, sector count = 71286784
[45]: UpdateDiskLogIndex - Set - container 1, index 1
[46]: GetDiskLogEntry: container - 1, entry return 1
[47]: Container 1 started SCRUB task
[48]: Starting Mirror:1 scrub
[49]: Master disk: 2, start sector: 128, sector count = 71286784
[50]: Slave disk: 3, start sector: 128, sector count = 71286784
[51]: Mirror Scrub Container:1 ErrorsFound:0
[52]: Clear disk log: sector - 81, driveno 2
[53]: Clear disk log: sector - 81, driveno 3
[54]: Container 1 completed SCRUB task:
[55]: Mirror Scrub Container:0 ErrorsFound:0
[56]: Clear disk log: sector - 80, driveno 1
[57]: Clear disk log: sector - 80, driveno 0
[58]: Container 0 completed SCRUB task:
[59]: UpdateDiskLogIndex - Set - container 0, index 0
[60]: GetDiskLogEntry: container - 0, entry return 0
[61]: Container 0 started SCRUB task
[62]: Starting Mirror:0 scrub
[63]: Master disk: 1, start sector: 128, sector count = 71286784
[64]: Slave disk: 0, start sector: 128, sector count = 71286784
[65]: UpdateDiskLogIndex - Set - container 1, index 1
[66]: GetDiskLogEntry: container - 1, entry return 1
[67]: Container 1 started SCRUB task
[68]: Starting Mirror:1 scrub
[69]: Master disk: 2, start sector: 128, sector count = 71286784
[70]: Slave disk: 3, start sector: 128, sector count = 71286784
[71]: Mirror Scrub Container:1 ErrorsFound:0
[72]: Clear disk log: sector - 81, driveno 2
[73]: Clear disk log: sector - 81, driveno 3
[74]: Container 1 completed SCRUB task:
[75]: Mirror Scrub Container:0 ErrorsFound:0
[76]: Clear disk log: sector - 80, driveno 1
[77]: Clear disk log: sector - 80, driveno 0
[78]: Container 0 completed SCRUB task:
[79]: UpdateDiskLogIndex - Set - container 0, index 0
[80]: GetDiskLogEntry: container - 0, entry return 0
[81]: Container 0 started SCRUB task
[82]: Starting Mirror:0 scrub
[83]: Master disk: 1, start sector: 128, sector count = 71286784
[84]: Slave disk: 0, start sector: 128, sector count = 71286784
[85]: UpdateDiskLogIndex - Set - container 1, index 1
[86]: GetDiskLogEntry: container - 1, entry return 1
[87]: Container 1 started SCRUB task
[88]: Starting Mirror:1 scrub
[89]: Master disk: 2, start sector: 128, sector count = 71286784
[90]: Slave disk: 3, start sector: 128, sector count = 71286784
[91]: Mirror Scrub Container:1 ErrorsFound:0
[92]: Clear disk log: sector - 81, driveno 2
[93]: Clear disk log: sector - 81, driveno 3
[94]: Container 1 completed SCRUB task:
[95]: Mirror Scrub Container:0 ErrorsFound:0
[96]: Clear disk log: sector - 80, driveno 1
[97]: Clear disk log: sector - 80, driveno 0
[98]: Container 0 completed SCRUB task:
[99]: ========================
History Output Complete.
AAC0>
AAC0> exit
Executing: exit
press enter when ready to run verify
<INS>
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> contai scr 0
Executing: container scrub 0
AAC0> contai scr 1
Executing: container scrub 1
AAC0> exit
Executing: exit
when done run: aaccli open aac0 dia sh hi c
Nov 1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov 1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task
Here's an analysis of what we're seeing and what we're looking for:
AAC0> container list /f
Executing: container list /full=TRUE
Num          Total  Oth Chunk          Scsi   Partition                                      Creation        System
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size   State   RO Lk Task    Done%  Ent Date   Time     Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ --- ------ -------- ------
 0    Mirror 33.9GB            Open    0:01:0 64.0KB:33.9GB Normal                        0  071002 05:39:32
 /dev/aacd0         mirror0            0:00:0 64.0KB:33.9GB Normal                        1  071002 05:39:32
 1    Mirror 33.9GB            Open    0:02:0 64.0KB:33.9GB Normal                        0  071002 05:39:50
 /dev/aacd1         mirror1            0:03:0 64.0KB:33.9GB Normal                        1  071002 05:39:50
This shows you the health of the arrays. You're looking for Normal under the State column, and the absence of a ! in the Offset:Size column. Sometimes you'll see this:
64.0KB!33.9GB
That indicates a problem.
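The two things to look for in the container list (State of Normal, no ! in the Offset:Size value) can be checked mechanically if you paste that section into a filter. This is only a sketch, not something verify.sh does today. The function name check_containers is made up, and it assumes the input is just the container list rows, where the value right after the B:ID:L field is Offset:Size and the one after that is State.

```shell
#!/bin/sh
# Hypothetical helper: reads "container list /f" rows on stdin and prints
# any row whose Offset:Size contains "!" or whose State is not Normal.
# Exit status is 1 if a problem row was found, 0 otherwise.
check_containers() {
    awk '
    {
        # Find the B:ID:L field (e.g. 0:01:0); the next two fields are
        # assumed to be Offset:Size and State.
        for (i = 1; i <= NF - 2; i++) {
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+$/) {
                if ($(i + 1) ~ /!/ || $(i + 2) != "Normal") {
                    print "PROBLEM: " $0
                    bad = 1
                }
                break
            }
        }
    }
    END { exit bad }
    '
}
```

Only feed it the container rows. Rows from disk list also start with a B:ID:L value and would be flagged by mistake.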
AAC0> disk show smart
Executing: disk show smart
       Smart   Method of        Enable
       Capable Informational    Exception Performance Error
B:ID:L Device  Exceptions(MRIE) Control   Enabled     Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y       6                Y         N           0
0:01:0 Y       6                Y         N           0
0:02:0 Y       6                Y         N           0
0:03:0 Y       6                Y         N           0
0:06:0 N
This is the SMART report. You're looking for non-zero values in the Error Count column.
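If you'd rather not scan the column by eye, a small filter can pick out non-zero error counts. A rough sketch, assuming the six-column row layout shown above, with Error Count as the last field. The name check_smart is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "disk show smart" rows on stdin and prints any
# device whose Error Count (sixth field) is not zero.
# Rows with fewer fields (devices that are not SMART capable) are skipped.
check_smart() {
    awk '
    $1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ && NF == 6 && $6 + 0 != 0 {
        print "device " $1 ": Error Count = " $6
        bad = 1
    }
    END { exit bad }
    '
}
```

It exits 0 when every listed device shows an Error Count of 0, and 1 otherwise.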
AAC0> task list
Executing: task list
Controller Tasks
TaskId Function Done%   Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
No tasks currently running on controller
Look for the absence of running tasks. A bad sign would be a rebuild or verify running when you didn't initiate it.
With the history output, you're looking for any anomalies or events since the last time a verify was run. If you see a drive with lots of problems, you may want to take backups before allowing the verify to run since it could replicate errors onto the good drive.
After you see the history output, the script will prompt you to press enter to run the verify. If you're happy with all the output you're seeing (mirror is healthy, history looks good), it's safe to proceed. Otherwise, ^C to exit. After you hit enter, it will start the verify and begin tailing the messages log so you can easily see when the verify is complete. At that point, run the command the script printed to follow up and view the history for the results of the verify. So, putting it all together, after hitting enter to start the verify, you'll see:
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> contai scr 0
Executing: container scrub 0
AAC0> contai scr 1
Executing: container scrub 1
AAC0> exit
Executing: exit
when done run: aaccli open aac0 dia sh hi c
Nov 1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov 1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task
When the scrub(s) (verify) are complete, you should run the following (if the server has multiple logical drives, the scrubs run in parallel):
aaccli open aac0 dia sh hi c
This will show you the diagnostic history. You're looking for the results of the most recent scrub:
[100]: Mirror Scrub Container:1 ErrorsFound:0
[101]: Clear disk log: sector - 81, driveno 2
[102]: Clear disk log: sector - 81, driveno 3
[103]: Container 1 completed SCRUB task:
[104]: Mirror Scrub Container:0 ErrorsFound:0
[105]: Clear disk log: sector - 80, driveno 1
[106]: Clear disk log: sector - 80, driveno 0
[107]: Container 0 completed SCRUB task:
If you see:
[104]: Mirror Scrub Container:0 ErrorsFound:5
You'll want to rerun the verify on that drive until it shows 0, or perhaps replace the drive. You should be able to see from the output which drive had the problem.
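Since the history buffer holds many old scrubs, it can help to pull out only the latest ErrorsFound value per container. Here's a minimal sketch, assuming history lines in the "Mirror Scrub Container:N ErrorsFound:M" form shown above and that later lines are more recent. The name scrub_errors is made up, and this is not part of verify.sh.

```shell
#!/bin/sh
# Hypothetical helper: reads diagnostic history lines on stdin, remembers the
# last ErrorsFound value seen for each container, and reports any container
# whose most recent scrub found errors. Exit status is 1 if any did.
scrub_errors() {
    awk '
    /Mirror Scrub Container:[0-9]+ ErrorsFound:[0-9]+/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Container:/)   { split($i, c, ":") }
            if ($i ~ /^ErrorsFound:/) { split($i, e, ":") }
        }
        last[c[2]] = e[2]    # later lines overwrite earlier ones
    }
    END {
        for (id in last) {
            if (last[id] + 0 > 0) {
                print "container " id ": " last[id] " errors"
                bad = 1
            }
        }
        exit bad
    }
    '
}
```

Paste the history output into it. No output and an exit status of 0 means the latest scrub of every container reported ErrorsFound:0.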
Depending on the size and how busy the drive is, the verify can take anywhere from an hour to the better part of a day.
You will notice that the diagnostic history is not shown on our modern Adaptec cards (i.e. any Adaptec card not in a Dell 2450). The reason is that the history is never cleared, so there's simply too much data to show and it just crashes the CLI. So don't bother trying to see it. That does make it hard to tell whether there are problems, so you just need to watch the scrub and see that it goes to 100%. You will also notice that on some servers there's no tail of the messages log. Again, this is because nothing about the completion of the scrub is logged there. The thing to do here is to go into the CLI and keep running task list to monitor scrub progress.
See Adaptec RAID CLI Reference for more details on how to use the CLI
DELL (LSI-based) SAS controllers
Here's what the output looks like when running verify.sh on an LSI-based card:
jail2 /mnt/data2# sh /root/verify.sh
Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018396142
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: FUJITSU MAX3147RC D207DQ03P7A0DESN
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018395db2
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: FUJITSU MAX3147RC D207DQ03P7A0DERV
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50006eece89
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: SEAGATE ST3300555SS T2113LM4BFBZ
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50006eee035
SAS Address(1): 0x0
Connected Port Number: 3(path0)
Inquiry Data: SEAGATE ST3300555SS T2113LM4BGF7
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 4
Device Id: 4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50004bd7ea5
SAS Address(1): 0x0
Connected Port Number: 4(path0)
Inquiry Data: SEAGATE ST3300656SS HS093QP0G8SW
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 5
Device Id: 5
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e01f1c4112
SAS Address(1): 0x0
Connected Port Number: 5(path0)
Inquiry Data: FUJITSU MBA3300RC D306BJ15P9201W06
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Exit Code: 0x00

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:139392MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:MIRROR1
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 2 (Target Id: 2)
Name:MIRROR2
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00
Battery FRU : N/A
Battery Warning : Enabled
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0
BBU : Present
BBU : Yes
Cache When BBU Bad : Disabled
press enter when ready to run verify
Before pressing enter, here's what we're looking for:
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018396142
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: FUJITSU MAX3147RC D207DQ03P7A0DESN
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown
This is the output shown for each physical drive in the system. We're looking to confirm its Firmware state is Online, and that Media Error Count, Other Error Count, and Predictive Failure Count are all zero (or near zero).
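With six or more drives it's easy to miss one bad counter, so here's a rough filter for the per-drive checks just described. It is a sketch only: the name check_physical_drives is made up, it assumes one "Field: value" pair per line as the controller prints it, and it flags any non-zero counter (stricter than "near zero"), leaving the judgment call to you.

```shell
#!/bin/sh
# Hypothetical helper: reads the physical drive section on stdin and reports
# any slot with a non-zero error counter or a Firmware state other than Online.
# Exit status is 1 if anything was reported.
check_physical_drives() {
    awk -F' *: *' '
    $1 == "Slot Number" { slot = $2 }
    $1 == "Media Error Count" || $1 == "Other Error Count" || $1 == "Predictive Failure Count" {
        if ($2 + 0 != 0) {
            print "slot " slot ": " $1 " = " $2
            bad = 1
        }
    }
    $1 == "Firmware state" && $2 != "Online" {
        print "slot " slot ": Firmware state = " $2
        bad = 1
    }
    END { exit bad }
    '
}
```

If a firmware version prints a longer state string than plain Online, the comparison would need adjusting.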
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:MIRROR1
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
This is the output for each logical drive. We're looking for State: Optimal. Also confirm that Default Cache Policy reads: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
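The logical drive checks can be sketched the same way. The function name check_virtual_drives is made up for illustration, and it assumes the field labels appear exactly as in the sample output, one per line.

```shell
#!/bin/sh
# Hypothetical helper: reads the virtual drive section on stdin and reports
# any virtual disk whose State is not Optimal or whose Default Cache Policy
# differs from the expected string. Exit status is 1 if anything was reported.
check_virtual_drives() {
    awk -F' *: *' '
    $1 == "Virtual Disk" { vd = $2 + 0 }    # "1 (Target Id" converts to 1
    $1 == "State" && $2 != "Optimal" {
        print "virtual disk " vd ": State = " $2
        bad = 1
    }
    $1 == "Default Cache Policy" && $0 !~ /WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU/ {
        print "virtual disk " vd ": unexpected " $0
        bad = 1
    }
    END { exit bad }
    '
}
```

Lines such as "Firmware state" or "Foreign State" from the physical drive section are ignored, because only a field named exactly State is compared.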
Exit Code: 0x00
Battery FRU : N/A
Battery Warning : Enabled
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0
BBU : Present
BBU : Yes
Cache When BBU Bad : Disabled
Confirm that the battery is present and error-free.
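A small filter for the battery section, again only as a sketch. The name check_battery is made up, and treating "BBU : Present" as the presence indicator and the two memory error counters as the "error-free" part is an assumption based on the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: reads the battery section on stdin. Reports a problem
# if no "BBU : Present" line is seen or if either memory error counter is
# non-zero. Exit status is 1 if anything was reported.
check_battery() {
    awk -F' *: *' '
    $1 == "BBU" && $2 == "Present" { present = 1 }
    $1 == "Memory Correctable Errors" || $1 == "Memory Uncorrectable Errors" {
        if ($2 + 0 != 0) {
            print $1 " = " $2
            bad = 1
        }
    }
    END {
        if (!present) {
            print "BBU not reported as present"
            bad = 1
        }
        exit bad
    }
    '
}
```

Trailing whitespace after a value would break the exact comparisons, so strip it first if your terminal capture adds any.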
If all that checks out, you're ready to proceed with the verify. After pressing enter, the verify is started and here's what you see:
Start Check Consistency on Virtual Drive 0 (target id: 0) Success.
Exit Code: 0x00
Start Check Consistency on Virtual Drive 1 (target id: 1) Success.
Exit Code: 0x00
Start Check Consistency on Virtual Drive 2 (target id: 2) Success.
Exit Code: 0x00
Check Consistency Progress of Virtual Drives...
Virtual Drive #  Percent Complete                                   Time Elps
0                °°°°°°°°°°°°°°°°°°°°°°°00 %°°°°°°°°°°°°°°°°°°°°°°°  00:00:03
1                °°°°°°°°°°°°°°°°°°°°°°°00 %°°°°°°°°°°°°°°°°°°°°°°°  00:00:02
2                °°°°°°°°°°°°°°°°°°°°°°°00 %°°°°°°°°°°°°°°°°°°°°°°°  00:00:01
Press <ESC> key to quit...
The progress for each drive is displayed until all drives have completed the verify. We just want to make sure that each drive runs to completion. No follow-up is needed, though there is probably a log or history where we can get more info.
You will notice that jail7 does not run a verify. That's on purpose: the last time we tried it, it crashed the system. So on jail7 the verify must be run from the BIOS (take the system offline for a couple of hours).
See LSI RAID CLI Reference for more details on how to use the CLI