Revision as of 10:56, 1 November 2012

Free up space on backup1

backup1 is our primary customer backup system. As usage grows, it needs to be purged of old files regularly. The easiest way to free space is to remove deprecated files. These are mostly cancelled customers' data and temporary dump/storage files created during dump/restores. Our standard policy is to keep cancelled customers' files for six months and then remove them. As far as customers know, their data is purged immediately, but we hang onto it just in case. In some cases we cancel a server for non-payment, and keeping the files makes it easy to restore their system. To find files to remove:

[root@backup1 ~]# cd /data/deprecated/
[root@backup1 /data/deprecated]# ls
2101-migrated-20120317.tgz                old-683-cxld-20121021.tgz
69.55.230.2-wwwbackup                     old-744-cxld-20120708.tgz
991-DONTDELETE.tgz                        old-809-cxld-20120609.tgz
archive-col02050-mdfile-cxld-20120409.gz  old-854-cxld-20120621.tgz
col01371.tgz                              old-931-cxld-20060513.tgz
deleteme_ubuntu-10.10-x86_20111205        old-col00123-mdfile-noarchive-20120417.gz
jail10_old                                old-col00147-vnfile-cxld-20120828.gz
jail14_rsync_old                          old-col00419-dump-cxld-20120224.gz
jail15_old                                old-col01098-vnfile-cxld-20120827.gz
jail3_old                                 old-col01278-dump-cxld-20120822
jail4_old                                 old-col01517-dump-cxld-20120828
jail5_old                                 old-col01669-dump-cxld-20120203.gz
old-1009-cxld-20120608.tgz                old-col01687-dump-cxld-20120909
old-1012-cxld-20120411.tgz                old-col01790-dump-cxld-20120828
old-1052-cxld-20120721.tgz                old-col01812-dump-cxld-20120820
old-10631-cxld-20120622.tgz               old-col01938-mdfile-cxld-20120619.gz
old-10632-cxld-20120622.tgz               old-col02095-mdfile-noarchive-20120523.gz
old-10633-cxld-20120622.tgz               olddebian-3.0-v15-20110610.tgz
old-1236-cxld-20120621.tgz                oldmod_frontpage-deb30-v15-20110610.tgz
old-1381-cxld-20120404.tgz                oldmod_perl-deb30-v15-20110610.tgz
old-1422-cxld-20120721.tgz                oldmod_ssl-deb30-v15-20110610.tgz
old-14681-cxld-20120619.tgz               oldmysql-deb30-v15-20110610.tgz
old-1544-cxld-20120626.tgz                oldproftpd-deb30-v15-20110610.tgz
old-18351-cxld-20120605.tgz               old_virt14
old-1853-cxld-20120910.tgz                old_virt18
old-1963-cxld-20120206.tgz                oldwebmin-deb30-v15-20110610.tgz
old-1967-cxld-20120605.tgz                suse.virt11.20120421.tgz
old-1981-noarchive-20120729.tgz           virt11
old-2030-migrated-noarchive-20120727.tgz  virt12_old
old-2037-cxld-20120716.tgz                virt13_old
old-2065-cxld-20120727.tgz                virt16_old
old-2068-cxld-20120424.tgz                virt4_old
old-2085-cxld-20120531.tgz                virt5_old
old-364-cxld-20120904.tgz                 virt6_old
old-446-cxld-20120512.tgz                 virt7_old
old-613-cxld-20120601.tgz                 virt8_old
[root@backup1 /data/deprecated]#

virtX_old and jailX_old are permanently archived, so leave those alone. Also leave anything marked not to delete (such as 991-DONTDELETE.tgz) or anything else that looks suspicious. Try to keep the old template tarballs (oldTEMPLATE-*.tgz, such as olddebian-3.0-v15-20110610.tgz) for as long as we can. Most of the files we want to delete are stamped with the date they were deprecated, which makes this easy. For example, to remove files from six months ago (running this in October, that means April 2012, stamped 201204):

[root@backup1 /data/deprecated]# ls old*201204*
old-1012-cxld-20120411.tgz  old-2068-cxld-20120424.tgz
old-1381-cxld-20120404.tgz  old-col00123-mdfile-noarchive-20120417.gz
[root@backup1 /data/deprecated]# rm old*201204*
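You can script both the month arithmetic and the keep-list. Below is a minimal sketch, not an existing tool on backup1. months_ago and deletion_candidates are hypothetical helper names. The sketch assumes customer files follow the old-...-YYYYMMDD naming seen in the listing, and that template tarballs have a letter directly after "old" (olddebian-..., oldmysql-...). It only prints names, so the rm stays a separate, deliberate step.

```shell
#!/bin/sh
# Hypothetical helpers for the backup1 cleanup; not an existing tool.

# months_ago YYYYMM N: print the YYYYMM stamp N months before YYYYMM.
months_ago() {
    ym=$1
    n=$2
    y=${ym%??}     # year: everything but the last two digits
    m=${ym#????}   # month: everything after the first four digits
    m=${m#0}       # drop a leading zero so 08/09 are not parsed as octal
    # Convert to a month count, step back N months, then convert back.
    total=$(( y * 12 + m - 1 - n ))
    printf '%04d%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# deletion_candidates DIR YYYYMM: print (never delete) files in DIR that
# start with "old-" and contain YYYYMM. Restricting to "old-*" means the
# template tarballs (olddebian-..., oldmysql-...), old_virt*, virtX_old
# and jailX_old never match; DONTDELETE names are skipped defensively.
deletion_candidates() {
    dir=$1
    ym=$2
    for f in "$dir"/old-*"$ym"*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        name=${f##*/}
        case $name in
            *DONTDELETE*) continue ;;
        esac
        printf '%s\n' "$name"
    done
}
```

For example, deletion_candidates /data/deprecated "$(months_ago 201210 6)" should print the same four files as the ls above. Like old*201204*, the pattern matches the stamp anywhere in the file name, so still read the list before deleting anything.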

Monthly RAID checks

Every month we check the health of all our RAID-based systems and verify their parity. To make this easier, we've created a simple script that starts the process:

sh /root/verify.sh

Adaptec-based servers

Here's some sample output:

mail /usr/local/www/scripts# sh /root/verify.sh
---------------------------------------------------------------------------------------------

Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------


CLI > open aac0
Executing: open "aac0"

AAC0> container list /f
Executing: container list /full=TRUE
Num          Total  Oth Chunk          Scsi   Partition                                      Creation        System
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size   State   RO Lk Task    Done%  Ent Date   Time      Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ --- ------ -------- ------
 0    Mirror 33.9GB            Open    0:01:0 64.0KB:33.9GB Normal                        0  071002 05:39:32
 /dev/aacd0           mirror0          0:00:0 64.0KB:33.9GB Normal                        1  071002 05:39:32

 1    Mirror 33.9GB            Open    0:02:0 64.0KB:33.9GB Normal                        0  071002 05:39:50
 /dev/aacd1           mirror1          0:03:0 64.0KB:33.9GB Normal                        1  071002 05:39:50


AAC0> disk list /f
Executing: disk list /full=TRUE

B:ID:L  Device Type     Removable media  Vendor-ID Product-ID        Rev   Blocks    Bytes/Block Usage            Shared Rate
------  --------------  ---------------  --------- ----------------  ----- --------- ----------- ---------------- ------ ----
0:00:0   Disk            N                FUJITSU   MAJ3364MC         3702  71390320  512         Initialized      NO     160
0:01:0   Disk            N                FUJITSU   MAJ3364MC         3702  71390320  512         Initialized      NO     160
0:02:0   Disk            N                FUJITSU   MAJ3364MC         3702  71390320  512         Initialized      NO     160
0:03:0   Disk            N                FUJITSU   MAJ3364MC         3702  71390320  512         Initialized      NO     160

AAC0> disk show smart
Executing: disk show smart

        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
B:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:02:0     Y            6             Y           N             0
0:03:0     Y            6             Y           N             0
0:06:0     N

AAC0> task list
Executing: task list

Controller Tasks

TaskId Function  Done%  Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------

No tasks currently running on controller

AAC0> dia sh hi
Executing: diagnostic show history
No switches specified, defaulting to "/current".



 *** HISTORY BUFFER FROM CURRENT CONTROLLER RUN ***

[00]: GetDiskLogEntry: container - 1, entry return 0
[01]: Container 1 started SCRUB task
[02]: Starting Mirror:1 scrub
[03]: Master disk: 2, start sector: 128, sector count = 71286784
[04]: Slave  disk: 3, start sector: 128, sector count = 71286784
[05]: UpdateDiskLogIndex - Set   - container 0, index 1
[06]: GetDiskLogEntry: container - 0, entry return 1
[07]: Container 0 started SCRUB task
[08]: Starting Mirror:0 scrub
[09]: Master disk: 1, start sector: 128, sector count = 71286784
[10]: Slave  disk: 0, start sector: 128, sector count = 71286784
[11]: Mirror Scrub Container:1   ErrorsFound:0
[12]: Clear disk log: sector - 80, driveno 2
[13]: Clear disk log: sector - 80, driveno 3
[14]: Container 1 completed SCRUB task:
[15]: Mirror Scrub Container:0   ErrorsFound:0
[16]: Clear disk log: sector - 81, driveno 1
[17]: Clear disk log: sector - 81, driveno 0
[18]: Container 0 completed SCRUB task:
[19]: UpdateDiskLogIndex - Set   - container 0, index 0
[20]: GetDiskLogEntry: container - 0, entry return 0
[21]: Container 0 started SCRUB task
[22]: Starting Mirror:0 scrub
[23]: Master disk: 1, start sector: 128, sector count = 71286784
[24]: Slave  disk: 0, start sector: 128, sector count = 71286784
[25]: UpdateDiskLogIndex - Set   - container 1, index 1
[26]: GetDiskLogEntry: container - 1, entry return 1
[27]: Container 1 started SCRUB task
[28]: Starting Mirror:1 scrub
[29]: Master disk: 2, start sector: 128, sector count = 71286784
[30]: Slave  disk: 3, start sector: 128, sector count = 71286784
[31]: Mirror Scrub Container:1   ErrorsFound:0
[32]: Clear disk log: sector - 81, driveno 2
[33]: Clear disk log: sector - 81, driveno 3
[34]: Container 1 completed SCRUB task:
[35]: Mirror Scrub Container:0   ErrorsFound:0
[36]: Clear disk log: sector - 80, driveno 1
[37]: Clear disk log: sector - 80, driveno 0
[38]: Container 0 completed SCRUB task:
[39]: UpdateDiskLogIndex - Set   - container 0, index 0
[40]: GetDiskLogEntry: container - 0, entry return 0
[41]: Container 0 started SCRUB task
[42]: Starting Mirror:0 scrub
[43]: Master disk: 1, start sector: 128, sector count = 71286784
[44]: Slave  disk: 0, start sector: 128, sector count = 71286784
[45]: UpdateDiskLogIndex - Set   - container 1, index 1
[46]: GetDiskLogEntry: container - 1, entry return 1
[47]: Container 1 started SCRUB task
[48]: Starting Mirror:1 scrub
[49]: Master disk: 2, start sector: 128, sector count = 71286784
[50]: Slave  disk: 3, start sector: 128, sector count = 71286784
[51]: Mirror Scrub Container:1   ErrorsFound:0
[52]: Clear disk log: sector - 81, driveno 2
[53]: Clear disk log: sector - 81, driveno 3
[54]: Container 1 completed SCRUB task:
[55]: Mirror Scrub Container:0   ErrorsFound:0
[56]: Clear disk log: sector - 80, driveno 1
[57]: Clear disk log: sector - 80, driveno 0
[58]: Container 0 completed SCRUB task:
[59]: UpdateDiskLogIndex - Set   - container 0, index 0
[60]: GetDiskLogEntry: container - 0, entry return 0
[61]: Container 0 started SCRUB task
[62]: Starting Mirror:0 scrub
[63]: Master disk: 1, start sector: 128, sector count = 71286784
[64]: Slave  disk: 0, start sector: 128, sector count = 71286784
[65]: UpdateDiskLogIndex - Set   - container 1, index 1
[66]: GetDiskLogEntry: container - 1, entry return 1
[67]: Container 1 started SCRUB task
[68]: Starting Mirror:1 scrub
[69]: Master disk: 2, start sector: 128, sector count = 71286784
[70]: Slave  disk: 3, start sector: 128, sector count = 71286784
[71]: Mirror Scrub Container:1   ErrorsFound:0
[72]: Clear disk log: sector - 81, driveno 2
[73]: Clear disk log: sector - 81, driveno 3
[74]: Container 1 completed SCRUB task:
[75]: Mirror Scrub Container:0   ErrorsFound:0
[76]: Clear disk log: sector - 80, driveno 1
[77]: Clear disk log: sector - 80, driveno 0
[78]: Container 0 completed SCRUB task:
[79]: UpdateDiskLogIndex - Set   - container 0, index 0
[80]: GetDiskLogEntry: container - 0, entry return 0
[81]: Container 0 started SCRUB task
[82]: Starting Mirror:0 scrub
[83]: Master disk: 1, start sector: 128, sector count = 71286784
[84]: Slave  disk: 0, start sector: 128, sector count = 71286784
[85]: UpdateDiskLogIndex - Set   - container 1, index 1
[86]: GetDiskLogEntry: container - 1, entry return 1
[87]: Container 1 started SCRUB task
[88]: Starting Mirror:1 scrub
[89]: Master disk: 2, start sector: 128, sector count = 71286784
[90]: Slave  disk: 3, start sector: 128, sector count = 71286784
[91]: Mirror Scrub Container:1   ErrorsFound:0
[92]: Clear disk log: sector - 81, driveno 2
[93]: Clear disk log: sector - 81, driveno 3
[94]: Container 1 completed SCRUB task:
[95]: Mirror Scrub Container:0   ErrorsFound:0
[96]: Clear disk log: sector - 80, driveno 1
[97]: Clear disk log: sector - 80, driveno 0
[98]: Container 0 completed SCRUB task:
[99]:

========================
History Output Complete.

AAC0>
AAC0> exit
Executing: exit

press enter when ready to run verify
---------------------------------------------------------------------------------------------

Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------


CLI > open aac0
Executing: open "aac0"

AAC0> contai scr 0
Executing: container scrub 0

AAC0> contai scr 1
Executing: container scrub 1

AAC0> exit
Executing: exit

when done run:                                                                       

aaccli
open aac0
dia sh hi
c


Nov  1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov  1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task

Here's an analysis of what we're seeing and what we're looking for:

AAC0> container list /f
Executing: container list /full=TRUE
Num          Total  Oth Chunk          Scsi   Partition                                      Creation        System
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size   State   RO Lk Task    Done%  Ent Date   Time      Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ --- ------ -------- ------
 0    Mirror 33.9GB            Open    0:01:0 64.0KB:33.9GB Normal                        0  071002 05:39:32
 /dev/aacd0           mirror0          0:00:0 64.0KB:33.9GB Normal                        1  071002 05:39:32

 1    Mirror 33.9GB            Open    0:02:0 64.0KB:33.9GB Normal                        0  071002 05:39:50
 /dev/aacd1           mirror1          0:03:0 64.0KB:33.9GB Normal                        1  071002 05:39:50

This shows the health of the arrays. Look for Normal in the State column, and make sure there is no ! in the Offset:Size (Partition) column. Sometimes you'll see this instead:

64.0KB!33.9GB

That indicates a problem.
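If you save the container list /f output to a file, you can run this check mechanically. This is a minimal sketch, assuming POSIX sh and awk. check_containers is a hypothetical helper. The Offset:Size pattern is inferred from the sample output above, not from Adaptec documentation.

```shell
#!/bin/sh
# Hypothetical helper: reads saved "container list /f" output on stdin,
# prints any container or member disk that looks unhealthy, and exits
# non-zero if it found one.
check_containers() {
    awk '
        BEGIN { bad = 0 }
        {
            for (i = 1; i <= NF; i++) {
                # The Offset:Size field looks like 64.0KB:33.9GB; a "!" in
                # place of the ":" marks a problem. The next field is State.
                if ($i ~ /^[0-9.]+[KMGT]?B[!:][0-9.]+[KMGT]?B$/) {
                    if ($i ~ /!/ || $(i + 1) != "Normal") {
                        print "PROBLEM: " $0
                        bad = 1
                    }
                }
            }
        }
        END { exit bad }
    '
}
```

For example: check_containers < containers.txt && echo containers OK. Here containers.txt is a placeholder for wherever you saved the output.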

AAC0> disk show smart
Executing: disk show smart

        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
B:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:02:0     Y            6             Y           N             0
0:03:0     Y            6             Y           N             0
0:06:0     N

This is the SMART report for each disk. Check the Error Count column for any non-zero values.
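The same approach works for the SMART table. This sketch uses a hypothetical check_smart helper. It assumes the column layout shown above, where each SMART-capable disk has six fields and the last one is the error count. Any non-zero count is flagged.

```shell
#!/bin/sh
# Hypothetical helper: reads saved "disk show smart" output on stdin and
# exits non-zero if any disk reports a non-zero SMART error count.
check_smart() {
    awk '
        BEGIN { bad = 0 }
        # Data rows start with a B:ID:L address such as 0:00:0. SMART-capable
        # disks have six fields with the error count last; rows for devices
        # without SMART (e.g. "0:06:0  N") have fewer fields and are skipped.
        $1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ && NF == 6 && $6 + 0 != 0 {
            print "SMART errors on " $1 ": " $6
            bad = 1
        }
        END { exit bad }
    '
}
```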

AAC0> task list
Executing: task list

Controller Tasks

TaskId Function  Done%  Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------

No tasks currently running on controller

Make sure no tasks are running. A rebuild or verify that you didn't start is a bad sign.
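For the task list, the sketch below just looks for the "No tasks currently running on controller" message in saved output. check_no_tasks is a hypothetical name. Any other output is treated as something to inspect by hand.

```shell
#!/bin/sh
# Hypothetical helper: reads saved "task list" output on stdin and returns
# 0 only if the controller reports that no tasks are running.
check_no_tasks() {
    if grep -q 'No tasks currently running on controller'; then
        return 0
    fi
    echo "WARNING: 'No tasks currently running' not found; check for an unexpected rebuild or verify" >&2
    return 1
}
```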

In the history output, look for any anomalies or events since the last verify was run. If a drive shows lots of problems, consider taking backups before letting the verify run, since it could replicate errors onto the good drive.

After the history output, the script prompts you to press Enter to run the verify. If everything looks good (the mirrors are healthy and the history is clean), it's safe to proceed. Otherwise, press ^C to exit. After you press Enter, the script starts the verify and tails the messages log so you can easily see when it completes. At that point, run the follow-up commands the script prints to view the history and check the results of the verify. Putting it all together, after you press Enter to start the verify, you'll see:

---------------------------------------------------------------------------------------------

Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------


CLI > open aac0
Executing: open "aac0"

AAC0> contai scr 0
Executing: container scrub 0

AAC0> contai scr 1
Executing: container scrub 1

AAC0> exit
Executing: exit

when done run:                                                                       

aaccli
open aac0
dia sh hi
c


Nov  1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov  1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task

When the scrubs (verifies) are complete, run the commands below. If the server has multiple logical drives, the scrubs run in parallel.

aaccli
open aac0
dia sh hi
c

This shows the diagnostic history. Look for the results of the most recent scrub:

[100]: Mirror Scrub Container:1   ErrorsFound:0
[101]: Clear disk log: sector - 81, driveno 2
[102]: Clear disk log: sector - 81, driveno 3
[103]: Container 1 completed SCRUB task:
[104]: Mirror Scrub Container:0   ErrorsFound:0
[105]: Clear disk log: sector - 80, driveno 1
[106]: Clear disk log: sector - 80, driveno 0
[107]: Container 0 completed SCRUB task:

If you see:

[104]: Mirror Scrub Container:0   ErrorsFound:5

Rerun the verify on that container until it shows 0, or consider replacing the drive. The output should show which drive had the problem.
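In a long history it's easy to misread which scrub a line belongs to. The sketch below uses a hypothetical scrub_results helper that keeps the last ErrorsFound value for each container. It assumes entries appear in chronological order, as in the sample, where the index increases over time. It exits non-zero if any container's latest scrub found errors, or if no scrub results were found at all.

```shell
#!/bin/sh
# Hypothetical helper: reads saved "dia sh hi" output on stdin, prints the
# most recent ErrorsFound value per container, and exits non-zero if any
# is above zero or no scrub results are present.
scrub_results() {
    awk '
        BEGIN { bad = 0; n = 0 }
        # Lines look like: "[104]: Mirror Scrub Container:0   ErrorsFound:0".
        # Later lines overwrite earlier ones, so the latest scrub wins.
        /Mirror Scrub Container:/ {
            ctr = ""; errs = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Container:/)   { split($i, c, ":"); ctr = c[2] }
                if ($i ~ /^ErrorsFound:/) { split($i, e, ":"); errs = e[2] }
            }
            if (!(ctr in last)) n++
            last[ctr] = errs
        }
        END {
            if (n == 0) { print "no scrub results found"; exit 1 }
            for (ctr in last) {
                print "container " ctr ": ErrorsFound " last[ctr]
                if (last[ctr] + 0 > 0) bad = 1
            }
            exit bad
        }
    '
}
```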

Depending on the size and how busy the drive is, the verify can take anywhere from an hour to the better part of a day.

Note that the diagnostic history is not shown on our modern Adaptec cards (i.e. any Adaptec card not in a Dell 2450). On those cards the history is never cleared, so there is too much data to display and the CLI crashes. Don't bother trying to view it. This makes it harder to spot problems, so you just need to watch the scrub and confirm it reaches 100%. You'll also notice that some servers don't tail the messages log, because nothing about scrub completion is logged there. On those servers, go into the CLI and run task list repeatedly to monitor the scrub's progress.

See Adaptec RAID CLI Reference for more details on how to use the CLI.

DELL (LSI-based) SAS controllers

Here's what the output looks like when running verify.sh on an LSI-based card:

jail2 /mnt/data2# sh /root/verify.sh

Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018396142
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: FUJITSU MAX3147RC       D207DQ03P7A0DESN
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018395db2
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: FUJITSU MAX3147RC       D207DQ03P7A0DERV
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50006eece89
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: SEAGATE ST3300555SS     T2113LM4BFBZ
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50006eee035
SAS Address(1): 0x0
Connected Port Number: 3(path0)
Inquiry Data: SEAGATE ST3300555SS     T2113LM4BGF7
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 4
Device Id: 4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50004bd7ea5
SAS Address(1): 0x0
Connected Port Number: 4(path0)
Inquiry Data: SEAGATE ST3300656SS     HS093QP0G8SW
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 5
Device Id: 5
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e01f1c4112
SAS Address(1): 0x0
Connected Port Number: 5(path0)
Inquiry Data: FUJITSU MBA3300RC       D306BJ15P9201W06
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown


Exit Code: 0x00


Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:139392MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00


Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:MIRROR1
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00


Adapter 0 -- Virtual Drive Information:
Virtual Disk: 2 (Target Id: 2)
Name:MIRROR2
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00
Battery FRU     : N/A
Battery Warning                  : Enabled
Memory Correctable Errors   : 0
Memory Uncorrectable Errors : 0
BBU             : Present
BBU                             : Yes
Cache When BBU Bad               : Disabled
press enter when ready to run verify

Before pressing enter, here's what we're looking for:

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018396142
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: FUJITSU MAX3147RC       D207DQ03P7A0DESN
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

This is the output shown for each physical drive in the system. Confirm that its Firmware state is Online, and that Media Error Count, Other Error Count, and Predictive Failure Count are all zero (or near zero).
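This is a minimal sketch for checking the physical drive blocks in saved verify.sh output. check_physical_drives is a hypothetical helper. It assumes the "Field: value" layout shown above and flags any non-zero counter, leaving the "near zero" judgment to you. It treats any Firmware state beginning with Online as healthy.

```shell
#!/bin/sh
# Hypothetical helper: reads saved physical drive output on stdin and
# exits non-zero if any drive is not Online or has a non-zero counter.
check_physical_drives() {
    awk -F': *' '
        BEGIN { bad = 0 }
        # Each physical drive block begins with "Slot Number: N".
        /^Slot Number:/ { slot = $2 }
        /^(Media Error Count|Other Error Count|Predictive Failure Count):/ {
            if ($2 + 0 != 0) { print "slot " slot ": " $0; bad = 1 }
        }
        # Treat any state beginning with Online as healthy.
        /^Firmware state:/ {
            if ($2 !~ /^Online/) { print "slot " slot ": " $0; bad = 1 }
        }
        END { exit bad }
    '
}
```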

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:MIRROR1
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

This is the output for each logical drive. Confirm that State is Optimal and that Default Cache Policy is WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU.
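The logical drives can be checked the same way. This sketch uses a hypothetical check_virtual_drives helper. It flags any virtual disk whose State isn't Optimal, or whose Default Cache Policy differs from the expected string, which is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads saved virtual drive output on stdin and exits
# non-zero if a drive is not Optimal or has an unexpected cache policy.
check_virtual_drives() {
    awk -v want='WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU' '
        BEGIN { bad = 0 }
        # "Virtual Disk: 1 (Target Id: 1)": the third field is the VD number.
        /^Virtual Disk:/ { vd = $3 }
        /^State:/ {
            if ($2 != "Optimal") { print "VD " vd ": " $0; bad = 1 }
        }
        /^Default Cache Policy:/ {
            policy = $0
            sub(/^Default Cache Policy: */, "", policy)
            sub(/[ \t]+$/, "", policy)   # ignore trailing whitespace
            if (policy != want) { print "VD " vd ": " $0; bad = 1 }
        }
        END { exit bad }
    '
}
```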

Exit Code: 0x00
Battery FRU     : N/A
Battery Warning                  : Enabled
Memory Correctable Errors   : 0
Memory Uncorrectable Errors : 0
BBU             : Present
BBU                             : Yes
Cache When BBU Bad               : Disabled

Confirm that the battery is present and error-free.
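For the battery, here is a sketch that checks presence only, based on the two BBU lines in the sample output. check_battery is a hypothetical name. Anything else about battery health still needs a human look.

```shell
#!/bin/sh
# Hypothetical helper: reads saved adapter/battery output on stdin and
# exits non-zero unless every BBU line says Present or Yes.
check_battery() {
    awk -F' *: *' '
        BEGIN { bad = 0; seen = 0 }
        # The sample output has two BBU lines: "BBU : Present" and "BBU : Yes".
        $1 == "BBU" {
            seen = 1
            v = $2
            sub(/[ \t]+$/, "", v)
            if (v != "Present" && v != "Yes") { print "battery: " $0; bad = 1 }
        }
        END {
            if (!seen) { print "battery: no BBU line found"; bad = 1 }
            exit bad
        }
    '
}
```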

If all that checks out, you're ready to proceed with the verify. After you press Enter, the verify starts and you'll see:

Start Check Consistency on Virtual Drive 0 (target id: 0) Success.

Exit Code: 0x00

Start Check Consistency on Virtual Drive 1 (target id: 1) Success.

Exit Code: 0x00

Start Check Consistency on Virtual Drive 2 (target id: 2) Success.

Exit Code: 0x00

  Check Consistency

 Progress of Virtual Drives...

  Virtual Drive #              Percent Complete                       Time Elps
          0         °°°°°°°°°°°°°°°°°°°°°°°00 %°°°°°°°°°°°°°°°°°°°°°°° 00:00:03
          1         °°°°°°°°°°°°°°°°°°°°°°°00 %°°°°°°°°°°°°°°°°°°°°°°° 00:00:02
          2         °°°°°°°°°°°°°°°°°°°°°°°00 %°°°°°°°°°°°°°°°°°°°°°°° 00:00:01

    Press <ESC> key to quit...

Progress is displayed for each virtual drive until all of them have completed the verify. Just make sure each one runs to completion. No follow-up is needed, though there is probably a log or history with more information.

You'll notice that no verify is run on jail7. That's on purpose: the last time we tried, it crashed the system. On jail7 the verify must be run from the BIOS, which means taking the system offline for a couple of hours.

See LSI RAID CLI Reference for more details on how to use the CLI.

Routine Maintenance: Difference between revisions
