Routine Maintenance
= Monthly RAID checks =
Every month we check the health of, and verify the parity on, all our RAID-based systems.
To facilitate this, we've created a simple script to start the process:
 sh /root/verify.sh
== Adaptec-based servers ==
Here's some sample output:
<pre>
mail /usr/local/www/scripts# sh /root/verify.sh
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> container list /f
Executing: container list /full=TRUE
Num Total Oth Chunk Scsi Partition
Creation System
Label Type Size Ctr Size Usage B:ID:L Offset:Size State RO Lk Task Done% Ent
Date Time Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ ---
------ -------- ------
0 Mirror 33.9GB Open 0:01:0 64.0KB:33.9GB Normal 0
071002 05:39:32
/dev/aacd0 mirror0 0:00:0 64.0KB:33.9GB Normal 1
071002 05:39:32
1 Mirror 33.9GB Open 0:02:0 64.0KB:33.9GB Normal 0
071002 05:39:50
/dev/aacd1 mirror1 0:03:0 64.0KB:33.9GB Normal 1
071002 05:39:50
AAC0> disk list /f
Executing: disk list /full=TRUE
B:ID:L Device Type Removable media Vendor-ID Product-ID Rev Blocks Bytes/Bl
ock Usage Shared Rate
------ -------------- --------------- --------- ---------------- ----- --------- --------
--- ---------------- ------ ----
0:00:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512
Initialized NO 160
0:01:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512
Initialized NO 160
0:02:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512
Initialized NO 160
0:03:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512
Initialized NO 160
AAC0> disk show smart
Executing: disk show smart
Smart Method of Enable
Capable Informational Exception Performance Error
B:ID:L Device Exceptions(MRIE) Control Enabled Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y 6 Y N 0
0:01:0 Y 6 Y N 0
0:02:0 Y 6 Y N 0
0:03:0 Y 6 Y N 0
0:06:0 N
AAC0> task list
Executing: task list
Controller Tasks
TaskId Function Done% Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
No tasks currently running on controller
AAC0> dia sh hi
Executing: diagnostic show history
No switches specified, defaulting to "/current".
*** HISTORY BUFFER FROM CURRENT CONTROLLER RUN ***
[00]: GetDiskLogEntry: container - 1, entry return 0
[01]: Container 1 started SCRUB task
[02]: Starting Mirror:1 scrub
[03]: Master disk: 2, start sector: 128, sector count = 71286784
[04]: Slave disk: 3, start sector: 128, sector count = 71286784
[05]: UpdateDiskLogIndex - Set - container 0, index 1
[06]: GetDiskLogEntry: container - 0, entry return 1
[07]: Container 0 started SCRUB task
[08]: Starting Mirror:0 scrub
[09]: Master disk: 1, start sector: 128, sector count = 71286784
[10]: Slave disk: 0, start sector: 128, sector count = 71286784
[11]: Mirror Scrub Container:1 ErrorsFound:0
[12]: Clear disk log: sector - 80, driveno 2
[13]: Clear disk log: sector - 80, driveno 3
[14]: Container 1 completed SCRUB task:
[15]: Mirror Scrub Container:0 ErrorsFound:0
[16]: Clear disk log: sector - 81, driveno 1
[17]: Clear disk log: sector - 81, driveno 0
[18]: Container 0 completed SCRUB task:
[19]: UpdateDiskLogIndex - Set - container 0, index 0
[20]: GetDiskLogEntry: container - 0, entry return 0
[21]: Container 0 started SCRUB task
[22]: Starting Mirror:0 scrub
[23]: Master disk: 1, start sector: 128, sector count = 71286784
[24]: Slave disk: 0, start sector: 128, sector count = 71286784
[25]: UpdateDiskLogIndex - Set - container 1, index 1
[26]: GetDiskLogEntry: container - 1, entry return 1
[27]: Container 1 started SCRUB task
[28]: Starting Mirror:1 scrub
[29]: Master disk: 2, start sector: 128, sector count = 71286784
[30]: Slave disk: 3, start sector: 128, sector count = 71286784
[31]: Mirror Scrub Container:1 ErrorsFound:0
[32]: Clear disk log: sector - 81, driveno 2
[33]: Clear disk log: sector - 81, driveno 3
[34]: Container 1 completed SCRUB task:
[35]: Mirror Scrub Container:0 ErrorsFound:0
[36]: Clear disk log: sector - 80, driveno 1
[37]: Clear disk log: sector - 80, driveno 0
[38]: Container 0 completed SCRUB task:
[39]: UpdateDiskLogIndex - Set - container 0, index 0
[40]: GetDiskLogEntry: container - 0, entry return 0
[41]: Container 0 started SCRUB task
[42]: Starting Mirror:0 scrub
[43]: Master disk: 1, start sector: 128, sector count = 71286784
[44]: Slave disk: 0, start sector: 128, sector count = 71286784
[45]: UpdateDiskLogIndex - Set - container 1, index 1
[46]: GetDiskLogEntry: container - 1, entry return 1
[47]: Container 1 started SCRUB task
[48]: Starting Mirror:1 scrub
[49]: Master disk: 2, start sector: 128, sector count = 71286784
[50]: Slave disk: 3, start sector: 128, sector count = 71286784
[51]: Mirror Scrub Container:1 ErrorsFound:0
[52]: Clear disk log: sector - 81, driveno 2
[53]: Clear disk log: sector - 81, driveno 3
[54]: Container 1 completed SCRUB task:
[55]: Mirror Scrub Container:0 ErrorsFound:0
[56]: Clear disk log: sector - 80, driveno 1
[57]: Clear disk log: sector - 80, driveno 0
[58]: Container 0 completed SCRUB task:
[59]: UpdateDiskLogIndex - Set - container 0, index 0
[60]: GetDiskLogEntry: container - 0, entry return 0
[61]: Container 0 started SCRUB task
[62]: Starting Mirror:0 scrub
[63]: Master disk: 1, start sector: 128, sector count = 71286784
[64]: Slave disk: 0, start sector: 128, sector count = 71286784
[65]: UpdateDiskLogIndex - Set - container 1, index 1
[66]: GetDiskLogEntry: container - 1, entry return 1
[67]: Container 1 started SCRUB task
[68]: Starting Mirror:1 scrub
[69]: Master disk: 2, start sector: 128, sector count = 71286784
[70]: Slave disk: 3, start sector: 128, sector count = 71286784
[71]: Mirror Scrub Container:1 ErrorsFound:0
[72]: Clear disk log: sector - 81, driveno 2
[73]: Clear disk log: sector - 81, driveno 3
[74]: Container 1 completed SCRUB task:
[75]: Mirror Scrub Container:0 ErrorsFound:0
[76]: Clear disk log: sector - 80, driveno 1
[77]: Clear disk log: sector - 80, driveno 0
[78]: Container 0 completed SCRUB task:
[79]: UpdateDiskLogIndex - Set - container 0, index 0
[80]: GetDiskLogEntry: container - 0, entry return 0
[81]: Container 0 started SCRUB task
[82]: Starting Mirror:0 scrub
[83]: Master disk: 1, start sector: 128, sector count = 71286784
[84]: Slave disk: 0, start sector: 128, sector count = 71286784
[85]: UpdateDiskLogIndex - Set - container 1, index 1
[86]: GetDiskLogEntry: container - 1, entry return 1
[87]: Container 1 started SCRUB task
[88]: Starting Mirror:1 scrub
[89]: Master disk: 2, start sector: 128, sector count = 71286784
[90]: Slave disk: 3, start sector: 128, sector count = 71286784
[91]: Mirror Scrub Container:1 ErrorsFound:0
[92]: Clear disk log: sector - 81, driveno 2
[93]: Clear disk log: sector - 81, driveno 3
[94]: Container 1 completed SCRUB task:
[95]: Mirror Scrub Container:0 ErrorsFound:0
[96]: Clear disk log: sector - 80, driveno 1
[97]: Clear disk log: sector - 80, driveno 0
[98]: Container 0 completed SCRUB task:
[99]:
========================
History Output Complete.
AAC0>
AAC0> exit
Executing: exit
press enter when ready to run verify <INS>
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> contai scr 0
Executing: container scrub 0
AAC0> contai scr 1
Executing: container scrub 1
AAC0> exit
Executing: exit
when done run:
aaccli
open aac0
dia sh hi
c
Nov 1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov 1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task
</pre>
Here's an analysis of what we're seeing and what we're looking for:
<pre>
AAC0> container list /f
Executing: container list /full=TRUE
Num Total Oth Chunk Scsi Partition
Creation System
Label Type Size Ctr Size Usage B:ID:L Offset:Size State RO Lk Task Done% Ent
Date Time Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ ---
------ -------- ------
0 Mirror 33.9GB Open 0:01:0 64.0KB:33.9GB Normal 0
071002 05:39:32
/dev/aacd0 mirror0 0:00:0 64.0KB:33.9GB Normal 1
071002 05:39:32
1 Mirror 33.9GB Open 0:02:0 64.0KB:33.9GB Normal 0
071002 05:39:50
/dev/aacd1 mirror1 0:03:0 64.0KB:33.9GB Normal 1
071002 05:39:50
</pre>
This shows the health of the arrays. You're looking for ''Normal'' under the State column and the absence of a ! in the Partition Offset:Size column. Sometimes you'll see this:
 64.0KB!33.9GB
A ! in place of the : indicates a problem.
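If you save the container list to a file, the two checks described here can be roughly automated. This is only a sketch: check_containers is a hypothetical helper (not part of aaccli or verify.sh), and it assumes sizes are printed with KB/MB/GB/TB suffixes and that the State value directly follows the Offset:Size field, as in the sample.

```shell
# check_containers: hypothetical helper, not part of aaccli or verify.sh.
# Reads "container list /f" output on stdin, prints each suspect line,
# and returns non-zero if any were found.
check_containers() {
    awk '
    BEGIN { bad = 0 }
    {
        for (i = 1; i <= NF; i++) {
            # Offset:Size field, e.g. 64.0KB:33.9GB (64.0KB!33.9GB on a problem)
            if ($i ~ /^[0-9.]+[KMGT]B[:!][0-9.]+[KMGT]B$/) {
                # Flag a "!" separator, or a following State other than Normal
                if ($i ~ /!/ || (i < NF && $(i + 1) != "Normal")) {
                    print "PROBLEM: " $0
                    bad = 1
                }
            }
        }
    }
    END { exit bad }
    '
}
```

For example, check_containers < containers.txt (where containers.txt is whatever file you saved the listing to) prints nothing and succeeds for the healthy sample output.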
<pre>
AAC0> disk show smart
Executing: disk show smart
Smart Method of Enable
Capable Informational Exception Performance Error
B:ID:L Device Exceptions(MRIE) Control Enabled Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y 6 Y N 0
0:01:0 Y 6 Y N 0
0:02:0 Y 6 Y N 0
0:03:0 Y 6 Y N 0
0:06:0 N
</pre>
This shows the SMART report for each disk. You're looking for non-zero values in the Error Count column.
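A small filter can pick out disks with a non-zero error count from saved output. This is a sketch under assumptions: smart_errors is a hypothetical helper name, and it assumes Error Count is the last field on each SMART-capable row, as in the sample.

```shell
# smart_errors: hypothetical helper. Reads "disk show smart" output on stdin
# and prints "B:ID:L count" for every disk whose Error Count is above zero.
smart_errors() {
    # Data rows start with a B:ID:L address; rows with fewer than six fields
    # (such as 0:06:0 in the sample, which is not SMART capable) are skipped.
    awk '$1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ && NF >= 6 && $NF + 0 > 0 { print $1, $NF }'
}
```

Empty output means no disk reported errors.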
<pre>
AAC0> task list
Executing: task list
Controller Tasks
TaskId Function Done% Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
No tasks currently running on controller
</pre>
Confirm that no tasks are running. Seeing a rebuild or verify in progress that you didn't initiate is a bad sign.
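The idle check can be reduced to a single grep on saved output. This is a minimal sketch; tasks_idle is a hypothetical helper name, and it relies only on the idle message shown in the sample.

```shell
# tasks_idle: hypothetical helper. Succeeds only if "task list" output on
# stdin contains the idle message shown in the sample output.
tasks_idle() {
    grep -q 'No tasks currently running on controller'
}
```

If tasks_idle fails, look at the task list by hand before starting a verify.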
With the history output, you're looking for any anomalies or events since the last time a verify was run. If you see a drive with lots of problems, you may want to take backups before allowing the verify to run, since it could replicate errors onto the good drive.
After the history output, the script prompts you to press enter to run the verify. If you're happy with all the output (mirror is healthy, history looks good), it's safe to proceed; otherwise press ^C to exit. After you hit enter, the script starts the verify and begins tailing the messages log so you can easily see when the verify is complete. At that point, run the commands the script printed to view the history and see the results of the verify. Putting it all together, after hitting enter to start the verify, you'll see:
<pre>
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> contai scr 0
Executing: container scrub 0
AAC0> contai scr 1
Executing: container scrub 1
AAC0> exit
Executing: exit
when done run:
aaccli
open aac0
dia sh hi
c
Nov 1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov 1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task
</pre>
If the server has multiple logical drives, their scrubs (verifies) run in parallel. When all of them are complete, run:
<pre>
aaccli
open aac0
dia sh hi
c
</pre>
This shows the diagnostic history. You're looking for the results of the most recent scrub:
<pre>
[100]: Mirror Scrub Container:1 ErrorsFound:0
[101]: Clear disk log: sector - 81, driveno 2
[102]: Clear disk log: sector - 81, driveno 3
[103]: Container 1 completed SCRUB task:
[104]: Mirror Scrub Container:0 ErrorsFound:0
[105]: Clear disk log: sector - 80, driveno 1
[106]: Clear disk log: sector - 80, driveno 0
[107]: Container 0 completed SCRUB task:
</pre>
If you see:
 [104]: Mirror Scrub Container:0 ErrorsFound:5
you'll want to rerun the verify on that container until it shows 0, or perhaps replace the drive. You should be able to see from the output which drive had the problem.
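To pull just the latest result per container out of a saved history dump, something like the following could work. It is a sketch only: last_scrub_errors is a hypothetical helper name, and it matches only the "Mirror Scrub Container:N ErrorsFound:M" lines seen here, so other RAID levels may log different text.

```shell
# last_scrub_errors: hypothetical helper. Reads "dia sh hi" output on stdin
# and prints "container errors" for the most recent scrub of each container.
last_scrub_errors() {
    awk '
    /Mirror Scrub Container:[0-9]+[ \t]+ErrorsFound:[0-9]+/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Container:[0-9]+$/)   { split($i, c, ":"); id = c[2] }
            if ($i ~ /^ErrorsFound:[0-9]+$/) { split($i, e, ":"); last[id] = e[2] }
        }
    }
    # Later history entries overwrite earlier ones, so only the newest remains
    END { for (id in last) print id, last[id] }
    ' | sort -n
}
```

Any line whose second column is not 0 names a container that needs another verify.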
See Adaptec RAID CLI Reference for more details on how to use the CLI.
Revision as of 11:31, 1 November 2012
= Free up space on backup1 =
backup1 is our primary customer backup system. As usage grows over time, it needs to be regularly purged of old files. The easiest way to do this is to remove deprecated files, which mostly consist of cancelled customers' data and temporary dump/storage files (created during dumps/restores). Our standard policy is to hang onto cancelled customers' files for 6 months, after which we remove them. As far as customers know, their data is purged immediately, but we hang onto it just in case; in some cases we cancel a server due to non-payment, so this makes it easy to restore their system. To find files to remove:
<pre>
[root@backup1 ~]# cd /data/deprecated/
[root@backup1 /data/deprecated]# ls
2101-migrated-20120317.tgz                old-683-cxld-20121021.tgz
69.55.230.2-wwwbackup                     old-744-cxld-20120708.tgz
991-DONTDELETE.tgz                        old-809-cxld-20120609.tgz
archive-col02050-mdfile-cxld-20120409.gz  old-854-cxld-20120621.tgz
col01371.tgz                              old-931-cxld-20060513.tgz
deleteme_ubuntu-10.10-x86_20111205        old-col00123-mdfile-noarchive-20120417.gz
jail10_old                                old-col00147-vnfile-cxld-20120828.gz
jail14_rsync_old                          old-col00419-dump-cxld-20120224.gz
jail15_old                                old-col01098-vnfile-cxld-20120827.gz
jail3_old                                 old-col01278-dump-cxld-20120822
jail4_old                                 old-col01517-dump-cxld-20120828
jail5_old                                 old-col01669-dump-cxld-20120203.gz
old-1009-cxld-20120608.tgz                old-col01687-dump-cxld-20120909
old-1012-cxld-20120411.tgz                old-col01790-dump-cxld-20120828
old-1052-cxld-20120721.tgz                old-col01812-dump-cxld-20120820
old-10631-cxld-20120622.tgz               old-col01938-mdfile-cxld-20120619.gz
old-10632-cxld-20120622.tgz               old-col02095-mdfile-noarchive-20120523.gz
old-10633-cxld-20120622.tgz               olddebian-3.0-v15-20110610.tgz
old-1236-cxld-20120621.tgz                oldmod_frontpage-deb30-v15-20110610.tgz
old-1381-cxld-20120404.tgz                oldmod_perl-deb30-v15-20110610.tgz
old-1422-cxld-20120721.tgz                oldmod_ssl-deb30-v15-20110610.tgz
old-14681-cxld-20120619.tgz               oldmysql-deb30-v15-20110610.tgz
old-1544-cxld-20120626.tgz                oldproftpd-deb30-v15-20110610.tgz
old-18351-cxld-20120605.tgz               old_virt14
old-1853-cxld-20120910.tgz                old_virt18
old-1963-cxld-20120206.tgz                oldwebmin-deb30-v15-20110610.tgz
old-1967-cxld-20120605.tgz                suse.virt11.20120421.tgz
old-1981-noarchive-20120729.tgz           virt11
old-2030-migrated-noarchive-20120727.tgz  virt12_old
old-2037-cxld-20120716.tgz                virt13_old
old-2065-cxld-20120727.tgz                virt16_old
old-2068-cxld-20120424.tgz                virt4_old
old-2085-cxld-20120531.tgz                virt5_old
old-364-cxld-20120904.tgz                 virt6_old
old-446-cxld-20120512.tgz                 virt7_old
old-613-cxld-20120601.tgz                 virt8_old
[root@backup1 /data/deprecated]#
</pre>
virtX_old and jailX_old are permanently archived, so ignore those, along with anything else marked not to delete or that otherwise looks suspicious. It's also a good idea to hang onto oldTEMPLATE.gz files for as long as we can. Most of the files we want to delete carry the date they were deprecated in their names, which makes this easy. To remove files from 6 months ago (running this in October):
<pre>
[root@backup1 /data/deprecated]# ls old*201204*
old-1012-cxld-20120411.tgz  old-2068-cxld-20120424.tgz
old-1381-cxld-20120404.tgz  old-col00123-mdfile-noarchive-20120417.gz
[root@backup1 /data/deprecated]# rm old*201204*
</pre>
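The month stamp used in the glob can be computed rather than worked out by hand. The sketch below is an illustration, not an existing script on backup1: months_ago is a hypothetical helper that uses plain shell arithmetic, so it does not depend on any particular date implementation.

```shell
# months_ago: hypothetical helper. Prints the YYYYMM stamp that lies
# BACK months before the given YEAR and MONTH.
months_ago() {
    year=$1
    month=${2#0}    # strip a leading zero so 08 and 09 are not read as octal
    back=$3
    # Count months from year 0 so the subtraction can cross a year boundary.
    total=$(( year * 12 + month - 1 - back ))
    printf '%04d%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# Example (review the listing before removing anything):
#   stamp=$(months_ago "$(date +%Y)" "$(date +%m)" 6)
#   ls /data/deprecated/old*"$stamp"*
```

Run in October 2012, this yields 201204, matching the pattern used above. The glob still matches anything starting with old for that month, so check the listing for templates or other files worth keeping before running rm.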