Routine Maintenance: Difference between revisions
Revision as of 17:31, 1 March 2013
Daily Tasks
check load graphs
Click on the Load link in mgmt
This screen shows you load levels on our servers and network traffic for critical machines (firewalls, backup servers).
If you see load high or increasing
FreeBSD: run jtop (or jt > 7.x) and see if there are any runaway processes. Here are some examples of entries in top that are definitely runaway processes:
79481 root 64 0 2256K 1056K CPU1 1 58:16 87.40% 87.40% nano
50650 1000 64 0 1852K 1112K RUN 0 207.9H 84.08% 84.08% screen
14829 www 2 0 39100K 31736K accept 0 104:24 46.54% 6.54% httpd
42065 root 61 0 1300K 844K RUN 1 47.8H 91.36% 91.36% ee
1328 www 56 0 18440K 10796K CPU1 0 64.4H 97.71% 97.71% httpd
26251 user 57 0 6124K 1160K CPU1 1 82.9H 98.44% 98.44% screen
89874 root 60 0 1352K 892K RUN 1 33.8H 65.82% 65.82% dialog
38656 1000 64 0 3088K 2136K CPU0 0 806:13 97.95% 97.95% StutBot
27630 root 64 0 1396K 972K RUN 1 76.8H 86.47% 86.47% ee
Linux: run vwe to see which VPSs have high loads. From there run vp <veid> and/or vt <veid> to see what's going on in that system. vzstat will also give you a nice picture of what's going on; systems with high numbers in the mlat column are likely culprits.
Examples of out-of-control processes:
12183 nobody 16 0 4916 1348 1340 R 45.5 0.0 4249m httpd
29266 #502 16 0 1852 796 792 R 22.5 0.0 1104m vim
23860 #41 16 0 5472 5472 2076 R 98.9 0.2 31:41 python
19227 bin 19 0 1688 716 652 R 99.9 0.0 321:08 wtrs_ui
7762 apache 16 0 268 236 224 R 85.7 0.0 1010m ptrace
4624 #501 20 0 4304 2400 2044 R 53.6 0.1 284:32 YoSucker
20451 #506 20 0 1876 820 816 R 17.2 0.0 169:35 vim
8834 #514 20 0 900 724 672 R 77.6 0.0 382:30 neostats
31815 apache 14 0 3176 3176 1696 R 74.4 0.1 6:15 counter
Just kill -9 them and be done with it.
Also, anytime you see `kmod` or `ptrace`, kill those immediately no matter how much CPU they are using. They are attempts to exploit the Linux ptrace bug. They won't work, but they suck a lot of CPU.
Also, any other processes that are at 90-100% CPU usage and have been running for a long period of time should be killed, except for mysqld processes on FreeBSD (see below).
However, there is an exception:
If it is a mysqld, we don't want to kill the customer's database. Instead, run jpid <pid> to see who owns it, and then email them the paste containing the instructions for the nanny. Or you can simply do a kill -1 <pid> on the process to restart it.
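The triage rules above can be sketched as a small filter. This is only an illustration: the function name flag_runaway and the action labels are made up here, it assumes the process lines have the same column layout as the examples above (CPU in column 9 for the Linux examples, column 10 for the FreeBSD ones, command last), and it does not check how long a process has been running, so a human still needs to look at the TIME column before killing anything.

```shell
# flag_runaway CPU_COLUMN [THRESHOLD]
# Hypothetical helper: reads top-style process lines on stdin and prints
# "PID COMMAND ACTION" for entries that match the rules in this section.
#   kill-now      : kmod/ptrace, regardless of CPU usage
#   contact-owner : mysqld at or above the threshold (do not kill)
#   review        : anything else at or above the threshold
flag_runaway() {
    awk -v col="$1" -v min="${2:-90}" '
        $1 ~ /^[0-9]+$/ && NF >= col {
            cmd = $NF
            if (cmd == "kmod" || cmd == "ptrace") {
                print $1, cmd, "kill-now"
                next
            }
            # "97.71%" + 0 evaluates to 97.71, so a trailing % is harmless
            if ($col + 0 < min) {
                next
            }
            if (cmd == "mysqld") {
                print $1, cmd, "contact-owner"
            } else {
                print $1, cmd, "review"
            }
        }'
}
```

For example, pasting the Linux lines above into a file (the name procs.txt is arbitrary) and running `flag_runaway 9 90 < procs.txt` lists wtrs_ui and python for review and ptrace for immediate killing.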
Load averages jump at night
The load averages on the FreeBSD systems may jump at night between 1 and 4 am because the backups are running. If this is what is causing the jump in load, you will see processes like `rsync` in top eating a lot of CPU time.
check backups
Go to mgmt -> Monitoring -> Backups and make sure every machine was backed up the previous night. Also look at df on backup1 and backup2 to make sure no disk is approaching full, though bb should warn us in advance. Please note: errors encountered when a backup script runs on any system will generate an email to support@johncompanies.com, so you will know the next day if a directory to be backed up has been moved or no longer exists. A paste exists for notifying the customer of a nonexistent file or directory.
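The df check on the backup servers can be reduced to a one-line filter. A minimal sketch, assuming POSIX-format output (df -P, where column 5 is the capacity and column 6 the mount point) and mount points without spaces; the function name disks_over and the default 90% threshold are assumptions, not something the backup servers already have.

```shell
# disks_over [PERCENT]
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# point and capacity of every filesystem at or above PERCENT (default 90).
disks_over() {
    awk -v max="${1:-90}" 'NR > 1 && $5 + 0 >= max { print $6, $5 }'
}
```

Run on backup1 or backup2 as `df -P | disks_over 90`; no output means no disk is at or above the threshold.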
check bb for warnings
mgmt -> BigBrother
Some events don't generate pages (on purpose). You will only see them by going to the bb main page.
Monthly Tasks
rotate pine sent mail (1st of month)
On the 1st of the month, before any emails are sent out, quit out of pine, then log back in. Sent mail from last month will be archived. If you mess up and do it on the 3rd (for example), you can go into the previous month's saved email and save emails from the current month into the sent-mail (current month) mailbox.
b/w caps
On the 1st: remove any bwcaps put into the firewall (this only really applies if a bwcap was added because someone went over on b/w):
ipfw list|grep pipe
ipfw del [each rule listed]
NOTE: this cronjob on newgateway will do some of that for you, provided you used one of the following pipe #s:
0 0 1 * * /sbin/ipfw del 3 4 5 17331
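The two manual steps above (list the pipe rules, then delete each one) can be sketched as follows. This assumes the rule number is the first field of each line of ipfw list output; the function name pipe_rule_numbers is made up for illustration, and the example deliberately only echoes the delete commands so nothing is removed until you have reviewed the list.

```shell
# pipe_rule_numbers
# Hypothetical helper: reads `ipfw list` output on stdin and prints the rule
# number (first field) of every line that mentions "pipe", mirroring the
# `grep pipe` step above.
pipe_rule_numbers() {
    awk '/pipe/ { print $1 }'
}
```

A dry run would be `ipfw list | pipe_rule_numbers | xargs -n 1 echo ipfw del`, which prints one ipfw del command per matching rule instead of executing it.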
We really don’t do this anymore since we have centralized traffic accounting with netflow, but for posterity:
Make sure all machines reset counters to 0 after midnight on the 1st
Make sure they dumped a counter
On each jail run:
trafficgather.pl
And on each virt:
linuxtrafficgather.pl
Monthly RAID checks
Every month we check the health of, and verify the parity on, all our RAID-based systems. To facilitate this, we've created a simple script to start the process:
sh /root/verify.sh
Adaptec controllers
Here's some sample output:
mail /usr/local/www/scripts# sh /root/verify.sh
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> container list /f
Executing: container list /full=TRUE
Num Total Oth Chunk Scsi Partition Creation System
Label Type Size Ctr Size Usage B:ID:L Offset:Size State RO Lk Task Done% Ent Date Time Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ --- ------ -------- ------
0 Mirror 33.9GB Open 0:01:0 64.0KB:33.9GB Normal 0 071002 05:39:32
/dev/aacd0 mirror0 0:00:0 64.0KB:33.9GB Normal 1 071002 05:39:32
1 Mirror 33.9GB Open 0:02:0 64.0KB:33.9GB Normal 0 071002 05:39:50
/dev/aacd1 mirror1 0:03:0 64.0KB:33.9GB Normal 1 071002 05:39:50
AAC0> disk list /f
Executing: disk list /full=TRUE
B:ID:L Device Type Removable media Vendor-ID Product-ID Rev Blocks Bytes/Block Usage Shared Rate
------ -------------- --------------- --------- ---------------- ----- --------- -------- --- ---------------- ------ ----
0:00:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512 Initialized NO 160
0:01:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512 Initialized NO 160
0:02:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512 Initialized NO 160
0:03:0 Disk N FUJITSU MAJ3364MC 3702 71390320 512 Initialized NO 160
AAC0> disk show smart
Executing: disk show smart
Smart Method of Enable
Capable Informational Exception Performance Error
B:ID:L Device Exceptions(MRIE) Control Enabled Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y 6 Y N 0
0:01:0 Y 6 Y N 0
0:02:0 Y 6 Y N 0
0:03:0 Y 6 Y N 0
0:06:0 N
AAC0> task list
Executing: task list
Controller Tasks
TaskId Function Done% Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
No tasks currently running on controller
AAC0> dia sh hi
Executing: diagnostic show history
No switches specified, defaulting to "/current".
*** HISTORY BUFFER FROM CURRENT CONTROLLER RUN ***
[00]: GetDiskLogEntry: container - 1, entry return 0
[01]: Container 1 started SCRUB task
[02]: Starting Mirror:1 scrub
[03]: Master disk: 2, start sector: 128, sector count = 71286784
[04]: Slave disk: 3, start sector: 128, sector count = 71286784
[05]: UpdateDiskLogIndex - Set - container 0, index 1
[06]: GetDiskLogEntry: container - 0, entry return 1
[07]: Container 0 started SCRUB task
[08]: Starting Mirror:0 scrub
[09]: Master disk: 1, start sector: 128, sector count = 71286784
[10]: Slave disk: 0, start sector: 128, sector count = 71286784
[11]: Mirror Scrub Container:1 ErrorsFound:0
[12]: Clear disk log: sector - 80, driveno 2
[13]: Clear disk log: sector - 80, driveno 3
[14]: Container 1 completed SCRUB task:
[15]: Mirror Scrub Container:0 ErrorsFound:0
[16]: Clear disk log: sector - 81, driveno 1
[17]: Clear disk log: sector - 81, driveno 0
[18]: Container 0 completed SCRUB task:
[19]: UpdateDiskLogIndex - Set - container 0, index 0
[20]: GetDiskLogEntry: container - 0, entry return 0
[21]: Container 0 started SCRUB task
[22]: Starting Mirror:0 scrub
[23]: Master disk: 1, start sector: 128, sector count = 71286784
[24]: Slave disk: 0, start sector: 128, sector count = 71286784
[25]: UpdateDiskLogIndex - Set - container 1, index 1
[26]: GetDiskLogEntry: container - 1, entry return 1
[27]: Container 1 started SCRUB task
[28]: Starting Mirror:1 scrub
[29]: Master disk: 2, start sector: 128, sector count = 71286784
[30]: Slave disk: 3, start sector: 128, sector count = 71286784
[31]: Mirror Scrub Container:1 ErrorsFound:0
[32]: Clear disk log: sector - 81, driveno 2
[33]: Clear disk log: sector - 81, driveno 3
[34]: Container 1 completed SCRUB task:
[35]: Mirror Scrub Container:0 ErrorsFound:0
[36]: Clear disk log: sector - 80, driveno 1
[37]: Clear disk log: sector - 80, driveno 0
[38]: Container 0 completed SCRUB task:
[39]: UpdateDiskLogIndex - Set - container 0, index 0
[40]: GetDiskLogEntry: container - 0, entry return 0
[41]: Container 0 started SCRUB task
[42]: Starting Mirror:0 scrub
[43]: Master disk: 1, start sector: 128, sector count = 71286784
[44]: Slave disk: 0, start sector: 128, sector count = 71286784
[45]: UpdateDiskLogIndex - Set - container 1, index 1
[46]: GetDiskLogEntry: container - 1, entry return 1
[47]: Container 1 started SCRUB task
[48]: Starting Mirror:1 scrub
[49]: Master disk: 2, start sector: 128, sector count = 71286784
[50]: Slave disk: 3, start sector: 128, sector count = 71286784
[51]: Mirror Scrub Container:1 ErrorsFound:0
[52]: Clear disk log: sector - 81, driveno 2
[53]: Clear disk log: sector - 81, driveno 3
[54]: Container 1 completed SCRUB task:
[55]: Mirror Scrub Container:0 ErrorsFound:0
[56]: Clear disk log: sector - 80, driveno 1
[57]: Clear disk log: sector - 80, driveno 0
[58]: Container 0 completed SCRUB task:
[59]: UpdateDiskLogIndex - Set - container 0, index 0
[60]: GetDiskLogEntry: container - 0, entry return 0
[61]: Container 0 started SCRUB task
[62]: Starting Mirror:0 scrub
[63]: Master disk: 1, start sector: 128, sector count = 71286784
[64]: Slave disk: 0, start sector: 128, sector count = 71286784
[65]: UpdateDiskLogIndex - Set - container 1, index 1
[66]: GetDiskLogEntry: container - 1, entry return 1
[67]: Container 1 started SCRUB task
[68]: Starting Mirror:1 scrub
[69]: Master disk: 2, start sector: 128, sector count = 71286784
[70]: Slave disk: 3, start sector: 128, sector count = 71286784
[71]: Mirror Scrub Container:1 ErrorsFound:0
[72]: Clear disk log: sector - 81, driveno 2
[73]: Clear disk log: sector - 81, driveno 3
[74]: Container 1 completed SCRUB task:
[75]: Mirror Scrub Container:0 ErrorsFound:0
[76]: Clear disk log: sector - 80, driveno 1
[77]: Clear disk log: sector - 80, driveno 0
[78]: Container 0 completed SCRUB task:
[79]: UpdateDiskLogIndex - Set - container 0, index 0
[80]: GetDiskLogEntry: container - 0, entry return 0
[81]: Container 0 started SCRUB task
[82]: Starting Mirror:0 scrub
[83]: Master disk: 1, start sector: 128, sector count = 71286784
[84]: Slave disk: 0, start sector: 128, sector count = 71286784
[85]: UpdateDiskLogIndex - Set - container 1, index 1
[86]: GetDiskLogEntry: container - 1, entry return 1
[87]: Container 1 started SCRUB task
[88]: Starting Mirror:1 scrub
[89]: Master disk: 2, start sector: 128, sector count = 71286784
[90]: Slave disk: 3, start sector: 128, sector count = 71286784
[91]: Mirror Scrub Container:1 ErrorsFound:0
[92]: Clear disk log: sector - 81, driveno 2
[93]: Clear disk log: sector - 81, driveno 3
[94]: Container 1 completed SCRUB task:
[95]: Mirror Scrub Container:0 ErrorsFound:0
[96]: Clear disk log: sector - 80, driveno 1
[97]: Clear disk log: sector - 80, driveno 0
[98]: Container 0 completed SCRUB task:
[99]:
========================
History Output Complete.
AAC0>
AAC0> exit
Executing: exit
press enter when ready to run verify
<INS>
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> contai scr 0
Executing: container scrub 0
AAC0> contai scr 1
Executing: container scrub 1
AAC0> exit
Executing: exit
when done run: aaccli open aac0 dia sh hi c
Nov 1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov 1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task
Here's an analysis of what we're seeing and what we're looking for:
AAC0> container list /f
Executing: container list /full=TRUE
Num Total Oth Chunk Scsi Partition Creation System
Label Type Size Ctr Size Usage B:ID:L Offset:Size State RO Lk Task Done% Ent Date Time Files
----- ------ ------ --- ------ ------- ------ ------------- ------- -- -- ------- ------ --- ------ -------- ------
0 Mirror 33.9GB Open 0:01:0 64.0KB:33.9GB Normal 0 071002 05:39:32
/dev/aacd0 mirror0 0:00:0 64.0KB:33.9GB Normal 1 071002 05:39:32
1 Mirror 33.9GB Open 0:02:0 64.0KB:33.9GB Normal 0 071002 05:39:50
/dev/aacd1 mirror1 0:03:0 64.0KB:33.9GB Normal 1 071002 05:39:50
This shows you the health of the arrays. You're looking for Normal under the State column, and the absence of a ! in the Offset:Size column. Sometimes you'll see this:
64.0KB!33.9GB
That indicates a problem.
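The two things to look for in the container list (a state other than Normal, or a ! in place of the colon in Offset:Size) can be checked mechanically. A rough sketch, assuming the State value is the field immediately after Offset:Size as in the sample above; the function name check_containers is hypothetical and this is not part of verify.sh.

```shell
# check_containers
# Hypothetical helper: reads `container list /f` output on stdin, prints any
# row whose Offset:Size field contains "!" or whose following State field is
# not "Normal", and returns 1 if at least one such row was found.
check_containers() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # Offset:Size looks like 64.0KB:33.9GB (or 64.0KB!33.9GB)
                if ($i ~ /^[0-9.]+[KMG]B[:!][0-9.]+[KMGT]B$/) {
                    if (index($i, "!") > 0 || $(i + 1) != "Normal") {
                        print "PROBLEM: " $0
                        bad = 1
                    }
                    break
                }
            }
        }
        END { exit (bad ? 1 : 0) }'
}
```

Header and separator rows contain no Offset:Size value, so they are skipped; a healthy listing produces no output and an exit status of 0.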
AAC0> disk show smart
Executing: disk show smart
Smart Method of Enable
Capable Informational Exception Performance Error
B:ID:L Device Exceptions(MRIE) Control Enabled Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y 6 Y N 0
0:01:0 Y 6 Y N 0
0:02:0 Y 6 Y N 0
0:03:0 Y 6 Y N 0
0:06:0 N
This shows the SMART report. Look for nonzero values in the Error Count column.
AAC0> task list
Executing: task list
Controller Tasks
TaskId Function Done% Container State Specific1 Specific2
------ -------- ------- --------- ----- --------- ---------
No tasks currently running on controller
Look for the absence of running tasks. A bad sign would be a rebuild or verify running when you didn't initiate it.
With the history output, you're looking for any anomalies or events since the last time a verify was run. If you see a drive with lots of problems, you may want to take backups before allowing the verify to run since it could replicate errors onto the good drive.
After you see the history output, it will prompt you to press enter to run the verify. If you're happy with all the output you're seeing (mirror is healthy, history looks good), it's safe to proceed. Otherwise, ^C to exit. After you hit enter, it will start the verify and start to tail the messages log file (so you can easily see when the verify is complete). Here's what that'll look like:
Nov 1 14:38:08 mail /kernel: aac0: **Monitor** Container 1 completed SCRUB task:
Nov 1 14:46:45 mail /kernel: aac0: **Monitor** Container 0 completed SCRUB task:
So, putting it all together, after hitting enter to start the verify, you'll see:
---------------------------------------------------------------------------------------------
Adaptec SCSI RAID Controller Command Line Interface
Copyright 1998-2002 Adaptec, Inc. All rights reserved
---------------------------------------------------------------------------------------------
CLI > open aac0
Executing: open "aac0"
AAC0> contai scr 0
Executing: container scrub 0
AAC0> contai scr 1
Executing: container scrub 1
AAC0> exit
Executing: exit
when done run: aaccli open aac0 dia sh hi c
Nov 1 10:32:46 mail /kernel: aac0: **Monitor** Container 0 started SCRUB task
Nov 1 10:32:47 mail /kernel: aac0: **Monitor** Container 1 started SCRUB task
If the server has multiple logical drives, the scrubs run in parallel. When the scrubs (verifies) are complete, exit the tail of the log file (^C) and run:
aaccli open aac0 dia sh hi c
This will show you the diagnostic history. You're looking for the results of the most recent scrub:
[100]: Mirror Scrub Container:1 ErrorsFound:0
[101]: Clear disk log: sector - 81, driveno 2
[102]: Clear disk log: sector - 81, driveno 3
[103]: Container 1 completed SCRUB task:
[104]: Mirror Scrub Container:0 ErrorsFound:0
[105]: Clear disk log: sector - 80, driveno 1
[106]: Clear disk log: sector - 80, driveno 0
[107]: Container 0 completed SCRUB task:
^C to exit the RAID CLI.
If you see:
[104]: Mirror Scrub Container:0 ErrorsFound:5
You'll want to rerun the verify on that drive until it shows 0, or perhaps replace the drive. You should be able to see from the output which drive had the problem.
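Because the history buffer holds several past scrubs, only the last ErrorsFound entry for each container matters. The following sketch picks those out; it assumes the history lines look exactly like the samples above, and the function name scrub_errors is invented for this example.

```shell
# scrub_errors
# Hypothetical helper: reads diagnostic history lines on stdin, remembers the
# most recent "Mirror Scrub Container:N ErrorsFound:M" entry per container,
# and prints the containers whose latest scrub found errors.
scrub_errors() {
    awk '
        /Scrub Container:[0-9]+ ErrorsFound:[0-9]+/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Container:[0-9]+$/) {
                    split($i, a, ":")
                    c = a[2]
                }
                if ($i ~ /^ErrorsFound:[0-9]+$/) {
                    split($i, b, ":")
                    last[c] = b[2]   # later entries overwrite earlier ones
                }
            }
        }
        END {
            for (c in last) {
                if (last[c] + 0 > 0) {
                    print "container " c ": " last[c] " errors"
                }
            }
        }' | sort
}
```

Save the history output to a file (any name) and run `scrub_errors < that_file`; no output means the most recent scrub of every container reported ErrorsFound:0.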
Depending on the size and how busy the drive is, the verify can take anywhere from an hour to the better part of a day.
You will notice that the diagnostic history is not shown on our modern Adaptec cards (i.e. any Adaptec card not in a Dell 2450). The reason is that the history is never cleared, so there's simply too much data to show and it just crashes the CLI. So don't bother trying to see it. This does make it hard to tell whether problems are occurring, so you just need to watch the scrub to see that it reaches 100%. You will also notice that on some servers there's no tail of messages. Again, this is because no data about the completion of the scrub is shown there. The thing to do here is to go into the CLI and keep running task list to monitor scrub progress.
See Adaptec RAID CLI Reference for more details on how to use the CLI
DELL (LSI-based) SAS controllers
Here's what the output looks like when running verify.sh on an LSI-based card:
jail2 /mnt/data2# sh /root/verify.sh

Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018396142
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: FUJITSU MAX3147RC D207DQ03P7A0DESN
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018395db2
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: FUJITSU MAX3147RC D207DQ03P7A0DERV
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50006eece89
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: SEAGATE ST3300555SS T2113LM4BFBZ
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50006eee035
SAS Address(1): 0x0
Connected Port Number: 3(path0)
Inquiry Data: SEAGATE ST3300555SS T2113LM4BGF7
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 4
Device Id: 4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50004bd7ea5
SAS Address(1): 0x0
Connected Port Number: 4(path0)
Inquiry Data: SEAGATE ST3300656SS HS093QP0G8SW
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Enclosure Device ID: 32
Slot Number: 5
Device Id: 5
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 286102MB [0x22ecb25c Sectors]
Non Coerced Size: 285590MB [0x22dcb25c Sectors]
Coerced Size: 285568MB [0x22dc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e01f1c4112
SAS Address(1): 0x0
Connected Port Number: 5(path0)
Inquiry Data: FUJITSU MBA3300RC D306BJ15P9201W06
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown

Exit Code: 0x00

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:139392MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:MIRROR1
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 2 (Target Id: 2)
Name:MIRROR2
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default

Exit Code: 0x00
Battery FRU : N/A
Battery Warning : Enabled
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0
BBU : Present
BBU : Yes
Cache When BBU Bad : Disabled
press enter when ready to run verify
Before pressing enter, here's what we're looking for:
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 140014MB [0x11177328 Sectors]
Non Coerced Size: 139502MB [0x11077328 Sectors]
Coerced Size: 139392MB [0x11040000 Sectors]
Firmware state: Online
SAS Address(0): 0x500000e018396142
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: FUJITSU MAX3147RC D207DQ03P7A0DESN
Foreign State: None
Media Type: Hard Disk Device
Device Speed: Unknown
Link Speed: Unknown
This is the output shown for each physical drive in the system. We're looking to confirm its Firmware state is Online, and that Media Error Count, Other Error Count, and Predictive Failure Count are all zero (or near zero).
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:MIRROR1
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:285568MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
This is the output for each logical drive. We're looking for State Optimal. Also confirm Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Exit Code: 0x00
Battery FRU : N/A
Battery Warning : Enabled
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0
BBU : Present
BBU : Yes
Cache When BBU Bad : Disabled
Confirm that the battery is present and error-free.
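The drive and array checks described above can be scripted against the same output. A minimal sketch, assuming the output has one "Key: value" pair per line as the tool prints it; the function name check_lsi is hypothetical, it flags any nonzero counter (stricter than the "near zero" tolerance mentioned above), and it does not look at the battery lines.

```shell
# check_lsi
# Hypothetical helper: reads the physical/virtual drive report on stdin and
# prints every counter, firmware state, or array state that is not healthy,
# prefixed with the slot or virtual disk it belongs to. Returns 1 if any
# problem line was printed.
check_lsi() {
    awk '
        /^Slot Number:/  { where = "slot " $3 }
        /^Virtual Disk:/ { where = "virtual disk " $3 }
        /^(Media Error|Other Error|Predictive Failure) Count:/ {
            if ($NF + 0 > 0) { print where ": " $0; bad = 1 }
        }
        /^Firmware state:/ {
            if ($3 != "Online") { print where ": " $0; bad = 1 }
        }
        # Anchored at line start, so "Foreign State: None" is not matched
        /^State:/ {
            if ($2 != "Optimal") { print where ": " $0; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }'
}
```

With the report saved to a file, `check_lsi < report.txt && echo healthy` (the file name is arbitrary) prints healthy only when every drive is Online with zero counters and every virtual disk is Optimal.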
If all that checks out, you're ready to proceed with the verify. After pressing enter, the verify is started and here's what you see:
Start Check Consistency on Virtual Drive 0 (target id: 0) Success.
Exit Code: 0x00
Start Check Consistency on Virtual Drive 1 (target id: 1) Success.
Exit Code: 0x00
Start Check Consistency on Virtual Drive 2 (target id: 2) Success.
Exit Code: 0x00
Check Consistency Progress of Virtual Drives...
Virtual Drive # Percent Complete Time Elps
0 ░░░░░░░░░░░░░░░░░░░░░░░00 %░░░░░░░░░░░░░░░░░░░░░░░ 00:00:03
1 ░░░░░░░░░░░░░░░░░░░░░░░00 %░░░░░░░░░░░░░░░░░░░░░░░ 00:00:02
2 ░░░░░░░░░░░░░░░░░░░░░░░00 %░░░░░░░░░░░░░░░░░░░░░░░ 00:00:01
Press <ESC> key to quit...
The progress for each drive is displayed until all drives have completed the verify. We just want to make sure that each drive goes to completion. No followup is needed...though there probably is a log or history where we can get more info.
You will notice that jail7 does not run a verify. That's on purpose: the last time we tried this, it crashed the system. So, on jail7 this must be run from the BIOS (take the system offline for a couple of hours).
See LSI RAID CLI Reference for more details on how to use the CLI
LSI-based controllers (megaraid)
There is a CLI for this; however, it's easier to do this with a curses GUI app: megamgr
Currently only on these servers: virt15, virt16, and firewall2
To run:
# cd /usr/local/sbin/; megamgr
Main menu:
┌──Management Menu──┐
│ Configure         │
│ Initialize        │
│ Objects           │
│ Rebuild           │
│ Check Consistency │
│ Advanced Menu     │
└───────────────────┘
Before you check consistency, make sure the arrays are healthy.
Objects -> Physical Drive
Then look to make sure they're all ONLIN
[megamgr screen: Objects - PHYSICAL DRIVE SELECTION MENU]
Channel-1
ID 0: ONLIN A01-01
ID 1: ONLIN A01-02
ID 2: ONLIN A02-01
ID 3: ONLIN A02-02
ID 4: ONLIN A03-01
ID 5: ONLIN A03-02
ID 6: (empty)
Status line: Ch-1 ID-5 DISK 140013MB SEAGATE ST3146707LC 0003
Once that's done, hit escape once, then the back arrow to move back to the Objects menu.
So you select Objects -> Logical Drive -> Logical Drive 1 -> Check Consistency -> YES
[megamgr screen: Management Menu -> Objects (Adapter, Logical Drive, Physical Drive, Channel) -> Logical Drives(02)]
Logical Drives(02): Logical Drive 1, Logical Drive 2
Logical Drive 1 submenu (partly hidden by the dialog): Initialize, Check Consiste, View/Update Pa
Dialog "Check Consistency-1 ?": YES / NO
Prompt: Select YES Or NO
Then watch the progress. When done, escape back to Logical Drive then repeat for Logical Drive 2. If you ^C or accidentally escape out, you can come back in running the same commands and watch the progress again (it won't restart).
You can exit megamgr by escaping out or ^C
3ware
We are using 3ware controllers on backup1 & backup2. Running the verify script will give you different output based on the type of controller:
backup2 /d2# sh /root/verify.sh
Controller: c0
-------------
Driver: 1.50.01.002
Model: 7500-8
FW: FE7X 1.05.00.068
BIOS: BE7X 1.08.00.048
Monitor: ME7X 1.01.00.040
Serial #: F11605A3180172
PCB: Rev3
PCHIP: 1.30-33
ACHIP: 3.20

# of units: 3
Unit 0: JBOD 186.31 GB ( 390721968 blocks): OK
Unit 1: RAID 5 465.77 GB ( 976790016 blocks): DEGRADED
Unit 5: RAID 5 698.65 GB ( 1465185024 blocks): DEGRADED

# of ports: 8
Port 0: WDC WD2000JB-00KFA0 WD-WCAMT1451690 186.31 GB (390721968 blocks): OK(unit 0)
Port 1: WDC WD2500JB-00GVC0 WD-WCAL78219488 232.88 GB (488397168 blocks): OK(unit 1)
Port 2: WDC WD2000 0.00 MB (0 blocks): OK(NO UNIT)
Port 3: WDC WD2500JB-00GVC0 WD-WMAL73882417 232.88 GB (488397168 blocks): OK(unit 1)
Port 4: WDC WD2000 0.00 MB (0 blocks): OK(NO UNIT)
Port 5: WDC WD2500JB-00GVA0 WD-WMAL71338097 232.88 GB (488397168 blocks): OK(unit 5)
Port 6: WDC WD2500JB-32EVA0 WD-WMAEH1301595 232.88 GB (488397168 blocks): OK(unit 5)
Port 7: WDC WD2500JB-00GVC0 WD-WCAL78165566 232.88 GB (488397168 blocks): OK(unit 5)

Controller: c1
-------------
Driver: 1.50.01.002
Model: 7500-8
FW: FE7X 1.05.00.068
BIOS: BE7X 1.08.00.048
Monitor: ME7X 1.01.00.040
Serial #: F11605A3180167
PCB: Rev3
PCHIP: 1.30-33
ACHIP: 3.20

# of units: 2
Unit 0: RAID 5 698.65 GB ( 1465185024 blocks): OK
Unit 4: RAID 5 698.65 GB ( 1465185024 blocks): OK

# of ports: 8
Port 0: WDC WD2500JB-00GVA0 WD-WMAL71301258 232.88 GB (488397168 blocks): OK(unit 0)
Port 1: WDC WD2500JB-00GVA0 WD-WMAL71322705 232.88 GB (488397168 blocks): OK(unit 0)
Port 2: WDC WD2500JB-00GVA0 WD-WMAL71945050 232.88 GB (488397168 blocks): OK(unit 0)
Port 3: WDC WD2500JB-00GVA0 WD-WMAL71316201 232.88 GB (488397168 blocks): OK(unit 0)
Port 4: WDC WD2500JB-00GVC0 WD-WCAL78323749 232.88 GB (488397168 blocks): OK(unit 4)
Port 5: WDC WD3200AAJB-00J3A0 WD-WCAV2V689068 298.09 GB (625142448 blocks): OK(unit 4)
Port 6: WDC WD2500JB-00GVC0 WD-WCAL78234420 232.88 GB (488397168 blocks): OK(unit 4)
Port 7: WDC WD2500JB-00GVC0 WD-WCAL78592213 232.88 GB (488397168 blocks): OK(unit 4)
backup2 /d2#
On backup2, look for every unit and port reporting OK; no verify is run on this controller.
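Scanning that listing for anything that is not OK is easy to script. A rough sketch, assuming the status is whatever follows the final "): " on each Unit or Port line, as in the backup2 output above; the function name check_3ware is made up for this example.

```shell
# check_3ware
# Hypothetical helper: reads the 7500-series controller report on stdin and
# prints every Unit/Port line whose status does not start with "OK".
# Returns 1 if any such line was found.
check_3ware() {
    awk '
        /^ *(Unit|Port) [0-9]+:/ {
            status = $0
            # Strip everything up to and including the last "): "
            sub(/^.*\): */, "", status)
            if (status !~ /^OK/) {
                print
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }'
}
```

Fed the backup2 output shown above, this would print the two DEGRADED units (Unit 1 and Unit 5 on c0) and nothing else, since statuses such as OK(unit 1) and OK(NO UNIT) both start with OK.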
[root@backup3 ~]# sh /root/verify.sh /c2 Driver Version = 1.26.02.002 /c2 Model = 8006-2LP /c2 Available Memory = 512KB /c2 Firmware Version = FE8S 1.05.00.068 /c2 Bios Version = BE7X 1.08.00.048 /c2 Boot Loader Version = ME7X 1.01.00.040 /c2 Serial Number = L018501C6481395 /c2 PCB Version = Rev5 /c2 PCHIP Version = 1.30-66 /c2 ACHIP Version = 3.20 /c2 Total Optimal Units = 1 /c2 Not Optimal Units = 0 Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy ------------------------------------------------------------------------------ u1 RAID-1 OK - - - 931.512 ON - Port Status Unit Size Blocks Serial --------------------------------------------------------------- p0 OK u1 931.51 GB 1953525168 WD-WMAW31148820 p1 OK u1 931.51 GB 1953525168 WD-WCATR0277515 Ctl Date Severity Alarm Message ------------------------------------------------------------------------------ Sending start verify message to /c2/u1 ... Done. when done run: tw_cli /c2 show alarms [root@backup3 ~]#
On backup3 the script starts the verify automatically. Once it has finished, run tw_cli /c2 show alarms, as the output instructs, to see the results of the verify.
[root@backup1 /data/deprecated]# sh /root/verify.sh /c0 Driver Version = 2.26.02.010 /c0 Model = 9650SE-8LPML /c0 Available Memory = 224MB /c0 Firmware Version = FE9X 4.06.00.004 /c0 Bios Version = BE9X 4.05.00.015 /c0 Boot Loader Version = BL9X 3.08.00.001 /c0 Serial Number = L326025A8270177 /c0 PCB Version = Rev 032 /c0 PCHIP Version = 2.00 /c0 ACHIP Version = 1.90 /c0 Number of Ports = 8 /c0 Number of Drives = 6 /c0 Number of Units = 1 /c0 Total Optimal Units = 1 /c0 Not Optimal Units = 0 /c0 JBOD Export Policy = off /c0 Disk Spinup Policy = 1 /c0 Spinup Stagger Time Policy (sec) = 1 /c0 Auto-Carving Policy = off /c0 Auto-Carving Size = 2048 GB /c0 Auto-Rebuild Policy = on /c0 Controller Bus Type = PCIe /c0 Controller Bus Width = 1 lane /c0 Controller Bus Speed = 2.5 Gbps/lane Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy ------------------------------------------------------------------------------ u0 RAID-5 OK - - 64K 4656.56 ON ON Port Status Unit Size Blocks Serial --------------------------------------------------------------- p0 OK u0 931.51 GB 1953525168 9QJ1Y017 p1 DEVICE-ERROR u0 931.51 GB 1953525168 9QJ1ZN07 p2 OK u0 931.51 GB 1953525168 9QJ2XK1R p3 OK u0 931.51 GB 1953525168 9QJ2010B p4 OK u0 1.36 TB 2930277168 6XW0L36T p5 OK u0 931.51 GB 1953525168 WD-WMATV2444836 p6 NOT-PRESENT - - - - p7 NOT-PRESENT - - - - Ctl Date Severity Alarm Message ------------------------------------------------------------------------------ c0 [Sat May 12 11:27:15 2012] WARNING Sector repair completed: port=0, LBA=0x6AE571C c0 [Sat May 12 19:16:21 2012] WARNING Sector repair completed: port=1, LBA=0x40E62A23 c0 [Sat May 12 21:40:56 2012] INFO Verify completed: unit=0 c0 [Mon May 14 00:53:53 2012] WARNING Sector repair completed: port=1, LBA=0x7B8CFA7 c0 [Mon May 14 00:58:21 2012] WARNING Sector repair completed: port=1, LBA=0x7B8CFAA c0 [Mon May 14 04:35:13 2012] WARNING Sector repair completed: port=0, LBA=0x8FEF2CF c0 [Mon May 14 04:38:22 2012] WARNING 
Sector repair completed: port=0, LBA=0x8FEF2D1 c0 [Tue May 15 22:53:46 2012] WARNING Sector repair completed: port=0, LBA=0x13C2622 c0 [Wed May 16 00:39:31 2012] WARNING Sector repair completed: port=0, LBA=0x365A67F c0 [Wed May 16 00:39:37 2012] WARNING Sector repair completed: port=0, LBA=0x365A685 c0 [Wed May 16 00:47:18 2012] WARNING Sector repair completed: port=0, LBA=0x365A687 c0 [Sat May 19 00:01:44 2012] INFO Verify started: unit=0 c0 [Sat May 19 04:46:20 2012] WARNING Sector repair completed: port=0, LBA=0x365A68E c0 [Sat May 19 13:37:06 2012] WARNING Sector repair completed: port=1, LBA=0x7B8CFAC c0 [Sat May 19 13:37:28 2012] WARNING Sector repair completed: port=1, LBA=0x7B8CFAE c0 [Sat May 19 13:37:47 2012] WARNING Sector repair completed: port=1, LBA=0x7B8CFB1 c0 [Sat May 19 13:38:00 2012] WARNING Sector repair completed: port=1, LBA=0x7B8CFB3 c0 [Sat May 19 21:47:45 2012] INFO Verify completed: unit=0 c0 [Wed May 23 12:21:41 2012] INFO Cache synchronization completed: unit=0 c0 [Fri May 25 00:08:19 2012] WARNING Sector repair completed: port=0, LBA=0x12DA76C c0 [Fri May 25 00:08:34 2012] WARNING Sector repair completed: port=0, LBA=0x12E4901 c0 [Fri May 25 00:09:33 2012] WARNING Sector repair completed: port=0, LBA=0x12DA773 c0 [Fri May 25 00:39:12 2012] WARNING Sector repair completed: port=0, LBA=0x42C597B c0 [Sat May 26 00:01:45 2012] INFO Verify started: unit=0 c0 [Sat May 26 00:42:05 2012] WARNING Sector repair completed: port=1, LBA=0x323C1AC c0 [Sat May 26 00:51:43 2012] WARNING Sector repair completed: port=1, LBA=0x323C1AE c0 [Sat May 26 01:54:21 2012] WARNING Sector repair completed: port=1, LBA=0x2F0D302 c0 [Sat May 26 02:06:38 2012] WARNING Sector repair completed: port=0, LBA=0x12DA777 c0 [Sat May 26 02:07:21 2012] WARNING Sector repair completed: port=0, LBA=0x12E48FE c0 [Sat May 26 04:20:00 2012] WARNING Sector repair completed: port=1, LBA=0x2F0D306 c0 [Sat May 26 04:32:58 2012] WARNING Sector repair completed: port=1, LBA=0x323C1B1 
c0 [Sat May 26 04:33:21 2012] WARNING Sector repair completed: port=1, LBA=0x323C1B3 c0 [Sat May 26 04:33:44 2012] WARNING Sector repair completed: port=1, LBA=0x323C1BA c0 [Sat May 26 05:24:07 2012] WARNING Sector repair completed: port=1, LBA=0x3F83862 c0 [Sat May 26 05:25:09 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 06:08:13 2012] WARNING Sector repair completed: port=0, LBA=0x4CDC6A2 c0 [Sat May 26 09:49:35 2012] WARNING Sector repair completed: port=1, LBA=0x6CACD4A c0 [Sat May 26 18:10:44 2012] WARNING Sector repair completed: port=1, LBA=0x18F425EA c0 [Sat May 26 19:45:40 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:22:52 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:23:15 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:23:22 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:23:35 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:23:41 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:23:49 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:23:57 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:24:02 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 20:54:41 2012] WARNING Verify fixed data/parity mismatch: unit=0 c0 [Sat May 26 22:00:30 2012] INFO Verify completed: unit=0 c0 [Sat Jun 2 00:01:43 2012] INFO Verify started: unit=0 c0 [Sat Jun 2 00:30:17 2012] WARNING Sector repair completed: port=0, LBA=0x2B911E4 c0 [Sat Jun 2 00:50:57 2012] WARNING Sector repair completed: port=0, LBA=0x5A807CA6 c0 [Sat Jun 2 04:13:13 2012] WARNING Sector repair completed: port=0, LBA=0x2D18291 c0 [Sat Jun 2 04:13:35 2012] WARNING Sector repair completed: port=0, LBA=0x2D1829F c0 [Sat Jun 2 21:48:02 2012] INFO Verify completed: unit=0 c0 [Mon Jun 4 04:40:34 2012] WARNING Sector repair completed: port=1, LBA=0x4AF8098F c0 [Tue 
Jun 5 00:28:19 2012] WARNING Sector repair completed: port=1, LBA=0x263C5CD c0 [Tue Jun 5 00:33:06 2012] WARNING Sector repair completed: port=1, LBA=0x263C5CF c0 [Thu Jun 7 00:34:27 2012] WARNING Sector repair completed: port=0, LBA=0x2A07B5F c0 [Thu Jun 7 00:38:50 2012] WARNING Sector repair completed: port=0, LBA=0x2A07B61 c0 [Fri Jun 8 00:07:13 2012] WARNING Sector repair completed: port=0, LBA=0xC131F6B c0 [Sat Jun 9 00:01:41 2012] INFO Verify started: unit=0 c0 [Sat Jun 9 00:29:11 2012] WARNING Sector repair completed: port=0, LBA=0x6C7614D c0 [Sat Jun 9 00:38:25 2012] WARNING Sector repair completed: port=0, LBA=0x6C76152 c0 [Sat Jun 9 04:02:30 2012] WARNING Sector repair completed: port=1, LBA=0x263C5D1 c0 [Sat Jun 9 04:02:52 2012] WARNING Sector repair completed: port=1, LBA=0x263C5D3 c0 [Sat Jun 9 04:07:32 2012] WARNING Sector repair completed: port=0, LBA=0x27D3E12 c0 [Sat Jun 9 04:07:57 2012] WARNING Sector repair completed: port=0, LBA=0x27D3E15 c0 [Sat Jun 9 04:08:16 2012] WARNING Sector repair completed: port=0, LBA=0x27D3E17 c0 [Sat Jun 9 04:08:45 2012] WARNING Sector repair completed: port=0, LBA=0x27D3E19 c0 [Sat Jun 9 04:15:04 2012] WARNING Sector repair completed: port=0, LBA=0x2A07B64 c0 [Sat Jun 9 04:15:26 2012] WARNING Sector repair completed: port=0, LBA=0x2A07B66 c0 [Sat Jun 9 04:15:45 2012] WARNING Sector repair completed: port=0, LBA=0x2A07B68 c0 [Sat Jun 9 04:15:59 2012] WARNING Sector repair completed: port=0, LBA=0x2A07B6C c0 [Sat Jun 9 04:16:13 2012] WARNING Sector repair completed: port=0, LBA=0x2A07B6E c0 [Sat Jun 9 21:48:52 2012] INFO Verify completed: unit=0 c0 [Thu Jun 14 00:40:10 2012] WARNING Sector repair completed: port=0, LBA=0x334F14B c0 [Sat Jun 16 00:01:38 2012] INFO Verify started: unit=0 c0 [Sat Jun 16 21:16:19 2012] INFO Verify completed: unit=0 c0 [Tue Jun 19 02:03:43 2012] WARNING Sector repair completed: port=1, LBA=0xFE41EAD c0 [Wed Jun 20 02:30:02 2012] WARNING Sector repair completed: port=1, LBA=0xD99145C c0 
[Sat Jun 23 00:01:36 2012] INFO Verify started: unit=0 c0 [Sat Jun 23 04:27:04 2012] WARNING Sector repair completed: port=1, LBA=0x2FAD311 c0 [Sat Jun 23 06:52:38 2012] WARNING Sector repair completed: port=1, LBA=0x7C6AC8D c0 [Sat Jun 23 06:53:03 2012] WARNING Sector repair completed: port=1, LBA=0x7C6AC91 c0 [Sat Jun 23 06:53:21 2012] WARNING Sector repair completed: port=1, LBA=0x7C6AC94 c0 [Sat Jun 23 17:00:22 2012] WARNING Sector repair completed: port=1, LBA=0xF9AC7C9 c0 [Sat Jun 23 21:15:19 2012] INFO Verify completed: unit=0 c0 [Sat Jun 30 00:01:34 2012] INFO Verify started: unit=0 c0 [Sat Jun 30 05:24:13 2012] WARNING Sector repair completed: port=0, LBA=0x3FAA9E7 c0 [Sat Jun 30 14:49:39 2012] WARNING Sector repair completed: port=1, LBA=0x869931C c0 [Sat Jun 30 21:31:05 2012] INFO Verify completed: unit=0 c0 [Tue Jul 3 03:40:25 2012] WARNING Sector repair completed: port=1, LBA=0xD36C7F7 c0 [Fri Jul 6 02:50:18 2012] WARNING Sector repair completed: port=1, LBA=0x3562470 c0 [Fri Jul 6 22:18:26 2012] WARNING Sector repair completed: port=1, LBA=0x3563173 c0 [Sat Jul 7 00:01:31 2012] INFO Verify started: unit=0 c0 [Sat Jul 7 00:50:16 2012] WARNING Sector repair completed: port=0, LBA=0x76EE88 c0 [Sat Jul 7 00:50:39 2012] WARNING Sector repair completed: port=0, LBA=0x76EE8F c0 [Sat Jul 7 21:39:36 2012] INFO Verify completed: unit=0 c0 [Sun Jul 8 02:51:05 2012] WARNING Sector repair completed: port=0, LBA=0x67759D c0 [Sun Jul 8 02:53:55 2012] WARNING Sector repair completed: port=0, LBA=0x67759B c0 [Tue Jul 10 16:17:21 2012] WARNING Sector repair completed: port=0, LBA=0x15C8C695 c0 [Wed Jul 11 22:51:22 2012] WARNING Sector repair completed: port=1, LBA=0x355BBD0 c0 [Sat Jul 14 00:01:28 2012] INFO Verify started: unit=0 c0 [Sat Jul 14 01:33:40 2012] WARNING Sector repair completed: port=1, LBA=0x1333BCF4 c0 [Sat Jul 14 03:36:23 2012] WARNING Sector repair completed: port=1, LBA=0x2174773 c0 [Sat Jul 14 11:26:44 2012] WARNING Sector repair completed: port=0, 
LBA=0x7429AB7 c0 [Sat Jul 14 16:53:50 2012] WARNING Sector repair completed: port=1, LBA=0xA17EB3F c0 [Sat Jul 14 21:19:25 2012] INFO Verify completed: unit=0 c0 [Wed Jul 18 05:08:47 2012] WARNING Sector repair completed: port=1, LBA=0x17D62EDC c0 [Wed Jul 18 05:14:15 2012] WARNING Sector repair completed: port=1, LBA=0x17D62EE1 c0 [Thu Jul 19 03:24:59 2012] WARNING Sector repair completed: port=0, LBA=0x7733C3D c0 [Thu Jul 19 03:25:20 2012] WARNING Sector repair completed: port=0, LBA=0x773CEA5 c0 [Thu Jul 19 03:28:16 2012] WARNING Sector repair completed: port=0, LBA=0x7733C42 c0 [Thu Jul 19 03:28:41 2012] WARNING Sector repair completed: port=0, LBA=0x773CEAF c0 [Sat Jul 21 00:01:26 2012] INFO Verify started: unit=0 c0 [Sat Jul 21 03:07:30 2012] WARNING Sector repair completed: port=1, LBA=0x1CC6936 c0 [Sat Jul 21 03:07:52 2012] WARNING Sector repair completed: port=1, LBA=0x1CC6938 c0 [Sat Jul 21 03:08:11 2012] WARNING Sector repair completed: port=1, LBA=0x1CC693A c0 [Sat Jul 21 16:43:56 2012] WARNING Sector repair completed: port=0, LBA=0xD04C914 c0 [Sat Jul 21 16:45:31 2012] WARNING Sector repair completed: port=1, LBA=0xD456973 c0 [Sat Jul 21 21:14:29 2012] INFO Verify completed: unit=0 c0 [Wed Jul 25 03:37:25 2012] WARNING Sector repair completed: port=0, LBA=0x1F8E6C43 c0 [Sat Jul 28 00:01:24 2012] INFO Verify started: unit=0 c0 [Sat Jul 28 01:45:27 2012] WARNING Sector repair completed: port=0, LBA=0x11584AD c0 [Sat Jul 28 18:54:25 2012] WARNING Sector repair completed: port=1, LBA=0x447C3E6C c0 [Sat Jul 28 21:13:46 2012] INFO Verify completed: unit=0 c0 [Wed Aug 1 03:20:11 2012] WARNING Sector repair completed: port=0, LBA=0x805FEF c0 [Fri Aug 3 00:50:03 2012] WARNING Sector repair completed: port=0, LBA=0xCED0ACA c0 [Sat Aug 4 00:01:22 2012] INFO Verify started: unit=0 c0 [Sat Aug 4 00:52:51 2012] WARNING Sector repair completed: port=0, LBA=0x805FF3 c0 [Sat Aug 4 00:53:14 2012] WARNING Sector repair completed: port=0, LBA=0x805FF5 c0 [Sat Aug 4 
00:53:33 2012] WARNING Sector repair completed: port=0, LBA=0x805FF7 c0 [Sat Aug 4 00:53:47 2012] WARNING Sector repair completed: port=0, LBA=0x805FF9 c0 [Sat Aug 4 00:54:00 2012] WARNING Sector repair completed: port=0, LBA=0x805FFB c0 [Sat Aug 4 00:54:14 2012] WARNING Sector repair completed: port=0, LBA=0x805FFD c0 [Sat Aug 4 00:54:27 2012] WARNING Sector repair completed: port=0, LBA=0x805FFF c0 [Sat Aug 4 04:43:12 2012] WARNING Sector repair completed: port=1, LBA=0x16974289 c0 [Sat Aug 4 04:58:17 2012] WARNING Sector repair completed: port=1, LBA=0x1697428E c0 [Sat Aug 4 20:54:53 2012] INFO Verify completed: unit=0 c0 [Wed Aug 8 03:21:55 2012] ERROR Drive timeout detected: port=1 c0 [Wed Aug 8 15:31:44 2012] WARNING Sector repair completed: port=0, LBA=0x1A366CD3 c0 [Sat Aug 11 00:01:21 2012] INFO Verify started: unit=0 c0 [Sat Aug 11 20:40:51 2012] INFO Verify completed: unit=0 c0 [Thu Aug 16 05:10:55 2012] WARNING Sector repair completed: port=0, LBA=0x1C22593 c0 [Sat Aug 18 00:01:18 2012] INFO Verify started: unit=0 c0 [Sat Aug 18 03:00:20 2012] WARNING Sector repair completed: port=0, LBA=0x1C225A5 c0 [Sat Aug 18 03:43:00 2012] WARNING Sector repair completed: port=1, LBA=0x23EE91E c0 [Sat Aug 18 03:43:23 2012] WARNING Sector repair completed: port=1, LBA=0x23EE920 c0 [Sat Aug 18 17:00:06 2012] WARNING Sector repair completed: port=1, LBA=0x137D066A c0 [Sat Aug 18 17:00:29 2012] WARNING Sector repair completed: port=1, LBA=0x137D066D c0 [Sat Aug 18 21:13:01 2012] INFO Verify completed: unit=0 c0 [Wed Aug 22 01:36:08 2012] WARNING Sector repair completed: port=0, LBA=0x2560A0F c0 [Wed Aug 22 01:37:42 2012] WARNING Sector repair completed: port=0, LBA=0x2560A13 c0 [Fri Aug 24 04:01:36 2012] WARNING Sector repair completed: port=1, LBA=0x55C1A5DF c0 [Fri Aug 24 05:02:06 2012] WARNING Sector repair completed: port=1, LBA=0xCE3378A c0 [Sat Aug 25 00:01:17 2012] INFO Verify started: unit=0 c0 [Sat Aug 25 00:31:06 2012] WARNING Sector repair completed: port=1, 
LBA=0x50F65D c0 [Sat Aug 25 00:39:52 2012] WARNING Sector repair completed: port=0, LBA=0x678FF4 c0 [Sat Aug 25 03:43:15 2012] WARNING Sector repair completed: port=0, LBA=0x2560A15 c0 [Sat Aug 25 03:43:39 2012] WARNING Sector repair completed: port=0, LBA=0x2560A19 c0 [Sat Aug 25 03:43:58 2012] WARNING Sector repair completed: port=0, LBA=0x2560A1B c0 [Sat Aug 25 03:44:30 2012] WARNING Sector repair completed: port=0, LBA=0x2560A21 c0 [Sat Aug 25 20:58:14 2012] INFO Verify completed: unit=0 c0 [Wed Aug 29 04:57:15 2012] WARNING Sector repair completed: port=1, LBA=0xF3957EB c0 [Sat Sep 1 00:01:15 2012] INFO Verify started: unit=0 c0 [Sat Sep 1 03:21:52 2012] WARNING Sector repair completed: port=0, LBA=0x1DAFC86 c0 [Sat Sep 1 03:22:15 2012] WARNING Sector repair completed: port=0, LBA=0x1DAFC88 c0 [Sat Sep 1 03:22:34 2012] WARNING Sector repair completed: port=0, LBA=0x1DAFC8A c0 [Sat Sep 1 03:22:47 2012] WARNING Sector repair completed: port=0, LBA=0x1DAFC8C c0 [Sat Sep 1 17:17:22 2012] WARNING Sector repair completed: port=0, LBA=0xF917FD1 c0 [Sat Sep 1 17:17:45 2012] WARNING Sector repair completed: port=0, LBA=0xF917FD3 c0 [Sat Sep 1 17:18:04 2012] WARNING Sector repair completed: port=0, LBA=0xF917FD5 c0 [Sat Sep 1 21:36:56 2012] INFO Verify completed: unit=0 c0 [Thu Sep 6 00:07:30 2012] WARNING Sector repair completed: port=0, LBA=0xDA3C64B c0 [Thu Sep 6 00:32:56 2012] WARNING Sector repair completed: port=1, LBA=0x6BBA816 c0 [Sat Sep 8 00:01:13 2012] INFO Verify started: unit=0 c0 [Sat Sep 8 00:09:56 2012] WARNING Sector repair completed: port=0, LBA=0xDEBC958 c0 [Sat Sep 8 04:38:45 2012] WARNING Sector repair completed: port=0, LBA=0x38D254F c0 [Sat Sep 8 20:44:50 2012] INFO Verify completed: unit=0 c0 [Mon Sep 10 01:26:34 2012] WARNING Sector repair completed: port=1, LBA=0xFFD8D5E c0 [Wed Sep 12 00:33:48 2012] WARNING Sector repair completed: port=1, LBA=0xE8DB928 c0 [Wed Sep 12 00:36:33 2012] WARNING Sector repair completed: port=1, LBA=0x6D49411 c0 
[Fri Sep 14 01:59:39 2012] WARNING Sector repair completed: port=0, LBA=0x1467F1C c0 [Fri Sep 14 02:08:27 2012] WARNING Sector repair completed: port=1, LBA=0x14C8ABD c0 [Fri Sep 14 03:54:47 2012] WARNING Sector repair completed: port=0, LBA=0x1580C915 c0 [Sat Sep 15 00:01:11 2012] INFO Verify started: unit=0 c0 [Sat Sep 15 02:38:14 2012] WARNING Sector repair completed: port=0, LBA=0x1C178973 c0 [Sat Sep 15 02:59:02 2012] WARNING Sector repair completed: port=0, LBA=0x1C178975 c0 [Sat Sep 15 04:47:08 2012] WARNING Sector repair completed: port=0, LBA=0x3FA0356 c0 [Sat Sep 15 04:47:31 2012] WARNING Sector repair completed: port=0, LBA=0x3FA0359 c0 [Sat Sep 15 10:41:59 2012] WARNING Sector repair completed: port=0, LBA=0x6DFD1EC c0 [Sat Sep 15 13:25:23 2012] WARNING Sector repair completed: port=0, LBA=0x7CBD100 c0 [Sat Sep 15 13:25:31 2012] WARNING Sector repair completed: port=0, LBA=0x7CBD104 c0 [Sat Sep 15 13:25:54 2012] WARNING Sector repair completed: port=0, LBA=0x7CBD106 c0 [Sat Sep 15 17:10:50 2012] WARNING Sector repair completed: port=0, LBA=0x1C178977 c0 [Sat Sep 15 20:59:57 2012] INFO Verify completed: unit=0 c0 [Tue Sep 18 01:17:18 2012] WARNING Sector repair completed: port=1, LBA=0x803B05B c0 [Sat Sep 22 00:01:10 2012] INFO Verify started: unit=0 c0 [Sat Sep 22 20:54:31 2012] INFO Verify completed: unit=0 c0 [Tue Sep 25 01:56:47 2012] WARNING Sector repair completed: port=0, LBA=0x26E3909 c0 [Sat Sep 29 00:01:08 2012] INFO Verify started: unit=0 c0 [Sat Sep 29 02:04:14 2012] WARNING Sector repair completed: port=0, LBA=0x146AC03 c0 [Sat Sep 29 10:58:39 2012] WARNING Sector repair completed: port=0, LBA=0x6D4EB0E c0 [Sat Sep 29 10:59:02 2012] WARNING Sector repair completed: port=0, LBA=0x6D4EB14 c0 [Sat Sep 29 11:22:44 2012] WARNING Sector repair completed: port=0, LBA=0x6F79623 c0 [Sat Sep 29 13:50:48 2012] WARNING Sector repair completed: port=1, LBA=0x7D1D65E c0 [Sat Sep 29 13:51:11 2012] WARNING Sector repair completed: port=1, LBA=0x7D1D661 c0 
[Sat Sep 29 13:51:30 2012] WARNING Sector repair completed: port=1, LBA=0x7D1D663 c0 [Sat Sep 29 20:57:34 2012] INFO Verify completed: unit=0 c0 [Mon Oct 1 04:47:24 2012] WARNING Sector repair completed: port=0, LBA=0xC5BC6F2 c0 [Tue Oct 2 02:00:27 2012] WARNING Sector repair completed: port=0, LBA=0x1547667 c0 [Tue Oct 2 02:01:56 2012] WARNING Sector repair completed: port=0, LBA=0x154766F c0 [Tue Oct 2 05:02:31 2012] WARNING Sector repair completed: port=1, LBA=0xD67D054 c0 [Tue Oct 2 05:04:14 2012] WARNING Sector repair completed: port=1, LBA=0xD67D056 c0 [Wed Oct 3 01:22:12 2012] WARNING Sector repair completed: port=1, LBA=0x12AAF8CA c0 [Thu Oct 4 04:29:22 2012] WARNING Sector repair completed: port=0, LBA=0x13E6F992 c0 [Thu Oct 4 05:10:51 2012] WARNING Sector repair completed: port=0, LBA=0x1C252A4 c0 [Sat Oct 6 00:01:07 2012] INFO Verify started: unit=0 c0 [Sat Oct 6 19:41:18 2012] WARNING Sector repair completed: port=1, LBA=0x5A5C3AE8 c0 [Sat Oct 6 21:01:05 2012] INFO Verify completed: unit=0 c0 [Mon Oct 8 00:32:06 2012] WARNING Sector repair completed: port=0, LBA=0x6C60D3E c0 [Tue Oct 9 03:51:03 2012] WARNING Sector repair completed: port=1, LBA=0x89B5EC9 c0 [Thu Oct 11 04:21:17 2012] WARNING Sector repair completed: port=1, LBA=0x13F85833 c0 [Sat Oct 13 00:01:05 2012] INFO Verify started: unit=0 c0 [Sat Oct 13 05:12:40 2012] WARNING Sector repair completed: port=0, LBA=0x3FA5134 c0 [Sat Oct 13 21:08:35 2012] INFO Verify completed: unit=0 c0 [Tue Oct 16 03:53:50 2012] WARNING Sector repair completed: port=1, LBA=0x148AA1BD c0 [Thu Oct 18 03:20:30 2012] WARNING Sector repair completed: port=0, LBA=0x1C8DABCB c0 [Thu Oct 18 04:52:50 2012] WARNING Sector repair completed: port=0, LBA=0xE879057 c0 [Sat Oct 20 00:01:04 2012] INFO Verify started: unit=0 c0 [Sat Oct 20 02:19:25 2012] WARNING Sector repair completed: port=1, LBA=0x174B012 c0 [Sat Oct 20 03:41:38 2012] WARNING Sector repair completed: port=0, LBA=0x256D93B c0 [Sat Oct 20 03:42:01 2012] WARNING 
Sector repair completed: port=0, LBA=0x256D93D c0 [Sat Oct 20 03:42:40 2012] WARNING Sector repair completed: port=0, LBA=0x256D940 c0 [Sat Oct 20 03:42:59 2012] WARNING Sector repair completed: port=0, LBA=0x256D942 c0 [Sat Oct 20 03:43:12 2012] WARNING Sector repair completed: port=0, LBA=0x256D944 c0 [Sat Oct 20 03:43:26 2012] WARNING Sector repair completed: port=0, LBA=0x256D948 c0 [Sat Oct 20 16:37:52 2012] WARNING Sector repair completed: port=0, LBA=0xE879060 c0 [Sat Oct 20 16:38:15 2012] WARNING Sector repair completed: port=0, LBA=0xE879062 c0 [Sat Oct 20 21:00:18 2012] INFO Verify completed: unit=0 c0 [Sat Oct 20 23:49:01 2012] WARNING Sector repair completed: port=1, LBA=0x4473E908 c0 [Sun Oct 21 03:42:26 2012] WARNING Sector repair completed: port=1, LBA=0x175BADD5 c0 [Tue Oct 23 01:09:04 2012] WARNING Sector repair completed: port=1, LBA=0x6E524860 c0 [Fri Oct 26 03:21:25 2012] WARNING Sector repair completed: port=0, LBA=0x802C61 c0 [Fri Oct 26 04:22:21 2012] WARNING Sector repair completed: port=0, LBA=0x176353CD c0 [Sat Oct 27 00:01:03 2012] INFO Verify started: unit=0 c0 [Sat Oct 27 00:49:35 2012] WARNING Sector repair completed: port=0, LBA=0x802C65 c0 [Sat Oct 27 17:02:24 2012] WARNING Sector repair completed: port=1, LBA=0xC1FF26D c0 [Sat Oct 27 17:09:06 2012] WARNING Sector repair completed: port=0, LBA=0xDF621AD c0 [Sat Oct 27 21:30:57 2012] INFO Verify completed: unit=0 c0 [Tue Oct 30 00:20:46 2012] WARNING Sector repair completed: port=0, LBA=0xE9FE2AB c0 [Wed Oct 31 02:02:03 2012] WARNING Sector repair completed: port=0, LBA=0x1460C25 c0 [Wed Oct 31 02:04:05 2012] WARNING Sector repair completed: port=0, LBA=0x1460C28 c0 [Thu Nov 1 00:48:34 2012] WARNING Sector repair completed: port=1, LBA=0xA7C92BE c0 [Thu Nov 1 05:04:45 2012] WARNING Sector repair completed: port=0, LBA=0x1C252C2 [root@backup1 /data/deprecated]#
Look for failed drives and errors. In the output above, ports 0 and 1 log sector repairs almost daily, so both drives should probably be replaced; port 1 is even flagged DEVICE-ERROR, yet the RAID array still reports OK. The alarm log also shows the automatic verifies, which start every Saturday just after midnight.
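A long alarm log is easier to judge as a per-port tally. The sketch below is an illustration only: count_sector_repairs is a made-up name, and it assumes the alarm messages keep the "Sector repair completed: port=N" wording shown in the log above.

```shell
# count_sector_repairs (hypothetical helper): read alarm output on stdin
# and print how many "Sector repair completed" alarms each port logged.
count_sector_repairs() {
    grep -o 'Sector repair completed: port=[0-9]*' |
        sed 's/.*port=//' |
        sort -n |
        uniq -c |
        awk '{ printf "port %s: %s repairs\n", $2, $1 }'
}
```

It could be fed from tw_cli /c0 show alarms | count_sector_repairs. A port with a steadily growing count is a candidate for replacement even while the unit still reports OK.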
Note: while a degraded mirror is rebuilding, the CLI does not show any progress:
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-5 REBUILDING 0 - 64K 4656.56 OFF ON
areca
We use an Areca controller on backup3.
[root@newbackup3 ~]# sh /root/verify.sh # Name Raid Name Level Capacity Ch/Id/Lun State =============================================================================== 1 ARC-1160-VOL#00 Raid Set # 00 Raid5 5000.0GB 00/00/00 Checking(19.7%) =============================================================================== GuiErrMsg<0x00>: Success. # Name Disks TotalCap FreeCap DiskChannels State =============================================================================== 1 Raid Set # 00 6 6000.0GB 0.0GB 123456 Checking =============================================================================== GuiErrMsg<0x00>: Success. Date-Time Device Event Type Elapsed Time Errors =============================================================================== 2012-12-05 20:40:58 ARC-1160-VOL#00 Start Checking 2012-12-01 05:06:04 ARC-1160-VOL#00 Complete Init 027:30:45 2012-11-30 01:35:19 ARC-1160-VOL#00 Start Initialize 2026-08-06 01:34:52 H/W MONITOR Raid Powered On 2012-11-30 01:33:36 ARC-1160-VOL#00 Stop Initialization 000:31:48 2012-11-30 01:01:47 ARC-1160-VOL#00 Start Initialize 2026-08-06 00:58:13 H/W MONITOR Raid Powered On 2012-11-30 00:57:26 ARC-1160-VOL#00 Stop Initialization 000:57:07 2012-11-30 00:00:19 ARC-1160-VOL#00 Start Initialize 2026-08-05 23:56:48 H/W MONITOR Raid Powered On 2026-08-05 23:52:58 H/W MONITOR Raid Powered On 2026-08-05 23:50:14 H/W MONITOR Raid Powered On 2026-08-05 23:43:30 H/W MONITOR Raid Powered On 2012-11-29 23:10:07 ARC-1160-VOL#00 Stop Initialization 000:00:56 2012-11-29 23:09:11 ARC-1160-VOL#00 Start Initialize 2026-08-05 23:08:57 H/W MONITOR Raid Powered On 2012-11-29 23:08:10 ARC-1160-VOL#00 Stop Initialization 000:20:41 2012-11-29 22:47:29 ARC-1160-VOL#00 Start Initialize 2026-08-05 22:46:59 H/W MONITOR Raid Powered On 2026-08-05 22:45:55 H/W MONITOR Raid Powered On 2026-08-05 22:44:53 H/W MONITOR Raid Powered On 2026-08-05 22:42:06 H/W MONITOR Raid Powered On 2026-08-05 22:40:50 H/W MONITOR Raid Powered On 2012-11-29 22:40:04 
ARC-1160-VOL#00 Stop Initialization 000:24:25 2012-11-29 22:15:38 ARC-1160-VOL#00 Start Initialize 2026-08-05 22:15:11 000:000001215B00 Restart Init LBA Point 2026-08-05 22:15:10 H/W MONITOR Raid Powered On 2012-11-29 21:56:38 ARC-1160-VOL#00 Start Initialize 2026-08-05 21:56:12 H/W MONITOR Raid Powered On 2026-08-05 21:56:04 IDE Channel #03 Device Inserted 2012-11-29 21:55:13 IDE Channel #04 Device Inserted 2012-11-29 21:55:03 IDE Channel #02 Device Inserted 2026-08-05 21:53:09 H/W MONITOR Raid Powered On 2026-08-05 20:51:46 H/W MONITOR Raid Powered On 2026-08-05 20:49:56 H/W MONITOR Raid Powered On 2026-08-05 20:48:29 H/W MONITOR Raid Powered On 2026-08-05 20:46:29 H/W MONITOR Raid Powered On 2026-08-05 20:44:49 H/W MONITOR Raid Powered On 2026-08-05 20:43:01 H/W MONITOR Raid Powered On 2026-08-05 20:36:25 H/W MONITOR Raid Powered On 2026-08-05 20:31:18 H/W MONITOR Raid Powered On 2026-08-05 20:30:08 H/W MONITOR Raid Powered On 2026-08-05 20:08:40 H/W MONITOR Raid Powered On 2026-08-05 20:06:11 H/W MONITOR Raid Powered On 2026-08-05 20:05:14 H/W MONITOR Raid Powered On 2026-08-05 20:03:58 H/W MONITOR Raid Powered On 2026-08-05 20:00:56 H/W MONITOR Raid Powered On 2026-08-05 19:57:57 H/W MONITOR Raid Powered On 2026-08-05 19:56:15 H/W MONITOR Raid Powered On 2026-08-05 19:55:05 H/W MONITOR Raid Powered On 2026-08-05 17:24:36 H/W MONITOR Raid Powered On 2026-08-05 17:22:43 H/W MONITOR Raid Powered On 2026-08-05 04:50:42 H/W MONITOR Raid Powered On 2026-08-05 04:47:33 H/W MONITOR Raid Powered On 2026-08-05 04:43:57 H/W MONITOR Raid Powered On 2026-08-05 04:18:52 H/W MONITOR Raid Powered On 2026-08-05 04:17:30 H/W MONITOR Raid Powered On 2026-08-05 04:13:30 H/W MONITOR Raid Powered On 2026-08-05 04:10:26 H/W MONITOR Raid Powered On 2026-08-05 04:09:23 H/W MONITOR Raid Powered On 2026-08-05 00:08:09 H/W MONITOR Raid Powered On 2026-08-05 00:07:12 H/W MONITOR Raid Powered On 2026-08-05 00:05:51 H/W MONITOR Raid Powered On 2026-08-05 00:04:27 H/W MONITOR Raid Powered On 
=============================================================================== GuiErrMsg<0x00>: Success. press enter when ready to run verify
Look for failed drives and errors.
Once it proceeds to verifying, you can confirm with:
[root@newbackup3 ~]# cli64 vsf info
# Name Raid Name Level Capacity Ch/Id/Lun State
===============================================================================
1 ARC-1160-VOL#00 Raid Set # 00 Raid5 5000.0GB 00/00/00 Checking(22.5%)
===============================================================================
GuiErrMsg<0x00>: Success.
[root@newbackup3 ~]#
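To watch the check without reading the whole table each time, the percentage can be pulled out of the State column. This is a sketch under the assumption that the state is printed as Checking(NN.N%), as in the output above; check_progress is a hypothetical name.

```shell
# check_progress (hypothetical helper): read "cli64 vsf info" output on
# stdin and print the percentage from a "Checking(NN.N%)" state.
# Prints nothing when no volume is in the Checking state.
check_progress() {
    sed -n 's/.*Checking(\([0-9.]*\)%).*/\1/p'
}
```

For example, cli64 vsf info | check_progress prints just the number, and an empty result means no volume is currently being checked.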
Infrequent tasks
Free up space on gateway
newgateway /var/spool# cd clientmqueue/
newgateway /var/spool/clientmqueue# sh
# for f in `ls`; do rm $f; done
exit
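The loop above word-splits the output of ls, so it can stumble on unusual filenames. A possible alternative, offered only as a sketch, lets find hand the names to rm directly. The function name purge_queue is made up here, and -maxdepth is assumed to be available (it is in both GNU and BSD find, though not in POSIX).

```shell
# purge_queue DIR (hypothetical helper): delete every regular file that
# sits directly inside DIR. Subdirectories and their contents are left alone.
purge_queue() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "purge_queue: $dir is not a directory" >&2
        return 1
    fi
    # find passes the names to rm in batches, so no shell word list is built
    find "$dir" -maxdepth 1 -type f -exec rm -f {} +
}
```

On the gateway this would be called as purge_queue /var/spool/clientmqueue. Like the original loop, it removes everything in the queue without looking at what is there.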
Free up space on mail
You can clear out root mail:
mail /var/log# ll -h /var/mail/root
-rw------- 1 root mail 543K Dec 19 13:05 /var/mail/root
mail /var/log# rm /var/mail/root
Or you can archive mail logs:
mail /var/log# ls -l htt*
-rw-r--r-- 1 root wheel 297436931 Dec 19 13:26 httpd-access.log
-rw-r--r-- 1 root wheel 9824324 Jul 4 11:34 httpd-access.log.old.0.gz
-rw-r--r-- 1 root wheel 6884137 Mar 17 2012 httpd-access.log.old.1.gz
-rw-r--r-- 1 root wheel 18557444 Dec 3 2009 httpd-access.log.old.10.gz
-rw-r--r-- 1 root wheel 14740263 Jan 9 2007 httpd-access.log.old.11.gz
-rw-r--r-- 1 root wheel 14209465 Nov 28 2007 httpd-access.log.old.12.gz
-rw-r--r-- 1 root wheel 16874396 Feb 19 2012 httpd-access.log.old.3.gz
-rw-r--r-- 1 root wheel 14554859 Jul 22 2011 httpd-access.log.old.4.gz
-rw-r--r-- 1 root wheel 10513227 Feb 18 2011 httpd-access.log.old.5.gz
-rw-r--r-- 1 root wheel 7201946 Oct 29 2010 httpd-access.log.old.6.gz
-rw-r--r-- 1 root wheel 10062537 May 6 2010 httpd-access.log.old.7.gz
-rw-r--r-- 1 root wheel 10157042 Aug 12 2010 httpd-access.log.old.8.gz
-rw-r--r-- 1 root wheel 11909534 Mar 4 2010 httpd-access.log.old.9.gz
-rw-r--r-- 1 root wheel 59030930 Dec 19 13:01 httpd-error.log
-rw-r--r-- 1 root wheel 3413134 Mar 4 2010 httpd-error.log.0.gz
-rw-r--r-- 1 root wheel 795515 May 1 2007 httpd-error.log.1.gz
-rw-r--r-- 1 root wheel 1142153 Nov 30 2007 httpd-error.log.2.gz
-rw-r--r-- 1 root wheel 2325801 Feb 18 2011 httpd-error.log.gz
mail /var/log# sh
# for f in 12 11 10 9 8 7 6 5 4 3 2 1 0; do g=`echo $f+1|bc`; mv httpd-access.log.old.$f.gz httpd-access.log.old.$g.gz; done
# mv httpd-access.log httpd-access.log.old.0
# touch httpd-access.log
# apachectl restart
# gzip httpd-access.log.old.0
# for f in 2 1 0; do g=`echo $f+1|bc`; mv httpd-error.log.$f.gz httpd-error.log.$g.gz; done
# mv httpd-error.log httpd-error.log.0
# touch httpd-error.log
# apachectl restart
# gzip httpd-error.log.0
# exit
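The renumbering loops above hard-code the list of suffixes and will print an mv error wherever a number is missing (the listing has no httpd-access.log.old.2.gz, for instance). The shift step could be wrapped in a small function along these lines. This is a sketch, not what is installed on mail; rotate_gz is a hypothetical name.

```shell
# rotate_gz PREFIX MAX (hypothetical helper): rename PREFIX.N.gz to
# PREFIX.(N+1).gz for N = MAX down to 0. Working from the highest number
# first means nothing is overwritten; missing numbers are skipped quietly.
rotate_gz() {
    prefix=$1
    n=$2
    while [ "$n" -ge 0 ]; do
        if [ -f "$prefix.$n.gz" ]; then
            mv "$prefix.$n.gz" "$prefix.$((n + 1)).gz"
        fi
        n=$((n - 1))
    done
}

# Possible use in /var/log, mirroring the manual steps above:
#   rotate_gz httpd-access.log.old 12
#   rotate_gz httpd-error.log 2
```

After both shifts, the live logs can be moved aside and recreated as in the manual steps, which would also allow a single apachectl restart to cover both files.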
Free up space on bwdb2
You can either remove items from /usr/home/archive or scp them to backup2 at castle.
Free up space on backup1
backup1 is our primary customer backup system. As usage grows over time, it needs to be purged of old files regularly. The easiest way to do this is to remove deprecated files, which mostly consist of cancelled customers' data and temporary dump/storage files (created during dumps/restores). Our standard policy is to hang onto cancelled customers' files for 6 months and then remove them. As far as customers know, their data is purged immediately, but we keep it just in case; in some cases we cancel a server for non-payment, and this makes it easy to restore their system. To find files to remove:
[root@backup1 ~]# cd /data/deprecated/ [root@backup1 /data/deprecated]# ls 2101-migrated-20120317.tgz old-683-cxld-20121021.tgz 69.55.230.2-wwwbackup old-744-cxld-20120708.tgz 991-DONTDELETE.tgz old-809-cxld-20120609.tgz archive-col02050-mdfile-cxld-20120409.gz old-854-cxld-20120621.tgz col01371.tgz old-931-cxld-20060513.tgz deleteme_ubuntu-10.10-x86_20111205 old-col00123-mdfile-noarchive-20120417.gz jail10_old old-col00147-vnfile-cxld-20120828.gz jail14_rsync_old old-col00419-dump-cxld-20120224.gz jail15_old old-col01098-vnfile-cxld-20120827.gz jail3_old old-col01278-dump-cxld-20120822 jail4_old old-col01517-dump-cxld-20120828 jail5_old old-col01669-dump-cxld-20120203.gz old-1009-cxld-20120608.tgz old-col01687-dump-cxld-20120909 old-1012-cxld-20120411.tgz old-col01790-dump-cxld-20120828 old-1052-cxld-20120721.tgz old-col01812-dump-cxld-20120820 old-10631-cxld-20120622.tgz old-col01938-mdfile-cxld-20120619.gz old-10632-cxld-20120622.tgz old-col02095-mdfile-noarchive-20120523.gz old-10633-cxld-20120622.tgz olddebian-3.0-v15-20110610.tgz old-1236-cxld-20120621.tgz oldmod_frontpage-deb30-v15-20110610.tgz old-1381-cxld-20120404.tgz oldmod_perl-deb30-v15-20110610.tgz old-1422-cxld-20120721.tgz oldmod_ssl-deb30-v15-20110610.tgz old-14681-cxld-20120619.tgz oldmysql-deb30-v15-20110610.tgz old-1544-cxld-20120626.tgz oldproftpd-deb30-v15-20110610.tgz old-18351-cxld-20120605.tgz old_virt14 old-1853-cxld-20120910.tgz old_virt18 old-1963-cxld-20120206.tgz oldwebmin-deb30-v15-20110610.tgz old-1967-cxld-20120605.tgz suse.virt11.20120421.tgz old-1981-noarchive-20120729.tgz virt11 old-2030-migrated-noarchive-20120727.tgz virt12_old old-2037-cxld-20120716.tgz virt13_old old-2065-cxld-20120727.tgz virt16_old old-2068-cxld-20120424.tgz virt4_old old-2085-cxld-20120531.tgz virt5_old old-364-cxld-20120904.tgz virt6_old old-446-cxld-20120512.tgz virt7_old old-613-cxld-20120601.tgz virt8_old [root@backup1 /data/deprecated]#
virtX_old and jailX_old are permanently archived, so ignore those, along with anything else marked not to delete or that otherwise looks suspicious. Likewise, it is probably a good idea to hang onto oldTEMPLATE.gz for as long as we can. Most of what we want to delete carries the date it was deprecated, which makes this easy. So, to remove files from 6 months ago (running this in October):
[root@backup1 /data/deprecated]# ls old*201204*
old-1012-cxld-20120411.tgz  old-2068-cxld-20120424.tgz
old-1381-cxld-20120404.tgz  old-col00123-mdfile-noarchive-20120417.gz
[root@backup1 /data/deprecated]# rm old*201204*
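The list-then-remove habit above can be wrapped in a small helper so the dry run always comes first. This is only a sketch: the function names list_deprecated and prune_deprecated and the "delete" argument are made up for illustration and are not existing tools on backup1. It uses the same old*YYYYMM* glob as the commands above, so virtX_old, jailX_old and the archive-* files are never matched.

```shell
#!/bin/sh
# Sketch only: list_deprecated / prune_deprecated are hypothetical helper names.

# Print the old* entries in directory $1 whose names contain the YYYYMM stamp $2.
list_deprecated() {
    dir=$1
    month=$2
    for f in "$dir"/old*"$month"*; do
        # An unmatched glob is left as a literal string, so skip non-existent paths.
        [ -e "$f" ] || continue
        printf '%s\n' "${f##*/}"
    done
}

# Dry run by default; files are removed only when the third argument is "delete".
prune_deprecated() {
    dir=$1
    month=$2
    mode=${3:-dry-run}
    list_deprecated "$dir" "$month" | while IFS= read -r name; do
        if [ "$mode" = "delete" ]; then
            # Plain rm, as in the manual command: directories are left alone.
            rm -- "$dir/$name"
        else
            echo "would remove $dir/$name"
        fi
    done
}
```

Running prune_deprecated /data/deprecated 201204 prints what would go, and adding delete as a third argument removes it.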
Every few months you will also want to remove some of the snapshot archives for mail. To do this, set aside the dates you want to keep, remove the rest a month at a time, then move the set-aside dates back. Here's how that works:
[root@backup1 /data/www/daily]# ls
05                     08-10-11  10-04-10  11-10-10  12-07-29  12-09-21  12-11-14
06                     08-10-21  10-04-20  11-10-20  12-07-30  12-09-22  12-11-15
06-06-01-usr-home.tgz  08-11-01  10-05-01  11-11-01  12-07-31  12-09-23  12-11-16
06-07-01-usr-home.tgz  08-11-10  10-05-11  11-11-10  12-08-01  12-09-24  12-11-17
06-08-01-usr-home.tgz  08-11-20  10-05-20  11-11-20  12-08-02  12-09-25  12-11-18
06-09-01-usr-home.tgz  08-12-01  10-06-01  11-12-01  12-08-03  12-09-26  12-11-19
06-11-10               08-12-10  10-06-10  11-12-10  12-08-04  12-09-27  12-11-20
06-12-21               08-12-20  10-06-20  11-12-20  12-08-05  12-09-28  12-11-21
07-01-10               09-01-01  10-07-01  12-01-01  12-08-06  12-09-29  12-11-22
07-01-20               09-01-10  10-07-10  12-01-10  12-08-07  12-09-30  12-11-23
07-02-10               09-01-20  10-07-20  12-01-20  12-08-08  12-10-01  12-11-24
07-02-20               09-02-01  10-08-01  12-02-01  12-08-09  12-10-02  12-11-25
07-03-01               09-02-10  10-08-10  12-02-10  12-08-10  12-10-03  12-11-26
07-03-20               09-02-20  10-08-20  12-02-20  12-08-11  12-10-04  12-11-27
07-04-01               09-03-01  10-09-01  12-03-01  12-08-12  12-10-05  12-11-28
07-04-10               09-03-10  10-09-10  12-03-10  12-08-13  12-10-06  12-11-29
07-04-20               09-03-20  10-09-20  12-03-20  12-08-14  12-10-07  12-11-30
07-05-01               09-04-01  10-10-01  12-04-01  12-08-15  12-10-08  12-12-01
07-05-10               09-04-10  10-10-10  12-04-10  12-08-16  12-10-09  12-12-02
07-05-20               09-04-20  10-10-20  12-04-20  12-08-17  12-10-10  12-12-03
07-06-01               09-05-01  10-11-01  12-05-01  12-08-18  12-10-11  12-12-04
07-06-10               09-05-10  10-11-10  12-05-10  12-08-19  12-10-12  12-12-05
07-06-20               09-05-20  10-11-20  12-05-20  12-08-20  12-10-13  12-12-06
07-07-20               09-06-01  10-12-01  12-06-01  12-08-21  12-10-14  12-12-07
07-08-10               09-06-10  10-12-10  12-06-10  12-08-22  12-10-15  12-12-08
07-08-20               09-06-20  10-12-20  12-06-20  12-08-23  12-10-16  12-12-09
07-09-01               09-07-01  11-01-01  12-07-01  12-08-24  12-10-17  12-12-10
07-10-01               09-07-10  11-01-10  12-07-02  12-08-25  12-10-18  12-12-11
07-10-10               09-07-20  11-01-21  12-07-03  12-08-26  12-10-19  12-12-12
07-10-20               09-08-01  11-02-01  12-07-04  12-08-27  12-10-20  12-12-13
07-12-01               09-08-10  11-02-10  12-07-05  12-08-28  12-10-21  12-12-14
07-12-10               09-08-20  11-02-20  12-07-06  12-08-29  12-10-22  12-12-15
08-01-01               09-09-01  11-03-01  12-07-07  12-08-30  12-10-23  12-12-16
08-01-20               09-09-10  11-03-10  12-07-08  12-08-31  12-10-24  12-12-17
08-02-20               09-09-20  11-03-20  12-07-09  12-09-01  12-10-25  12-12-18
08-03-01               09-10-01  11-04-01  12-07-10  12-09-02  12-10-26  12-12-19
08-03-10               09-10-10  11-04-10  12-07-11  12-09-03  12-10-27  12-12-20
08-03-20               09-10-20  11-04-20  12-07-12  12-09-04  12-10-28  12-12-21
08-04-01               09-11-01  11-05-01  12-07-13  12-09-05  12-10-29  12-12-22
08-04-20               09-11-10  11-05-10  12-07-14  12-09-06  12-10-30  12-12-23
08-05-01               09-11-20  11-05-20  12-07-15  12-09-07  12-10-31  12-12-24
08-05-10               09-12-01  11-06-01  12-07-16  12-09-08  12-11-01  12-12-25
08-06-10               09-12-10  11-06-10  12-07-17  12-09-09  12-11-02  12-12-26
08-06-20               09-12-20  11-06-20  12-07-18  12-09-10  12-11-03  12-12-27
08-07-02               10-01-01  11-07-01  12-07-19  12-09-11  12-11-04  12-12-28
08-07-10               10-01-10  11-07-10  12-07-20  12-09-12  12-11-05  2008-10-23
08-07-20               10-01-20  11-07-20  12-07-21  12-09-13  12-11-06  bb.tgz
08-08-01               10-02-01  11-08-01  12-07-22  12-09-14  12-11-07  boot
08-08-10               10-02-10  11-08-10  12-07-23  12-09-15  12-11-08  current
08-08-21               10-02-20  11-08-20  12-07-24  12-09-16  12-11-09  hold
08-09-01               10-03-01  11-09-01  12-07-25  12-09-17  12-11-10
08-09-10               10-03-10  11-09-10  12-07-26  12-09-18  12-11-11
08-09-21               10-03-20  11-09-20  12-07-27  12-09-19  12-11-12
08-10-01               10-04-01  11-10-01  12-07-28  12-09-20  12-11-13
[root@backup1 /data/www/daily]#
So we see that every month before July 2012 has already been pruned down to a few snapshots, while July 2012 onward still has one per day. To prune July 2012 we do the following:
mv 12-07-01 hold
mv 12-07-10 hold
mv 12-07-20 hold
rm -fr 12-07*
mv hold/* .
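The same hold, remove, restore sequence can be sketched as a function that takes the month and the days to keep. The name prune_month and its argument order are hypothetical, not an existing script. Two details are assumptions on top of the manual steps: it creates hold if it is missing, and it refuses to run when hold already contains something, since mv hold/* . would otherwise move that back in as well.

```shell
#!/bin/sh
# Sketch only: prune_month is a hypothetical helper, not an existing tool.
# Usage: prune_month DIR YY-MM DAY [DAY...]
#   e.g. prune_month /data/www/daily 12-07 01 10 20
prune_month() {
    dir=$1
    month=$2
    shift 2

    # Assumption: a non-empty hold means an earlier run was interrupted, so stop.
    mkdir -p "$dir/hold"
    if [ -n "$(ls -A "$dir/hold")" ]; then
        echo "$dir/hold is not empty, aborting" >&2
        return 1
    fi

    # Set aside the snapshots to keep.
    for day in "$@"; do
        if [ -e "$dir/$month-$day" ]; then
            mv "$dir/$month-$day" "$dir/hold/"
        fi
    done

    # Remove everything left for that month (matches YY-MM-*).
    rm -fr "$dir/$month"-*

    # Restore the snapshots that were set aside.
    for kept in "$dir"/hold/*; do
        [ -e "$kept" ] || continue
        mv "$kept" "$dir/"
    done
}
```

Checking ls afterwards, as in the listing above, confirms that only the kept dates remain for that month.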