System-generated Notifications

From JCWiki
Revision as of 15:26, 30 October 2012 by 70.230.212.110 (talk)

"snapshot rotation done on backup1"

Sent daily - expect to receive this!

Action to take: Confirm that this was received before midnight (or whenever backups start from virts/jails -> backup server). Delete email.

"RAID controller problem on backup1.johncompanies.com"

Ignoring all alarms prior to 2012-09-12-12-36-37
unitu0 drive p1 status= DEVICE-ERROR
there was a WARNING event on 2012-09-14 01:59:39
there was a WARNING event on 2012-09-14 02:08:27
there was a WARNING event on 2012-09-14 03:54:47
there was a WARNING event on 2012-09-15 02:38:14
there was a WARNING event on 2012-09-15 02:59:02
there was a WARNING event on 2012-09-15 04:47:08
there was a WARNING event on 2012-09-15 04:47:31
there was a WARNING event on 2012-09-15 10:41:59
there was a WARNING event on 2012-09-15 13:25:23
there was a WARNING event on 2012-09-15 13:25:31
there was a WARNING event on 2012-09-15 13:25:54
there was a WARNING event on 2012-09-15 17:10:50
there was a WARNING event on 2012-09-18 01:17:18
there was a WARNING event on 2012-09-25 01:56:47
there was a WARNING event on 2012-09-29 02:04:14
there was a WARNING event on 2012-09-29 10:58:39
there was a WARNING event on 2012-09-29 10:59:02
there was a WARNING event on 2012-09-29 11:22:44
there was a WARNING event on 2012-09-29 13:50:48
there was a WARNING event on 2012-09-29 13:51:11
there was a WARNING event on 2012-09-29 13:51:30
there was a WARNING event on 2012-10-01 04:47:24
there was a WARNING event on 2012-10-02 02:00:27
there was a WARNING event on 2012-10-02 02:01:56
there was a WARNING event on 2012-10-02 05:02:31
there was a WARNING event on 2012-10-02 05:04:14
there was a WARNING event on 2012-10-03 01:22:12
there was a WARNING event on 2012-10-04 04:29:22
there was a WARNING event on 2012-10-04 05:10:51
there was a WARNING event on 2012-10-06 19:41:18
there was a WARNING event on 2012-10-08 00:32:06
there was a WARNING event on 2012-10-09 03:51:03
to see all status: tw_cli /c0 show all
to see all alarms: tw_cli show alarms
to silence old alarms: 3wraidchk shh

You get this when the cronjob running on the server notices a new event in the logs. The recent events are included in the email.

Action to take: Review the logs on the server. See commands above and review tw_cli_Reference. Optional: clear the warning with 3wraidchk shh on the server that generated the notice. Delete email.
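When a notice lists dozens of WARNING lines, a per-day count can make it easier to see whether the errors are clustering. The following is a minimal sketch, assuming the email body has been saved to a file and that the lines keep the "there was a WARNING event on DATE TIME" layout shown above. The function name warnings_per_day is hypothetical and is not part of 3wraidchk or tw_cli.

```shell
# Summarize "WARNING event" lines from a RAID notice by calendar day.
# warnings_per_day is a hypothetical helper name, not part of 3wraidchk or tw_cli.
warnings_per_day() {
    # In "there was a WARNING event on 2012-09-14 01:59:39" the date is field 7.
    awk '/there was a WARNING event on/ { count[$7]++ }
         END { for (day in count) print day, count[day] }' | sort
}
```

Possible usage, with notice.txt standing in for wherever the email body was saved: warnings_per_day < notice.txt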

"bwdb2: sendsql.pl error"

scp -Cq /usr/home/sql/2012-10-29-10:30.sql.bz2 backup1:/data/bwdb2/pending/ (lost connection)

Action to take: Usually none. This is a temporary failure to transfer bandwidth stats from bwdb2 to backup1, which hosts the database that tracks and supplies all b/w stats. The script will keep trying to send the data. Only take action if it continues to fail without an obvious reason (e.g. a temporary outage between i2b and castle, or backup1 being down).
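One rough way to judge whether the failure is persisting is to count dump files on bwdb2 that have been sitting unsent for a while. This is only a sketch under a loud assumption: that sendsql.pl removes (or moves) a dump from its local directory once the transfer succeeds, which this page does not confirm. The helper name stale_dumps is hypothetical.

```shell
# Count *.sql.bz2 files in a directory that are older than N minutes.
# stale_dumps is a hypothetical helper. ASSUMPTION: dumps that transferred
# successfully no longer remain in the directory; verify before relying on it.
stale_dumps() {
    dir=$1
    minutes=$2
    # -mmin +N matches files last modified more than N minutes ago.
    find "$dir" -maxdepth 1 -type f -name '*.sql.bz2' -mmin +"$minutes" | wc -l | tr -d ' '
}
```

Possible usage on bwdb2, taking the local path from the error above: stale_dumps /usr/home/sql 120. A count that keeps growing across checks would suggest the transfer is still failing.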