BigBrother: Difference between revisions
= Overview =
BB is our monitoring service. It runs on mail and does two things:
# It watches machines (ping and ssh).
# It listens to bb clients running on other machines for reports of problems on those machines (CPU, disk, procs).
When things go wrong, there are two status levels: warn and panic. Panic events always page. Warn events only show up on the bb status page. The link to the status page is under Mgm. -> BigBrother (https://secure.johncompanies.com/mgmt/bb/).
Condition purple means BB is no longer receiving updates from a particular machine. You will still see this even if you've removed the host from bb-hosts. To remove the host fully, run the <tt>bbrm <hostname></tt> command to remove all of its previous logs.
NOTE: this may not be working. If you get a page from bb about an alarm, you can silence that alarm by replying with the following in the subject line:
Re: !BB - <7 digit unique alarm id sent in page>! ACK=Y DELAY=<# mins you want it silenced>
Or, you can go to the bb status page and click the lightning bolt at the top: https://secure.johncompanies.com/mgmt/bb/help/bb-ack.html
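The reply subject is easy to mistype from a phone, so here is a minimal sketch of a helper that prints it in the format given above. The function name <tt>bb_ack_subject</tt> is hypothetical and not part of Big Brother, and the input checks (exactly 7 digits, whole minutes) are assumptions based on the description above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Big Brother): print the reply subject
# that acknowledges an alarm, in the format described above.
#   $1 = 7-digit alarm id from the page
#   $2 = number of minutes to silence the alarm
bb_ack_subject() {
    alarm_id=$1
    delay_mins=$2

    # Assumption: the alarm id is exactly 7 digits.
    case $alarm_id in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "alarm id must be 7 digits" >&2; return 1 ;;
    esac

    # Reject an empty or non-numeric delay.
    case $delay_mins in
        ''|*[!0-9]*) echo "delay must be a whole number of minutes" >&2; return 1 ;;
    esac

    printf 'Re: !BB - %s! ACK=Y DELAY=%s\n' "$alarm_id" "$delay_mins"
}
```

For example, <tt>bb_ack_subject 1234567 30</tt> prints <tt>Re: !BB - 1234567! ACK=Y DELAY=30</tt>, which can be pasted into the subject line of the reply.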
== Changing who gets paged ==
<pre>mail# su bb
%cd
%cd bbsrc/bb1.9i-btf/etc
%vi bbwarnrules.cfg
(edit entries)
%cd ..
%./runbb.sh restart
Stopping Big Brother...
Starting Big Brother Daemon (bbd)...
Starting Network tests (bb-network)...
Starting Display process (bb-display)...
Big Brother 1.9i started
%exit</pre>
== Shutting down bb ==
<pre>mail# su bb
%cd
%cd bbsrc/bb1.9i-btf
%./runbb.sh stop
%exit</pre>
== Disabling monitoring for a server ==
If a machine is ever causing too many pages, you can stop monitoring it as follows:
<pre>mail# su bb
%cd
%cd bbsrc/bb1.9i-btf/etc
%vi bb-hosts
(comment out entry, e.g.:)
#10.1.4.61 virt11.johncompanies.com # ssh
%cd ..
%./runbb.sh restart
Stopping Big Brother...
Starting Big Brother Daemon (bbd)...
Starting Network tests (bb-network)...
Starting Display process (bb-display)...
Big Brother 1.9i started
%exit</pre>
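If you would rather not edit bb-hosts by hand during a page storm, the comment-out step could be scripted. The following is a sketch only: the function name <tt>bb_disable_host</tt> is hypothetical, and it assumes every entry has the same shape as the example above (IP address, then hostname, then an optional <tt># tests</tt> part).

```shell
#!/bin/sh
# Hypothetical helper (not part of Big Brother): comment out the entry
# for one host in a bb-hosts file.
# Assumes entries look like the example above: "<ip> <hostname> # <tests>".
#   $1 = path to the bb-hosts file
#   $2 = hostname to stop monitoring
bb_disable_host() {
    hosts_file=$1
    host=$2
    tmp_file="$hosts_file.tmp.$$"

    # Prefix "#" to every active line whose second field is exactly the
    # hostname. Lines that already start with "#" are passed through as-is.
    awk -v h="$host" '
        $0 !~ /^[ \t]*#/ && $2 == h { print "#" $0; next }
        { print }
    ' "$hosts_file" > "$tmp_file" || return 1

    # Write back through the original file so its owner and mode are kept.
    cat "$tmp_file" > "$hosts_file" && rm -f "$tmp_file"
}
```

Run as the bb user from the etc directory, this would be <tt>bb_disable_host bb-hosts virt11.johncompanies.com</tt>. The change only takes effect after <tt>./runbb.sh restart</tt>, as in the manual steps above.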
= monitor.johncompanies.com =
Revision as of 16:21, 28 February 2013