BigBrother


Overview

BB (Big Brother) is our monitoring service. It runs on mail and does two things:

  1. It watches machines (ping and ssh)
  2. It listens to bb clients running on other machines for reports of problems on those machines (CPU, disk, procs)

When things go wrong, there are two status levels: warn and panic. Panic events always page. Warn events only show up on the bb status page. The status page is linked under Mgm. -> BigBrother (https://secure.johncompanies.com/mgmt/bb/)

Condition purple means bb is no longer receiving updates from a particular machine. You will still see this even after you've removed the host from bb-hosts. To remove it fully, run the bbrm <hostname> command, which removes all previous logs for that host.

NOTE: this may not be working. If you get a page from bb about an alarm, you can silence that alarm by replying with the following in the subject line: Re: !BB - <7 digit unique alarm id sent in page>! ACK=Y DELAY=<# mins you want it silenced>
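
As an illustration of the subject-line format described above, here is a minimal sketch that assembles it from an alarm id and a delay. The function name bb_ack_subject is hypothetical and not part of Big Brother; it only builds the string, it does not send mail, and (per the note above) the email acknowledgement itself may not be working.

```shell
#!/bin/sh
# bb_ack_subject: hypothetical helper (not part of Big Brother).
# Prints the subject line used to acknowledge a page by email.
# Usage: bb_ack_subject <7-digit alarm id> <minutes to silence>
bb_ack_subject() {
    alarm_id=$1
    delay_min=$2

    # The alarm id sent in the page is exactly 7 digits.
    case $alarm_id in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "alarm id must be 7 digits" >&2; return 1 ;;
    esac

    # The delay must be a non-empty string of digits (minutes).
    case $delay_min in
        ''|*[!0-9]*) echo "delay must be a whole number of minutes" >&2; return 1 ;;
    esac

    printf 'Re: !BB - %s! ACK=Y DELAY=%s\n' "$alarm_id" "$delay_min"
}
```

For example, bb_ack_subject 1234567 30 prints the subject for silencing alarm 1234567 for 30 minutes (the alarm id here is made up).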

Or, you can go to the bb status page and click on the lightning bolt at the top: https://secure.johncompanies.com/mgmt/bb/help/bb-ack.html


Changing who gets paged

mail# su bb
%cd
%cd bbsrc/bb1.9i-btf/etc
%vi bbwarnrules.cfg
(edit entries)
%cd ..
%./runbb.sh restart
Stopping Big Brother...
        Starting Big Brother Daemon (bbd)...
        Starting Network tests (bb-network)...
        Starting Display process (bb-display)...
Big Brother 1.9i started
%exit

Shutting down bb

mail# su bb
%cd
%cd bbsrc/bb1.9i-btf
%./runbb.sh stop
%exit
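
The restart and stop procedures differ only in the argument passed to runbb.sh, so they can be sketched as one non-interactive wrapper run as root on mail. This is an assumption-laden sketch, not an existing tool: bb_ctl and DRY_RUN are hypothetical names, and it assumes bbsrc/bb1.9i-btf sits in the bb user's home directory, as the cd steps above imply.

```shell
#!/bin/sh
# bb_ctl: hypothetical wrapper (not part of Big Brother).
# Runs runbb.sh as the bb user without an interactive shell.
# Assumes ~bb/bbsrc/bb1.9i-btf is the install directory.
# Usage: bb_ctl restart|stop
bb_ctl() {
    action=$1

    # Only the two actions documented on this page are accepted.
    case $action in
        restart|stop) ;;
        *) echo "usage: bb_ctl restart|stop" >&2; return 2 ;;
    esac

    # ~bb is left unexpanded here; bb's own shell expands it.
    cmd="cd ~bb/bbsrc/bb1.9i-btf && ./runbb.sh $action"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Show the command instead of touching the service.
        printf '%s\n' "su bb -c '$cmd'"
    else
        su bb -c "$cmd"
    fi
}
```

Running DRY_RUN=1 bb_ctl stop prints the command that would be executed, which is a safe way to check it before using it for real.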

Disabling monitoring for a server

If a machine is ever causing too many pages, you can stop its monitoring as follows:

mail# su bb
%cd
%cd bbsrc/bb1.9i-btf/etc
%vi bb-hosts
(comment out the entry, e.g.:)

#10.1.4.61 virt11.johncompanies.com # ssh

%cd ..
%./runbb.sh restart
Stopping Big Brother...
        Starting Big Brother Daemon (bbd)...
        Starting Network tests (bb-network)...
        Starting Display process (bb-display)...
Big Brother 1.9i started
%exit
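
The bb-hosts edit above can also be done without opening vi. The following is a minimal sketch, assuming entries have the form "<ip> <hostname> # <tests>" as in the example line; bb_disable_host is a hypothetical helper name, not a Big Brother command. It keeps a .bak copy of the file and leaves lines that are already commented out untouched.

```shell
#!/bin/sh
# bb_disable_host: hypothetical helper (not part of Big Brother).
# Comments out the bb-hosts entry for one hostname.
# Usage: bb_disable_host <path to bb-hosts> <hostname>
bb_disable_host() {
    hosts_file=$1
    host=$2

    # Keep a backup next to the original before rewriting it.
    cp "$hosts_file" "$hosts_file.bak" || return 1

    # An active entry has the IP in field 1 and the hostname in field 2.
    # Prefix such a line with "#"; pass every other line through as-is.
    awk -v host="$host" '
        $1 !~ /^#/ && $2 == host { print "#" $0; next }
        { print }
    ' "$hosts_file.bak" > "$hosts_file"
}
```

For example, run as bb from the etc directory: bb_disable_host bb-hosts virt11.johncompanies.com. BB still needs the ./runbb.sh restart step afterwards to pick up the change.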


monitor.johncompanies.com

This is the monitoring service we offer to colo customers who have asked for monitoring. It runs under VEID 5 on quar1. It has been altered so that when customers are paged (emailed), the URL they are given takes them to a status page only they can see. The general/overview status page is password protected; the password has been lost and needs to be reset.