Bandwidth Management
TODO
Finding who's causing a bandwidth spike
We find out about bandwidth usage spikes in one of several ways:
- the NOC calls to tell us they've noticed a large usage spike
- we get a system-generated email telling us a customer has exceeded their usage limit
- customers start complaining about slow speeds
- we notice the spike on the MRTG page
Determining the cause of the spike is fairly easy with a bit of looking.
Castle:
Open the MRTG graph for p1a (the top-level switch for most of the machines at Castle): mgmt -> monitoring -> p1a -> bytes/sec
i2b:
Open the MRTG graph for p20 (the top-level switch for most of the machines at i2b): mgmt -> monitoring -> p20
From there, narrow down which switch the spike is coming from, then load the MRTG graph for that switch and narrow it down further by port/device.
Word of caution: even though the MRTG graphs label which device is connected to each port, you should take follow-up steps to confirm which machine is actually in that port. The exceptions are the 3750, p1a, p1b and p20, where the labeling should be accurate; the switches at i2b are also mostly labeled correctly. See Finding which IPs are on a port
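The narrowing step boils down to "which port's byte counter grew fastest between two polls". MRTG's bytes/sec graphs are built from SNMP interface octet counters sampled at a fixed interval (5 minutes by default). Below is a minimal sketch of that arithmetic, assuming you already have two samples of 64-bit octet counters (e.g. IF-MIB ifHCInOctets) per port; how you collect them is not shown, and the port names and the Sample/rates_bytes_per_sec/top_talkers names are hypothetical illustrations, not part of our tooling.

```python
from dataclasses import dataclass

# 64-bit SNMP octet counters (e.g. ifHCInOctets) wrap back to 0 at 2**64.
COUNTER_64_MAX = 2**64


@dataclass
class Sample:
    timestamp: float          # seconds since some fixed reference
    octets: dict[str, int]    # hypothetical port label -> cumulative octet counter


def rates_bytes_per_sec(prev: Sample, curr: Sample) -> dict[str, float]:
    """Per-port bytes/sec between two counter samples (tolerates one counter wrap)."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    rates = {}
    for port, now in curr.octets.items():
        before = prev.octets.get(port)
        if before is None:
            continue  # port had no baseline in the earlier sample
        # Modulo handles a single wrap of the 64-bit counter.
        delta = (now - before) % COUNTER_64_MAX
        rates[port] = delta / elapsed
    return rates


def top_talkers(rates: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n ports with the highest bytes/sec, busiest first."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Two hypothetical polls taken 300 s (one MRTG interval) apart.
    prev = Sample(0.0, {"Gi1/0/1": 1_000, "Gi1/0/2": 5_000, "Gi1/0/3": 2_000})
    curr = Sample(300.0, {"Gi1/0/1": 31_000, "Gi1/0/2": 300_005_000, "Gi1/0/3": 2_600})
    for port, bps in top_talkers(rates_bytes_per_sec(prev, curr), n=2):
        print(f"{port}: {bps:,.0f} bytes/sec")
```

Run against the top-level switch first, then against the switch behind its busiest port, and so on down the tree. The port label the sketch reports is only as trustworthy as the switch labeling, so the caution above about confirming which machine is actually on the port still applies.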
Once you've determined and confirmed which server is connected to a port, there are a few ways to curb the traffic.
- you can turn off the port entirely (last resort). See Shutting down a port