This post is going to be another one of the two parters since there was a bit of work to pull all this together. The first posting (this) covers the results and description of the technique, while the second will review the source code.
One of the problems with measuring bandwidth is that results are highly variable (i.e. noisy). This can be due to a number of reasons, but basically, packets travel to their target over several hops, each hop (or router, switch etc.) can introduce a variable amount of delay due to congestion, loading, system performance etc. Therefore, any single measurement of the bandwidth is unlikely to provide an accurate representation of the circuit bandwidth. In addition, bandwidth characteristics will change over time. Therefore, it's best to support a model that uses historical data to estimate the bandwidth without completely discounting large changes in the current measurement.
For this case I implemented a simple (i.e. scalar) Kalman filter. What this filter does is estimate the measurement value based on the system and measurement noise, along with the previous measurement history. Compare its estimate to the actual measured value and then adjust the new estimate based upon how good its "guess" was.
If the estimate is close to the actual measurement, the estimate will carry more weight in the future, and conversely if the measurement and estimate are way off, the actual measurement will be carry more importance in future estimation.
What is a Kalman filter? This technique was named after its inventor Rudolf E. Kalman. The original paper is pretty dense. A simpler description which reduces some of the computations down to simple scalar values (which is what I used in this implementation). Historically, Kalman filter was originally used in trajectory estimations for the Apollo program, and since then has been applied to a wide set of problems where there is noise in the enviorement and measurments (such as financial market estimations). And has been used recently in estimating motion in the XBox Kinect.
With this implementation, a simple measurement of the bandwidth is performed first. This is done by sending two packets to a destination, a large packet and a small packet. Assuming a linear relationship between delay and packet size (I told you this is a rough estimate), the bandwidth can be estimated.
The estimate is:
Bandwidth estimate = diff in packet size (bytes) / diff in resp time (secs)
The bandwidth estimator performs this measurement and simple calculation then passes the measurement down into the processing stage. The estimation of the bandwidth using the Kalman filter (along with an Average filter--see options below) is then performed and printed to STDOUT.
Other knobs are available as well:
./bw_estimate -h -t target (ipv4) -i interval between measurements (seconds) -m processing module [None|Average|Kalman|All] -s small packet size (default 40 bytes) -l large packet size (default 1300 bytes) -h help
The module is easy to use. You just need to provide a destination target to test the bandwidth. All other options will use defaults if not provided.
An example of the output:
root@debian:/home/slioch/bandwidth# ./bw_estimate -t 220.127.116.11 -m Kalman kalman estimate: 140000.000000 kalman estimate: 152182.703125 kalman estimate: 142622.281250 kalman estimate: 107843.039062 kalman estimate: 134078.031250 kalman estimate: 128621.765625
In the example above, the Kalman filter is being used, with the target being 18.104.22.168 (google's dns server address). And the estimated bandwidth for the first few tests is between 107kbps and 152kbps.
An average function can be used instead of the Kalman, but the average will de-emphasize the last measurement, or provide too much weight on historically irrelevant values (whichever way you look at it).
Note that another option is to run this module with all estimating/averaging functions enabled:
root@debian:/home/slioch/bandwidth# ./test -t 22.214.171.124 -m All (all) none: 0.000000, average: 0.000000, kalman: 0.000000 (all) none: 114545.000000, average: 114545.000000, kalman: 114545.000000 (all) none: 252000.000000, average: 183272.500000, kalman: 114545.000000 (all) none: 1260000.000000, average: 542181.687500, kalman: 784593.187500 (all) none: 157500.000000, average: 446011.250000, kalman: 387480.843750 ^C
What we are looking at is... The first value is the instantaneous recorded bandwidth, the second is a running average (all time average), while the third is the Kalman value estimate. After a few measurements the Kalman value will always be more responsive to the actual measured value as compared to the average.
Now, I've run this a bit longer in order to generate a couple of plots so we can get a nice visual of the results:
The plots above are for a 45 second measurement showing the actual noisy measurement, the running average, and the kalman estimate.
The Average appears to be fairly non-responsive to system changes. While the Kalman trace follows the actual measurement more closely (but takes into account previous history and differences from its estimate versus actual). You'll notice that around time equal 33 seconds the actual measurement doesn't even show on the plot--the points are off the scale of the graph. The Kalman filter shows these values taking into account its estimated value, while the running average barely registers these values. If the environment and measurement noise are modeled correctly the Kalman filter should provide values with greater accuracy than the average or actual values.
Next up I'll walk through the source code... next week that is...