@adlrocha - Traffic Management: Focusing on the elephant, ignoring the mice
Improving my math intuition series - Part II
I took more time than expected to finish this next paper from the algorithms course I am taking. I wasn't expecting it to take this long, and I had less time to read than initially planned, so I had to skip last week's publication to focus on reading. But I am back with my notes on the paper. This paper was an oldie but goodie from 2003: New Directions in Traffic Measurement and Accounting: Focusing on the Elephants, Ignoring the Mice.
📊 Class 2: New Directions in Traffic Measurement
The problem
Accurate network traffic measurement has always been something networking researchers and professionals have been interested in. We need these measurements for accounting purposes, bandwidth provisioning, or detecting DoS attacks. We may naively think that measuring traffic is pretty straightforward: "just count the packets from a specific flow that are passing through your network". Unfortunately, this may be practically intractable. Keeping a counter for each flow is either too expensive (in SRAM) or too slow (in DRAM), and even more so in 2003, when hardware wasn't as capable as it is now and we didn't have fast RAM of hundreds of GBs available.
The state of the art at the time to circumvent this problem was to simply sample a fraction of the packets instead of counting every single one. This made traffic measurements tractable with the hardware available, albeit quite inaccurate. And accuracy is key for certain use cases, especially if there's money involved (as in accounting), or when missing a large flow can be catastrophic.
For this paper, the authors build upon experimental studies showing that a small number of flows is generally responsible for a large share of a link's capacity. They realized the following: "what if, instead of measuring every single flow (investing resources in small flows and harming the accuracy of large-flow detection and measurement), we only measure large flows accurately and disregard the small ones? In the end, large flows are the ones responsible for the bulk of the link's traffic." So they propose two new algorithms (plus a number of optional improvements to them) to improve on the resource requirements and performance of the state-of-the-art random sampling approach.
The algorithms and the math behind them
The basic ideas behind the proposed algorithms are, in my opinion, quite simple but extremely elegant. However, I think the key strength of the paper is not the design of the algorithms alone, but the theoretical analysis and comparison with the state-of-the-art random sampling algorithm; and their experimental evaluation to check the accuracy of their theoretical work.
Algorithm 1: Sample and Hold
The simplest way to identify large flows is through random sampling of packets. If we see a lot of packets from the same flow when we sample, it potentially means that this flow is consuming a lot of traffic. But what if we are unlucky and end up always sampling packets from the same small flow?
In the sample and hold algorithm, we also start by randomly sampling packets. If a sampled packet belongs to a flow F1, we create an in-memory entry for that flow and start counting every single packet that belongs to F1, instead of continuing to sample. If F1 is a large flow, we will be counting every single packet, and thus accurately measuring the flow. If, on the other hand, F1 is a small flow, we will eventually evict it from our flow memory when we run out of space (as we need to create space for newly sampled flows), disregarding any previous measurements of F1.
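To make the idea concrete, here is a minimal Python sketch of sample and hold. The class and method names, the per-byte sampling trick, and especially the evict-the-smallest-counter policy are simplifications of my own; the paper handles flow memory management (and carrying entries across measurement intervals) much more carefully.

```python
import random

class SampleAndHold:
    """Minimal sketch of sample and hold; names and eviction policy are my own."""

    def __init__(self, byte_sampling_prob, max_entries):
        self.p = byte_sampling_prob     # probability of sampling any given byte
        self.max_entries = max_entries  # size of the flow memory
        self.flows = {}                 # flow_id -> bytes counted so far

    def process_packet(self, flow_id, size):
        if flow_id in self.flows:
            # Flow already held: from now on every packet is counted exactly.
            self.flows[flow_id] += size
            return
        # A packet of `size` bytes is sampled with probability 1 - (1 - p)^size.
        if random.random() < 1 - (1 - self.p) ** size:
            if len(self.flows) >= self.max_entries:
                # Out of space: evict the entry with the smallest counter
                # (one possible policy; the paper discusses better ones).
                victim = min(self.flows, key=self.flows.get)
                del self.flows[victim]
            self.flows[flow_id] = size

    def large_flows(self, threshold):
        """Flows whose counted bytes reached the 'large flow' threshold."""
        return {f: b for f, b in self.flows.items() if b >= threshold}


# Hypothetical usage: ~10,000 expected samples over a 125 MB measurement interval.
sh = SampleAndHold(byte_sampling_prob=10_000 / 125_000_000, max_entries=10_000)
sh.process_packet("10.0.0.1->10.0.0.2:443", 1500)
```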
The theoretical analysis showing how sample and hold is more accurate while requiring less memory than random sampling can be tackled with many of the things we learned in the Bloom filters paper from last time. Briefly, if p is the probability with which we sample each byte, the probability of sampling a packet of size s is p_s = 1 − (1−p)^s, which, using the approximation we learned in the previous paper, is roughly 1 − e^(−p·s) (and, for small p·s, simply p·s).
If we wish to sample each byte with probability p such that the average number of samples is 10,000, and C bytes can be transmitted in the measurement interval, then p = 10,000/C. For the error analysis, consider a flow F that takes 1% of the traffic (i.e. the threshold T above which a flow is considered large is 1% of the total capacity of the channel); F therefore sends more than C/100 bytes. Since we are randomly sampling each byte with probability 10,000/C, the probability that F will not be in the flow memory at the end of the measurement interval (a false negative) is (1 − 10,000/C)^(C/100), which (again, by approximation) is very close to e^(−100), i.e. essentially zero.
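Those numbers are easy to sanity-check with a couple of lines of Python (the value of C here is just an arbitrary one I picked for illustration):

```python
import math

C = 125_000_000        # illustrative: bytes transmitted in one measurement interval
p = 10_000 / C         # byte sampling probability -> 10,000 expected samples
flow_bytes = C // 100  # a flow taking 1% of the link

# Probability that none of the flow's bytes is ever sampled (false negative).
false_negative = (1 - p) ** flow_bytes
print(false_negative)   # ~3.7e-44
print(math.exp(-100))   # ~3.7e-44: the approximation is extremely close
```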
So once again, simple statistics can get us pretty far. The paper presents a much deeper theoretical analysis covering the error bounds, the memory required when oversampling to keep false negatives low, and other interesting improvements. For the error, a geometric probability distribution is used to model the number of bytes that go by before the first packet of a flow is sampled. With this, we can bound the error, because once the first packet is sampled, every subsequent packet of the flow is counted exactly (so the only bytes we can miss are the ones sent before that first sample).
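As a rough back-of-the-envelope version of that argument (my own numbers, not the paper's formal bound): since each byte is sampled with probability p, the expected number of bytes that slip by before the first sample is about 1/p (the mean of a geometric distribution). With p = O/T that is roughly T/O bytes, so with an oversampling factor of O = 100 the expected undercount for a flow we do catch is on the order of 1% of the threshold T.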
For the memory usage, the size of the flow memory is determined by the number of flows identified. The actual number of sampled packets is an upper bound on the number of entries needed in the flow memory, since new entries are created only for sampled packets. Assuming the link is constantly busy, by linearity of expectation the expected number of sampled bytes is p · C = O · C/T, where O is the oversampling factor, defined by p = O/T (i.e. the additional sampling done to ensure a low false negative rate).
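Plugging in the earlier numbers as a quick check: with T = C/100 and p = 10,000/C, the oversampling factor is O = p·T = 100, so the expected number of flow memory entries is at most p·C = O·C/T = 10,000, one per expected sample.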
Algorithm 2: Multistage Filters
The multistage filters algorithm is an attempt to further reduce the probability of false positives (small flows identified as large flows) and false negatives (large flows that escape our measurements).
The idea behind multistage filters will remind you a lot of the counting Bloom filters from the last paper. The basic building block is a set of hash stages that operate in parallel. Each stage has a table of counters indexed by a hash computed over a packet's flow id. All counters are initialized to zero at the beginning of the measurement interval. When a packet for a flow F comes in, F's id is hashed and the counter at the resulting index is increased by the size of the packet. We do this with a different hash function in every stage (so each stage increases a different counter). Since all packets from flow F increase the same set of counters, once the counters of all stages for a flow reach our "large flow" threshold T, the flow is considered large and we add an entry for it to flow memory. From then on, we measure every single packet of the flow directly, without going through the multistage filter, so the measurement is exact.
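Here is a minimal Python sketch of a parallel multistage filter. The class names and the hash construction are my own choices, and it leaves out the refinements mentioned below (serial stages, shielding) as well as the paper's other optimizations:

```python
import hashlib

class MultistageFilter:
    """Minimal sketch of a parallel multistage filter; my own simplification."""

    def __init__(self, num_stages, counters_per_stage, threshold):
        self.num_stages = num_stages
        self.size = counters_per_stage
        self.threshold = threshold   # the "large flow" threshold T, in bytes
        # One table of counters per stage, reset at each measurement interval.
        self.counters = [[0] * counters_per_stage for _ in range(num_stages)]
        self.flow_memory = {}        # flow_id -> bytes counted exactly

    def _index(self, stage, flow_id):
        # A different hash function per stage, emulated here by keying SHA-256
        # with the stage number (real hardware would use much cheaper hashes).
        digest = hashlib.sha256(f"{stage}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def process_packet(self, flow_id, size):
        if flow_id in self.flow_memory:
            # Already identified as large: bypass the filter, count exactly.
            self.flow_memory[flow_id] += size
            return
        passed_all_stages = True
        for stage in range(self.num_stages):
            idx = self._index(stage, flow_id)
            self.counters[stage][idx] += size
            if self.counters[stage][idx] < self.threshold:
                passed_all_stages = False
        # Only flows whose counters reach T in *every* stage enter flow memory.
        if passed_all_stages:
            self.flow_memory[flow_id] = size

    def new_interval(self):
        """Reset the stage counters at the start of a measurement interval."""
        self.counters = [[0] * self.size for _ in range(self.num_stages)]
```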
The number of counters in our multistage filter will be smaller than the number of flows we may potentially encounter, so many flows will map to the same counters (just as many set members map to the same positions in a Bloom filter). This can cause false positives in two ways: first, small flows can map to counters that also hold large flows and get added to flow memory; second, several small flows can hash to the same counter and add up to a value larger than the threshold. We can mitigate this in several ways: one option is to increase the number of stages, since the multiple stages attenuate the probability of false positives exponentially in the number of stages. The paper describes other ways to minimize false positives, like using serial instead of parallel stages, or shielding, but I will leave those for the interested reader.
The theoretical analysis for the multistage filter is a bit more complex (I had to read it several times, and have a look at the proofs in the appendix, to grasp every detail). Fortunately, the preliminary analysis is pretty straightforward:
“Assume a 100 Mbytes/s link with 100,000 flows and we want to identify the flows above 1% of the link during a one second measurement interval. Assume each stage has 1,000 buckets and a threshold of 1 Mbyte. [...] For this flow to pass one stage, the other flows need to add up to 1 Mbyte − 100 Kbytes = 900 Kbytes. There are at most 99,900/900 = 111 such buckets out of the 1,000 at each stage. Therefore, the probability of passing one stage is at most 11.1%. With 4 independent stages, the probability that a certain flow no larger than 100 Kbytes passes all 4 stages is the product of the individual stage probabilities, which is at most 1.52 * 10^−4. Based on this analysis, we can dimension the flow memory so that it is large enough to accommodate all flows that pass the filter. The expected number of flows below 100 Kbytes passing the filter is at most 100,000 * 1.52 * 10^−4 < 16. There can be at most 999 flows above 100 Kbytes, so the number of entries we expect to accommodate all flows is at most 1,015.”
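That back-of-the-envelope arithmetic is easy to reproduce (this is only the conservative preliminary bound from the quote, not the stronger result of Section 4):

```python
link_bytes = 100_000_000   # 100 Mbytes/s link, 1 second measurement interval
num_flows = 100_000
threshold = 1_000_000      # per-stage threshold of 1 Mbyte
buckets_per_stage = 1_000
stages = 4
small_flow = 100_000       # a flow of 100 Kbytes (well below 1% of the link)

# Other flows in the same bucket must add up to at least this much
# for the small flow to pass a stage.
needed = threshold - small_flow                      # 900 Kbytes
bad_buckets = (link_bytes - small_flow) // needed    # 99,900,000 / 900,000 = 111
p_one_stage = bad_buckets / buckets_per_stage        # 0.111
p_all_stages = p_one_stage ** stages                 # ~1.52e-4

expected_false_positives = num_flows * p_all_stages  # ~15.2, i.e. < 16
true_large_flows = link_bytes // small_flow - 1      # strictly above 100 Kbytes: at most 999
flow_memory_entries = true_large_flows + 16          # 1,015 entries are enough
print(p_all_stages, expected_false_positives, flow_memory_entries)
```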
Section 4 introduces a more rigorous analysis that proves a stronger bound for any distribution of flow sizes. The preliminary analysis above makes no assumption about the distribution of flow sizes, but if we consider, for instance, that flow sizes follow a Zipf distribution, the resulting bound is lower than the conservative one above. If you are not familiar with the Zipf distribution, it shows up all over the Internet and in complexity theory, so it is a good one to add to your tool belt. Finally, to understand every lemma and theorem in the paper I highly recommend checking out the first appendix; it is really instructive.
Measurement results
Something I found interesting about the paper is how, after the thorough theoretical analysis of the algorithms, the experimental measurements show results that are lower than the ones predicted by the theory. At first I thought this made sense because the theoretical analysis derives upper and lower bounds, but some of the comparisons actually use the average. Honestly, this is something I missed in this first pass through the paper, and something I hope to dig deeper into on my next pass.
Where can these algorithms be used?
In section 1.2, the paper motivates how these algorithms, which only measure large flows accurately and disregard the “mice” flows, can be useful. I think this motivation is key to understanding the design decisions and the applicability of the paper:
Scalable Threshold Accounting: “The two poles of pricing for network traffic are usage based (e.g., a price per byte for each flow) or duration based (e.g., a fixed price based on duration) [...] We suggest, instead, a scheme where we measure all aggregates that are above z% of the link; such traffic is subject to usage based pricing, while the remaining traffic is subject to duration based pricing”. Cool, right? This is a great workaround for when you don’t have a practical way of accurately accounting for all the usage of your service. This idea alone is what made me consider including this section in the publication.
Real-time Traffic Monitoring: “Many ISPs monitor backbones for hotspots in order to identify large traffic aggregates that can be rerouted to reduce congestion.” Makes sense, right? Why worry about small flows when what we are interested in is detecting congestion?
Scalable Queue Management: “At a smaller time scale, scheduling mechanisms seeking to approximate max-min fairness need to detect and penalize flows sending above their fair rate”. If you took any theoretical networking course in college, you have probably learned to love and hate max-min fairness as much as I do.
Finally, the authors note how the measurement problems in networking (data volume, high speeds) are similar to those faced by other areas such as data mining, architecture, and even compilers. Thus the techniques in this paper may be useful in other areas too, especially if you have measurement constraints - see Section 9.
Tasks for next week 📰
Overall, this is a great paper to read if you find the time. We’ve encountered a few more mathematical concepts that can come in handy in the future; we went a bit deeper into how to theoretically (and thoroughly) analyze an algorithm’s performance; and we now know why disregarding the measurement of small flows may be fine. There is much more in the paper, like implementation issues, or a discussion of why random sampling may still be a better option than the proposed algorithms in certain cases. I will leave these out of the scope of this publication for the sake of “brevity”.
Next week, I have an interesting task ahead. The paper to read is An Improved Data Stream Summary: The Count-Min Sketch and its Applications, but this time my self-assigned task is not only to read the paper, but also to answer the following question:
“Compare and contrast the mice/elephants paper and the count-min sketch paper. How do they describe and define the underlying problem(s) they are considering? How do they formalize their solution(s)? How do they compare?”
So it's not only reading time but also thinking+discussion time. Do you want to join the discussion? Do not hesitate to ping me.