Editor's note: These minutes have not been edited.

Minutes of the IP Performance Metrics Session
Reported by Paul Love and Guy Almes

1: Overview and Agenda Bashing

Guy Almes led off with remarks on RTFM and a joint exploratory meeting to be held Thursday morning. (He also gave examples of the death throes of a color printer!) The resulting agenda was:

   9:10  Review of Internet Drafts
         - Framework for IP {Performance|Provider} Metrics (revised)
         - Connectivity
         - One-way Delay Metric for IPPM
         - Empirical Bulk Transfer Capacity
  10:40  Alternative Flow Capacity Tools
  10:45  Discussion of Bprobe/Cprobe
  10:55  Simple End-to-end Metrics & Methods for Monitoring &
         Measuring IP Provider Performance

Mike O'Dell (one of our two co-Area-Directors) suggested that the second P within IPPM be Performance (rather than Provider), since (a) it accurately describes what we are working on and (b) it avoids misinterpreting our work as focusing on 'users' vs. 'providers' business dynamics. The chair noted that such a change would not modify the current thrust of the IPPM effort; it might improve the perception of our work in some quarters, however. (No action was taken during the meeting, but later in the week, after consulting with several WG members, the chair agreed to the name change and communicated that to the Area Directors.)

2: Review of the Revised Framework Document

Vern Paxson led a presentation/discussion of revisions to the Framework Document. The original Framework Document was presented at the Montreal meeting and has undergone several revisions as a result of its use in developing specific metrics.

2.a: Clock issues

We will follow NTP terminology where possible. For many IPPM purposes, close synchronization among clocks on cooperating computers is more important than the absolute accuracy of any given clock. Thus, while NTP generally strives for accuracy, we are after synchronization. As an example of how difficult this can be, a 0.01% skew yields a 60 ms error in only 10 minutes (0.0001 x 600 s = 60 ms). Also, in our timings, we will strive to measure the 'wire time', i.e., when a packet enters/leaves the network, as opposed to 'host time', i.e., when the packet is first/last seen by application software.

2.b: Classes of metrics

We will strive to properly treat the relationship between singletons (i.e., single, atomic measurements), samples of those singletons, and statistics of those samples.

2.c: Issues associated with Samples

Predictability and synchronization are both problems with naive periodic sampling. Random sampling is much better, and Poisson sampling is the best random scheme. If the desired number N of samples within a time interval dT is known in advance, however, Poisson sampling is equivalent to uniform sampling of N samples over dT. If either of these random sampling schemes is employed, it is important to test for the self-consistency of the sampling. (A sketch of such a sampling schedule appears at the end of this section.)

2.d: Statistical distributions

Adopt definitions for statistical distributions, but avoid trying to define "stochastic" metrics. Such stochastic metrics might be easy to define in terms of probability, but they often carry "hidden" assumptions. It is better to use "deterministic" definitions; thus define the "proportion of k/m packet loss" rather than the "probability p of packet loss".

2.e: Generic "Type P" packets

Many metrics yield different results depending on the type of packet measured. Make this dependency explicit rather than implicit by using the term "Type P" when defining generic metrics, so it is clear that, for a specific measurement, one needs to choose a specific P.

2.f: Internet address vs. hostname

A multi-homed box can yield very different results depending on which of its various interfaces is tested. In discussion, it was agreed that we need to define metrics in terms of particular interfaces.

2.g: "Well formed" packets

Unless otherwise stated, metrics assume well formed packets. In discussion, it was mentioned that a better name might be sought, since "well formed" implies merely legal.
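As a concrete editor's sketch of the sampling discussion in 2.c (not text from the framework draft; the function names are invented for illustration), a Poisson schedule is built from independent exponential inter-arrival gaps of rate lambda, and the fixed-N equivalent noted above is simply N sorted uniform draws over the interval:

    import random

    def poisson_schedule(rate_lambda, t0, t1):
        # Poisson sampling: inter-arrival gaps are independent
        # exponential(rate_lambda) draws, so sample times are
        # unpredictable and carry no periodic bias.
        times = []
        t = t0 + random.expovariate(rate_lambda)
        while t < t1:
            times.append(t)
            t += random.expovariate(rate_lambda)
        return times

    def conditional_uniform_schedule(n, t0, t1):
        # Equivalent form noted in 2.c: given that exactly n Poisson
        # events fall in [t0, t1), their times are distributed as n
        # independent uniform draws over the interval, sorted.
        return sorted(random.uniform(t0, t1) for _ in range(n))

The subsetting and merging properties of Poisson samples noted later (section 4) follow directly from the memoryless construction used here.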
3: Review of the Connectivity Metric

Jamshid Mahdavi from the Pittsburgh Supercomputing Center presented an analytical metric for Connectivity, drafted together with Vern Paxson. The basic idea is to define a function F(Src, Dst, Time) => true/false. The draft defines both one-way and two-way connectivity, and defines both instantaneous connectivity and connectivity during a time interval. The most practical metric is that for causal two-way connectivity. Jamshid noted that it is very difficult to define (and measure!) any truly instantaneous metric!

The most developed metric, for Type-P1-P2-Causal-Connectivity, is the only one with a methodology in the draft. Scott Bradner pointed out that "temporal" is a more accurate term than "causal", since it needn't be the case that a packet in one direction actually causes a packet to be sent in the other direction, just that it could have done so.
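As an editor's illustration of the F(Src, Dst, Time) => true/false idea (this is not the draft's methodology; the use of a UDP echo responder on port 7 and the probe format are assumptions made for the example), a crude probe of temporal two-way connectivity over an interval might look like:

    import socket

    def temporally_connected(dst_host, dst_port=7, probes=5, timeout=2.0):
        # Crude check of two-way (temporal) connectivity: send Type-P1
        # (UDP) probes; if any Type-P2 reply comes back, Src and Dst
        # were temporally connected during the interval. A False result
        # shows only that these probes failed, not that the pair was
        # disconnected throughout the interval.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            for seq in range(probes):
                sock.sendto(("ippm-probe-%d" % seq).encode(),
                            (dst_host, dst_port))
                try:
                    data, _ = sock.recvfrom(2048)
                    if data:
                        return True
                except socket.timeout:
                    continue
            return False
        finally:
            sock.close()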
4: Review of the One-way Delay Metric

Guy Almes from Advanced Network & Services presented an analytical metric for One-way Delay, drafted together with Sunil Kalidindi. The singleton metric is an analytical metric to measure the one-way delay of a Type P packet from a source to a destination over a given path. It was stressed that measuring one-way delay will require tightly synchronized clocks at the measuring computers at the end-points. Due both to asymmetric paths in the Internet and to asymmetric congestion patterns, one-way delay is well-motivated in its own right.

More controversial was the notion in the draft of a 'first hop' parameter. By itself, the first hop is of limited importance, and can only be understood as an attempt to constrain the path from source to destination. In discussion, it was agreed that this path was the real parameter (though it can often be neither fully constrained nor even known). Matt Mathis noted that the desire to specify the full path would be a property of many metrics we'd like to define. Van Jacobson noted that, in his measurements, 20% of the intermediate hops follow different paths, and the differences are often major.

Packets that do not arrive at the destination at all are given a delay value of 'undefined' or (loosely) infinite. Thus, a notion of packet loss falls directly out of the one-way delay metric. Geert Jan de Groot mentioned that it would be important to guard against situations in which the first packet in a sequence encounters meaningless delays due to cache-setup/ARP issues. Jeff Dunn observed that, in the case of load-balancing routers, the issue Geert Jan raised could strike more than the first packet! Scott Huddle noted that capturing the level-3/IP path may not be enough if the level-2/ATM/FR path can change.

Given this singleton definition, the draft then defines a sample metric based on a Poisson arrival process of singletons, all of which share the same source, destination, Type P, and path. The Poisson process is characterized by a time interval and a rate lambda. A discussion of the merits of this Poisson process followed. Van Jacobson noted that the most important reason for Poisson sampling is that it avoids all the problems of periodic measurement. For example, one router vendor drops packets for a brief interval every 30 seconds. Periodic behaviors are so prevalent that it should be considered inexcusable to measure at a periodic rate. In addition, Guy noted that, with Poisson sampling:

<> you can take a given sample, narrow the time range, and the subset of the original sample within that time range is also Poisson,
<> you can take a given sample, take a random subset of it, and the result is also Poisson, and
<> you can take two given samples (with the same source, destination, time period, etc.), merge them, and the result is also Poisson.

These properties all combine to make Poisson a good choice. Given this sample definition, the draft then defines several statistics, including minimum, median, and various percentiles. Among the chief open issues are:

<> gaining measurement experience generally,
<> testing the practicality of the synchronization needed specifically, and
<> considering whether a round-trip variant is also needed.
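To make the singleton/sample/statistic layering concrete, here is an editor's sketch (not the draft's text) of the delay bookkeeping. It assumes timestamps already come from tightly synchronized clocks and approximates wire time by host time, which section 2.a cautions against; sample times would be drawn from a Poisson schedule like the one sketched after section 2. All names are invented for the example:

    import math

    UNDEFINED = math.inf   # delay assigned to packets that never arrive

    def delay_singleton(send_time, recv_time):
        # One-way delay of a single Type-P packet; 'undefined'
        # (loosely, infinite) if the packet was lost.
        return UNDEFINED if recv_time is None else recv_time - send_time

    def loss_proportion(delays):
        # Deterministic "k out of m" loss proportion (cf. 2.d), which
        # falls directly out of the delay metric.
        return sum(1 for d in delays if d == UNDEFINED) / len(delays)

    def delay_percentile(delays, p):
        # Empirical p-th percentile of a sample of singletons. Lost
        # packets sort last, so heavy loss pushes the upper
        # percentiles to 'undefined'.
        ordered = sorted(delays)
        index = max(0, min(len(ordered) - 1,
                           math.ceil(p / 100 * len(ordered)) - 1))
        return ordered[index]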
5: Review of Bulk Transfer Capacity Metric

Matt Mathis from the Pittsburgh Supercomputing Center presented a draft empirical metric for Bulk Transfer Capacity based on his TReno tool. Before presenting the metric itself, Matt described the current state of research on TReno and TCP congestion control. The two are closely related: TReno was based originally on the Reno implementation of TCP congestion control, and recent improvements in TCP congestion control (including the recent SACK TCP option and the algorithms for using it) have resulted in part from the TReno work. Similarly, a new analytical model (being developed together with Teun Ott of Bellcore) that tries to capture the relevant TCP dynamics is "in the wings".

In recent months, TReno work has focused on accurately measuring end-to-end bulk transfer capacity rather than on diagnosing bottlenecks. Work continues on portability testing, documentation, and calibration. A document on the interpretation of TReno outputs is also needed. With respect to calibration and timing, Van Jacobson noted the relationship between the length of the TReno run needed and the accuracy of the clocks used.

One of TReno's advantages is that, by using very modern congestion control techniques coming out of the work on SACK, it allows one to measure the bulk transfer capacity of the network even though the native TCP available on current hosts cannot achieve the measured rates. This removes the (currently weak) TCP congestion control algorithms as an unintended parameter of the measurement. In discussion, Van pointed out how different SACK congestion control is from "normal" TCP; there is reason to hope for markedly improved flow performance as SACK TCP implementations become deployed.

Several documentation issues were discussed; Scott Bradner specifically raised the question of how a description of the reference congestion control algorithm would be documented, and suggested coordination with the Transport Area ADs.

Matt closed by noting several issues related to the Framework:

<> We really want 'cloud measures' rather than measures that inadvertently measure host performance.
<> Do we need more knowledge about roles (i.e., users vs. transport providers vs. content providers)?
<> We need an experimental design cookbook.

6: Review of several TCP-based Flow Capacity Tools

Padma Krishnaswamy from Bellcore presented a discussion of several TCP-based flow capacity techniques. Examples of current practice include:

<> ttcp (written in the 1980s)
<> Netperf
<> NetSpec

ttcp and Netperf, for example, use the TCP implementation of the host computer to manage flows. They can use large in-memory data transfers to reduce overhead, and they place no reliance on ICMP. One interesting question is whether one wants to test the net only or the combination of host and net. By using the host's TCP implementation, ttcp and Netperf include host effects, while TReno measures only the net. (A minimal sketch of such a host-TCP throughput test appears at the end of section 7.)

This led to discussion of these alternatives. A conjecture that arose during discussion is that if the host TCP implementation is of high quality and makes effective use of the recent SACK TCP options and congestion control algorithms, then such TCP-based flow capacity tools might yield very similar results to TReno. This would give us two very different means to test (what should be) the same metric. If the conjecture proves valid, then applications would include:

<> a means to test the quality of a host TCP implementation (by testing whether TReno and a TCP-based test perform similarly), and
<> a means to test Bulk Transfer Capacity without TReno.

Testing this conjecture would be valuable to the working group.

7: Review of work on Measuring Bottleneck Link Speed

Bob Carter from Boston University presented a discussion of his work on Bprobe and Cprobe. The Bprobe tool is designed to measure the link speed of the bottleneck link along a given path, using short bursts of packets. Bprobe's success depends on several assumptions:

<> that the network does not reorder packets,
<> that the end-to-end path is stable over 1-second intervals, and
<> that the bottleneck link speed is the same in both directions.

Results were presented based on measurements within campus LANs, within geographically compact regional networks, and across the country. In discussion, it was noted that asymmetric paths will threaten these assumptions.

The Cprobe tool is designed to measure the current utilization of the bottleneck link along a given path, using longer streams of packets. The work on Cprobe is not as well developed as that on Bprobe.
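Here is the host-TCP flow test sketch promised in section 6 (an editor's illustration, not ttcp or Netperf code; the discard-style receiver, port, and transfer size are assumptions for the example):

    import socket
    import time

    def host_tcp_capacity(dst_host, dst_port, total_bytes=8 * 1024 * 1024):
        # Push bytes through the host's own TCP stack to a
        # discard-style sink and report the achieved rate. Because
        # the host TCP manages the flow, the result includes host
        # effects (buffering, congestion control), unlike TReno's
        # network-only measurement.
        chunk = b"\x00" * 65536
        sent = 0
        with socket.create_connection((dst_host, dst_port)) as sock:
            start = time.monotonic()
            while sent < total_bytes:
                sock.sendall(chunk)
                sent += len(chunk)
            elapsed = time.monotonic() - start
        # sendall() returns once data is queued locally, so very short
        # transfers overestimate the rate; long runs amortize this.
        return 8.0 * sent / max(elapsed, 1e-9)   # bits per second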
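And, as an editor's sketch of the packet-pair arithmetic underlying the Bprobe approach just described (the real tool sends bursts at several packet sizes and filters the resulting estimates; this reduction to a single minimum gap is a simplifying assumption): back-to-back packets of size S leave the bottleneck link spaced S/B apart, so the bandwidth B can be recovered from arrival spacing.

    def bottleneck_speed_estimate(packet_size_bytes, arrival_times):
        # Back-to-back packets of equal size leave the bottleneck
        # spaced size/bandwidth apart, so bandwidth ~= size/gap.
        # Taking the minimum positive gap is a naive filter against
        # cross-traffic spreading the pairs; reordering (assumed away
        # by Bprobe) or gap compression would corrupt the estimate.
        gaps = [later - earlier
                for earlier, later in zip(arrival_times, arrival_times[1:])
                if later > earlier]
        if not gaps:
            return None
        return 8.0 * packet_size_bytes / min(gaps)   # bits per second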
8: Review of work at Intel on Simple End-to-End Metrics

Jeff Sedayao and Cindy Bickerstaff from Intel presented some work they had done on Simple End-to-End Metrics. The work emphasizes measurement of host-to-host performance, including round-trip (ping) delay and the time taken to pull web pages from a site. The metrics emphasize what can be done right now; no involvement of providers is required. Three statistics of the measurements are examined:

<> Median
<> Inter-quartile range (the difference between the 25th percentile and the 75th percentile)
<> Error percentage

For the Imeter (round-trip delay) implementation:

<> Delays of about 100 ms are typical.
<> Packet losses of as much as 15-22% have been observed.

Timeit measures the time to get a fixed set of URLs over the web. The goal is to detect significant deviations from the norm and take action to correct the deviation. The actions include the use of other tools such as traceroute and the opening of trouble tickets with the apparently deviating ISP. Documentation of problems within the measuring companies' ticket control systems enables review of particularly unreliable ISPs for subsequent RFP and contract negotiations.

In discussion, it was noted that different routers respond differently when they have no route to the web server specified in the URL. Also, there was discussion of how IPPM documents might cause providers to optimize their infrastructure to make the IPPM metrics look good.

Intel uses the data to:

<> select an ISP,
<> expect the ISP, in its service contract, to stay within 10% of the agreed values, with real monetary cost to the ISP otherwise, and
<> manage Internet performance through Intel gateways as a production capability.

ISPs are cooperating in tests and are interested in the data.

9: Planning for Memphis

The chair led a discussion of work to be done prior to our next meeting. The key issues are:

<> implementation of the Connectivity, One-way Delay, and Bulk Transfer Capacity drafts,
<> refinement of the drafts,
<> introduction of a packet loss metric, and
<> continued refinement of the framework document.

From the floor, it was noted that more discussion of how to visualize and use the data would be of great value. It was noted that it is sometimes hard to implement the metrics without access to computers at the remote site(s). Jeff Sedayao noted that some parts of the Internet are turning off ICMP Echo in order to avoid scanning tools. While this would be regrettable, he noted that nobody is likely to turn off (the ordinary HTTP functionality of) their Web servers!

It was noted that work should be done on developing relevant metrics for multicast performance.