Minutes of the IP Performance Metrics Session of the Benchmarking Methodology Working Group

Reported by Paul Love and Guy Almes

1. Overview and Agenda Bashing

The meeting was chaired by WG co-chair Guy Almes, and was very well attended. This was the second IPPM session held in Europe, the first having been in Stockholm in summer 1995.

Proposed agenda:
- Charter Issues
- Revised Framework Document
- Early Experience with One-way Delay & Packet Loss
- Revised TReno Document
- Interest Talk: Raj Bansal, Nokia Research Center
- Interest Talk: Martin Horneffer, Univ of Cologne

Despite the traditional invitation to agenda bashing, no changes were suggested.

2. Charter Issues: Guy Almes and Jamshid Mahdavi

[slides included in proceedings]

The chair reviewed the history of the IPPM effort, beginning with an IPPM BOF session at the Danvers (spring 1995) meeting, the inclusion of IPPM as a separate effort within the Benchmarking Methodology Working Group and the Operational Requirements Area, and progress to date. For several reasons, it is now suggested that IPPM reorganize as a separate working group within the Transport Area. One reason is the increasing awareness of the close relation between IP path performance metrics and transport-level dynamics.

In order to form a new working group, a charter is needed. The chair asked Jamshid Mahdavi to draft such a charter, since he has been active in IPPM since the Danvers BOF and, together with Matt Mathis, had given thought to such a charter then. Jamshid's draft, with edits along the way, has been reviewed by the Transport Area ADs and circulated on the IPPM list. This edited draft, shown paragraph by paragraph on the overhead projector, was then reviewed by the working group.

Guy noted that we are about 50% through our "to do" list, just as we become a standalone WG. Services peripheral to layer-3 IP services, such as NOC/NIS services, will fall outside IPPM's scope.
Scott Bradner noted that we are encouraged to publish the Framework document as an Informational RFC as soon as possible.

Discussion then ensued on the meaning of the word "standard" and its usage in our charter. This is a difficult issue, and it was not resolved during the meeting. One view is that our metrics should be documented as Informational RFCs, though with a process within the working group that follows the spirit of the IETF standards process. Another view is that the metrics should follow the IETF standards track as Proposed, then Draft, then (full) Standard RFCs.

Questions from the floor asked for clarification on the relationship between IPPM and OpStat, RTFM, and BMWG. Briefly: the OpStat WG concluded some time ago and was focused on statistics meaningful to providers (from within the cloud), while IPPM focuses on metrics meaningful to users (from outside the cloud). RTFM focuses on the nature of flows rather than on the performance 'seen' by users, but it is very much acknowledged that RTFM and IPPM have much to share with each other. The remaining BMWG charter should be revised to clarify that IPPM issues fall outside it.

Based on several comments, the draft charter will be further revised and is expected to result in the formal establishment of an IPPM working group shortly.

3. Revisions to the Framework Document: Vern Paxson

[slides included in proceedings]

After an initial version reviewed at the (summer 1996) Montreal meeting and major revisions at the (fall 1996) San Jose meeting, the Framework document is now close to a final editing pass. Vern Paxson presented the changes in the current draft. By far the most important is a new section on the criteria and process leading to 'official status' for a given metric. When a protocol is made a standard, it is generally required that there be two independent interoperable implementations.
Following this spirit, the Framework draft now calls for either two methodologies, each with at least one independent interoperable implementation, or (especially in cases where there is one preferable methodology) one methodology with at least two independent interoperable implementations. By 'independent', we stress the value of implementations that proceed from the documented metric rather than sharing code, or even sharing implementation orientations not based on the documented metric. By 'interoperable', we stress the value of multiple implementations whose measurements are consistent with each other; this is understood to be difficult, since it is hard to present two different implementations with identical network conditions.

In keeping with the draft, we considered the notion of rough consensus within the working group. Scott Bradner pointed to the difficulty of reaching such consensus for metrics proposed after the working group concludes. Some suggested that we should work to develop an environment in which tests can be run over and over, to see whether the results really are statistically the same. This would add rigor, but at the cost of large effort. And even then, what constitutes a meaningful difference, and why?

After further discussion made clear the difficult tradeoffs involved in these issues, the chair proposed that this section of the Framework be reworded as a discussion of the considerations and issues, with detailed documentation of the right process deferred to a future revision, when experience with a few specific metrics better informs our understanding of the general process.

The other revisions to the Framework draft were more minor, but did result in helpful suggestions to the authors.
Among these revisions:
- Minor revisions to the notion of well-formed/standard-formed packets
- Comments on the applicability of the notion of wire-time measurements

Among the remaining work to be done are:
- An appendix on computing goodness of fit using the Anderson-Darling test
- A simple test for correlation
- Adding a table of contents
- Possible new material on multi-path phenomena
- Perhaps defining clouds as directed graphs
- Perhaps defining broadcast networks as sets of links vs. meshes

Vern will make these edits so that the Framework can be issued as an Informational RFC prior to the Washington IETF meeting.

4. Early Experience with One-way Delay and Packet Loss: Guy Almes and Sue Hares

[slides by Almes included in proceedings]

Guy Almes reviewed the One-way Delay and Packet Loss metrics, with particular attention to the Singleton/Sample/Statistic structure and the importance of Poisson processes for timing the various singleton tests within a sample. Given this review, he presented some experiences from work done at Advanced Network & Services with the Common Solutions Group of universities.

One important point (though clumsily made on the slides) was that useful measurements of packet loss might call for a different value of lambda than corresponding measurements of one-way delay. For example, on a high-speed wide-area network, one might want an accuracy of plus-or-minus 1% over time periods of no more than one minute; this would call for values of lambda of 2/sec or greater. This high value of lambda might be much more than is needed for useful measurements of one-way delay on the same network.

The reported work uses GPS antennae at both source and destination to allow one-way delays to be measured to accuracies of a few tens of microseconds. The current implementation runs in user space, but modern PC-based systems are fast enough for this to yield useful data.
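The Poisson timing of singleton tests reviewed above can be sketched as follows: inter-packet gaps are drawn from an exponential distribution with rate lambda, and sharing the generator's seed lets both endpoints reconstruct the full schedule. This is a minimal illustrative sketch with hypothetical names, not the deployed Advanced/CSG implementation:

```python
import random

def poisson_send_schedule(rate_lambda, duration, seed=42):
    """Generate send times (seconds) for a Poisson process of rate
    `rate_lambda` packets/sec over `duration` seconds, by summing
    exponentially distributed inter-packet gaps.

    Hypothetical sketch: a shared seed lets sender and receiver
    agree on the identical schedule of test packets."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_lambda)  # mean gap = 1/lambda seconds
        if t >= duration:
            break
        times.append(t)
    return times

# With lambda = 2/sec over one minute, roughly 120 singleton tests result.
schedule = poisson_send_schedule(rate_lambda=2.0, duration=60.0)
```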
Pushing the implementation into the kernel is planned eventually, and will allow the error bars to be tightened. The measurement infrastructure now being deployed by Advanced Network & Services in cooperation with the Common Solutions Group uses 200-MHz PC-based machines at user sites. Ongoing sets of measurements, conforming to the current draft One-way Delay and Packet Loss metrics, are performed with a lambda of 4/sec. The results are uploaded to an Oracle database, which users can then query with web-based tools. Seven sites are now operational. More information can be found at http://www.advanced.org/csg-ippm/

Among the early results, the most interesting is that, even when the path between two sites is symmetric, patterns of one-way delay and packet loss are decidedly asymmetric.

Daniel Karrenberg reported that the RIPE NCC has started a project to measure one-way delay, and that they plan to deploy test boxes throughout Europe.

Sue Hares reported on IPPM-related work at Merit. The IPMA Project is funded by NSF, UM/Merit, and CAIDA. This work leverages Merit's measurement-capable machines placed at key exchange points. Some of the most interesting results to date relate to pathological behavior of BGP implementations within routers at the exchange points. This work will in the future allow Merit to contribute to IPPM work on routing stability. They are also implementing one-way delay, round-trip delay, and packet loss measurements on paths both across backbones and between pairs of routers at exchange points. More information can be found within the IPMA section of http://www.merit.edu/

Christian Huitema noted that it might be difficult for the receiver of a stream of test packets to know accurately how many packets were sent if all the most recent of them have been dropped. He sees this as one weakness of the technique of using a Poisson process to define such streams.
Guy noted that one way to solve this problem would be for the sender and receiver both to know the complete schedule of times at which the packets will be sent. This can be done, for example, by both knowing the seed of the pseudo-random number generator used to define the Poisson process.

5. Recent Work on TReno and Bulk Transfer Capacity: Matt Mathis

[slides included in proceedings]

A new draft of the Empirical Bulk Transfer Capacity Internet Draft was issued recently. The key improvements concern how to use it to measure end-to-end performance. The draft has benefited from research done by Matt with Jeffrey Semke, Jamshid Mahdavi, and Teunis Ott, which resulted in a current Computer Communication Review paper on macroscopic TCP performance behavior. More information on the CCR paper can be found at: http://www.psc.edu/networking/papers

In recent work, Matt has noticed that flow performance over wide-area, high-speed, and usually low-packet-loss networks can often be degraded by the apparent loss of an entire window every so often. It has been suggested that these losses may be related to route churn.

6. Multimedia over the Internet: Performance in the Absence of QoS: Raj Bansal

[slides included in proceedings]

Bansal and his colleagues at Nokia are concerned with the performance of telephony over IP and similar real-time services on the Internet. He noted that preconceived ideas of delay, formed by experience with conventional telephony, may be neither right nor sufficient for the Internet. Also, delay and packet loss are not independent for real-time applications; one may be traded off against the other. He reviewed a graph of the tail of a distribution of delay vs. packet loss, and noted that in real-time applications, after a point it is no longer worth waiting for a delayed packet. Non-traditional trade-offs may make sense. The entire delay distribution, rather than a simple summary statistic such as the median, may be the more useful thing to look at.
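The delay/loss trade-off described above can be illustrated as follows: if every packet arriving after a playout deadline is treated as effectively lost, then sweeping the deadline turns the tail of the delay distribution into an effective loss rate. This is a sketch with made-up delay values, not data from the talk:

```python
def effective_loss_rate(delays, deadline):
    """Fraction of packets effectively lost to a real-time application:
    packets dropped in the network (delay is None) plus packets that
    arrive after the playout deadline (seconds). Illustrative sketch."""
    late_or_lost = sum(1 for d in delays if d is None or d > deadline)
    return late_or_lost / len(delays)

# Hypothetical one-way delay sample (seconds); None marks a dropped packet.
sample = [0.050, 0.052, 0.048, 0.250, None, 0.055, 0.049, 0.900, 0.051, None]

# A looser deadline (more buffering delay) yields a lower effective loss
# rate, and vice versa: the trade-off between delay and packet loss.
for deadline in (0.1, 0.3, 1.0):
    print(deadline, effective_loss_rate(sample, deadline))
```

Looking at the whole curve of effective loss vs. deadline is one way of making use of the entire delay distribution rather than a single summary statistic.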
In discussion, Christian Huitema noted similar tradeoffs. One must often look at the time sequence of packet-loss events rather than at simple percentages, since losses of successive packets are not independent events.

7. Experience with Tests of Connectivity: Martin Horneffer

[slides included in proceedings]

Martin presented a talk on his implementation of loss/RTT metrics (using UDP echo packets) and on his experiences with large-scale loss/RTT measurements. The measurements were made from Europe to about 1000 hosts in the Internet via two different providers. The work is interesting for several reasons. First, it includes the first implementation of the Connectivity metric. Second, it does not rely on cooperation from remote hosts other than support for ICMP and UDP Echo. The program, JPING, is implemented in Java, with effective use of tcpdump and a Perl script to analyze the results.
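A minimal analogue of such a UDP echo probe, in which a timed-out reply is counted as a loss, might look like the following. This is an illustrative sketch, not the JPING implementation, and it assumes the remote host runs the standard UDP Echo service on port 7:

```python
import socket
import time

def udp_echo_rtt(host, port=7, timeout=2.0, payload=b"ippm-probe"):
    """Send one UDP packet to an echo service and return the round-trip
    time in seconds, or None if no matching echo arrives before the
    timeout (counted as a loss). Illustrative sketch only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        start = time.monotonic()
        sock.sendto(payload, (host, port))
        data, _addr = sock.recvfrom(2048)
        if data == payload:
            return time.monotonic() - start
        return None  # reply arrived but did not match our probe
    except socket.timeout:
        return None
    finally:
        sock.close()
```

Repeating such probes to many hosts, and recording both the RTTs and the losses, gives the kind of large-scale loss/RTT data set Martin described.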