I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments.

Summary: Ready with nits

The "SHOULD" in the following sentence doesn't seem like a valid RFC 2119 keyword usage to me.

"Any implications for network security arising from the DUT/SUT SHOULD be identical in the lab and in production networks."

Please consider replacing it with lowercase "should". (I read it as predicting a correlation between the network security properties of the DUT in the lab environment and its behavior in a production environment, not as a guideline for implementors.)

Comments:

I'm not sure if you would consider this to be in scope, but might it be useful to instrument implementations being benchmarked with runtime error or anomaly detection? (This would be in addition to the uninstrumented "black-box" measurements.) This could lead to detecting security-relevant bounds checking or memory management errors induced by aggressive benchmarking workloads, possibly identifying vulnerabilities early enough to fix them before they're exploited. Some kinds of instrumentation could have a substantial performance impact, so it might be best to start testing well below the limits of uninstrumented performance of the devices/systems under test.

Editorial:

Section 13 (Security Considerations) uses "SUT" without a prior expansion. Presumably it means "System Under Test" or "Software Under Test"?