Reviewer: toerless eckert
Review result: On the right track

Summary: Thanks a lot for this work. It's an immensely complex and important problem to tackle. I have in my time only measured router traffic performance and that already was an infinite matrix. This looks to me like a problem some order of infinity bigger. Meaning: however nitpicky my review's feedback about the document may seem, i think that the document in its existing form is already a great advancement to measure performance for these security devices, and in case of doubt it should be progressed rather faster than slower, especially because in my (limited) market understanding, many security device vendors will only provide actual feedback once it is an RFC (a community i think is overall more conservative in adopting IETF work, most not proactively engaging during the draft stage). But of course: feel free to improve the document with any of the feedback/suggestions in my review that you feel are useful.

Maybe at a high level, i would suggest most importantly to add more explanations, especially in an appropriate section about those aspects known NOT to be considered (but potentially important), so that the applicability of the tests that are described is better put into perspective by adopters of the draft to their real-world situations.

Favorite pet topic: Add a req. to measure the DUT through a power meter and report consumption so we can start making sure products with lower power consumption will see sales benefits when reporting numbers from this document (see details inline).

Formal: I chose to keep the whole document inline to make it easier for readers to vet my comments without having to open a copy of the whole document in parallel. Rest inline - email ends with the string EOF (i have seen some email truncation happening).

Thanks!
Toerless

---

Please fix the following nits - from https://www.ietf.org/tools/idnits

idnits 2.17.00 (12 Aug 2021)

> /tmp/idnits29639/draft-ietf-bmwg-ngfw-performance-13.txt:
> ...
>
> Checking nits according to https://www.ietf.org/id-info/checklist :
> ----------------------------------------------------------------------------
>
> ** The abstract seems to contain references ([RFC3511]), which it
>    shouldn't. Please replace those with straight textual mentions of the
>    documents in question.
>
> == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
>    in the document. If these are example addresses, they should be changed.
>
> == There are 1 instance of lines with non-RFC3849-compliant IPv6 addresses
>    in the document. If these are example addresses, they should be changed.
>
> -- The draft header indicates that this document obsoletes RFC3511, but the
>    abstract doesn't seem to directly say this. It does mention RFC3511
>    though, so this could be OK.
>
>
> Miscellaneous warnings:
> ----------------------------------------------------------------------------
>
> == The document seems to lack the recommended RFC 2119 boilerplate, even if
>    it appears to use RFC 2119 keywords.
>
>    (The document does seem to have the reference to RFC 2119 which the
>    ID-Checklist requires).

The lines in the following commented copy of the document are from idnits too. When a comment/question is preceded with "Nit:", it indicates that it seems to me the best answer would be modified draft text. When a comment/question is preceded with "Q:", i am actually not so sure what the outcome could be, so an answer in mail would be a start.

2 Benchmarking Methodology Working Group B.
Balarajah 3 Internet-Draft 4 Obsoletes: 3511 (if approved) C. Rossenhoevel 5 Intended status: Informational EANTC AG 6 Expires: 16 July 2022 B. Monkman 7 NetSecOPEN 8 January 2022 10 Benchmarking Methodology for Network Security Device Performance 11 draft-ietf-bmwg-ngfw-performance-13 13 Abstract 15 This document provides benchmarking terminology and methodology for 16 next-generation network security devices including next-generation 17 firewalls (NGFW), next-generation intrusion prevention systems 18 (NGIPS), and unified threat management (UTM) implementations. The

Nit: Why does it have to be next-generation for all example types of devices except for UTMs, and what does next-generation mean? Would suggest rewriting the text so the reader does not ask herself these questions.

18 (NGIPS), and unified threat management (UTM) implementations. The 19 main areas covered in this document are test terminology, test 20 configuration parameters, and benchmarking methodology for NGFW and 21 NGIPS. This document aims to improve the applicability,

I don't live and breathe the security device TLA space, but i start to suspect a UTM is some platform on which FW and IPS could run as software modules, and because it's only software you assume the UTM does not have to be next-gen? I wonder how much of this guesswork/thought process you want the reader to have, or if you want to avoid that by being somewhat clearer...

21 NGIPS. This document aims to improve the applicability, 22 reproducibility, and transparency of benchmarks and to align the test 23 methodology with today's increasingly complex layer 7 security 24 centric network application use cases. As a result, this document 25 makes [RFC3511] obsolete.

[minor] I kinda wonder if / how obsoleting RFC3511 could/should work. I understand it when we do a bis of a standard protocol and really don't want anyone to implement the older version. But unless there is a similar IETF mandate going along with this draft that says non-NG FW and non-NG IPS are hereby obsoleted by the IETF, i can not see how this draft can obsolete RFC3511, because it simply applies to a different type of benchmarked entities. And RFC3511 would stay on forever for whatever we call non-NG.

[minor] At least i think that is the case, unless this document actually does apply also to non-NG FW/IPS and can therefore supersede RFC3511 and actually obsolete it. But the text so far says the opposite.

[mayor] I observe that RFC3511 asks to measure and report goodput (5.6.5.2), and this document does not mention the term; if at all, the loss in performance of client/server TCP or QUIC connections through behavior of the DUT (such as proxying) is at best covered indirectly by mentioning parameters such as less than 5% reduction in throughput. If this document is superseding RFC3511, i think it should have a very explicit section discussing goodput - and maybe expanding on it. Consider for example the impact on TCP connection throughput and goodput. Very likely a DUT proxying TCP connections will have quite a different performance/goodput impact for a classical web page vs. video streaming. Therefore i am also worried about sending only average bitrates per session as opposed to some sessions going up to e.g. 500Mbps for a video streaming connection (example: the best commercially available UHD video streaming today).
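(To make the goodput point concrete: a rough, purely illustrative sketch of the kind of per-interval metric i have in mind - the counter names are hypothetical, not something the draft or any test tool defines:

  # Purely illustrative: counter names are hypothetical, not from the draft.
  def mbps(byte_count, duration_s):
      return (byte_count * 8) / (duration_s * 1_000_000)

  duration_s = 60                      # one measurement interval
  wire_bytes = 500e6 / 8 * duration_s  # bytes on the wire for a 500 Mbit/s UHD stream
  app_bytes = 0.93 * wire_bytes        # assume 7% lost to retransmissions/proxy overhead

  print("throughput [Mbit/s]:", mbps(wire_bytes, duration_s))   # ~500
  print("goodput    [Mbit/s]:", mbps(app_bytes, duration_s))    # ~465

Reporting both numbers would make the proxying cost of the DUT directly visible.)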
Those type of sessions might incur a lot of goodput loss with bad DUTs, but if i understand the test profile, then the per-TCP connection througput of the test profiles will be much less than 100Mbps. If such range in client session bitrates is not meant to be tested, it might at least be useful to add a section listing candidate gaps like this. Another one for example is the impact of higher RTT especially between DUT and server in the Internet. This mostly challenges TCP window size operation on DUT operating as TCP hosts and also their ability to buffer for retransmissions. Test Equipment IMHO may/should be able to emulate such long RTT. But this is not included in this document (RTT not mentioned). Beside goodput related issues, there are a couple other points in this review that may be too difficult to fix this late in the development of the document, but maybe for any of those considered to be useful input maybe add them to a section "out-of-scope (for future versions) considerations" or the like to capture them. 27 Status of This Memo 29 This Internet-Draft is submitted in full conformance with the 30 provisions of BCP 78 and BCP 79. 32 Internet-Drafts are working documents of the Internet Engineering 33 Task Force (IETF). Note that other groups may also distribute 34 working documents as Internet-Drafts. The list of current Internet- 35 Drafts is at https://datatracker.ietf.org/drafts/current/. 37 Internet-Drafts are draft documents valid for a maximum of six months 38 and may be updated, replaced, or obsoleted by other documents at any 39 time. It is inappropriate to use Internet-Drafts as reference 40 material or to cite them other than as "work in progress." 42 This Internet-Draft will expire on 5 July 2022. 44 Copyright Notice 46 Copyright (c) 2022 IETF Trust and the persons identified as the 47 document authors. All rights reserved. 49 This document is subject to BCP 78 and the IETF Trust's Legal 50 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 51 license-info) in effect on the date of publication of this document. 52 Please review these documents carefully, as they describe your rights 53 and restrictions with respect to this document. Code Components 54 extracted from this document must include Revised BSD License text as 55 described in Section 4.e of the Trust Legal Provisions and are 56 provided without warranty as described in the Revised BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 61 2. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 4 62 3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 63 4. Test Setup . . . . . . . . . . . . . . . . . . . . . . . . . 4 64 4.1. Testbed Configuration . . . . . . . . . . . . . . . . . . 5 65 4.2. DUT/SUT Configuration . . . . . . . . . . . . . . . . . . 6 66 4.2.1. Security Effectiveness Configuration . . . . . . . . 12 67 4.3. Test Equipment Configuration . . . . . . . . . . . . . . 12 68 4.3.1. Client Configuration . . . . . . . . . . . . . . . . 12 69 4.3.2. Backend Server Configuration . . . . . . . . . . . . 15 70 4.3.3. Traffic Flow Definition . . . . . . . . . . . . . . . 17 71 4.3.4. Traffic Load Profile . . . . . . . . . . . . . . . . 17 72 5. Testbed Considerations . . . . . . . . . . . . . . . . . . . 18 73 6. Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . 19 74 6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 19 75 6.2. Detailed Test Results . . . . . . . . . . . . . . . . . . 21 76 6.3. 
Benchmarks and Key Performance Indicators . . . . . . . . 21 77 7. Benchmarking Tests . . . . . . . . . . . . . . . . . . . . . 23 78 7.1. Throughput Performance with Application Traffic Mix . . . 23 79 7.1.1. Objective . . . . . . . . . . . . . . . . . . . . . . 23 80 7.1.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 23 81 7.1.3. Test Parameters . . . . . . . . . . . . . . . . . . . 23 82 7.1.4. Test Procedures and Expected Results . . . . . . . . 25 83 7.2. TCP/HTTP Connections Per Second . . . . . . . . . . . . . 26 84 7.2.1. Objective . . . . . . . . . . . . . . . . . . . . . . 26 85 7.2.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 27 86 7.2.3. Test Parameters . . . . . . . . . . . . . . . . . . . 27 87 7.2.4. Test Procedures and Expected Results . . . . . . . . 28 88 7.3. HTTP Throughput . . . . . . . . . . . . . . . . . . . . . 30 89 7.3.1. Objective . . . . . . . . . . . . . . . . . . . . . . 30 90 7.3.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 30 91 7.3.3. Test Parameters . . . . . . . . . . . . . . . . . . . 30 92 7.3.4. Test Procedures and Expected Results . . . . . . . . 32 93 7.4. HTTP Transaction Latency . . . . . . . . . . . . . . . . 33 94 7.4.1. Objective . . . . . . . . . . . . . . . . . . . . . . 33 95 7.4.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 33 96 7.4.3. Test Parameters . . . . . . . . . . . . . . . . . . . 34 97 7.4.4. Test Procedures and Expected Results . . . . . . . . 35 98 7.5. Concurrent TCP/HTTP Connection Capacity . . . . . . . . . 36 99 7.5.1. Objective . . . . . . . . . . . . . . . . . . . . . . 36 100 7.5.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 36 101 7.5.3. Test Parameters . . . . . . . . . . . . . . . . . . . 37 102 7.5.4. Test Procedures and Expected Results . . . . . . . . 38 103 7.6. TCP/HTTPS Connections per Second . . . . . . . . . . . . 39 104 7.6.1. Objective . . . . . . . . . . . . . . . . . . . . . . 40 105 7.6.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 40 106 7.6.3. Test Parameters . . . . . . . . . . . . . . . . . . . 40 107 7.6.4. Test Procedures and Expected Results . . . . . . . . 42 108 7.7. HTTPS Throughput . . . . . . . . . . . . . . . . . . . . 43 109 7.7.1. Objective . . . . . . . . . . . . . . . . . . . . . . 43 110 7.7.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 43 111 7.7.3. Test Parameters . . . . . . . . . . . . . . . . . . . 43 112 7.7.4. Test Procedures and Expected Results . . . . . . . . 45 113 7.8. HTTPS Transaction Latency . . . . . . . . . . . . . . . . 46 114 7.8.1. Objective . . . . . . . . . . . . . . . . . . . . . . 46 115 7.8.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 46 116 7.8.3. Test Parameters . . . . . . . . . . . . . . . . . . . 46 117 7.8.4. Test Procedures and Expected Results . . . . . . . . 48 118 7.9. Concurrent TCP/HTTPS Connection Capacity . . . . . . . . 49 119 7.9.1. Objective . . . . . . . . . . . . . . . . . . . . . . 49 120 7.9.2. Test Setup . . . . . . . . . . . . . . . . . . . . . 49 121 7.9.3. Test Parameters . . . . . . . . . . . . . . . . . . . 49 122 7.9.4. Test Procedures and Expected Results . . . . . . . . 51 123 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 52 124 9. Security Considerations . . . . . . . . . . . . . . . . . . . 53 125 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 53 126 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 53 127 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 53 128 12.1. 
Normative References . . . . . . . . . . . . . . . . . . 53 129 12.2. Informative References . . . . . . . . . . . . . . . . . 53 130 Appendix A. Test Methodology - Security Effectiveness 131 Evaluation . . . . . . . . . . . . . . . . . . . . . . . 54 132 A.1. Test Objective . . . . . . . . . . . . . . . . . . . . . 55 133 A.2. Testbed Setup . . . . . . . . . . . . . . . . . . . . . . 55 134 A.3. Test Parameters . . . . . . . . . . . . . . . . . . . . . 55 135 A.3.1. DUT/SUT Configuration Parameters . . . . . . . . . . 55 136 A.3.2. Test Equipment Configuration Parameters . . . . . . . 55 137 A.4. Test Results Validation Criteria . . . . . . . . . . . . 56 138 A.5. Measurement . . . . . . . . . . . . . . . . . . . . . . . 56 139 A.6. Test Procedures and Expected Results . . . . . . . . . . 57 140 A.6.1. Step 1: Background Traffic . . . . . . . . . . . . . 57 141 A.6.2. Step 2: CVE Emulation . . . . . . . . . . . . . . . . 58 142 Appendix B. DUT/SUT Classification . . . . . . . . . . . . . . . 58 143 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 58 145 1. Introduction 147 18 years have passed since IETF recommended test methodology and 148 terminology for firewalls initially ([RFC3511]). The requirements 149 for network security element performance and effectiveness have 150 increased tremendously since then. In the eighteen years since [nit] What is a network security element ? Please provide reference or define. If we are talking about them in this doc why are they not mentioned in the abstract ? 150 increased tremendously since then. In the eighteen years since 151 [RFC3511] was published, recommending test methodology and 152 terminology for firewalls, requirements and expectations for network 153 security elements has increased tremendously. Security function [nit] This does not parse as correct english to me "recommending test methodology ... has increased tremendously". It would, if you mean that more and more test methodologies where recommended, but not if there is an outstanding need to do so (which this document intends to fill). [nit] Why does the recommending part apply only to firewalls and the requirements and expectations only to security elements ? 153 security elements has increased tremendously. Security function [nit] What is a security function ? (i know, but i don't know if the reader is supposed to know). Aka: provide reference, add terminology section or define. Maybe easiest to restructure this intro paragraph to start with the explanation of the evolution from firewalls to network security elements which support one or more securit functions including firewall, intrusion detection etc. pp - and then conclude easily how this means that this requires this document to define all the good BMWG stuff it hopefully does. Although a terminology section is never a bad thing either ;-) 154 implementations have evolved to more advanced areas and have 155 diversified into intrusion detection and prevention, threat 156 management, analysis of encrypted traffic, etc. In an industry of 157 growing importance, well-defined, and reproducible key performance 158 indicators (KPIs) are increasingly needed to enable fair and 159 reasonable comparison of network security functions. 
All these [nit] maybe add what to compare - performance, functionality, scale, flexibility, adjustability - or if you knowingly only discuss subsets of these aspects, then maybe still list all the aspects you are aware of to be of interest to likely readers of this document and summarize those that you will and those that you won't cover in this document, so that the readers don't have to continue reading the document hoping to find them described. 160 reasons have led to the creation of a new next-generation network 161 security device benchmarking document, which makes [RFC3511] 162 obsolete. [nit] as mentioned above, whether or not the obsolete is true is not clear to me yet. 164 2. Requirements 166 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 167 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 168 "OPTIONAL" in this document are to be interpreted as described in BCP 169 14 [RFC2119], [RFC8174] when, and only when, they appear in all 170 capitals, as shown here. 172 3. Scope 174 This document provides testing terminology and testing methodology 175 for modern and next-generation network security devices that are 176 configured in Active ("Inline", see Figure 1 and Figure 2) mode. It [nit] The word Active does not again happen in the document, instead, the description on line 261 defines Inline mode as "active", which in my book makes 176+261 a perfect circular definition. I would suggest to have a terminology section that define "Inline", for example by also adding one most likely possible alternative mode description. 177 covers the validation of security effectiveness configurations of [nit] security configuration effectiveness ? 179 network security devices, followed by performance benchmark testing. 179 This document focuses on advanced, realistic, and reproducible 180 testing methods. Additionally, it describes testbed environments, [nit] are you sure advanced and realistic are meant to characterize the testing method or the scenario that is being tested ? "reroducible testing methods for advanced real world scenarios" ? 181 test tool requirements, and test result formats. 183 4. Test Setup 185 Test setup defined in this document applies to all benchmarking tests [nit] "/Test setup defined/The test setup defined/ 186 described in Section 7. The test setup MUST be contained within an 187 Isolated Test Environment (see Section 3 of [RFC6815]). 189 4.1. Testbed Configuration 191 Testbed configuration MUST ensure that any performance implications 192 that are discovered during the benchmark testing aren't due to the [nit] /aren't/are not/ 193 inherent physical network limitations such as the number of physical 194 links and forwarding performance capabilities (throughput and 195 latency) of the network devices in the testbed. For this reason, 196 this document recommends avoiding external devices such as switches 197 and routers in the testbed wherever possible. 199 In some deployment scenarios, the network security devices (Device 200 Under Test/System Under Test) are connected to routers and switches, 201 which will reduce the number of entries in MAC or ARP tables of the 202 Device Under Test/System Under Test (DUT/SUT). If MAC or ARP tables 203 have many entries, this may impact the actual DUT/SUT performance due 204 to MAC and ARP/ND (Neighbor Discovery) table lookup processes. 
This 205 document also recommends using test equipment with the capability of [nit] /also/therefore/ 206 emulating layer 3 routing functionality instead of adding external 207 routers in the testbed. 209 The testbed setup Option 1 (Figure 1) is the RECOMMENDED testbed 210 setup for the benchmarking test. 212 +-----------------------+ +-----------------------+ 213 | +-------------------+ | +-----------+ | +-------------------+ | 214 | | Emulated Router(s)| | | | | | Emulated Router(s)| | 215 | | (Optional) | +----- DUT/SUT +-----+ (Optional) | | 216 | +-------------------+ | | | | +-------------------+ | 217 | +-------------------+ | +-----------+ | +-------------------+ | 218 | | Clients | | | | Servers | | 219 | +-------------------+ | | +-------------------+ | 220 | | | | 221 | Test Equipment | | Test Equipment | 222 +-----------------------+ +-----------------------+ 224 Figure 1: Testbed Setup - Option 1 226 If the test equipment used is not capable of emulating layer 3 227 routing functionality or if the number of used ports is mismatched 228 between test equipment and the DUT/SUT (need for test equipment port 229 aggregation), the test setup can be configured as shown in Figure 2. 231 +-------------------+ +-----------+ +--------------------+ 232 |Aggregation Switch/| | | | Aggregation Switch/| 233 | Router +------+ DUT/SUT +------+ Router | 234 | | | | | | 235 +----------+--------+ +-----------+ +--------+-----------+ 236 | | 237 | | 238 +-----------+-----------+ +-----------+-----------+ 239 | | | | 240 | +-------------------+ | | +-------------------+ | 241 | | Emulated Router(s)| | | | Emulated Router(s)| | 242 | | (Optional) | | | | (Optional) | | 243 | +-------------------+ | | +-------------------+ | 244 | +-------------------+ | | +-------------------+ | 245 | | Clients | | | | Servers | | 246 | +-------------------+ | | +-------------------+ | 247 | | | | 248 | Test Equipment | | Test Equipment | 249 +-----------------------+ +-----------------------+ 251 Figure 2: Testbed Setup - Option 2 [nit] Please elaborate on the "number of used ports", and if possible show in Figure 2 by drawing multiple links. I guess that in a common case, the test equipment might provide few, but fast ports, whereas the DUT/SU might provide more slower ports, and one would there use external switches as port multiplexer ? Or vice-versa ? Butif such adaptation is performed, i wonder how different setup might impact the measurements. So for example let's say the Test Equipment (TE) has a 100Gbps port, and the DUT has 4 * 10Gbps port, so you need on each side a switch with 100Gbps and 2 * 10 Gbps. Would you try to use VLANs into the TE, or would you just build a single LAN. Any recommendations for the switch config, and why. [mayor] The fact that the left side says only client and the right side says only server is worth some more discussion. Especially because the Filtering in Figure 3 also lets me wonder in which direction traffic is meant to be filtered/inspected. Are you considering the case that clients are responders to (TCP/QUIC/UDP) connections ? For example left side is "inside", DUT is a site firewall to the Internet (right side), and there is some server on the left side (e.g.: SMTP). How about that you do have on the right an Internet and a separate site DMZ interface and then of course traffic not only between left and right, but between those interfaces on the right ? 
More broadly applicable, dynamic port discovery for ICE/STUN, where you want to permit inside to outside connections (to the STUN server) to permit new connections from other external nodes to go back inside). E.g.: would be good to have some elaboration about the rype of connections covered by this document. If its only initiators on the left and responders on the right, that is fine, but it should be said so and maybe point to those above cases (DMZ, inside servers, STUN/ICE) not covered by this document. 253 4.2. DUT/SUT Configuration 255 A unique DUT/SUT configuration MUST be used for all benchmarking 256 tests described in Section 7. Since each DUT/SUT will have its own 257 unique configuration, users SHOULD configure their device with the 258 same parameters and security features that would be used in the 259 actual deployment of the device or a typical deployment in order to 260 achieve maximum network security coverage. The DUT/SUT MUST be [nit] What is a "unique configuration" ? It could be different configurations across two different DUT but both achieving the same service/filtering, just difference in syntax, or it could be difference in functional outcome. Would be good to be more precise what is meant. [nit] Why would a user choose an actual deployment vs. a typical deployment ? I am imagining that a user would choose an actual deployment to measure performance specifically for that deployment but a typical deployment when the DUT would need to be deployed in different setups but not each of those can be measured individually, or because the results are meant to be comparable with other users who may have taken performance numbers. WOuld be good to elaborate a bit more so readers have a clearer understanding what "actual deployment" and "typical deployment" means and how/why to pick one over the other. [nit] I do not understand how the text up to "in order to" justifies that it will achieve the maximum network security coverage. I also do not know what "maximum network security coverage" means. If there is a definition, please provide it. Else introduce it. 260 achieve maximum network security coverage. The DUT/SUT MUST be 261 configured in "Inline" mode so that the traffic is actively inspected 262 by the DUT/SUT. Also "Fail-Open" behavior MUST be disabled on the 263 DUT/SUT. 265 Table 1 and Table 2 below describe the RECOMMENDED and OPTIONAL sets 266 of network security feature list for NGFW and NGIPS respectively. 267 The selected security features SHOULD be consistently enabled on the 268 DUT/SUT for all benchmarking tests described in Section 7. 270 To improve repeatability, a summary of the DUT/SUT configuration 271 including a description of all enabled DUT/SUT features MUST be 272 published with the benchmarking results. 
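[nit] It might help adopters if the draft (or an appendix) showed what such a published configuration summary could look like. Purely as a sketch - the field names are mine, not from the draft:

  # Hypothetical example of a machine-readable DUT/SUT configuration
  # summary published with the benchmark results (field names are
  # illustrative only, not defined by the draft).
  dut_sut_summary = {
      "device_type": "NGFW",
      "classification": "M",        # XS / S / M / L, see Appendix B
      "mode": "inline",
      "fail_open": False,
      "enabled_features": [         # RECOMMENDED set from Table 1
          "SSL Inspection", "IDS/IPS", "Anti-Spyware", "Anti-Virus",
          "Anti-Botnet", "Logging and Reporting", "Application Identification",
      ],
      "logging": {"flow_level": True, "external_collector": True},
  }

Even a prose equivalent would do; the point is that readers know the expected granularity of the published summary.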
274 +============================+=============+==========+ 275 | DUT/SUT (NGFW) Features | RECOMMENDED | OPTIONAL | 276 +============================+=============+==========+ 277 | SSL Inspection | x | | 278 +----------------------------+-------------+----------+ 279 | IDS/IPS | x | | 280 +----------------------------+-------------+----------+ 281 | Anti-Spyware | x | | 282 +----------------------------+-------------+----------+ 283 | Anti-Virus | x | | 284 +----------------------------+-------------+----------+ 285 | Anti-Botnet | x | | 286 +----------------------------+-------------+----------+ 287 | Web Filtering | | x | 288 +----------------------------+-------------+----------+ 289 | Data Loss Protection (DLP) | | x | 290 +----------------------------+-------------+----------+ 291 | DDoS | | x | 292 +----------------------------+-------------+----------+ 293 | Certificate Validation | | x | 294 +----------------------------+-------------+----------+ [mayor] This may be bogs because i don't know well enough how for the purpose of this document security devices are expected to inspect HTTP connections from client to server. Maybe this is a sane approach where the security device operates as a client trusted HTTPs proxy, maybe its one of the more hacky approaches (faked server certs). But however it works, i think that a security device can not get away from validating the certificate of the server in a connection. Else it shouldn't be called a security DUT. But i am not sure if that validation is what you call "Certificate Validation". 294 +----------------------------+-------------+----------+ 295 | Logging and Reporting | x | | 296 +----------------------------+-------------+----------+ 297 | Application Identification | x | | 298 +----------------------------+-------------+----------+ 300 Table 1: NGFW Security Features [nit] Why are "Web Filtering"..."Certificate Validation" only MAY ? Please point to a place in the document (or elsewhere) that rationales the SHOULD/MAY recommendations. Same applies to Table 2. [nit] 302 +============================+=============+==========+ 303 | DUT/SUT (NGIPS) Features | RECOMMENDED | OPTIONAL | 304 +============================+=============+==========+ 305 | SSL Inspection | x | | 306 +----------------------------+-------------+----------+ 307 | Anti-Malware | x | | 308 +----------------------------+-------------+----------+ 309 | Anti-Spyware | x | | 310 +----------------------------+-------------+----------+ 311 | Anti-Botnet | x | | 312 +----------------------------+-------------+----------+ 313 | Logging and Reporting | x | | 314 +----------------------------+-------------+----------+ 315 | Application Identification | x | | 316 +----------------------------+-------------+----------+ 317 | Deep Packet Inspection | x | | 318 +----------------------------+-------------+----------+ 319 | Anti-Evasion | x | | 320 +----------------------------+-------------+----------+ 322 Table 2: NGIPS Security Features [nit] I ended up scrolling up and down to compare the tables. It might be useful for other readers like me to merge the tables, aka: put the columns for NGFW and NGIPS into one table. [nit] Please start with Table 3 as it introduces the security features, else the two above tables introduce a lot of features without defining them. 324 The following table provides a brief description of the security 325 features. 
327 +================+================================================+ 328 | DUT/SUT | Description | 329 | Features | | 330 +================+================================================+ 331 | SSL Inspection | DUT/SUT intercepts and decrypts inbound HTTPS | 332 | | traffic between servers and clients. Once the | 333 | | content inspection has been completed, DUT/SUT | 334 | | encrypts the HTTPS traffic with ciphers and | 335 | | keys used by the clients and servers. | 336 +----------------+------------------------------------------------+ 337 | IDS/IPS | DUT/SUT detects and blocks exploits targeting | 338 | | known and unknown vulnerabilities across the | 339 | | monitored network. | 340 +----------------+------------------------------------------------+ 341 | Anti-Malware | DUT/SUT detects and prevents the transmission | 342 | | of malicious executable code and any | 343 | | associated communications across the monitored | 344 | | network. This includes data exfiltration as | 345 | | well as command and control channels. | 346 +----------------+------------------------------------------------+ 347 | Anti-Spyware | Anti-Spyware is a subcategory of Anti Malware. | 348 | | Spyware transmits information without the | 349 | | user's knowledge or permission. DUT/SUT | 350 | | detects and block initial infection or | 351 | | transmission of data. | 352 +----------------+------------------------------------------------+ 353 | Anti-Botnet | DUT/SUT detects traffic to or from botnets. | 354 +----------------+------------------------------------------------+ 355 | Anti-Evasion | DUT/SUT detects and mitigates attacks that | 356 | | have been obfuscated in some manner. | 357 +----------------+------------------------------------------------+ 358 | Web Filtering | DUT/SUT detects and blocks malicious website | 359 | | including defined classifications of website | 360 | | across the monitored network. | 361 +----------------+------------------------------------------------+ 362 | DLP | DUT/SUT detects and prevents data breaches and | 363 | | data exfiltration, or it detects and blocks | 364 | | the transmission of sensitive data across the | 365 | | monitored network. | 366 +----------------+------------------------------------------------+ 367 | Certificate | DUT/SUT validates certificates used in | 368 | Validation | encrypted communications across the monitored | 369 | | network. | 370 +----------------+------------------------------------------------+ 371 | Logging and | DUT/SUT logs and reports all traffic at the | 372 | Reporting | flow level across the monitored network. | 373 +----------------+------------------------------------------------+ 374 | Application | DUT/SUT detects known applications as defined | 375 | Identification | within the traffic mix selected across the | 376 | | monitored network. | 377 +----------------+------------------------------------------------+ 379 Table 3: Security Feature Description [nit] Why is DDoS and DPI not listed in this table ? I just randomnly stumbled across that one, but maybe there are more mismatches between Table 1 and 2. Pls. make sure all Table 1/2 Features are mentioned. [nit] I have a bout 1000 questions and concerns about this stuff: Are there actually IETF specifications for how any of these features on the DUT do work or should work, or is this all vendor proprietary functionality ? 
For anything that is a vendor / market proprietary specification, how would the TE (Test Equipment) know what the DUT does, so that it can effectively test it? I imagine that if there is a difference in how a particular feature functions across different vendor DUTs, the same is true for TE, so some TE would have more functional overlap with a DUT than others?

[nit (continued)] E.g.: let's say some DUT1 feature, e.g. DLP, is really simple and therefore not very secure. But that makes it a lot faster than a DUT2 DLP feature which is a lot more secure. Maybe there is a metric for this security, like, if i remember correctly from the past, the number of signatures in virus detection or the like... How would such differences be taken into account in measurement?

381 Below is a summary of the DUT/SUT configuration: 383 * DUT/SUT MUST be configured in "inline" mode. 385 * "Fail-Open" behavior MUST be disabled. 387 * All RECOMMENDED security features are enabled. 389 * Logging SHOULD be enabled. DUT/SUT SHOULD log all traffic at the 390 flow level - Logging to an external device is permissible.

[nit] Does that mean logging of ALL flows or only of flows that trigger some security issue? Logging of ALL flows seems like a big performance hog, may be infeasible in fast deployments, and may need to be tested as a separate case by itself. (But my concern may be outdated.)

[nit] If logging is to an external device, it may be useful to indicate such a logging receiver in Figure 1/2, and ideally have it operate via a link from the DUT that does not pass test traffic so that it does not interfere.

392 * Geographical location filtering, and Application Identification 393 and Control SHOULD be configured to trigger based on a site or 394 application from the defined traffic mix.

[nit] Geographic location filtering does not sound like a generically necessary or applicable security feature. If you are for example a high-tech manufacturer that sells all over the world, you may appreciate customers visiting your webserver from countries that happen to also host a lot of botnets. Or is this document focused on a narrower set of use cases? E.g.: a DUT only to filter anything that can not be put into the cloud (such as web services)? E.g.: it would be good to write up some justification for the GeoLoc SHOULD that would then help readers to better understand when/how to configure it and when/how not.

396 In addition, a realistic number of access control rules (ACL) SHOULD 397 be configured on the DUT/SUT where ACLs are configurable and 398 reasonable based on the deployment scenario. This document 399 determines the number of access policy rules for four different 400 classes of DUT/SUT: Extra Small (XS), Small (S), Medium (M), and 401 Large (L). A sample DUT/SUT classification is described in 402 Appendix B.

[mayor] IMHO, you can not put numbers such as those in Figure 3 into the main text of the document while putting the speed definitions of the four classes only into Appendix B. It seems clear to me that the numbers in Figure 3 (and probably elsewhere) were derived from the assumption that the four speed classes are defined as in Appendix B. Suggestion: inline the text of Appendix B here and mention that numbers such as those in Figure 3 are derived from the assumption of those XS/S/M/L numbers. Add (if necessary, else not) that it may be appropriate to choose other numbers for XS/S/M/L, but if one does that, then the dependent numbers (such as those from Figure 3) may also need to be re-evaluated.
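To make that dependency explicit, the relationship could even be spelled out along these lines (a sketch only; the block-rule counts are copied from Figure 3, the classes from Appendix B):

  # Sketch only: block-rule counts copied from Figure 3, keyed by the
  # DUT/SUT classes of Appendix B, to make the dependency explicit.
  ACL_BLOCK_RULES = {
      #      (application, transport, IP) layer block rules
      "XS": (5, 25, 25),
      "S":  (10, 50, 50),
      "M":  (20, 100, 100),
      "L":  (50, 250, 250),
  }

  def total_block_rules(dut_class):
      return sum(ACL_BLOCK_RULES[dut_class])

  print(total_block_rules("M"))   # 220 block rules for a Medium DUT/SUT

If someone redefines the classes, a table like this is exactly what would have to be re-derived.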
404 The Access Control Rules (ACL) defined in Figure 3 MUST be configured 405 from top to bottom in the correct order as shown in the table. This 406 is due to ACL types listed in specificity decreasing order, with 407 "block" first, followed by "allow", representing a typical ACL based 408 security policy. The ACL entries SHOULD be configured with routable 409 IP subnets by the DUT/SUT. (Note: There will be differences between 410 how security vendors implement ACL decision making.) The configured [nit] /security vendors/DUT/ [nit] I don't understand what i am supposed to learn from the (Note: ...) sentence. Rephrase ? or remove. 410 how security vendors implement ACL decision making.) The configured 411 ACL MUST NOT block the security and measurement traffic used for the 412 benchmarking tests. [nit] what is "security traffic" ? what is "measurement traffic" ? Don't see these terms defined before. Those two terms do not immediately click to me. I guess measured user/client-server traffic vs. test-setup management traffic (including logging) ?? In any case introduce the terms, define them and use them consistently. Whatever they are. 414 +---------------+ 415 | DUT/SUT | 416 | Classification| 417 | # Rules | 418 +-----------+-----------+--------------------+------+---+---+---+---+ 419 | | Match | | | | | | | 420 | Rules Type| Criteria | Description |Action| XS| S | M | L | 421 +-------------------------------------------------------------------+ 422 |Application|Application| Any application | block| 5 | 10| 20| 50| 423 |layer | | not included in | | | | | | 424 | | | the measurement | | | | | | 425 | | | traffic | | | | | | 426 +-------------------------------------------------------------------+ 427 |Transport |SRC IP and | Any SRC IP subnet | block| 25| 50|100|250| 428 |layer |TCP/UDP | used and any DST | | | | | | 429 | |DST ports | ports not used in | | | | | | 430 | | | the measurement | | | | | | 431 | | | traffic | | | | | | 432 +-------------------------------------------------------------------+ 433 |IP layer |SRC/DST IP | Any SRC/DST IP | block| 25| 50|100|250| 434 | | | subnet not used | | | | | | 435 | | | in the measurement | | | | | | 436 | | | traffic | | | | | | 437 +-------------------------------------------------------------------+ [nit] WOuld suggest to remove the word "Any" to minimize misinterpretation. [nit] These three blocks seem to never get exercised by the actual measurement traffic, right ? So the purpose would then be to simply load up the DUT with them in case the DUT implementation is stupid enough to have these cause relevant performance impacts even when not exercised by traffic. Would be good to write this down as a rationale after the table. Especially because the "Any" had me confused first that in a real-world deployment you would of course not include 250 individual application/port/prefixes, but you just have some simple block-all. [nit] Even 27 years ago i've seen routers acting as firewalls for universities that had thousands of such ACL entries. Aka: i think these numbers are way too low. 
438 |Application|Application| Half of the | allow| 10| 10| 10| 10| 439 |layer | | applications | | | | | | 440 | | | included in the | | | | | | 441 | | | measurement traffic| | | | | | 442 | | |(see the note below)| | | | | | 443 +-------------------------------------------------------------------+ 444 |Transport |SRC IP and | Half of the SRC | allow| >1| >1| >1| >1| 445 |layer |TCP/UDP | IPs used and any | | | | | | 446 | |DST ports | DST ports used in | | | | | | 447 | | | the measurement | | | | | | 448 | | | traffic | | | | | | 449 | | | (one rule per | | | | | | 450 | | | subnet) | | | | | | 451 +-------------------------------------------------------------------+ 452 |IP layer |SRC IP | The rest of the | allow| >1| >1| >1| >1| 453 | | | SRC IP subnet | | | | | | 454 | | | range used in the | | | | | | 455 | | | measurement | | | | | | 456 | | | traffic | | | | | | 457 | | | (one rule per | | | | | | 458 | | | subnet) | | | | | | 459 +-----------+-----------+--------------------+------+---+---+---+---+ [mayor] There should be an explanation of how this is supposed to work, and it seems there are rules missing: rule on row 438 explicitly permits half the traffic sent by the test equiment. So supposedly only the other half has to be checked by rule on row 444. So when 444 says "Half of the SRC...", is that half of the total ? Would that have to be set up so that after 444 we now have 75% of the measurement traffic going through ? Likewise then rule 452 does it bring the total amount of permitted traffic to 87.5% ?. [nit] Ultimately, we only have "allows" here. Is there an assumption that after row 459 there is an implicit deny-anything-else ? I guess so, but it should be written out explicitly in the table. 461 Figure 3: DUT/SUT Access List 463 Note: If half of the applications included in the measurement traffic 464 is less than 10, the missing number of ACL entries (dummy rules) can 465 be configured for any application traffic not included in the 466 measurement traffic. 468 4.2.1. Security Effectiveness Configuration 470 The Security features (defined in Table 1 and Table 2) of the DUT/SUT 471 MUST be configured effectively to detect, prevent, and report the 472 defined security vulnerability sets. This section defines the 473 selection of the security vulnerability sets from Common [nit] "from the CVE" ?! 474 vulnerabilities and Exposures (CVE) list for the testing. The [nit] Add reference for CVE. (Not sure whats best spec, or wikipedia or cve.org,...) 475 vulnerability set SHOULD reflect a minimum of 500 CVEs from no older 476 than 10 calendar years to the current year. These CVEs SHOULD be 477 selected with a focus on in-use software commonly found in business 478 applications, with a Common vulnerability Scoring System (CVSS) 479 Severity of High (7-10). 481 This document is primarily focused on performance benchmarking. 482 However, it is RECOMMENDED to validate the security features 483 configuration of the DUT/SUT by evaluating the security effectiveness 484 as a prerequisite for performance benchmarking tests defined in the [nit] /in the/in/ 485 section 7. In case the benchmarking tests are performed without 486 evaluating security effectiveness, the test report MUST explain the 487 implications of this. The methodology for evaluating security 488 effectiveness is defined in Appendix A. 490 4.3. Test Equipment Configuration 492 In general, test equipment allows configuring parameters in different 493 protocol layers. 
These parameters thereby influence the traffic 494 flows which will be offered and impact performance measurements. 496 This section specifies common test equipment configuration parameters 497 applicable for all benchmarking tests defined in Section 7. Any 498 benchmarking test specific parameters are described under the test 499 setup section of each benchmarking test individually. 501 4.3.1. Client Configuration 503 This section specifies which parameters SHOULD be considered while 504 configuring clients using test equipment. Also, this section 505 specifies the RECOMMENDED values for certain parameters. The values 506 are the defaults used in most of the client operating systems 507 currently. 509 4.3.1.1. TCP Stack Attributes 511 The TCP stack SHOULD use a congestion control algorithm at client and 512 server endpoints. The IPv4 and IPv6 Maximum Segment Size (MSS) 513 SHOULD be set to 1460 bytes and 1440 bytes respectively and a TX and 514 RX initial receive windows of 64 KByte. Client initial congestion 515 window SHOULD NOT exceed 10 times the MSS. Delayed ACKs are 516 permitted and the maximum client delayed ACK SHOULD NOT exceed 10 517 times the MSS before a forced ACK. Up to three retries SHOULD be 518 allowed before a timeout event is declared. All traffic MUST set the 519 TCP PSH flag to high. The source port range SHOULD be in the range 520 of 1024 - 65535. Internal timeout SHOULD be dynamically scalable per 521 RFC 793. The client SHOULD initiate and close TCP connections. The 522 TCP connection MUST be initiated via a TCP three-way handshake (SYN, 523 SYN/ACK, ACK), and it MUST be closed via either a TCP three-way close 524 (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK). [nit] Would be nice to have reference to where/how these parameters are determined. Would be nice to mention why these parameters are choosen. Probably to reflect the most common current TCP behavior that achieves best performance ? [minor] The document mentions QUIC in three places, but has no equivalent section for QUIC here as it has for TCP. I would suggest to add a section here, even if it can just say "Due to the absence of suficient experience, QUIC parameters are unspecified. Similarily to TCP, parameters should be choosen that best reflect state-of-the art performance results for QUIC client/server traffic". 526 4.3.1.2. Client IP Address Space 528 The sum of the client IP space SHOULD contain the following 529 attributes. 531 * The IP blocks SHOULD consist of multiple unique, discontinuous 532 static address blocks. 534 * A default gateway is permitted. [comment] How is this relevant, what do you expect it to do ? What would happen if you just removed it ? 536 * The DSCP (differentiated services code point) marking is set to DF 537 (Default Forwarding) '000000' on IPv4 Type of Service (ToS) field 538 and IPv6 traffic class field. 540 The following equation can be used to define the total number of 541 client IP addresses that will be configured on the test equipment. 543 Desired total number of client IP = Target throughput [Mbit/s] / 544 Average throughput per IP address [Mbit/s] 546 As shown in the example list below, the value for "Average throughput 547 per IP address" can be varied depending on the deployment and use 548 case scenario. 550 (Option 1) DUT/SUT deployment scenario 1 : 6-7 Mbit/s per IP (e.g. 551 1,400-1,700 IPs per 10Gbit/s throughput) 553 (Option 2) DUT/SUT deployment scenario 2 : 0.1-0.2 Mbit/s per IP 554 (e.g. 
50,000-100,000 IPs per 10Gbit/s throughput) 556 Based on deployment and use case scenario, client IP addresses SHOULD 557 be distributed between IPv4 and IPv6. The following options MAY be 558 considered for a selection of traffic mix ratio. 560 (Option 1) 100 % IPv4, no IPv6 562 (Option 2) 80 % IPv4, 20% IPv6 564 (Option 3) 50 % IPv4, 50% IPv6 566 (Option 4) 20 % IPv4, 80% IPv6 568 (Option 5) no IPv4, 100% IPv6 [minor] This guidance is IMHO not very helpfull. It seems to me the first guidance seems to be that the percentage of IPv4 vs. IPv6 addresses should be based on the relevant ratio of IPv4 vs. IPv6 traffic in the target deployment because the way the test setup is done, some N% IPv4 addresses will also roughly result in N% IPv4 traffic in the test. That type of explanation might be very helpfull, because the risk is that readers may think they can derive the percentage of test IPv4/IPv6 addresses from the ratio of IPv4/IPv6 addresses in the target deployment, but that very often will not work: For example in the common dual-stack deployment, every client has an IPv4 and an IPv6 address, so its 50% IPv4, but the actual percentage of IPv4 traffic will very much depend on the application scenario. Some enterprises may go up to 90% or more IPv6 traffic if the main traffic is all newer cloud services traffic. An vice versa, it could be as little as 10% IPv6 if all the cloud services are legacy apps in the cloud not supporting IPv6. 570 Note: The IANA has assigned IP address range for the testing purpose 571 as described in Section 8. If the test scenario requires more IP 572 addresses or subnets than the IANA assigned, this document recommends 573 using non routable Private IPv4 address ranges or Unique Local 574 Address (ULA) IPv6 address ranges for the testing. [minor] See comments in Section 8. It might be useful to merge the text of this paragraph with the one in Section 8, else the addressing recommendations are somewhat split in the middle. [minor] It would be prudent to add a disclaimer that this document does not consider to determine whether DUT may emobdy optimizations in performance behavior for known testing address ranges. Such a disclaimer may be more general and go on the end of the document, e.g.: before IANA section - no considerations against DUT optimizations of known test scenarios including addressing ranges or other test profile specific parameters. 576 4.3.1.3. Emulated Web Browser Attributes 578 The client emulated web browser (emulated browser) contains 579 attributes that will materially affect how traffic is loaded. The [nit] what does "how traffic is loaded" mean ? Rephrase. 580 objective is to emulate modern, typical browser attributes to improve 581 realism of the result set. [nit] /result set/resulting traffic/ ? 583 For HTTP traffic emulation, the emulated browser MUST negotiate HTTP 584 version 1.1 or higher. Depending on test scenarios and chosen HTTP 585 version, the emulated browser MAY open multiple TCP connections per 586 Server endpoint IP at any time depending on how many sequential 587 transactions need to be processed. For HTTP/2 or HTTP/3, the 588 emulated browser MAY open multiple concurrent streams per connection 589 (multiplexing). HTTP/3 emulated browser uses QUIC ([RFC9000]) as 590 transport protocol. HTTP settings such as number of connection per 591 server IP, number of requests per connection, and number of streams 592 per connection MUST be documented. This document refers to [RFC8446] 593 for HTTP/2. 
The emulated browser SHOULD advertise a User-Agent 594 header. The emulated browser SHOULD enforce content length 595 validation. Depending on test scenarios and selected HTTP version, 596 HTTP header compression MAY be set to enable or disable. This 597 setting (compression enabled or disabled) MUST be documented in the 598 report. 600 For encrypted traffic, the following attributes SHALL define the 601 negotiated encryption parameters. The test clients MUST use TLS 602 version 1.2 or higher. TLS record size MAY be optimized for the [minor] I would bet SEC review will challenge you to comment on TLS 1.3. Would make sense to add a sentence stating that the ratio of TLS 1.2 vs TLS 1.3 traffic should be choosen based on expected target deployment and may range from 100% TLS 1.2 to 100% TLS 1.3. In the absence of known ratios, a 50/50% ratio is RECOMMENDED. 602 version 1.2 or higher. TLS record size MAY be optimized for the 603 HTTPS response object size up to a record size of 16 KByte. If 604 Server Name Indication (SNI) is required in the traffic mix profile, 605 the client endpoint MUST send TLS extension Server Name Indication 606 (SNI) information when opening a security tunnel. Each client [minor] SNI is pretty standard today. I would remove the "if" and make the whole sentence a MUST. 606 (SNI) information when opening a security tunnel. Each client 607 connection MUST perform a full handshake with server certificate and 608 MUST NOT use session reuse or resumption. 610 The following TLS 1.2 supported ciphers and keys are RECOMMENDED to 611 use for HTTPS based benchmarking tests defined in Section 7. 613 1. ECDHE-ECDSA-AES128-GCM-SHA256 with Prime256v1 (Signature Hash 614 Algorithm: ecdsa_secp256r1_sha256 and Supported group: secp256r1) 616 2. ECDHE-RSA-AES128-GCM-SHA256 with RSA 2048 (Signature Hash 617 Algorithm: rsa_pkcs1_sha256 and Supported group: secp256r1) 619 3. ECDHE-ECDSA-AES256-GCM-SHA384 with Secp521 (Signature Hash 620 Algorithm: ecdsa_secp384r1_sha384 and Supported group: secp521r1) 622 4. ECDHE-RSA-AES256-GCM-SHA384 with RSA 4096 (Signature Hash 623 Algorithm: rsa_pkcs1_sha384 and Supported group: secp256r1) 625 Note: The above ciphers and keys were those commonly used enterprise 626 grade encryption cipher suites for TLS 1.2. It is recognized that 627 these will evolve over time. Individual certification bodies SHOULD 628 use ciphers and keys that reflect evolving use cases. These choices 629 MUST be documented in the resulting test reports with detailed 630 information on the ciphers and keys used along with reasons for the 631 choices. 633 [RFC8446] defines the following cipher suites for use with TLS 1.3. 635 1. TLS_AES_128_GCM_SHA256 637 2. TLS_AES_256_GCM_SHA384 639 3. TLS_CHACHA20_POLY1305_SHA256 641 4. TLS_AES_128_CCM_SHA256 643 5. TLS_AES_128_CCM_8_SHA256 645 4.3.2. Backend Server Configuration 647 This section specifies which parameters should be considered while 648 configuring emulated backend servers using test equipment. 650 4.3.2.1. TCP Stack Attributes 652 The TCP stack on the server side SHOULD be configured similar to the 653 client side configuration described in Section 4.3.1.1. In addition, 654 server initial congestion window MUST NOT exceed 10 times the MSS. 655 Delayed ACKs are permitted and the maximum server delayed ACK MUST 656 NOT exceed 10 times the MSS before a forced ACK. 658 4.3.2.2. Server Endpoint IP Addressing 660 The sum of the server IP space SHOULD contain the following 661 attributes. 
663 * The server IP blocks SHOULD consist of unique, discontinuous 664 static address blocks with one IP per server Fully Qualified 665 Domain Name (FQDN) endpoint per test port.

[minor] The "per FQDN per test port" is likely underspecified/confusing. How would you recommend configuring the testbed if the same FQDN may be reachable across more than one DUT server port and the DUT is doing load balancing? If that is not supposed to be considered, then it seems as if every FQDN is supposed to be reachable across only one DUT port, but then the sentence likely should just say "per FQDN" (without the "per test port" qualification). Not 100% sure...

[minor] Especially for IPv4, there is obviously a big trend in DCs to save IPv4 address space by using SNI. Therefore a realistic scenario would be to have more than one FQDN per IPv4 address. Maybe as high as 10:1 (guesswork). In any case i think it is prudent to include testing of such SNI overload of IP addresses because it likely can impact performance (demux of processing state not solely based on 5-tuple).

667 * A default gateway is permitted. The DSCP (differentiated services

[minor] Again wondering why the default gateway adds value to the doc.

667 * A default gateway is permitted. The DSCP (differentiated services 668 code point) marking is set to DF (Default Forwarding) '000000' on 669 IPv4 Type of Service (ToS) field and IPv6 traffic class field. 671 * The server IP addresses SHOULD be distributed between IPv4 and 672 IPv6 with a ratio identical to the clients distribution ratio. 674 Note: The IANA has assigned IP address range for the testing purpose 675 as described in Section 8. If the test scenario requires more IP 676 addresses or subnets than the IANA assigned, this document recommends 677 using non routable Private IPv4 address ranges or Unique Local 678 Address (ULA) IPv6 address ranges for the testing.

[minor] Same note about moving these addressing recommendations out as in the client section.

680 4.3.2.3. HTTP / HTTPS Server Pool Endpoint Attributes 682 The server pool for HTTP SHOULD listen on TCP port 80 and emulate the 683 same HTTP version (HTTP 1.1 or HTTP/2 or HTTP/3) and settings chosen 684 by the client (emulated web browser). The Server MUST advertise 685 server type in the Server response header [RFC7230]. For HTTPS 686 server, TLS 1.2 or higher MUST be used with a maximum record size of 687 16 KByte and MUST NOT use ticket resumption or session ID reuse. The 688 server SHOULD listen on TCP port 443 for HTTP version 1.1 and 2. For 689 HTTP/3 (HTTP over QUIC) the server SHOULD listen on UDP 443. The 690 server SHALL serve a certificate to the client. The HTTPS server 691 MUST check host SNI information with the FQDN if SNI is in use. 692 Cipher suite and key size on the server side MUST be configured 693 similar to the client side configuration described in 694 Section 4.3.1.3. 696 4.3.3. Traffic Flow Definition 698 This section describes the traffic pattern between client and server 699 endpoints. At the beginning of the test, the server endpoint 700 initializes and will be ready to accept connection states including 701 initialization of the TCP stack as well as bound HTTP and HTTPS 702 servers. When a client endpoint is needed, it will initialize and be 703 given attributes such as a MAC and IP address. The behavior of the 704 client is to sweep through the given server IP space, generating a 705 recognizable service by the DUT. Sequential and pseudorandom sweep 706 methods are acceptable.
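[comment] If it helps implementers of test tools, a pseudorandom sweep can be as trivial as a seeded shuffle of the server pool - a sketch only, the helper name is made up (the addresses are from the benchmarking range 198.18.0.0/15):

  # Sketch of a reproducible pseudorandom sweep through the emulated
  # server pool. A fixed seed keeps the sweep order comparable between
  # test runs.
  import random

  def server_sweep(server_ips, seed=1):
      order = list(server_ips)
      random.Random(seed).shuffle(order)
      return order

  for dst in server_sweep(["198.18.1.10", "198.18.1.11", "198.18.1.12"]):
      pass  # open the connection(s) towards dst as described in Section 4.3.3.1

Stating the seed (or the resulting sweep order) in the report would further help reproducibility.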
The method used MUST be stated in the final 707 report. Thus, a balanced mesh between client endpoints and server 708 endpoints will be generated in a client IP and port to server IP and 709 port combination. Each client endpoint performs the same actions as 710 other endpoints, with the difference being the source IP of the 711 client endpoint and the target server IP pool. The client MUST use 712 the server IP address or FQDN in the host header [RFC7230]. [minor] given the prevalence of SNI centric server selection, i would suggest to change server IP to server FQDN and note that server IP is simply derived from server FQDN. Likewise server port is derived from server protocol, which seems to be just HTTP or HTTPS, so it's unclear to me where we would get ports different from 80 and 443 (maybe that's mentioned later). Aka: server port may not be relevant to mention. 714 4.3.3.1. Description of Intra-Client Behavior 716 Client endpoints are independent of other clients that are 717 concurrently executing. When a client endpoint initiates traffic, 718 this section describes how the client steps through different 719 services. Once the test is initialized, the client endpoints 720 randomly hold (perform no operation) for a few milliseconds for 721 better randomization of the start of client traffic. Each client 722 will either open a new TCP connection or connect to a TCP persistence 723 stack still open to that specific server. At any point that the 724 traffic profile may require encryption, a TLS encryption tunnel will 725 form presenting the URL or IP address request to the server. If 726 using SNI, the server MUST then perform an SNI name check with the 727 proposed FQDN compared to the domain embedded in the certificate. 728 Only when correct, will the server process the HTTPS response object. 729 The initial response object to the server is based on benchmarking 730 tests described in Section 7. Multiple additional sub-URLs (response 731 objects on the service page) MAY be requested simultaneously. This 732 MAY be to the same server IP as the initial URL. Each sub-object 733 will also use a canonical FQDN and URL path, as observed in the 734 traffic mix used. [minor] This may be necessary to keep the configuration complexity at bay, but in practice each particular IP client will likely exhibit quite different traffic profiles. One may continuously request HTTP video segments when streaming video. Another one may continuously do WebRTC (zoom), and the like. By having every client randomly do all the services (this is what i figure from the above description), you forego the important performance aspect of "worst hit client" if the DUT exhibits specific issues with specific services (false filtering, performance degradation etc.). IMHO it would be great if test equipment could create different client traffic profiles by segmentation of the possible application space into groups and then assign new clients randomly to groups. Besides making it easier to find performance issues, it also results in more real-world performance, which might be higher. For example in a multi-core CPU based DUT, there may be heuristics of assigning different clients' traffic to different CPU cores, so that L1..L3 cache of the CPU core can be better kept focused on the codespace for a particular type of client inspection. (just guessing). 736 4.3.4. Traffic Load Profile 738 The loading of traffic is described in this section.
The loading of 739 a traffic load profile has five phases: Init, ramp up, sustain, ramp 740 down, and collection. 742 1. Init phase: Testbed devices including the client and server 743 endpoints should negotiate layer 2-3 connectivity such as MAC 744 learning and ARP. Only after successful MAC learning or ARP/ND 745 resolution SHALL the test iteration move to the next phase. No 746 measurements are made in this phase. The minimum RECOMMENDED 747 time for Init phase is 5 seconds. During this phase, the 748 emulated clients SHOULD NOT initiate any sessions with the DUT/ 749 SUT, in contrast, the emulated servers should be ready to accept 750 requests from DUT/SUT or from emulated clients. 752 2. Ramp up phase: The test equipment SHOULD start to generate the 753 test traffic. It SHOULD use a set of the approximate number of 754 unique client IP addresses to generate traffic. The traffic 755 SHOULD ramp up from zero to desired target objective. The target 756 objective is defined for each benchmarking test. The duration 757 for the ramp up phase MUST be configured long enough that the 758 test equipment does not overwhelm the DUT/SUTs stated performance 759 metrics defined in Section 6.3 namely, TCP Connections Per 760 Second, Inspected Throughput, Concurrent TCP Connections, and 761 Application Transactions Per Second. No measurements are made in 762 this phase. 764 3. Sustain phase: Starts when all required clients are active and 765 operating at their desired load condition. In the sustain phase, 766 the test equipment SHOULD continue generating traffic to constant 767 target value for a constant number of active clients. The 768 minimum RECOMMENDED time duration for sustain phase is 300 769 seconds. This is the phase where measurements occur. The test 770 equipment SHOULD measure and record statistics continuously. The 771 sampling interval for collecting the raw results and calculating 772 the statistics SHOULD be less than 2 seconds. 774 4. Ramp down phase: No new connections are established, and no 775 measurements are made. The time duration for ramp up and ramp 776 down phase SHOULD be the same. 778 5. Collection phase: The last phase is administrative and will occur 779 when the test equipment merges and collates the report data. 781 5. Testbed Considerations 783 This section describes steps for a reference test (pre-test) that 784 control the test environment including test equipment, focusing on 785 physical and virtualized environments and as well as test equipments. 786 Below are the RECOMMENDED steps for the reference test. 788 1. Perform the reference test either by configuring the DUT/SUT in 789 the most trivial setup (fast forwarding) or without presence of [nit] Define/explain or provide reference for "fast forwarding". 790 the DUT/SUT. [minor] Is the DUT/SUT assumed to operate as a router or transparent L2 switch ? Asking because "or without presence" should (IMHO) be amended to mention that instead of the DUT one would put a router or switch in its place that is pre-loaded with a config equivalent to that of the DUT but without any security functions, just passing traffic at rates to bring the TE to its limits. 792 2. Generate traffic from traffic generator. Choose a traffic 793 profile used for HTTP or HTTPS throughput performance test with 794 smallest object size. 796 3. Ensure that any ancillary switching or routing functions added in 797 the test equipment does not limit the performance by introducing 798 network metrics such as packet loss and latency.
This is 799 specifically important for virtualized components (e.g., 800 vSwitches, vRouters). 802 4. Verify that the generated traffic (performance) of the test 803 equipment matches and reasonably exceeds the expected maximum 804 performance of the DUT/SUT. 806 5. Record the network performance metrics packet loss latency 807 introduced by the test environment (without DUT/SUT). 809 6. Assert that the testbed characteristics are stable during the 810 entire test session. Several factors might influence stability 811 specifically, for virtualized testbeds. For example, additional 812 workloads in a virtualized system, load balancing, and movement 813 of virtual machines during the test, or simple issues such as 814 additional heat created by high workloads leading to an emergency 815 CPU performance reduction. [minor] Add something to test the performance of the logging system. Without the DUT actually generating logging, this will so far not have been validated. Maybe TE can generate logging records ? Especially burst logging from DUT without loss is important to verify (no packet loss of logged events). 817 The reference test SHOULD be performed before the benchmarking tests 818 (described in section 7) start. 820 6. Reporting [minor] I would swap section 6 and 7, because it is problematic to read what's to be reported without knowing what's to be measured first. For example, when i read 6 first, it was not clear to me if/how you would test the performance limits, so the report data raised a lot of questions for me. Of course, when you do run the testbed, you should have read both sections first. 822 This section describes how the benchmarking test report should be 823 formatted and presented. It is RECOMMENDED to include two main 824 sections in the report, namely the introduction and the detailed test 825 results sections. 827 6.1. Introduction 829 The following attributes SHOULD be present in the introduction 830 section of the test report. [minor] I'd suggest to say here that the test report needs to include all information sufficient for independent third-party reproduction of the test setup to permit third-party falsification of the test results. This includes but may not be limited to the following... 832 1. The time and date of the execution of the tests 834 2. Summary of testbed software and hardware details 835 a. DUT/SUT hardware/virtual configuration 837 * This section SHOULD clearly identify the make and model of 838 the DUT/SUT 840 * The port interfaces, including speed and link information 842 * If the DUT/SUT is a Virtual Network Function (VNF), host 843 (server) hardware and software details, interface 844 acceleration type such as DPDK and SR-IOV, used CPU cores, 845 used RAM, resource sharing (e.g. Pinning details and NUMA 846 Node) configuration details, hypervisor version, virtual 847 switch version 849 * details of any additional hardware relevant to the DUT/SUT 850 such as controllers 852 b. DUT/SUT software 854 * Operating system name 856 * Version 858 * Specific configuration details (if any) [minor] Any software details necessary and sufficient to reproduce the software setup of DUT/SUT. 860 c. DUT/SUT enabled features 862 * Configured DUT/SUT features (see Table 1 and Table 2) 864 * Attributes of the above-mentioned features 866 * Any additional relevant information about the features 868 d.
Test equipment hardware and software 870 * Test equipment vendor name 872 * Hardware details including model number, interface type 874 * Test equipment firmware and test application software 875 version 877 e. Key test parameters 879 * Used cipher suites and keys 881 * IPv4 and IPv6 traffic distribution 882 * Number of configured ACL 884 f. Details of application traffic mix used in the benchmarking 885 test "Throughput Performance with Application Traffic Mix" 886 (Section 7.1) 888 * Name of applications and layer 7 protocols 890 * Percentage of emulated traffic for each application and 891 layer 7 protocols 893 * Percentage of encrypted traffic and used cipher suites and 894 keys (The RECOMMENDED ciphers and keys are defined in 895 Section 4.3.1.3) 897 * Used object sizes for each application and layer 7 898 protocols 900 3. Results Summary / Executive Summary 902 a. Results SHOULD resemble a pyramid in how it is reported, with 903 the introduction section documenting the summary of results 904 in a prominent, easy to read block. 906 6.2. Detailed Test Results 908 In the result section of the test report, the following attributes 909 SHOULD be present for each benchmarking test. 911 a. KPIs MUST be documented separately for each benchmarking test. 912 The format of the KPI metrics SHOULD be presented as described in 913 Section 6.3. 915 b. The next level of details SHOULD be graphs showing each of these 916 metrics over the duration (sustain phase) of the test. This 917 allows the user to see the measured performance stability changes 918 over time. 920 6.3. Benchmarks and Key Performance Indicators 922 This section lists key performance indicators (KPIs) for overall 923 benchmarking tests. All KPIs MUST be measured during the sustain 924 phase of the traffic load profile described in Section 4.3.4. All 925 KPIs MUST be measured from the result output of test equipment. [minor] At some other place in the document i seem to remember a mention of DUT self-reporting. Shouldn't then the self-reporting of the DUT be vetted as well, e.g.: compared against the TE report data ? 927 * Concurrent TCP Connections 928 The aggregate number of simultaneous connections between hosts 929 across the DUT/SUT, or between hosts and the DUT/SUT (defined in 930 [RFC2647]). [minor] Add reference to section in rfc2647 where this is defined. Also: If you refer but not reproduce 932 * TCP Connections Per Second 934 The average number of successfully established TCP connections per 935 second between hosts across the DUT/SUT, or between hosts and the 936 DUT/SUT. The TCP connection MUST be initiated via a TCP three-way 937 handshake (SYN, SYN/ACK, ACK). Then the TCP session data is sent. 938 The TCP session MUST be closed via either a TCP three-way close 939 (FIN, FIN/ACK, ACK), or a TCP four-way close (FIN, ACK, FIN, ACK), 940 and MUST NOT by RST. 942 * Application Transactions Per Second 944 The average number of successfully completed transactions per 945 second. For a particular transaction to be considered successful, 946 all data MUST have been transferred in its entirety. In case of 947 HTTP(S) transactions, it MUST have a valid status code (200 OK), 948 and the appropriate FIN, FIN/ACK sequence MUST have been 949 completed. 951 * TLS Handshake Rate 953 The average number of successfully established TLS connections per 954 second between hosts across the DUT/SUT, or between hosts and the 955 DUT/SUT.
957 * Inspected Throughput 959 The number of bits per second of examined and allowed traffic a 960 network security device is able to transmit to the correct 961 destination interface(s) in response to a specified offered load. 962 The throughput benchmarking tests defined in Section 7 SHOULD 963 measure the average Layer 2 throughput value when the DUT/SUT is 964 "inspecting" traffic. This document recommends presenting the 965 inspected throughput value in Gbit/s rounded to two places of 966 precision with a more specific Kbit/s in parenthesis. 968 * Time to First Byte (TTFB) 970 TTFB is the elapsed time between the start of sending the TCP SYN 971 packet from the client and the client receiving the first packet 972 of application data from the server or DUT/SUT. The benchmarking 973 tests HTTP Transaction Latency (Section 7.4) and HTTPS Transaction 974 Latency (Section 7.8) measure the minimum, average and maximum 975 TTFB. The value SHOULD be expressed in milliseconds. 977 * URL Response time / Time to Last Byte (TTLB) 979 URL Response time / TTLB is the elapsed time between the start of 980 sending the TCP SYN packet from the client and the client 981 receiving the last packet of application data from the server or 982 DUT/SUT. The benchmarking tests HTTP Transaction Latency 983 (Section 7.4) and HTTPS Transaction Latency (Section 7.8) measure 984 the minimum, average and maximum TTLB. The value SHOULD be 985 expressed in millisecond. [minor] Up to this point i don't think the report would include a comparison of these KPIs between no-DUT-present vs. DUT-present. Is that true ? How then is the reader of the report meant to be able to vet the relative impact of the DUT for all these metrics vs. DUT not being present ? 987 7. Benchmarking Tests [minor] I think it would be good to insert here some descriptive and comparative overview of the tests from the different 7.x sections. For example, i guess (but don't know from the text), that the 7.1 test should (?) perform a throughput test for non-HTTP/HTTPS applications, or else if all the applications in 7.1 would be http/https, then it would duplicate the results of 7.3 and 7.7, right ? Not sure though if/where it is written out that you therefore want a traffic mix of only non-HTTP/HTTPS application traffic for 7.1. If instead the customer relevant application mix (7.1.1) does include some percentage of HTTP/HTTPS applications, then shouldn't all the tests, even those focusing on the HTTP/HTTPS characteristics also always include the non-HTTP/HTTPS application flows as a kind of "background" traffic, even if not measured in the tests of a particular 7.x sub-section ? [minor] Section 7. is a lot of work to get right. I observe that there is a lot of procedural replication across the steps. It would be easier to read if all that duplication was removed and described once - such as the initial/max/iterative step description. But i can understand how much work this might be, to then extract only the differences for each 7.x and describe only those 7.x differences there. 989 7.1. Throughput Performance with Application Traffic Mix 991 7.1.1. Objective 993 Using a relevant application traffic mix, determine the sustainable 994 inspected throughput supported by the DUT/SUT. 996 Based on the test customer's specific use case, testers can choose 997 the relevant application traffic mix for this test. The details 998 about the traffic mix MUST be documented in the report.
At least the 999 following traffic mix details MUST be documented and reported 1000 together with the test results: 1002 Name of applications and layer 7 protocols 1004 Percentage of emulated traffic for each application and layer 7 1005 protocol 1007 Percentage of encrypted traffic and used cipher suites and keys 1008 (The RECOMMENDED ciphers and keys are defined in Section 4.3.1.3.) 1010 Used object sizes for each application and layer 7 protocols 1012 7.1.2. Test Setup 1014 Testbed setup MUST be configured as defined in Section 4. Any 1015 benchmarking test specific testbed configuration changes MUST be 1016 documented. 1018 7.1.3. Test Parameters 1020 In this section, the benchmarking test specific parameters SHOULD be 1021 defined. 1023 7.1.3.1. DUT/SUT Configuration Parameters 1025 DUT/SUT parameters MUST conform to the requirements defined in 1026 Section 4.2. Any configuration changes for this specific 1027 benchmarking test MUST be documented. In case the DUT/SUT is 1028 configured without SSL inspection, the test report MUST explain the 1029 implications of this to the relevant application traffic mix 1030 encrypted traffic. [nit] /SSL inspection/SSL Inspection/ - capitalized in all other places in the doc. [minor] I am not quite familiar with the details, so i hope a reader knows what the "MUST explain the implication" means. [minor] What is the equivalent for TLS (inspection), and why is it not equally mentioned ? 1032 7.1.3.2. Test Equipment Configuration Parameters 1034 Test equipment configuration parameters MUST conform to the 1035 requirements defined in Section 4.3. The following parameters MUST 1036 be documented for this benchmarking test: 1038 Client IP address range defined in Section 4.3.1.2 1040 Server IP address range defined in Section 4.3.2.2 1042 Traffic distribution ratio between IPv4 and IPv6 defined in 1043 Section 4.3.1.2 1045 Target inspected throughput: Aggregated line rate of interface(s) 1046 used in the DUT/SUT or the value defined based on requirement for 1047 a specific deployment scenario [minor] maybe add: or based on DUT-specified performance limits (DUT may not always provide "linerate" throughput, so the ultimate test would be to see if/how much of the vendor-promised performance is reachable). 1049 Initial throughput: 10% of the "Target inspected throughput" Note: 1050 Initial throughput is not a KPI to report. This value is 1051 configured on the traffic generator and used to perform Step 1: 1052 "Test Initialization and Qualification" described under the 1053 Section 7.1.4. 1055 One of the ciphers and keys defined in Section 4.3.1.3 are 1056 RECOMMENDED to use for this benchmarking test. 1058 7.1.3.3. Traffic Profile 1060 Traffic profile: This test MUST be run with a relevant application 1061 traffic mix profile. 1063 7.1.3.4. Test Results Validation Criteria 1065 The following criteria are the test results validation criteria. The 1066 test results validation criteria MUST be monitored during the whole 1067 sustain phase of the traffic load profile. 1069 a. Number of failed application transactions (receiving any HTTP 1070 response code other than 200 OK) MUST be less than 0.001% (1 out 1071 of 100,000 transactions) of total attempted transactions. [minor] So this is the right number, as opposed to the 0.01% in A.4... If you don't intend to fix A.4 (requested there), pls. explain the reason for the difference. 1073 b.
Number of Terminated TCP connections due to unexpected TCP RST 1074 sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000 1075 connections) of total initiated TCP connections. 1077 7.1.3.5. Measurement 1079 Following KPI metrics MUST be reported for this benchmarking test: 1081 Mandatory KPIs (benchmarks): Inspected Throughput, TTFB (minimum, 1082 average, and maximum), TTLB (minimum, average, and maximum) and 1083 Application Transactions Per Second 1085 Note: TTLB MUST be reported along with the object size used in the 1086 traffic profile. 1088 Optional KPIs: TCP Connections Per Second and TLS Handshake Rate [minor] I would prefer for TCP connections to be mandatory too. Makes it easier to communicate test data with lower-layer folks. For example, network layer equipment often has per-5-tuple flow state, also with build/churn-rate limits, so to match a security SUT with the other networking equipment this TCP connection rate is quite important. 1090 7.1.4. Test Procedures and Expected Results 1092 The test procedures are designed to measure the inspected throughput 1093 performance of the DUT/SUT at the sustaining period of traffic load 1094 profile. The test procedure consists of three major steps: Step 1 1095 ensures the DUT/SUT is able to reach the performance value (initial 1096 throughput) and meets the test results validation criteria when it 1097 was very minimally utilized. Step 2 determines the DUT/SUT is able 1098 to reach the target performance value within the test results 1099 validation criteria. Step 3 determines the maximum achievable 1100 performance value within the test results validation criteria. 1102 This test procedure MAY be repeated multiple times with different IP 1103 types: IPv4 only, IPv6 only, and IPv4 and IPv6 mixed traffic 1104 distribution. 1106 7.1.4.1. Step 1: Test Initialization and Qualification 1108 Verify the link status of all connected physical interfaces. All 1109 interfaces are expected to be in "UP" status. 1111 Configure traffic load profile of the test equipment to generate test 1112 traffic at the "Initial throughput" rate as described in 1113 Section 7.1.3.2. The test equipment SHOULD follow the traffic load 1114 profile definition as described in Section 4.3.4. The DUT/SUT SHOULD 1115 reach the "Initial throughput" during the sustain phase. Measure all 1116 KPI as defined in Section 7.1.3.5. The measured KPIs during the 1117 sustain phase MUST meet all the test results validation criteria 1118 defined in Section 7.1.3.4. 1120 If the KPI metrics do not meet the test results validation criteria, 1121 the test procedure MUST NOT be continued to step 2. 1123 7.1.4.2. Step 2: Test Run with Target Objective 1125 Configure test equipment to generate traffic at the "Target inspected 1126 throughput" rate defined in Section 7.1.3.2. The test equipment 1127 SHOULD follow the traffic load profile definition as described in 1128 Section 4.3.4. The test equipment SHOULD start to measure and record 1129 all specified KPIs. Continue the test until all traffic profile 1130 phases are completed. 1132 Within the test results validation criteria, the DUT/SUT is expected 1133 to reach the desired value of the target objective ("Target inspected 1134 throughput") in the sustain phase. Follow step 3, if the measured 1135 value does not meet the target value or does not fulfill the test 1136 results validation criteria. 1138 7.1.4.3.
Step 3: Test Iteration 1140 Determine the achievable average inspected throughput within the test 1141 results validation criteria. Final test iteration MUST be performed 1142 for the test duration defined in Section 4.3.4. 1144 7.2. TCP/HTTP Connections Per Second 1146 7.2.1. Objective 1148 Using HTTP traffic, determine the sustainable TCP connection 1149 establishment rate supported by the DUT/SUT under different 1150 throughput load conditions. 1152 To measure connections per second, test iterations MUST use different 1153 fixed HTTP response object sizes (the different load conditions) 1154 defined in Section 7.2.3.2. 1156 7.2.2. Test Setup 1158 Testbed setup SHOULD be configured as defined in Section 4. Any 1159 specific testbed configuration changes (number of interfaces and 1160 interface type, etc.) MUST be documented. 1162 7.2.3. Test Parameters 1164 In this section, benchmarking test specific parameters SHOULD be 1165 defined. 1167 7.2.3.1. DUT/SUT Configuration Parameters 1169 DUT/SUT parameters MUST conform to the requirements defined in 1170 Section 4.2. Any configuration changes for this specific 1171 benchmarking test MUST be documented. 1173 7.2.3.2. Test Equipment Configuration Parameters 1175 Test equipment configuration parameters MUST conform to the 1176 requirements defined in Section 4.3. The following parameters MUST 1177 be documented for this benchmarking test: 1179 Client IP address range defined in Section 4.3.1.2 1181 Server IP address range defined in Section 4.3.2.2 1183 Traffic distribution ratio between IPv4 and IPv6 defined in 1184 Section 4.3.1.2 1186 Target connections per second: Initial value from product datasheet 1187 or the value defined based on requirement for a specific deployment 1188 scenario 1190 Initial connections per second: 10% of "Target connections per 1191 second" (Note: Initial connections per second is not a KPI to report. 1192 This value is configured on the traffic generator and used to perform 1193 the Step1: "Test Initialization and Qualification" described under 1194 the Section 7.2.4. 1196 The client SHOULD negotiate HTTP and close the connection with FIN 1197 immediately after completion of one transaction. In each test 1198 iteration, client MUST send GET request requesting a fixed HTTP 1199 response object size. 1201 The RECOMMENDED response object sizes are 1, 2, 4, 16, and 64 KByte. 1203 7.2.3.3. Test Results Validation Criteria 1205 The following criteria are the test results validation criteria. The 1206 Test results validation criteria MUST be monitored during the whole 1207 sustain phase of the traffic load profile. 1209 a. Number of failed application transactions (receiving any HTTP 1210 response code other than 200 OK) MUST be less than 0.001% (1 out 1211 of 100,000 transactions) of total attempted transactions. 1213 b. Number of terminated TCP connections due to unexpected TCP RST 1214 sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000 1215 connections) of total initiated TCP connections. 1217 c. During the sustain phase, traffic SHOULD be forwarded at a 1218 constant rate (considered as a constant rate if any deviation of 1219 traffic forwarding rate is less than 5%). 1221 d. Concurrent TCP connections MUST be constant during steady state 1222 and any deviation of concurrent TCP connections SHOULD be less 1223 than 10%. This confirms the DUT opens and closes TCP connections 1224 at approximately the same rate. 1226 7.2.3.4. 
Measurement 1228 TCP Connections Per Second MUST be reported for each test iteration 1229 (for each object size). [minor] Add variance or min/max rates to the report in case the problem in point d above (line 1221) does exist ? 1231 7.2.4. Test Procedures and Expected Results 1233 The test procedure is designed to measure the TCP connections per 1234 second rate of the DUT/SUT at the sustaining period of the traffic 1235 load profile. The test procedure consists of three major steps: Step 1236 1 ensures the DUT/SUT is able to reach the performance value (Initial 1237 connections per second) and meets the test results validation 1238 criteria when it was very minimally utilized. Step 2 determines the 1239 DUT/SUT is able to reach the target performance value within the test 1240 results validation criteria. Step 3 determines the maximum 1241 achievable performance value within the test results validation 1242 criteria. 1244 This test procedure MAY be repeated multiple times with different IP 1245 types: IPv4 only, IPv6 only, and IPv4 and IPv6 mixed traffic 1246 distribution. 1248 7.2.4.1. Step 1: Test Initialization and Qualification 1250 Verify the link status of all connected physical interfaces. All 1251 interfaces are expected to be in "UP" status. 1253 Configure the traffic load profile of the test equipment to establish 1254 "Initial connections per second" as defined in Section 7.2.3.2. The 1255 traffic load profile SHOULD be defined as described in Section 4.3.4. 1257 The DUT/SUT SHOULD reach the "Initial connections per second" before 1258 the sustain phase. The measured KPIs during the sustain phase MUST 1259 meet all the test results validation criteria defined in 1260 Section 7.2.3.3. 1262 If the KPI metrics do not meet the test results validation criteria, 1263 the test procedure MUST NOT continue to "Step 2". 1265 7.2.4.2. Step 2: Test Run with Target Objective 1267 Configure test equipment to establish the target objective ("Target 1268 connections per second") defined in Section 7.2.3.2. The test 1269 equipment SHOULD follow the traffic load profile definition as 1270 described in Section 4.3.4. 1272 During the ramp up and sustain phase of each test iteration, other 1273 KPIs such as inspected throughput, concurrent TCP connections and 1274 application transactions per second MUST NOT reach the maximum value 1275 the DUT/SUT can support. The test results for specific test 1276 iterations SHOULD NOT be reported, if the above-mentioned KPI 1277 (especially inspected throughput) reaches the maximum value. 1278 (Example: If the test iteration with 64 KByte of HTTP response object 1279 size reached the maximum inspected throughput limitation of the DUT/ 1280 SUT, the test iteration MAY be interrupted and the result for 64 1281 KByte SHOULD NOT be reported.) 1283 The test equipment SHOULD start to measure and record all specified 1284 KPIs. Continue the test until all traffic profile phases are 1285 completed. 1287 Within the test results validation criteria, the DUT/SUT is expected 1288 to reach the desired value of the target objective ("Target 1289 connections per second") in the sustain phase. Follow step 3, if the 1290 measured value does not meet the target value or does not fulfill the 1291 test results validation criteria. 1293 7.2.4.3. Step 3: Test Iteration 1295 Determine the achievable TCP connections per second within the test 1296 results validation criteria. 1298 7.3. HTTP Throughput 1300 7.3.1.
Objective 1302 Determine the sustainable inspected throughput of the DUT/SUT for 1303 HTTP transactions varying the HTTP response object size. [nit] High level, what is the difference between 7.2 and 7.3 ? Some more explanation would be useful. One interpretation i came up with is that 7.2 measures performance of e.g.: HTTP connections where each connection performs a single GET, and 7.3 measures long-lived HTTP connections in which a high rate of HTTP GET is performed (so as to differentiate transactions at TCP+HTTP level (7.2) from those only happening at HTTP level (7.3)). If that is a lucky guess it might help other similarly guessing readers to write this out more explicitly. 1305 7.3.2. Test Setup 1307 Testbed setup SHOULD be configured as defined in Section 4. Any 1308 specific testbed configuration changes (number of interfaces and 1309 interface type, etc.) MUST be documented. 1311 7.3.3. Test Parameters 1313 In this section, benchmarking test specific parameters SHOULD be 1314 defined. 1316 7.3.3.1. DUT/SUT Configuration Parameters 1318 DUT/SUT parameters MUST conform to the requirements defined in 1319 Section 4.2. Any configuration changes for this specific 1320 benchmarking test MUST be documented. 1322 7.3.3.2. Test Equipment Configuration Parameters 1324 Test equipment configuration parameters MUST conform to the 1325 requirements defined in Section 4.3. The following parameters MUST 1326 be documented for this benchmarking test: 1328 Client IP address range defined in Section 4.3.1.2 1330 Server IP address range defined in Section 4.3.2.2 1332 Traffic distribution ratio between IPv4 and IPv6 defined in 1333 Section 4.3.1.2 1335 Target inspected throughput: Aggregated line rate of interface(s) 1336 used in the DUT/SUT or the value defined based on requirement for a 1337 specific deployment scenario 1338 Initial throughput: 10% of "Target inspected throughput" Note: 1339 Initial throughput is not a KPI to report. This value is configured 1340 on the traffic generator and used to perform Step 1: "Test 1341 Initialization and Qualification" described under Section 7.3.4. 1343 Number of HTTP response object requests (transactions) per 1344 connection: 10 1346 RECOMMENDED HTTP response object size: 1, 16, 64, 256 KByte, and 1347 mixed objects defined in Table 4. 1349 +=====================+============================+ 1350 | Object size (KByte) | Number of requests/ Weight | 1351 +=====================+============================+ 1352 | 0.2 | 1 | 1353 +---------------------+----------------------------+ 1354 | 6 | 1 | 1355 +---------------------+----------------------------+ 1356 | 8 | 1 | 1357 +---------------------+----------------------------+ 1358 | 9 | 1 | 1359 +---------------------+----------------------------+ 1360 | 10 | 1 | 1361 +---------------------+----------------------------+ 1362 | 25 | 1 | 1363 +---------------------+----------------------------+ 1364 | 26 | 1 | 1365 +---------------------+----------------------------+ 1366 | 35 | 1 | 1367 +---------------------+----------------------------+ 1368 | 59 | 1 | 1369 +---------------------+----------------------------+ 1370 | 347 | 1 | 1371 +---------------------+----------------------------+ 1373 Table 4: Mixed Objects [minor] Interesting/useful data. If there was any reference/explanation of how these numbers were derived, that would be great to add. 1375 7.3.3.3. Test Results Validation Criteria 1377 The following criteria are the test results validation criteria.
The 1378 test results validation criteria MUST be monitored during the whole 1379 sustain phase of the traffic load profile. 1381 a. Number of failed application transactions (receiving any HTTP 1382 response code other than 200 OK) MUST be less than 0.001% (1 out 1383 of 100,000 transactions) of attempt transactions. 1385 b. Traffic SHOULD be forwarded at a constant rate (considered as a 1386 constant rate if any deviation of traffic forwarding rate is less 1387 than 5%). 1389 c. Concurrent TCP connections MUST be constant during steady state 1390 and any deviation of concurrent TCP connections SHOULD be less 1391 than 10%. This confirms the DUT opens and closes TCP connections 1392 at approximately the same rate. 1394 7.3.3.4. Measurement 1396 Inspected Throughput and HTTP Transactions per Second MUST be 1397 reported for each object size. 1399 7.3.4. Test Procedures and Expected Results 1401 The test procedure is designed to measure HTTP throughput of the DUT/ 1402 SUT. The test procedure consists of three major steps: Step 1 1403 ensures the DUT/SUT is able to reach the performance value (Initial 1404 throughput) and meets the test results validation criteria when it 1405 was very minimal utilized. Step 2 determines the DUT/SUT is able to 1406 reach the target performance value within the test results validation 1407 criteria. Step 3 determines the maximum achievable performance value 1408 within the test results validation criteria. 1410 This test procedure MAY be repeated multiple times with different 1411 IPv4 and IPv6 traffic distribution and HTTP response object sizes. 1413 7.3.4.1. Step 1: Test Initialization and Qualification 1415 Verify the link status of all connected physical interfaces. All 1416 interfaces are expected to be in "UP" status. 1418 Configure traffic load profile of the test equipment to establish 1419 "Initial inspected throughput" as defined in Section 7.3.3.2. 1421 The traffic load profile SHOULD be defined as described in 1422 Section 4.3.4. The DUT/SUT SHOULD reach the "Initial inspected 1423 throughput" during the sustain phase. Measure all KPI as defined in 1424 Section 7.3.3.4. 1426 The measured KPIs during the sustain phase MUST meet the test results 1427 validation criteria "a" defined in Section 7.3.3.3. The test results 1428 validation criteria "b" and "c" are OPTIONAL for step 1. 1430 If the KPI metrics do not meet the test results validation criteria, 1431 the test procedure MUST NOT be continued to "Step 2". 1433 7.3.4.2. Step 2: Test Run with Target Objective 1435 Configure test equipment to establish the target objective ("Target 1436 inspected throughput") defined in Section 7.3.3.2. The test 1437 equipment SHOULD start to measure and record all specified KPIs. 1438 Continue the test until all traffic profile phases are completed. 1440 Within the test results validation criteria, the DUT/SUT is expected 1441 to reach the desired value of the target objective in the sustain 1442 phase. Follow step 3, if the measured value does not meet the target 1443 value or does not fulfill the test results validation criteria. 1445 7.3.4.3. Step 3: Test Iteration 1447 Determine the achievable inspected throughput within the test results 1448 validation criteria and measure the KPI metric Transactions per 1449 Second. Final test iteration MUST be performed for the test duration 1450 defined in Section 4.3.4. 1452 7.4. 
HTTP Transaction Latency [nit] It would be nice to have explanatory text explaining why 7.4 requires different test runs as opposed to just measuring the transaction latency as part of 7.2 and 7.3. I have not tried to compare in detail the descriptions here to figure out the differences in test runs, but even if there are differences, why would transaction latency not also be measured in 7.2 and 7.3 as a metric ? 1454 7.4.1. Objective 1456 Using HTTP traffic, determine the HTTP transaction latency when DUT 1457 is running with sustainable HTTP transactions per second supported by 1458 the DUT/SUT under different HTTP response object sizes. 1460 Test iterations MUST be performed with different HTTP response object 1461 sizes in two different scenarios. One with a single transaction and 1462 the other with multiple transactions within a single TCP connection. 1463 For consistency both the single and multiple transaction test MUST be 1464 configured with the same HTTP version 1466 Scenario 1: The client MUST negotiate HTTP and close the connection 1467 with FIN immediately after completion of a single transaction (GET 1468 and RESPONSE). 1470 Scenario 2: The client MUST negotiate HTTP and close the connection 1471 FIN immediately after completion of 10 transactions (GET and 1472 RESPONSE) within a single TCP connection. 1474 7.4.2. Test Setup 1476 Testbed setup SHOULD be configured as defined in Section 4. Any 1477 specific testbed configuration changes (number of interfaces and 1478 interface type, etc.) MUST be documented. 1480 7.4.3. Test Parameters 1482 In this section, benchmarking test specific parameters SHOULD be 1483 defined. 1485 7.4.3.1. DUT/SUT Configuration Parameters 1487 DUT/SUT parameters MUST conform to the requirements defined in 1488 Section 4.2. Any configuration changes for this specific 1489 benchmarking test MUST be documented. 1491 7.4.3.2. Test Equipment Configuration Parameters 1493 Test equipment configuration parameters MUST conform to the 1494 requirements defined in Section 4.3. The following parameters MUST 1495 be documented for this benchmarking test: 1497 Client IP address range defined in Section 4.3.1.2 1499 Server IP address range defined in Section 4.3.2.2 1501 Traffic distribution ratio between IPv4 and IPv6 defined in 1502 Section 4.3.1.2 1504 Target objective for scenario 1: 50% of the connections per second 1505 measured in benchmarking test TCP/HTTP Connections Per Second 1506 (Section 7.2) 1508 Target objective for scenario 2: 50% of the inspected throughput 1509 measured in benchmarking test HTTP Throughput (Section 7.3) 1511 Initial objective for scenario 1: 10% of "Target objective for 1512 scenario 1" 1514 Initial objective for scenario 2: 10% of "Target objective for 1515 scenario 2" 1517 Note: The Initial objectives are not a KPI to report. These values 1518 are configured on the traffic generator and used to perform the 1519 Step1: "Test Initialization and Qualification" described under the 1520 Section 7.4.4. 1522 HTTP transaction per TCP connection: Test scenario 1 with single 1523 transaction and test scenario 2 with 10 transactions. 1525 HTTP with GET request requesting a single object. The RECOMMENDED 1526 object sizes are 1, 16, and 64 KByte. For each test iteration, 1527 client MUST request a single HTTP response object size. 1529 7.4.3.3. Test Results Validation Criteria 1531 The following criteria are the test results validation criteria. 
The 1532 Test results validation criteria MUST be monitored during the whole 1533 sustain phase of the traffic load profile. 1535 a. Number of failed application transactions (receiving any HTTP 1536 response code other than 200 OK) MUST be less than 0.001% (1 out 1537 of 100,000 transactions) of attempt transactions. 1539 b. Number of terminated TCP connections due to unexpected TCP RST 1540 sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000 1541 connections) of total initiated TCP connections. 1543 c. During the sustain phase, traffic SHOULD be forwarded at a 1544 constant rate (considered as a constant rate if any deviation of 1545 traffic forwarding rate is less than 5%). 1547 d. Concurrent TCP connections MUST be constant during steady state 1548 and any deviation of concurrent TCP connections SHOULD be less 1549 than 10%. This confirms the DUT opens and closes TCP connections 1550 at approximately the same rate. 1552 e. After ramp up the DUT MUST achieve the "Target objective" defined 1553 in Section 7.4.3.2 and remain in that state for the entire test 1554 duration (sustain phase). 1556 7.4.3.4. Measurement 1558 TTFB (minimum, average, and maximum) and TTLB (minimum, average and 1559 maximum) MUST be reported for each object size. 1561 7.4.4. Test Procedures and Expected Results 1563 The test procedure is designed to measure TTFB or TTLB when the DUT/ 1564 SUT is operating close to 50% of its maximum achievable connections 1565 per second or inspected throughput. The test procedure consists of 1566 two major steps: Step 1 ensures the DUT/SUT is able to reach the 1567 initial performance values and meets the test results validation 1568 criteria when it was very minimally utilized. Step 2 measures the 1569 latency values within the test results validation criteria. 1571 This test procedure MAY be repeated multiple times with different IP 1572 types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic 1573 distribution), HTTP response object sizes and single and multiple 1574 transactions per connection scenarios. 1576 7.4.4.1. Step 1: Test Initialization and Qualification 1578 Verify the link status of all connected physical interfaces. All 1579 interfaces are expected to be in "UP" status. 1581 Configure traffic load profile of the test equipment to establish 1582 "Initial objective" as defined in Section 7.4.3.2. The traffic load 1583 profile SHOULD be defined as described in Section 4.3.4. 1585 The DUT/SUT SHOULD reach the "Initial objective" before the sustain 1586 phase. The measured KPIs during the sustain phase MUST meet all the 1587 test results validation criteria defined in Section 7.4.3.3. 1589 If the KPI metrics do not meet the test results validation criteria, 1590 the test procedure MUST NOT be continued to "Step 2". 1592 7.4.4.2. Step 2: Test Run with Target Objective 1594 Configure test equipment to establish "Target objective" defined in 1595 Section 7.4.3.2. The test equipment SHOULD follow the traffic load 1596 profile definition as described in Section 4.3.4. 1598 The test equipment SHOULD start to measure and record all specified 1599 KPIs. Continue the test until all traffic profile phases are 1600 completed. 1602 Within the test results validation criteria, the DUT/SUT MUST reach 1603 the desired value of the target objective in the sustain phase. 1605 Measure the minimum, average, and maximum values of TTFB and TTLB. 1607 7.5. Concurrent TCP/HTTP Connection Capacity [nit] again a summary comparison of the traffic in 7.5 vs. 
the prior traffic profiles would be helpful to understand the benefit of these test runs. Is this about any real-world requirement or more a synthetic performance number for unrealistic HTTP connections (which would still be a useful number IMHO, just want to know) ? The traffic profile below is somewhat strange because it defines the rate of GET within a TCP connection based not on real-world application behavior, but just to create some rate of GET per TCP connection over the steady state. I guess the goal is something like "measure the maximum sustainable number of TCP/HTTP connections, whereas each connection carries as little as possible traffic and a sufficiently low number of HTTP (GET) transactions that the DUT is not too heavily loaded with the HTTP-level inspection, but mostly with HTTP/TCP flow maintenance"?? In general, describing for each of the 7.x sections upfront the goal and design criteria of the test runs in those high-level terms is IMHO very beneficial for reviewers to vet if/how well the detailed description does meet the goals. Otherwise one is somewhat left puzzling about that question. Aka: enhance the 7.x.1 objective sections with that amount of details. 1609 7.5.1. Objective 1611 Determine the number of concurrent TCP connections that the DUT/ SUT 1612 sustains when using HTTP traffic. 1614 7.5.2. Test Setup 1616 Testbed setup SHOULD be configured as defined in Section 4. Any 1617 specific testbed configuration changes (number of interfaces and 1618 interface type, etc.) MUST be documented. 1620 7.5.3. Test Parameters 1622 In this section, benchmarking test specific parameters SHOULD be 1623 defined. 1625 7.5.3.1. DUT/SUT Configuration Parameters 1627 DUT/SUT parameters MUST conform to the requirements defined in 1628 Section 4.2. Any configuration changes for this specific 1629 benchmarking test MUST be documented. 1631 7.5.3.2. Test Equipment Configuration Parameters 1633 Test equipment configuration parameters MUST conform to the 1634 requirements defined in Section 4.3. The following parameters MUST 1635 be noted for this benchmarking test: 1637 Client IP address range defined in Section 4.3.1.2 1639 Server IP address range defined in Section 4.3.2.2 1641 Traffic distribution ratio between IPv4 and IPv6 defined in 1642 Section 4.3.1.2 1644 Target concurrent connection: Initial value from product datasheet 1645 or the value defined based on requirement for a specific 1646 deployment scenario. 1648 Initial concurrent connection: 10% of "Target concurrent 1649 connection" Note: Initial concurrent connection is not a KPI to 1650 report. This value is configured on the traffic generator and 1651 used to perform the Step1: "Test Initialization and Qualification" 1652 described under the Section 7.5.4. 1654 Maximum connections per second during ramp up phase: 50% of 1655 maximum connections per second measured in benchmarking test TCP/ 1656 HTTP Connections per second (Section 7.2) 1658 Ramp up time (in traffic load profile for "Target concurrent 1659 connection"): "Target concurrent connection" / "Maximum 1660 connections per second during ramp up phase" 1662 Ramp up time (in traffic load profile for "Initial concurrent 1663 connection"): "Initial concurrent connection" / "Maximum 1664 connections per second during ramp up phase" 1666 The client MUST negotiate HTTP and each client MAY open multiple 1667 concurrent TCP connections per server endpoint IP.
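[example] A small worked example of the two "Ramp up time" formulas above (my own numbers, purely hypothetical, not recommendations from the draft):

    # Python sketch: ramp up time derivation for the Section 7.5.3.2 parameters.
    target_concurrent_connections  = 1_000_000   # e.g. from the product datasheet
    initial_concurrent_connections = 0.10 * target_concurrent_connections   # 10% of target
    max_cps_during_ramp_up         = 50_000      # 50% of the max CPS measured in Section 7.2

    # Ramp up time = concurrent connection objective / max connections per second
    ramp_up_time_target  = target_concurrent_connections  / max_cps_during_ramp_up   # 20 seconds
    ramp_up_time_initial = initial_concurrent_connections / max_cps_during_ramp_up   # 2 seconds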
1669 Each client sends 10 GET requests requesting 1 KByte HTTP response 1670 object in the same TCP connection (10 transactions/TCP connection) 1671 and the delay (think time) between each transaction MUST be X 1672 seconds. 1674 X = ("Ramp up time" + "steady state time") /10 1676 The established connections SHOULD remain open until the ramp down 1677 phase of the test. During the ramp down phase, all connections 1678 SHOULD be successfully closed with FIN. 1680 7.5.3.3. Test Results Validation Criteria 1682 The following criteria are the test results validation criteria. The 1683 Test results validation criteria MUST be monitored during the whole 1684 sustain phase of the traffic load profile. 1686 a. Number of failed application transactions (receiving any HTTP 1687 response code other than 200 OK) MUST be less than 0.001% (1 out 1688 of 100,000 transaction) of total attempted transactions. 1690 b. Number of terminated TCP connections due to unexpected TCP RST 1691 sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000 1692 connections) of total initiated TCP connections. 1694 c. During the sustain phase, traffic SHOULD be forwarded at a 1695 constant rate (considered as a constant rate if any deviation of 1696 traffic forwarding rate is less than 5%). 1698 7.5.3.4. Measurement 1700 Average Concurrent TCP Connections MUST be reported for this 1701 benchmarking test. 1703 7.5.4. Test Procedures and Expected Results 1705 The test procedure is designed to measure the concurrent TCP 1706 connection capacity of the DUT/SUT at the sustaining period of 1707 traffic load profile. The test procedure consists of three major 1708 steps: Step 1 ensures the DUT/SUT is able to reach the performance 1709 value (Initial concurrent connection) and meets the test results 1710 validation criteria when it was very minimally utilized. Step 2 1711 determines the DUT/SUT is able to reach the target performance value 1712 within the test results validation criteria. Step 3 determines the 1713 maximum achievable performance value within the test results 1714 validation criteria. 1716 This test procedure MAY be repeated multiple times with different 1717 IPv4 and IPv6 traffic distribution. 1719 7.5.4.1. Step 1: Test Initialization and Qualification 1721 Verify the link status of all connected physical interfaces. All 1722 interfaces are expected to be in "UP" status. 1724 Configure test equipment to establish "Initial concurrent TCP 1725 connections" defined in Section 7.5.3.2. Except ramp up time, the 1726 traffic load profile SHOULD be defined as described in Section 4.3.4. 1728 During the sustain phase, the DUT/SUT SHOULD reach the "Initial 1729 concurrent TCP connections". The measured KPIs during the sustain 1730 phase MUST meet all the test results validation criteria defined in 1731 Section 7.5.3.3. 1733 If the KPI metrics do not meet the test results validation criteria, 1734 the test procedure MUST NOT be continued to "Step 2". 1736 7.5.4.2. Step 2: Test Run with Target Objective 1738 Configure test equipment to establish the target objective ("Target 1739 concurrent TCP connections"). The test equipment SHOULD follow the 1740 traffic load profile definition (except ramp up time) as described in 1741 Section 4.3.4. 1743 During the ramp up and sustain phase, the other KPIs such as 1744 inspected throughput, TCP connections per second, and application 1745 transactions per second MUST NOT reach the maximum value the DUT/SUT 1746 can support. 
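[example] A minimal sketch (my assumptions: per-interval samples and DUT/SUT maxima are available; names and values are hypothetical) of how test equipment or a post-processing script could verify that the secondary KPIs stay below the DUT/SUT maxima during ramp up and sustain, so that a concurrent-connection result remains reportable:

    # Python sketch: guard check for the "MUST NOT reach the maximum value" condition.
    DUT_MAX = {
        "inspected_throughput_gbps":   40.0,     # hypothetical datasheet value
        "tcp_connections_per_second":  100_000,  # e.g. measured in Section 7.2
        "app_transactions_per_second": 500_000,
    }

    def secondary_kpis_below_max(samples):
        """samples: list of dicts keyed like DUT_MAX, one per sampling interval."""
        for sample in samples:
            for kpi, maximum in DUT_MAX.items():
                if sample[kpi] >= maximum:
                    return False    # iteration result should not be reported
        return True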
1748 The test equipment SHOULD start to measure and record KPIs defined in 1749 Section 7.5.3.4. Continue the test until all traffic profile phases 1750 are completed. 1752 Within the test results validation criteria, the DUT/SUT is expected 1753 to reach the desired value of the target objective in the sustain 1754 phase. Follow step 3, if the measured value does not meet the target 1755 value or does not fulfill the test results validation criteria. 1757 7.5.4.3. Step 3: Test Iteration 1759 Determine the achievable concurrent TCP connections capacity within 1760 the test results validation criteria. 1762 7.6. TCP/HTTPS Connections per Second [minor] The one big performance factor that i think is not documented or suggested to be compared is the cost of certificate (chain) validation for different key-length certificates used for the TCP/HTTPS connections. The parameters for TLS 1.2 and TLS 1.3 mentioned before in the document do not cover that. I think it would be prudent to figure out an Internet common minimum (fastest to process) certificate and a common maximum complexity certificate. The latter one may simply be when revocation is enabled, e.g.: checking the server certificate against a revocation list. Just saying because server certificate verification may monopolise connection setup performance - unless you want to make the argument that it is irrelevant because due to the limited number of servers in the test, the DUT is assumed/known to be able to cache server certificate validation results during the ramp up phase so it becomes irrelevant during the steady state phase. But it would be at least good to describe this in text. 1763 7.6.1. Objective 1765 Using HTTPS traffic, determine the sustainable SSL/TLS session 1766 establishment rate supported by the DUT/SUT under different 1767 throughput load conditions. 1769 Test iterations MUST include common cipher suites and key strengths 1770 as well as forward looking stronger keys. Specific test iterations 1771 MUST include ciphers and keys defined in Section 7.6.3.2. 1773 For each cipher suite and key strengths, test iterations MUST use a 1774 single HTTPS response object size defined in Section 7.6.3.2 to 1775 measure connections per second performance under a variety of DUT/SUT 1776 security inspection load conditions. 1778 7.6.2. Test Setup 1780 Testbed setup SHOULD be configured as defined in Section 4. Any 1781 specific testbed configuration changes (number of interfaces and 1782 interface type, etc.) MUST be documented. 1784 7.6.3. Test Parameters 1786 In this section, benchmarking test specific parameters SHOULD be 1787 defined. 1789 7.6.3.1. DUT/SUT Configuration Parameters 1791 DUT/SUT parameters MUST conform to the requirements defined in 1792 Section 4.2. Any configuration changes for this specific 1793 benchmarking test MUST be documented. 1795 7.6.3.2. Test Equipment Configuration Parameters 1797 Test equipment configuration parameters MUST conform to the 1798 requirements defined in Section 4.3. The following parameters MUST 1799 be documented for this benchmarking test: 1801 Client IP address range defined in Section 4.3.1.2 1803 Server IP address range defined in Section 4.3.2.2 1805 Traffic distribution ratio between IPv4 and IPv6 defined in 1806 Section 4.3.1.2 1808 Target connections per second: Initial value from product datasheet 1809 or the value defined based on requirement for a specific deployment 1810 scenario.
1812 Initial connections per second: 10% of "Target connections per 1813 second" Note: Initial connections per second is not a KPI to report. 1814 This value is configured on the traffic generator and used to perform 1815 the Step1: "Test Initialization and Qualification" described under 1816 the Section 7.6.4. 1818 RECOMMENDED ciphers and keys defined in Section 4.3.1.3 1820 The client MUST negotiate HTTPS and close the connection with FIN 1821 immediately after completion of one transaction. In each test 1822 iteration, client MUST send GET request requesting a fixed HTTPS 1823 response object size. The RECOMMENDED object sizes are 1, 2, 4, 16, 1824 and 64 KByte. 1826 7.6.3.3. Test Results Validation Criteria 1828 The following criteria are the test results validation criteria. The 1829 test results validation criteria MUST be monitored during the whole 1830 test duration. 1832 a. Number of failed application transactions (receiving any HTTP 1833 response code other than 200 OK) MUST be less than 0.001% (1 out 1834 of 100,000 transactions) of attempt transactions. 1836 b. Number of terminated TCP connections due to unexpected TCP RST 1837 sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000 1838 connections) of total initiated TCP connections. 1840 c. During the sustain phase, traffic SHOULD be forwarded at a 1841 constant rate (considered as a constant rate if any deviation of 1842 traffic forwarding rate is less than 5%). 1844 d. Concurrent TCP connections MUST be constant during steady state 1845 and any deviation of concurrent TCP connections SHOULD be less 1846 than 10%. This confirms the DUT opens and closes TCP connections 1847 at approximately the same rate. 1849 7.6.3.4. Measurement 1851 TCP connections per second MUST be reported for each test iteration 1852 (for each object size). 1854 The KPI metric TLS Handshake Rate can be measured in the test using 1 1855 KByte object size. 1857 7.6.4. Test Procedures and Expected Results 1859 The test procedure is designed to measure the TCP connections per 1860 second rate of the DUT/SUT at the sustaining period of traffic load 1861 profile. The test procedure consists of three major steps: Step 1 1862 ensures the DUT/SUT is able to reach the performance value (Initial 1863 connections per second) and meets the test results validation 1864 criteria when it was very minimally utilized. Step 2 determines the 1865 DUT/SUT is able to reach the target performance value within the test 1866 results validation criteria. Step 3 determines the maximum 1867 achievable performance value within the test results validation 1868 criteria. 1870 This test procedure MAY be repeated multiple times with different 1871 IPv4 and IPv6 traffic distribution. 1873 7.6.4.1. Step 1: Test Initialization and Qualification 1875 Verify the link status of all connected physical interfaces. All 1876 interfaces are expected to be in "UP" status. 1878 Configure traffic load profile of the test equipment to establish 1879 "Initial connections per second" as defined in Section 7.6.3.2. The 1880 traffic load profile SHOULD be defined as described in Section 4.3.4. 1882 The DUT/SUT SHOULD reach the "Initial connections per second" before 1883 the sustain phase. The measured KPIs during the sustain phase MUST 1884 meet all the test results validation criteria defined in 1885 Section 7.6.3.3. 1887 If the KPI metrics do not meet the test results validation criteria, 1888 the test procedure MUST NOT be continued to "Step 2". 1890 7.6.4.2. 
Step 2: Test Run with Target Objective 1892 Configure test equipment to establish "Target connections per second" 1893 defined in Section 7.6.3.2. The test equipment SHOULD follow the 1894 traffic load profile definition as described in Section 4.3.4. 1896 During the ramp up and sustain phase, other KPIs such as inspected 1897 throughput, concurrent TCP connections, and application transactions 1898 per second MUST NOT reach the maximum value the DUT/SUT can support. 1899 The test results for specific test iteration SHOULD NOT be reported, 1900 if the above mentioned KPI (especially inspected throughput) reaches 1901 the maximum value. (Example: If the test iteration with 64 KByte of 1902 HTTPS response object size reached the maximum inspected throughput 1903 limitation of the DUT, the test iteration MAY be interrupted and the 1904 result for 64 KByte SHOULD NOT be reported). 1906 The test equipment SHOULD start to measure and record all specified 1907 KPIs. Continue the test until all traffic profile phases are 1908 completed. 1910 Within the test results validation criteria, the DUT/SUT is expected 1911 to reach the desired value of the target objective ("Target 1912 connections per second") in the sustain phase. Follow step 3, if the 1913 measured value does not meet the target value or does not fulfill the 1914 test results validation criteria. 1916 7.6.4.3. Step 3: Test Iteration 1918 Determine the achievable connections per second within the test 1919 results validation criteria. 1921 7.7. HTTPS Throughput 1923 7.7.1. Objective 1925 Determine the sustainable inspected throughput of the DUT/SUT for 1926 HTTPS transactions varying the HTTPS response object size. 1928 Test iterations MUST include common cipher suites and key strengths 1929 as well as forward looking stronger keys. Specific test iterations 1930 MUST include the ciphers and keys defined in Section 7.7.3.2. 1932 7.7.2. Test Setup 1934 Testbed setup SHOULD be configured as defined in Section 4. Any 1935 specific testbed configuration changes (number of interfaces and 1936 interface type, etc.) MUST be documented. 1938 7.7.3. Test Parameters 1940 In this section, benchmarking test specific parameters SHOULD be 1941 defined. 1943 7.7.3.1. DUT/SUT Configuration Parameters 1945 DUT/SUT parameters MUST conform to the requirements defined in 1946 Section 4.2. Any configuration changes for this specific 1947 benchmarking test MUST be documented. 1949 7.7.3.2. Test Equipment Configuration Parameters 1951 Test equipment configuration parameters MUST conform to the 1952 requirements defined in Section 4.3. The following parameters MUST 1953 be documented for this benchmarking test: 1955 Client IP address range defined in Section 4.3.1.2 1957 Server IP address range defined in Section 4.3.2.2 1959 Traffic distribution ratio between IPv4 and IPv6 defined in 1960 Section 4.3.1.2 1962 Target inspected throughput: Aggregated line rate of interface(s) 1963 used in the DUT/SUT or the value defined based on requirement for a 1964 specific deployment scenario. 1966 Initial throughput: 10% of "Target inspected throughput" Note: 1967 Initial throughput is not a KPI to report. This value is configured 1968 on the traffic generator and used to perform the Step1: "Test 1969 Initialization and Qualification" described under the Section 7.7.4. 
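[comment] Not draft text, just a back-of-the-envelope check that may help testers dimension the generator: for a fixed HTTPS response object size, the transaction rate needed to fill a given inspected-throughput objective follows directly from the object size (TCP/TLS/HTTP overhead deliberately ignored, so real generator settings will differ somewhat). The names and the 10 Gbit/s example target are mine; the object sizes are the RECOMMENDED ones listed below. Sketch in Python:

def required_transactions_per_second(target_gbps, object_kbyte):
    """Approximate transactions/s needed to reach target_gbps of payload."""
    object_bits = object_kbyte * 1024 * 8   # payload size in bits
    return (target_gbps * 1e9) / object_bits

if __name__ == "__main__":
    for size in (1, 16, 64, 256):           # RECOMMENDED HTTPS response object sizes
        tps = required_transactions_per_second(10.0, size)   # example 10 Gbit/s target
        print(f"{size:>3} KByte objects -> ~{tps:,.0f} transactions/s")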
1971 Number of HTTPS response object requests (transactions) per 1972 connection: 10 1974 RECOMMENDED ciphers and keys defined in Section 4.3.1.3 1976 RECOMMENDED HTTPS response object size: 1, 16, 64, 256 KByte, and 1977 mixed objects defined in Table 4 under Section 7.3.3.2. 1979 7.7.3.3. Test Results Validation Criteria 1981 The following criteria are the test results validation criteria. The 1982 test results validation criteria MUST be monitored during the whole 1983 sustain phase of the traffic load profile. 1985 a. Number of failed Application transactions (receiving any HTTP 1986 response code other than 200 OK) MUST be less than 0.001% (1 out 1987 of 100,000 transactions) of attempt transactions. 1989 b. Traffic SHOULD be forwarded at a constant rate (considered as a 1990 constant rate if any deviation of traffic forwarding rate is less 1991 than 5%). 1993 c. Concurrent TCP connections MUST be constant during steady state 1994 and any deviation of concurrent TCP connections SHOULD be less 1995 than 10%. This confirms the DUT opens and closes TCP connections 1996 at approximately the same rate. 1998 7.7.3.4. Measurement 2000 Inspected Throughput and HTTP Transactions per Second MUST be 2001 reported for each object size. 2003 7.7.4. Test Procedures and Expected Results 2005 The test procedure consists of three major steps: Step 1 ensures the 2006 DUT/SUT is able to reach the performance value (Initial throughput) 2007 and meets the test results validation criteria when it was very 2008 minimally utilized. Step 2 determines the DUT/SUT is able to reach 2009 the target performance value within the test results validation 2010 criteria. Step 3 determines the maximum achievable performance value 2011 within the test results validation criteria. 2013 This test procedure MAY be repeated multiple times with different 2014 IPv4 and IPv6 traffic distribution and HTTPS response object sizes. 2016 7.7.4.1. Step 1: Test Initialization and Qualification 2018 Verify the link status of all connected physical interfaces. All 2019 interfaces are expected to be in "UP" status. 2021 Configure traffic load profile of the test equipment to establish 2022 "Initial throughput" as defined in Section 7.7.3.2. 2024 The traffic load profile SHOULD be defined as described in 2025 Section 4.3.4. The DUT/SUT SHOULD reach the "Initial throughput" 2026 during the sustain phase. Measure all KPI as defined in 2027 Section 7.7.3.4. 2029 The measured KPIs during the sustain phase MUST meet the test results 2030 validation criteria "a" defined in Section 7.7.3.3. The test results 2031 validation criteria "b" and "c" are OPTIONAL for step 1. 2033 If the KPI metrics do not meet the test results validation criteria, 2034 the test procedure MUST NOT be continued to "Step 2". 2036 7.7.4.2. Step 2: Test Run with Target Objective 2038 Configure test equipment to establish the target objective ("Target 2039 inspected throughput") defined in Section 7.7.3.2. The test 2040 equipment SHOULD start to measure and record all specified KPIs. 2041 Continue the test until all traffic profile phases are completed. 2043 Within the test results validation criteria, the DUT/SUT is expected 2044 to reach the desired value of the target objective in the sustain 2045 phase. Follow step 3, if the measured value does not meet the target 2046 value or does not fulfill the test results validation criteria. 2048 7.7.4.3. 
Step 3: Test Iteration 2050 Determine the achievable average inspected throughput within the test 2051 results validation criteria. Final test iteration MUST be performed 2052 for the test duration defined in Section 4.3.4. 2054 7.8. HTTPS Transaction Latency 2056 7.8.1. Objective 2058 Using HTTPS traffic, determine the HTTPS transaction latency when 2059 DUT/SUT is running with sustainable HTTPS transactions per second 2060 supported by the DUT/SUT under different HTTPS response object size. 2062 Scenario 1: The client MUST negotiate HTTPS and close the connection 2063 with FIN immediately after completion of a single transaction (GET 2064 and RESPONSE). 2066 Scenario 2: The client MUST negotiate HTTPS and close the connection 2067 with FIN immediately after completion of 10 transactions (GET and 2068 RESPONSE) within a single TCP connection. 2070 7.8.2. Test Setup 2072 Testbed setup SHOULD be configured as defined in Section 4. Any 2073 specific testbed configuration changes (number of interfaces and 2074 interface type, etc.) MUST be documented. 2076 7.8.3. Test Parameters 2078 In this section, benchmarking test specific parameters SHOULD be 2079 defined. 2081 7.8.3.1. DUT/SUT Configuration Parameters 2083 DUT/SUT parameters MUST conform to the requirements defined in 2084 Section 4.2. Any configuration changes for this specific 2085 benchmarking test MUST be documented. 2087 7.8.3.2. Test Equipment Configuration Parameters 2089 Test equipment configuration parameters MUST conform to the 2090 requirements defined in Section 4.3. The following parameters MUST 2091 be documented for this benchmarking test: 2093 Client IP address range defined in Section 4.3.1.2 2095 Server IP address range defined in Section 4.3.2.2 2096 Traffic distribution ratio between IPv4 and IPv6 defined in 2097 Section 4.3.1.2 2099 RECOMMENDED cipher suites and key sizes defined in Section 4.3.1.3 2101 Target objective for scenario 1: 50% of the connections per second 2102 measured in benchmarking test TCP/HTTPS Connections per second 2103 (Section 7.6) 2105 Target objective for scenario 2: 50% of the inspected throughput 2106 measured in benchmarking test HTTPS Throughput (Section 7.7) 2108 Initial objective for scenario 1: 10% of "Target objective for 2109 scenario 1" 2111 Initial objective for scenario 2: 10% of "Target objective for 2112 scenario 2" 2114 Note: The Initial objectives are not a KPI to report. These values 2115 are configured on the traffic generator and used to perform the 2116 Step1: "Test Initialization and Qualification" described under the 2117 Section 7.8.4. 2119 HTTPS transaction per TCP connection: Test scenario 1 with single 2120 transaction and scenario 2 with 10 transactions 2122 HTTPS with GET request requesting a single object. The RECOMMENDED 2123 object sizes are 1, 16, and 64 KByte. For each test iteration, 2124 client MUST request a single HTTPS response object size. 2126 7.8.3.3. Test Results Validation Criteria 2128 The following criteria are the test results validation criteria. The 2129 Test results validation criteria MUST be monitored during the whole 2130 sustain phase of the traffic load profile. 2132 a. Number of failed application transactions (receiving any HTTP 2133 response code other than 200 OK) MUST be less than 0.001% (1 out 2134 of 100,000 transactions) of attempt transactions. 2136 b. Number of terminated TCP connections due to unexpected TCP RST 2137 sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000 2138 connections) of total initiated TCP connections. 
2140 c. During the sustain phase, traffic SHOULD be forwarded at a 2141 constant rate (considered as a constant rate if any deviation of 2142 traffic forwarding rate is less than 5%). 2144 d. Concurrent TCP connections MUST be constant during steady state 2145 and any deviation of concurrent TCP connections SHOULD be less 2146 than 10%. This confirms the DUT opens and closes TCP connections 2147 at approximately the same rate. 2149 e. After ramp up the DUT/SUT MUST achieve the "Target objective" 2150 defined in the parameter Section 7.8.3.2 and remain in that state 2151 for the entire test duration (sustain phase). 2153 7.8.3.4. Measurement 2155 TTFB (minimum, average, and maximum) and TTLB (minimum, average and 2156 maximum) MUST be reported for each object size. 2158 7.8.4. Test Procedures and Expected Results 2160 The test procedure is designed to measure TTFB or TTLB when the DUT/ 2161 SUT is operating close to 50% of its maximum achievable connections 2162 per second or inspected throughput. The test procedure consists of 2163 two major steps: Step 1 ensures the DUT/SUT is able to reach the 2164 initial performance values and meets the test results validation 2165 criteria when it was very minimally utilized. Step 2 measures the 2166 latency values within the test results validation criteria. 2168 This test procedure MAY be repeated multiple times with different IP 2169 types (IPv4 only, IPv6 only and IPv4 and IPv6 mixed traffic 2170 distribution), HTTPS response object sizes and single, and multiple 2171 transactions per connection scenarios. 2173 7.8.4.1. Step 1: Test Initialization and Qualification 2175 Verify the link status of all connected physical interfaces. All 2176 interfaces are expected to be in "UP" status. 2178 Configure traffic load profile of the test equipment to establish 2179 "Initial objective" as defined in the Section 7.8.3.2. The traffic 2180 load profile SHOULD be defined as described in Section 4.3.4. 2182 The DUT/SUT SHOULD reach the "Initial objective" before the sustain 2183 phase. The measured KPIs during the sustain phase MUST meet all the 2184 test results validation criteria defined in Section 7.8.3.3. 2186 If the KPI metrics do not meet the test results validation criteria, 2187 the test procedure MUST NOT be continued to "Step 2". 2189 7.8.4.2. Step 2: Test Run with Target Objective 2191 Configure test equipment to establish "Target objective" defined in 2192 Section 7.8.3.2. The test equipment SHOULD follow the traffic load 2193 profile definition as described in Section 4.3.4. 2195 The test equipment SHOULD start to measure and record all specified 2196 KPIs. Continue the test until all traffic profile phases are 2197 completed. 2199 Within the test results validation criteria, the DUT/SUT MUST reach 2200 the desired value of the target objective in the sustain phase. 2202 Measure the minimum, average, and maximum values of TTFB and TTLB. 2204 7.9. Concurrent TCP/HTTPS Connection Capacity 2206 7.9.1. Objective 2208 Determine the number of concurrent TCP connections the DUT/SUT 2209 sustains when using HTTPS traffic. 2211 7.9.2. Test Setup 2213 Testbed setup SHOULD be configured as defined in Section 4. Any 2214 specific testbed configuration changes (number of interfaces and 2215 interface type, etc.) MUST be documented. 2217 7.9.3. Test Parameters 2219 In this section, benchmarking test specific parameters SHOULD be 2220 defined. 2222 7.9.3.1. 
DUT/SUT Configuration Parameters 2224 DUT/SUT parameters MUST conform to the requirements defined in 2225 Section 4.2. Any configuration changes for this specific 2226 benchmarking test MUST be documented. 2228 7.9.3.2. Test Equipment Configuration Parameters 2230 Test equipment configuration parameters MUST conform to the 2231 requirements defined in Section 4.3. The following parameters MUST 2232 be documented for this benchmarking test: 2234 Client IP address range defined in Section 4.3.1.2 2236 Server IP address range defined in Section 4.3.2.2 2237 Traffic distribution ratio between IPv4 and IPv6 defined in 2238 Section 4.3.1.2 2240 RECOMMENDED cipher suites and key sizes defined in Section 4.3.1.3 2242 Target concurrent connections: Initial value from product 2243 datasheet or the value defined based on requirement for a specific 2244 deployment scenario. 2246 Initial concurrent connections: 10% of "Target concurrent 2247 connections" Note: Initial concurrent connection is not a KPI to 2248 report. This value is configured on the traffic generator and 2249 used to perform the Step1: "Test Initialization and Qualification" 2250 described under the Section 7.9.4. 2252 Connections per second during ramp up phase: 50% of maximum 2253 connections per second measured in benchmarking test TCP/HTTPS 2254 Connections per second (Section 7.6) 2256 Ramp up time (in traffic load profile for "Target concurrent 2257 connections"): "Target concurrent connections" / "Maximum 2258 connections per second during ramp up phase" 2260 Ramp up time (in traffic load profile for "Initial concurrent 2261 connections"): "Initial concurrent connections" / "Maximum 2262 connections per second during ramp up phase" 2264 The client MUST perform HTTPS transaction with persistence and each 2265 client can open multiple concurrent TCP connections per server 2266 endpoint IP. 2268 Each client sends 10 GET requests requesting 1 KByte HTTPS response 2269 objects in the same TCP connections (10 transactions/TCP connection) 2270 and the delay (think time) between each transaction MUST be X 2271 seconds. 2273 X = ("Ramp up time" + "steady state time") /10 2275 The established connections SHOULD remain open until the ramp down 2276 phase of the test. During the ramp down phase, all connections 2277 SHOULD be successfully closed with FIN. 2279 7.9.3.3. Test Results Validation Criteria 2281 The following criteria are the test results validation criteria. The 2282 Test results validation criteria MUST be monitored during the whole 2283 sustain phase of the traffic load profile. 2285 a. Number of failed application transactions (receiving any HTTP 2286 response code other than 200 OK) MUST be less than 0.001% (1 out 2287 of 100,000 transactions) of total attempted transactions. 2289 b. Number of terminated TCP connections due to unexpected TCP RST 2290 sent by DUT/SUT MUST be less than 0.001% (1 out of 100,000 2291 connections) of total initiated TCP connections. 2293 c. During the sustain phase, traffic SHOULD be forwarded at a 2294 constant rate (considered as a constant rate if any deviation of 2295 traffic forwarding rate is less than 5%). 2297 7.9.3.4. Measurement 2299 Average Concurrent TCP Connections MUST be reported for this 2300 benchmarking test. 2302 7.9.4. Test Procedures and Expected Results 2304 The test procedure is designed to measure the concurrent TCP 2305 connection capacity of the DUT/SUT at the sustaining period of 2306 traffic load profile. 
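[comment] To make the ramp-up time and think-time arithmetic from Section 7.9.3.2 above concrete, here is a small worked example (my names and invented numbers, not draft text):

def concurrency_profile(target_concurrent, ramp_cps, steady_state_s):
    """Ramp-up time and per-connection think time for the concurrency test."""
    # "Target concurrent connections" / "connections per second during ramp up"
    ramp_up_s = target_concurrent / ramp_cps
    # X = ("Ramp up time" + "steady state time") / 10, i.e. 10 transactions
    # spread evenly over one TCP connection's lifetime
    think_time_s = (ramp_up_s + steady_state_s) / 10
    return {"ramp_up_s": ramp_up_s, "think_time_s": think_time_s}

if __name__ == "__main__":
    # e.g. 1,000,000 target concurrent connections, ramp up at 50,000 conn/s
    # (50% of a measured 100,000 conn/s), 300 s steady state
    print(concurrency_profile(1_000_000, 50_000, 300))
    # {'ramp_up_s': 20.0, 'think_time_s': 32.0}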
The test procedure consists of three major 2307 steps: Step 1 ensures the DUT/SUT is able to reach the performance 2308 value (Initial concurrent connection) and meets the test results 2309 validation criteria when it was very minimally utilized. Step 2 2310 determines the DUT/SUT is able to reach the target performance value 2311 within the test results validation criteria. Step 3 determines the 2312 maximum achievable performance value within the test results 2313 validation criteria. 2315 This test procedure MAY be repeated multiple times with different 2316 IPv4 and IPv6 traffic distribution. 2318 7.9.4.1. Step 1: Test Initialization and Qualification 2320 Verify the link status of all connected physical interfaces. All 2321 interfaces are expected to be in "UP" status. 2323 Configure test equipment to establish "Initial concurrent TCP 2324 connections" defined in Section 7.9.3.2. Except ramp up time, the 2325 traffic load profile SHOULD be defined as described in Section 4.3.4. 2327 During the sustain phase, the DUT/SUT SHOULD reach the "Initial 2328 concurrent TCP connections". The measured KPIs during the sustain 2329 phase MUST meet the test results validation criteria "a" and "b" 2330 defined in Section 7.9.3.3. 2332 If the KPI metrics do not meet the test results validation criteria, 2333 the test procedure MUST NOT be continued to "Step 2". 2335 7.9.4.2. Step 2: Test Run with Target Objective 2337 Configure test equipment to establish the target objective ("Target 2338 concurrent TCP connections"). The test equipment SHOULD follow the 2339 traffic load profile definition (except ramp up time) as described in 2340 Section 4.3.4. 2342 During the ramp up and sustain phase, the other KPIs such as 2343 inspected throughput, TCP connections per second, and application 2344 transactions per second MUST NOT reach to the maximum value that the 2345 DUT/SUT can support. 2347 The test equipment SHOULD start to measure and record KPIs defined in 2348 Section 7.9.3.4. Continue the test until all traffic profile phases 2349 are completed. 2351 Within the test results validation criteria, the DUT/SUT is expected 2352 to reach the desired value of the target objective in the sustain 2353 phase. Follow step 3, if the measured value does not meet the target 2354 value or does not fulfill the test results validation criteria. 2356 7.9.4.3. Step 3: Test Iteration 2358 Determine the achievable concurrent TCP connections within the test 2359 results validation criteria. [major] I would really love to see DUT power consumption numbers captured and reported for the 10% and the maximum achieved rates for the 7.x tests (during steady state). Energy consumption is becoming a more and more important factor in networking, and all the high-touch operations of security devices are amongst the most power/compute hungry operations of any network device, but with a wide variety depending on how it's implemented. It's also extremely simple to just plug a power-meter into the supply line of the DUT. This would encourage DUT vendors to reduce power consumption, something that often can be achieved by just selecting appropriate components (lowest power CPU options, going FPGA etc. routes). Personally, i am of course also interested in easily derived performance factors such as comparing 100% power consumption for the HTTP vs. HTTPS case - cost of end-to-end security that is.
If a DUT just shows linerate for both HTTP and HTTPS, but with double the power consumption when using HTTPS, that may even impact deployment - even in small cases with a small 19" rack, some ventilation and some amount of power - 100..500W makes a difference whether it's 100 or 500W. 2361 8. IANA Considerations 2363 This document makes no specific request of IANA. 2365 The IANA has assigned IPv4 and IPv6 address blocks in [RFC6890] that 2366 have been registered for special purposes. The IPv6 address block 2367 2001:2::/48 has been allocated for the purpose of IPv6 Benchmarking 2368 [RFC5180] and the IPv4 address block 198.18.0.0/15 has been allocated 2369 for the purpose of IPv4 Benchmarking [RFC2544]. This assignment was 2370 made to minimize the chance of conflict in case a testing device were 2371 to be accidentally connected to part of the Internet. [minor] I don't think the second paragraph belongs into an IANA considerations section. This section is usually reserved only for actions IANA is supposed to take for this document. I would suggest moving this paragraph to an earlier section, maybe even simply make one up "Addressing for tests". 2373 9. Security Considerations 2375 The primary goal of this document is to provide benchmarking 2376 terminology and methodology for next-generation network security 2377 devices for use in a laboratory isolated test environment. However, 2378 readers should be aware that there is some overlap between 2379 performance and security issues. Specifically, the optimal 2380 configuration for network security device performance may not be the 2381 most secure, and vice-versa. The cipher suites recommended in this 2382 document are for test purpose only. The cipher suite recommendation 2383 for a real deployment is outside the scope of this document. 2385 10. Contributors 2387 The following individuals contributed significantly to the creation 2388 of this document: 2390 Alex Samonte, Amritam Putatunda, Aria Eslambolchizadeh, Chao Guo, 2391 Chris Brown, Cory Ford, David DeSanto, Jurrie Van Den Breekel, 2392 Michelle Rhines, Mike Jack, Ryan Liles, Samaresh Nair, Stephen 2393 Goudreault, Tim Carlin, and Tim Otto. 2395 11. Acknowledgements 2397 The authors wish to acknowledge the members of NetSecOPEN for their 2398 participation in the creation of this document. Additionally, the 2399 following members need to be acknowledged: 2401 Anand Vijayan, Chris Marshall, Jay Lindenauer, Michael Shannon, Mike 2402 Deichman, Ryan Riese, and Toulnay Orkun. 2404 12. References 2406 12.1. Normative References 2408 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2409 Requirement Levels", BCP 14, RFC 2119, 2410 DOI 10.17487/RFC2119, March 1997, 2411 . 2413 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2414 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2415 May 2017, . 2417 12.2. Informative References 2419 [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for 2420 Network Interconnect Devices", RFC 2544, 2421 DOI 10.17487/RFC2544, March 1999, 2422 . 2424 [RFC2647] Newman, D., "Benchmarking Terminology for Firewall 2425 Performance", RFC 2647, DOI 10.17487/RFC2647, August 1999, 2426 . 2428 [RFC3511] Hickman, B., Newman, D., Tadjudin, S., and T. Martin, 2429 "Benchmarking Methodology for Firewall Performance", 2430 RFC 3511, DOI 10.17487/RFC3511, April 2003, 2431 . 2433 [RFC5180] Popoviciu, C., Hamza, A., Van de Velde, G., and D.
2434 Dugatkin, "IPv6 Benchmarking Methodology for Network 2435 Interconnect Devices", RFC 5180, DOI 10.17487/RFC5180, May 2436 2008, . 2438 [RFC6815] Bradner, S., Dubray, K., McQuaid, J., and A. Morton, 2439 "Applicability Statement for RFC 2544: Use on Production 2440 Networks Considered Harmful", RFC 6815, 2441 DOI 10.17487/RFC6815, November 2012, 2442 . 2444 [RFC6890] Cotton, M., Vegoda, L., Bonica, R., Ed., and B. Haberman, 2445 "Special-Purpose IP Address Registries", BCP 153, 2446 RFC 6890, DOI 10.17487/RFC6890, April 2013, 2447 . 2449 [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer 2450 Protocol (HTTP/1.1): Message Syntax and Routing", 2451 RFC 7230, DOI 10.17487/RFC7230, June 2014, 2452 . 2454 [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol 2455 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 2456 . 2458 [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based 2459 Multiplexed and Secure Transport", RFC 9000, 2460 DOI 10.17487/RFC9000, May 2021, 2461 . 2463 Appendix A. Test Methodology - Security Effectiveness Evaluation [nit] /Evaluation/Test/ - called test in the rest of this doc. 2464 A.1. Test Objective 2466 This test methodology verifies the DUT/SUT is able to detect, [nit] /verifies the/ verifies that the/ 2467 prevent, and report the vulnerabilities. 2469 In this test, background test traffic will be generated to utilize 2470 the DUT/SUT. In parallel, the CVEs will be sent to the DUT/SUT as 2471 encrypted and as well as clear text payload formats using a traffic 2472 generator. The selection of the CVEs is described in Section 4.2.1. 2474 The following KPIs are measured in this test: 2476 * Number of blocked CVEs 2478 * Number of bypassed (nonblocked) CVEs 2480 * Background traffic performance (verify if the background traffic 2481 is impacted while sending CVE toward DUT/SUT) 2483 * Accuracy of DUT/SUT statistics in term of vulnerabilities 2484 reporting 2486 A.2. Testbed Setup 2488 The same testbed MUST be used for security effectiveness test and as 2489 well as for benchmarking test cases defined in Section 7. 2491 A.3. Test Parameters 2493 In this section, the benchmarking test specific parameters SHOULD be 2494 defined. [nit] /SHOULD/are/ - a requirement against the authors of the document to write desirable text in the document is not normative. 2496 A.3.1. DUT/SUT Configuration Parameters 2498 DUT/SUT configuration parameters MUST conform to the requirements 2499 defined in Section 4.2. The same DUT configuration MUST be used for 2500 Security effectiveness test and as well as for benchmarking test 2501 cases defined in Section 7. The DUT/SUT MUST be configured in inline 2502 mode and all detected attack traffic MUST be dropped and the session [nit] /detected traffic/detected CVE traffic/ - there is also background traffic, which i guess should not be dropped, right ? [nit] /the session/its session/ ? 2503 SHOULD be reset 2505 A.3.2. Test Equipment Configuration Parameters 2507 Test equipment configuration parameters MUST conform to the 2508 requirements defined in Section 4.3. The same client and server IP 2509 ranges MUST be configured as used in the benchmarking test cases.
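[comment] Illustration only (not draft text): the background-traffic objectives for this test are derived from the throughput results measured earlier; the sketch below uses the 45% figure and the 10 CVEs per second rate listed in the parameters that follow, with invented example inputs:

def background_traffic_objectives(max_http_gbps, max_https_gbps, fraction=0.45):
    """Background load for the security effectiveness test."""
    return {
        "http_background_gbps": round(max_http_gbps * fraction, 3),
        "https_background_gbps": round(max_https_gbps * fraction, 3),
        "cve_rate_per_second": 10,   # RECOMMENDED CVE transmission rate
    }

if __name__ == "__main__":
    # e.g. 20 Gbit/s max HTTP and 8 Gbit/s max HTTPS measured with 64 KByte objects
    print(background_traffic_objectives(20.0, 8.0))
    # {'http_background_gbps': 9.0, 'https_background_gbps': 3.6, 'cve_rate_per_second': 10}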
In 2510 addition, the following parameters MUST be documented for this 2511 benchmarking test: 2513 * Background Traffic: 45% of maximum HTTP throughput and 45% of 2514 Maximum HTTPS throughput supported by the DUT/SUT (measured with 2515 object size 64 KByte in the benchmarking tests "HTTP(S) 2516 Throughput" defined in Section 7.3 and Section 7.7). [nit] RECOMMENDED Background Traffic ? 2518 * RECOMMENDED CVE traffic transmission Rate: 10 CVEs per second 2520 * It is RECOMMENDED to generate each CVE multiple times 2521 (sequentially) at 10 CVEs per second 2523 * Ciphers and keys for the encrypted CVE traffic MUST use the same 2524 cipher configured for HTTPS traffic related benchmarking tests 2525 (Section 7.6 - Section 7.9) 2527 A.4. Test Results Validation Criteria 2529 The following criteria are the test results validation criteria. The 2530 test results validation criteria MUST be monitored during the whole 2531 test duration. [nit] /criteria are/lists/ - duplication of criteria in sentence. 2533 a. Number of failed application transaction in the background 2534 traffic MUST be less than 0.01% of attempted transactions. 2536 b. Number of terminated TCP connections of the background traffic 2537 (due to unexpected TCP RST sent by DUT/SUT) MUST be less than 2538 0.01% of total initiated TCP connections in the background 2539 traffic. [comment] That is quite high. Shouldn't this at least be 5 nines of success ? 99.999% -> 0.001% maximum rate of errors ? I thought that's the common lore service provider product quality requirement minimum. 2541 c. During the sustain phase, traffic SHOULD be forwarded at a 2542 constant rate (considered as a constant rate if any deviation of 2543 traffic forwarding rate is less than 5%). [minor] This seems underspecified. I guess in the ideally behaving DUT case all background traffic is passed unmodified and all CVE connection traffic is dropped. So the total amount of traffic with CVE events must be configured to be less than 5% ?! What additional information would this 5% tell me that i do not already get from a. and b. ? E.g.: if i fail some background connection, then the impact depends on how big that connection would have been, but it doesn't seem as if i get new information if a big Netflix background flow got killed and therefore 5 Gigabyte less background traffic were observed, or if the same happened to a 200 KByte Amazon shopping connection. It would just cause the DUT to maybe do less inspection on big flows in fear of triggering false resets on them ?? Is that what we want from DUTs ? 2545 d. False positive MUST NOT occur in the background traffic. [comment] I do not understand d. When a background transaction from a. fails, how is that different from false-positively being classified as a CVE - it would be dropped then, right ? Or are you saying that a./b. is the case that the background traffic receives errors from the DUT even though the DUT does NOT recognize it as a CVE ? Any example reason why that would happen ? 2547 A.5.
Measurement 2549 Following KPI metrics MUST be reported for this test scenario: 2551 Mandatory KPIs: 2553 * Blocked CVEs: It SHOULD be represented in the following ways: 2555 - Number of blocked CVEs out of total CVEs 2557 - Percentage of blocked CVEs 2559 * Unblocked CVEs: It SHOULD be represented in the following ways: 2561 - Number of unblocked CVEs out of total CVEs 2563 - Percentage of unblocked CVEs 2565 * Background traffic behavior: It SHOULD be represented one of the 2566 followings ways: 2568 - No impact: Considered as "no impact'" if any deviation of 2569 traffic forwarding rate is less than or equal to 5 % (constant 2570 rate) 2572 - Minor impact: Considered as "minor impact" if any deviation of 2573 traffic forwarding rate is greater than 5% and less than or 2574 equal to10% (i.e. small spikes) 2576 - Heavily impacted: Considered as "Heavily impacted" if any 2577 deviation of traffic forwarding rate is greater than 10% (i.e. 2578 large spikes) or reduced the background HTTP(S) throughput 2579 greater than 10% [minor] I would prefer reporting the a./b. numbers, e.g.: percentage of failed background connections. As mentioned before, i find the total background traffic rate impact a rather problematic/less valuable metric. 2581 * DUT/SUT reporting accuracy: DUT/SUT MUST report all detected 2582 vulnerabilities. 2584 Optional KPIs: 2586 * List of unblocked CVEs [minor] I think this KPI is a SHOULD or even MUST. Otherwise one can not trace security impacts (when one does not know which CVE it is). This is still the security effectiveness appendix, and reporting is not effective without this. 2588 A.6. Test Procedures and Expected Results 2590 The test procedure is designed to measure the security effectiveness 2591 of the DUT/SUT at the sustaining period of the traffic load profile. 2592 The test procedure consists of two major steps. This test procedure 2593 MAY be repeated multiple times with different IPv4 and IPv6 traffic 2594 distribution. 2596 A.6.1. Step 1: Background Traffic 2598 Generate background traffic at the transmission rate defined in 2599 Appendix A.3.2. 2601 The DUT/SUT MUST reach the target objective (HTTP(S) throughput) in 2602 sustain phase. The measured KPIs during the sustain phase MUST meet 2603 all the test results validation criteria defined in Appendix A.4. 2605 If the KPI metrics do not meet the acceptance criteria, the test 2606 procedure MUST NOT be continued to "Step 2". 2608 A.6.2. Step 2: CVE Emulation 2610 While generating background traffic (in sustain phase), send the CVE 2611 traffic as defined in the parameter section. 2613 The test equipment SHOULD start to measure and record all specified 2614 KPIs. Continue the test until all CVEs are sent. 2616 The measured KPIs MUST meet all the test results validation criteria 2617 defined in Appendix A.4. 2619 In addition, the DUT/SUT SHOULD report the vulnerabilities correctly. 2621 Appendix B. DUT/SUT Classification 2623 This document aims to classify the DUT/SUT in four different 2624 categories based on its maximum supported firewall throughput 2625 performance number defined in the vendor datasheet. This 2626 classification MAY help user to determine specific configuration 2627 scale (e.g., number of ACL entries), traffic profiles, and attack 2628 traffic profiles, scaling those proportionally to DUT/SUT sizing 2629 category. 2631 The four different categories are Extra Small (XS), Small (S), Medium 2632 (M), and Large (L). 
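[comment] As a small illustration (not draft text) of how a tester might apply this classification: map the vendor's datasheet firewall throughput to a category using the RECOMMENDED thresholds listed right below; the example numbers are invented:

def dut_category(max_throughput_gbps):
    """DUT/SUT sizing category from the datasheet firewall throughput."""
    if max_throughput_gbps <= 1:
        return "Extra Small (XS)"
    if max_throughput_gbps <= 5:
        return "Small (S)"
    if max_throughput_gbps <= 10:
        return "Medium (M)"
    return "Large (L)"

if __name__ == "__main__":
    for gbps in (0.5, 3, 10, 40):
        print(f"{gbps:>5} Gbit/s -> {dut_category(gbps)}")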
The RECOMMENDED throughput values for the 2633 following categories are: 2635 Extra Small (XS) - Supported throughput less than or equal to1Gbit/s 2637 Small (S) - Supported throughput greater than 1Gbit/s and less than 2638 or equal to 5Gbit/s 2640 Medium (M) - Supported throughput greater than 5Gbit/s and less than 2641 or equal to10Gbit/s 2643 Large (L) - Supported throughput greater than 10Gbit/s 2645 Authors' Addresses 2647 Balamuhunthan Balarajah 2648 Berlin 2649 Germany 2651 Email: bm.balarajah@gmail.com 2652 Carsten Rossenhoevel 2653 EANTC AG 2654 Salzufer 14 2655 10587 Berlin 2656 Germany 2658 Email: cross@eantc.de 2660 Brian Monkman 2661 NetSecOPEN 2662 417 Independence Court 2663 Mechanicsburg, PA 17050 2664 United States of America 2666 Email: bmonkman@netsecopen.org EOF