I have reviewed draft-ietf-intarea-frag-fragile-09. This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information. When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review. Overview This document presents advice on the use of fragmentation and what should be supported. Most of the recommendations are presented in other RFCs, but fragmentation is an important topic to consider and one that has received significant recent attention. This document is almost ready, but has issues, in particular relating to other IETF working group documents that should be addressed before it proceeds. (1) “Software libraries SHOULD include provision for PLPMTUD for each supported transport protocol.” - The above text in Section 7.2 sounds like a good recommendation, but it may not yet be a reality for many TCP implementations and UDP support is being specified as DPLPMTUD in TSVWG. I suggest this NOT an issue for this being a recommendation in a BCP in future, but to me it would create a normative dependency on [draft-ietf-tsvwg-datagram-plpmtud]. (2) Alignment is needed on content and recommendations with draft-ietf-intarea-tunnels. That draft explicitly speaks of the need for fragmentation to support tunnels, and it seems that this is a little at odds with what is described in the draft being reviewed. It may be that the WG thinks the intended audience is different - since draft-ietf-intarea-tunnels focuses only on nodes providing tunnel services, whereas the draft being reviewed appears to focus on endpoints. None-the-less both are targeting BCP status, and both are from the same WG. As an example, Section 9 appears to be an overall discouragement on the use of fragments. It could be that this creates a normative dependency on two active WG drafts, currently cited as informational: [I-D.ietf-intarea-tunnels] - because this draft needs to be absorbed with that draft when reading about tunnels. Both appear to target BCP status. The opportunities for confusion would be considerably reduced if the WG thought it was able to ensure that the two documents clearly defined their own scope - and even better if the two could be published as a document pair so that the RFC-series clear talks about the fragility of using fragments from endpoints and the need to support fragments to support tunnels. Major/Technical comments by section: Section 2.1 on Note 1 “ So, for the purposes of this document, we assume that the IPv4 minimum link MTU is 576 bytes.” - I expect this is mostly true - but I question whether this extra clause is wise to state within a BCP? --- Section 4.4 Technical: - Please ensure the text clearly supports recommendation in Section 7.4. At first reading, it was not clear whether the points were different or the same. It was not clear to me from the text why this is described as "a partial solution" to this problem. In other places, ECMP use of the flow label is recommended, althoughI could understand perhaps that maybe uptake of this approach is still something that has yet to be the expected default. None-the-less it seems that if deployed this would help alleviate other problems. Section 7.4 seems to include a recommendation to do this, so maybe the text in section 4.4 could be reordered to more logically lead the reader to the recommendation in section 7.4? I see that the current draft of draft-ietf-intarea-tunnels agrees on this recommendation. --- Section 4.4/4.7.3: - The description doesn’t talk about the more generic issue that arises when there is no return reachability from nodes in the network when different paths are used in the forward and return directions. This is not quite the same as the issue in 4.4, but it is a different outcome. ---- Section 5.1: “ By contrast, PLPMTUD does not rely upon the network's ability to deliver ICMP PTB messages. However, in many loss-based TCP congestion control algorithms, the dropping of a packet may cause the TCP control algorithm to drop the congestion control window, or even re-start with the entire slow start process. For high capacity, long round-trip time, large volume TCP streams, the deliberate probing with large packets and the consequent packet drop may impose too harsh a penalty on total TCP throughput for it to be a viable approach. [RFC4821] defines PLPMTUD procedures for TCP.” - I think this is a little imprecise, please check RFC4821. I’d suggest something like: By contrast, PLPMTUD does not rely upon the network's ability to deliver ICMP PTB messages. It utilises probe messages sent as TCP segments to determine if the probed PMTU can be successfully used across the network path. In PLPMTUD, probing is separated from congestion control, so that loss of a TCP probe segment does not cause a reduction of the congestion control window. [RFC4821] defines PLPMTUD procedures for TCP.” --- Section 5.1: “While TCP will never cause the underlying IP module to emit a packet that is larger than the PMTU estimate, it can cause the underlying IP module to emit a packet that is larger than the actual PMTU. “ - Is that really true? I agree TCP never emits a normal segment that results in a packet with a size larger than the PMTU estimate, but when performing a probe to increase the PMTU I’d have thought this requires sending the probe with size larger than the PMTU. --- Section 7.1: - Final paragraph does not directly talk about tunnels where it is not possible to support the minimum packet size for the encapsulated packet over a path that only supports the minimum packet size. Maybe as in Section 6, this could refer to INTAREA Tunnels [I-D.ietf-intarea-tunnels], for more general recommendations on what to do? --- Section 7.3 (echoed in Section 7.5): “Middle boxes SHOULD process IP fragments” - Process seems a vague term. … Please check this recommendation does not contradict this for NATs (e.g., RFC 4787). Technical question on Section 7.3. - How is a NAT expected too work with UDP checksum fragments for a zero checksum? - Section 5.3.1 states the NAPT should recalculate based on reassembly of the entire datagram. --- Section 7.4: - Appears to need RFC references to ECMP e.g. [RFC438 and RFC7098], LAG, etc. It may in this case also be useful to comment on GRE in UDP... which clearly had other intentions and where this BCP therefore creates new precedents. - [RFC6438] appears to be cited normatively. --- Section 7.5: "including making sure the network generates PTB packets when dropping packets too large compared to outgoing interface MTU." - As a BCP we need to be clear, this seems open to misinterpretation, because ICMP PTB messages are usually rate-limited (practically this may happen at multiple levels), and I presume this BCP does not seek to change that. Careful wording is hence required here so as not to infer that every "large" packet needs to generate a "PTB Message". --- Section 7.5: " Network operators SHOULD NOT filter IP fragments if they originated at a domain name server or are destined for a domain name server." - It does not say why this specific requirement identifies one service (dns), rather than a general requirement. In particular, does this require devices, such as NAPT and Firewalls to keep state just for this one protocol. if so why? That's not to say I don't agree, it is just I do not see this explained. (Is this motivated by DNSsec or some other use?) --- Section 9: - This appears to be an overall discouragement on use of fragments, whereas the current work on INT Area tunnels appears to say in specific cases fragments are necessary - see overview comment. ------------------------------------------------------ Minor/Editorial notes by section: Section: Abstract - This is short, probably the document would be better with more of a description about what this particular draft describes. This comment also echoes the overview comment saying there is a need to help a reader understand whether they should read this document or draft-ietf-intarea-tunnels or both. ---- Section: 1. "Operational experience [Kent] [Huston] [RFC7872] reveals that IP fragmentation reduces the reliability of Internet communication. This document describes IP fragmentation and explains how it reduces the reliability of Internet communication. " - I argue that this text uses the word "reliabitilty" a little wrongly from a transport perspective. I like the word "fragile" that was introduced and I expect the sentence could be reworked to talk in terms of the "increases the fragility". --- Section: 1. - The following text in the Introduction could be over-stated: “ Furthermore, fragmentation is expected to work in limited domains where security and interoperability issues can be addressed.” - I suggest thinking about whether the word “limited” could be removed from this sentence. Any domain that can resolve security and interoperability issues, can also deploy this. ---- Section 2.1: “ PMTUD is susceptible to attack because ICMP messages are easily forged [RFC5927].” - I am unsure this is the actual problem, so I'd argue against this text. To me, the security-related problem is that the source node that receives the PTB message does not or can not authenticate that the PTB message originated from a node that actually was on the path being used. (The topic of validating ICMP is further discussed in draft-ietf-tsvwg-datagram-plpmtud.) --- Section 2.1: "...because Internet paths are dynamic, PMTU is also dynamic." - The current text should be improved: I think I understand that this means the set of links and routers forming a path can change (due to link failure, load sharing, re-configuration, etc). When the path changes, the PMTU can also change. --- Section 2.1: - Note 1 describes ICMP for IPv4, but does not refer to RFC1812. This is the place where IPv4 routers are required to return a quoted packet with as much of the original datagram as possible without the length of the ICMP datagram exceeding 576 bytes. Please add this as reference, since it is important to validating that ICMP PTB messages are received on-path. ---- Section 4: - The title implies this is about reliability, although I would argue that this is not the case, and the word is mis-used. Would "Increased Fragility" work? or something else that does not imply the communication is less reliable, but it could do for a UDP transport that is unable to recover, or a TCP session that is black-holed (i.e. unusable) but still reliable as a service (bad stuff is not delivered). However that's kind of a higher-level issue. --- Section 4.1: "IP Fragmentation causes problems for routers that implement policy- based routing." - Is it always true? This statement seems incorrect, although I would agree with the assertion that problems can arise when fragmentation is used and policies refer to information only available in the first fragment. --- Section 4.2: - This section does not refer to the appropriate transport area drafts issued by the BEHAVE working group. - It would be useful to refer to RFC 5508 on the ALG requirements for a NAPT processing ICMP and probably should refer to Hairpinning Support for ICMP Packets. -[RFC4787] is a BCP describing NAT handling of IPv4 Fragments. I suggest you also check [RFC6864], which is a PS describing the update to the IP-ID, and rewriting middleboxes. --- Section 4.2: “Virtual reassembly in the network is problematic” - I agree, but I suggest the document explains the meaning of “virtual reassembly” and why this is “problematic”. --- Section 4.4: Equal Cost Multipath (ECMP), Link Aggregate Groups (LAG) - Some RFC references would be good here to ECMP, LAG, etc. --- Section 4.4: "it will assign all fragments belonging to a packet to the same link." - Section 4.1 describes this in terms of a FIB entry, is that different to a link here? or the same? --- Section 4.6: “NIDS aims at identifying malicious activity” - please define “NIDS”. --- Section 4.6: "Each IP fragment contains an "Identification" field that destination nodes use to reassemble fragmented packets. Many implementations set the Identification field to a predictable value, thus making it easy for an attacker to forge malicious IP fragments that would cause the reassembly procedure for legitimate packets to fail." - This appears to describe a broken pathology, rather than a RFC recommendation. I think there is always a danger with such wording that people cite the pathology as acceptable. I would strongly encourage the text to be reworked to explain that RFCxxxx requires xxxx, and then state the issue. - In this case I suggest you also refer to the specification in [RFC6864]. ---- Section 4.7: "It is subject to both transient and persistent loss." - Please reword to specify what "it" is. - This section needs to define what transient loss means before section 4.7.1. I suggest this is subtle, since there are various reasons why a specific ICMP message may not be received. --- Section 4.7.1. - I suggest you also add link loss, mentioning wireless networks as an example of links where propagation conditions could in some circumstances prevent delivery. --- Section 5.1: “Stream Control Protocol (SCP) [RFC4960]” - Is this intended to be SCTP? (The reference suggests this). --- Section 5.1: The PLPMTUD method can also can be used with [I-D.ietf-tsvwg-udp-options]. ---- Section 5.2: " [RFC8085] recognizes that IP fragmentation reduces the reliability of Internet communication. " - I'd argue that it "can reduce the reliability and efficiency of communication", rather than it does. --- 6.1 DNS - Please expand “OSPF” and cite an RFC. --- 6.2. OSPF - Please expand “OSPF” and cite an RFC. --- Section 6: - This speaks about packet-in-packet encapsulation, but does not explicitly include tunnels in the title. The distinction may be intentional since another ID targets BCP advice on tunnels. However, I think the draft needs to have a section title that notes the differences for tunnels and points to that draft, or this section needs to include tunnels as well as endpoint encapsulation. --- Section 6.4: - I found the title of this section a little odd. Is it possible that the title here could be changed to reflect "Other upper layer protocols that rely on fragmentation" and then cite this (slightly esoteric) link protocol as another example. I am pretty sure that we don't wish to enumerate every example of a UDP app that relies on IP fragmentation to work, so an example seems a good way to note this. - It may also be quite wise (?) to note that Section 3.2 of RFC8085 is a BCP that discourages this design, so that this is not used as a BCP to justify this design. --- Section 7: "and ensuring that the transport protocol in use adapts its segment size to the MTU." - This needs to be "Path MTU", they all adapt to the local link MTU. --- Section 7: - The final sentence should include a reference to PLPMTUD and DPLPMTUD. --- Section 7.1: - This is restating Section 3.2 of RFC8085 - which should be cited. The text there states: " Applications that fragment an application-layer message into multiple UDP datagrams SHOULD perform this fragmentation so that each datagram can be received independently, and be independently retransmitted in the case where an application implements its own reliability mechanisms." --- Section 7.1: "Developers MAY develop new protocols or applications that rely on IP fragmentation." - How can this be enforced? (Is this really a "MAY" in RFC 2119?, or is it just the result of not taking the SHOULD in RFC8085?) --- Section 7.3: - Although not strictly necessary, I would recommend avoiding the doubt when this BCP is (mis)quoted, by explicitly changing /may/ to /could/ in the following sentence: "These stateless middle boxes may perform sub-optimally, process IP fragments in a manner that is not compliant with RFC 791 or RFC 8200, or even discard IP fragments completely." --- Section 7.3: " Protocols may be able to avoid IP fragmentation by using a sufficiently small MTU (e.g. The protocol minimum link MTU)." - In previous WG documents, I've cited RFC8085 section 3.2 to assert this. I think this should be referenced here. That states: "Applications that do not follow the recommendation to do PMTU/PLPMTUD discovery SHOULD still avoid sending UDP datagrams that would result in IP packets that exceed the path MTU. Because the actual path MTU is unknown, such applications SHOULD fall back to sending messages that are shorter than the default effective MTU for sending (EMTU_S in [RFC1122])." --- Section 7.4: - This appears to reaffirm the existing advice in [RFC438] and [RFC7098] and these need to be cited. --- Section 7.5: - This lacks RFC references, but names RFCs - should these be changed to citations? --- Section 9: Security Considerations. - Many cited RFCs not security-related issues directly applicable to the use of fragments. It would be really helpful if key security recommendations are referenced in the security considerations section. ------------------------------------------------------ References: * Key documents are not referenced. * Normative requirements are placed on documents that are not listed as normative. These include two drafts currently awaiting WGLC: draft-ietf-intarea-tunnels draft-ietf-tsvwg-datagram-plpmtud * Additional Suggested References: [RFC4787] is a BCP describing NAT handling of IPv4 Fragments [RFC5722] is PS and appears to be something that could usefully be cited normatively in a BCP on this topic. [RFC6438] is a PS that appears to be cited normatively. [RFC6864] is a PS describing the update to the IP-ID, and rewriting middleboxes. [RFC5508] is a BCP that describes the ALG requirements for a NAPT processing ICMP.