I have reviewed this document as part of the security directorate's ongoing
effort to review all IETF documents being processed by the IESG. These comments
were written primarily for the benefit of the security area directors. Document
editors and WG chairs should treat these comments just like any other last call
comments.

Bidirectional Forwarding Detection (BFD, RFC 5880) is used to monitor "data plane" paths by sending BFD packets between "forwarding engines" and detecting failures when they are not correctly received. For example, its Echo function will trigger ping-like replies to check that a path is valid in both directions. The BFD Stability draft describes extensions to BFD to not just detecting failures when a sufficient number of BFD packets or echoes are not received, but to also quickly detect and report packet losses. When losses are detected, they are reported in the Yang module ietf-bfd-stability, which is defined in the draft.

There is no packet sequence number in the "Generic BFD Control Packet Format" defined in section 4.1 of RFC5880. The BFD stability draft uses instead the Sequence Number defined in the authentication extension to BFD. RFC5880 defines five types of authentication: simple password, keyed MD5, "meticulously" keyed MD5, keyed SHA1, and "meticulously" keyed SHA1. Of those, only the MD5 and SHA1 variants include a 32 bit sequence number, and only the "meticulous" variants require that the sequence number should be incremented for each packet. The draft requests that if authentication is used, then either the MD5 or SHA1 "meticulous" variant be selected. The draft also defines a NULL form of authentication, in which the authentication extension only carries a sequence number. The loss detection can then use the sequence number in the authentication extension.

The authentication sequence number is a 32 bit field. Such numbers can roll over, either after a long duration session or due to a packet injection attack. There is some text about that in the description of the NULL authentication. It says:

   If bfd.AuthSeqKnown is 1, and the received Sequence Number field is
   not equal to bfd.RcvAuthSeq + 1 (in a circular number space), then
   the loss count is incremented by one and bfd.RcvAuthSeq is set to the
   received Sequence Number.

That does not look quite right. Suppose that due to out of order delivery, the packets are received in order 1-3-2-4. Upon reception of packet 3, the algorithm counts one loss and set the next expected value to 4. After packet 2, another loss and expected value to 3. After packet 4, another loss and expected value to 5. So, three losses when none actually occurred.

In RFC 5880, the specification of Meticulous Keyed MD5 addresses both number rollover and out of order delivery. The same text is repeated for meticulous MD5 and meticulous SHA1:

      ... if the
      sequence number lies outside of the range of bfd.RcvAuthSeq+1 to
      bfd.RcvAuthSeq+(3*Detect Mult) inclusive (when treated as an
      unsigned 32-bit circular number space) the received packet MUST be
      discarded.

That means that if a the packet 1-2-3-4 are delivered out of order as 1-4-2-3, then packets 2 and 3 are going to be ignored and 1 loss will be detected. That's probably not right.

For packet loss detection, the state of the art is described in RFC 8985, the RACK-TLP Loss Detection Algorithm for TCP, and in RFC 9002 which describes how to use RACK-TLP for packet loss discovery in QUIC. I don't understand why the BFD Stability draft does not reference the RACK-TLP algorithm and RFC8985. The "ad hoc" BFD loss detection algorithm description ends up repeating a lot of points commonly discussed in the specification of RACK: detecting packet losses by monitoring holes in sequence numbers, the interaction between this detection and out of order delivery, or role and computation of timers. The BFD stability draft would probably be more robust if it merely adapted the algorithms defined in RFC8995.

Copying algorithms from RFC8985 or RFC9002 would probably also diminish the reliance on timers, allow use of adaptative timers instead of relying on timers set by management procedure, and allow reporting of RTT statistics in the Yang module.

The security considerations acknowledge that using NULL authentication will allow attackers to mimic out of order delivery and thus cause spurious loss detection. Indeed, this NULL Authentication procedure makes me cringe. Why are we even defining that today, in 2024? Naive features like that can easily turn into a CVE if the attackers used them as steps in an attack chain, causing spurious error detection to convince routers that a path is not usable anymore, and thus maybe causing connectivity to  a target to fail for the short time necessary for an attack. It would be much safer to just require the MD5 or SHA1 variants, and not define NULL authentication at all.

But MD5 and SHA1 authentication can also be attacked, because they do not protect against replay attacks. If the sessions last long enough for the sequence number to roll over, a patient attacker could record old packets and replay them after the rollover. If the repeated packets pass the "range" test, the bfd.RcvAuthSeq value will be modified, causing "good" packets below the new range to be ignored. Repeat that a couple of time, and the BFD process will be effectively disabled. I think that the solution is to not allow rollover, or maybe require use of new descriptors if too many packets have been exchanged. In any case, the security section should discuss the issue.

Of course, the security of MD5 and SHA1 is also suspect. There is no indication yet that SHA1 with the HMAC construct used in RFC5880 is broken, but the general rule is no avoid SHA1 if we can -- and certainly avoid MD5. I don't know the deployment requirements, but it would be nice if BFD allowed stronger options. Surely, some deployments will want to use different authentication methods in the future. Should there be some guidance on the development of these future extensions, such as requiring that they provide a sequence number similar to the "meticulous" variants?

At this point, let me indulge in a personal aparte. Why do we even have a specific BFD protocol? I suppose that it started with the idea of using something light weight but better than old fashioned ping. But if I read the various RFCs, I see the definition of session setup, the requirement for congestion control, the definition of timers, and now the addition of packet loss detection. There may be more coming if we want to address sequence number rollover, more precisely handle out of order packets, maybe add timestamps. As BFD is used to monitor data plane quality, using the same transport protocol as applications would make sense. These functions are all well established in transport protocols. I understand that TCP+TLS would not be practical, but we are not in 2010 anymore and we have more options. For example, it would be trivial to define "BFD over QUIC", while providing modern security, congestion control, timer management and loss detection. A BSD over QUIC application could update the Yang module just like BFD does, while providing much better security!