I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments. For more information, please see the FAQ at . Document: draft-ietf-anima-grasp-api-07 Reviewer: Paul Kyzivat Review Date: 2020-10-28 IETF LC End Date: 2020-10-28 IESG Telechat date: ? This draft is on the right track but has open issues, described in the review. General: I began this review without any knowledge of Anima. I did read several of the related documents for context, but my overall understanding remains somewhat sketchy. So take my comments with with that in mind. (At least you get a fresh, untainted perspective.) Issues: Major: 5 Minor: 6 Nits: 2 1) MAJOR: Unclear model for API function usage As I read through sections 2.3.2 through 2.3.6 these I got progressively more confused. I finally concluded that a big picture of how API functions interact is lacking. For one thing, there isn't a clear delineation between those calls used by requesters and those used by responders. And then valid sequencing of calls is hard to tease out. It would be helpful to have one state model for requesters and another for responders. It may also be helpful to break this out for threaded and event loop variants. My subsequent issues regarding these sections are reflective of this confusion. 2) MAJOR: What state does GRASP retain for Objectives? It is clear that the GRASP must retain the names of registered objectives. There are implications that it must keep more. Seemingly it must retain received flooded objectives, on a per-source basis. It is unclear if it is only for registered objectives, or also for unregistered objectives. This could potentially become a large resource drain if there are lots of nodes flooding values. Negotiated objectives seem to be treated differently. It isn't clear to me if GRASP needs to retain copies of their values. 3) MAJOR: Consistency of Objective definitions In section 2.3.1.2 and elsewhere, presumably all parties that use a particular objective must agree on the values of synch, neg, dry, and the format of the value. Apparently you are relying on each caller getting this right. What happens when this isn't the case? How can blame be ascribed so that the problem can be fixed? 4) MAJOR: Confusing semantics of 'request_negotiate' In section 2.3.4 I don't understand the following: 1. The 'session_nonce' parameter is null. In this case the negotiation has succeeded immediately (the peer has accepted the request). The returned 'proffered_objective' contains the value accepted by the peer. IIUC this requires a network exchange with the peer. I don't see how this can complete *immediately*. ISTM that this could only complete immediately if it were satisfied from a local cache. That doesn't seem appropriate for this function. Similarly, in bullet 2 I don't see how the proffered_objective would be available in the initial call. Or is this text only applicable to a threaded implementation with synchronous calls? Bullet 2 also says: ... The returned 'proffered_objective' contains the first value proffered by the negotiation peer. The contents of this instance of the objective must be used in the subsequent negotiation call because it contains the updated loop count, sent by the negotiation peer. Do you mean this must be used in the call to negotiate_step? But that would mean asking for what has already been granted. For the the negotiation to be useful I would expect that the next round would ask for a value somewhere between what was originally requested and what was initially offered. I think you need to say more about the whole concept of negotiation and how it is expected to work. Also, additional discussion of the purpose and semantics of dry run negotiation. One more thing: you don't explain the semantics of 'timeout'. Is it only used locally, to terminate the synchronous form of the call? Or is it transmitted and used by the responder to control how long it might spend trying to fulfill the request before giving up? 5) MAJOR: Confusing semantics of 'negotiate_step' In section 2.3.4, I'm confused by: Called in the same thread as the preceding 'request_negotiate' or 'listen_negotiate' I understand use of this following request_negotiate, but not with listen_negotiate. A negotiation should consist of an ordered sequence of "rounds" of bidding. Allowing both requester and responder to initiate a next step can lead to a disruption of the ordering. I would expect this to only be used by the caller of request_negotiate. What am I missing? 6) MINOR(MAJOR?): Confusing semantics of 'negotiate_wait' Regarding section 2.3.4: I understand that the GRASP Confirm Waiting message (from [I-D.ietf-anima-grasp]) and hence 'negotiate_wait' is only used by the responder. How is the effect manifested in the requester? Is a synchronous requester just silently extended? What about an event loop requester? 7) MINOR(MAJOR?): Role of 'peer' in 'synchronize' The description of 'synchronize' (in section 2.3.5) says: - If the objective was already flooded, the flooded value is returned immediately in the 'result' parameter. In this case, the 'peer' and 'timeout' are ignored. Do you really want to ignore 'peer' in this case? I infer from the description of get_flood that the GRASP retains flooded values separately for each flood source. So you should have the capability to return only a value flooded by the requested peer. 8) MINOR(MAJOR?): Confusing semantics of 'flood' The descriptions of 'flood' in section 2.3.5 seem to imply that the GRASP in each of the nodes receiving the flood saves the value, for use in get_flood and synchronize. Is that right? (Actually, the description of get_flood implies that flooded values are saved separately for every flood source!) Are only registered objectives saved? Or must the GRASP save all the values from every flood that it receives? What if one flood contains X, Y, and Z. Then a later flood from the same source contains only X and Y. Presumably the new values of X and Y replace the old ones. Does GRASP retain the value of Z, or discard it? ISTM (based on my limited understanding) that it would make more sense for the GRASP to simply cache all objectives that have been registered and/or received as a result of synchronization, together with their TTL. And then subsequent requests would be satisfied from the cache as long as the TTL hasn't expired. 9) MINOR: Clarify Session Section 2.2.3 implies that Session is a full fledged entity. It would be helpful to flesh this out in more detail. E.g., describe it as a distinct entity in the API and include a state model. 10) MINOR: Representing mutually exclusive items with booleans In section 2.3.1.3, 'synch' and 'neg' are mutually exclusive. By representing them as independent booleans you introduce the possibility of invalidly setting them both or neither. ISTM it would be better to represent these as a single value (enumeration). A similar situation occurs in section 2.3.1.4. 11) MINOR: Managing thread for 'listen_negotiate' In section 2.3.4, suppose the ASA wants try to ensure that it always has a listener ready for an incoming request. IIUC the only way to achieve that is to create the maximum number of threads it is willing to devote and set them all to listening. That is sub-optimal. It would be better if it could somehow learn that another one is needed and create it. Some additional API mechanisms would be needed to support that. 12) NIT: Idempotence Bullet #2 of section 2.2.1 says: To facilitate this, the API implementation would provide non- blocking versions of all the functions that otherwise involve blocking and queueing. In these calls, a 'noReply' code will be returned by each call instead of blocking, until such time as the event for which it is waiting (or a failure) has occurred. Thus, for example, discover() would return 'noReply' instead of waiting until discovery has succeeded or timed out. The discover() call would be repeated in every cycle of the main loop until it completes. Effectively, it becomes a polling call. This seems to contradict the earlier statement (in 2.2) that the calls are not idempotent. Section 2.2.2 tries to clarify this, but it is not very clear. 13) NIT: Confusing terminology in 'discover' In the description of 'discover()' (in section 2.3.3) the 'age_limit' parameter is confusing. At the least the name seems wrong for the described semantics. The implication is that an expiration date/time is what GRASP stores, and that remaining time until expiration is what is being used here. It is hard to see how it is appropriate to call that an "age". ISTM that would better be called a TTL, and that is parameter might then be called 'minimum_TTL'.