Minutes of Rserpool
IETF #54, Japan
Monday, July 15, 2002, 1530-1730

Co-chairs:
Lyndon Ong, lyong@ciena.com
Maureen Stillman, maureen.stillman@nokia.com

Approximately 40 people attended this meeting.

Agenda

1) Interim meeting - Maureen Stillman

2) Rserpool services and TCP mapping - Peter Lei
http://www.ietf.org/internet-drafts/draft-conrad-rserpool-service-02.txt
http://www.ietf.org/internet-drafts/draft-conrad-rserpool-tcpmapping-00.txt

3) ASAP - Peter Lei
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-common-param-00.txt
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-asap-04.txt

4) Reliable Server pool applicability statement - Lode Coene
http://www.ietf.org/internet-drafts/draft-coene-rserpool-applic-00.txt

5) ENRP - Lode Coene
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-enrp-03.txt

6) Comparison document - John Loughney
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-comp-04.txt

7) Closing remarks - co-chairs

1) Interim meeting - Maureen Stillman

A Rserpool meeting was held in Herndon, Virginia in May. Extensive meeting notes were published and sent to the list.

Major Rserpool addition - primitive services for the applications were added at the request of John Loughney. We ask that you read the services internet-draft and determine whether the services are useful to the application.

There was some discussion on how to register a PE with an ENRP server. Should the PE send separate registrations for each transport, or send a list of transports supported (SCTP/TCP)?

Terminology consistency across the Rserpool documents will be important.

Results of the interim meeting were documented in internet-drafts released before this meeting. There are more updates that will be released after the Japan meeting.

New internet-drafts and Rserpool milestones

A number of new internet-drafts have been generated as a result of recent changes to the Rserpool protocol. In addition, we have generated a security threat internet-draft.
The area directors have advised the chairs to revise the Rserpool milestones and include these new documents. Upon their approval, these new documents can be added as WG items.

The open issues in the Rserpool services document were discussed. The WG was instructed by the chair to review this document and determine whether these are the correct set of services to be offered to the applications. Comments should be forwarded to the list on this topic and on the specific open issues.

2) Services - Peter Lei
http://www.ietf.org/internet-drafts/draft-conrad-rserpool-service-02.txt

This internet draft is not a finished product. There have been major changes to the two modes defined previously. The original name service and failover modes were too constraining. Instead, the service document will focus on a menu of services for the application. The goal is to define primitive services for the application.

Application layer TCP and UDP interactions between PU and PE are to be supported. The document describes two scenarios that satisfy very different requirements.

Transport layer mappings are what the transport layer must provide to ASAP and the upper layers. Some of the issues concerning what to provide with those mappings follow.

The mapping can provide services for automatic or non-automatic failover. In other words, the failover can be done without the application layer getting involved, or only if the application layer is programmed to perform failover.

There is an issue concerning the fact that a transport layer ACK does not necessarily mean that the application received the data. Therefore an upper layer/application level ACK is a significant reliability issue. Do we need both application layer and transport layer ACKs? How will Rserpool do retransmission/congestion control otherwise?

SCTP allows you to retrieve messages that are queued but not sent, and messages that are sent but not ACKed. We need to add support for these services to the TCP mapping layer.
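The retrieval service described above (recovering messages that are queued but not sent, and sent but not ACKed) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the mapping defined in the tcpmapping draft: the class name RetrievalQueue, its methods, and the use of per-message sequence numbers acknowledged by the application are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RetrievalQueue:
    """Hypothetical sender-side bookkeeping for a retrieval service."""
    unsent: deque = field(default_factory=deque)   # queued, not yet handed to transport
    unacked: dict = field(default_factory=dict)    # sent, no application-level ACK yet
    next_seq: int = 0

    def enqueue(self, payload: bytes) -> int:
        """Queue a message and return the sequence number assigned to it."""
        seq = self.next_seq
        self.next_seq += 1
        self.unsent.append((seq, payload))
        return seq

    def mark_sent(self):
        """Move the oldest queued message to the sent-but-not-ACKed set."""
        seq, payload = self.unsent.popleft()
        self.unacked[seq] = payload
        return seq, payload

    def ack(self, seq: int) -> None:
        """Record an application-level ACK (assumed here, not a transport ACK)."""
        self.unacked.pop(seq, None)

    def retrieve(self):
        """Return everything that may need resending after a failover,
        sent-but-unACKed messages first, then messages never sent."""
        return sorted(self.unacked.items()) + list(self.unsent)


q = RetrievalQueue()
for body in (b"a", b"b", b"c"):
    q.enqueue(body)
q.mark_sent()      # message 0 goes out
q.mark_sent()      # message 1 goes out
q.ack(0)           # the peer application confirms message 0
pending = q.retrieve()
```

In this sketch only an application-level ACK removes a message from the retrievable set, which reflects the concern above that a transport ACK alone does not show that the application received the data.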
Both control and data need to be exchanged between PUs and PEs. The issue is whether or not to multiplex control and data streams. Should this be optional or required?

TCP transport mapping requirements. The TCP mapping must support message framing, retrieval, heartbeat and streams. These features are present in SCTP and need to be added to TCP using a mapping.

Open issues on requirements: Do we need tunable timers? Tunable number of retransmissions? Use of Transmission Sequence Number (TSN) with upper layer acknowledgements?

TCP Mapping - Peter Lei
http://www.ietf.org/internet-drafts/draft-conrad-rserpool-tcpmapping-00.txt

This internet draft describes a mapping layer over TCP which provides the application with SCTP-like features. It requires some updating of references and it will be released again after this meeting.

3) ASAP - Peter Lei
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-common-param-00.txt
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-asap-04.txt

The common parameters draft was created to centralize terms and definitions. This avoids the problem of them getting out of sync in various documents.

Open issues concerning business cards were mentioned as described in the architecture and services documents. Business cards provide a common search order for finding an alternate middle server in the case of a tandem relationship where the middle server fails. An example based on prior experience, a CDMA service, was discussed in detail at the interim meeting. The details of the business cards need to be finalized.

The latest ASAP draft added support for other options for PU-PE transport, notably TCP. A Registration_Response message was added and a re-registration procedure has been defined.

Next steps for the document are to separate it into subsections, PU-ES vs. PE-ES services. Application layer ACKs will be added. Details of the "business card" idea (failover of tandem PU-PE cases) will be fleshed out in the next version of the document.
4) Reliable Server pool applicability statement - Lode Coene
http://www.ietf.org/internet-drafts/draft-coene-rserpool-applic-00.txt

The group did not consider this as a WG item yet as it is incomplete. We will hold off on agreement as a WG item while we discuss new documents for the Rserpool WG milestones with the A-Ds.

5) ENRP - Lode Coene
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-enrp-03.txt

Open issues in ENRP include PE registration. The issue is whether a PE that supports both TCP and SCTP should register once with an ENRP server with a list of transports that it supports OR the PE should register once for each separate transport. There are pros and cons to each approach.

A good analogy is that the PE is an animal with several legs. Each leg represents a different transport address. If the PE registers once, then if one leg of the animal is dead, you can infer that the whole animal might be dead and avoid that PE on failover. However, if the PE registers each leg separately, then you cannot tell which legs belong to that animal. You will just try some other leg. In doing so, you might get the same PE (animal) but use a different transport. Chances are that it is also dead. A smarter strategy is to try another PE (animal) first and then, if you exhaust all other PEs (animals), go back and try another leg on the first animal. The animal might be ill, but not dead.

Algorithms for maintaining the consistency of the ENRP database were discussed. Some issues are the synchronization of servers and auditing of the name space -- should it be a robust or simple mechanism?

ASAP and ENRP protocol development will continue.

6) Comparison document - John Loughney
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-comp-04.txt

Version 4 is now the latest. It describes the history of why this protocol versus DNS, CORBA, SLP, etc. There have been several editorial changes, plus a change to the text on CORBA. There are several CORBA issues.
It is complex compared to IETF protocols and hard to fit together. There are some vendor dependencies (ORB vendor). There is limited applicability documentation, and limited interest from the CORBA group in explaining how it could be used for reliable server pooling.

It is the opinion of the chairs that this document is ready for last call as an Informational RFC. This last call will begin on the list following IETF #54.

7) Closing remark on UDP - co-chairs

We met with the area directors and discussed the use of UDP in Rserpool. Under their guidance, we have decided not to support UDP as part of the Rserpool infrastructure. What this means is that UDP can be used by applications in PU-PE communications, but not for communications between PU-ENRP servers or PE-ENRP servers. All ENRP server communication must be done with SCTP or TCP as transport. As a result, we will not define a UDP mapping for Rserpool.

The rationale for this decision is the concern that UDP does not support congestion control mechanisms. This causes us to discourage its use in defining new protocols. This philosophy is discussed in RFC 2914, entitled "Congestion Control Principles".

End of minutes

----------------------------------------------------------

Rserpool Interim meeting minutes - May 29-30, Herndon, VA

We would like to thank the host, Cisco, and Ken Morneault for working out the details.

There were many action items generated as a result of this meeting. The group felt that we made significant progress. A new set of internet drafts will be generated as a result of these discussions. The target deadline is the end of June so that there will be time for review of the drafts before we meet in Japan at IETF #54.

Please feel free to comment on these minutes and continue discussion of the internet drafts on the list. Please understand that any consensus reached in this meeting is not regarded as the full consensus of the Rserpool WG.
All new internet drafts generated as a result of these discussions are subject to review, comment and modification.

Attendees:
Lode Coene
Phillip Conrad
Michael Tuexen
Randall Stewart
Peter Lei
Ken Morneault
Jaiwant Mulik
Maureen Stillman, co-chair
Qiaobing Xie

Minutes by Lode Coene and Maureen Stillman

Wed. May 29, 9AM
Introduction and Agenda bashing

Wed. AM
1) http://www.ietf.org/internet-drafts/draft-conrad-rserpool-service-00.txt
Phillip Conrad

Wed. PM
2) http://www.ietf.org/internet-drafts/draft-ietf-rserpool-common-param-00.txt
http://www.ietf.org/internet-drafts/draft-ietf-rserpool-asap-03.txt
Randy Stewart

Wed. PM
3) http://www.ietf.org/internet-drafts/draft-ietf-rserpool-enrp-03.txt
Qiaobing Xie

Thurs. May 30, AM
4) http://www.ietf.org/internet-drafts/draft-ietf-rserpool-arch-02.txt
Michael Tuexen

Wrapup 11AM

1) draft-conrad-rserpool-service-01.txt
draft-conrad-rserpool-tcpmapping-01.txt

Goal: Support transports other than SCTP for devices without SCTP yet
PU-PE communication
PU-ES communication
Could be SCTP and TCP -- both can be supported by a single PE.
Provide a migration path, via adaptation
The adaptation layer also clarifies the service to the upper layers.

Assumptions
- a certain failover model is assumed by the drafts
- that particular failover model relies on certain features of SCTP which are not present in other transport protocols (for example, message delimitation is not present in TCP)
- define an adaptation (TCP) with shim layers. For other protocols, such as the GRE-GPRS IP tunneling protocol, this is not easy.
- open issue: tunable timers for heartbeats, number of retransmissions
- streams may be necessary for ASAP control and application multiplexing
- service mode and failover service
- new service primitives for failover mode: failover callback: a hook for doing the changeover
- is a local function done in user space. should be provided by the application
- should send a message (or more)
- protocol!? No consensus reached. This is to be determined.
Note: although failover is in scope, state sharing is out of scope. Here is where the two features meet.
- nameservice mode translation call poolhandle - IP address, pool element handle

Features
- message orientation
- heartbeating
- tunable timers and parameters for heartbeat
- detect when a failure has happened in a more timely fashion
- adapt with a shim layer to do what we need it to do
- TCP is OK but for other protocols this is not so easy
- report failure and pick the next server

Name service mode defined:
Easy migration path. Take an existing application. Do a couple of tweaks. Don't have to change their application layer at all. Name a group of servers. Use server selection algorithm. Report failures. Add a few primitives. Tell the ASAP layer: this is the pool that I want, please give me a server. This server I was talking to failed, please give me another one.

State sharing out of scope, but Rserpool provides hooks:
Callback (optional): Failover from this PE to that PE
Look at whatever state I have in the app that I need to send to my server to establish the context to properly interpret
callback part of application. Could be a return OR could be -- here is a stateblob
ASAP can use retrieval to get unacknowledged messages
will resend them after I get the new pool element
call back will get invoked and maybe send a message or 2

There is a rich set of valuable services that they can get using Rserpool. However, in name service mode they won't get everything. For example, they won't get auto failover.
- already existing applications using ASAP and ENRP will NOT get automatic failover: this needs changes to your application program.
- new applications will get the "whole" Rserpool deal if they choose to do so.

Provide an easy migration path where you don't have to change the application transport layer interaction much.

We discussed the fate of the services document. Should it be merged with ASAP, architecture or informational on its own?
We didn't come to any consensus on this. (Editor's note: for now we will just leave it as is)

Failover mode defined:
New apps developed. Willing to go and make some mods to the application. Failover mode requires certain things be there in the underlying transport.

Name Service mode: need full ASAP implementation
Don't need mapping protocol if you use name service mode
With the exception PU to ES over TCP, mapping layer may be there (??) open issue
Mapping is not used for the PU to PE communication

Action item: Section of the text to delete: on selection of pools

Common parameters document - Qiaobing Xie
- common API for talking between PU and PE: is not a protocol specification yet
Common API is a nice idea

Issue with retrieval: ACK sent back after it is delivered by the kernel; this does not mean that it got to the application
- generic acknowledgment problem: if a msg is acknowledged by a layer, that does not mean that the layer above you has received it, processed it or done something with it.

MGCP request-response messages
In some protocols you already have application level ACKs, such as MGCP. So they only need the name service part of Rserpool. Rserpool is a framework from which you can choose.
- notion: if failover occurs, retrieval and send msg to other server. Retrieval is an optional service. SCTP retrieval is broken: a msg is only acknowledged by its own layer; the upper layer retrieving the msg doesn't know which msg was received by its peer layer, except by explicit own layer acknowledgement (otherwise msg duplication or loss....).

The retrieval service provided (based on application ack) by Rserpool is a greater convenience than SCTP retrieval.
Retrieval function in ASAP can be turned on or off. If your protocol already supports application level ACKs then you can turn it off.
MGCP is a good candidate for name service only mode.
Some applications have application acknowledgement in their protocol (example MEGACO), so they only need the name service part of Rserpool.
Rserpool is a framework from which you can choose.

The group recognized there is a problem: Currently, in the ASAP document there is a flag that allows you to turn on the retrieval inside ASAP. This feature is optional. Due to the acknowledgement problem it is broken.

Options are:
1) Fix it.
2) Don't use it.
3) Use it, but know it is broken.
4) Use application layer ACKs and take it out

Fix it. If you already have app set, then use name service mode, but don't enable retrieval.

Broken -- when ASAP does retrieval
When ASAP gets back an ACK from SCTP, this tells me it got part of the way there, but not all the way there. Want to know: did they get all the way there? Did the application actually get the message and process it?

We have a vague notion of how retrieval will work. Some applications don't need it. For some applications (protocols) it is better to lose messages than to process a message twice. App has to deal with message loss.

Can't find a solution???
One option is to let the app call back mechanism handle this
everything you don't have an ACK for, try again
debating a failover service
edge of the charter, not clear where the boundary is
solving the problem of transaction integrity
providing some hooks
how far to go and what those hooks should be
where we are today isn't good
need to go further or back away
now at a place that doesn't make sense
We don't want to give people the illusion that we have solved the problem when we haven't
understand the issue
keep it as separate as possible
don't want to put them together
separate issue -- have a boundary in the data send
there is a flag that says transmit message again on failover to alternate
undefined whether the application has the opportunity to respond to the failover indication to send messages before the retransmission of the retrieved messages
implied: -- failover callback happens before you send the retrieved messages
- fail-over convenience may be supplied by a separate protocol or within ASAP.
- failover cannot be applicable for every application protocol.
- on PU-PE: failover indication can go to the application

3 possibilities:
option 1:
- ASAP can do SCTP based retrieval
- indicate to ASAP if msg sent, set flag, transmit msg again on failover.
option 2:
- no automatic retrieval
- SCTP retrieval is possible but not full (optional)
option 3:
- no automatic retrieval
- do application level acks (Rserpool based retrieval)
option 4:
- automatic retrieval with sendover flag and add application ack. Applications with application based ack should be supported by ASAP.

Rough consensus for #4.

Action items:
1) Write up options 1-4 for retrieval and post it to the list -- Phil Conrad
2) Phil and co-authors: Document iteration -- service and tcpmapping documents will both be updated to reflect the discussions. This should be done sometime in June so we can read them before the July IETF meeting.

---------------------------------------------------------------------------------------------------

2) ASAP

There was a discussion on the organization of the ASAP document with the goal of making it easier to read and understand.

Section 1 is PU-ENRP interactions: name resolution, server hunt
Section 2 is PE-ENRP interactions: registration/deregistration, server hunt

A) split section 1 and 2 into two separate documents
B) reorganize current ASAP document into Section 1 followed by Section 2

B is the rough consensus

Action item: Randy Stewart will reorganize the ASAP document.

business card (PECP - pool element control protocol)
application ACK
Should these be put in a separate document?
Procedures
how you do it
what you do with it
specify message format
how you handle it on both sides
Mapping document will determine this.

Business card == Pool handle
Death wish list

How business cards work:

    PU A -
          -
    PU B ---- PE 1 ----------- PE 2
          -
    PU C -

CDMA example (see picture above): A, B and C are all using PE 1.
If PE 1 dies, there may be a requirement that they all need to fail over to the same PE. Furthermore, they are synchronizing to PE 2. So PE 2 will also need to know which PE to contact if PE 1 dies. PE 1 will send business cards to PU A-C and to PE 2 so they can re-sync if PE 1 fails.

A PE sends a business card to the PU that says: when I die, here is the preferred list of PEs to contact, in this order.

Should business cards be sent in band or out of band? Can send a business card at any time. You can replace a business card with a new one.

data transfer: done via the classical transport layer
PECP transfer: use SCTP between the PU and PE, out-of-band information transfer, or use the mapping if SCTP is not present.

Business card contains a last will which can be used so that 2 PUs can meet on the same PE element (normally, the rendezvous is NOT coordinated as ENRP will supply different lists to different requests). An element of the wish list will first be checked with the cache.

Wish list should be an option. No wish list means use ENRP, else use the info supplied in the business card.

The rendezvous point can only be determined by the PU/PE: it also needs to distribute across the remaining PU/PE, otherwise a single PU/PE will get all the traffic from the failed one.

Coupling can be done explicitly or implicitly. Application has to couple the 2 sessions (via ASAP primitive) and says: PU, clone this business card from the PE.

--------------------------------------------------------------------------------------------------------------

3) ENRP

How to sync up after the name servers have been separated from each other? At present no mechanism exists to repair this.

The database can be checksummed and sent to the other server. If the checksum doesn't match (ENRP servers are fully meshed and have the same state) then this must be repaired. (Audit) Same problem as with an accidental disjunct namespace.

check my view with the view of the owner on the owner terms. Owner sends to other PE and the hash.
If the hash doesn't match then PE must be inserted in the database or the hash is different and the owner is requested to send the info to the other

duplicate PE identifiers and Server identifiers: the last will override the previous definition but that isn't a big problem. It is just a replacement of one with another. But if it is done on 2 different servers, then a resolution has to be made to "allocate" the identifier.
identifier is unique within the pool.
identifier can be made up of a part server id and PU id.
use good random number generator.
Easy one can be detected: on the same ENRP server, clashing identifier can be detected (reject the 2 announcement). Case for multiple servers is harder to resolve and will occur with a very low chance.

How to mend two pieces of a split name space:
ENRP servers get data from all other ENRP servers? Might be very chaotic. Everyone downloading information from everyone else.
Form a spanning tree? This adds complexity.

How to audit the namespace
Use hash function or checksum. Only compare what I own. I send a PE ID and its hash.
Audit will take care of the split network issue.
Assuming that we have reliable transport, simple procedure that you have the same view or different view. If you have different view you have to exchange your data.

Goal is: ENRP servers are fully meshed and all have the same view. If everything is fine that is what we have.

2 lists
List A: agreement on these
List B: you have I don't OR I have but you don't
Anyone on list B is added with a flag. Flag says you need to sanity check this guy. Fails or succeeds.

What about you can reach me and I can't reach you. Some ENRP servers will have different views from others. Network partition will have different views.

Another case is we can talk ENRP A and B, but I can talk to PE X but you cannot.
One possibility: Both agree that you can reach him and I can't. Should survive a network problem. A fact of life that routing protocols have problems.
Unfortunate but not unexpected. Every time you do the audit, all this signaling is done because we don't agree. Don't refer any PU to Ken.

Reachable/unreachable flag in the database. If I ask Randy, he is reachable; if you ask Ken, he is not reachable. Who do I trust? No perfect solution, but if I'm an ENRP server: say Ken is not available for anyone. That is one solution. He goes to the next one on the list, and that is perfectly fine. Only needs to find an element. If even one server can't contact Ken, then no one should contact Ken. We want maximum survivability. To be a valid PE, does it have to be reachable by all servers all the time?

Consensus reached: Reintroduce the fathering solution as in the first ENRP drafts. Each PE is "owned" by an ENRP server.
TCP checksum the database. Only compare what I own. I send a PE ID and its hash.

Action Items:

1) Phil will work on this research problem of split namespace. Scope of the Rserpool solution is the Internet. This proposed Rserpool solution works in the current Internet, telcos and other server-client environments, i.e. within the scope of the requirements document. Other solutions may need to be researched and proposed independently. Managing a split namespace is crucial for military environments. There will be two solutions -- military and IETF. As a research project Phil will investigate a solution based on military requirements.

2) Qiaobing is going to put back the functionality in ENRP which requires each PE to have a home ENRP server (fathering). This will allow for an authoritative answer on namespace auditing questions.

3) Hash or checksum the database. Only compare what I own. I send a PE ID and its hash. Randy writes this up for the mailing list.

4) Phil Conrad - look at the common params doc and see if anything needs to be added.
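The audit idea recorded above ("Only compare what I own. I send a PE ID and its hash.") can be sketched as follows. This is a minimal illustration under stated assumptions, not the ENRP audit procedure: the database layout, the function names entry_hash, audit_summary and audit_compare, and the choice of SHA-256 are all hypothetical. The two result lists correspond to List A (agreement) and List B (entries needing a sanity check) from the discussion.

```python
import hashlib


def entry_hash(entry: dict) -> str:
    """Hash a PE entry in a canonical (key-sorted) form."""
    canonical = repr(sorted(entry.items())).encode()
    return hashlib.sha256(canonical).hexdigest()


def audit_summary(db: dict, owner_id: str) -> dict:
    """Owner side: PE ID -> hash, only for the PEs this server owns."""
    return {pe_id: entry_hash(rec["entry"])
            for pe_id, rec in db.items() if rec["owner"] == owner_id}


def audit_compare(peer_db: dict, owner_id: str, summary: dict):
    """Peer side: split the owner's PEs into agreed (List A) and
    disputed (List B: missing, extra, or different hash)."""
    peer_owned = {pe_id for pe_id, rec in peer_db.items()
                  if rec["owner"] == owner_id}
    agreed, disputed = [], []
    for pe_id in sorted(set(summary) | peer_owned):
        rec = peer_db.get(pe_id)
        same = (pe_id in summary and rec is not None
                and entry_hash(rec["entry"]) == summary[pe_id])
        (agreed if same else disputed).append(pe_id)
    return agreed, disputed


# Server "A" owns PEs 1 and 2; the peer's view has drifted.
owner_db = {
    1: {"owner": "A", "entry": {"addr": "10.0.0.1", "transports": ("SCTP", "TCP")}},
    2: {"owner": "A", "entry": {"addr": "10.0.0.2", "transports": ("SCTP",)}},
    3: {"owner": "B", "entry": {"addr": "10.0.0.3", "transports": ("TCP",)}},
}
peer_db = {
    1: {"owner": "A", "entry": {"addr": "10.0.0.1", "transports": ("SCTP", "TCP")}},
    2: {"owner": "A", "entry": {"addr": "10.0.0.2", "transports": ("TCP",)}},
    4: {"owner": "A", "entry": {"addr": "10.0.0.4", "transports": ("SCTP",)}},
}
list_a, list_b = audit_compare(peer_db, "A", audit_summary(owner_db, "A"))
```

Because each server only summarizes the PEs it owns, the owner's answer can be treated as authoritative for those entries, which is the motivation given for reintroducing fathering.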
-------------------------------------------------------------------------------

4) Architecture

There was a question about the pool element handle: should it be a zero terminated bytestring instead of a zero terminated ASCII string?
Resolution: bytestring it is.
same handle = same length + the same bytes in the string

There was a discussion of various terms used in Rserpool.
- Pool element identifier (PEI): integer field in ENRP design
- Pool element handle (PEH): local (coming from local ASAP call); must go away
- Pool element handle parameter (PEHP): found in common parameters doc (goes over the wire)

We need consistency of terms across all documents. Put all terms in one document for now. Before last call we can put them in the appropriate docs.

Pool Element Handle: PEH: IP address + transport protocol?
Answer from the ENRP server is all IP addrs + all the transport protocols
- ASAP must filter locally to support the client (should ENRP filter or is it local to ASAP)

There was a long debate on whether a PE that supports both TCP and SCTP should register once with an ENRP server OR the PE should register once for each separate transport. There are pros and cons to each approach.

Qiaobing's analogy is that the PE is an animal with several legs. Each leg represents a different transport address. If the PE registers once, then if one leg of the animal is dead, you can infer that the whole animal might be dead and avoid that PE on failover. However, if the PE registers each leg separately, then you cannot tell which legs belong to that animal. You will just try some other leg. In doing so, you might get the same PE (animal) but use a different transport. Chances are that it is also dead. A smarter strategy is to try another PE (animal) first and then, if you exhaust all other PEs (animals), go back and try another leg on the first animal. The animal might be ill, but not dead.
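The "smarter strategy" in the analogy above can be written down as a candidate ordering. The sketch below is illustrative only and assumes a single registration per PE that lists all of its transports; the function name failover_order and the pool layout are hypothetical and do not come from the ASAP or ENRP drafts.

```python
def failover_order(pool: dict, failed_pe: str, failed_transport: str) -> list:
    """Order failover candidates after one transport ("leg") of a PE fails.

    pool maps a PE name to the list of transports it registered.
    All other PEs (animals) come first; the remaining transports of the
    failed PE are tried only after every other PE has been exhausted.
    """
    others = [(pe, t) for pe, transports in pool.items()
              if pe != failed_pe for t in transports]
    same_animal = [(failed_pe, t) for t in pool.get(failed_pe, [])
                   if t != failed_transport]
    return others + same_animal


pool = {
    "pe1": ["SCTP", "TCP"],
    "pe2": ["SCTP"],
    "pe3": ["TCP"],
}
candidates = failover_order(pool, "pe1", "SCTP")
```

This ordering is only possible when the PU can tell which transports belong to the same PE, which is the information that separate per-transport registrations would hide.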
Some related issues were: Is it possible to do a changeover from one transport layer to another transport layer? Are those different PEs or not? Does each PE support a single transport protocol or not? The present design implies one registration for each PE, i.e. a PE has multiple transports.

Case I = Multiple registrations (one registration for each transport address) for each PE
Register PE + single transport = single leg of the same animal
con: no loadsharing: load a single leg
con: stuck with this solution
con: no detection of the failed legs of the animal
con: no detection of the legs belonging to the same animal.
pro: simpler to understand, design could be harder

Case II = Single registration for each PE
Register PE + multiple transport: all legs of the animal
pro: easier to detect the dead animal (one leg fails, high chance that other legs have also failed)
con: more difficult for architecture
con: more difficult to register (implementation dependent)
pro: includes the PE + single transport model
con: clarify that you can fail over to yourself (is this possible?)
ill: at least one leg failed
dead: all legs failed
if a leg fails, check other animals first before trying the other legs of the same animal.
pro: the multiple reg includes the single reg (is an optional restriction)
loads a single animal

Consensus was reached on Case II.

registration of the different transports goes across the single SCTP association to the ENRP server: is this possible?
mention the functions that the different protocols should do.
add business card and application ack to architecture.
PE can send to PU and PU can send this to the PE after failover.
some sort of fate sharing between the PE/servers

We discussed the pros and cons of having a data AND control channel between PU and PE.

PU/PE
- there is always a data channel
- in failover mode: data channel must use one of the transport mappings. PU/PE ASAP msgs should be muxed over the same association/connection over the control channel.
- in Name service mode: data channel is structured per application protocol. Rserpool imposes no restriction on this communication. If there is a need for ASAP PU/PE communication, a separate data channel may be opened, while a channel MUST use a mapped protocol or SCTP (XOR). This is not possible: there is no control channel.

In failover mode the data channel MUST use one of the transport mappings.
PU-PE ASAP messages, e.g. application layer ACKs and business cards, either MUST or SHOULD be multiplexed over the same association/connection.
business card must be there ahead of any user data -- Randy

A MUST versus SHOULD discussion should be decided by the WG and held on the list. Randy will send a message to the list.

name service mode:
- should we do UDP based media using rendezvous mechanism
- business card needed in name service mode?
- application ack possible?
- do we need ASAP PU/PE communication?
- if yes, do we use SCTP or the mapping protocol

Action items:

1) MUST versus SHOULD discussion on the list. PU-PE ASAP messages, e.g. application layer ACKs and business cards, either MUST or SHOULD be multiplexed over the same association/connection. Randy will send a message to the list.

2) Use the same terminology in all documents - All
Put all terms in two documents:
Common parameter definitions go to the common-param draft - Qiaobing Xie
Terminology definitions go to the architecture draft - Michael Tuexen

3) Michael and Phil - write text for the following:
Describe set of functional requirements PU-PE, PU-ENRP
Add to architecture
last will and testament -- business card
Application level ACK

4) Michael Tuexen - set up the site with Rserpool drafts: www.sctp.de

5) Randy: Clarify in the ASAP document this registration and failover mechanism (for Case II). Give the rationale. Detecting that one leg is unreachable should avoid going back to that animal (PE) until you exhaust all the other animals (PEs).
Michael: Same action item for the architecture