IETF IP Storage (ips) Working Group Meeting Minutes
San Diego IETF Meeting, December 11-12, 2000
----------
Monday December 11, 2000

EMC will be sending an IPR notice to the mailing list regarding a patent related to iSCSI and FCIP.

An interim meeting is being scheduled for the week of January 15, to coincide with T10 in Orlando (Grosvenor resort).

-- Framework document - Mark Carlson (Sun)

Describes environments for IP Storage. Includes terms and background on various protocols. This is a living document; currently it is more of a survey. The document will be coordinated with Naming and Discovery. More co-authors are wanted; please contact Mark if you are interested.

-- Framing discussion -- Randy Haagens (HP) and Allyn Romanow (Cisco)

- Allyn and Randy were asked by the ADs to compose this presentation. The purpose was to clarify the problem and present a range of solutions.
- Framing is a common challenge for both iSCSI and FCIP, as well as for non-IPS documents. While framing is not explicitly required, a solution is highly desirable for a more effective iSCSI specification. The focus of the presentation was understanding the requirements of framing (i.e. the problem). Reaching consensus on a solution was not one of the goals of the presentation. Allyn started the presentation by pointing out that this topic will also be discussed on Monday night in the TSVWG.
- The problem: TCP reassembly can be costly, and in some instances not feasible. Also, host memory and host bus bandwidth are limited, so one wants to avoid manipulating the data more than once. Best would be one use of the bus and memory - zero copy. Note: this is not the same as TCP zero copy. TCP typically waits for all the data to arrive, and then copies the data to the host.
- In the outbound direction, data can be transferred directly from memory to the protocol controller and out onto the wire. In the inbound direction, when data is received out of order, it has to be put in a reassembly buffer until all data is received.
- One solution: Direct Memory Placement (payload steering; data steering; RDMA). In order to conserve host memory bandwidth and CPU cycles and to reduce on-board memory requirements, it is desirable to deliver iSCSI data directly to host buffers, avoiding the overhead of TCP reassembly buffers. The TCP reassembly buffer can be 250MB for a 10Gbps link with 200ms round-trip time (10 Gbps x 0.2 s = 2 Gbit, i.e. 250 MB). At 1Gbps, reassembly is possible but very costly. At 10Gbps speeds or above, reassembly is no longer feasible. So, the goal is to get rid of a separate TCP reassembly buffer. The receiver can decode ULP (iSCSI) headers and place the payload directly in host memory without intermediate buffers. This would not be a conventional NIC; instead it would be very iSCSI aware. It would not necessarily process the iSCSI headers, but would just use them to determine where to place the data. As in TCP, the iSCSI stream is presented to the iSCSI protocol processor in order.
- This solution must address loss of ULP sync: when a segment containing a ULP header is dropped or delayed, ULP sync is lost. Direct data placement cannot continue; data must be diverted to a reassembly buffer. The goal is to recover ULP sync at the next ULP header. There are both TCP aware and TCP unaware solutions to recovering ULP sync.
- TCP unaware approaches:
  a) SCTP - issues with this include lack of widespread deployment
  b) Special characters - requires byte-by-byte processing
  c) Fixed length ULP messages - inefficient for short ULP messages
  d) Periodic marker - best solution for this class of approaches
     Sublayer of a framing protocol.
     Manageable; relatively easy to implement in hardware.
     The marker is a 4-byte field giving the number of ULP bytes remaining in the current PDU.
     The marker is inserted and removed by the framing protocol, e.g. iSCSI.
     After loss of sync, locate the next marker and use it to locate the next ULP PDU.
     Markers are transmitted twice in a row; this ensures that markers cannot be split by stream fragmentation/segmentation.
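The periodic marker idea above can be sketched as follows. This is only an illustration, not taken from any draft: the marker interval (INTERVAL), the placement of a marker pair at the start of every block, the big-endian encoding, and the function names frame and resync are all assumptions made for the example. The sketch reads both marker copies from one contiguous buffer and does not model a marker pair split across segments.

```python
import struct

MARKER_LEN = 4                       # from the minutes: 4-byte marker
INTERVAL = 16                        # hypothetical: payload bytes between marker pairs
BLOCK = INTERVAL + 2 * MARKER_LEN    # one doubled marker plus INTERVAL payload bytes


def frame(pdus):
    """Concatenate PDUs and put a doubled marker before every INTERVAL payload bytes."""
    payload = b"".join(pdus)
    # Payload offsets at which each PDU starts, plus the end of the stream.
    starts = [0]
    for pdu in pdus:
        starts.append(starts[-1] + len(pdu))
    out = bytearray()
    for off in range(0, len(payload), INTERVAL):
        # Bytes remaining in the current PDU = distance to the next PDU start.
        remaining = min(s for s in starts if s >= off) - off
        marker = struct.pack(">I", remaining)
        out += marker + marker + payload[off:off + INTERVAL]
    return bytes(out)


def resync(stream, pos):
    """After losing sync at framed offset pos, use the next marker pair to
    return the framed offset of the next PDU header (None if no marker left)."""
    k = -(-pos // BLOCK)             # index of the first marker pair at or after pos
    marker_off = k * BLOCK
    if marker_off + 2 * MARKER_LEN > len(stream):
        return None
    first, second = struct.unpack_from(">II", stream, marker_off)
    if first != second:
        raise ValueError("marker copies disagree")
    p = k * INTERVAL + first         # payload offset of the next PDU header
    # Convert payload offset to framed offset by adding the marker pairs before it.
    return p + 2 * MARKER_LEN * (p // INTERVAL + 1)


if __name__ == "__main__":
    stream = frame([b"A" * 10, b"B" * 30, b"C" * 8])
    off = resync(stream, 5)          # sync lost somewhere inside the first block
    print(off, stream[off:off + 8])
```

Because the markers sit at fixed positions in the stream, a receiver that knows the stream offset (for example from TCP sequence numbers) can find the next marker without parsing any ULP header, which is why this approach counts as TCP unaware.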
- TCP aware approaches:
  a) URGent pointer - disallowed
  b) PSH bit - disallowed
- Another TCP aware approach can be considered by the TSV working group. The TSV working group works on small items in the transport area that do not need a full working group, as well as on TCP/UDP transport issues.
- Allyn Romanow presented a technique for demarcating message boundaries using a TCP option. This consists of using one of the reserved bits in the TCP header to extend TCP to support this type of framing. Up to 40 bytes can then be added before the TCP payload. The problem is that these reserved bits are a scarce resource; the need for the change has to be evaluated. Also, any time a change to TCP is proposed, there is tension between the need to update TCP and the stability of TCP.
- Procedure for standardizing a TCP option:
  a) The IESG has to approve new work items for the TSV wg.
  b) Ask the Transport Services (TSV) working group to adopt this as a WG item.
  c) Pros and cons will be discussed on the TSV wg mailing list. If it is supported, hopefully the spec will be wrapped up at the next IETF (roughly a 3 month time frame). If there is no support, it's dead. The advantage of the TSV wg is that transport experts will be able to contribute feedback.
  d) If supported, it will be adopted at the next IETF meeting. The advantage is that people who are experts in transport will be able to contribute, and that this will not be an iSCSI specific solution.

IPS should follow this process and contribute, to make sure that the solution (since it is not iSCSI specific) meets the needs of this group. This is a very common problem that is worthy of consideration at the transport layer. It addresses areas beyond IPS. The TCP option is not the only approach. TCP header bits could potentially be used for framing. The flag approach may send many packets that are less than MSS. This is potentially a risky change to TCP.

Message Boundary Option

Two approaches.
Not in drafts yet:
- Flag approach -- Costa has written this up and will post it as a draft. The flag approach may send many packets that are less than MSS. This is potentially a risky change to TCP. The ULP header is aligned with the first byte of the TCP payload.
- Offset approach -- 4 bytes. A 2-byte offset indicates the offset into the TCP payload of the first ULP header in the segment. Write-up forthcoming.

Discussion - led by Steve Bellovin

Steve requested that the group concentrate on requirements. The discussion raised the following points:
- Another option for alignment: periodic alignment instead of a periodic marker. There could be a requirement in iSCSI that an upper-layer header appear every n kbytes in the TCP stream. Padding could be used to make sure this happens.
- This is not the first time that this issue has arisen, and there is value in a general solution that is applicable to other protocols, even though this may take longer to deploy. The consensus in the room was that a general approach is preferable to one specific to iSCSI.
- Multiple message boundaries in a single TCP segment are not a problem. Once the first boundary is found, the rest are found by examining the iSCSI (or other ULP) headers.
- If there is a large gap between message boundaries, the data in the gap will need buffering. Implementations may wish to consider this in setting the maximum data size for a PDU.
- RDMA is different from but related to this topic. Any RDMA protocol will either incorporate or assume framing. It may make sense to spec a generalized RDMA protocol on top of this framing mechanism.
- Implementation of this sort of framing would be optional.
- A generic data framing protocol may also be a good place to put in a stronger CRC than the 16-bit Internet checksum.

Drafts making specific proposals are welcome. Steve Bellovin asked for a hum of the room on whether to solve the "framing problem" in an iSCSI-specific way or whether to pursue a mechanism to add to TCP. The hum in the room was to do it in TCP.
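For readers who want to picture the offset approach from the Message Boundary Option discussion, here is a minimal sketch. The minutes only state that the option is 4 bytes and carries a 2-byte offset; the layout assumed here (1-byte kind, 1-byte length, 2-byte big-endian offset), the use of experimental option kind 253, and the function names are all assumptions for illustration, since no write-up existed at the time of the meeting.

```python
import struct

OPT_KIND = 253   # assumption: an experimental TCP option kind; the minutes name none
OPT_LEN = 4      # from the minutes: the option is 4 bytes in total


def build_option(offset):
    """Encode the hypothetical message boundary option for one segment."""
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 2 bytes")
    return struct.pack(">BBH", OPT_KIND, OPT_LEN, offset)


def first_header_offset(options):
    """Walk the TCP options bytes and return the offset of the first ULP
    header in the payload, or None if the option is absent or malformed."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                 # End of Option List
            break
        if kind == 1:                 # No-Operation (single byte)
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                     # malformed option, stop parsing
        if kind == OPT_KIND and length == OPT_LEN:
            return struct.unpack_from(">H", options, i + 2)[0]
        i += length
    return None


if __name__ == "__main__":
    # An MSS option followed by the hypothetical boundary option.
    opts = b"\x02\x04\x05\xb4" + build_option(300)
    print(first_header_offset(opts))
```

A receiver that has lost ULP sync could then treat payload[offset:] of that segment as the start of a ULP header and resume direct placement from there; later boundaries in the same segment follow from the ULP headers themselves, as noted in the discussion.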

-- iSCSI document review - presented by Julian Satran

- Rough consensus has been reached on the session model: symmetric with optional multiple connections.
- Login session context - good understanding.
- Login security context - more work needed.
- Commands, messages, tasks, and tags are almost complete. Items open: coding, some layout.
- The response numbering scheme is well understood; complete.
- The data numbering scheme has received no consensus. It may be removed. Julian's personal opinion is that it is optional and low cost, with advantages.
- For recovery, command restart and status are well understood. No consensus on data recovery. The digest is not well understood and needs to be readdressed.
- Text commands - negotiation mechanisms done.
- Mapping moved to T10 (aliasing). Dropped from iSCSI.
- RDMA/Sync, Security/Authentication - all are still open issues.
- Authentication - the login phase must provide authentication. This was the consensus at the last meeting. Every iSCSI PDU must provide data integrity and authentication.
- A mechanism should enable optional end-to-end data protection/authentication. Would like to use TCP recovery in the presence of errors. Digests can be activated at a higher level. Need a mechanism that can be activated on demand, ideally at login.
- The current digest scheme needs to be changed. Julian suggested using IPSec for data integrity: since all of the above mechanisms are provided by IPSec, it is the best fit for what is needed, and very cheap if only what is needed is used. One can insert one's own policies, including policies that verify integrity versus provide security but use the same mechanisms. Policies will be addressed in the next two weeks.
- David: IPSec does negotiation securely. What is currently in the draft is most likely vulnerable to a man-in-the-middle attack.
- Steve Bellovin indicated that the IPSec WG would be extremely opposed to any insecure non-cryptographic algorithm being defined for IPSec.
Silicon must support SHA-1 or MD5 in order to do key negotiation. There are active discussions/proposals on how to do high speed encryption/negotiation. This is early in the process (drafts, not yet standards), but it is worth looking at.
- Mark Bakke really wants to maintain the separate iSCSI header and iSCSI payload digests. This separation is lost by moving to IPSec. The data integrity gained is only as good as what the group is willing to pay for. Good integration with encryption.
- IPSec can be used in transport mode, which will provide end-to-end protection. Integrity is required end-to-end, but security may not be. Security may need to be removed at the firewall/gateway, but it must still be possible to verify integrity at the endpoints. Multiple layers of IPSec can be used if needed. Comment from audience: not recommended.
- David Peterson of Cisco asked whether ACA will be mandated by the draft. The consensus, after the discussion, is that iSCSI must support ACA but that a device need not support ACA (Ralph Weber pointed out that few initiators use ACA today). There was some grumbling because ACA is needed for reliable pipelining of ordered commands in the face of errors.
- There was a question on whether asynchronous event notification (AEN) was mandatory to implement in iSCSI. Again, iSCSI transports must support asynchronous events but iSCSI devices need not. Somebody pointed out that SCSI mode pages can be used to regulate whether a device generates AENs.
- Ralph Weber (T10 secretary) praised iSCSI for trying to advance the state of the art in SCSI.

-- iSCSI requirements -- presented by Marjorie Krueger (HP)

T10 work on authorization will not be integrated into iSCSI; to the extent that SCSI provides authorization, that is T10's domain. IP networks are less secure than typical SCSI environments have been in the past, which introduces additional issues that need to be addressed here in iSCSI. T10 work will be used and referenced where applicable.
It was noted that the point of iSCSI authentication and authorization was to control who was able to get to a target.

-- Bootstrapping -- presented by Prasenjit Sarkar (IBM)

This document contains guidelines for how iSCSI boot clients connect to an iSCSI boot server. It includes a description of how to use existing techniques. iSCSI boot clients need IP address, iSCSI boot server service delivery port name, default; LUN= 0; iSCSI initiator software.

Boot process steps:
  Client software stage
    Use PXE or a related bootp/tftp protocol to get the iSCSI initiator software
  DHCP stage
    Use DHCP to configure the client IP address
    Use a new DHCP option to configure the iSCSI boot server service delivery port name
  Discovery server stage
    Use "to be defined" iSCSI delivery service to get iSCSI

There was a question on whether the boot client had to have IPsec, in light of the integrity proposal by Julian and the security proposals by others. It is not required; bootp is sufficient. The absence of security requirements for boot was pointed out. The current goal of the boot document is to remain neutral on security (neither mandate nor disallow it). There was some question on what to do with the iSCSI session once a bootstrap program was done with it. It was noted that it was probably simplest to close it and have the loaded program establish a new iSCSI session, but this is up to implementations.

-- MIB presentation - Mark Bakke (Cisco)

A group is forming to work on an iSCSI MIB. The scope is management of iSCSI as opposed to SCSI. If necessary, a separate SCSI MIB (if one does not already exist) would be addressed separately. The original MIB structure in the current draft is not adequate, and is being redone. These revisions will also bring the MIB up to date with the current iSCSI draft. A question was raised about how FC-style zoning works with the MIB. It is not clear how zoning fits into the iSCSI architecture. The MIB could be implemented on anything running iSCSI, including initiator, target, and gateway.
The FC HBA API available from SNIA might be of interest to this group. It has a complete list of things management tools want to be able to see out of an initiator.

-----
Tuesday, December 12, 2000

-- Naming and Discovery Requirements - Mark Bakke (Cisco)

Naming and discovery will specify target discovery but would leave LUN discovery to SCSI mechanisms, such as REPORT LUNS. There was a bit of debate on this: why not go all the way and support LUN discovery in the naming system? The counter-argument is based on layering: "Leave unto SCSI that which is SCSI's".

Scaling requirements include both small and large environments. Find targets by querying SNS. Small environments do not require SNS.

Hierarchical format, with Naming Authority. World Wide Unique Identifier.

Address composed of IP addr + TCP port + Target Name, URL like. Plan to apply for a well known port for TCP. In such a case, an address without a TCP port specified would default to this well known port. Format includes info on the naming authority, including support for a 'local' naming authority. Character set to be allowed? Unicode? Recommend UI schemes for the naming authority. Need to look at security issues.

T10 issues - reservations, reset, LUN naming. Target reset discussion. Noted that T10 is thinking of making target reset optional. Is breaking of a connection in iSCSI equivalent to a target reset? Consensus is no: the end of a session is equivalent to a target reset and would also cause any persistent reservations to be released.

The naming scheme will allow multiple port and multiple initiator/target discovery. It will give a list of targets plus all paths to each target.

The draft is currently an individual submission. A consensus (hum) was taken to adopt it as a working group document. No opposition hums.

-- iSNS document - presented by Josh Tseng, Nishan

iSNS describes a scalable information facility for registration, discovery and management of networked facilities. iSNS follows a client/server architecture.
If a client registers with the name server, it allows itself to be managed by the name server.

Why needed? Simplifies storage management implementations. Allows greater scalability than broadcast/multicast discovery methods. Supports zoning.

Next step - incorporate requirements/suggestions from the IPS working group. Extend the document for FCIP.

Access control - what is the name server's role? Targets upload a public key to the name server. Enforced at the end node/target. Supports both soft and hard zoning.

How does it fit into discovery? The naming and discovery team will look at this to see how well it fits.

Should this be maintained as a separate document rather than incorporated into the naming/discovery document? Yes, this is a separate document because it supports more than just iSCSI.

A reading of the draft shows reliance on WWN. The draft would need to be redone to support the WWUI of the N&D requirements.

Is the direction one that the naming and discovery team approves of? Yes, close.

Is there working group consensus on this as a base document, working with the NDT group to produce a revised document, aligned with N&D, which would then be adopted as an official wg document? Rough consensus - the next revised version of the document will become an official working group document. Not unanimous.

-- FCIP - Status and progress of FCIP - Raj Bhagwat (LightSand)

Current status - differences from the previous presentation.

Solution for bridging remote FC SAN islands. From the FC point of view, it appears to be entirely an FC network. Initially it did not have congestion management (previous presentation). The draft was overhauled to incorporate TCP as the transport in order to address congestion management and recovery mechanisms. In rev -00, the PSH flag was incorporated. Based on feedback from the mailing list, this was eliminated, and in -01 a new frame boundary mechanism was introduced.

Topics under discussion -- QoS, security, MTU/MSS, framing/synchronization, order of delivery, discovery, error recovery.

Alignment with a new project in T11 - FC-BB2.
FC-BB2 is focused on issues outside the scope of the IETF, including link level issues. Target date for completion - June 2001.

Much FC/IP work is being done on conference calls. The conference calls are design team calls open to design team members and authors. Public review is on the mailing list.

An FCIP device is a gateway between an FC SAN and an IP network. Discovery of an FCIP gateway (device) and other FCIP gateways is currently via static configuration. Dynamic configuration support is envisioned, perhaps using iSNS.

David Black will work with the authors on revising the QoS text.

-- iFCP - presented by Charles Monia, Nishan

What is the difference between iFCP and FCIP?
- FCIP is a tunneling model between FC SANs: a conduit for FC frames to flow transparently to the FC network over an IP backbone.
- The iFCP network model extends up to the FC storage device itself. It uses a session model. It consolidates FC storage switching and routing functions in the IP fabric.

Reduces total cost of ownership, unifies network and storage management domains, and exploits IP technology investment. Extends the SAN over LAN/MAN/WAN distances.

Next step -- complete the N_Port session model. Encapsulation changes for additional end-to-end error detection.

The authors of iFCP would like to see it considered for adoption as a work group item. Adoption of iFCP as a work group item requires modification of the WG charter. David Black requested that input on this be sent to the WG chairs. Revising the charter requires consultation between the area directors and the working group chairs.

After the presentation, suggestions were made that iFCP and FC/IP should merge since they are so similar. It was pointed out that the two protocols take different approaches. iFCP works by intercepting FC logins (connection requests) and modifying FC frames. In addition, it doesn't run FC routing protocols between FC SANs.
Clarification of FCIP and iFCP - the latter is for FCP protocol mapping only, whereas FCIP can transport any FC upper level protocol. FC/IP works at a lower level than iFCP. It doesn't modify FC frames. FC/IP requires running FC routing/switching protocols between FC domains. Some thought that iFCP was a superset of FC/IP.

There was a concern that the iFCP gateway would need to run IP routing protocols. It was eventually decided that the iFCP gateway is just an IP host and doesn't have to run IP routing protocols.

Other comments need to be sent to the mailing list or to the chairs directly.

-- Adaptation Layer presentation -- Randall Stewart, Cisco

Randall Stewart's presentation introduced how the IPS protocols could be architected with an adaptation layer independent of the underlying transport (i.e. at least both SCTP and TCP). To do this, a uniform API boundary between the ULP and the transport would need to be defined. This would require many changes to all existing drafts. The APIs would need to be a message oriented type of mechanism. The critical path would need to be done so that it is protocol agnostic. The transport interface would need to provide methods for passing buffers to and from the control of the transport, e.g. for zero copy.

The adaptation layer would need to worry about:
- Framing
- Zero copy
- Parallel paths
- Message retrieval
- Notifications

One must be very careful that this API does not make assumptions about the transport being used. In the adaptation model, it would be necessary to figure out how to overcome these issues. Randall would be more than glad to help by contributing advice and/or drafts to bring about this sort of adaptation layer.

A concern was expressed that the adaptation layer would add too many layers between iSCSI and TCP, and that a separate protocol should be done for SCTP. It was suggested that the CAM may be an inspiration for the adaptation layer. Others responded that the CAM is at the wrong layer, above iSCSI.