Hi,

It appears I didn't copy ops-dir with this review. Doh!

David Harrington
ietfdbh at comcast.net
+1-603-828-1401

> -----Original Message-----
> From: ietfdbh [mailto:ietfdbh at comcast.net]
> Sent: Saturday, April 05, 2014 5:15 PM
> To: draft-ietf-cdni-logging.all at tools.ietf.org
> Cc: Benoit Claise; spencer at wonderhamster.org
> Subject: OPSDIR review of draft-ietf-cdni-logging-10
>
> Hi,
>
> I have been asked to provide an early OPSDIR review of draft-ietf-cdni-logging-10.
>
> This memo specifies the Logging interface between a downstream CDN (dCDN) and an upstream CDN (uCDN) that are interconnected as per the CDN Interconnection (CDNI) framework. First, it describes a reference model for CDNI logging. Then, it specifies the CDNI Logging File format and the actual protocol for exchange of CDNI Logging Files.
>
> An OPS-DIR review usually has a principal goal of helping the OPS ADs in their evaluation and balloting of documents at IESG reviews. An early review is focused more on providing feedback to authors, the working group, and the relevant area directors about issues that they might want to consider and address as they move forward with the draft.
>
> Overall, the document is well-written, as are the related documents, and the set of documents does a good job of explaining the important points of CDN interconnection.
>
> --- RFC5706 review ---
>
> RFC5706 provides guidelines that protocol designers should consider about the operations and management of their protocols. RFC5706 has an Appendix that OPSDIR reviewers use to help guide their reviews. The following points from the RFC5706 Review Checklist apply.
>
> 1. Has deployment been discussed?
> A number of related documents discuss CDN Interconnection, describe the problems and use cases, propose a framework for considering various interfaces relevant to CDN Interconnections, and detail some requirements for those interfaces.
> A significant amount of discussion is provided about expected deployment models, and how the various conceptual interfaces apply to these deployment models.
>
> * Does the proposed approach have any scaling issues that could affect usability for large-scale operation?
>
> Yes, there are scalability issues. As mentioned in section 2.1, and reflected in the terminology "CDN Reporting" and "CDN Monitoring", there are requirements for the logging interface to support both near-real-time monitoring (for fault and performance mitigation) and deferred analysis (for billing, post-mortem delivery analytics, etc.). The deferred analysis is a much easier problem to solve, given the potentially huge amounts of logging records, and the timeliness requirements involved in operating a large CDN.
>
> The proposed solution generally looks good for the deferred case, but seems inadequate for the near-real-time monitoring case. The document declares the near-real-time support to be out of scope.
>
> Given the costs in time and resources of converting between logging formats, it may be really desirable to have one format that can serve both sets of needs. To a large degree, the proposed format could serve both purposes; however, there are various points in the document that specify MUST requirements that would prevent using the solution for both use cases. Mostly, these requirements imply that a "file" must be fully collected before sharing, and given the timeliness required by the monitoring use case, waiting until a file is fully collected makes the file approach unsuitable for monitoring. It can be feasible to use a logging file approach for monitoring, if the log can be "tail"ed, thereby allowing the contents to be passed as the collection is performed, without waiting for the file to be complete.
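
A minimal sketch of the "tail" idea described above, in Python. It is only an illustration, not something taken from the draft: it assumes one logging record per line, and the function name read_new_records and the polling-by-offset design are hypothetical. Each poll returns only the complete records appended since the previous poll, and leaves a partially written last line for the next poll.

```python
def read_new_records(path, offset):
    """Return (records, new_offset) for complete lines appended since offset.

    Hypothetical helper: assumes one record per line, terminated by a newline.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline; a partial trailing record
    # stays in the file and is picked up by a later poll.
    end = data.rfind(b"\n")
    if end == -1:
        return [], offset
    complete = data[:end + 1]
    # split() leaves an empty element after the final newline, so drop it.
    records = [line.decode("utf-8") for line in complete.split(b"\n")[:-1]]
    return records, offset + len(complete)
```

A consumer would call this repeatedly, passing back the returned offset, and forward each batch of records without waiting for the file to be closed.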
>
> In most of the network management protocols designed by the IETF, we recognize three parts - a data modeling language, data models, and a protocol that can transport portions of the data model. This document doesn't do a great job of keeping those separate. If the document was written with such separation in mind, then the record formats would be independent of how the file was going to be transported, and a logging record might be able to be used with two different approaches to transport - one suitable for deferred usage, where it can be expected that the file is completed before transport, and another transport approach suitable for near-real-time transport of individual records, possibly using a tail approach for a file-based collection of records. The constraint in section 4.1.2 is written in a manner that would seem to preclude a tail usage.
>
> This concern might be able to be mitigated by changing some wordings in the document. I think the document would be better if, rather than declaring one set out of scope, it recognized the two sets of needs, and discussed how the design of the data modeling language and data models could address both sets of needs, while recognizing that different transport solutions might be needed to meet the two sets of needs, and there might be constraints on the data model designs to be suitable for both needs.
>
> 2. Has installation and initial setup been discussed?
>
> It has been discussed to a degree, but this document declares a tremendous amount of "stuff" to be out-of-scope, and that typically includes how to configure the interaction between peers. I think this document really needs to address some of the issues that it declares out of scope, especially the configuration options that must be standardized or negotiated, in order to make the solution interoperable across different implementations.
> Yes, some of these negotiations may be business-oriented negotiations, but those negotiated parameters could then be specified in a technical manner so the logging application can be automated in an interoperable manner.
>
> Some examples where I think this document could have either standardized the answer or allowed for standardized negotiation:
> a. what exact set of logging information is to be provided by the dCDN to the uCDN. The proposed solution includes record-types and field lists (templates), so why can't the two use that information to negotiate what should be included in the logging?
> b. In section 2.1, it says the uCDN can configure customization, but that the dCDN is free to ignore that, and apparently the uCDN doesn't even get told that the dCDN is going to ignore it. If I were the uCDN and the dCDN was going to ignore my request, I'd like to be told that so I can choose a different dCDN that will pay attention to my customization preferences. I would find this especially important if my preferences had to do with end-user and/or content provider privacy issues, for which there might be legal requirements/ramifications.
> c.
>
> 3. Has the migration path been discussed?
> The migration path is not a direct path. There is no existing standard for CDNI, so there really is no migration, per se.
>
> CDNs are typically proprietary, and how they operate can contribute to competitive differentiation. CDNs deliver content for content providers, and it is common that the content providers want reports of completion and performance of the deliveries. Connecting CDNs is like connecting black boxes together, and the sharing of information is on a need to know basis. This document works at describing the subset of monitoring/logging information to be shared.
>
> As observed in the last paragraph of section 2.1, it is desirable to keep the intra-CDN and inter-CDN logging compatible.
> This document explicitly tries to reuse (migrate from) the conceptual logging done within CDNs, and explain how to use it between CDNs. Often within a CDN, there are multiple geo-located facilities (points of presence) each of which more or less acts as a CDN in its own right. So it is not uncommon to conceptually have upstream and downstream CDNs within a single CDN. For the most part, this attempt at reuse appears to work well.
>
> I think there are some design problems in the proposed solution because the migration is from a single aggregate CDN to potentially multiple cascading levels of CDN. Within a single CDN, I expect that the hierarchy is rather flat, and probably most CDNs don't go beyond two levels - one uCDN and one dCDN for a particular delivery. But CDNI was started to address the growing need for interconnection between multiple CDNs, often in cascading uCDN/dCDN relationships. The migration from non-cascaded CDNs to cascaded CDNs has a few design flaws and inconsistencies within the proposed design.
>
> Most notably, the logging from a dCDN to a uCDN typically contains only one dCDN identifier, such as Verified-origin, and doesn't really permit specifying a subordinate dCDN. For example, in a cascade such as CDN-A -> CDN-B -> CDN-C, CDN-C will share logging info with CDN-B, but not with CDN-A; CDN-B shares logging info with CDN-A. This can be desirable for hiding the topology and delegation used by the dCDN. However, this document does not discuss how the logging provided by CDN-C gets converted into the logs for CDN-B that will be sent to CDN-A. Without some logging information from CDN-C, it would seem difficult for CDN-A to utilize the information to meet requirements of billing, analytics, and fault and performance analysis. Maybe the logging gets aggregated and reported in CDN-B's logging, but that "transitive or aggregate logging" doesn't appear to be discussed in this document.
>
> I think Verified-Origin is particularly problematic, because the text states that this can only be added by the uCDN, never the dCDN. So what if CDN-B verifies the origin of the logs from CDN-C and then passes the information (now as a dCDN) to CDN-A? The text mentions that this might be established using authentication mechanisms; so do we lose this logged authentication/verification when cascading?
>
> (Maybe this is resolved by having the uCDN (CDN-B) record a "transaction" that starts when it delegates a task to a dCDN (e.g. CDN-C) and ends when the dCDN completes the task, and the uCDN just records the whole transaction without the details that are reported by CDN-C. Then CDN-B logs only the whole transaction. I would like to see an explicit example of such logging, using the cdni_http_request_v1 record-type, so this is clear.)
>
> [I had a bunch of notes, listed below, from my review. It is simpler for me to list those notes, as I have done below. Converting them into comments organized according to RFC5706 checklist is time-consuming. To help keep my review shorter than the document I am reviewing, and since this is an early review, I am discontinuing the RFC5706 format review and simply providing my list of comments. I assume I will be asked to continue reviewing this document, so later reviews will go back to the RFC5706 format ..]
>
> --- Technical advice ---
> I am not an assigned technical advisor for this working group; I have a certain degree of expertise in operations and management, especially IETF protocols for network management and logging, plus a lesser amount of expertise in CDN management. The following comments come from that background, and deserve no more attention than comments from other contributors.
>
> 1) Section 2.1 starts saying what is involved in the reference model, but then presents a bullet list that seems more interested in detailing what is out of scope for the document, than what is included in the model. I think this should be rewritten to be cleaner. As part of that, I recommend moving away from bullets to full sentences.
>
> 2) An editing/reviewing nit - this document uses lots of bullets rather than sub-section numbering; for review purposes, this is irritating because a reviewer has to go count through the bullets to reference text contained in the sub-sections.
>
> 3) in section 2.2.3, some rfc2119 terms are used in lowercase. It isn't specified that these are not used to represent RFC2119 requirements, and I am not sure if they are meant to be used that way. As a result, the intent of this text is ambiguous.
>
> 4) in section 2.2.4, there is an issue about correctly reporting data about sessions that cross logging periods; it only says "it is important to correctly report this". I think this document needs to better specify how to correctly report in such an environment. It would help to require that data models, such as the data model defined in this document, identify those elements that might suffer such discontinuities, and explain how to resolve the discontinuity. (The MIB Doctor directorate often watches for issues of persistence and discontinuities, so you might be able to get some advice from them on this issue.)
>
> 5) I am rather disappointed that this document only defines one data model - the cdni_http_request_v1 model. While I recognize that this probably represents the bulk of CDN traffic, it would have been nice to provide a proof-of-concept for the registration of record-types and field names to have more than one (say, a cdni_ftp_v1 model), to show how to register each, and especially to show how to reuse field-names across record types.
> Such a proof-of-concept would also be helpful in uncovering any problems with the approach before it becomes a standard.
>
> 6) I wonder if support for multi-line entries will become important over time; this format doesn't permit expansion to such multi-line records.
>
> 7) Syslog/TLS ran into a problem when multiple messages were transported in a stream, as compared to the original UDP-based design where each message was in its own packet. A great deal of discussion happened about delimiters that could be used to delimit messages in a stream, because no delimiters had been reserved for such purpose, and recovering from a delimiter lost in transit could be problematic. (Syslog/TLS finally used a counted-length approach, which requires the completed message size to be known before sending.) I'm not sure that would be relevant here.
>
> 8) Under Record-Type, directive value, it says "cdni_http_request_v1" MUST be indicated ... ; is this meant to be a REQUIREMENT a la RFC2119, or is this an example? I would hope it is not a hard-coded value, and is just an example, so a new record-Type could be defined in the future that is NOT http-specific, and a new http-specific record-type might be defined if needed. The wording needs to make this clear.
>
> 9) There is an interaction between record-type and the Fields directive. The text mentions "the first instance", and I'm not sure this isn't required for every instance. Can the file have the following structure? Record-type; fields; ; fields; ? i.e., can a second fields directive change the format of the (following) information within a single record-type declaration? Is this needed? There is no explanation as to why this feature is included. An explanation, and examples would be nice.
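
To make the counted-length approach from item 7 concrete, here is a small Python sketch in the style of the octet-counting framing used for syslog over TLS (a decimal length, a space, then the message). Applying it to CDNI logging records is purely an assumption for illustration, and the names frame and deframe are hypothetical. Note that the sender needs the complete record before it can emit the length, and that the receiver never scans the payload for a delimiter.

```python
def frame(record: bytes) -> bytes:
    """Prefix a record with its byte length and a single space."""
    return str(len(record)).encode("ascii") + b" " + record


def deframe(stream: bytes):
    """Split a byte stream into complete records.

    Returns (records, remainder); remainder holds any incomplete
    trailing frame so the caller can prepend it to the next read.
    """
    records = []
    pos = 0
    while pos < len(stream):
        sp = stream.find(b" ", pos)
        if sp == -1:
            break  # length prefix not fully received yet
        length = int(stream[pos:sp].decode("ascii"))
        start = sp + 1
        end = start + length
        if end > len(stream):
            break  # payload not fully received yet
        records.append(stream[start:end])
        pos = end
    return records, stream[pos:]
```

Because the payload is never searched for delimiters, a record may itself contain spaces or newlines without confusing the receiver.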
>
> 10) The Integrity-hash directive value text seems slightly contradictory - a) the behavior of the entity that received a corrupted logging file is outside the scope of this specification, and b) depending on the validation of the hash, the receiving entity MUST consider the logging record corrupted or non-corrupted, c) if the entity receives a non-corrupted file, and adds a verified-origin directive, then it must recompute.
>
> 11) I am concerned that the Integrity hash is optional; would the implementation be able to tell whether the integrity was being provided via some external means? It would be nice to standardize whether the integrity hash must be present.
>
> 12) would it make sense to make the integrity hash part of the transport protocol (i.e., set in a format that wraps the original logging file, rather than being part of the original file)? I am also concerned about having the uCDN recompute the hash after verifying the host. There would now be two copies archived, wouldn't there - the one being held by the dCDN per agreement, and the one with a recomputed hash. Which one takes precedence? If you defined a uCDN wrapper for the file presented by the dCDN, and had the verified-origin and integrity hash as part of the wrapper, then the uCDN-wrapped file would match the dCDN-retained file.
>
> 13) under Integrity-hash, it states "Note that this is not a guarantee ..." I think this deserves some expansion, either here or in the security considerations section.
>
> 14) If the file were tailed, and transmitted as it was generated, would it be compliant to this specification to compute the MD5 as the file was being sent, and then log the hash directive after the file (without the hash directive) finished transmitting?
>
> 15) What if a mode of corruption is found in the future that this hash computation wouldn't detect? MUST the receiver still consider the file to be non-corrupted?
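
The scenario in item 14 can be sketched in Python with hashlib, which supports updating a digest incrementally. This is only a sketch under stated assumptions: the trailer line "#hypothetical-integrity-hash:" is a placeholder and not the directive syntax from the draft, and the function name stream_with_trailing_hash is made up for illustration. Whether sending the hash last would be compliant is exactly the open question raised above.

```python
import hashlib


def stream_with_trailing_hash(chunks):
    """Yield each chunk as it becomes available, then a trailing hash line.

    The MD5 digest covers every chunk sent before the trailer, so the
    sender never needs the whole file in hand before transmitting.
    """
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)  # fold this chunk into the running digest
        yield chunk
    # Placeholder trailer format, NOT the draft's directive syntax.
    trailer = b"#hypothetical-integrity-hash: "
    yield trailer + digest.hexdigest().encode("ascii") + b"\n"
```

A receiver would do the mirror image: update its own digest over everything before the trailer and compare the result with the value the trailer carries.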
>
> 16) the last paragraph of 3.4 has a couple spelling/grammar errors.
>
> 17) for c-ip, is the "client" address unambiguous? I'm not an expert in http redirection, but IIRC, the client might be specified in more than one way. Is it unambiguous which field of the http request this must come from? If there is more than one, such as if the dCDN could differentiate the actual client from the DNS resolver address, would it be helpful to include more than one client address?
>
> 18) I am concerned about not standardizing a negotiation for the u-uri. How is the transformation "agreed upon" in a manner that allows an application to know what to do? I think it might be better if expressing the transformation expected could be standardized. Otherwise, how do we get interoperability across multiple implementations?
>
> 19) As mentioned above, I'd like to see more than one protocol dealt with, as proof of concept.
>
> 20) sc-total-bytes apparently only applies to HTTP bytes. So maybe this should be labeled sc-http-total-bytes, to differentiate it from sc-ftp-total-bytes, and from sc-total-bytes (a protocol-independent value). Ditto for many of the other fields defined here.
>
> 21) sc-status: is there a valid range associated with this field? I know squid supports up to status=600, even though it doesn't understand many of those values. Is it legal/compliant for some implementations to decide to end status=900? Is it legal/compliant for some implementations to consider any value >500 to be invalid?
>
> 22) s-sid mentions http-specific session.
>
> 23) s-sid: who establishes the session ID? Is it the CDN performing the delivery (the dCDN) or the uCDN? Can a uCDN establish a session ID but then have different dCDNs deliver different portions of the content? Having a consistent sid would allow the uCDN to correlate the multiple "sessions" into one.
> In a cascaded environment, are the sids always distinct, and able to be correlated by the uCDN?
>
> 24) s-cached: I found some of the wording to be ambiguous. I recommend some rewriting to make sure this is not ambiguous. "exclusively"? "some, but not all, content"?
>
> 25) why is s-cached important to CDNI? I understand why it is important to a CDN, but why does a uCDN need to know if a dCDN surrogate had it cached? Should this be about whether the dCDN had it cached, not whether a surrogate of the dCDN had it cached?
>
> 26) After the s-cached definition, there is text about the Fields directive. Doesn't this text belong near the definition of the fields directive?
>
> 27) What is the benefit of the feature that says the fields can be in any order? How does this benefit compare to some level of standardization?
>
> 28) Why make three fields optional to support? If implementations don't support them, then users can't use them. It would seem better to make these mandatory-to-implement, optional-to-use.
>
> 29) if a uCDN needs to have any of these optional fields, especially the anonymizing field, but there is no negotiation phase specified by this document, then how can a uCDN decide not to deal with a dCDN that doesn't support the field?
>
> 30) Updates to log files and the feed are outside the scope ... Really, can't we standardize a way to specify the frequency, the period of time, and timeliness of publishing?
>
> 31) 4.1.2 says implementation "SHOULD use HTTP cache headers ..."; why only SHOULD? Is there a specific example of when MUST would be inappropriate?
>
> 32) Is there a reason we don't record the retention lifetime in the file? This might make it easier for the archiver to purge files that have passed their retention dates.
>
> 33) Can redundant feeds have different timing characteristics? If so, is "SHOULD use the UUID ... to avoid ... pulling and storing ..." still valid?
> Maybe there should be some discussion of why this is a SHOULD rather than a MUST.
>
> 34) 4.2 says "MUST use HTTP v1.1". I understand trying to have a baseline for interoperability. Shouldn't this be MUST implement support for v1.1, and MAY support additional HTTP versions? Clients MUST implement support for v1.1, but MAY negotiate which version they use.
>
> 35) SHOULD support gzip; why not MUST-implement gzip, MAY use?
>
> 36) 5.2 - is this meant to be an extensible registry? Is it allowed to have enterprises register their own proprietary formats (which would seem acceptable if they provide a specification)? So are there preferred naming conventions, so we can tell the ietf-defined record formats from proprietary record formats? It would also be nice if there was a short description field, so it is easy to see what a record contains without having to go find and read the document to determine that. (I don't know whether is meant to be a preserved prefix for use by the cdni WG or not. I cannot tell from the name whether this is v1 of the record format for http_requests, or whether this is the record exclusively for http_v1 requests, or ...; a comment column would be helpful, as would naming conventions.)
>
> 37) The intention to share the registry across record-types is discussed after the registry (and on the next page in my printout). It might be nice to specify the intention before the registry.
>
> 38) I LOVE reuse. I am thrilled that you intend to make these field names reusable. I am often amused when developers reinvent the wheel, and then advertise how standards-supportive they are by announcing their intention that others should reuse their work.
>
> I don't know the [ELF] format; does that format define any fieldnames that could have been reused here, rather than defining them again?
> I am aware that the SIPCLF WG worked on developing a log format for SIP, and they explicitly liked the well-known formats used by servers like Apache and Squid. Did they define any fieldnames that could have been reused here? Syslog and ipfix define fieldnames in their own formats (SDE and IE respectively). Could any of these fieldname semantics be defined so as to reuse the semantics already defined elsewhere? While they aren't defined as fieldnames, if the semantics are the same, then applications would be able to correlate the information more easily if it is standardized that cdni:date follows the same semantics as ipfix:date, and cdni:s_hostname matches syslog:hostname semantics, and so on.
>
> Many of these fieldname values seem to come from URLs; it might be helpful to have a reference clause for each item that shows where the extracted value is defined (e.g., in the HTTP protocol spec, or subsequent specs).
>
> 39) in 6.1, there are some MUST-implement statements; section 4.2 seems more lenient. These should be updated for consistency.
>
> 40) "Alternate methods may be used ..." I think it is important to be unambiguous here about implementation versus usage. We want to be sure that a small set of required methods are MUST-implement, and they may be supplemented by additional implementation features. Users MAY use whatever method they choose of those that are implemented.
>
> 41) "Both parties SHOULD support mutual authentication." Hmmm. How do you do MUTUAL authentication if only one side supports it? Should this be MUST implement? Are there circumstances where MUST is not applicable, justifying the SHOULD?
>
> 42) Should the uCDN be able to specify that the dCDN MUST NOT retain sensitive information about its clients? (I think this might be geoPriv territory).
>
> Hope this helps,
> David Harrington
> ietfdbh at comcast.net
> +1-603-828-1401