I am the assigned Gen-ART reviewer for this draft. For background on Gen-ART, please see the FAQ at <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>. Please resolve these comments along with any other Last Call comments you may receive.

Document: draft-bormann-cbor-04
Reviewer: Martin Thomson
Review Date: 2013-07-29
IETF LC End Date: ?
IESG Telechat date: 2013-08-15

Summary: This document is not ready for publication as a proposed standard.

I'm glad that I held this review until Paul's appsarea presentation. It made very clear to me that the kinds of concerns I have are considered basically irrelevant by the authors, because they aren't interested in changing the design goals. I don't find the specific design goals compelling, and I am of the opinion that the choice of goals is significant as a matter of general application; I hope that is clear from my review. Independent of any conclusions regarding design goals, there are issues that need addressing.

(This is an atypical Gen-ART review. I make no apologies for that. I didn't intend to write a review like this when I started, but I feel it's important to commit these thoughts to the record. It's also somewhat long; sorry. I tried to edit it down.)

I have reviewed the mailing list feedback, and it's not clear to me that there is consensus to publish this. It might be that the dissent I have observed is not significant in Barry's learned judgment, or that it is merely dissent on design goals and therefore irrelevant. The fact that this work isn't a product of a working group still concerns me; I'm genuinely interested in why this is AD-sponsored rather than a working group product.

Major issues:

My major concerns with this document might be viewed as disagreements with particular design choices. And I consider it likely that the authors will conclude that the document is still worth publishing as is, or perhaps with some minor changes.
In the end, I have no issue with that, but I expect the end result will be that the resulting RFC is ignored. What would make this tragic is if publication of this document were used to prevent other work in this area from subsequently being published. (For those drawing less-than-charitable inferences from this: I have no desire to throw my hat into this particular ring, except perhaps in jest [1].)

This design is far too complex and large. Regardless of how well-considered it might be, or how well it meets the stated design goals, I can't see anything but failure in this document's future. JSON succeeds largely because it doesn't attempt to address so many needs at once; I could even make a case that JSON contains too many features. In comparison with JSON, this document does one major thing wrong: it has more options in several dimensions. There are more types, and there are several more dimensions for extensibility than JSON has: type extensions, value extensions (values of 28-30 in the lower bits of the type byte), plus the ability to apply arbitrary tags to any value. I believe all of these to be major problems that will cause these mechanisms to be ignored, poorly implemented, and therefore useless. In part, this complexity produces implementations that are far more complex than they need to be, unless additional standardization is undertaken; that idea is something I'm uncomfortable with.

Design issue: extensibility

This document avoids discussion of issues regarding schema-less document formats that I believe to be fundamental. These issues are critical when considering the creation of a new interchange format. In choosing this specific design, the document makes a number of trade-offs that are, in my opinion, ill-chosen. This may be in part because the document is unclear about how applications are intended to use the documents it describes.
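To make the extensibility dimensions I mention above concrete, here is a minimal sketch of how a CBOR data item's initial byte divides into a 3-bit major type and a 5-bit additional-information field, per the draft's encoding; the function name is mine, and the example is illustrative only.

```python
def parse_initial_byte(b: int):
    """Split a CBOR initial byte into its two fields (illustrative sketch)."""
    major_type = b >> 5    # 3 high bits: 0=uint, 1=negint, ..., 6=tag, 7=simple/float
    additional = b & 0x1f  # 5 low bits: small value, length-of-argument, or reserved
    return major_type, additional

# 0xc2 is the initial byte of a tag (major type 6) with tag number 2,
# which the draft assigns to positive bignums -- the "tag" dimension.
assert parse_initial_byte(0xc2) == (6, 2)

# Additional-information values 28-30 are reserved -- the "value extension"
# dimension I object to above.
assert parse_initial_byte(0x1c)[1] == 28
```

Every one of the three dimensions (new major-type semantics, the reserved 28-30 values, and arbitrary tags) is visible in this one byte, which is part of why I think the design overreaches.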
You may conclude after reading this review that this is simply because the document does not explain the rationale for the approach it takes. I hope that isn't the conclusion you reach, but I appreciate the reasons why you might reach it.

I believe the fundamental problem arises from a misunderstanding about what it means to have no schema. Aside from formats that require detailed contextual knowledge to interpret, there are several steps toward the impossible, platonic ideal of a perfectly self-describing format. It's impossible because ultimately the entity that consumes the data is required, at some level, to understand the semantics being conveyed. In practice, no generic format can effectively self-describe to the level of semantics. This draft describes a format that is more capable at self-description than JSON. I believe that to be not just unnecessary, but counterproductive. At best, it might save implementations an occasional extra line of code for type conversion.

Extensibility as it relates to types:

The use of extensive typing in CBOR implies an assumption of a major role for generic processing. XML Schema and XQuery demonstrate that this desire is not new, but they also demonstrate the folly of pursuing those goals.

JSON relies on a single mechanism for extensibility: JSON maps that contain unknown or unsupported keys are (usually) ignored. This allows new values to be added to documents without destroying the ability of an old processor to extract the values it supports. The limited type information JSON carries does leak out, but it's unclear what value that has to a generic processor; all of the generic uses I've seen merely carry the type information along, and no specific use is made of the knowledge it provides. ASN.1 extensibility, as encoded in PER, leaks no type information at all: unsupported extensions are skipped based on a length field.
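The JSON extensibility pattern I describe above — an old processor silently ignores map keys it doesn't understand — can be sketched as follows; the field names are hypothetical, chosen only for illustration.

```python
import json

def extract_supported(doc: str) -> dict:
    """An 'old' processor: keep only the keys it knows, ignore the rest."""
    known = {"name", "size"}  # the fields this version of the protocol understands
    data = json.loads(doc)
    return {k: v for k, v in data.items() if k in known}

# A newer sender adds a "checksum" field; the old processor keeps working,
# extracting the values it supports and dropping the one it doesn't.
msg = '{"name": "example", "size": 42, "checksum": "abc123"}'
assert extract_supported(msg) == {"name": "example", "size": 42}
```

Note that this single mechanism needs no type registry and no tag space: the unknown key is skipped using only the delineation the syntax already provides.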
(As an aside, PER is omitted from the analysis in the appendix, which I note from the mailing lists is due to its dependency on a schema. Interestingly, I believe it would be possible - though not trivial - to create an ASN.1 description with all the properties described in CBOR, one with roughly equivalent, if not fully equivalent, properties to CBOR when serialized.)

By defining an extensibility scheme for types, CBOR effectively acknowledges that a generic processor doesn't need type information (just delineation information), but it then creates an extensive type system anyway. That seems wasteful.

Design issue: types

The ability to carry uninterpreted binary data is a valuable and important addition. If that were all this document did, it might have been enough. Instead, it adds numerous different types. I can understand why multiple integer encoding sizes are desirable, and maybe even floating-point representations, but this document describes bignums in both base 2 and base 10, embedded CBOR documents in three forms, URIs, base64-encoded strings, regexes, MIME bodies, dates and times in two different forms, and potentially more.

I also challenge the assertion that parsing a data type outside of a common shared library produces larger code sizes. That's arguably provably true, but last time I checked, a few extra procedure calls (or equivalent) weren't the issue for code size; the sheer number of options, on the other hand, might be.

Half-precision floating-point numbers are a good example of excessive exuberance. They are not available in many languages for good reason: they aren't good for much. They actually tend to cause errors in software in the same way that threading libraries do: it's not that they're hard to use, it's that they're harder than people think. And requiring that implementations parse them creates unnecessary complexity.
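To illustrate the decoding cost I'm talking about: even a minimal half-precision decoder, along the lines of the pseudocode the draft relegates to an appendix, has to distinguish three cases (subnormal, normal, and infinity/NaN). A sketch, assuming the standard IEEE 754 binary16 layout:

```python
def decode_half(b: bytes) -> float:
    """Decode an IEEE 754 half-precision value from its 2-byte big-endian form."""
    half = (b[0] << 8) | b[1]
    sign = -1.0 if half & 0x8000 else 1.0
    exp = (half >> 10) & 0x1f   # 5-bit exponent
    mant = half & 0x03ff        # 10-bit mantissa
    if exp == 0:                # zero / subnormal
        val = mant * 2.0 ** -24
    elif exp != 31:             # normal: implicit leading bit, biased exponent
        val = (mant + 1024) * 2.0 ** (exp - 25)
    else:                       # infinity / NaN
        val = float("inf") if mant == 0 else float("nan")
    return sign * val

assert decode_half(b"\x3c\x00") == 1.0   # 0x3C00 encodes 1.0
assert decode_half(b"\xc4\x00") == -4.0  # 0xC400 encodes -4.0
```

None of this is difficult in isolation, but it is one more mandatory branch-laden routine in every decoder, for a type few applications will ever emit.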
For the very small subset of cases where half precision is actually useful, I do not believe the cost of transmitting the extra 2 bytes of a single-precision number is going to be a burden. The cost of carrying the code required to decode half-precision values, however, is not as trivial as the document makes out. The fact that decoding them requires an appendix would seem to indicate that this feature is special enough that its inclusion should have been very carefully considered. To be honest, if it were my choice, I would have excluded single-precision floating-point numbers as well; they too create more trouble than they are worth.

Design issue: optionality

CBOR embraces the idea that support for types is optional. Given the extensive nature of the type system, it's almost certain that implementations will choose to avoid implementing some subset of the types. The document makes no statements about which types are mandatory for implementations, so I'm not sure how it is possible to provide interoperable implementations. If published in its current form, I predict that only a small subset of types will be implemented and become interoperable.

Design issue: tagging

The tagging feature has a wonderful property: the ability to create emergent complexity. Given that a tag itself can be arbitrarily complex, I'm almost certain that this is a feature you do not want.

Minor issues:

Design issue: negative numbers

Obviously, the authors will be well-prepared for arguments that describe as silly the separation of integer types into distinct positive and negative types. But it's true: this is a strange choice, and a very strange design. The fact that this format is capable of describing 64-bit negative numbers creates a problem for implementations that I'm surprised hasn't been raised already. In most languages I use, there is no native type capable of carrying the most negative value that can be expressed in this format: -2^64 is twice the magnitude of the most negative value a 64-bit two's-complement integer can store.
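The range mismatch above is easy to demonstrate. CBOR's negative-integer major type encodes the value -1 minus an unsigned argument, so the 8-byte form reaches down to -2^64, one past what a signed 64-bit type can hold. A sketch (the helper name is mine):

```python
def decode_negint(argument: int) -> int:
    """CBOR major type 1: the value is -1 minus the unsigned argument."""
    return -1 - argument

INT64_MIN = -(2 ** 63)  # most negative value of a two's-complement int64

# An 8-byte argument with all bits set yields the format's most negative value.
most_negative = decode_negint(2 ** 64 - 1)
assert most_negative == -(2 ** 64)

# A native signed 64-bit integer cannot represent it.
assert most_negative < INT64_MIN
```

Python's arbitrary-precision integers paper over the problem; a C or Java decoder has no int64-shaped place to put this value at all.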
It almost looks as though CBOR is defining a 65-bit, 33-bit, or 17-bit two's-complement integer format with the most significant bit isolated from the others, except that the negative encoding doesn't even have the good sense to be properly sortable. Given that, and the fact that bignums are also defined, I find this choice baffling.

Document issue: canonicalization

Please remove Section 3.6. c14n is hard, and the fact that this format makes it effectively impossible to standardize a c14n scheme says a lot about the format; in comparison, JSON is almost trivial to canonicalize. If the intent of this section is to describe some of the possible gotchas, such as those described in the last paragraph, then that would be good, and changing the focus to "Canonicalization Considerations" might help. I believe there are several issues this section would still need to consider. For instance, the types that carry additional JSON encoding hints convey extra semantics that might not be significant to the application protocol.

Extension based on minor values 28-30 (the "additional information" space):

...is impossible as defined. Section 5.1 seems to imply otherwise, but I'm not sure how such an extension could ever happen without breaking existing parsers. Section 5.2 actually makes this worse by making a wishy-washy commitment to a size for 28 and 29, but no commitment at all for 30.

Nits:

Section 3.7 uses the terms "well-formed" and "valid" in a sense that I believe to be consistent with their use in XML and XML Schema. I found the definition of "valid" a little difficult to parse; specifically, it's not clear whether "invalid" is the logical inverse of "valid".

Appendix B/Table 4 has a TBD on it. Can this be checked? Table 4 keeps getting forward references, but it's hidden in an appendix. I found that frustrating as a reader, because the forward references imply that there is something important there. That implication is completely right; the table needs promotion.
I know why it's hidden, but that reason just supports my earlier theses.

Section 5.1 says "An IANA registry is appropriate here." Why not reference Section 7.1?

[1] https://github.com/martinthomson/aweson