Human-Friendly Names/Identifiers BOF
Cliff Lynch, BOF chair

Minutes constructed by Larry Masinter from slides and notes taken by Jacob Palme and Ari Bixhorn.

Mailing list: send 'subscribe' to 'hfi-request@cs.utk.edu'.

=============================

Larry Masinter spoke about defining the problem space:

What problems can we solve?

The URL syntax is unfriendly for non-experts.

Internationalization: Exact match doesn't work for non-English text. For example, in French, accented characters may or may not retain the accent when presented in upper case; bidirectional markers for Hebrew and Arabic may be optional; etc. See draft-masinter-url-i18n-03.txt for more details.

Who are the players?

(1) Services with collections of names: titles of books, names of medical conditions, trade names.
(2) Software for name entry: browsers, anything with "open URL", address books. When you type "ibm" in a browser window, it would first try "ibm.com", then www.ibm.com, then www.ibm.edu, then www.ibm.org, etc. New browsers have more advanced features.
(3) Search engines: searching content includes 'name matching' as a simple case.

What areas should we stay away from?

(1) Defining uniqueness: Impossible; users type in names so short that they cannot be unique. The same name will appear in multiple contexts.
(2) Creating authorities: Avoid the political winds of the gTLD debate.
(3) Registration mechanisms: May vary between name spaces.

An operational model of how 'friendly names' could work:

- user types something in native language
- some kind of matching with choices (cannot be standardized yet today, open to innovation)
- choices are unique but still 'friendly'
- choice is mapped to a URN
- the URN is mapped to a URL
- the resource is accessed

Of course, there can be various shortcuts and optimizations, but this is the model of how the mappings could happen and be defined.

Summary: We can solve some real problems. Systems exist: we can standardize the interfaces between them.
Let's stick to what's common, leave open the parts that are not; avoid politics.

Michael Mealling presented his two drafts on requirements and architecture.

Requirements (draft-mealling-human-friendly-identifier-req-00.txt)

The requirements document summarizes requirements from Users (what are they willing to accept; people like you, me or your grandmother), Marketing (advertising interest), and Trademark (like users, but with more lawyers).

Justification: HTTP URLs and domain names are not viable identifiers for unsophisticated users, and the unsophisticated users far outnumber the sophisticated users. The marketing, trademark and user communities attempt to force URLs and the domain name system into what they want, which may not work so well on the net.

Ratholes: Do not develop a generic directory service; do not become a trademark enforcement activity.

General requirements: The names should be as short as possible and fully internationalized. The names will be non-unique: a single name can match multiple separate resources. Matching time should be 1-2 seconds maximum. Matching semantics must allow substrings. Users should be able to specify geographic limitations.

Openness: There should be an open way of inserting your own names. Quality of name registration should be allowed to vary, and users should be able to understand how much vetting is done. Users should be able to determine how much trust the returned search result deserves (if AltaVista says that this site is McDonalds, you'd trust it more than if Joe Schmoe says it's McDonalds).

On the architecture document (draft-mealling-human-friendly-identifier-architecture-00.txt): This was a strawman, and not to be taken as a proposal for the actual service. A lot was taken from DNS.

Components: The 'root' is a flat namespace, so some place for the names to reside is needed. 'Registrars' can write qualified entries into the root.
You can input unvetted names, but qualified names are listed first in result listings. Content servers: Data outside the root. They are kept at the local level (McDonalds has its own local server). Local servers: Just like in DNS, a user can use a local server for locally scoped names. Client: Can maintain context information. Clients: net surfers.

Michael gave an example of a "go" URI scheme:

    go:Nike
    go:IETF
    go:Martin%20J.%20D%C3%BCrts (not what the user sees, but what is sent on the wire)

Keith Moore said that, if this was a strawman, he wants to see actual requirements and a suggested architecture that could be used to address the issues of HFNs. It was pointed out that the presentations were meant to bring people up to date on the current drafts.

Nico Popp gave a presentation about "The real name system: an example human friendly naming scheme" (draft-popp-realname-hfn-00.txt). The draft was intended to present an example of a human-friendly-name system, based on Centraal's RealName system. A real name is a company name (example: "Walt Disney"), a brand name (example: "M&M"), or an advertising slogan (example: "just do it"). These are names that marketing has spent a lot of effort on getting people to know. Can we use URLs? Can we use URNs? Not quite. Some companies are highly persistent, but not all companies are persistent. Persistence might be promised for a certain time, for example one year. Services are emerging, for example "smart browsing" from Netscape. The size of the database is currently 2 million records.

Ted Wolf gave a presentation about "Domain Names and Company Name Retrieval" (RFC 2345) and the experience with it. The work was done when he was at Dun & Bradstreet. They have a 200,000-record database that has been prototyped for about a year now (using WHOIS for access). A plug-in exists for Internet Explorer. Ted described several issues.
One problem is that people want to use abbreviations rather than the full name, for example "IBM" instead of "International Business Machines". Advice: Do not design complex protocols which people cannot use. Avoid showing the URLs to the users. Keep the complexity on the back end. Standardized APIs should be SIMPLE for the user to use. Ted mentioned that they wound up with several different WHOIS servers listening on different ports for different combinations of services, e.g., one for lookup within a particular geographic region and one for a global lookup.

Discussion

"Why did you not use my protocol?" Answer: We tried LDAP; it was too slow.

How can different services with separate real name databases cooperate? Is there anything which needs standardization here? Or should we leave this to the market?

One speaker: People claim that the DNS cannot be used because it does not allow international characters. That is wrong. The limitation on character set only applies to domain names. If you use the DNS for things other than domain names, you can use any 8-bit string. You can even implement fuzzy matching, even though typical DNS servers today do not support fuzzy matching.

Who will pay for this?

Different people/organizations (like Centraal) have different databases that they have online. Should there be a standardized protocol that deals with user input into these databases? Should there be a standard way to steer users to these services? Should there be a standard way for the HFN services to return the information to the user? Should there be a standard way to insert information into these types of databases?

It was pointed out that there was a distinction between what Michael presented and what Nico and Ted presented: where Michael said that users would be able to enter their own names, etc., the others talked about systems tightly focused on the commercial industry. There would be a lot more work to be done in the area that Michael talked about.
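As an illustration of the remark in the discussion that the DNS can carry any 8-bit string, the following is a minimal sketch of the DNS wire format for a name: each label is a length octet followed by raw octets, and the name ends with a zero-length root label. The function name encode_dns_name and the sample labels are hypothetical, and the use of UTF-8 is an assumption made here; the DNS itself does not prescribe an encoding. The sketch says nothing about how servers compare names or about fuzzy matching.

```python
def encode_dns_name(labels):
    """Encode a list of text labels into DNS wire format.

    Each label becomes one length octet followed by the label's raw octets;
    the name is terminated by a zero-length root label.
    """
    out = bytearray()
    for label in labels:
        # Assumption: labels are encoded as UTF-8. The wire format only
        # sees octets and does not care which encoding produced them.
        raw = label.encode("utf-8")
        if not 1 <= len(raw) <= 63:
            raise ValueError("each label must be 1 to 63 octets long")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label
    if len(out) > 255:
        raise ValueError("encoded name must not exceed 255 octets")
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical name with a non-ASCII label.
    print(encode_dns_name(["dürts", "example"]))
```

The length limits checked here (63 octets per label, 255 octets per name) are those of the DNS wire format; nothing in the encoding itself restricts the octet values inside a label.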
There was a question about the benefit of HFN systems versus regular search engines. One answer is that the search engines return WAY too much (this can be seen by typing the name of a company into a search engine and seeing how low on the list of returned results the company actually is). The search engines are simply internet site text parsers and do not qualify the names.

Keith Moore mentioned that one or two models made use of a "root" server. This cannot happen in the HFN model because you will have a large number of Centraal-type services that may specialize (several for company names, several for CDs and books, several for food, etc.).

Ian King said that HFNs are merely a different way of indirection that "better interprets" what the user types in. Ted Wolf responded that search engines work off of a different set of data. The biggest search engines on the web only contain about 29% of global server population. Meanwhile, 70% of the servers in the world (some of them are Fortune 500 and 1000 companies) are not found in search engines. HFNs address that.

Will there be a new IETF working group? What should the charter be?

Charter: Larry Masinter put up a slide with several possible elements of work for a working group:

- Discovery of DBS
  Standardize the interface by which user tools and search engines might discover friendly name databases? (In discussion, this was marked as '2nd'.)
- User Interface <-> Name Database
  Standardize the interface to be used between a user's tool (browser's 'open location', etc.) and a friendly name database. Take lessons from existing deployed systems.
- Search <-> Name Database
  Standardize the interface to be used between a search engine and a friendly name database.
- Specify
  Query model: what other elements (besides partial match string)? How are they represented?
  Results model: what data is returned? How is it represented? Does it allow for referral to other sources?
  Scope of application: Beyond web pages, is this useful for address books or user white pages?
- Are there canonical ('returnable') friendly identifiers?
  That is, even if what users type might be ambiguous and not unique, are there globally unique identifiers that are friendlier, if not 'friendly'?
- Submission
  Should the working group take on how names and registration information are transmitted to a friendly name database, or what other metadata is included? (In discussion, this was marked as '3rd'.)

Interest: The BOF was held during the last hours of the IETF meeting, so many people had already left; there were only about 30 people present, but about 10 of these wanted to do work on standards in this area.