IAB Teleconference (Techchat)
1. Roll-call, agenda-bash, administrivia, approval of minutes
Bernard Aboba (incoming IAB)
Ross Callon (incoming IAB)
Spencer Dawkins (incoming IAB)
Russ Housley (IETF Chair)
Olaf Kolkman (IAB Chair)
Andre Robachevsky (incoming IAB)
Lynn St.Amour (ISOC Liaison)
Dow Street (IAB Executive Director)
p Hannes Tschofenig (incoming IAB)
Ron Bonica (IESG Liaison to the IAB)
Aaron Falk (IRTF Chair)
Sandy Ginoza (RFC Editor Liaison)
p – participated in part of the call
No agenda items were added.
There was a short discussion of hotel options for the IAB retreat. Dow also asked about the verbatim transcript from IETF 77; as is, the transcript would require significant editing for clarity prior to posting to the proceedings. The question will be taken up offline.
Dave Thaler reminded Patrik about the existing encoding draft, and Patrik agreed to review it soon for historical accuracy.
1.4. Meeting Minutes
No meeting minutes were reviewed during this call.
2. Techchat on IDNA2008, Unicode, and UTR 46
The purpose of this session was to review the technical issues at hand related to IDNA2008, Unicode, and actions within UTR 46, and determine any next steps for the IAB.
Patrik began by explaining that Unicode is a character set, or a set of codepoints, that can be encoded in various ways. Today the IETF is mostly using the UTF-8 encoding for new protocols. He then noted that when Unicode was initially defined, it was largely based on the input of printer vendors, and was oriented toward the way things look on paper. It was never an explicit design goal to support database search or distributed match – that is, to be able to tell if two things are equal or not. Russ noted that this lack of a canonical encoding is problematic in security protocols, making it difficult to match an ACL to an identity in a cert (for example). John added that the meaning of a character, such as an ‘o’ with two dots above it, can be language dependent. In addition, in Unicode there are two ways to represent many glyphs: a single character, or a character plus a combining character. For example, that ‘o’ with two dots above it can be represented either as a single codepoint (U+00F6) or as a lowercase ‘o’ (U+006F) followed by a combining diaeresis (U+0308). This makes comparison impossible without a canonical form that can be used for comparisons, of which there are currently four (incompatible) versions of one type (normalized forms) and three more of another (case forms). To address this problem, the IETF has argued that comparing any way other than byte-by-byte is not really possible.
The discussion then turned to the differences between IDNA2003 and IDNA2008. Patrik described that there are 3 basic categories of Unicode code points:
(i) those that can be compared without conversion
(ii) those with a mapping algorithm to map to (i). In some cases the mapping is many to one and not reversible.
(iii) those that don’t map clearly and which the IETF wants to avoid.
In IDNA2003, groups i and ii were allowed, and iii was discouraged. In IDNA2008, only group i was included in the core spec, with a possible mapping operation (ii -> i) specified in a separate informational document. John then went into the details of the remaining issues and challenges. Those included characters with multiple widths, new characters with mappings to existing ones, unassigned code points that acquire properties once they are assigned, case sensitivity in mappings, incompatibilities between IDNA2003 and IDNA2008 and behaviors during the transition period, etc.
The Unicode consortium is trying to standardize a mapping table, but any table of this sort is likely to be context dependent and change over time, making definitive matches impossible without the use of network-wide flag day(s). The IETF is attempting to define a mapping model in an algorithmic way. Various IETF protocols are affected by the approach taken here. For example, if a non-ASCII character is used in a DNS name, it must be clear when matching on this name which value(s) should be returned. Further, if the resolution is not common across protocols (e.g., DNS and security certificates) security vulnerabilities are introduced. However, there may be good reasons for the set of allowable characters to differ across applications.
It was suggested that it would be extremely difficult to get consensus on a set of mappings that would work for all protocols, leading to “no mappings from one normalized Unicode character to another” as the only algorithm that really works. The group then discussed the complexity of cross-application referrals and possible trade-offs in the type of identifiers visible in the user interface. Much of this part of the problem involves the relationship between more expressive local forms (for which there is no global agreement, precisely because they are local forms) and the need for global consistency in various IETF protocols and for application developers.
It was agreed that in addition to the WG and independent submission stream documents already in progress, it would be useful for the IAB to help document the various issues and trade-offs. A possible liaison statement to the Unicode Consortium was also discussed that would help explain the philosophical difference and rationale. The board will finalize next steps on the IAB mailing list.