On 17 July 2016, the IAB responded to the ICANN call for public comments on Reference Label Generation Rulesets (LGRs) for the Second Level:
The IAB appreciates the opportunity to comment on “Reference Label Generation Rulesets (LGRs) for the Second Level” and the associated “Evaluation of Deviation from the Reference Second Level LGRs”.
The IAB understands the motivation behind this effort, and supports the idea that registries should normally use a more-restricted set of Unicode code points than all of those available under IDNA2008. RFC 5894 (“Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale”), Section 3.2 is quite clear that registries (at all levels of the DNS, not just the top or second levels) are expected to restrict registrations to labels containing those code points that are “well-understood by the registry or its advisers”. In RFC 6912 (“Principles for Unicode Code Point Inclusion in Labels in the DNS”), the IAB reiterated that advice, and noted the desirability of more-restrictive rules the closer one gets to the root.
The IAB is, as a result, concerned about the suggestion that zone operators should not select a smaller subset of code points than the ICANN review has produced for a given “language” (more on the term “language” below). In particular, the call for comments and “Evaluation of Deviation from the Second Level LGR References” are both quite clear that a registry using a smaller repertoire of code points than the relevant ICANN reference LGR(s) is considered “deviant”. That approach is quite plainly at odds with RFC 5894, Section 3.2. If the registry or its advisors does not understand how the character would form part of the repertoire appropriate to registrations in the zone, it should not allow the code point.
In keeping with the IAB advice in RFC 6912, registries near the root should be as restricted in the code points they permit as is practical. The proposed LGRs are, however, problematic for general use by names near the root. RFC 6912 says
Public zones are, by definition, zones that are shared by different
groups of people. Therefore, any decision to permit a code point in
a public zone (including the root) should be as conservative as
practicable. Doubts should always be resolved in favor of rejecting
a code point for inclusion rather than in favor of including it, in
order to minimize risk.
Yet the “LGR for Belarusian” (<https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-belarusian-15may16-en.html>) explicitly permits U+02BC, despite it being the very example the IAB chose as problematic under the principle of conservatism.
This raises the more fundamental problem with these “reference” LGRs: they’re characterized by language. Yet DNS labels are not in a language, and there is no requirement that all words (or even any word) in a given language can be “written” in the DNS. It seems entirely reasonable that a zone operator might decide not to include any code point in its repertoire, regardless of ICANN’s evaluation.
The IAB understands the reason why these evaluations proceed by language: writing systems are often used to write multiple languages, and a character in a script that is appropriate for one language may be inappropriate or even dangerous in the context of another. Yet there does not appear to be the kind of guidance in these LGRs that would be necessary in order to put them together successfully. In general, only a registry operator will know which code points are the correct ones to include given the user population; it seems a mistake for ICANN to encourage registries to permit more code points than they think wise.
We encourage ICANN in continuing to provide evaluations of code points, but we discourage these evaluations being taken as mechanisms which impose inclusivity conditions on subordinate zones. That is both outside ICANN’s remit and inappropriate to the deployment of zones which may rightly prefer more conservative rules.
for the IAB