Network Working Group                                  M. Petit-Huguenin
Internet-Draft                                    Impedance Mismatch LLC
Intended status: Experimental                          16 September 2021
Expires: 20 March 2022


                          The RFC Prolog Database
                    draft-petithuguenin-rfc-prolog-00

Abstract

   This document explores some techniques that can be used to mine various sources of data from the IETF for the purpose of analyzing how tools and formal description techniques are used at the IETF, how they contribute to fulfilling the IETF mission, and whether an effort to popularize a more systematic use of tools and formal description techniques could improve the ability of the IETF to fulfill its mission.

   The foundation for these techniques is a publicly available and actively maintained dataset, expressed as a Prolog database, named "RFC-Prolog".

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 20 March 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Petit-Huguenin            Expires 20 March 2022                 [Page 1]

Internet-Draft                 RFC Prolog                 September 2021

   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Dataset Design
     2.1.  Immutable Metadata
       2.1.1.  The "rfc" Table
       2.1.2.  The "updates" Table
       2.1.3.  The "obsoletes" Table
       2.1.4.  The "keyword" Table
       2.1.5.  The "abstract" Table
       2.1.6.  The "reference" Table
     2.2.  Mutable Metadata
       2.2.1.  The "metadatum" Table
       2.2.2.  The "bcp" Table
       2.2.3.  The "std" Table
       2.2.4.  The "fyi" Table
       2.2.5.  The "erratum" Table
       2.2.6.  The "errata" Directory
     2.3.  The Manual Tables
       2.3.1.  The "technique" Table
       2.3.2.  The "use" Table
       2.3.3.  The "prevent" Table
   3.  Dataset Life Cycle
     3.1.  Update Schedule
     3.2.  Manual Update
   4.  References
   Author's Address

1.  Introduction

   The main reason for building the dataset described in this document is to collect data on what we call "techniques", data that is not available from direct sources.

   "Technique" is a vague term that can be explained with the following analogy. If an (immutable) RFC were a dam (an immovable structure that separates a flow of water into two parts), then we could talk about "upstream" as the group of activities that led to the publication of an RFC. Similarly, we could talk about "downstream" as the group of activities that start from the publication of an RFC. The word "techniques" then covers the activities, both upstream and downstream, that are or can be mechanized.

   We call the set of upstream techniques "tools". This is the set of all tools that, at some point before the publication of an RFC, contributed to some part of that RFC. Some examples of tools are:

   *  idnits.

   *  ABNF checkers.

   *  NS-2, NS-3.

   *  Programs written to extract examples from packet capture files.

   *  Security property provers.

   *  etc.

   We call the set of downstream techniques "fdts", for Formal Description Techniques. This is the set of formalisms in an RFC that would make it possible to mechanize the activities related to that RFC. Some examples of fdts are:

   *  Bit diagrams, from which code, tests, and validators can be derived.

   *  ABNF, from which code, tests, and validators can be derived.

   *  Equations, which can be used to predict or validate various parameters.

   *  Examples, which can be used to partially validate a protocol implementation.

   *  State machine descriptions, from which code, tests, and validators can be derived.

   *  etc.

   It is expected that the analysis of the techniques used by each RFC, together with the analysis of the techniques that could have been used to prevent errata, will bring some insight into the value of these techniques and into whether there is a need to focus on improving the use of techniques at the IETF.
   More specifically, these analyses are meant to guide the development of the Computerate Specifying paradigm [I-D.petithuguenin-computerate-specifying]. Computerate Specifying can be seen as a way to bridge upstream and downstream techniques, not only by bringing together tools and executing them as part of the generation of an Internet-Draft, but also by generating the fdts that are part of that Internet-Draft. "Computerate Specifying" literally means adding computer processing to the act of writing a specification, an RFC in our case.

   An example of that is ABNF [RFC5234]. Traditionally, ABNF is a downstream technique: a specific ABNF is assembled by hand and verified upstream with a tool like those listed at https://tools.ietf.org/. Because updating the ABNF when the normative text changes, verifying it, and inserting it in the document all use separate tools, it is easy to see how skipping one of these steps can lead to an incorrect result.

   An alternative is to use Computerate Specifying, which makes it possible to describe an ABNF in a domain-specific language that is embedded in the source of an Internet-Draft, making its verification part of the processing of that source. The same processing also generates the text of the ABNF that is inserted in the generated Internet-Draft. Because the ABNF originates in the same source as the text it formalizes, discrepancies are less likely to happen.

   The analysis of the use of techniques in RFCs, or of the lack of use that resulted in errata, will guide which techniques should be supported by Computerate Specifying.

   In addition to the tables containing the metadata for the RFCs published by the RFC Editor, three additional tables need to be populated to be able to analyze techniques:

   *  A list of all techniques and their references.

   *  A table that associates tools and fdts with each existing RFC.
   *  A table that lists, for each erratum, which tools and fdts could have been used to prevent that erratum.

   Some parts of these tables can be extracted from the other tables, but a large part will have to be entered manually.

   Although it would be trivial to add to the dataset a table that contains the authors of each RFC, such a table is not part of the dataset, to discourage analyses of correlations between individuals and the various possible improvements that this dataset is meant to help discover.

2.  Dataset Design

   The RFC Prolog dataset [RFC-Prolog] is composed of a set of Prolog tables and files that are populated from various IETF sources and complemented by hand-filled tables.

   Prolog was chosen because it makes it possible to express both a database and the queries that can be run on it in the same language. [Clocksin03] is the classic introductory book on Prolog; [O_Keefe90] and [Sterling94] complete the trilogy of books indispensable to the advanced Prolog programmer.

   The dataset is designed to be used with XSB 4.0 [XSB] because of its ability to handle large in-memory databases, but it should be usable with other Prolog implementations.

   The dataset is composed of tables and a directory that can be grouped into three categories.

2.1.  Immutable Metadata

   The "rfc", "updates", "obsoletes", "keyword", "abstract", and "reference" tables contain the immutable metadata extracted from published RFCs. The first five tables are grouped in the "rfcs.P" file, and the last one in the "references.P" file.

2.1.1.  The "rfc" Table

   The "rfc" table is a compound term composed of the following 9 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.

   2.  The title of the RFC, as a Prolog atom.

   3.  The stream that published that RFC, as a Prolog atom. The current list of streams can be found with this query:

         ?- [ordsets], [rfcs],
            findall(S, rfc(_, _, S, _, _, _, _, _, _), L),
            list_to_ord_set(L, Streams).

   4.  The status at the time of publication, as a Prolog atom. The current list of statuses can be found with this query:

         ?- [ordsets], [rfcs],
            findall(S, rfc(_, _, _, S, _, _, _, _, _), L),
            list_to_ord_set(L, Statuses).

   5.  The canonical format for the RFC, as a Prolog atom. The current list of canonical formats can be found with this query:

         ?- [ordsets], [rfcs],
            findall(F, rfc(_, _, _, _, F, _, _, _, _), L),
            list_to_ord_set(L, Formats).

   6.  The date of publication for the RFC, as a compound term with functor "date" and the year (a Prolog number) and month (a Prolog number) as arguments.

   7.  The name of the IETF Working Group that produced the RFC, as a Prolog atom.

   8.  The name of the IETF Area that produced the RFC, as a Prolog atom.

   9.  The name of the last Internet-Draft that immediately preceded the publication of the RFC, as a Prolog atom.

   |  NOTE: April Fools' RFCs also contain the day of publication.
   |  The database will be updated to reflect that.

2.1.2.  The "updates" Table

   The "updates" table is a compound term composed of the following 2 arguments:

   1.  The RFC number of the RFC that updated another RFC, as a Prolog number.

   2.  The RFC number of the RFC that was updated, as a Prolog number.

   Given an RFC, the following program enumerates on backtracking the update chains that start from it. Each solution is a list that begins with that RFC, followed by an RFC it updates, then an RFC updated by that one, and so on:

      :- [rfcs].

      % Rfc first, then an RFC it updates, and so on.
      update_chain(Rfc, [Rfc|L]) :- updates(Rfc, Prev), update_chain(Prev, L).
      % Stopping here is also a solution, so prefixes are returned too.
      update_chain(Rfc, [Rfc]).

2.1.3.  The "obsoletes" Table

   The "obsoletes" table is a compound term composed of the following 2 arguments:

   1.  The RFC number of the RFC that obsoleted another RFC, as a Prolog number.

   2.  The RFC number of the RFC that was obsoleted, as a Prolog number.

2.1.4.  The "keyword" Table

   The "keyword" table is a compound term composed of the following 2 arguments:

   1.  The RFC number of an RFC, as a Prolog number.

   2.  The keyword, as a Prolog atom.

2.1.5.  The "abstract" Table

   The "abstract" table is a compound term composed of the following 2 arguments:

   1.  The RFC number of an RFC, as a Prolog number.

   2.  The abstract, as a Prolog atom.

2.1.6.  The "reference" Table

   The "reference" table is a compound term composed of the following 3 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.

   2.  The status of the reference, as a Prolog atom. The current list of statuses can be found with this query:

         ?- [ordsets], [references],
            findall(S, reference(_, S, _), L),
            list_to_ord_set(L, Statuses).

   3.  The referenced document, as a compound term with one argument. The functor determines the type of resource, and the argument is the identifier for that resource. The current list of all reference types can be found with this program, by running the query "?- reference_types(Types).":

         :- [ordsets].
         :- [references].
         % The type of a referenced document is the functor of its term.
         types(T) :- reference(_, _, I), functor(I, T, _).
         % Ordered set of all reference types.
         reference_types(Types) :- findall(T, types(T), L), list_to_ord_set(L, Types).

2.2.  Mutable Metadata

   The "metadatum", "bcp", "std", "fyi", and "erratum" tables contain the mutable metadata extracted from files provided by the RFC Editor. The first four tables are grouped in the "metadata.P" file, and the last one in the "errata.P" file.

2.2.1.  The "metadatum" Table

   The "metadatum" table is a compound term composed of the following 2 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.

   2.  The current status for the RFC, as a Prolog atom.

   |  NOTE: One piece of information missing from that table is the
   |  current email address that should be used to discuss the RFC.

2.2.2.  The "bcp" Table

   The "bcp" table is a compound term composed of the following 2 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.

   2.  The current BCP number for that RFC, as a Prolog number.

2.2.3.  The "std" Table

   The "std" table is a compound term composed of the following 2 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.

   2.  The current STD number for that RFC, as a Prolog number.

2.2.4.  The "fyi" Table

   The "fyi" table is a compound term composed of the following 2 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.

   2.  The current FYI number for that RFC, as a Prolog number.

2.2.5.  The "erratum" Table

   The "erratum" table is a compound term composed of the following 7 arguments:

   1.  The erratum number, without any prefix, as a Prolog number.

   2.  The list of formats this erratum applies to, as a list of Prolog atoms.

   3.  The number of the RFC this erratum applies to, without any prefix, as a Prolog number.

   4.  The name of the reporter of this erratum, as a Prolog atom.

   5.  The date the erratum was reported, as a compound term made of the "date" functor and the year, month, and day, all three as Prolog numbers.

   6.  The type of the erratum.

   7.  The current status of the erratum. If the status was modified, then the status is a compound term, with the name of the verifier as first argument and, if available, the date of the modification as second argument.

2.2.6.  The "errata" Directory

   The text of each erratum is stored as an individual HTML file in the "errata" directory. The name of the file is "errata/ erratum.html", with replaced by the erratum identifier.

2.3.  The Manual Tables

   The "technique", "use", and "prevent" tables contain either manually entered facts or the result of queries on the other tables. Each table is stored in its own file, respectively "techniques.P", "usages.P", and "preventions.P".

   |  NOTE: The format of these tables will probably change.

2.3.1.  The "technique" Table

   The "technique" table is a compound term composed of the following 3 arguments:

   1.  The name of the technique, as a one-argument compound term whose functor is either "tool" or "fdt" and whose argument is the name of the technique, as a Prolog atom.

   2.  An indication of whether the reference is solely about the technique or the technique is defined in a document that is mostly about something else, respectively as the "standalone" atom or the "adhoc" atom.

   3.  The reference of the technique, in the same format as the one used in the "reference" table.

2.3.2.  The "use" Table

   The "use" table is a compound term composed of the following 3 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.

   2.  An indication of whether the technique is used by reference or redefined, respectively as the "reference" atom or the "redefine" atom.

   3.  The name of the technique, as a one-argument compound term whose functor is either "tool" or "fdt" and whose argument is the name of the technique, as a Prolog atom.

2.3.3.  The "prevent" Table

   The "prevent" table is a compound term composed of the following 2 arguments:

   1.  The name of the technique, as a one-argument compound term whose functor is either "tool" or "fdt" and whose argument is the name of the technique, as a Prolog atom.

   2.  The erratum number, without any prefix, as a Prolog number.

3.  Dataset Life Cycle

   The dataset is distributed as a git repository that can be cloned with the following command:

      git clone git://shalmaneser.org/rfc-prolog

   This git repository is mirrored in various locations around the world. For each of these locations, the "dig +dnssec txt shalmaneser.org" command returns its GPS coordinates in decimal degrees and its shalmaneser.org subdomain. This can be used to find the closest location and to substitute its subdomain in the git URL above.

   |  NOTE: The git repository does not currently contain the manual
   |  tables. These will be added at the same time as the conclusions
   |  of that work are submitted for public review. The rfc-prolog
   |  dataset is distributed without these tables in case other
   |  parties want to use it for their own analysis.
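   The mirror selection described above can be sketched in Prolog. This is a minimal sketch, not part of the dataset: it assumes that the TXT records have already been parsed into hypothetical mirror(Latitude, Longitude, Host) facts (the actual record layout is not specified here), and the host names and coordinates below are made-up placeholders, not real mirrors. The predicates mirror/3, haversine_km/5, nearest_mirror/4, and mirror_url/2 are likewise illustrative names. Distances use the haversine great-circle formula on a spherical Earth.

```prolog
% Hypothetical mirror(Latitude, Longitude, Host) facts, assumed to be
% parsed from the TXT records; coordinates are in decimal degrees.
% These hosts and positions are placeholders, not real mirrors.
mirror(48.8566, 2.3522, 'mirror-eu.shalmaneser.org').
mirror(37.7749, -122.4194, 'mirror-us.shalmaneser.org').

% haversine_km(+Lat1, +Lon1, +Lat2, +Lon2, -Km): great-circle distance
% in kilometers, assuming a spherical Earth of radius 6371 km.
haversine_km(Lat1, Lon1, Lat2, Lon2, Km) :-
    Rad is pi / 180,
    P1 is Lat1 * Rad,
    P2 is Lat2 * Rad,
    SP is sin((Lat2 - Lat1) * Rad / 2),
    SL is sin((Lon2 - Lon1) * Rad / 2),
    A is SP * SP + cos(P1) * cos(P2) * SL * SL,
    % min/2 guards asin/1 against rounding slightly above 1.0.
    Km is 2 * 6371 * asin(min(1.0, sqrt(A))).

% nearest_mirror(+Lat, +Lon, -Host, -Km): the mirror closest to the
% given position and its distance; keysort/2 orders pairs by distance.
nearest_mirror(Lat, Lon, Host, Km) :-
    findall(D-H,
            ( mirror(MLat, MLon, H),
              haversine_km(Lat, Lon, MLat, MLon, D) ),
            Pairs),
    keysort(Pairs, [Km-Host|_]).

% mirror_url(+Host, -Url): the git URL from the clone command above,
% with shalmaneser.org replaced by the selected mirror host.
mirror_url(Host, Url) :-
    atom_concat('git://', Host, Prefix),
    atom_concat(Prefix, '/rfc-prolog', Url).
```

   For example, "?- nearest_mirror(51.5074, -0.1278, Host, Km), mirror_url(Host, Url)." would select the placeholder European mirror for a user in London. Only ISO builtins (findall/3, keysort/2, atom_concat/3) are used, so the sketch should not depend on a particular Prolog implementation.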
3.1.  Update Schedule

   New RFCs, new errata, and modifications to the mutable metadata require the dataset to be kept up to date. New tables or refinements to the processing code should also be distributed in a timely manner.

   The git repository is updated with a new commit that covers these changes each Monday before 5:00pm PT until November 29, 2021. After this date, the dataset will be updated each Saturday before 5:00pm PT. The date of the next update is also inserted in the commit message of the latest commit.

3.2.  Manual Update

   The code that is used to build the dataset is distributed together with the dataset, so the dataset can continue to be updated in case the current maintainer is unable to do so. The process to update the dataset is described in the README.adoc file distributed in the git repository.

4.  References

   [Clocksin03]
              Clocksin, W. F. and C. S. Mellish, "Programming in Prolog", Berlin; New York: Springer-Verlag, 2003.

   [I-D.petithuguenin-computerate-specifying]
              Petit-Huguenin, M., "The Computerate Specifying Paradigm", Work in Progress, Internet-Draft, draft-petithuguenin-computerate-specifying, 6 September 2021, .

   [O_Keefe90]
              O'Keefe, R. A., "The Craft of Prolog", Cambridge: MIT Press, 1990.

   [RFC-Prolog]
              Petit-Huguenin, M., "The RFC Prolog Dataset", .

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 5234, DOI 10.17487/RFC5234, January 2008, .

   [Sterling94]
              Sterling, L. and E. Y. Shapiro, "The Art of Prolog", Cambridge, Mass.: MIT Press, 1994.

   [XSB]      "XSB", .

Author's Address

   Marc Petit-Huguenin
   Impedance Mismatch LLC

   Email: marc@petit-huguenin.org
   URI:   hallway@jabber.ietf.org/MPH