Call for Papers: IAB Workshop on AI-CONTROL

10 Jul 2024, 1:05 a.m.

This workshop aims to explore practical opt-out mechanisms for AI, and build an understanding of use cases, requirements, and other considerations in this space.

Webpage: https://datatracker.ietf.org/group/aicontrolws/about/

Large Language Models and other machine learning techniques require voluminous input data, and one common source of such data is the Internet -- usually, "crawling" Web sites for publicly available content, much in the same way that search engines crawl the Web.

This similarity has led to an emerging practice of allowing the Robots Exclusion Protocol (RFC 9309) to control the behavior of AI-oriented crawlers.

This emerging practice raises many design and operational questions. It is not yet clear whether robots.txt (the mechanism specified by RFC 9309) is well-suited to controlling AI crawlers. A content creator or host may not be able to distinguish a crawler used for search indexing from a crawler used for LLM ingest – and indeed some crawlers may be used for both purposes. Potential use cases may extend across many different units of content, policies to be signaled, and types of content creators. Before robots.txt becomes a de facto solution to AI crawling opt-out, it is necessary to examine whether it is an appropriate mechanism: in particular, whether the creator of a particular unit of content can realistically and fully exercise their right to opt-out, and the scope of data ingest to which that opt-out applies.

This workshop aims to explore practical opt-out mechanisms for AI, and build an understanding of use cases, requirements, and other considerations in this space. The workshop will focus on mechanisms to communicate the opt-out choice and their associated data models. Technical enforcement of opt-out signals is not in scope.

For full details, please see the workshop's Datatracker page.