News publishers demand accountability from Common Crawl over unauthorized use of content

The News/Media Alliance (NMA) has issued a formal demand to Common Crawl, calling for an end to the unauthorized scraping and storage of news content used to train commercial AI models. Representing leading publishers, the NMA argues that the archive site has moved beyond its academic roots to become a primary data source for AI developers, allowing them to bypass copyright protections and avoid licensing content from publishers. The demand highlights the growing tension between content creators and data aggregators as the publishing industry seeks to protect its intellectual property in the age of generative AI.
NMA President and CEO Danielle Coffey said the archive site is "blatantly taking our content without our permission" and is failing to honor existing opt-out requests. Although Common Crawl has historically positioned itself as a resource for researchers and academics, the NMA argues that it has evolved into a primary data source for large AI companies, which use it to train commercial models without seeking authorization from, or compensating, the original content creators.
The NMA’s letter sets out specific requirements for Common Crawl, including protocols to prevent publisher content from being used by AI developers and enforcement of an Opt-Out Registry. The alliance also asks Common Crawl to make clear to its users that scraped content is not authorized for commercial use without express permission. The demand follows similar requests from organizations outside the United States, such as the Danish Rights Alliance and the Alliance de la Presse d’Information Générale, which have previously asked for their articles to be removed to prevent unauthorized AI exploitation.
Beyond the legal and ethical challenges posed by AI scraping, the publishing sector is grappling with internal economic pressures around the sustainability of its digital businesses. Industry veteran Zack Watson recently noted that although many local media companies have invested heavily in digital services, hidden fulfillment costs and operational inefficiencies are quietly eroding profit margins. According to Watson, reporting expenses, programmatic markups, and fulfillment overhead often leave publishers with significantly less profit than their top-line revenue suggests, which makes protecting intellectual property and licensing revenue even more vital to long-term viability.
Summary generated by RabbitReport AI from public reporting. The full article and original reporting belong to Editor and Publisher.