Several French media block OpenAI’s GPTBot over data collection concerns

Following similar moves by many English-language media outlets, several French media groups, including Radio France and France24, have decided to block OpenAI’s GPTBot from collecting their content online.

This article is part of our special report AI4TRUST – AI-based-technologies for trustworthy solutions against disinformation

Content-Type: News. Based on facts, either observed and verified directly by the reporter, or reported and verified from knowledgeable sources.

GPTBot is the Microsoft-backed company's web crawler, which scrapes publicly accessible online data, possibly including copyrighted material, to help improve ChatGPT's accuracy. The chatbot uses a deep-learning language model for language processing and text generation. [Shutterstock/Robert Way]

Julia Tar Euractiv's Public Projects Aug 29, 2023 17:10 4 min. read
Artificial intelligence (AI) research and deployment company OpenAI is best known as the creator of ChatGPT, the generative AI tool that made a splash following its launch in November 2022, gathering over 100 million users in its first two months of public release.

GPTBot is the Microsoft-backed company's web crawler, which scrapes publicly accessible online data, possibly including copyrighted material, to help improve ChatGPT's accuracy. The chatbot uses a deep-learning language model for language processing and text generation.

A blog post by OpenAI says that “allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety”. On 8 August, the company announced that the tool would automatically collect data from the entire internet to train its GPT-4 and GPT-5 models.

However, according to the same blog post, GPTBot will filter out paywall-restricted sources, sources that violate OpenAI’s policies, and sources that gather personally identifiable information. The latter refers to any type of information that can be linked to an individual and reveal their identity.

France says no

Radio France and TF1 have now blocked the tool from gathering data from their websites. However, they are not the first to do so: according to the French newspaper Les Échos, all the France Médias Monde websites, such as France24.com, RFI.fr, and mc-doualiya.com, had already blocked GPTBot.
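The report does not say how these sites implemented the block. One mechanism OpenAI documents for site owners is a robots.txt rule addressed to the GPTBot user agent. The sketch below is a minimal illustration under that assumption: it uses Python's standard `urllib.robotparser` to check whether a robots.txt file disallows GPTBot. The robots.txt content, the `is_blocked` helper, and the example.org URL are hypothetical and are not taken from any of the sites named here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; it is not the
# actual file served by any publisher mentioned in this article.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical page URL on a placeholder domain.
page = "https://example.org/news/article"
gptbot_blocked = is_blocked(SAMPLE_ROBOTS_TXT, "GPTBot", page)
other_blocked = is_blocked(SAMPLE_ROBOTS_TXT, "SomeOtherBot", page)
print(gptbot_blocked, other_blocked)
```

A robots.txt rule is a request rather than an enforcement mechanism: it only takes effect if the crawler chooses to honour it.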

Vincent Fleury, Director of Digital Environments at France Médias Monde, told EURACTIV that they made the decision because "as a public service, we invest money and people in creating content. We don't want our data to train the model for free. We don't want OpenAI to allow other businesses to create value with our content [...] without getting something in return."

He also said that they do not want their content to be associated with incorrect responses that may be given by the chatbot. Fleury added that this is a preventative measure and that they would like to reach an agreement in the future.

Les Échos also reported that Le Monde contacted OpenAI and Google (because of its rival AI chatbot, Bard) to start negotiations. According to the same article, the Vice President of the Alliance de la Presse d'Information Générale also said he was in favour of a 'new deal' with AI companies.

Moreover, Les Échos reported that the newspaper Le Figaro is also hoping to reach an agreement with the platforms. If one cannot be reached, however, it also plans to block access.

Previously, The New York Times, CNN, Reuters, the Chicago Tribune, ABC (the Australian Broadcasting Corporation), and Australian Community Media brands such as the Canberra Times and the Newcastle Herald all disallowed the tool.

A Reuters spokesperson said that since “intellectual property is the lifeblood of our business, it is imperative that we protect the copyright of our content”.

OpenAI first clashed with regulators in March, when the Italian data regulator Garante temporarily shut the chatbot down domestically, accusing the company of flouting European privacy rules. ChatGPT returned to Italy after OpenAI instituted new privacy measures for users.

Following this decision, the European Data Protection Board, which gathers all EU data regulators, established a task force in April to ensure consistent enforcement.

In May, the French data protection watchdog, the National Commission on Informatics and Liberty, also published an action plan addressing privacy concerns related to Artificial Intelligence, particularly generative applications like ChatGPT.

[Edited by Nathalie Weatherald]
