AI Regulatory Intelligence — by YRproject

factual analysis · traceable to primary sources

Explainer

AI and copyright: may you use protected material as training data?

Adopted 2026-06-19 · ≈ 3 min read · Dirk Baaijen

Commercial AI training in the EU relies on the text-and-data-mining exception (Art. 4 DSM Directive): it applies unless the rightholder has made a machine-readable reservation. The AI Act obliges GPAI providers to respect that reservation. Purely machine-generated output usually carries no copyright.

Short answer: Commercial developers who train AI models on copyright-protected material in the EU rely on the text-and-data-mining exception (Art. 4 of the DSM Directive 2019/790). That exception applies automatically to lawfully accessible works, but only if the rightholder has not reserved the use, and for content published online that reservation must be machine-readable. The AI Act builds on this: providers of general-purpose AI (GPAI) models must respect that reservation and publish a sufficiently detailed summary of the training data used. AI output generally carries no copyright, unless human creative choices underlie it.

Two different questions

"AI and copyright" mixes two questions that are legally distinct:

  1. The input: may you use protected work (text, images, code) to train a model?
  2. The output: who owns the copyright in what an AI system produces?

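In the EU, both are addressed in some detail, but not in a single law: the input question falls under the DSM Directive and the AI Act, the output question under the case law of the Court of Justice.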

The input: the text-and-data-mining exception

Copying protected work to extract patterns from it ("text and data mining", TDM) is in principle a copyright-relevant act. The DSM Directive (EU) 2019/790 provides two exceptions:

  • Article 3: TDM for scientific research by research organisations and cultural heritage institutions that have lawful access to the material. Rightholders cannot reserve against it, and contractual terms that override it are unenforceable (Art. 7(1)); this exception is mandatory.
  • Article 4: a general TDM exception for lawfully accessible works that also applies for commercial purposes, including the training of commercial AI models. It applies only insofar as the rightholder has not expressly reserved the use ("opt-out", Art. 4(3)). For material made publicly available online, that reservation must be expressed in a machine-readable way.

The practical core: anyone training a model commercially relies on Article 4, so the rightholder's opt-out counts. A growing number of publishers and creators now make such a machine-readable reservation, for example through robots.txt rules, page metadata or the TDM Reservation Protocol (TDMRep) on their site.
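As a rough illustration of what "machine-readable" can look like from the crawler's side, the sketch below checks two signals for a URL: a robots.txt rule and a `tdm-reservation` response header of the kind described by the TDM Reservation Protocol. The user-agent name `ExampleTrainingBot`, the function names and the sample data are hypothetical. Whether either signal satisfies Article 4(3) in a given case is a legal question this code does not answer.

```python
from typing import Mapping
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named training crawler is blocked, others are allowed.
SAMPLE_ROBOTS = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers carry `tdm-reservation: 1`.

    Header names are compared case-insensitively. Treating a missing header
    as "not reserved" is an assumption of this sketch, not a legal conclusion.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("tdm-reservation") == "1"


def may_mine(robots_txt: str, headers: Mapping[str, str], user_agent: str, url: str) -> bool:
    """Conservative check: mine only if neither signal expresses an opt-out."""
    return robots_allows(robots_txt, user_agent, url) and not tdm_reserved(headers)
```

The check is deliberately conservative: either signal on its own is enough to exclude the document.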

What the AI Act adds

The AI Act (Regulation (EU) 2024/1689) makes copyright part of the model obligations. Under Article 53, providers of GPAI models must:

  • put in place a policy to comply with EU copyright law, and in particular to identify and respect reservations made under Article 4(3) of the DSM Directive (Art. 53(1)(c));
  • publish a sufficiently detailed summary of the content used for training, following a template provided by the AI Office (Art. 53(1)(d)).

The reach matters: this duty also applies to models trained outside the EU, once they are placed on the EU market (recital 106). EU copyright thus follows the model to the market, not the place of training. The GPAI Code of Practice contains a dedicated copyright chapter through which providers can demonstrate compliance.
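One way to make "identify and respect the reservation" operational is to record the opt-out status of every crawled document and filter on it before training, while keeping per-source counts that could later feed a training-data summary. The sketch below assumes a hypothetical `CrawlRecord` whose `reserved` flag was set at crawl time. The field names and the per-domain tally are illustrative and do not reproduce the AI Office template.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical record of one crawled document."""
    url: str
    reserved: bool  # True if a machine-readable opt-out was detected at crawl time


def split_by_reservation(
    records: Iterable[CrawlRecord],
) -> Tuple[List[CrawlRecord], List[CrawlRecord]]:
    """Separate records into (usable, excluded) based on the opt-out flag."""
    usable: List[CrawlRecord] = []
    excluded: List[CrawlRecord] = []
    for record in records:
        (excluded if record.reserved else usable).append(record)
    return usable, excluded


def domain_counts(records: Iterable[CrawlRecord]) -> Counter:
    """Count documents per host name, as raw material for a source overview."""
    return Counter(urlsplit(record.url).hostname for record in records)
```

Keeping the excluded records, rather than discarding them silently, gives the provider an audit trail showing that reservations were detected and honoured.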

The output: who is the author?

European copyright protects work that is the maker's "own intellectual creation", a standard developed by the Court of Justice (in Infopaq, C-5/08, and Painer, C-145/10, among others). That standard presupposes a human author: free, creative choices by a human must underlie the work.

It follows that purely machine-generated output, without human creative input, in principle carries no copyright. If, however, a human gives the final result their own creative stamp through selection, editing or targeted instructions, that result may be protected. Exactly where the line falls is case-specific and still very much developing.

What this means in practice

  • Using an external AI model? Set out in the contract with the provider how training data was handled and who is liable in case of an infringement claim (indemnification).
  • Are you a rightholder? If you do not want your work used for AI training, make a machine-readable reservation — a reservation made orally or buried in terms and conditions is not enough.
  • Working with AI output? Do not assume you hold copyright in it. If you want exclusivity, arrange it contractually and ensure demonstrable human creative input.
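For rightholders, a machine-readable reservation can be as small as a static file or a meta tag. The sketch below generates both in the shape the TDM Reservation Protocol (TDMRep) is understood here to describe: a `/.well-known/tdmrep.json` file listing path patterns, and an HTML meta tag. The paths and the policy URL are hypothetical, and the exact field names are an assumption that should be checked against the current TDMRep specification before publishing.

```python
import json
from typing import Dict, List, Optional, Sequence, Union

Entry = Dict[str, Union[str, int]]


def tdmrep_entries(paths: Sequence[str], policy_url: Optional[str] = None) -> List[Entry]:
    """Build one reservation entry per path pattern (1 = TDM rights reserved)."""
    entries: List[Entry] = []
    for path in paths:
        entry: Entry = {"location": path, "tdm-reservation": 1}
        if policy_url:
            # Optional pointer to a licensing policy document.
            entry["tdm-policy"] = policy_url
        entries.append(entry)
    return entries


def tdmrep_json(paths: Sequence[str], policy_url: Optional[str] = None) -> str:
    """Serialise the entries as the body of a tdmrep.json file."""
    return json.dumps(tdmrep_entries(paths, policy_url), indent=2)


def tdm_meta_tag() -> str:
    """HTML meta tag expressing the same reservation for a single page."""
    return '<meta name="tdm-reservation" content="1">'
```

Calling `tdmrep_json(["/articles/*"], "https://example.org/tdm-policy.json")` yields a JSON array with one entry for that path pattern.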

Copyright in training data is no longer a side issue: it is one of the sharpest intersections between the AI Act, the DSM Directive and classic copyright, and an area where the first enforcement actions and case law have yet to land.

Sources

  1. https://eur-lex.europa.eu/eli/dir/2019/790/oj
    Directive (EU) 2019/790 (DSM Copyright Directive), Art. 3 and 4: the text-and-data-mining exceptions.
  2. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
    Regulation (EU) 2024/1689 (AI Act), Art. 53 and recitals 104-107: copyright policy and training-data summary for GPAI.
  3. https://digital-strategy.ec.europa.eu/en/policies/ai-code-practice
    European Commission — GPAI Code of Practice, with a dedicated copyright chapter.

