Explainer

AI and copyright: may you use protected material as training data?

Published 2026-06-19 · ≈ 3 min read · edited by Dirk Baaijen

Commercial AI training in the EU relies on the text-and-data-mining exception (Art. 4 DSM Directive), which applies unless the rightholder has made a machine-readable reservation. The AI Act obliges GPAI providers to respect such reservations. Purely machine-generated output usually carries no copyright.

Short answer: Commercial AI developers in the EU who train models on copyright-protected material rely on the text-and-data-mining exception in Article 4 of the DSM Directive 2019/790. That exception applies automatically, but only if the rightholder has not reserved the use, and for content available online that reservation must be machine-readable. The AI Act builds on this: providers of general-purpose AI (GPAI) models must respect that reservation and publish a sufficiently detailed summary of the content used for training. AI output generally carries no copyright, unless it reflects a human's own creative choices.

Two different questions

"AI and copyright" mixes two questions that are legally distinct:

The input: may you use protected work (text, images, code) to train a model?
The output: who owns the copyright in what an AI system produces?

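In the EU, both are addressed in surprising detail, but not in a single law: the input through the DSM Directive and the AI Act, the output through the case law of the Court of Justice.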

The input: the text-and-data-mining exception

Copying protected work to extract patterns from it ("text and data mining", TDM) is in principle an act of reproduction, which requires the rightholder's consent unless an exception applies. The DSM Directive (EU) 2019/790 provides two such exceptions:

Article 3: TDM for scientific research by research organisations and cultural heritage institutions. Rightholders cannot opt out of it; this exception is mandatory.
Article 4: a general TDM exception that also covers commercial purposes, including the training of commercial AI models. It applies only insofar as the rightholder has not expressly reserved the use ("opt-out"). For material available online, that reservation must be expressed in a machine-readable way.

The practical core: anyone training a model commercially relies on Article 4, so the rightholder's opt-out is decisive. A growing number of publishers and creators now make such a machine-readable reservation, for example through metadata, robots.txt rules or the TDM Reservation Protocol (TDMRep) on their website.
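To make "machine-readable" concrete, here is a minimal sketch of how a crawler might check two common signals before mining a page: a robots.txt rule aimed at its user agent, and a TDMRep-style `tdm-reservation` meta tag in the page itself. It uses only the Python standard library (`urllib.robotparser` and `html.parser`). The user agent `ExampleAIBot`, the sample robots.txt and page, and the helper names `robots_allows`, `tdm_reserved` and `may_mine` are hypothetical illustrations. Whether a given signal amounts to a valid reservation under Article 4(3) is a legal question that code cannot settle.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a publisher that blocks one AI crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page that declares a TDMRep-style reservation.
PAGE_HTML = (
    '<html><head><meta name="tdm-reservation" content="1"></head>'
    "<body>Article text</body></html>"
)


class TdmMetaParser(HTMLParser):
    """Records the content of a <meta name="tdm-reservation"> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.reservation = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), so guard before lower().
        name = (attributes.get("name") or "").lower()
        if name == "tdm-reservation":
            self.reservation = attributes.get("content")


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """True if the robots.txt rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(page_html: str) -> bool:
    """True if the page carries tdm-reservation with the value "1"."""
    parser = TdmMetaParser()
    parser.feed(page_html)
    parser.close()
    return parser.reservation == "1"


def may_mine(robots_txt: str, page_html: str, user_agent: str, url: str) -> bool:
    """Cautious check: either signal counts as an opt-out."""
    return robots_allows(robots_txt, user_agent, url) and not tdm_reserved(page_html)


if __name__ == "__main__":
    url = "https://publisher.example/article"
    print(may_mine(ROBOTS_TXT, PAGE_HTML, "ExampleAIBot", url))  # False: robots.txt blocks it
    print(may_mine(ROBOTS_TXT, PAGE_HTML, "OtherBot", url))  # False: meta tag reserves TDM
    print(may_mine(ROBOTS_TXT, "<html></html>", "OtherBot", url))  # True: no signal objects
```

Treating either signal as an opt-out is a deliberately cautious choice, in line with a copyright policy that has to respect reservations. A real crawler would also fetch these resources over the network and check the other mechanisms TDMRep offers (such as HTTP headers or a site-wide JSON file), which this sketch leaves out.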

What the AI Act adds

The AI Act (Regulation (EU) 2024/1689) makes copyright part of the obligations for GPAI model providers. Under Article 53, these providers must:

put in place a policy to comply with EU copyright law, and in particular identify and respect the reservation under Article 4(3) of the DSM Directive;
publish a sufficiently detailed summary of the content used for training, following a template made available by the AI Office.

The territorial reach matters: the copyright obligation also applies to models trained outside the EU once they are placed on the EU market (recital 106). EU copyright thus follows the model to the market, not the place of training. The GPAI Code of Practice contains a dedicated copyright chapter that providers can use to demonstrate compliance.

The output: who is the author?

European copyright protects work that is the author's "own intellectual creation", a standard developed by the Court of Justice (in Infopaq and Painer, among others). That standard presupposes a human author: the work must reflect free, creative choices made by a human.

It follows that purely machine-generated output, without human creative input, in principle carries no copyright. If, however, a human gives the final result their own creative stamp through selection, editing or targeted instructions, that result may be protected. Exactly where the line falls depends on the facts of each case, and the case law is still developing.

What this means in practice

Using an external AI model? Set out in the contract with the provider how training data was handled and who is liable in case of an infringement claim (indemnification).
Are you a rightholder? If you do not want your work used for AI training, make a machine-readable reservation; a reservation made only orally, or in a form that automated tools cannot read, is not enough.
Working with AI output? Do not assume you hold copyright in it. If you want exclusivity, arrange it contractually and ensure demonstrable human creative input.

Copyright and training data are no longer a side issue: together they form one of the sharpest intersections between the AI Act, the DSM Directive and classic copyright, and an area where enforcement and case law are only beginning to take shape.

Sources

https://eur-lex.europa.eu/eli/dir/2019/790/oj
Directive (EU) 2019/790 (DSM Copyright Directive), Art. 3 and 4: the text-and-data-mining exceptions.
https://eur-lex.europa.eu/eli/reg/2024/1689/oj
Regulation (EU) 2024/1689 (AI Act), Art. 53 and recitals 104-107: copyright policy and training-data summary for GPAI.
https://digital-strategy.ec.europa.eu/en/policies/ai-code-practice
European Commission — GPAI Code of Practice, with a dedicated copyright chapter.
