Guide

Data quality and governance (Article 10): training, validation and test data

Adopted 2026-06-22 · ≈ 2 min read · edited by Dirk Baaijen

Article 10 of the AI Act sets requirements for the data used to train, validate and test high-risk AI systems. Datasets must be relevant, sufficiently representative, as error-free as possible and complete, with attention to bias. Data governance makes these choices traceable.

Short answer: Article 10 of the AI Act states that high-risk systems trained with data must use training, validation and test data that meet quality criteria. The data must be relevant, sufficiently representative, as error-free as possible and complete for the intended purpose. Providers must also apply a form of data governance that makes the choices around that data traceable.

Why data is central

The quality of an AI system is largely determined by the data it was trained on. Biased or incomplete data produces biased or unreliable outcomes, and that matters most in high-risk applications, where the outcomes have consequences for people. Article 10 therefore anchors data quality as a standalone obligation, not a side effect of model design.

The quality criteria

Article 10 requires that datasets, as appropriate for the intended purpose:

Are relevant and representative of the people and situations the system is applied to.
Are as error-free as possible and complete, with appropriate statistical properties.
Take into account the specific geographical, contextual, behavioural or functional setting in which the system is used.

A dataset that is entirely free of errors is not an absolute requirement: the standard is "as error-free as possible" given the purpose. That calls for a reasoned, documented trade-off, not perfection.
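
As an illustration of how these criteria can be made measurable, the sketch below compares the group shares in a dataset with reference shares for the target population, and reports how often a field is missing. It is a minimal sketch, not a method prescribed by Article 10: the function names, the record layout (a list of dictionaries) and the reference shares are assumptions made for this example.

```python
from collections import Counter


def share_gaps(records, attribute, reference_shares):
    """Observed share minus reference share for each reference group.

    `records` is a list of dicts (hypothetical layout). `reference_shares`
    maps each group to its expected share in the target population; where
    those figures come from is a choice the provider has to justify.
    """
    counts = Counter(
        r[attribute] for r in records if r.get(attribute) is not None
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no records carry a value for {attribute!r}")
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


def missing_rate(records, field):
    """Fraction of records in which `field` is absent or None."""
    if not records:
        raise ValueError("records is empty")
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)
```

A positive gap means a group is over-represented relative to the reference, a negative gap that it is under-represented. Which gap or missing rate is still acceptable depends on the intended purpose and should be recorded together with the reasoning.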

Data governance and bias

Beyond the data itself, Article 10 calls for governance practices: documentation of design choices, the provenance and collection of data, assumptions, and an examination of possible bias. Where strictly necessary, providers may exceptionally process special categories of personal data, solely to detect and correct bias and under strict safeguards. These choices must be traceable, and they connect to the broader AI governance framework.
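
One simple way to start the bias examination is to compare how often a positive label occurs per group in the training data. The sketch below does that and reduces the result to a single ratio. It is an assumption-laden example: Article 10 does not prescribe a metric, and the function names, field names and record layout are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(records, group_key, label_key):
    """Share of records with a truthy label, computed per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[label_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    if not rates:
        raise ValueError("rates is empty")
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group has any positive label, so rates are equal
    return min(rates.values()) / highest
```

A ratio well below 1.0 is a signal to investigate, not a conclusion: the difference may reflect the collection method, the labelling or a real pattern. The finding and its explanation belong in the documentation.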

Link to documentation and risk

The data choices under Article 10 belong in the technical documentation (Annex IV) and feed into the risk management system (Article 9): bias in data is a risk you must identify and mitigate. Together they form part of the obligations that apply to high-risk systems.

What to do

Record the provenance of every dataset: source, collection method and assumptions.
Test representativeness against the target group and the use setting.
Examine bias systematically and document the findings.
Correct where needed and justify the choices, including any processing of special categories of personal data.
Keep data governance current when the data or purpose changes.
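
The first and last steps above can be supported with a small provenance record per dataset. The sketch below stores source, collection method, assumptions and intended purpose, plus a hash of the data, so that a change in either the data or the purpose flags the record for review. The class, its fields and the helper functions are hypothetical names chosen for this example, not a format required by the AI Act.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the raw dataset bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DatasetRecord:
    """Provenance entry for one dataset (illustrative field set)."""
    name: str
    source: str
    collection_method: str
    intended_purpose: str
    content_hash: str
    assumptions: tuple[str, ...] = ()
    recorded_on: date = field(default_factory=date.today)


def needs_review(record: DatasetRecord, current_data: bytes,
                 current_purpose: str) -> bool:
    """True when the data or the intended purpose differs from the record."""
    return (
        record.content_hash != fingerprint(current_data)
        or record.intended_purpose != current_purpose
    )
```

A record that needs review does not mean the dataset is unsuitable. It means the representativeness and bias checks should be repeated and the documentation updated.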

A model is no more reliable than the data beneath it; Article 10 makes that data testable.

Sources

https://eur-lex.europa.eu/eli/reg/2024/1689/oj
Regulation (EU) 2024/1689 (AI Act), Article 10: data quality and data governance requirements for training, validation and test data.
https://artificialintelligenceact.eu/article/10/
Article 10 AI Act: criteria for relevant, representative and as-error-free-as-possible datasets, plus attention to bias.

