In the European Union, the processing of personal data and the protection of data held in registers are mainly regulated by the EU General Data Protection Regulation (GDPR) [i]. Many other legal acts, such as EU regulations and national laws, also contain provisions on the processing of personal data and the protection of privacy.
For example, the Finnish Basic Education Act does not explicitly regulate what personal data must be processed in education, but its obligations lay the groundwork for municipalities’ pupil registers and, more broadly, for what personal data can be processed in basic education [ii]. The situation is similar at other levels of education.
The body responsible for organising education must plan and decide what personal data will be processed and how, within the framework of the laws governing the activity. Individual teachers must comply with the policies and guidelines of the education provider. Where the purpose of the processing of personal data is the provision of education, the data may not be used for other purposes such as marketing, the software provider’s product development, or the training of general-purpose AI models. In e-learning environments, AI applications and other digital platforms, special care must be taken to respect data protection rules so that the processing of personal data complies with the law.
Data protection begins with planning
In education, the controller of personal data is the body responsible for organising the education, for example the municipality or a private educational institution. Data protection should be considered right from the planning stage.
One of the first issues to be addressed is the legal basis for processing personal data. The most natural and preferred basis permitted by the GDPR in education is legal obligation, although this is not without its problems [iii]. In the case of commercial or otherwise contractual education, the processing of personal data is based on the contract in question.
An essential part of planning the processing of personal data is a prior risk assessment and the safeguards decided on that basis. Safeguards can be both technical and organisational. They are primarily aimed at preventing the use of data for unlawful purposes and ensuring the rights of data subjects under the GDPR. This means, on the one hand, ensuring that personal data is not leaked to third parties or used for purposes other than the provision of education, and on the other hand, ensuring that individuals can, if they so wish, exercise their rights to receive transparent information about the processing of their personal data and to review and correct their data.
On digital platforms, security measures include user IDs, login, network encryption, secure data storage and backup, access levels, timely data deletion, anti-malware, security updates and so on. Because applications are used by people, technical safeguards alone are not enough; organisational measures such as guidelines, staff training and manual controls are also needed.
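To make the technical safeguards more concrete, the sketch below illustrates one of them, timely data deletion, as a small routine that flags records whose retention period has expired. It is only a minimal sketch: the record structure and the one-year retention period are assumptions made for illustration, not features of any particular learning platform.

```python
# Minimal sketch of a "timely data deletion" safeguard: flag records older than
# the retention period decided by the education provider during planning.
from datetime import date, timedelta

RETENTION_DAYS = 365  # assumed retention period, decided case by case in practice

records = [
    {"pupil_id": "a1", "stored_on": date(2023, 8, 15), "content": "task answer"},
    {"pupil_id": "b2", "stored_on": date(2025, 1, 10), "content": "task answer"},
]

def records_due_for_deletion(records, today=None):
    """Return records whose assumed retention period has expired."""
    today = today or date.today()
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [r for r in records if r["stored_on"] < cutoff]

for r in records_due_for_deletion(records):
    print(f"Flag for deletion: pupil {r['pupil_id']}, stored on {r['stored_on']}")
```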
Web services and applications are often complex, both in terms of their actual functionality and their security features. The technical complexity is further increased in AI applications. AI features are also being added to applications where they have not been encountered before.
For example, generative AI functions have been added to conventional image processing software. From a privacy risk assessment perspective, this poses challenges, as the list of functions and features to be assessed can prove to be extensive, even for familiar applications. [iv]
Data protection risk assessment of an application
The GDPR requires the controller to carry out a risk assessment of the processing of personal data and to implement safeguards to address the risks. The risk assessment considers both the severity of the potential impact on individuals and the likelihood of its occurrence. If the risk is considered high, the controller must carry out a data protection impact assessment in accordance with the GDPR. An impact assessment is a procedure provided for by the GDPR that involves a systematic review of the processing of personal data and an identification of threats in order to determine effective safeguards. [v] [vi]
Careless use of apps in education can mean a failure to respect data protection and can lead to personal data being compromised. Therefore, AI applications need to be assessed for data protection in the same way as other digital platforms before they are used in education.
Both the education provider as controller and individual teachers can be held liable for potential breaches if the employer’s instructions on the applications to be used and on data protection have not been followed. Before implementing an AI application or any other application, teachers should check their organisation’s data protection guidelines and policies.
Usually, the provider of the learning platform or application acts as a processor of personal data, whose liability is limited by contract. Under the GDPR, a processor is an entity that processes personal data on behalf of the controller, for example by operating an application under a service contract. [vii]
The risk assessment can be assisted by the following chart, which lists the risk factors related to the nature, scope, context and purposes of the processing. [viii]
Image: Risk assessment (Office of the Data Protection Ombudsman) [ix]
When it comes to digital platforms and AI applications for teaching, risks are often increased by factors such as:
- Special categories of personal data (e.g. health data)
- Confidential data (e.g. pedagogical documents and verbal assessments of a person’s characteristics)
- Vulnerability of the data subject (minors, people with special needs, learners in relation to the education provider)
- Location data (to allow systematic monitoring)
- Large number of data subjects (e.g. children of compulsory school age in a given municipality)
- Long retention period (e.g. nine years of compulsory education)
- Confidential data (e.g. data relating to family members and other personal data)
If the above criteria are met, the processing of personal data may involve a high risk. In addition, an impact assessment must be carried out in several explicitly listed cases, including, for example, the use of new technologies, the use of location data, the processing of sensitive or very personal data, and profiling and automated decision-making that may have a significant impact on the data subject.
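As a rough illustration, the sketch below turns the risk factors listed above into a simple pre-screening checklist. The factor names and the “two or more factors” threshold are assumptions made for this sketch only; the GDPR sets no numeric rule, and the decision to carry out an impact assessment always requires case-by-case judgement.

```python
# Illustrative pre-screening checklist built from the risk factors listed above.
RISK_FACTORS = {
    "special_category_data": "special categories of personal data (e.g. health data)",
    "confidential_data": "confidential data (e.g. pedagogical documents, verbal assessments)",
    "vulnerable_data_subjects": "vulnerable data subjects (e.g. minors, learners)",
    "location_data": "location data enabling systematic monitoring",
    "large_scale": "large number of data subjects",
    "long_retention": "long retention period",
}

def screen_application(present_factors):
    """List the applicable risk factors and say whether a DPIA should be considered."""
    hits = [RISK_FACTORS[f] for f in present_factors if f in RISK_FACTORS]
    dpia_recommended = len(hits) >= 2  # assumed threshold for this sketch only
    return hits, dpia_recommended

hits, dpia = screen_application({"special_category_data", "vulnerable_data_subjects", "long_retention"})
print("Risk factors present:")
for hit in hits:
    print(" -", hit)
print("Consider a data protection impact assessment:", dpia)
```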
In education, the high-risk criteria may well be met in individual cases. For example, special categories of personal data and confidential information about learners may end up stored in the application, even if the teacher did not intend this, if learners voluntarily include such information in their answers to learning tasks. At the same time, if the application runs an AI-assisted automated review of tasks, this may involve the processing of personal data with profiling and automated decision-making. The educator must therefore consider the processing of personal data in the different situations where the AI features of the application may be combined with the data stored by learners and teachers on the platform. Teachers also need to be careful not to inadvertently introduce AI features into the application without first assessing them.
The learning platform may store learners’ special categories of personal data and confidential information, even if this is not the intention of the education provider.
In cloud services, which most AI applications are, the server stores many types of personal data, such as user IDs, files and other content, as well as activity and other log data generated using the service. The risk assessment of cloud computing should pay particular attention to the transparency of the processing of personal data, data minimisation and limitation of retention periods, and data confidentiality. [x]
To find out how personal data is processed, it is necessary to review the service provider’s contracts, security descriptions and other documents. Many software companies have developed their applications primarily for non-EU markets, and these applications may not take GDPR requirements into account. Even if an application advertises GDPR compliance, this does not guarantee that its features meet the level of data protection required for education.
Cloud computing also raises the issue of geographical storage of data and transfers to countries outside the EU and EEA. For example, Microsoft, Google and Adobe web applications may transfer personal data to countries outside the EU where data protection does not meet the level required by the GDPR. A service contract for a single application can involve up to dozens of sub-processors in dozens of different countries. For some applications, data storage may be limited to the EU, for example by an additional service or by purchasing a more expensive licence. If this is not possible, other safeguards need to be explored.
Observations on data protection in AI applications
Over the past year, I have been involved in several impact assessments of AI applications and other digital platforms for education. Below, I discuss my findings. Based on them, I always recommend careful planning and risk assessment before implementing AI applications and functionalities.
Finding 1: Generative AI applications may lack data protection
Generative AI applications, which have rapidly become popular, are typically free of charge, and their functionalities are targeted at consumers. For educational use, it is worth assessing whether the application is intended for private users or for organisations.
For chatbots based on large language models, a minimum privacy requirement for teaching can be that the inputs (prompts) written in the application are not used to train the underlying language model. Any personal data provided in the input should not be used for the development of the AI application. If such aspects are not addressed in the application’s documentation, this does not reflect well on the data protection competence of the service provider.
There may also be weaknesses in the security features of applications. For example, if an app allows you to share an AI-generated text or image via a sharing link, such a feature can easily lead to personal data being leaked to an outsider, either accidentally or intentionally, for example for bullying purposes. An educator should be able to monitor the use of the service – including the sharing functionality – to address abuses and remove inappropriate content. It is desirable that overly extensive sharing functions can be switched off.
Shortcomings may even relate to basic data protection issues, such as the possibility to obtain transparent information on the processing of personal data or to verify one’s own stored data. At a minimum, viewing or downloading of user data in a machine-readable format should be available at the application administrator level, so that the education provider can fulfil its obligations as a data controller.
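As an illustration of this minimum capability, the sketch below exports everything stored about one user as JSON, a machine-readable format that an administrator could provide in response to an access request. The data model is an assumption made for illustration, not the interface of any real application.

```python
import json

# Illustrative in-memory data store; a real platform would query its database.
stored_data = {
    "user-123": {
        "profile": {"name": "Example Learner", "class": "7B"},
        "submissions": [{"task": "essay-1", "submitted": "2025-02-01"}],
        "activity_log": [{"event": "login", "time": "2025-02-01T08:05:00"}],
    }
}

def export_user_data(user_id):
    """Return a JSON export of all data stored about the given user."""
    return json.dumps(stored_data.get(user_id, {}), indent=2, ensure_ascii=False)

print(export_user_data("user-123"))
```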
Finding 2: Generative AI may have too broad access to stored data
Cloud services for education providers may allow a generative AI such as a chatbot to have access to all files and other data stored in the organisation’s cloud service. In such a case, there is a significant risk that personal data related to education will be used by the generative AI for purposes unrelated to education.
In general, care must be taken over what data is allowed to be accessed by the generative AI or entered as part of a prompt. A good rule of thumb is that the AI should only be given access to individual pieces of data that have been verified not to contain personal data. If AI is to be used to process learners’ personal data, the risk assessment described above must be carried out to ensure compliance with data protection regulation.
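This rule of thumb can also be supported technically. The sketch below shows a simple pre-check that refuses to pass a prompt to a generative AI service if it appears to contain personal data. The patterns (an email address and a Finnish-style personal identity code) are assumptions made for illustration and will never catch everything, so the check complements rather than replaces human judgement.

```python
import re

# Illustrative indicators only: an email address and a Finnish-style personal
# identity code. A real check would need a far broader set of patterns.
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "personal identity code": re.compile(r"\b\d{6}[-+A]\d{3}[0-9A-Z]\b"),
}

def screen_prompt(prompt):
    """Return the labels of personal-data indicators found in the prompt."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(prompt)]

prompt = "Summarise the essay written by pupil 120395-123X (anna@example.com)."
findings = screen_prompt(prompt)
if findings:
    print("Do not send this prompt; possible personal data found:", ", ".join(findings))
else:
    print("No obvious personal data found.")
```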
Finding 3: Generative AI can be offered to student users without a personal data processing agreement
The fact that the same service providers offer both cloud services for educational institutions and generative AI applications open to consumers easily leads to confusion about what the contracted services actually include. A user ID managed by an educational institution may allow access not only to the applications covered by the institution’s contract but also to a generative AI application that is not covered by it. Cloud services typically offer a range of licensing options, some of which include a generative AI application and some of which do not. In such cases, the licence determines whether the controller of the personal data processed in the AI application is the education provider or the service provider.
The education provider should check the settings of cloud services when deploying them so that learners’ IDs cannot be used to access AI and other web applications whose processing of personal data is not covered by the contract.
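In practice, this check can start from something as simple as comparing the applications enabled for learner accounts against those actually covered by the contract, along the lines of the sketch below. The application names are assumptions made for illustration; real cloud suites expose this information through their own admin consoles.

```python
# Illustrative application names; in a real cloud suite this information comes
# from the provider's admin console and the service contract.
contracted_apps = {"documents", "email", "learning-platform"}
apps_enabled_for_learners = {"documents", "email", "learning-platform", "gen-ai-chatbot"}

# Anything learners can reach that the contract does not cover should be
# reviewed and, if necessary, disabled before learner accounts are rolled out.
for app in sorted(apps_enabled_for_learners - contracted_apps):
    print(f"Review or disable '{app}': not covered by the personal data processing agreement")
```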
Finding 4: The personal data processing agreement is not always comprehensive
Whatever the application, the education provider must check whether a personal data processing agreement can be drawn up with the service provider or whether one is included in the service contract. If this is not the case, no personal data of learners should be stored in the application without explicitly requested consent for such disclosure and use of personal data. Teachers should check their institution’s policy on how consent is requested and whether teachers themselves can request consent for the use of applications.
The provider’s standard contract may not specify all the personal data to be processed and the purposes of processing. In such cases, it is possible that the data stored in the service may be used, for example, for product development, AI training and other purposes specific to the service provider. If the application contains functionalities that are not covered by the contract, the application provider may be considered as the data controller in this respect.
This would mean that learners’ data would end up outside the school and be used for purposes unrelated to education.
The education provider must always check that the contract in force complies with the GDPR and that personal data is processed only for the purposes of the controller. A cautionary example is the case of Google Workspace for Education, whose suitability for educational purposes was questioned by the Data Protection Ombudsman in 2021; the decision is not yet final [xi]. The Danish data protection authority, for example, has also pointed out problems with Google cloud services in municipal primary education [xii].
Guidelines on data protection for the use of AI in education
The use of AI in education requires weighing up both the obligations of the data controller responsible for education and the data protection knowledge of teachers, so that they can act correctly in practical situations. The latter requires that teachers’ in-service training covers data protection and the use of AI.
While data protection may seem like a complex issue, addressing it in education starts with simple basics. These stem from the data protection principles at the heart of the GDPR. They include the principles of purpose limitation and data minimisation, which require that personal data is used only for the purposes for which it was originally collected and that the data processed is kept to a minimum. It is also worth bearing in mind the principle of transparency, which can be summarised as: everyone should be able to know what data about them is being processed and how. With these three principles in mind, it is safe to start using AI applications.
In the following image, I present a set of data protection guidelines for the use of AI in education, which are intended to help teachers consider data protection in practical teaching situations:
Read the entire AI Guide for Teachers here.
Sources
[i] Regulation (EU) 2016/679 (General Data Protection Regulation), https://eur-lex.europa.eu/eli/reg/2016/679/oj
[ii] Perusopetuslaki 628/1998, https://www.finlex.fi/fi/laki/ajantasa/1998/19980628
[iii] Silvennoinen, E., Tedre, M. & Valtonen, T. (2024). Datafikoituva peruskoulu – tasapainoilua lapsen henkilötietojen suojan ja opetuksen tavoitteiden välillä. Lakimies, 122(5), 655–678, https://journal.fi/lakimies/article/view/143755
[iv] ibid.
[v] Tietosuojatyöryhmä. (2017). Ohjeet tietosuojaa koskevasta vaikutustenarvioinnista ja keinoista selvittää ”liittyykö käsittelyyn todennäköisesti” asetuksessa (EU) 2016/679 tarkoitettu “korkea riski”, https://tietosuoja.fi/documents/6927448/8316711/Vaikutustenarviointi+fi.pdf
[vi] Tietosuojavaltuutetun toimisto. (2021). Tietosuojan vaikutustenarvioinnin ohje, https://tietosuoja.fi/documents/6927448/66036250/TVA+ohje.pdf/ff0b6e1b-5b89-e85e-a2e5-6c4bd4c0ccfc/TVA+ohje.pdf?t=1639729535787
[vii] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data https://eur-lex.europa.eu/eli/reg/2016/679/oj
[viii] Office of the Data Protection Ombudsman. (n.d.). Assess the risks and plan measures to implement data protection, referenced 2.1.2025, https://tietosuoja.fi/arvioi-riskit
[ix] Office of the Data Protection Ombudsman https://tietosuoja.fi/en/risk-assessment-and-data-protection-planning
[x] DigiFinland. (26.6.2024). Cirrus-hanke: Tapausesimerkit, https://digifinland.fi/wp-content/uploads/2024/06/Cirrus-Tapausesimerkit-2024-v1.0.pdf
[xi] Tietosuojavaltuutettu. (30.12.2021). Henkilötietojen käsittelyn lainmukaisuus ja siirto kolmansiin maihin koulun opetusohjelman käytössä, https://www.finlex.fi/fi/viranomaiset/tsv/2021/20211503
[xii] Tietosuojavaltuutetun toimisto. (2.2.2024). Tanskan tietosuojaviranomainen antoi päätöksen Googlen ohjelmistojen käytöstä peruskouluissa, https://tietosuoja.fi/-/tanskan-tietosuojaviranomainen-antoi-paatoksen-googlen-ohjelmistojen-kaytosta-peruskouluissa