The use of AI services raises ethical and intellectual property concerns. To work confidently, critically and safely with new and emerging technologies, teachers and students need basic knowledge and a common understanding of how GenAI can be used in education, including the copyright issues involved.
Generative AI services can be used to create text, images, videos and music. However, there are many copyright uncertainties behind the creative outputs of these services. Technology is developing so rapidly that legislation is lagging behind.
Training large language models requires huge amounts of text, code, images and data, much of which may be protected by copyright. If such data is collected without the permission of the rightholder, there is a risk that someone’s copyright may be infringed.
The content of the material used to train the popular language models is currently opaque. The EU’s AI Act, which entered into force in 2024 and becomes generally applicable in 2026, requires providers of general-purpose AI models to comply with EU copyright law and to publish summaries of the content used to train their models. The near future will show how this will be implemented. As we do not yet know what information has been used to train AI models, it is difficult to assess potential copyright infringements.
Many lawsuits
AI services can produce output that imitates or closely resembles a copyrighted work. Several artists and publishers have filed lawsuits accusing AI platforms of illegal copying and seeking to prohibit the use of their works for data collection and model training. The most high-profile of these lawsuits is the New York Times case against OpenAI. In its defence, OpenAI has issued an interesting response:
“Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials.” [i]
The decisions in these cases will shape and update copyright law and practice for years to come. It is still relatively unclear how we can prevent AI services from infringing copyright, and what sanctions should be applied and by whom.
The main issue for users of AI services is who owns the outputs of AI. Can traditional copyright apply to them, or are new solutions needed? Currently, copyright does not protect AI-generated works with little or no human input.
Preventing data collection
In particular, large publishers and media houses have begun to block the collection of copyrighted content on their websites. However, blocking data collection can reduce the site’s visibility in search engines and prevent legitimate search engines from indexing the site’s content.
According to Wired [ii], 88% of leading US news sites block data collection by AI bots. The Reuters Institute reported [iii] that in February 2024, nearly half (48%) of leading news sites in ten countries blocked OpenAI’s crawlers. This is something of a problem for GenAI users, as high-quality, up-to-date content is often excluded from the training material. AI services have therefore started to license content from content providers to secure high-quality training data. This positive trend is likely to grow in the near future.
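In practice, this kind of blocking is commonly done with a robots.txt file, which can address individual crawlers by name. The following is a minimal sketch, not a description of any particular publisher's setup: the robots.txt content and the news.example.com address are made up for illustration, while GPTBot is the user-agent name OpenAI publishes for its crawler. It uses Python's standard urllib.robotparser module to show that a rule aimed at one AI crawler need not affect a search engine crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site:
# it blocks OpenAI's GPTBot crawler and allows every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up article address, used only for this illustration.
ARTICLE_URL = "https://news.example.com/articles/1"

gptbot_allowed = can_crawl(ROBOTS_TXT, "GPTBot", ARTICLE_URL)
googlebot_allowed = can_crawl(ROBOTS_TXT, "Googlebot", ARTICLE_URL)

print("GPTBot allowed:", gptbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Note that robots.txt is a voluntary convention: it states the site owner's wishes, but it only works if the crawler chooses to respect it.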
Copyright and education
Generative AI services may collect information on all input and prompts in their databases in order to train language models. The same can happen, for example, when students’ work is scanned by plagiarism checkers.
Particular care should be taken with sensitive data, which should not be fed into AI services under any circumstances. It is therefore crucial to choose AI tools that follow sound security practices. For example, they should ensure that input data is not stored or reused for further training. It is also important to remember that most student work is protected by copyright. Education providers should therefore ensure that the AI tools they use comply with EU copyright law.
The EUIPO has published an interactive infographic [iv], “Generative AI in Education – Understanding copyright implications”, which includes the following advice for teachers:
- Educate your students about copyright.
- Explain that AI-generated content can still infringe on existing copyright if it reproduces copyrighted material.
- Teach proper citation skills and how to attribute and credit content generated by GenAI.
- Promote critical thinking to verify the accuracy of the GenAI outputs.
- Encourage students to create original works and use AI as a tool for inspiration, not to replace their own creativity.
- Create interactive lessons where students practice looking for, using, and citing AI-generated content properly.
Read the entire AI Guide for Teachers here.
Sources
[i] OpenAI, quoted in Euronews (retrieved 12.12.2024) OpenAI faces multiple lawsuits over its use of copyrighted articles, books, and art to train its generative AI tools. https://www.euronews.com/next/2024/01/09/openai-says-its-impossible-to-train-ai-without-copyrighted-materials
[ii] Wired (2024) Most top news sites block AI bots. Right-wing media welcomes them. https://www.wired.com/story/most-news-sites-block-ai-bots-right-wing-media-welcomes-them/
[iii] Reuters Institute (2024) How many news websites block AI crawlers. https://reutersinstitute.politics.ox.ac.uk/how-many-news-websites-block-ai-crawlers
[iv] EUIPO (2024) Generative AI in Education – understanding copyright implications. https://euipo.europa.eu/tunnel-web/secure/webdav/guest/document_library/observatory/documents/reports/2024_Generative_AI_in_Education_infographic/2024_Generative_AI_in_Education_Understanding_copyright_implications.en.pdf