Large Language Models and Library Electronic Resources: More Questions than Answers

Guest Post by Sara Pike

As generative AI tools like ChatGPT and Gemini become more popular, libraries are facing new questions about electronic resources licensing and use. Subscription resources provided by libraries open up a world of content, and it might not always be clear how that content can or should be used. For example, can articles from library databases be scraped to train large language models (LLMs)? Is it OK to load an article into a chatbot to request a summary? Does this violate the agreements many schools have against sharing content with third parties? When it comes to LLMs, what constitutes ethical use of content that was created by someone else?

Recently, the New York Times brought a lawsuit against OpenAI and Microsoft, claiming the companies used millions of articles from the publication to train chatbots in a breach of copyright. These chatbots then became competitors of the Times for those seeking information online.

Librarians will likely see clauses about the use of electronic publications with large language models appearing in license agreements as content creators and copyright holders seek to protect their work. And as some sectors push for the rapid advancement of this technology, stressing the benefits it may bring, we will still need to grapple with ethical and other considerations related to potential harms. For educators, this includes bringing these issues into conversations with students and colleagues and, hopefully, charting the way forward together.

James grills, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons
