Critically Assessing AI Tools and Cultural Data for Digital Humanities Applications
It is surprisingly difficult to assess whether an insight gained from AI or other computational techniques constitutes a meaningful and interesting trend or merely reflects an error, limitation or bias in the tools or data used.
This difficulty extends even to well-known quality issues, such as errors in optical character recognition (OCR). These errors are easy for scholars to spot and are widely recognized as a problem in the community. Even for these problems, however, little is known about how they affect other digital methods used “downstream”, such as named entity recognition, sentiment analysis, word embeddings and other frequently used AI methods. When such a method is given erroneous or biased data as input, it is entirely unclear whether and how this influences the outcome of the culturally oriented research projects in which the method is deployed.
However, there are other sources of bias of which only a few users are aware. For example, while algorithmic bias in full-text search has been studied in the information retrieval community for more than a decade, there is little awareness of this topic among users of search tools in non-commercial digital libraries. Since no technology can be assumed to be neutral a priori, it is of key importance to measure the amount of bias from these lesser-known sources and to assess its impact on the research conducted with the tools.
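As an illustration of what measuring search bias can look like, the sketch below computes a cumulative retrievability score per document (how often it appears in the top results over a set of queries) and summarizes the inequality of those scores with a Gini coefficient. This is a common approach in the information retrieval literature, but it is shown here as an assumption and not as the method used in the research described in this article. The function names, the cutoff, and the toy queries and document identifiers are all hypothetical.

```python
from collections import defaultdict


def retrievability(rankings, cutoff=10):
    """Count how often each document appears in the top `cutoff` results.

    `rankings` maps a query string to its ranked list of document ids
    (hypothetical input; a real study would obtain it from a search engine).
    """
    scores = defaultdict(int)
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values, with ranks starting at 1.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Toy data: four documents, three queries, top-2 results per query.
collection = ["a", "b", "c", "d"]
rankings = {
    "query 1": ["a", "b"],
    "query 2": ["a", "c"],
    "query 3": ["a", "b"],
}

scores = retrievability(rankings, cutoff=2)
# Documents that were never retrieved must be counted with a score of 0.
per_document = [scores.get(doc_id, 0) for doc_id in collection]
bias = gini(per_document)
print(per_document, round(bias, 3))
```

A coefficient near 0 would suggest that the search engine exposes all documents about equally for the chosen queries, while higher values indicate that a small share of the collection receives most of the exposure. The result depends heavily on the query set and the cutoff, so both should be reported alongside the number.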
Exploring sources of bias
Myriam Traub explores sources of bias in data and tools used by humanities scholars and addresses a number of these in her PhD thesis “Measuring Tool Bias & Improving Data Quality for Digital Humanities Research”, which she defended on Monday 11 May 2020 at Utrecht University. Her work was carried out at CWI in the SealincMedia research project as part of the national COMMIT/ program, with the Dutch National Library, Rijksmuseum and other partners. She interviewed humanities scholars about their use of digital methods and the role of these methods in the overall research process. She also studied retrievability bias in the search engine of the Dutch historic newspaper archive, the impact of partially fixing OCR errors through human computation, and the potential of crowdsourcing for difficult tasks that are traditionally seen as limited to domain experts.
Traub’s research enabled a better understanding of the role AI and other computational methods play in current humanities research. In particular, she shows that, in addition to the quest for better-performing tools and higher-quality data, we also need better techniques for measuring limitations in tools and data and for conveying the results of these measurements to humanities scholars interested in the historical artifacts or events expressed in the data.
There is a clear need for more intensive, multidisciplinary collaboration between humanities scholars, data custodians and tool developers to better understand each other’s assumptions, approaches and requirements. This will help to build not only the technical research infrastructure humanities scholars need, but also the human infrastructure, in which scholars are trained in the skills necessary to routinely make critical assessments of the fitness of the digital data and tools available in the technical infrastructure.
SEALINCMedia Rijksmuseum Use Case video explainer (scroll down to SEALINCMedia)