Best Paper Awards for CWI Researchers at Digital Libraries Conference

CWI Information Access researchers Myriam C. Traub, Thaer Samar, Jacco van Ossenbruggen and Lynda Hardman received the best paper and best student paper awards at the ACM/IEEE Joint Conference on Digital Libraries in June 2018 in Texas. The paper, ‘Impact of Crowdsourcing OCR Improvements on Retrievability Bias’, demonstrates how readers who correct OCR errors can increase the retrievability of documents in the Royal Library collection.

Digitized document collections often suffer from OCR errors that may impact a document’s readability and retrievability. We studied the effects of correcting OCR errors on the retrievability of documents in a historic newspaper corpus of a digital library. We computed retrievability scores for the uncorrected documents using queries from the library’s search log, and found that the document OCR character error rate and retrievability score are strongly correlated. We computed retrievability scores for manually corrected versions of the same documents, and report on differences in their total sum, the overall retrievability bias, and the distribution of these changes over the documents, queries and query terms. For large collections, often only a fraction of the corpus is manually corrected. Using a mixed corpus, we assess how this mix affects the retrievability of the corrected and uncorrected documents. The correction of OCR errors increased the number of documents retrieved in all conditions. The increase contributed to a less biased retrieval, even when taking the potential lower ranking of uncorrected documents into account.
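
To make the notion of a retrievability score more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the cutoff-based measure that is common in the retrievability literature (a document scores one point for every query that returns it within the top `cutoff` ranks) and uses the Gini coefficient as one possible summary of bias. The function names, the `cutoff` parameter, and the toy rankings are all hypothetical; the paper may use different measures and settings.

```python
from collections import Counter


def retrievability_scores(ranked_results, doc_ids, cutoff=10):
    """Count, per document, how many queries return it within the top `cutoff` ranks.

    ranked_results maps a query string to its ranked list of document ids.
    Documents that are never retrieved get a score of 0.
    """
    counts = Counter()
    for ranking in ranked_results.values():
        for doc_id in ranking[:cutoff]:
            counts[doc_id] += 1
    return {doc_id: counts[doc_id] for doc_id in doc_ids}


def gini(values):
    """Gini coefficient of the scores: 0 means every document is equally retrievable;
    larger values mean retrievability is concentrated in fewer documents."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data (hypothetical): three documents and three queries.
docs = ["d1", "d2", "d3"]
rankings = {
    "q1": ["d1", "d2"],
    "q2": ["d1", "d3"],
    "q3": ["d1"],
}

scores = retrievability_scores(rankings, docs, cutoff=2)
print(scores, gini(scores.values()))
```

Comparing the Gini coefficient of the scores computed on uncorrected text with the one computed on corrected text would give one rough indication of whether correction reduced bias, under the assumptions stated above.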

