Winners of the Inaugural ADS Thesis Awards Announced

The ADS Thesis Award winners for 2020 are Johanna Heddes and Nick Tehrany in the BSc category, and Rochelle Choenni and Mario Giulianelli in the MSc category. The winners were announced at the ADS Highlights Event.

The winners of the inaugural ADS Thesis Awards were announced today at the ADS Highlights Event by the award sponsors, Elsevier and Prosus.

The ADS Thesis Awards aim to promote excellence in Data Science and AI among students at the Bachelor and Master level in all Amsterdam-based knowledge institutes. The goals of the awards are to:

      1. Reward and champion high-quality thesis work;
      2. Promote women and underrepresented minorities and encourage them to continue their education;
      3. Encourage diversity in Data Science and AI research;
      4. Advance Amsterdam and the ADS network as an innovation hub by showcasing excellent theses.

The Selection Committee was incredibly impressed by the number and quality of the nominations. All submissions were judged on how they advance Data Science and/or AI through:

      • Innovative scientific and technical contributions;
      • High societal and economic impact from the findings and results of the thesis;
      • Promotion of open science via FAIR data principles, and the availability of high-quality open-source code, results, traces, frameworks, etc.

A summary of each of the winning theses can be found below along with a link to the full thesis.

Bachelor Thesis Award

Johanna A.E. Heddes

The automatic detection of dataset names in scientific articles

University of Amsterdam
Supervisor: Dr. M. J. Marx

This thesis tackles the problem of automatically extracting datasets used in experimental evaluation from scientific papers in Machine Learning, Data Mining, Information Retrieval, and Computer Vision. In her work, Johanna contributed to the creation of a large, manually annotated dataset; she also wrote clear and explicit annotation guidelines, based on existing work, which resulted in high-quality gold-standard data.

The reviewers found the thesis thoroughly executed, very elaborate, and very well written. Johanna performed thorough research and in-depth analysis, so much so that this could very well have been a Master thesis. The implementation and analysis work is also outstanding: she tested and compared the main NLP approaches in order to investigate the automated extraction of datasets from scientific publications.

As one of its contributions, this thesis introduces a new benchmarking dataset. The reviewers found it very useful and extensive; the dataset is available in an online repository at https://github.com/xjaeh/ner_dataset_recognition.

As for impact, the reviewers found that automatically extracting datasets used in experimental evaluation from scientific papers in the many fields of Data Science is very relevant for the scientific community. It will certainly have an impact on research evaluation overall. The outcomes of the thesis contribute to the body of work towards improved reproducibility of scientific results.

The award selection committee has agreed that this thesis is a worthy winner of an ADS Thesis Award due to its excellent execution and analysis, and for its high potential to make a concrete impact on the Data Science community.

Read Johanna’s thesis in full.

Nick-Andian Tehrany

Evaluating Performance Characteristics of the PMDK Persistent Memory Software Stack

Vrije Universiteit Amsterdam
Supervisors: Dr. ir. Animesh Trivedi and Ir. Sacheendra Talluri

This thesis investigates the performance of non-volatile memory, i.e., memory that retains its data when electric power is lost. The thesis uncovers several performance issues of such systems through a thorough performance evaluation.

The reviewers found that the topic and findings of this thesis are crucial for making a large variety of computing systems more efficient and reliable. They provide practical insights that practitioners can apply directly to reduce computing time and resources, and thus energy consumption. The latter matters greatly in our digital societies, which face sustainability issues and need such research into, and improvements of, low-level computational frameworks.

The reviewers also remarked that the complex technical issues are investigated with rigour, depth, and clarity. The findings, their strengths and limitations, and their context of application are clearly discussed. The code and research framework are fully documented to enable reproducibility and follow-up research, at https://github.com/nicktehrany/membench and in Appendix B of the thesis. The societal impacts, namely reducing computing resources and energy consumption, deserve further investigation in follow-up research.

The award selection committee has unanimously acknowledged the maturity of the work, the excellence of its technical discussion, and its large scope of practical contributions. It is a brilliant example of highly technical research that has broad impacts on both engineering and sustainability issues.

Read Nick-Andian’s thesis in full.

Master Thesis Award

Mario Giulianelli

Lexical Semantic Change Analysis with Contextualised Word Representations

University of Amsterdam
Supervisors: Dr. Raquel Fernandez and Marco del Tredici

This thesis presents a novel approach that allows the detection and analysis of word meaning and how it changes over time. It is the first unsupervised approach for this task that obtains word representations from a Transformer-based neural language model. This approach is domain-independent, data-driven, automatic, and easily reproducible. The results of the empirical evaluation demonstrate that the proposed approach allows the recognition of cultural drifts driven by technological innovations, cultural transitions, and specific events, as well as more subtle linguistic shifts such as changes in the subcategorisation frames of nouns and verbs.

The reviewers found that tracking the evolution of words’ meanings is essential in our data-driven society for a variety of applications. For instance, information retrieval and conversational AI can benefit from this technique. Discrimination arises when community-specific semantics are not well recognised, if not entirely obliterated, by data-driven systems that are only able to process allegedly mainstream semantics. As a consequence, humans have to adapt to the ways words are interpreted by computers. Allowing for semantics to evolve is crucial for letting human cultures and diversity develop according to human values, rather than the limitations imposed by current technologies. The work also has a clear impact on other scientific disciplines: it is even likely to influence research in Digital Humanities, enabling further in-depth studies of language evolution across large historical corpora.

The work is outstanding due to the depth of technical details it discusses, and the informed choice of algorithms and metrics. The author provides an in-depth analysis of different factors to consider, and their expected and observed impact on the results. Limitations of the proposed approach, as well as other competing approaches, are discussed in great detail. The code and the dataset used to conduct experiments are publicly available at https://github.com/glnmario/cwr4lsc.

The award selection committee has unanimously agreed that the presented thesis is high-quality work and a clear example of a very well executed research project that could only have been made possible by a highly motivated and talented student.

Read Mario’s thesis in full.

Rochelle Choenni

What does it mean to be language-agnostic? Probing multilingual sentence encoders for typological properties

University of Amsterdam
Supervisor: Dr. Ekaterina Shutova

In her MSc thesis, Rochelle Choenni focuses on the interpretation of popular multilingual sentence encoders, investigating which linguistic typological properties they encode and how. More specifically, her thesis:

  1. Sheds light on the lexical, morphological and syntactic properties captured by multilingual sentence encoders;
  2. Investigates how these properties are encoded in different layers of the neural network;
  3. Studies the influence of different architectures and pretraining strategies on the encoded properties;
  4. Identifies the way in which relevant properties are jointly captured across languages.

Rochelle’s thesis work contributes to a better understanding of modern natural language processing techniques and takes an important step towards making them more accessible to low-resource languages. It served as the basis for research paper submissions at top conferences in the fields of artificial intelligence and computational linguistics. In addition, the results of the thesis could be relevant to related research fields, such as information retrieval, computer vision and multimedia. Details on reproducing the results are provided in Appendix B of the thesis.

The reviewers praised the novelty of the thesis and its clear potential for socioeconomic impact. The award selection committee has unanimously agreed that the impressive set of contributions made in the thesis, as well as in her previous work, makes Rochelle Choenni a worthy winner of the ADS Thesis Award and a role model for the next generation of Data Science and AI students.

Read Rochelle’s thesis in full.
