Winners of the ADS Thesis Awards Announced!

The ADS Thesis Award winners for 2020 are Rachel van ‘t Hull and Hongyu He in the BSc category, and Selma Muhammad and Martine Toering in the MSc category. The winners were announced at the ADS Highlights Event on the 9th of December 2021.

The winners of the inaugural ADS Thesis Awards were announced at the ADS Highlights event by award sponsors: Prosus, ADS, Elsevier, and DSC.

The ADS Thesis Awards aim to promote excellence in Data Science and AI among students at the Bachelor and Master level in all Amsterdam-based knowledge institutes. The goals of the awards are:

  1. Reward and champion high-quality thesis work.
  2. Promote women and underrepresented minorities and encourage them to continue their education.
  3. Encourage diversity in Data Science and AI research.
  4. Advance Amsterdam and the ADS network as an innovation hub by showcasing excellent theses.

The Selection Committee was incredibly impressed by the number and quality of the nominations. All submissions were judged on how they advance Data Science and/or AI through:

  • Innovative scientific and technical contributions;
  • High societal and economic impact from the findings and results of the thesis;
  • Promotion of open science via FAIR data principles, and the availability of high-quality open-source code, results, traces, frameworks, etc.

A summary of each of the winning theses can be found below along with a link to the full thesis.

______________________________

 

BSc Thesis Awards

 

Rachel van ‘t Hull

Towards Explainable Artificial Text Detection

University of Amsterdam

Supervisors: Dr. Willem H. (Jelle) Zuidema, Valentin Vogelmann, Bas Cornelissen

As the quality of artificially generated texts has improved considerably over the past few years, as demonstrated by deep-learning-based language models trained on many billions of tokens, so has the possibility of deceiving others by means of fake reviews, identity theft and phishing. To counter such malicious use of text generation, this thesis focuses on the task of distinguishing artificially generated texts from human-written texts. In contrast to the detector models commonly applied to this task, which lack generalizability and transparency, the thesis explores the value of word distributions as a signal in the context of different grammatical categories and corpus sizes. The outcomes of the extensive experimentation show promising performance for the proposed explainable approach to text classification.

In terms of impact, the thesis provides extensive empirical evidence that a focus on word distributions provides a broadly applicable and explainable alternative to the opaque detector-based models, thereby inspiring future studies into artificial text detection to further investigate this perspective. The complete pipeline is publicly shared.

The endeavor has obvious societal relevance, providing a powerful and explainable handle to automatically flag deceptive texts that are generated at large scale. This work is a well-deserved winner of the thesis prize, standing out with a clear writing style and experimental structure, as well as extensive detail and motivation. It has an impressive conceptual depth for a bachelor thesis, and reflects good computational and statistical skill.

Read Rachel’s full thesis here.

 

______________________________

 

Hongyu He

How Can Datacenters Join the Smart Grid to Address the Climate Crisis? Using Simulation to Explore Power and Cost Effects of Direct Participation in the Energy Market.

Vrije Universiteit Amsterdam

Supervisors: Prof. Dr. Alexandru Iosup, Fabian Mastenbroek, Leon Overweel

This thesis explores the means for datacenters to participate in the energy market. The research explores approaches that allow datacenters to reduce their operating costs while simultaneously helping the reliability and stability of the energy grid. The research demonstrates through a series of experiments that such participation can lead to financial gains for datacenters and the energy market.

The challenge of matching demand and supply on the energy market is very important in the energy transition. This work therefore points to major opportunities for power grid operators and for sustainable energy grids with variable outputs.

The contributions of this thesis to the theoretical aspects and implementation tools (e.g., OpenDC simulator) are highly commendable, especially considering that this work has been done by a Bachelor student. This thesis is very extensive, adds significantly to the literature on this topic, and can be very valuable for follow-up studies.

The code and data for experiments are published to public repositories, and have contributed to upstream projects. This too is a commendable achievement.

Read Hongyu’s full thesis here.

 

______________________________

 

MSc Thesis Awards

 

Selma Muhammad

Auditing Algorithmic Fairness with Unsupervised Bias Discovery

Vrije Universiteit Amsterdam

Supervisors: Dr. Emma Beauxis-Aussalet, Linda van de Fliert, Dr. Michael Cochez

Fairness in data science applications must be guaranteed even if labels identifying protected groups are not available. This thesis, by Selma Muhammad, proposes a new unsupervised learning approach to detect bias in the absence of labels pre-identifying groups. The proposed method, named Hierarchical Bias-Aware Clustering (HBAC), consists of applying clustering methods to the original features and outcomes of classification algorithms in order to automatically identify discrimination. The thesis compares how multiple clustering algorithms fare when considered as part of HBAC. The proposed algorithm is tested on both empirical (German Credit and COMPAS) and synthetically generated datasets.
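To make the idea of scoring clusters for bias more concrete, the sketch below shows one possible way to do it. It is an illustration only, not the thesis's implementation: it assumes that cluster labels have already been produced by some clustering method, and that bias is measured as the gap between a classifier's error rate inside a cluster and its error rate on all other samples. The function names `cluster_bias` and `most_biased_cluster` are hypothetical.

```python
def cluster_bias(errors, labels, cluster):
    """Error rate inside `cluster` minus error rate outside it.

    errors: list of 0/1 flags, 1 meaning the classifier was wrong
    labels: list of cluster ids, one per sample (assumed precomputed)
    """
    inside = [e for e, l in zip(errors, labels) if l == cluster]
    outside = [e for e, l in zip(errors, labels) if l != cluster]
    if not inside or not outside:
        raise ValueError("cluster and its complement must both be non-empty")
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def most_biased_cluster(errors, labels):
    """Return the cluster id with the largest error-rate gap (assumed bias metric)."""
    clusters = sorted(set(labels))
    return max(clusters, key=lambda c: cluster_bias(errors, labels, c))


# Toy data: the classifier is always wrong on cluster 0.
errors = [1, 1, 0, 0, 0, 1]
labels = [0, 0, 1, 1, 1, 2]
worst = most_biased_cluster(errors, labels)
print(worst, cluster_bias(errors, labels, worst))
```

A cluster with a large positive gap is a candidate group on which the classifier performs noticeably worse, and can then be inspected by hand.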

The reviewers praised the novelty and excellent writing quality of this work. This thesis addresses a challenge of major concern for the Municipality of Amsterdam and other public institutions. The timeliness of the topic addressed was also highlighted by the reviewers.

Besides the potential for societal impact of this thesis, the reviewers stressed the inventiveness of the algorithm proposed and emphasized the comprehensive analysis performed, which comprises multiple variations of the method and multiple datasets.

It is also noteworthy that Selma Muhammad endeavoured to make her results accessible to a wide community by developing outreach materials. The code supporting this awarded thesis is available in a public repository and a summary is available both as a video presentation and blog post for the general public.

Read Selma’s full thesis here.

 

______________________________

 

Martine Toering

Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting

University of Amsterdam

Supervisors: Prof. Dr. Cees G.M. Snoek, Tao Hu, Ioannis Gatopoulos, Maarten C. Stol

One of the main challenges in learning-based computer vision is its reliance on labelled data. The thesis contributes to the field by providing a new method for learning representations of elements in videos from fewer examples.

The panel was impressed by the technical quality of the contribution as well as the experiments. The paper is very well written, has a very strong novel technical contribution, seems well embedded in the body of existing literature, and draws from knowledge about human vision in coming up with a novel technique for video recognition.

The results show the quality of the method, improving over several state-of-the-art instance contrastive learning methodologies. Additionally, the work has already been accepted to a renowned computer vision conference. Lastly, all code is published online through GitHub, and is extensively documented, supporting reproducibility.

Video recognition largely depends on the interpretation of separate images, but also comes with an additional disadvantage: annotating videos is laborious and complex, partly because human actions are hard to define unambiguously. The author argues that video in particular is too high-dimensional for supervision through labelling, as labels cannot capture its inherent structure. Methods based on instance-level contrasting are able to bridge this gap. They require an augmentation module that obtains multiple views of one instance, and a loss function that contrasts between these augmented views. The goal is to produce higher similarity scores between views of the same instance than between an instance and negative examples. These methods have a few fundamental issues: i) they rely heavily on data augmentation, and ii) the number of pairwise comparisons is high.
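As an illustration of instance-level contrasting, the sketch below computes a contrastive loss for one anchor view, one positive view of the same instance, and a set of negative examples. It assumes an InfoNCE-style loss with cosine similarity and a temperature parameter, which is a common choice in the literature but is not confirmed here to be the loss used in the thesis. The embedding vectors and the name `info_nce_loss` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when the anchor is closer to its positive
    view than to any of the negatives (assumed loss, for illustration)."""
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_sum_exp - pos_logit


anchor = [1.0, 0.0]
# Positive view is close to the anchor, negatives point elsewhere.
good = info_nce_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Swapped case: the "positive" is far away and a negative is close.
bad = info_nce_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(good < bad)
```

Each anchor is compared against every negative, which is where the high number of pairwise comparisons mentioned above comes from.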

The thesis provides a strong novel contribution to the field of video recognition and contrastive learning: a method for obtaining high-quality representations of video elements from fewer samples. The author makes the observation that spatiotemporal coherence and motion are not optimally used in contrastive learning methods for video recognition, and draws from vision processing in the brain to propose a new method. In existing methods, low similarity measures are produced for negative pairs, neglecting the semantic similarity between frames that comes from the temporal and motion characteristics of videos. As a result, distances between instances do not accurately reflect the semantics between them. Instead of instances, the author looks at representatives of semantically similar groups (prototypes). The features learned in this way are very descriptive, and are therefore useful in a variety of video application tasks.

Read Martine’s full thesis here.
