Text Analytics Meetup – KPN Results
The Text Analytics Meetup series is a collaboration on transfer learning for Natural Language Processing (NLP), initiated by Gianluigi Bardelloni and Yury Kashnitsky (KPN). It was intended to develop best practices for using unlabeled data to boost performance in classification tasks. Because labeling data is particularly cumbersome and expensive in NLP, a key motivation for the collaboration was to find ways of leveraging current state-of-the-art transfer learning techniques (such as ULMFiT and transformer-based approaches like BERT) to reduce the need for vast amounts of labeled data in applied business tasks.
The key contribution of this transfer learning “club” is a DistilBERT classification pipeline built with Catalyst, which helps any NLP practitioner quickly test HuggingFace transformers on their own classification tasks while reusing deep learning best practices through the Catalyst framework.
In addition, several seminars were organized during the collaboration, where participants shared tips and best practices, e.g. how to train deep learning models on TPUs and how BERT works in general.
The main outcomes from KPN are as follows:
- Intro, ideas for collaboration within Amsterdam Data Science by Yury Kashnitsky
- Common classification pipeline (plans) by Yury Kashnitsky
- Common classification pipeline (result, with Catalyst) by Yury Kashnitsky & Co.
- “Unsupervised Data Augmentation” – summary by Boris Zubarev
- Overview of the Jigsaw competition by Yury Kashnitsky
- Intro to Transformers by Boris Zubarev
- Intro to BERT by Boris Zubarev
- Intro to Transformers (formerly PyTorch-transformers) by Boris Zubarev
- Tips on training with TPUs + ipynb by Dmitry Leghikov
- Overview of NLP from RNNs to transformers, ADS collaboration on text analytics by Yury Kashnitsky
If you’re interested in collaborating by pitching a challenge, please email firstname.lastname@example.org.