a2t_topics

Introduction

The Ask2Transformers package by Sainz and Rigau (2021) uses a predefined set of labels to find the best match for a given piece of text. For this, a simple list of topics (topic_list.txt) has to be defined with the upcoming conversation in mind. The label list covers a broad range of possible topics, but it should be adjusted before the live debates to fit their theme and scope. The label matching is done by a transformer-based neural network; we use the package's default model, “roberta-large-mnli”. The output is a score for each label that estimates how well the label fits the text.
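The matching step can be illustrated with a minimal sketch of NLI-based zero-shot labelling, the general technique behind Ask2Transformers. Each topic is turned into a hypothesis sentence, an NLI model rates how strongly the text entails it, and the entailment scores are normalised across topics. The hypothesis template, the one-topic-per-line format of topic_list.txt, and all function names are assumptions for illustration, not the package's actual API. `entailment_logit` is a hypothetical stand-in for roberta-large-mnli, replaced here by a toy word-overlap scorer so the example runs without downloading a model.

```python
import math
from typing import Callable, Dict, List

# Hypothetical hypothesis template; the wording used by Ask2Transformers may differ.
HYPOTHESIS_TEMPLATE = "This text is about {}."


def load_topics(path: str) -> List[str]:
    """Read a topic list, assuming one topic per line (blank lines are skipped)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def zero_shot_scores(
    text: str,
    topics: List[str],
    entailment_logit: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score each topic by how strongly `text` entails its hypothesis sentence.

    `entailment_logit(premise, hypothesis)` is a hypothetical stand-in for an
    NLI model such as roberta-large-mnli and returns the "entailment" logit.
    A softmax across topics turns the logits into scores that sum to 1.
    """
    logits = [entailment_logit(text, HYPOTHESIS_TEMPLATE.format(t)) for t in topics]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {topic: e / total for topic, e in zip(topics, exps)}


def toy_logit(premise: str, hypothesis: str) -> float:
    """Toy scorer: counts words shared by premise and hypothesis (not a real NLI model)."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().rstrip(".").split())
    return float(len(premise_words & hypothesis_words))


scores = zero_shot_scores("We need better public transport", ["transport", "health"], toy_logit)
best_topic = max(scores, key=scores.get)  # "transport"
```

The softmax across topics forces the scores to sum to 1, which suits picking a single best topic. Scoring each topic independently (entailment versus contradiction) would instead allow several topics to fit the same text at once.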

“The Ask2Transformers work aims to automatically annotate textual data without any supervision. Given a particular set of labels (BabelDomains, WNDomains, …), the system has to classify the data without previous examples. This work uses the Transformers library and its pretrained LMs.” (Source: https://github.com/osainz59/Ask2Transformers)

Preloaded models

The a2t framework uses the roberta-large-mnli model, which is about 1.32 GB in size. By default, it fetches the model from the Hugging Face model hub on every execution. To decrease the startup time and limit network usage, we recommend preloading the model: place your preferred model in the folder 00_custom/a2t_models.
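One way to use a preloaded copy is to check the local folder first and fall back to the Hub model id otherwise. The sketch below assumes, hypothetically, that each model lives in its own subfolder of 00_custom/a2t_models named after the model id (with "/" replaced by "__") and containing the config.json written by save_pretrained(). The function name is illustrative only and not part of the a2t code.

```python
from pathlib import Path

# Folder for preloaded models, as named in this README.
LOCAL_MODEL_DIR = Path("00_custom/a2t_models")


def resolve_model_source(model_name: str, base_dir: Path = LOCAL_MODEL_DIR) -> str:
    """Return a local model folder if one exists, otherwise the Hub model id.

    Hypothetical layout: one subfolder per model, named after the model id with
    "/" replaced by "__" (e.g. "textattack__roberta-base-MNLI"), holding the
    config.json that save_pretrained() writes.
    """
    local = base_dir / model_name.replace("/", "__")
    if (local / "config.json").is_file():
        return str(local)
    return model_name  # no local copy: fall back to downloading from the Hub
```

Assuming the loading code passes this string on to the transformers from_pretrained() methods, which accept either a Hub model id or a local directory, the preloaded copy is used whenever it exists. Such a copy can be created once with AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli").save_pretrained(path), and likewise with AutoTokenizer.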

Settings

Users can adjust the default settings, e.g., by providing a list of predefined expected topics; see the settings.txt file in the main folder. The default settings are:

MODEL_A2T_SETTING: SMALL uses the “textattack/roberta-base-MNLI” model (about 550 MB); LARGE uses the “roberta-large-mnli” model (about 1.32 GB)
MODEL_A2T_TOPIC_N: LARGE (42 topics), MEDIUM (15 topics), or SMALL (8 topics); defaults to MEDIUM
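The two settings above could be read and mapped to a model id and a topic count roughly as follows. This is a minimal sketch: it assumes settings.txt holds one "KEY: VALUE" (or "KEY = VALUE") pair per line with "#" comment lines, and it falls back to LARGE when MODEL_A2T_SETTING is missing. The file format, that fallback, and the function names are assumptions; the model ids, topic counts, and the MEDIUM default come from the list above.

```python
import re
from typing import Dict, Tuple

# Model ids and topic counts as listed in the settings above.
MODELS = {
    "SMALL": "textattack/roberta-base-MNLI",  # about 550 MB
    "LARGE": "roberta-large-mnli",            # about 1.32 GB
}
TOPIC_COUNTS = {"LARGE": 42, "MEDIUM": 15, "SMALL": 8}

# Assumed line format: KEY: VALUE or KEY = VALUE.
_PAIR = re.compile(r"^\s*([A-Za-z0-9_]+)\s*[:=]\s*(.*?)\s*$")


def parse_settings(text: str) -> Dict[str, str]:
    """Parse settings text into a dict, skipping blank and '#' comment lines."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _PAIR.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def a2t_config(settings: Dict[str, str]) -> Tuple[str, int]:
    """Map MODEL_A2T_SETTING and MODEL_A2T_TOPIC_N to (model id, topic count)."""
    # LARGE as fallback is an assumption; only MEDIUM is documented as a default.
    size = settings.get("MODEL_A2T_SETTING", "LARGE").upper()
    topic_n = settings.get("MODEL_A2T_TOPIC_N", "MEDIUM").upper()
    return MODELS[size], TOPIC_COUNTS[topic_n]
```

For example, a2t_config(parse_settings("MODEL_A2T_SETTING: small")) selects the base-size model together with the default of 15 topics. An unknown value raises a KeyError rather than silently picking a model.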

Technical Framework

The script is written in Python.

Main Contributor

Kolja J. Holzapfel

Reference

[Sainz and Rigau 2021] Sainz, Oscar; Rigau, German: Ask2Transformers: Zero-Shot Domain labelling with Pretrained Language Models. In: Proceedings of the 11th Global Wordnet Conference. University of South Africa (UNISA): Global Wordnet Association, January 2021, pp. 44–52. https://www.aclweb.org/anthology/2021.gwc-1.6