Data and models for stance and premise detection in COVID-19 tweets: Insights from the Social Media Mining for Health (SMM4H) 2022 shared task

In recent years, social media platforms have become powerful channels for public discourse, shaping opinions and disseminating information on a wide range of topics. The COVID-19 pandemic further intensified online discussion, particularly of critical public health measures such as masks, lockdowns, school closures, and vaccine mandates. Understanding the stances and premises expressed in these discussions is crucial for policymakers, healthcare professionals, and researchers who need to gauge public sentiment, identify misinformation, and develop effective communication strategies [1], [2].

This paper addresses the challenge of automatic stance and premise detection in tweets related to COVID-19 mandates. Stance detection determines the point of view (stance) of a text’s author towards a particular topic, while premise detection identifies the reasons the author gives in support of that stance. Automatically analyzing a large volume of tweets offers insight into the prevailing opinions, concerns, and rationales within the online community. Detecting stances and premises in tweets is difficult because the messages are short, the language is informal, and the text is often noisy and ambiguous. Furthermore, the topic of COVID-19 mandates is highly polarized, with perspectives ranging from enthusiastic support to vehement opposition. Robust and accurate computational models that capture the nuanced stances and premises expressed in tweets are therefore vital for a comprehensive understanding of the public discourse surrounding this issue.

A preliminary version of this work appeared in [3], [4]. This journal version makes several significant improvements:


Annotation of an external test dataset on a new claim topic related to vaccination. This topic was not included in the training set of the SMM4H 2022 Task 2, which covered three other claims: school closures, stay-at-home orders, and wearing masks. The dataset is annotated by human experts, ensuring high-quality labels for training and evaluation purposes.


An extended description of the experimental datasets and emotion analysis of tweets.


An extended description of the high-scoring systems submitted to the SMM4H 2022 Task 2. These models combine deep learning algorithms and linguistic features to predict stances and identify premises within tweets.


Investigation of model performance on different claim topics, with the addition of new experimental results and conclusions.


Error analysis of the best-performing model and a discussion of its limitations.

In this work, we seek to answer the following research questions. RQ1: To what extent do models trained on specific claims generalize to other claims within the same domain? RQ2: Can the fusion of tweets and their corresponding claims significantly enhance model performance?
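To make the two research questions concrete, the following is a minimal sketch of how they could be framed in code. It is an illustration under stated assumptions, not the setup used in the shared task: the `Example` record, the function names, the stance label strings, the `[SEP]` separator, and the toy tweets are all invented here. Only the four claim topics come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    tweet: str     # tweet text (invented placeholders below, not real data)
    claim: str     # claim topic the tweet is judged against
    stance: str    # hypothetical label names; the task's label set may differ
    premise: bool  # True if the tweet gives a reason for its stance


def leave_one_claim_out(examples, held_out_claim):
    """Split so the held-out claim never appears in training (RQ1)."""
    train = [e for e in examples if e.claim != held_out_claim]
    test = [e for e in examples if e.claim == held_out_claim]
    return train, test


def fuse(example, sep=" [SEP] "):
    """Join claim and tweet into one input string (one possible fusion, RQ2)."""
    return f"{example.claim}{sep}{example.tweet}"


examples = [
    Example("Masks protect the people around you.", "wearing masks", "favor", True),
    Example("Schools should reopen now.", "school closures", "against", False),
    Example("Staying home slowed the spread here.", "stay-at-home orders", "favor", True),
    Example("Mandates ignore natural immunity.", "vaccination", "against", True),
]

# Hold out the claim that was absent from the shared-task training set.
train, test = leave_one_claim_out(examples, "vaccination")
print(len(train), len(test))
print(fuse(test[0]))
```

Holding out one claim at a time measures transfer to an unseen topic, while comparing a model fed `fuse(example)` against one fed `example.tweet` alone isolates the effect of adding the claim to the input.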

The remainder of this paper is organized as follows. Section 2 provides an overview of related work on stance and premise detection in social media. Section 3 describes the data collection and annotation methodology. In Section 4.4, we present the experimental setup, model architectures, and evaluation metrics. Section 5 reports the results. Finally, in Section 6, we discuss the implications of our findings, the limitations of the study, the problems we encountered during data collection, and potential directions for future research. All the data and code written in support of this publication are publicly available via

