Infectious Disease Reports, Vol. 14, Pages 855-883: MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions

1. Introduction

Monkeypox, caused by the monkeypox virus, which belongs to the Poxviridae family, Chordopoxvirinae subfamily, and Orthopoxvirus genus [1], is a re-emerging zoonotic disease. The monkeypox virus was initially discovered in monkeys in 1958 [2], and the first case of human monkeypox was detected in the Democratic Republic of the Congo (DRC) in a nine-month-old boy in 1970 [3]. The monkeypox virus is closely related to the variola virus (smallpox virus) and results in a smallpox-like disease. The incubation period of monkeypox is 5–21 days, and common symptoms include fever (between 38.5 °C and 40.5 °C), headache, and myalgia. A distinguishing feature of the monkeypox infection is the presence of swelling at the maxillary, cervical or inguinal lymph nodes (lymphadenopathy) [4,5]. A recent study found that during the ongoing outbreak of monkeypox, inguinal lymphadenopathy was more common than cervical and axillary lymphadenopathy [6]. In individuals infected with the monkeypox virus, rashes appear following the onset of fever, beginning on the face, tongue, and oral cavity before spreading across the body. In the later stages of the infection, lesions in the oral cavity may make it challenging for the patients to eat and drink [5]. However, during the ongoing outbreak, multiple atypical clinical observations have been reported as compared to the prior outbreaks [7,8]. The severity of the infection is usually determined by the lesion count, as there is a direct correlation between high lesion counts and severe health-related complications [5]. Studies have shown that patients with severe complications may experience respiratory and gastrointestinal issues [9], septicemia [9,10], encephalitis [5], and ocular infections [11].

The monkeypox virus had been endemic in the DRC and a few African countries for a very long time, and a few cases outside these geographic regions were recorded only twice: first in 2003 [12] and then in 2018–2019 [13,14].
However, at the time of writing this paper, the world is experiencing a global outbreak of the monkeypox virus with 71,096 cases, of which 70,377 cases have been reported in locations that have not historically reported any monkeypox infections [15]. Some of the countries that have recorded the greatest number of monkeypox cases so far include the United States (26,577 cases), Brazil (8207 cases), Spain (7209 cases), France (4043 cases), the United Kingdom (3654 cases), Germany (3645 cases), Peru (2587 cases), Colombia (2453 cases), Mexico (1968 cases), Canada (1411 cases), and the Netherlands (1221 cases).

The first case of this 2022 global monkeypox outbreak was confirmed in the United Kingdom on 7 May 2022 [16]. On 19 May 2022, the first draft genome sequence of the monkeypox virus was produced by scientists in Portugal [17]. The genomic data related to this outbreak that have been studied so far indicate that this outbreak is caused by the West African clade [18]. On 20 May 2022, the World Health Organization (WHO) called an “emergency meeting” [19] to discuss the global concerns centered around the rising cases of the monkeypox virus. Since then, the WHO had been considering whether the outbreak should be assessed as a “potential public health emergency of international concern” (PHEIC), as was done for the COVID-19 and Ebola outbreaks in the past [20]. On 6 June 2022, the Centers for Disease Control and Prevention (CDC) in the United States raised its monkeypox alert to “Level 2” following the rapid increase in cases [21]. On 23 July 2022, following another meeting, the WHO declared monkeypox a Global Public Health Emergency (GPHE) [22]. There have been several reports and findings related to the spread of monkeypox. In a recent report, the CDC said, “monkeypox eradication unlikely in the U.S. as virus could spread indefinitely” [23]. A report in New Scientist discussed that a dangerous monkeypox variant circulating in the DRC could go global [24].
According to a recent article published in Nature [25], monkeypox could become impossible to contain if its spread among wild animals continues.

As per the CDC, “currently there is no treatment approved specifically for monkeypox virus infections” [26]. However, recently, a vaccine for monkeypox has been approved by the Food and Drug Administration (FDA). The vaccine, previously used for smallpox, is called JYNNEOS and was developed by Bavarian Nordic, a Danish biotechnology firm [27]. The JYNNEOS vaccine has been the primary vaccine used in the United States during this outbreak [28]. The ACAM2000 vaccine is an alternative to JYNNEOS. It is also approved to help protect against smallpox and monkeypox [29]. In addition to vaccines, in the United States, as per the CDC, several antivirals, such as Tecovirimat (also known as TPOXX, ST-246), Vaccinia Immune Globulin Intravenous (VIGIV), Cidofovir (also known as Vistide), and Brincidofovir (also known as CMX001 or Tembexa), are currently available from the Strategic National Stockpile (SNS) as options for the treatment of monkeypox [26]. As the cases surge, countries all over the world are undertaking various preparations, initiatives, and measures to reduce the spread of the virus. These include a lockdown in Belgium [30], the United States ordering 500,000 doses of the JYNNEOS vaccine [31], Canada offering vaccination to high-risk groups [32], health authorities in France and Denmark suggesting a vaccine rollout to adults infected by the virus [33], Germany recommending vaccinations for high-risk groups [34], and the United Kingdom advising self-isolation for everyone infected with the virus [35], just to name a few. The rising cases of monkeypox and the associated recommendations, initiatives, and measures by various countries have led to the public engaging in conversations for information seeking and sharing related to monkeypox.
The Internet of Everything lifestyle of today is centered around people engaging in online conversations via the internet, specifically on social media platforms, and spending far more time on the internet than ever before [36]. As a result, there has been a tremendous increase in the use of social media platforms in the recent past [37,38]. Conversations on social media include a wide range of topics, such as recent issues, global challenges, pandemics, emerging technologies, news, current events, politics, family, relationships, trending topics, and career opportunities [39]. Twitter, one such social media platform, is used by people of almost all age groups from different parts of the world [40,41]. At present, there are about 450 million monthly active users on Twitter [42]. In view of the surge in Tweets about monkeypox since the beginning of the outbreak, Twitter recently added a link for accurate information on monkeypox [43]. A recent press release reported: “medical experts are building brands as monkeypox influencers and thought leaders, using their credentials and controversial posts to gain Twitter clout as mounting anxiety over the virus continues to spread” [44]. In addition to this, several other Tweets about monkeypox have also been discussed and debated in press releases in the last few days [45,46,47]. Mining social media conversations, for instance, Tweets, to develop datasets has been of significant interest to the scientific community in the last few years, as can be seen from several recent works in this field (Section 2.1). Such Twitter datasets serve as a data resource for a wide range of applications and use-case scenarios related to studying the associated conversation paradigms as well as for investigating the patterns of the underlying information-seeking and sharing behavior on Twitter.
Some of the recent virus outbreaks, such as COVID-19, Ebola, Zika virus, and flu, were followed by the scientific community developing Twitter datasets, performing a comprehensive analysis of the multimodal components of the Tweets (such as hashtags, language, retweets, and the source of the Tweet), and analyzing the sentiments of these Tweets. The recent outbreak of monkeypox has also led to an increase in research and development in this field in the last few weeks (Section 2.2). However, none of these prior works focused on mining Tweets about the 2022 monkeypox outbreak to develop a dataset. Neither did any of these prior works focus on performing a comprehensive analysis of the Tweets about this outbreak. Furthermore, there has been no work conducted in this field thus far that has focused on outlining open research questions or research directions to advance knowledge, innovation, and discovery in this field. This paper aims to address these challenges. In summary, it makes the following scientific contributions to this field:

It presents an open-access dataset of 556,427 Tweet IDs, corresponding to the same number of Tweets about monkeypox that were posted on Twitter from 7 May 2022 to 9 October 2022. The dataset is available at https://doi.org/10.7910/DVN/CR7T5E. The earliest date was selected as 7 May 2022, as the first case of the 2022 monkeypox outbreak was recorded on this date. 9 October 2022 was the most recent date at the time of resubmission of this paper after the second review round. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
A comparative study is also presented that compares this dataset with 36 prior works in this field that focused on the development of Twitter datasets to further uphold the novelty, relevance, and usefulness of this dataset.

It presents the findings from a comprehensive content analysis of these Tweets. The findings show that:

All the 34 languages supported by the Twitter API have been used to post Tweets about the outbreak. However, English has been the most used language.

On the day the WHO declared monkeypox a GPHE, about 40,000 Tweets related to monkeypox were posted in a span of just 24 h.

A total of 5470 distinct hashtags have been used in Tweets about this outbreak, of which #monkeypox is the most used hashtag as compared to all other variations of the spelling in terms of use of uppercase or lowercase characters, such as #MonkeyPox, #monkeyPox, #MONKEYPOX, etc.

Twitter for iPhone has been the leading source that has been used to post Tweets about monkeypox since the first case of this outbreak. It is followed by Twitter for Android, the Twitter Web App, and other sources.

The paper also presents the findings of sentiment analysis of the Tweets of this dataset. The findings of this study show that despite a lot of discussions, debate, opinions, information, and misinformation on Twitter on various topics in this regard, such as monkeypox and the LGBTQI+ community, monkeypox and COVID-19, vaccines for monkeypox, etc., a “neutral” sentiment is present in most of the Tweets. It is followed by “negative” and “positive” sentiments, respectively.

Finally, to support research and development in this field, a list of 50 open research questions in the areas of Big Data, Data Mining, Machine Learning, Natural Language Processing, and Information Retrieval with a specific focus on this outbreak is presented that may be studied, analyzed, and investigated using this dataset.
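The hashtag finding listed above treats case variants such as #MonkeyPox and #monkeypox as distinct hashtags. A minimal Python sketch of that kind of tally is given below. It is an illustration only and not the procedure used in this work: the sample texts are invented (the dataset distributes Tweet IDs, not Tweet text), and the regular expression and function names are assumptions introduced for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample texts, invented for illustration only.
SAMPLE_TWEETS = [
    "Cases rising again #monkeypox #health",
    "WHO statement today #MonkeyPox",
    "Vaccine info #monkeypox #JYNNEOS",
    "#MONKEYPOX update",
    "Stay informed #monkeypox",
]

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(texts):
    """Count hashtags exactly as written, so case variants stay distinct."""
    counts = Counter()
    for text in texts:
        counts.update(HASHTAG_RE.findall(text))
    return counts


def group_case_variants(counts):
    """Group the case-sensitive counts under a shared lowercase key."""
    groups = defaultdict(dict)
    for tag, n in counts.items():
        groups[tag.lower()][tag] = n
    return dict(groups)


counts = count_hashtags(SAMPLE_TWEETS)
variants = group_case_variants(counts)

print(len(counts))               # number of distinct case-sensitive hashtags
print(counts.most_common(1))     # most used spelling
print(variants["#monkeypox"])    # all spellings of one hashtag with counts
```

Keeping the case-sensitive counts and grouping them afterwards makes it possible to report both the number of distinct hashtags and the most used spelling within each group.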

The rest of the paper is organized as follows. Section 2 presents the literature review. The methodologies that were followed for the development of this dataset, content analysis of the Tweets, and the sentiment analysis of Tweets are presented in Section 3. Section 3 also outlines how the dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management. Section 4 presents the results of this work. In Section 4.1, a detailed description of the dataset files is presented. It also presents step-by-step instructions on how to use this dataset. A comprehensive comparative study with prior works in this field that focused on the development of Twitter datasets is also presented in Section 4.1 to uphold the novelty, relevance, and usefulness of this dataset. Section 4.2 presents the results of the content analysis of the Tweets. It is followed by Section 4.3, where the results of the sentiment analysis of the Tweets are presented and discussed. A list of 50 open research questions that may be investigated using this dataset is presented in Section 4.4. It is followed by the conclusion and scope for future work in Section 5, which is followed by references.

5. Conclusions and Scope of Future Work

Twitter datasets serve as a rich data resource for the investigation of different research questions for the timely advancement of knowledge, innovation, and discovery in different fields. Therefore, scientists in this field have focused on developing Twitter datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters in the last few years. In addition to the development of Twitter datasets, analysis of multimodal components of Tweets, specifically Tweets about virus outbreaks, has been of significant interest to the scientific community, as can be seen from several works that focused on analyzing different characteristics of Tweets posted about some of the recent virus outbreaks, such as COVID-19, Ebola, Zika virus, and the flu. The world is currently experiencing an outbreak of the monkeypox virus. A total of 71,096 cases have been reported so far, out of which 70,377 cases have been reported in locations that have not historically reported any monkeypox infections. The World Health Organization (WHO) has declared monkeypox to be a Global Public Health Emergency. This has resulted in a tremendous increase in different types of conversations on Twitter related to monkeypox. None of the prior works in this field have focused on mining these conversations to develop a Twitter dataset. Furthermore, no prior work has analyzed multiple components of these conversations about monkeypox on Twitter. The work presented in this paper aims to address these research challenges. First, it presents an open-access dataset of 556,427 Tweets about monkeypox that were posted on Twitter since the first detected case of this outbreak. Second, the paper reports the results of a comprehensive content analysis of the Tweets of this dataset. 
This analysis presents several novel findings: English has been the most used language (out of all the 34 languages supported by Twitter) to post Tweets about monkeypox; about 40,000 Tweets related to monkeypox were posted on the day the WHO declared monkeypox a GPHE; a total of 5470 distinct hashtags have been used on Twitter about this outbreak, of which #monkeypox is the most used; and Twitter for iPhone has been the leading source of Tweets about the outbreak. Sentiment analysis of the Tweets was also performed, and the results show that despite a lot of discussions, debate, opinions, information, and misinformation on Twitter on various topics in this regard, such as monkeypox and the LGBTQI+ community, monkeypox and COVID-19, vaccines for monkeypox, etc., “neutral” sentiment was present in most of the Tweets. It was followed by “negative” and “positive” sentiments, respectively. Finally, to support research and development in this field, the paper presents a list of 50 open research questions related to the outbreak in the areas of Big Data, Data Mining, Natural Language Processing, and Machine Learning that may be investigated based on this dataset. Future work on this research project will involve updating the dataset with more recent Tweets on a routine basis to ensure that the scientific community has access to the most recent data in this regard.
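The ranking of sentiment classes summarized above (neutral, then negative, then positive) can be illustrated with a short Python sketch. This is a sketch under stated assumptions rather than the method used in this work: the polarity scores are invented, and the ±0.05 cut-off is a commonly used convention for polarity scores in the range −1 to 1 that this section does not confirm; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical polarity scores in [-1, 1]; not taken from the dataset.
SCORES = [0.0, -0.4, 0.0, 0.6, 0.02, -0.3, 0.0, -0.01]


def label(score, threshold=0.05):
    """Map a polarity score to a sentiment label (threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(scores):
    """Return (label, share of total) pairs, largest share first."""
    counts = Counter(label(s) for s in scores)
    total = sum(counts.values())
    return [(name, n / total) for name, n in counts.most_common()]


shares = sentiment_shares(SCORES)
for name, share in shares:
    print(f"{name}: {share:.1%}")
```

With real data, the scores would come from whichever sentiment tool is applied to the hydrated Tweet text; only the labelling and ranking steps are shown here.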
