
      Social Media Monitoring of the COVID-19 Pandemic and Influenza Epidemic With Adaptation for Informal Language in Arabic Twitter Data: Qualitative Study


          Abstract

          Background

          Twitter is a real-time messaging platform widely used by people and organizations to share information on many topics. Systematic monitoring of social media posts (infodemiology or infoveillance) could help detect misinformation outbreaks, reduce reporting lag time, and provide an independent source of data that complements traditional surveillance approaches. However, such an analysis is currently not possible in the Arabic-speaking world owing to dialectal variation and a lack of basic building blocks for research.

          Objective

          We collected around 4000 Arabic tweets related to COVID-19 and influenza. We cleaned and labeled the tweets relative to the Arabic Infectious Diseases Ontology, which includes nonstandard terminology, as well as 11 core concepts and 21 relations. The aim of this study was to analyze Arabic tweets to estimate their usefulness for health surveillance, understand the impact of the informal terms in the analysis, show the effect of deep learning methods in the classification process, and identify the locations where the infection is spreading.

          Methods

          We applied the following multilabel classification techniques: binary relevance, classifier chains, label power set, adapted algorithm (multilabel adapted k-nearest neighbors [MLKNN]), support vector machine with naive Bayes features (NBSVM), bidirectional encoder representations from transformers (BERT), and AraBERT (transformer-based model for Arabic language understanding) to identify tweets appearing to be from infected individuals. We also used named entity recognition to predict the place names mentioned in the tweets.
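          The first three techniques named above are problem transformations: they recast one multilabel task as one or more single-label tasks. The sketch below shows only how the training targets are built in each case, using plain Python. The label names, feature values, and function names are hypothetical and chosen for illustration; they are not taken from the study's label set or code.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical label names, for illustration only (not the study's label set).
LABELS = ["infected", "symptom", "prevention"]


def binary_relevance_targets(
    label_sets: Sequence[Set[str]], labels: Sequence[str]
) -> Dict[str, List[int]]:
    """Binary relevance: one independent 0/1 target vector per label."""
    return {label: [int(label in s) for s in label_sets] for label in labels}


def label_powerset_targets(
    label_sets: Sequence[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label power set: each distinct label combination becomes one class id."""
    class_ids: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for s in label_sets:
        key = frozenset(s)
        class_ids.setdefault(key, len(class_ids))
        targets.append(class_ids[key])
    return targets, class_ids


def classifier_chain_inputs(
    features: Sequence[Sequence[float]],
    label_sets: Sequence[Set[str]],
    labels: Sequence[str],
) -> Dict[str, List[List[float]]]:
    """Classifier chain (training time): the input for each label is the
    original features plus the true values of all earlier labels in the chain."""
    chain: Dict[str, List[List[float]]] = {}
    for position, label in enumerate(labels):
        earlier = labels[:position]
        chain[label] = [
            list(row) + [int(prev in s) for prev in earlier]
            for row, s in zip(features, label_sets)
        ]
    return chain
```

          Binary relevance ignores correlations between labels, label power set captures them at the cost of many sparse classes, and classifier chains sit in between by passing earlier labels forward as extra features.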

          Results

          We achieved an F1 score of up to 88% in the influenza case study and 94% in the COVID-19 one. Adapting for nonstandard terminology and informal language helped to improve accuracy by as much as 15%, with an average improvement of 8%. Deep learning methods achieved an F1 score of up to 94% during the classifying process. Our geolocation detection algorithm had an average accuracy of 54% for predicting the location of users according to tweet content.
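          For readers unfamiliar with the metric, the sketch below computes a micro-averaged F1 score over multilabel predictions. The abstract does not state which averaging scheme the reported scores use, so micro-averaging is an assumption made here for illustration, and the function name and labels are hypothetical.

```python
from typing import Sequence, Set


def micro_f1(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool true positives, false positives, and false
    negatives over all tweets and labels, then apply F1 = 2PR / (P + R)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

          With three tweets whose true label sets are {a, b}, {b}, {c} and predicted sets {a}, {b, c}, {c}, the pooled counts are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75.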

          Conclusions

          This study identified two Arabic social media data sets for monitoring tweets related to influenza and COVID-19. It demonstrated the importance of including informal terms, which are regularly used by social media users, in the analysis. It also proved that BERT achieves good results when used with new terms in COVID-19 tweets. Finally, the tweet content may contain useful information to determine the location of disease spread.
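          To illustrate how tweet content can carry location information, the sketch below uses a simple gazetteer lookup. This is a deliberately simpler stand-in for the named entity recognition used in the study, not a reproduction of it. The two-entry gazetteer, the function names, and the accuracy definition (a prediction counts as correct when the gold location appears among the places found) are all assumptions for illustration.

```python
import string
from typing import Dict, List, Sequence

# Hypothetical two-entry gazetteer mapping Arabic place names to English.
GAZETTEER: Dict[str, str] = {"الرياض": "Riyadh", "جدة": "Jeddah"}

# ASCII punctuation plus common Arabic punctuation marks.
PUNCTUATION = string.punctuation + "،؛؟"


def find_places(tweet: str, gazetteer: Dict[str, str] = GAZETTEER) -> List[str]:
    """Return gazetteer places mentioned in the tweet, in order, without repeats."""
    places: List[str] = []
    for token in tweet.split():
        token = token.strip(PUNCTUATION)
        if token in gazetteer and gazetteer[token] not in places:
            places.append(gazetteer[token])
    return places


def location_accuracy(predicted: Sequence[List[str]], gold: Sequence[str]) -> float:
    """Share of tweets whose gold location appears among the predicted places."""
    if not gold:
        return 0.0
    correct = sum(1 for places, truth in zip(predicted, gold) if truth in places)
    return correct / len(gold)
```

          Exact token matching misses place names that carry attached prepositions or appear in dialectal spellings, which is one reason a trained recognizer is usually preferred over a lookup table.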


                Author and article information

                Contributors
                Lama Alsudias, Paul Rayson
                Journal
                JMIR Medical Informatics (JMIR Med Inform)
                JMIR Publications (Toronto, Canada)
                ISSN: 2291-9694
                Published: 17 September 2021
                Volume 9, Issue 9, e27670
                Affiliations
                [1 ] School of Computing and Communications Lancaster University Lancaster United Kingdom
                [2 ] College of Computer and Information Sciences King Saud University Riyadh Saudi Arabia
                Author notes
                Corresponding Author: Lama Alsudias l.alsudias@lancaster.ac.uk
                Author information
                https://orcid.org/0000-0003-1131-4251
                https://orcid.org/0000-0002-1257-2191
                Article
                Publisher ID: v9i9e27670
                DOI: 10.2196/27670
                PMCID: PMC8451962
                PMID: 34346892
                24322dd6-d3ff-41cf-a6a9-00b33a6533aa
                ©Lama Alsudias, Paul Rayson. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 17.09.2021.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

                History
                : 2 February 2021
                : 6 April 2021
                : 20 April 2021
                : 20 June 2021
                Categories
                Original Paper

                Keywords: Arabic, COVID-19, infectious disease, influenza, infodemiology, infoveillance, social listening, informal language, multilabel classification, natural language processing, named entity recognition, Twitter
