      Is Open Access

      Modeling COVID-19 incidence with Google Trends

      research-article


          Abstract

          Infodemiologic methods could be used to enhance the modeling of infectious diseases. It is of interest to verify the utility of these methods using a Nigerian case study. We used Google Trends (GT) data to track COVID-19 incidence and assessed whether they could complement traditional data based solely on reported case numbers. Data on weekly COVID-19 cases in Nigeria from March 1, 2020, to May 31, 2021, were matched with internet search data from Google Trends. The reported weekly incidence numbers and the GT data were split into training and testing sets. ARIMA models were fitted to describe reported weekly COVID-19 cases using the training set. Several COVID-related search terms were theoretically and empirically assessed for initial screening. The selected GT variable was added to the ARIMA model as a regressor. Model forecasts, both with and without GT data, were compared with weekly cases in the test set over 13 weeks. Forecast accuracies were compared visually and using the root mean squared error (RMSE) and the mean absolute error (MAE). Statistical significance of the difference in predictions was determined with the two-sided Diebold-Mariano (DM) test. Preliminary results of contemporaneous correlations between COVID-related search terms and weekly COVID-19 cases revealed “loss of smell,” “loss of taste,” and “fever” (in order of magnitude) as significantly associated with the official cases. Predictions of the ARIMA model using solely reported case numbers resulted in an RMSE of 411.4 and an MAE of 354.9. The GT-expanded model achieved better forecasting accuracy (RMSE = 388.7, MAE = 340.1). The corrected Akaike Information Criterion also favored the GT-expanded model (869.4 vs. 872.2). The difference in predictive performance over the 13 weeks was significant under the two-sided Diebold-Mariano test (DM = 6.75, p < 0.001).
Google Trends data enhanced the predictive ability of a traditionally based model and should be considered a suitable method to enhance infectious disease modeling.
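
The forecast comparison described in the abstract rests on three quantities: RMSE, MAE, and the Diebold-Mariano statistic. The following is a minimal sketch of how they could be computed, not the authors' implementation. It assumes squared-error loss, a one-step forecast horizon, a normal approximation for the p-value, and no autocorrelation in the loss differential. The function names and the numbers in `actual`, `pred_a`, and `pred_b` are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def diebold_mariano(actual, pred_a, pred_b):
    """Two-sided Diebold-Mariano test under squared-error loss.

    Returns (dm_statistic, p_value). A positive statistic means pred_b has
    the smaller average squared error. Assumptions of this sketch: one-step
    horizon, normal approximation, and a loss differential that is not
    constant (otherwise its variance is zero and the statistic is undefined).
    """
    # Loss differential: squared error of model A minus squared error of model B.
    d = [(y - a) ** 2 - (y - b) ** 2 for y, a, b in zip(actual, pred_a, pred_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    dm = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(dm) / math.sqrt(2))
    return dm, p_value


# Hypothetical weekly case counts and two competing forecasts (illustration only).
actual = [100, 120, 130, 150, 170, 160]
pred_a = [90, 135, 120, 165, 150, 175]   # e.g. a model without a search regressor
pred_b = [95, 125, 128, 155, 160, 165]   # e.g. a model with a search regressor

print("RMSE A:", rmse(actual, pred_a), "MAE A:", mae(actual, pred_a))
print("RMSE B:", rmse(actual, pred_b), "MAE B:", mae(actual, pred_b))
print("DM statistic and p-value:", diebold_mariano(actual, pred_a, pred_b))
```

With a 13-week test set, as in the study, a small-sample correction or a t-distribution reference may be preferable to the plain normal approximation used here.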


                Author and article information

                Contributors
                Journal: Frontiers in Research Metrics and Analytics (Front. Res. Metr. Anal.)
                Publisher: Frontiers Media S.A.
                ISSN: 2504-0537
                Published: 15 September 2022
                Volume: 7
                Article number: 1003972
                Affiliations
                Centre for Applied Data Science, College of Business and Economics, University of Johannesburg , Johannesburg, South Africa
                Author notes

                Edited by: Felix Bankole, University of South Africa, South Africa

                Reviewed by: Ayankunle Taiwo, Schreiner University, United States; Kehinde Aruleba, University of Leicester, United Kingdom; George Obaido, University of California, San Diego, United States

                *Correspondence: Lateef Babatunde Amusa amusasuxes@ 123456gmail.com

                This article was submitted to Research Policy and Strategic Management, a section of the journal Frontiers in Research Metrics and Analytics

                Article
                DOI: 10.3389/frma.2022.1003972
                PMC ID: 9520600
                PubMed ID: 36186843
                ScienceOpen record ID: ea2b533e-026a-4513-b4f6-1d704d587ccc
                Copyright © 2022 Amusa, Twinomurinzi and Okonkwo.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 26 July 2022
                Accepted: 30 August 2022
                Page count
                Figures: 5, Tables: 2, Equations: 1, References: 41, Pages: 8, Words: 4640
                Categories
                Research Metrics and Analytics
                Original Research

                Keywords: big data, Google Trends, ARIMA, COVID-19, infectious disease modeling
