Public Opinion Analysis on Social Media Platforms: A Case Study of High Speed 2 (HS2) Rail Infrastructure Project

Comment from Dr. Guanlan Zhang

In this work, the authors propose a learning-based framework to evaluate public opinion through social media platforms. Most of the content is presented clearly, with appropriate use of language. The methods effectively solve the problem, and the experimental results are acceptable.

Authors’ response:

We would like to thank Dr. Guanlan Zhang for the detailed assessment of our work. We have considered all the comments in the revision, and our responses are as follows:

Comments:

Some symbols in the mathematical formulas are incorrect. In Eq. (1), x_1 and p(x_n|y) are not in the correct form.
- Authors’ response: We have corrected Eq. (1) so that the vectors and probabilities are in the correct form. We have also explained the mathematical formulation of the multinomial naïve Bayes classifier in greater detail, as shown in Section 3.1.
The state-of-the-art (SOTA) methods are not clearly stated, and the comparison between the proposed method and the SOTA is not verified. What is the significance of this work relative to previous work?
- Authors’ response: State-of-the-art methods in natural language processing are now discussed in Sections 1.3 and 3.2. Section 1.3 covers transformer architectures, and Section 3.2 discusses the state-of-the-art pre-trained model RoBERTa. Our proposed sentiment analysis method combines RoBERTa with a two-layer bidirectional gated recurrent unit (BiGRU) network and a fully connected layer with a softmax function. The performance of RoBERTa-BiGRU is compared with MNB and VADER, as suggested by the third reviewer, Dr. Kwadwo Agyapon-Ntra.
In Section 3.2.2, on what machine was the model trained, and how long did training take?
- Authors’ response: Section 3.4 now includes the machine specifications. The RoBERTa model was trained on a Tesla T4 GPU, with a total training time of 2421.23 seconds (about 40 minutes) for 100 epochs. The Python Jupyter notebook is also available at the GitHub link provided at the end of the article.
The training results in Section 3.2.2 are not very good. Have you considered using deep neural networks to solve the problem?
- Authors’ response: The sentiment classifier now uses a state-of-the-art deep neural network, which performs much better in terms of both accuracy (89.52%) and the ROC curve (AUC: 0.8904).
How is the accuracy of the sentiment analysis and topic modelling evaluated?
- Authors’ response: The performance of the sentiment classifier is quantified and evaluated against MNB and VADER in terms of both accuracy and ROC curves, as highlighted in Section 3.4. Topic modelling is an unsupervised machine learning task; we use LDA to generate the topics. Because no ground-truth topic labels are available, the performance of LDA topic modelling can be evaluated by manual inspection [1]. The details of the manual inspection are covered in Section 3.5.
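Because Eq. (1) and the MNB baseline recur throughout these responses, a minimal sketch of a multinomial naïve Bayes classifier may help readers check the notation. It implements the standard decision rule, choosing the class y that maximises log P(y) + Σ_i x_i log P(w_i | y), where x_i is the count of token w_i in a tweet. The add-one (Laplace) smoothing, whitespace tokenisation, function names and toy tweets are assumptions for illustration, not the manuscript's implementation.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with additive (Laplace) smoothing.

    docs: list of token lists; labels: list of class labels.
    Returns (log_priors, log_likelihoods, vocab).
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token -> count
    for doc, y in zip(docs, labels):
        token_counts[y].update(doc)

    n_docs = len(docs)
    log_priors = {y: math.log(c / n_docs) for y, c in class_counts.items()}
    log_likelihoods = {}
    for y in class_counts:
        # Smoothed denominator: all token occurrences in class y plus alpha per vocabulary word.
        total = sum(token_counts[y].values()) + alpha * len(vocab)
        log_likelihoods[y] = {
            tok: math.log((token_counts[y][tok] + alpha) / total) for tok in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict_mnb(model, doc):
    """Return the class maximising log P(y) + sum_i x_i * log P(w_i | y)."""
    log_priors, log_likelihoods, vocab = model
    counts = Counter(tok for tok in doc if tok in vocab)  # unseen tokens are ignored
    scores = {
        y: log_priors[y] + sum(n * log_likelihoods[y][tok] for tok, n in counts.items())
        for y in log_priors
    }
    return max(scores, key=scores.get)


# Hypothetical toy tweets, already tokenised and lower-cased.
train_docs = [
    "hs2 is a waste of money".split(),
    "cancel hs2 now".split(),
    "hs2 will create jobs".split(),
    "great progress on hs2".split(),
]
train_labels = ["negative", "negative", "non-negative", "non-negative"]

model = train_mnb(train_docs, train_labels)
print(predict_mnb(model, "what a waste".split()))  # → negative
```

Working in log space avoids numerical underflow when many small token probabilities are multiplied, which is why the product in the naïve Bayes rule becomes a sum here.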

[1] Jiang, H., Qiang, M., & Lin, P. (2016). Assessment of online public opinions on large infrastructure projects: A case study of the Three Gorges Project in China. Environmental Impact Assessment Review, 61, 38–51. https://doi.org/10.1016/j.eiar.2016.06.004
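The sentiment classifiers are compared by accuracy and by the ROC curve and its AUC. A minimal, dependency-free sketch of both metrics follows, assuming binary labels (1 for the positive class, 0 otherwise) and one real-valued score per tweet, such as a softmax probability. The function names and toy values are hypothetical and do not reproduce the paper's results; in practice, scikit-learn's `accuracy_score`, `roc_curve` and `roc_auc_score` provide the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the decision threshold through
    every distinct score; a tweet is predicted positive when score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical labels and classifier scores for six tweets.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 5 of 6 correct
print(auc(roc_points(y_true, scores)))
```

Accuracy depends on a single threshold (0.5 here), whereas the ROC curve summarises every threshold at once. The trapezoidal AUC equals the probability that a randomly chosen positive tweet receives a higher score than a randomly chosen negative one, with ties counted as one half.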

Comment from Dr. Chrisina Jayne

The paper investigates sentiment analysis and topic modelling of Twitter data using machine learning. It considers a specific topic related to the UK railway project High Speed 2. The paper compares Multinomial Naïve Bayes and Support Vector Machine classifiers for sentiment analysis of tweets. Topic modelling was conducted with Latent Dirichlet Allocation (LDA) using publicly available scripts. Experiments, discussion, and results are presented. The paper is well written, and sufficient background is included. The references are appropriate, but some more recent ones could have been included. The paper provides insights into the feasibility of using social media data for public opinion evaluation of civil infrastructure projects. The study's contribution lies in presenting a public opinion evaluation framework with a machine learning algorithm and comparing the accuracy of two classifiers.

Authors’ response:

We would like to thank Dr. Chrisina Jayne for the feedback on our work. More recent references, covering transformer models and the latest developments in natural language processing, have been included.

Comment from Dr. Kwadwo Agyapon-Ntra

Summary:
In this study, the authors successfully conduct an analysis of public opinion on a public infrastructure project (the High Speed 2 railway project in the United Kingdom) using Twitter data and machine learning algorithms. The approach is sound, but the methodology would benefit from adopting SOTA models, benchmarking against very basic models, and handling potentially imbalanced datasets. Details are provided below:

Authors’ response:

We would like to thank Dr. Kwadwo Agyapon-Ntra for the critical review comments. These comments helped us improve the manuscript significantly.

Deep learning techniques, especially those that employ transformer architectures, are the current SOTA. While methods like Naive Bayes, SVMs and LDA are still very useful, it would be prudent to compare with the results from transformer-based deep learning architectures. Neural networks in the transformer family fine-tuned for specific tasks like classification have proven to be a very promising research direction in recent years, and some models like twitter-roberta-base-sentiment can be used out of the box. Since these deep learning architectures transform text into numerical embeddings that preserve semantic context to a degree, they reduce the amount of pre-processing that has to be done on tweets (like stemming and stop-word removal).
- Authors’ response: The sentiment classifier now uses RoBERTa as the transformer encoder. For fine-tuning, the outputs of the RoBERTa model are passed to a two-layer bidirectional gated recurrent unit (BiGRU) network followed by a fully connected layer with a softmax function. The details of the sentiment classifier are presented in Section 2.2.
Another good tool to consider for establishing baselines is the VADER sentiment analysis model, which was developed specifically for social media use cases. In the worst case, it can serve as a reasonable baseline, since it requires no training.
- Authors’ response: We greatly appreciate this suggestion. VADER is now used as a baseline against which the proposed RoBERTa-BiGRU sentiment classifier is compared. The performance of the sentiment classifiers is compared in terms of accuracy and ROC curves.
Steps should be taken to address dataset imbalance. If any such steps were taken, they were not stated. This can cause issues for a classifier, such as overfitting to a label with an overwhelmingly higher representation. The F1 score is a good metric for catching this, but it might be better to train on a balanced dataset.
- Authors’ response: The scarcity of positive tweets about HS2 leads to imbalanced training data. We addressed this issue in two ways. First, we increased the training dataset from around 900 to 1,400 tweets, including more positive tweets. Second, the classification task was changed to binary sentiment classification: negative (700 tweets) and non-negative (700 tweets). The more balanced training dataset led to better performance of the sentiment classifiers: the accuracy of the MNB classifier increased from 64% to 82.62% with the same model structure.
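The reviewer's concern about imbalance, and the revision's move to a negative versus non-negative task, can be illustrated with a minimal sketch. With hypothetical three-class labels dominated by negative tweets, a degenerate classifier that always predicts the majority class still reaches 60% accuracy, while the F1 score for the minority (positive) class is zero; collapsing neutral and positive into one non-negative class brings the class counts closer together. The labels, counts and helper names are assumptions for illustration; the authors additionally collected more tweets to reach 700 per class.

```python
from collections import Counter


def to_binary(label):
    """Collapse three-class sentiment into the binary negative / non-negative scheme."""
    return "negative" if label == "negative" else "non-negative"


def f1_score(y_true, y_pred, positive):
    """F1 score for one class, treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical three-class labels with few positive tweets.
three_class = ["negative"] * 60 + ["neutral"] * 30 + ["positive"] * 10
print(Counter(three_class))

# A degenerate classifier that always predicts the majority class.
majority = Counter(three_class).most_common(1)[0][0]
preds = [majority] * len(three_class)
acc = sum(t == p for t, p in zip(three_class, preds)) / len(three_class)
print(acc)  # 0.6: looks acceptable
print(f1_score(three_class, preds, "positive"))  # 0.0: the minority class is never found

# Collapsing to the binary scheme used in the revision.
binary = [to_binary(y) for y in three_class]
print(Counter(binary))
```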

Other revisions:

In addition to the revisions mentioned in our responses to the reviewers’ comments, the following revisions have been made:

We updated all figures so that they are more consistent and clearer.
We merged the literature review into Section 1. Section 2 now exclusively discusses the machine learning algorithms. Section 3 now presents the HS2 case study: Section 3.1 covers the background of HS2 in more detail, and data collection and processing have moved to Section 3.2. These modifications improve the clarity and delivery of the article.
We rewrote the abstract and highlighted the research motivation and main contributions in Section 1.4. This change helps multidisciplinary readers understand our manuscript.

Abstract

Public opinion evaluation is becoming increasingly significant in infrastructure project assessment. The inefficiencies of conventional evaluation approaches can be mitigated through social media analysis. Posts about infrastructure projects on social media provide a large amount of data for assessing public opinion. This study proposes a hybrid model that combines pre-trained RoBERTa with gated recurrent units for sentiment analysis. We selected the United Kingdom railway project High Speed 2 (HS2) as the case study. The sentiment analysis shows that the proposed hybrid model performs well in classifying social media sentiment. Furthermore, the study applies Latent Dirichlet Allocation (LDA) topic modelling to identify key themes within the tweet corpus, providing deeper insights into the prominent topics surrounding the HS2 project. The findings from this case study serve as the basis for a comprehensive public opinion evaluation framework driven by social media data. This framework offers policymakers a valuable tool to effectively assess and analyse public sentiment.


Author and article information

Journal

Title: UCL Open: Environment Preprint

Publisher: UCL Press

Publication date (Electronic preprint): 2 June 2023

Affiliations

[1] Civil, Environmental and Geomatic Engineering / University College London / London / the United Kingdom

[2] Department of Civil and Environmental Engineering / Northeastern University / Boston / the United States

Author notes

[*] Email: ruiqiu.yao.19@ucl.ac.uk

Author information

Ruiqiu YAO https://orcid.org/0000-0002-2596-5031

Article

DOI: 10.14324/111.444/000154.v2

SO-VID: 69248a26-e9e7-4399-af60-806e7c557580

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY) 4.0 https://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

History

Date received: 16 June 2022

Date accepted: 30 June 2023

Comments

UCL Open: Environment Editorial Office wrote:

Date: 30 June 2023

Handling Editor: Prof Dan Osborn

Editorial decision: Accept. This revised article has been accepted following peer review and it is suitable for publication in UCL Open: Environment.

2023-06-30 16:07 UTC

UCL Press journals, including UCL Open: Environment, have now moved to a new website.

You will now find the journal, all publications, reviews and submission information at https://journals.uclpress.co.uk/ucloe
