Comments from Dr. Guanlan Zhang:
In this work, the authors proposed a learning-based framework to evaluate public opinion through social media platforms. Most of the content is presented clearly with proper use of language. The methods effectively solve the problem, and the experimental results are acceptable.
We would like to thank Dr. Guanlan Zhang for the detailed assessment of our work. We have considered all the comments in the revision, and our responses are as follows:
- there are incorrect symbols in the math formulas. In Eq. (1), x_1 and p(x_n|y) are not in the correct form.
- Authors’ response: We have corrected Eq. (1) so that the vectors and probabilities are now in the correct form. In addition, we explain the mathematical formulation of the multinomial naïve Bayes classifier in greater detail, as shown in Section 3.1.
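For illustration, the corrected form of Eq. (1), ŷ = argmax_y p(y) ∏_i p(x_i | y) over the word-count vector x = (x_1, …, x_n), can be sketched with scikit-learn. The example texts and labels below are hypothetical stand-ins, not our training data:

```python
# Minimal sketch of a multinomial naive Bayes sentiment classifier,
# illustrating y_hat = argmax_y p(y) * prod_i p(x_i | y).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "great project for the region",
    "huge waste of money",
    "fantastic investment in rail",
    "terrible cost overruns",
]
labels = [1, 0, 1, 0]  # 1 = non-negative, 0 = negative (hypothetical)

# Bag-of-words counts x = (x_1, ..., x_n) for each document
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = MultinomialNB()  # multinomial likelihood with Laplace smoothing
clf.fit(X, labels)

pred = clf.predict(vectorizer.transform(["waste of money"]))
print(pred[0])  # classified as negative (label 0)
```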
- the state-of-the-art methods are not clearly stated and the comparison between the proposed method and the SOTA is not verified. What is the significance of this work over previous work?
- Authors’ response: The state-of-the-art methods in natural language processing are now discussed in Section 1.3 and Section 3.2. Section 1.3 covers transformer architectures, and Section 3.2 discusses the state-of-the-art pre-trained model (RoBERTa). Our proposed method for sentiment analysis combines RoBERTa with two-layer bidirectional gated recurrent units and a fully connected layer with a softmax function. The performance of RoBERTa-BiGRU is compared with MNB and VADER, as suggested by the third reviewer, Dr. Kwadwo Agyapon-Ntra.
- in Section 3.2.2, on what machine do you train your model, and what is the time consumption?
- Authors’ response: Section 3.4 now includes the details of machine specifications. The RoBERTa model is trained on a Tesla T4 GPU with a total training time of 2421.23 seconds for 100 epochs. The Python Jupyter notebook is also available in the GitHub link, which is provided at the end of this article.
- the training results in Section 3.2.2 are not very good. Have you considered using deep neural networks to solve the problem?
- Authors’ response: The sentiment classifier now uses a state-of-the-art deep neural network, which achieves much better performance in terms of both accuracy (89.52%) and ROC curves (AUC score: 0.8904).
- how to evaluate the accuracy of sentiment analysis and topic modeling?
- Authors’ response: The performance of the sentiment classifier is quantified and evaluated against MNB and VADER in terms of both accuracy and ROC curves, as highlighted in Section 3.4. Topic modelling is an unsupervised machine learning technique; we use LDA to generate topics. The performance of LDA topic modelling can be evaluated by manual inspection, the details of which are covered in Section 3.5.
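For concreteness, accuracy and ROC/AUC can be computed as sketched below with scikit-learn. The labels and predicted probabilities are hypothetical, not the results reported in Section 3.4:

```python
# Hedged sketch of quantifying classifier performance with accuracy
# and an ROC curve / AUC score.
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                      # ground-truth labels
y_score = [0.1, 0.65, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2]    # predicted P(non-negative)
y_pred = [1 if s >= 0.5 else 0 for s in y_score]       # threshold at 0.5

acc = accuracy_score(y_true, y_pred)        # fraction of correct labels
auc = roc_auc_score(y_true, y_score)        # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print(acc, auc)  # 0.875 0.9375
```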
Comment from Dr. Chrisina Jayne
The paper investigates sentiment analysis and topic modelling using machine learning based on Twitter data. It considers a specific topic related to the UK railway project, High Speed 2. The paper compares Multinomial Naïve Bayes and Support Vector Machine for sentiment analysis of tweets. Topic modelling was conducted with Latent Dirichlet Allocation (LDA) using publicly available scripts. Experiments, discussion, and results are presented. The paper is written well, and sufficient background is included. The references are appropriate but some more recent ones could have been included. The paper provides insights into the feasibility of using social media data for public opinion evaluation of civil infrastructure projects. The study's contribution lies in presenting a public opinion evaluation framework with a machine learning algorithm and comparing the accuracy of two classifiers.
We would like to thank Dr. Chrisina Jayne for the feedback on our work. More recent references, such as transformer models and the latest developments in natural language processing, are now included.
Comment from Dr. Kwadwo Agyapon-Ntra
In this study, the authors successfully conduct an analysis of public opinions on a public infrastructure project (the High Speed 2 railway project in the United Kingdom) using data from Twitter and machine learning algorithms. The approach is sound, but the methodology could do with the adoption of SOTA models, benchmarking against very basic models, and handling potentially imbalanced datasets. Details are provided below:
We would like to thank Dr. Kwadwo Agyapon-Ntra for the critical review comments. These comments help us improve the manuscript significantly.
- Deep learning techniques, especially those that employ transformer architectures, are the current SOTA. While methods like Naive Bayes, SVMs, and LDA are still very useful, it would be prudent to compare with the results from transformer-based deep learning architectures. Neural networks in the transformer family fine-tuned for specific tasks like classification have proven to be a very promising research direction in recent years, and some models like twitter-roberta-base-sentiment can be used out of the box. Since these deep learning architectures transform text into numerical embeddings that preserve semantic context to a degree, they reduce the amount of pre-processing that has to be done on tweets (like stemming and stop-word removal).
- Authors’ response: The sentiment classifier is now based on RoBERTa transformer encoders. The outputs of the RoBERTa model are fine-tuned by two-layer bidirectional gated recurrent units and a fully connected layer with a softmax function. The details of the sentiment classifier are presented in Section 2.2.
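The classification head described above can be sketched in PyTorch as follows. This is a minimal sketch, not the manuscript's implementation: the hidden size is an illustrative assumption, and a random tensor stands in for actual RoBERTa hidden states:

```python
# Hedged sketch: two-layer bidirectional GRU over RoBERTa token
# embeddings, followed by a fully connected layer with softmax.
import torch
import torch.nn as nn

class BiGRUHead(nn.Module):
    def __init__(self, embed_dim=768, hidden=128, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # forward + backward states

    def forward(self, x):                  # x: (batch, seq_len, embed_dim)
        out, _ = self.gru(x)
        logits = self.fc(out[:, -1, :])    # representation at the last time step
        return torch.softmax(logits, dim=-1)

head = BiGRUHead()
fake_roberta_output = torch.randn(4, 32, 768)  # stand-in for RoBERTa hidden states
probs = head(fake_roberta_output)
print(probs.shape)  # (4, 2): class probabilities per tweet
```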
- Another good tool to consider for establishing baselines is the VADER sentiment analysis model, which was developed specifically for social media use-cases. In the worst case it can serve as a reasonable baseline, since it requires no training.
- Authors’ response: We greatly appreciate this suggestion. VADER is now used as a baseline against the proposed RoBERTa-BiGRU. The performance of the sentiment classifiers is compared with respect to accuracy and ROC curves.
- Steps should be taken to address dataset imbalance. If any such steps were taken, they were not stated. This can cause issues for a classifier, such as overfitting to a label with an overwhelmingly higher representation. The F1 score is a good metric for catching this, but it might be better to train on a balanced dataset.
- Authors’ response: The scarcity of positive tweets about HS2 leads to imbalanced training data. We addressed this issue with two approaches. First, we increased the training dataset from around 900 to 1,400 tweets, which included more positive tweets. Second, the classification task was reformulated as binary sentiment classification, with negative (700 tweets) and non-negative (700 tweets) classes. As a result, the more balanced training dataset led to better performance of the sentiment classifiers: the MNB classifier accuracy increased from 64% to 82.62%, even with the same structure.
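We expanded the dataset by labelling more tweets rather than discarding data, but for illustration, a common alternative route to a balanced 700/700 split is undersampling the majority class. The counts below are illustrative placeholders:

```python
# Hedged sketch: balancing a binary sentiment dataset by undersampling
# the majority class to match the minority class.
import random

random.seed(0)
negative = [("neg tweet %d" % i, 0) for i in range(1100)]     # majority class
non_negative = [("pos tweet %d" % i, 1) for i in range(700)]  # minority class

n = min(len(negative), len(non_negative))
balanced = random.sample(negative, n) + random.sample(non_negative, n)
random.shuffle(balanced)

counts = {0: 0, 1: 0}
for _, label in balanced:
    counts[label] += 1
print(counts)  # {0: 700, 1: 700}
```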
In addition to the revisions mentioned in our responses to the reviewers’ comments, the following changes were made:
- we updated all the figures so that they are more consistent and clearer
- we merged the literature review into Section 1. Section 2 now exclusively discusses the details of the machine learning algorithms. Section 3 now presents a case study of HS2, where Section 3.1 covers the background of HS2 in more detail and Section 3.2 covers data collection and processing. These modifications improve the clarity and delivery of the article.
- we rewrote the abstract and now highlight the research motivation and main contributions in Section 1.4. This change helps multidisciplinary readers comprehend our manuscript.