A Two-stage Text Feature Selection Algorithm for Improving Text Classification


Abstract

As the number of digital text documents grows daily, text classification is becoming increasingly challenging. Each document contains a large number of words (features), which reduces the efficiency of a classification algorithm. This article presents an optimized feature selection algorithm that reduces the number of features in order to improve the accuracy of text classification. The proposed algorithm combines noun-based filtering with word ranking to enhance the performance of the text classification algorithm. Experiments on three benchmark datasets compare the proposed algorithm with Term Frequency-Inverse Document Frequency, Balanced Accuracy Measure, GINI Index, Information Gain, and Chi-Square; the proposed algorithm achieves the highest accuracy among the methods compared.
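The abstract does not spell out how the two stages are implemented, so the following is only a minimal sketch of the general idea, under stated assumptions. Stage 1 keeps noun tokens using a hypothetical hard-coded lexicon (`NOUNS`), where a real pipeline would rely on a part-of-speech tagger. Stage 2 ranks the surviving words with the Chi-Square statistic, one of the baselines named above, used here as a stand-in because the article's own ranking measure is not described on this page. All function names are illustrative.

```python
# Hypothetical toy noun lexicon (assumption); a real pipeline would use a POS tagger.
NOUNS = {"match", "goal", "team", "market", "stock", "bank"}


def noun_filter(tokens, nouns=NOUNS):
    """Stage 1: keep only the tokens found in the noun lexicon."""
    return [t for t in tokens if t in nouns]


def chi_square(term, docs, labels, target):
    """Chi-square score of a term against one class, from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == target:
            a += 1  # term present, document in target class
        elif present:
            b += 1  # term present, document in another class
        elif label == target:
            c += 1  # term absent, document in target class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Stage 1 then stage 2: filter to nouns, rank by max chi-square, keep top k."""
    filtered = [noun_filter(doc) for doc in docs]
    vocab = sorted({t for doc in filtered for t in doc})
    classes = sorted(set(labels))
    scores = {
        t: max(chi_square(t, filtered, labels, c) for c in classes) for t in vocab
    }
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Tiny illustrative corpus (made up for this sketch, not from the article).
docs = [
    "the team scored a late goal".split(),
    "the team won the match".split(),
    "the bank raised the market rate".split(),
    "stock market fell quickly".split(),
]
labels = ["sport", "sport", "finance", "finance"]

top_features = select_features(docs, labels, 2)
```

On this toy corpus the filter discards function words such as "the" before any scoring happens, so the ranking step only has to score six candidate words instead of the full vocabulary.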


Author and article information

Contributors

Gautam Srivastava

Thippa Reddy Gadekallu

Journal

Title: ACM Transactions on Asian and Low-Resource Language Information Processing

Abbreviated Title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Publisher: Association for Computing Machinery (ACM)

ISSN (Print): 2375-4699

ISSN (Electronic): 2375-4702

Publication date Created: May 2021

Publication date (Print): May 2021

Volume: 20

Issue: 3

Pages: 1-19

Affiliations

[1] Sri Ramachandra College of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai, Tamil Nadu

[2] Department of Mathematics and Computer Science, Brandon University; Research Center for Interneural Computing, China Medical University, Taichung, Taiwan, Republic of China

[3] School of Information Technology, VIT, Vellore, Tamil Nadu

Article

DOI: 10.1145/3425781

SO-VID: 0d4497ed-5035-4307-a05c-7e618f9245a8

