      A deep learning approach for Named Entity Recognition in Urdu language

      research-article


          Abstract

          Named Entity Recognition (NER) is a natural language processing task that has been widely explored for many languages over the past decade, but it remains under-researched for Urdu because of the language's rich morphology and complexity. Existing state-of-the-art studies on Urdu NER use various deep learning approaches with automatic feature selection based on word embeddings. This paper presents a deep learning approach for Urdu NER that uses FastText and Floret word embeddings to capture the surrounding context of words for improved feature extraction. Publicly available pre-trained FastText and Floret embeddings for Urdu are used to generate feature vectors for four benchmark Urdu datasets. These features are then used as input to train various combinations of Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Conditional Random Field (CRF) models. The results show that the proposed approach significantly outperforms existing state-of-the-art studies on Urdu NER, achieving an F-score of up to 0.98 when using BiLSTM+GRU with Floret embeddings. Error analysis shows a low classification error rate, ranging from 1.24% to 3.63% across the datasets, which indicates the robustness of the proposed approach.
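
The abstract relies on FastText-style embeddings, which represent a word through its character n-grams, so rare or inflected forms in a morphologically rich language still share pieces with known words. The following is a minimal sketch of that subword idea only, not the authors' pipeline. The function names `char_ngrams` and `bucket_ids`, the default n-gram range, the bucket count, and the use of CRC32 as the hash are all assumptions made for illustration (FastText itself uses a different hash function).

```python
import zlib


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word` after wrapping it in '<' and '>'.

    The boundary markers let prefixes and suffixes be told apart from
    word-internal n-grams.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def bucket_ids(word, num_buckets=2_000_000):
    """Map the whole word plus its n-grams to hashed bucket indices.

    CRC32 is a stand-in hash chosen for this sketch (an assumption).
    A word vector would then be built from the embedding rows at these indices.
    """
    units = [f"<{word}>"] + char_ngrams(word)
    return [zlib.crc32(u.encode("utf-8")) % num_buckets for u in units]


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
    print(len(bucket_ids("where")))
```

Because Python strings are indexed by code point, the same function can be applied to Urdu tokens without modification. Hashing into a fixed number of buckets keeps the embedding table bounded regardless of vocabulary size, at the cost of occasional collisions.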

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Writing – original draft
                Roles: Conceptualization, Formal analysis, Writing – original draft
                Roles: Data curation, Formal analysis, Methodology
                Roles: Methodology, Software, Visualization
                Roles: Investigation, Resources, Visualization
                Roles: Funding acquisition, Investigation, Project administration
                Roles: Resources, Software, Validation
                Roles: Supervision, Validation, Writing – review & editing
                Role: Editor
                Journal
                PLOS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 28 March 2024
                Volume 19, Issue 3, e0300725
                Affiliations
                [1 ] Department of Computer Science, COMSATS University Islamabad, Lahore, Pakistan
                [2 ] Department of Computer Science, Government College University, Lahore, Pakistan
                [3 ] Department of Signal Theory, Communications and Telematics Engineering, University of Valladolid, Valladolid, Spain
                [4 ] Universidad Europea del Atlántico, Santander, Spain
                [5 ] Universidad Internacional Iberoamericana, Arecibo, Puerto Rico, United States of America
                [6 ] Universidade Internacional do Cuanza, Cuito, Bié, Angola
                [7 ] Universidad Internacional Iberoamericana Campeche, México
                [8 ] Fundación Universitaria Internacional de Colombia Bogotá, Bogotá, Colombia
                [9 ] Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, Korea
                University of Sargodha, PAKISTAN
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-0114-0887
                https://orcid.org/0000-0003-3134-7720
                https://orcid.org/0000-0002-8271-6496
                Article
                Manuscript ID: PONE-D-23-41659
                DOI: 10.1371/journal.pone.0300725
                PMCID: PMC10977791
                PMID: 38547173
                3fb749a5-a6ee-4911-bc4b-138fbe86d426
                © 2024 Anam et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 19 December 2023
                Accepted: 1 March 2024
                Page count
                Figures: 3, Tables: 5, Pages: 21
                Funding
                Funded by: the European University of the Atlantic
                This research was supported by the European University of the Atlantic. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Language
                Biology and Life Sciences > Psychology > Cognitive Psychology > Language
                Social Sciences > Psychology > Cognitive Psychology > Language
                Computer and Information Sciences > Information Technology > Natural Language Processing > Word Embedding
                Social Sciences > Linguistics > Semantics
                Social Sciences > Linguistics > Linguistic Morphology
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Deep Learning
                Social Sciences > Linguistics > Grammar > Syntax
                Computer and Information Sciences > Software Engineering > Programming Languages
                Engineering and Technology > Software Engineering > Programming Languages
                Computer and Information Sciences > Information Technology > Natural Language Processing > Named Entity Recognition
                Custom metadata
                The datasets used in this study are publicly available at the following link: https://github.com/tahirmuhammad/Urdu-NER-Datasets.
