
Opportunities and challenges of text mining in materials research

Review article


          Abstract

Research publications are the major repository of scientific knowledge. However, their unstructured and highly heterogeneous format creates a significant obstacle to large-scale analysis of the information they contain. Recent progress in natural language processing (NLP) has provided a variety of tools for high-quality information extraction from unstructured text. These tools are primarily trained on non-technical text and struggle to produce accurate results when applied to scientific text with its specialized technical terminology. In recent years, significant information-retrieval efforts have been made for biomedical and biochemical publications, whereas for materials science, text-mining (TM) methodology is still at the dawn of its development. In this review, we survey recent progress in creating and applying TM and NLP approaches to the materials science field. The review is directed at the broad class of researchers aiming to learn the fundamentals of TM as applied to materials science publications.
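To make the kind of structured output that TM pipelines target concrete, the short Python sketch below is an illustration added to this record, not code from the article: it uses rough regular-expression patterns to pull candidate chemical formulas and a synthesis temperature out of a single sentence. The example sentence, patterns, and field names are assumptions for demonstration only; production pipelines rely on trained named-entity recognition models rather than hand-written rules.

# Minimal illustration (not from the article): a rule-based pass that turns
# one unstructured sentence into a small structured record of the kind that
# materials-science TM pipelines aim to produce.
import re

sentence = ("LiFePO4 powders were synthesized by solid-state reaction "
            "at 700 C for 12 h under Ar flow.")

# Very rough pattern for multi-element formulas such as LiFePO4 (illustrative only).
formula_pattern = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")
temperature_pattern = re.compile(r"(\d+)\s*C\b")

record = {
    "formulas": formula_pattern.findall(sentence),
    "temperatures_C": [int(t) for t in temperature_pattern.findall(sentence)],
}
print(record)  # {'formulas': ['LiFePO4'], 'temperatures_C': [700]}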

          Graphical Abstract


          Data Analysis; Computing Methodology; Computational Materials Science; Materials Design

          Related collections

Most cited references: 126


          Long Short-Term Memory

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade-correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
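As a concrete illustration of the gating described above, the NumPy sketch below (added here, not code from the cited paper; the weight shapes, initialization, and toy sequence are assumptions) implements one forward step of a standard LSTM cell, with the additive cell-state update playing the role of the constant error carousel.

# Hedged sketch: one forward step of a standard LSTM cell in NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x      : input vector        (n_in,)
    h_prev : previous hidden     (n_hid,)
    c_prev : previous cell state (n_hid,)  -- the 'constant error carousel'
    W      : weights             (4*n_hid, n_in + n_hid)
    b      : biases              (4*n_hid,)
    """
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                  # candidate cell update
    c = f * c_prev + i * g                          # additive cell-state update
    h = o * np.tanh(c)                              # gated output
    return h, c

# Toy usage with assumed sizes and random data (illustrative only).
n_in, n_hid = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):               # a 5-step toy sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (16,) (16,)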

            Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
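The core operation the Transformer builds on is scaled dot-product attention. The NumPy sketch below (an illustration added to this record, not code from the cited paper; the tensor shapes and random inputs are assumptions) computes softmax(QK^T / sqrt(d_k)) V for a single attention head.

# Hedged sketch: scaled dot-product attention in NumPy.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Toy usage with assumed sizes and random data (illustrative only).
rng = np.random.default_rng(0)
n_q, n_k, d_k, d_v = 4, 6, 8, 8
Q, K = rng.normal(size=(n_q, d_k)), rng.normal(size=(n_k, d_k))
V = rng.normal(size=(n_k, d_v))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)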

              Rethinking the Inception Architecture for Computer Vision


                Author and article information

                Contributors
Journal
iScience (Elsevier), ISSN 2589-0042
Published online: 06 February 2021; issue date: 19 March 2021
Volume 24, Issue 3, Article 102155
                Affiliations
                [1 ]Department of Materials Science & Engineering, University of California, Berkeley, CA 94720, USA
                [2 ]Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
                [3 ]Department of Materials Science & Engineering, MIT, Cambridge, MA 02139, USA
                Author notes
Corresponding author: gceder@berkeley.edu
                Article
PII: S2589-0042(21)00123-1
DOI: 10.1016/j.isci.2021.102155
PMC: 7905448
PMID: 33665573
                © 2021 The Author(s)

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

                Categories
                Review

data analysis, computing methodology, computational materials science, materials design
