Article Mining experimental data from materials science literature with large language models: an evaluation study

Luca Foppiano; Guillaume Lambard; Toshiyuki Amagasa; Masashi Ishii

Citation
Luca Foppiano, Guillaume Lambard, Toshiyuki Amagasa, Masashi Ishii. Mining experimental data from materials science literature with large language models: an evaluation study. Science and Technology of Advanced Materials: Methods. 2024, 4 (1). https://doi.org/10.1080/27660400.2024.2356506

Description:

(abstract)

This study assesses the capabilities of large language models (LLMs) such as GPT-3.5-Turbo, GPT-4, and GPT-4-Turbo in extracting structured information from scientific documents in materials science. To this end, we focus primarily on (i) named entity recognition (NER) of studied materials and physical properties and (ii) relation extraction (RE) between these entities. The performance of the LLMs on these tasks is benchmarked against traditional models, namely BERT and rule-based approaches. Notably, GPT-4 and GPT-4-Turbo display remarkable reasoning and relation extraction capabilities after being provided with only a couple of examples.

Keywords: Large language models, benchmark, NER, TDM, evaluation, materials science

Date published: 2024-12-31

Publisher: Informa UK Limited

Journal:

  • Science and Technology of Advanced Materials: Methods (ISSN: 2766-0400), vol. 4, issue 1

Funding:

  • Research and Development JPMXP1122715503

Manuscript type: Publisher's version (Version of record)

First published URL: https://doi.org/10.1080/27660400.2024.2356506

Updated at: 2024-10-24 16:30:23 +0900

Published on MDR: 2024-10-24 16:30:24 +0900