Luca Foppiano (MaDIS, NIMS);
Pedro Baptista Castro (MANA, NIMS);
Pedro Ortiz Suarez (Data and Web Science Group, University of Mannheim);
Kensei Terashima (MANA, NIMS);
Yoshihiko Takano (MANA, NIMS);
Masashi Ishii (MaDIS, NIMS)
Description:
(abstract) The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40,324 materials and properties records from 37,700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method.
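The record layout described in the abstract can be pictured roughly as follows. This is only a minimal sketch: the class name, field names, and units (kelvin, gigapascal) are assumptions made for illustration and are not taken from the actual SuperCon2 schema or the Grobid-superconductors code. The sample values are likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MaterialRecord:
    """Hypothetical layout of one material/property record.

    Field names and units are illustrative assumptions, not the
    real SuperCon2 schema.
    """
    # Material (or sample) information
    name: Optional[str] = None
    formula: Optional[str] = None
    material_class: Optional[str] = None
    # Adjoined information characterizing the sample
    shape: Optional[str] = None
    doping: Optional[str] = None
    substitution_variables: Dict[str, str] = field(default_factory=dict)
    substrate: Optional[str] = None
    # Properties
    critical_temperature_k: Optional[float] = None   # Tc, assumed in kelvin
    applied_pressure_gpa: Optional[float] = None     # assumed in GPa, optional
    measurement_method: Optional[str] = None         # how Tc was measured

    def has_pressure(self) -> bool:
        """Return True when an applied pressure was recorded."""
        return self.applied_pressure_gpa is not None


# Illustrative values only; not an entry copied from the database.
record = MaterialRecord(name="MgB2", formula="MgB2", critical_temperature_k=39.0)
```

Making every field optional reflects the abstract's note that pressure and measurement method are present only "when available".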
Rights:
Keyword: Materials informatics, superconductors, machine learning, NLP, TDM
Date published: 2023-12-31
Publisher: Taylor & Francis
Journal:
Funding:
Manuscript type: Publisher's version (Version of record)
MDR DOI:
First published URL: https://doi.org/10.1080/27660400.2022.2153633
Related item:
Other identifier(s):
Contact agent:
Updated at: 2024-01-05 22:12:53 +0900
Published on MDR: 2023-12-27 08:30:15 +0900
Filename | Size | Type
---|---|---
Automatic extraction of materials and properties from superconductors scientific literature.pdf | 8.61 MB | application/pdf