桂 ゆかり (National Institute for Materials Science); 熊谷 将也; 郡司 咲子; 今井 庸二; 木村 薫
Description:
(abstract) Although numerous papers are published each year, most of the experimental data they report are available only as two-dimensional plot images. Gathering these published experimental data into a database will accelerate data-driven materials science based on machine learning. Taking thermoelectric materials as a test case, we attempted to optimize three processes: collecting papers, extracting numeric data from plot images, and storing the data in a database on a per-sample basis. A search for the keyword “thermoelectric” returned a list of 47,936 papers. From these, we selected 18,471 papers as likely to contain thermoelectric properties, and succeeded in downloading 14,835 full-text PDF files. We developed a web system named “Starry data” to assist sequential data extraction from the images contained in those PDF files. The system also helps materials scientists annotate experimental samples efficiently, in order to build a descriptive database that can be used for machine learning of complex, sample-dependent materials properties.
Rights:
Keyword: materials informatics, materials database, data curation, thermoelectric materials
Date published: 2017-08-30
Publisher: Japan Society of Powder and Powder Metallurgy
Journal:
Funding:
Manuscript type: Author's version (Accepted manuscript)
MDR DOI: https://doi.org/10.48505/nims.4832
First published URL: https://doi.org/10.2497/jjspm.64.467
Related item:
Other identifier(s):
Contact agent:
Updated at: 2024-10-10 16:30:55 +0900
Published on MDR: 2024-10-10 16:30:55 +0900
Filename | Size
---|---
粉体粉末冶金協会特集号記事5(著者版).pdf (application/pdf) | 688 KB