Journal article: Semi-automatic staging area for high-quality structured data extraction from scientific literature
Luca Foppiano (author)
ORCID: https://orcid.org/0000-0002-6114-6164
National Institute for Materials Science
Tomoya Mato (author)
Kensei Terashima (author)
Pedro Ortiz Suarez (author)
Taku Tou (author)
Wei-Sheng Wang (author)
ORCID: https://orcid.org/0009-0001-3572-5736 (unauthenticated)
National Institute for Materials Science
Toshiyuki Amagasa (author)
Yoshihiko Takano (author)
Masashi Ishii (author)
Collection

Citation
Luca Foppiano, Tomoya Mato, Kensei Terashima, Pedro Ortiz Suarez, Taku Tou, Chikako Sakai, Wei-Sheng Wang, Toshiyuki Amagasa, Yoshihiko Takano, Masashi Ishii. Semi-automatic staging area for high-quality structured data extraction from scientific literature. Science and Technology of Advanced Materials: Methods. 2023, 2286219. https://doi.org/10.1080/27660400.2023.2286219

Description:

(abstract)

We propose a semi-automatic staging area for efficiently building an accurate database of experimental physical properties of superconductors from the literature, called SuperCon2, to enrich the existing manually built superconductor database SuperCon. Here we report our curation interface (SuperCon2 Interface) and a workflow that manages the state transitions of each examined record, in order to validate the dataset of superconductors from PDF documents collected using Grobid-superconductors in a previous work. This curation workflow allows both automatic and manual operations. The automatic operations include ‘anomaly detection’, which scans new data to identify outliers, and a ‘training data collector’ mechanism, which collects training data examples based on manual corrections. This training data collection policy is effective in improving the machine-learning models with a reduced number of examples. For manual operations, the SuperCon2 Interface is designed to increase the efficiency of manual correction by providing a smart interface and an enhanced PDF document viewer. We show that our interface significantly improves curation quality, boosting precision and recall compared with traditional ‘manual correction’. Our semi-automatic approach would provide a solution for achieving a reliable database through text-data mining of scientific documents.

Rights:

Keywords: materials informatics, superconductors, machine learning, database, tdm

Publication date: 2023-12-31

Publisher: Informa UK Limited

Journal:

  • Science and Technology of Advanced Materials: Methods (ISSN: 27660400) 2286219

Funding:

  • Research and Development

Manuscript type: Publisher's version (Version of record)

MDR DOI:

Published URL: https://doi.org/10.1080/27660400.2023.2286219

Related materials:

Other identifiers:

Contact:

Last updated: 2024-07-11 16:30:24 +0900

Published in MDR: 2024-07-11 16:30:24 +0900

File name: Semi-automatic staging area for high-quality structured data extraction from scientific literature.pdf
File type: application/pdf
Size: 7.34 MB