Journal article
Development of LLM-assisted data curation tools for the Starrydata materials science database
Yukari Katsura (author)
;
Tomoya Mato (author)
ORCID https://orcid.org/0000-0002-0918-6468
Center for Basic Research on Materials, National Institute for Materials Science
;
Yu Takada (author)
ORCID https://orcid.org/0009-0002-0709-1817
Center for Basic Research on Materials, National Institute for Materials Science
;
Eiji Koyama (author)
Center for Basic Research on Materials, National Institute for Materials Science (NIMS)
;
Dewi Yana (author)
Center for Basic Research on Materials, National Institute for Materials Science (NIMS)
;
Atsumi Tanaka (author)
Center for Basic Research on Materials, National Institute for Materials Science (NIMS)
;
Masaya Kumagai (author)
Sakura Internet Research Center, Sakura Internet Inc.

Citation
Yukari Katsura, Tomoya Mato, Yu Takada, Eiji Koyama, Dewi Yana, Atsumi Tanaka, Masaya Kumagai. Development of LLM-assisted data curation tools for the Starrydata materials science database. Science and Technology of Advanced Materials: Methods. 2025, 5 (1), 2590811. https://doi.org/10.1080/27660400.2025.2590811

Description:

(abstract)

We developed two LLM-assisted tools to accelerate data collection from materials science publications for the Starrydata database. The first tool, Starrydata Auto-Suggestion, uses lightweight models to generate, from abstracts and methods, concise English descriptions that conform to our database schema; it is integrated into the Starrydata2 platform. The second tool is a dual-component system: Auto-Summary GPT processes PDF files to generate comprehensive JSON output capturing all figures, tables, and samples, while Auto-Summary Viewer transforms this output into interactive tables for efficient curator review. These tools enhance curation efficiency and advance automated scientific database construction.
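
The JSON-to-table step described in the abstract can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration: the key names (`figures`, `tables`, `samples`, `caption`, `description`), the placeholder values, and the helper functions `to_rows` and `render` are hypothetical and are not taken from Auto-Summary GPT or Auto-Summary Viewer. The sketch only shows the general idea of flattening a nested per-paper summary into one row per figure/table and sample pair, which is the kind of view a curator could scan quickly.

```python
import json

# Hypothetical per-paper summary. The key names and values are placeholders
# chosen for illustration, not the schema produced by Auto-Summary GPT.
SUMMARY_JSON = """
{
  "figures": [
    {"id": "Fig. 1", "caption": "Example caption A", "samples": ["S1", "S2"]},
    {"id": "Fig. 2", "caption": "Example caption B", "samples": ["S2"]}
  ],
  "tables": [
    {"id": "Table 1", "caption": "Example caption C", "samples": ["S1"]}
  ],
  "samples": [
    {"id": "S1", "description": "placeholder sample 1"},
    {"id": "S2", "description": "placeholder sample 2"}
  ]
}
"""

COLUMNS = ["item", "caption", "sample", "description"]


def to_rows(summary):
    """Flatten figures and tables into one row per (item, sample) pair."""
    descriptions = {s["id"]: s["description"] for s in summary.get("samples", [])}
    rows = []
    for kind in ("figures", "tables"):
        for item in summary.get(kind, []):
            for sample_id in item.get("samples", []):
                rows.append({
                    "item": item["id"],
                    "caption": item["caption"],
                    "sample": sample_id,
                    # Unknown sample ids get an empty description instead of failing.
                    "description": descriptions.get(sample_id, ""),
                })
    return rows


def render(rows):
    """Render rows as a plain-text table: one header line, then one line per row."""
    widths = {c: max([len(c)] + [len(r[c]) for r in rows]) for c in COLUMNS}
    lines = [" | ".join(c.ljust(widths[c]) for c in COLUMNS)]
    for r in rows:
        lines.append(" | ".join(r[c].ljust(widths[c]) for c in COLUMNS))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(to_rows(json.loads(SUMMARY_JSON))))
```

A flat row layout like this is easy to sort and filter, which is one plausible reason to convert nested JSON into tables before human review; the actual viewer may organize its tables differently.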

Keywords: Materials informatics, materials database, data curation, literature data mining, automated data extraction, automated knowledge extraction, large language model

Date published: 2025-12-31

Publisher: Informa UK Limited

Journal:

  • Science and Technology of Advanced Materials: Methods (ISSN: 2766-0400), vol. 5, issue 1, 2590811

Funding:

  • JST-CREST Grant JPMJCR19J1
  • Kazuchika Okura Memorial Foundation

Manuscript type: Publisher's version (Version of record)

First published URL: https://doi.org/10.1080/27660400.2025.2590811

Updated at: 2026-01-20 09:35:27 +0900

Published on MDR: 2026-01-20 12:23:04 +0900