Article CSIML: a cost-sensitive and iterative machine-learning method for small and imbalanced materials data sets

Shengzhou Li; Ayako Nakata

Collection

Citation
Shengzhou Li, Ayako Nakata. CSIML: a cost-sensitive and iterative machine-learning method for small and imbalanced materials data sets. Chemistry Letters. 2024, 53 (5), . https://doi.org/10.1093/chemle/upae090

Description:

(abstract)

Materials science research benefits from powerful machine-learning (ML) surrogate models, but it is also limited by the implicit requirement of ML for sufficiently large data sets with a balanced distribution. In this paper, we propose a model that obtains more credible results, as well as chemical knowledge, from small and imbalanced materials data sets. Using two imbalanced bandgap data sets as examples, we demonstrate the usability and performance of our model in comparison with common ML models combined with normal sampling and resampling methods.

Rights:

Keyword: cost-sensitive, iterative machine-learning method, small and imbalanced materials data sets, chemical knowledge, CSIML

Date published: 2024-05-02

Publisher: Oxford University Press (OUP)

Journal:

  • Chemistry Letters (ISSN: 0366-7022), vol. 53, issue 5

Funding:

  • JSPS JP20H05883
  • JSPS JP20H05878
  • JST PRESTO JPMJPR20T4

Manuscript type: Publisher's version (Version of record)

MDR DOI:

First published URL: https://doi.org/10.1093/chemle/upae090

Related item:

Other identifier(s):

Contact agent:

Updated at: 2024-11-28 16:30:28 +0900

Published on MDR: 2024-11-28 16:30:29 +0900

File: upae090 (1).pdf (application/pdf, 3.35 MB)