KAWANO, Hiroyuki (Ridgelinez Limited);
SATO, Fumitaka (Ridgelinez Limited);
YOSHITAKE, Michiko (MaDIS, National Institute for Materials Science);
MOTEKI, Fuma (Ridgelinez Limited);
TERAOKA, Hiroshi (Ridgelinez Limited)
Description:
(abstract) A BERT (Bidirectional Encoder Representations from Transformers) model, which we named “MaterialBERT,” has been generated using scientific papers from a wide area of materials science as a corpus. A new vocabulary list for the tokenizer was generated from the materials science corpus. Two BERT models were generated with different tokenizer vocabulary lists: one with the original list made by Google and the other with the list newly made by the authors. Word vectors embedded during pre-training with the two MaterialBERT models reasonably reflect the meanings of material names, both in material-class clustering and in the relationship between base materials and their compounds or derivatives, for inorganic materials as well as organic materials and organometallic compounds. Fine-tuning on CoLA (The Corpus of Linguistic Acceptability) with the pre-trained MaterialBERT yielded a higher score than the original BERT.
MaterialBERT could be used as a starting point for generating a narrower domain-specific BERT model in the materials science field by transfer learning.
Rights:
Keyword: word embedding, pre-training, BERT, literal information
Date published:
Publisher: National Institute for Materials Science
Journal:
Funding:
Manuscript type: Not a journal article
MDR DOI:
First published URL: https://doi.org/10.51094/jxiv.119
Related item:
Other identifier(s):
Contact agent:
Updated at: 2023-01-25 00:11:19 +0900
Published on MDR: 2025-04-14 17:02:37 +0900
Filename | Type | Size
---|---|---
journal_list.xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 22.4 KB
MaterialBERT_README__20220808.md | text/markdown | 5.7 KB
MaterialBERT_Pre-trained_Model.zip | application/zip | 1020 MB
MaterialBERT_Jxiv_complete.pdf | application/pdf | 1.66 MB
MaterialBERT_Dict_Pre-trained_Model.zip | application/zip | 1.14 GB
Jxiv_article.zip | application/zip | 1.66 MB