Paper: MaterialBERT for Natural Language Processing of Materials Science Texts

KAWANO, Hiroyuki (Ridgelinez Limited) ; SATO, Fumitaka (Ridgelinez Limited) ; YOSHITAKE, Michiko (MaDIS, National Institute for Materials Science) ; MOTEKI, Fuma (Ridgelinez Limited) ; TERAOKA, Hiroshi (Ridgelinez Limited)

Collection

Citation
KAWANO, Hiroyuki, SATO, Fumitaka, YOSHITAKE, Michiko, MOTEKI, Fuma, TERAOKA, Hiroshi. MaterialBERT for Natural Language Processing of Materials Science Texts.

Description:

(abstract)

A BERT (Bidirectional Encoder Representations from Transformers) model, which we named “MaterialBERT,” was generated using scientific papers from a wide area of materials science as a corpus. A new vocabulary list for the tokenizer was also generated from the materials science corpus. Two BERT models with different tokenizer vocabulary lists were generated: one using the original vocabulary made by Google and the other using the vocabulary newly made by the authors. Word vectors embedded during pre-training with the two MaterialBERT models reasonably reflect the meanings of material names, both in material-class clustering and in the relationship between base materials and their compounds or derivatives, for not only inorganic materials but also organic materials and organometallic compounds. Fine-tuning on CoLA (the Corpus of Linguistic Acceptability) using the pre-trained MaterialBERT showed a higher score than the original BERT.
MaterialBERT could be used as a starting point for generating a narrower domain-specific BERT model in the materials science field by transfer learning.
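
As a rough illustration of how word vectors can reflect relationships between material names, the sketch below ranks names by cosine similarity. It is a minimal example under stated assumptions, not the authors' evaluation code: the three-dimensional vectors, the dictionary `toy_vectors`, and the helper `nearest` are hypothetical placeholders, whereas real vectors would be taken from the embedding layer of a pre-trained model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, invented for illustration only.
# They are NOT values taken from MaterialBERT.
toy_vectors = {
    "iron": [0.9, 0.1, 0.0],
    "iron oxide": [0.8, 0.3, 0.1],
    "polyethylene": [0.1, 0.2, 0.9],
}


def nearest(name, vectors):
    """Return the other name whose vector is most similar to that of `name`."""
    query = vectors[name]
    candidates = (other for other in vectors if other != name)
    return max(candidates, key=lambda other: cosine_similarity(query, vectors[other]))


if __name__ == "__main__":
    for material in toy_vectors:
        print(material, "->", nearest(material, toy_vectors))
```

With vectors extracted from an actual model, the same ranking could be used to check whether a base material sits closer to its compounds or derivatives than to materials of an unrelated class.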

Rights:

Keywords: word embedding, pre-training, BERT, literal information

Publication date:

Publisher: National Institute for Materials Science

Journal:

Funding:

Manuscript type: Data other than a paper

MDR DOI:

Public URL: https://doi.org/10.51094/jxiv.119

Related materials:

Other identifiers:

Contact:

Updated at: 2023-01-25 00:11:19 +0900

Published on MDR at: 2025-04-14 17:02:37 +0900

File name                                  Type                                                                 Size
journal_list.xlsx                          application/vnd.openxmlformats-officedocument.spreadsheetml.sheet   22.4KB
MaterialBERT_README__20220808.md           text/markdown                                                        5.7KB
MaterialBERT_Pre-trained_Model.zip         application/zip                                                      1020MB
MaterialBERT_Jxiv_complete.pdf             application/pdf                                                      1.66MB
MaterialBERT_Dict_Pre-trained_Model.zip    application/zip                                                      1.14GB
Jxiv_article.zip                           application/zip                                                      1.66MB