# MaterialBERT for Natural Language Processing of Materials Science Texts

https://mdr.nims.go.jp/datasets/936f7bdd-e1eb-4100-929b-db71a00f8df9

## File

- [journal_list.xlsx](https://mdr.nims.go.jp/filesets/72876edf-f542-4400-ad8a-19c71163920d/download) ([Detail](https://mdr.nims.go.jp/filesets/72876edf-f542-4400-ad8a-19c71163920d.md))
- [MaterialBERT_README__20220808.md](https://mdr.nims.go.jp/filesets/3a22e6b8-22bd-4fc8-a304-29196fbb6f1e/download) ([Detail](https://mdr.nims.go.jp/filesets/3a22e6b8-22bd-4fc8-a304-29196fbb6f1e.md))
- [MaterialBERT_Pre-trained_Model.zip](https://mdr.nims.go.jp/filesets/980ae5f1-d71b-4d2c-83f9-0a49a84bd6be/download) ([Detail](https://mdr.nims.go.jp/filesets/980ae5f1-d71b-4d2c-83f9-0a49a84bd6be.md))
- [MaterialBERT_Jxiv_complete.pdf](https://mdr.nims.go.jp/filesets/a7fac00d-fbf7-4b4b-b053-19de9746f932/download) ([Detail](https://mdr.nims.go.jp/filesets/a7fac00d-fbf7-4b4b-b053-19de9746f932.md))
- [MaterialBERT_Dict_Pre-trained_Model.zip](https://mdr.nims.go.jp/filesets/1acf4293-d09c-4375-9450-d1a5f112ec53/download) ([Detail](https://mdr.nims.go.jp/filesets/1acf4293-d09c-4375-9450-d1a5f112ec53.md))
- [Jxiv_article.zip](https://mdr.nims.go.jp/filesets/36fa86e1-dc48-45b2-aa9d-2e59835dfe17/download) ([Detail](https://mdr.nims.go.jp/filesets/36fa86e1-dc48-45b2-aa9d-2e59835dfe17.md))

## Id

936f7bdd-e1eb-4100-929b-db71a00f8df9

## Local identifier

identifier: mdr-schema-yaml/pc289n449

## Visibility

open_to_public

## State

published

## Created at

2023-01-24T14:57:36.202430Z

## Updated at

2025-04-14T23:30:11.210882Z

## Published at

2025-04-14T08:02:37.297343Z

## Doi

https://doi.org/10.48505/nims.3705

## First published url

https://doi.org/10.51094/jxiv.119

## Date published



## Recorded date published



## Resource type

journal_article

## Manuscript type

na

## Collection



## Title

- title: MaterialBERT for Natural Language Processing of Materials Science Texts
  title_type: original
  lang: und

## Description

- description: "A BERT (Bidirectional Encoder Representations from Transformers) model,
    which we named “MaterialBERT,” has been generated using scientific papers from a wide
    range of materials science fields as a corpus. A new vocabulary list for the tokenizer was
    generated from this materials science corpus. Two BERT models with different vocabulary
    lists for the tokenizer were generated: one with the original list made by Google and the other
    with the list newly made by the authors. Word vectors embedded during the pre-training
    with the two MaterialBERT models reasonably reflect the meanings of materials
    names in material-class clustering and in the relationship between base materials
    and their compounds or derivatives for not only inorganic materials but also organic
    materials and organometallic compounds. Fine-tuning with CoLA (The Corpus of Linguistic
    Acceptability) using the pre-trained MaterialBERT showed a higher score than the
    original BERT.\r\nMaterialBERT could be used as a starting point for generating
    a narrower domain-specific BERT model in the materials science field by transfer learning.\r\n"
  description_type: abstract
  lang: und
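
The abstract distinguishes two models by the vocabulary list their tokenizer uses. As a rough illustration of why that matters, the sketch below implements greedy longest-match-first WordPiece splitting, the scheme BERT tokenizers are generally understood to use. The two toy vocabularies and the function name `wordpiece` are hypothetical and are not taken from the MaterialBERT files.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into sub-word pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies (assumptions for illustration only).
general_vocab = {"per", "##ov", "##sk", "##ite"}
domain_vocab = general_vocab | {"perovskite"}

print(wordpiece("perovskite", general_vocab))
print(wordpiece("perovskite", domain_vocab))
```

With the toy general-purpose vocabulary the material name is broken into four fragments, while the toy domain vocabulary keeps it as a single token. This is the kind of difference a vocabulary built from a materials science corpus is meant to produce.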

## Creator

- name: KAWANO, Hiroyuki
  role: author
  organization: Ridgelinez Limited
- name: SATO, Fumitaka
  role: author
  organization: Ridgelinez Limited
- name: YOSHITAKE, Michiko
  role: author
  orcid: https://orcid.org/0000-0002-0973-5666
  organization: National Institute for Materials Science
  department: MaDIS
  ror: https://ror.org/026v1ze26
- name: MOTEKI, Fuma
  role: operator
  organization: Ridgelinez Limited
- name: TERAOKA, Hiroshi
  role: author
  organization: Ridgelinez Limited

## Contact agent



## Publisher

organization: National Institute for Materials Science
ror: https://ror.org/026v1ze26

## Managing organization

organization: 0 ~ NIMS

## Keyword

- subject: word embedding
  schema: not_defined
- subject: pre-training
  schema: not_defined
- subject: BERT
  schema: not_defined
- subject: literal information
  schema: not_defined

## Rights

- description: Creative Commons BY-ND Attribution-NoDerivatives 4.0 International
  identifier: https://creativecommons.org/licenses/by-nd/4.0/

## Other identifier(s)



## Data origin



## Embargo



## Journal



## Conference



## Related item



## Funding



## Instrument



## Instrument operator



## Instrument managing organization



## Measurement method



## Specimen



## Chemical composition



## Structure for specimen



## Structural feature for specimen



## Specific property for specimen



## Process for specimen treatment



## Computational method



## Energy level/transition state



## Software



## Custom property



## Fileset

- id: 72876edf-f542-4400-ad8a-19c71163920d
  filename: journal_list.xlsx
  content_type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  size: 22947
  md5: 4f710ce63e0f5b94db2e78b00e2fd118
- id: 3a22e6b8-22bd-4fc8-a304-29196fbb6f1e
  filename: MaterialBERT_README__20220808.md
  content_type: text/markdown
  size: 5833
  md5: 7cf0ac01e202ee65f671f53bfa502b37
- id: 980ae5f1-d71b-4d2c-83f9-0a49a84bd6be
  filename: MaterialBERT_Pre-trained_Model.zip
  content_type: application/zip
  size: 1064632177
  md5: 4350fddfa7402527b7938c1a81e14aaa
- id: a7fac00d-fbf7-4b4b-b053-19de9746f932
  filename: MaterialBERT_Jxiv_complete.pdf
  content_type: application/pdf
  size: 1742842
  md5: 4b856f2737927470420e541976838ab2
- id: 1acf4293-d09c-4375-9450-d1a5f112ec53
  filename: MaterialBERT_Dict_Pre-trained_Model.zip
  content_type: application/zip
  size: 1227420350
  md5: 9bd4d23f1551aba15836a3278aa410eb
- id: 36fa86e1-dc48-45b2-aa9d-2e59835dfe17
  filename: Jxiv_article.zip
  content_type: application/zip
  size: 1745303
  md5: d5015ce281abb929ffc6b85b6985fb4c
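
The `size` and `md5` values above can be used to check a download for corruption. The following is a minimal sketch, assuming the files were saved locally under their original filenames; the helper names `md5_of` and `verify` are illustrative and not part of any MDR tooling.

```python
import hashlib
from pathlib import Path

# (size in bytes, md5) copied from the Fileset entries in this record.
EXPECTED = {
    "journal_list.xlsx": (22947, "4f710ce63e0f5b94db2e78b00e2fd118"),
    "MaterialBERT_README__20220808.md": (5833, "7cf0ac01e202ee65f671f53bfa502b37"),
    "MaterialBERT_Pre-trained_Model.zip": (1064632177, "4350fddfa7402527b7938c1a81e14aaa"),
    "MaterialBERT_Jxiv_complete.pdf": (1742842, "4b856f2737927470420e541976838ab2"),
    "MaterialBERT_Dict_Pre-trained_Model.zip": (1227420350, "9bd4d23f1551aba15836a3278aa410eb"),
    "Jxiv_article.zip": (1745303, "d5015ce281abb929ffc6b85b6985fb4c"),
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's size and MD5 both match the values listed in this record."""
    path = Path(path)
    size, md5 = EXPECTED[path.name]
    return path.stat().st_size == size and md5_of(path) == md5
```

For example, `verify("MaterialBERT_Pre-trained_Model.zip")` would return `True` only for an intact copy of that archive. Reading in chunks matters here because both model archives are larger than 1 GB.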

## Thumbnail

fileset_id: 3a22e6b8-22bd-4fc8-a304-29196fbb6f1e
filename: MaterialBERT_README__20220808.md