Paper: Automatic Identification and Normalisation of Physical Measurements in Scientific Literature

FOPPIANO, Luca ; ROMARY, Laurent ; ISHII, Masashi ; TANIFUJI, Mikiko

Collection

Citation
FOPPIANO, Luca, ROMARY, Laurent, ISHII, Masashi, TANIFUJI, Mikiko. Automatic Identification and Normalisation of Physical Measurements in Scientific Literature. https://doi.org/10.1145/3342558.3345411

Description:

(abstract)

We present Grobid-quantities, an open-source application for extracting and normalising measurements from scientific and patent literature. Tools of this kind, which aim to understand unstructured information and make it accessible, are the building blocks of large-scale Text and Data Mining (TDM) systems. Grobid-quantities is a module built on top of Grobid [6] [13], a machine learning framework for parsing and structuring PDF documents. Designed to process large quantities of data, it provides a robust implementation accessible in batch mode or via a REST API. The machine learning engine architecture follows the cascade approach, in which each model is specialised in resolving a specific task. The models are trained using the Conditional Random Field (CRF) algorithm [12] to extract quantities (atomic values, intervals and lists), units (such as those of length or weight) and different value representations (numeric, alphabetic or scientific notation). Identified measurements are normalised according to the International System of Units (SI). Thanks to its stable recall and reliable precision, Grobid-quantities has been integrated as the measurement-extraction engine in various TDM projects, such as Marve (Measurement Context Extraction from Text), which extracts semantic measurements and meaning in Earth Science [10]. At the National Institute for Materials Science in Japan (NIMS), it is used in an ongoing project to discover new superconducting materials. Normalised materials characteristics (such as critical temperature and pressure) extracted from scientific literature are a key resource for materials informatics (MI) [9].
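The cascade described in the abstract (find a quantity, parse its value representation, then normalise to SI) can be illustrated with a small sketch. This is not Grobid-quantities code: the CRF models are replaced here by a regular expression, only atomic values are handled (no intervals or lists), and the names `SI_TABLE`, `WORD_NUMBERS`, `Measurement`, `parse_value`, `normalise` and `extract` are hypothetical, as is the short conversion table.

```python
import re
from dataclasses import dataclass

# Hypothetical conversion table: unit symbol -> (SI unit, scale, offset).
SI_TABLE = {
    "m": ("m", 1.0, 0.0),
    "cm": ("m", 0.01, 0.0),
    "km": ("m", 1000.0, 0.0),
    "g": ("kg", 0.001, 0.0),
    "kg": ("kg", 1.0, 0.0),
    "K": ("K", 1.0, 0.0),
    "°C": ("K", 1.0, 273.15),
    "GPa": ("Pa", 1e9, 0.0),
}

# Hypothetical lookup for alphabetic values (lowercase only).
WORD_NUMBERS = {"one": 1.0, "two": 2.0, "three": 3.0, "ten": 10.0}


@dataclass
class Measurement:
    raw_value: str
    raw_unit: str
    si_value: float
    si_unit: str


# Longest symbols first, so a short symbol cannot shadow a longer one.
_UNITS = "|".join(re.escape(u) for u in sorted(SI_TABLE, key=len, reverse=True))
_WORDS = "|".join(WORD_NUMBERS)
_NUMBER = r"\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"

# Stage 1 (stand-in for the quantity and unit models): a value followed by a unit.
QUANTITY_RE = re.compile(
    rf"(?<![\w.])(?P<value>{_NUMBER}|{_WORDS})\s*(?P<unit>{_UNITS})\b"
)


def parse_value(text: str) -> float:
    """Stage 2: resolve numeric, scientific (e-notation) or alphabetic values."""
    if text in WORD_NUMBERS:
        return WORD_NUMBERS[text]
    return float(text)


def normalise(value: float, unit: str) -> tuple[float, str]:
    """Stage 3: convert a parsed value to its SI unit via the lookup table."""
    si_unit, scale, offset = SI_TABLE[unit]
    return value * scale + offset, si_unit


def extract(text: str) -> list[Measurement]:
    """Run the three stages in sequence over a plain-text string."""
    results = []
    for match in QUANTITY_RE.finditer(text):
        raw_value, raw_unit = match["value"], match["unit"]
        si_value, si_unit = normalise(parse_value(raw_value), raw_unit)
        results.append(Measurement(raw_value, raw_unit, si_value, si_unit))
    return results


if __name__ == "__main__":
    sample = "The sample became superconducting at 39 K under 1.5 GPa."
    for measurement in extract(sample):
        print(measurement)
```

Each stage consumes only the output of the previous one, which is the point of a cascade: a stage can be retrained or replaced (for instance, swapping the regular expression for a sequence-labelling model) without touching the others.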

Rights:

Keywords: tdm, physical quantities, machine learning

Date published: 2019-09-23

Publisher: Association for Computing Machinery

Journal:

Funding:

Manuscript type: Pre-review manuscript (Author's original)

MDR DOI: https://doi.org/10.48505/nims.3039

Published URL: https://doi.org/10.1145/3342558.3345411

Related materials:

Other identifiers:

Contact:

Updated at: 2022-10-03 02:00:14 +0900

Published on MDR at: 2021-08-13 01:20:03 +0900

File name Size
main.pdf (application/pdf) 506KB