Dataset: Machine extraction of polymer data from tables using XML versions of scientific articles

YOSHIZAWA, Atsushi; ISHII, Masashi; SHINDO, Hiroyuki; OKA, Hiroyuki; MATSUMOTO, Yuji

Citation
YOSHIZAWA, Atsushi, ISHII, Masashi, SHINDO, Hiroyuki, OKA, Hiroyuki, MATSUMOTO, Yuji. Machine extraction of polymer data from tables using XML versions of scientific articles. https://doi.org/10.1080/27660400.2021.1899456

Description:

(abstract)

In this study, we examined machine extraction of polymer data from tables in scientific articles. The extraction system consists of five processes: table extraction, data formatting, polymer name recognition, property specifier identification, and data extraction. In the table extraction step, tables were extracted as plain text from XML versions of the articles, which allowed their tabular structure to be recovered accurately, even for complicated tables such as multi-column, multi-row, and merged tables. Polymer names were recognized by a named entity recognizer built by training a deep neural network on polymer names, and a rule-based algorithm was used to reduce the cost of preparing the training data. The target properties were the glass transition temperature (Tg), melting temperature (Tm), and decomposition temperature (Td), and their specifiers were identified by partial string matching. Through these five processes, 2,181 data points for Tg, 1,526 for Tm, and 2,316 for Td were extracted from approximately 18,000 scientific articles published by Elsevier, nearly half of them from complicated tables. The F-scores for the extraction were 0.871, 0.870, and 0.841 for Tg, Tm, and Td, respectively. These results indicate that the extraction system can rapidly and accurately collect large amounts of polymer data from tables in the scientific literature.
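The abstract identifies property specifiers (Tg, Tm, Td) by partial string matching and reports extraction quality as an F-score. The Python sketch below illustrates both ideas under stated assumptions: the keyword lists, the function names `identify_specifier` and `f_score`, and the choice to match abbreviations case-sensitively are all hypothetical illustrations, not the authors' implementation. The F-score is the harmonic mean of precision and recall, F = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).

```python
from typing import Optional

# Hypothetical specifier patterns (illustrative only, not the authors' actual lists).
# Abbreviations are matched case-sensitively so that, e.g., "atm" does not match "Tm";
# descriptive phrases are matched case-insensitively against the lowered header.
SPECIFIERS = {
    "Tg": ("Tg", ("glass transition",)),
    "Tm": ("Tm", ("melting",)),
    "Td": ("Td", ("decomposition",)),
}


def identify_specifier(header: str) -> Optional[str]:
    """Return the first property whose pattern occurs as a substring of a header cell."""
    lowered = header.lower()
    for prop, (abbrev, phrases) in SPECIFIERS.items():
        # Partial (substring) match: "Td,5% (°C)" still contains "Td".
        if abbrev in header or any(p in lowered for p in phrases):
            return prop
    return None


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    headers = ["Polymer", "Tg (°C)", "Tm (°C)", "Td,5% (°C)", "Pressure (atm)"]
    print([identify_specifier(h) for h in headers])
    # → [None, 'Tg', 'Tm', 'Td', None]
```

Substring matching is cheap and tolerant of header variants such as "Tg onset" or "Td,5%", at the cost of occasional false matches; the case-sensitive abbreviation check is one simple way to limit them.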

[Revision history]
2020-05-26: Initial upload under the title "Automatic extraction of polymer data from tables in XML documents of scientific articles"
2021-02-25: Updated version under the title "Machine extraction of polymer data from tables using XML versions of scientific articles"

Keyword: polymer data, table, machine extraction, informatics, XML

Date published: 2021-01-01

Manuscript type: Author's version (Submitted manuscript)

MDR DOI: https://doi.org/10.11503/nims.1190

First published URL: https://doi.org/10.1080/27660400.2021.1899456

Updated at: 2024-06-21 15:45:49 +0900

Published on MDR: 2021-08-19 22:30:04 +0900

Files:
supplemental-data_210113.zip (application/zip, 91.8 KB)