# Fileset

[EUROMAT-NIMS-Matsuda-v2mdr.pdf](https://mdr.nims.go.jp/filesets/85e68ddc-a147-4ced-8dad-f3dbae2d3759/download)

## Creator

[松田 朝彦](https://orcid.org/0000-0001-5989-027X), [田邉 浩介](https://orcid.org/0000-0002-9986-7223), [石井 真史](https://orcid.org/0000-0003-0357-2832), [門平 卓也](https://orcid.org/0000-0003-0569-1309)

## Rights

[In Copyright](http://rightsstatements.org/vocab/InC/1.0/)

## Other metadata

[Materials Data Repository metadata schema and cross-database federation](https://mdr.nims.go.jp/datasets/d2dd6290-eff6-418d-9890-172ac921ac28)

## Fulltext

NATIONAL INSTITUTE FOR MATERIALS SCIENCE | MATERIALS DATA PLATFORM

Materials Data Repository metadata schema and cross-database federation

Asahiko Matsuda*, Kosuke Tanabe, Masashi Ishii, Takuya Kadohira

\* https://orcid.org/0000-0001-5989-027X, MATSUDA.Asahiko@nims.go.jp

Materials Data Platform, Research Network and Facility Services Division, National Institute for Materials Science (Tsukuba, Japan)

### An artist’s rendition of a materials data lifecycle

Circa 2017: “We should build a platform to support all four stages of a materials data lifecycle!”

https://dice.nims.go.jp/

### Every dataset needs a name tag

Metadata attached to data serves three purposes:

- Findability
  - Basic description
  - Instrumentation, methodology
  - Sample/Material description
- Experiment reliability
  - Instrument conditions
  - Experiment parameters
  - Experiment environment
- Data integration
  - Dataset format
  - Column information

Without it, data can easily end up as a random blob.

### “Materials Data Repository” for data publishing

https://mdr.nims.go.jp/

K. Tanabe, A. Matsuda, “A development of Materials Data Repository for materials informatics”, IPSJ SIG Tech. Rep. IOT51/SPT39 (2020) (in Japanese)

MDR holds publications and datasets. The parties around it:

- Researchers (depositors)
- Indexers and discovery services, interested in “Bibliographic metadata”: Title, URL, Creators and ORCIDs, etc.
- Researchers (data users), interested in “Scientific metadata”: Material/Specimen, Sample preparation, Measurement conditions, etc.

[Diagram labels: deposit, register, harvest, download, DOI; work: metadata, files]

[Figure: number of MDR works over the years, 2020 to 2023, split into publications and datasets; vertical axis 0 to 15,000]

### Metadata categories

- Bibliographic metadata (common to all): Title, Creator, Date created
- Scientific metadata (domain specific): Material name/type, Characterization method, Experimental conditions, Calculation method, Properties addressed, Sample preparation process, etc.
- Administrative metadata: Data manager, License

A. Matsuda, “Findability of Materials Research Data”, Library Fair & Forum 2020 (in Japanese)

### One schema to rule them all?

DICE Common Message Format 1.0 is a JSON schema designed for system-to-system communication among DICE systems.

Schema: https://doi.org/10.48505/nims.3240

S. Kikuchi et al., IEICE Tech. Rep. 119 SC2019-2 (in Japanese)

- Mandatory metadata: Bibliographic metadata + Administrative metadata + Subject material
- Domain-specific metadata
  - Characterization metadata: Method, Environment…
  - Specimen metadata: Material type, Structure…
  - Property metadata: Physical properties, Units…
  - Synthesis/Process metadata: Processed date, Temperature…
  - Calculation metadata: Computer software, Version…
- Primary parameters: Characterization, Specimen, Property, Synthesis/Process, and Calculation primary params

[Diagram annotations: METADATA, implemented as data model; DATA, saved as files]

Example for characterization data: primary.csv, measurement.csv, process.csv

✔ Highly descriptive. High reusability. Successful demonstration of cross-system messaging.

but…

### Nobody’s got time for all these!

The full schema defines 300+ fields. 110 of them were implemented in MDR 1.0. (The schema to cover all uses was too ambitious to cover practical uses.)

✘ Complex. Not human-readable. Implementation difficulties.

[Figure: MDR 1.0 metadata form implementation]

[Comic: https://xkcd.com/927 (cc by-nc)]

### MDR Schema 2.0 (Common for all MDR works)

- Single-layer: no multi-level nesting
- YAML: for ease of input and readability
- Deliberately simple
  - centered around bibliographic metadata for focus on repository use
  - lightweight support for scientific metadata
- Some of the defined fields:
  - Description, Subjects (Keywords), Creator, Rights statement…
  - Instrument, Specimen, Experimental method, Processing, Features…

github.com/nims-dpfc/mdr-schema (doi: 10.48505/nims.3239)

[Figure: schema snippet]

### MDR XAFS DB (X-ray absorption fine structure database)

https://doi.org/10.48505/nims.1447

- XAFS spectra
- Collaboration between 6 data providers (synchrotrons, universities, existing databases): Hokkaido U, Photon Factory, Aichi SR, SPring-8, Ritsumeikan U, SAGA-LS
- Consolidated as a unified database by NIMS
- Machine readable and accessible: query, then download from https://mdr.nims.go.jp/download_all/{workid}.zip

M. Ishii et al., “Integration of X-ray absorption fine structure databases for data-driven materials science”, STAM Methods 3 (2023)

### Metadata alignment within the community

Common info among participating data providers:

- Metadata according to MDR Schema
- Keywords for querying within MDR
- Primary parameters CSV
- Structured experimental metadata YAML

[Figure (from the STAM Methods paper): csv, yaml, schema, main data]

### RDF for MDR XAFS DB

In addition to the metadata and files in the repository system itself, MDR XAFS DB defines its own ontology. See https://dice.nims.go.jp/ontology/about.html by M. Ishii (Docs and Turtle available).

Side note: MDR runs on a customized version of Samvera Hyrax software, which internally stores all metadata as RDF. However, customizing its RDF requires a rewrite, rebuild, and restart of the whole repository software, so it is not suited for user-side RDF like this. MDR XAFS DB’s RDF lives outside the MDR system.

Example (Turtle):

    @prefix mdr-xafs: <http://dice.nims.go.jp/ontology/mdr-xafs-ont/Schema#> .
    @prefix obo: <http://purl.obolibrary.org/obo/> .
    @prefix prism: <http://prismstandard.org/namespaces/1.2/basic/> .
    @prefix wd: <http://matvoc.nims.go.jp/entity/> .

    <http://dice.nims.go.jp/ontology/mdr-ont#8a02714e-46a7-4fdc-95a5-1acb18338d7d> a mdr-xafs:Work ;
        rdfs:seeAlso <https://mdr.nims.go.jp/concern/datasets/h128nh653> ;
        rdfs:label "XAFS spectrum of Gold(III) hydroxide"@en ;
        prism:doi "https://doi.org/10.48505/nims.1602"^^xsd:string ;
        obo:RO_0000057 wd:Q1304, wd:Q1308 .   # RO_0000057: has participant

DICE’s vocabulary service MatVoc (beta ver.): https://matvoc.nims.go.jp/explore/

### That was (hopefully) a nice example, but…

Setting up MDR XAFS DB required all of these: a repository engineer, a motivated researcher, a domain community, time, and discussion.

We want to make our next ones work with lighter effort: a repository engineer, any researcher, a domain community, time, and discussion.

### Typical situation in labs: Metadata as directories

Example layout on instruments and lab PCs:

    📂 OSC-100 (Instrument ID)
    └📂 u01234 (User Name/ID)
     └📂 ProjABC (Project Name)
      └📄 data.csv

On the platform systems, each directory level is mapped to the appropriate metadata field (Instrument, User, Project...). This is implemented as part of our IoT-assisted data collection system.

Pre-defined structure:

- ✔ Exact alignment with each lab’s modus operandi
- ✘ Only simple common metadata possible
- ✘ Different mapping for every research project, customization effort required
- ✘ May not cover all necessary metadata

A. Matsuda et al., “Materials metadata: as a custom schema, as directories, or in a data package”, RDA Virtual Plenary 15 (2020) https://doi.org/10.48505/nims.3031

S. Matsunami et al., “Data Architecture for IoT Data Collection System”, Digital Practice 2(2) 80-89, IPSJ (2021) (in Japanese)

### Towards integrating different types of data

- Laboratory: Raw instrument output
- Data system: What needs to be stored at this stage?
- Public (data journals, public data repositories): Bibliographic metadata

Only the researchers themselves can provide this information. (But frequently, this part can be tacit knowledge!)

[Diagram labels: Explicit knowledge, Tacit knowledge]

### Metadata and the ‘PSPP’ model

Questions I ask the researchers, to find the key metadata for that group:

- Process: What are the key parameters that define your process?
- Structure: What structure are you measuring?
- Property: What property are you measuring?
- Performance: What performance are you shooting for?

Common sense for some, new framework for others.

cf. DICE’s data integration system, Materials Integration by Network Technology:

- In-silico link among process, structure, property, performance
- Each module is aware of what it takes as input and output
- Example modules: Heat-treatment, Image analysis, Mechanical tests

### Summary

- The iterations of our metadata models coincide with how our data platform has been evolving.
  - First, we tried a single mega-schema to cover all use cases.
  - A more simplified schema centered around bibliographic metadata was adopted for our data repository.
- A cross-institutional XAFS database built on MDR used a combination of MDR’s simplified metadata schema, community-defined YAML, and RDF.
- Spreading awareness of metadata management is an ongoing effort. We’ve begun to ask researchers about their key parameters using the PSPP model and store them in our system.
  - We hope this leads to better integration of heterogeneous datasets and to accelerated materials research and engineering.

Thank you for your attention! Contact: matsuda.asahiko@nims.go.jp
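
As an illustration of the “Metadata as directories” slide, the mapping from a directory path to metadata fields might be sketched as follows. This is a minimal sketch, not the IoT-assisted data collection system's actual code: the function name `metadata_from_path`, the field names `instrument`, `user`, `project`, and `file`, and the fixed three-level layout are all assumptions made for this example.

```python
from pathlib import PurePosixPath

# Hypothetical field names, one per directory level. The slides only state
# that levels are mapped to "Instrument, User, Project...".
LEVELS = ("instrument", "user", "project")


def metadata_from_path(path: str) -> dict:
    """Map a relative path like 'OSC-100/u01234/ProjABC/data.csv' to metadata."""
    parts = PurePosixPath(path).parts
    # Expect exactly one directory per level, followed by the data file.
    if len(parts) != len(LEVELS) + 1:
        raise ValueError(f"expected {len(LEVELS)} directories and a file: {path!r}")
    *dirs, filename = parts
    meta = dict(zip(LEVELS, dirs))
    meta["file"] = filename
    return meta


if __name__ == "__main__":
    print(metadata_from_path("OSC-100/u01234/ProjABC/data.csv"))
```

A fixed tuple of levels reflects the trade-off listed on the slide: the mapping is trivial to apply, but each project with a different directory layout would need its own `LEVELS` definition.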