# Fileset

[s43246-024-00580-7.pdf](https://mdr.nims.go.jp/filesets/9a861775-e014-4360-8185-7b6c9488d7ea/download)

## Creator

[Ryo Tamura](https://orcid.org/0000-0002-0349-358X), Haruhiko Morito, Guillaume Deffrennes, [Masanobu Naito](https://orcid.org/0000-0001-7198-819X), Yoshitaro Nose, [Taichi Abe](https://orcid.org/0000-0002-5065-0939), Kei Terayama

## Rights

[Creative Commons BY Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/)

## Other metadata

[AIPHAD, an active learning web application for visual understanding of phase diagrams](https://mdr.nims.go.jp/datasets/1473a8e6-5355-4091-bb8a-ab059bb6ed17)

## Fulltext

AIPHAD, an active learning web application for visual understanding of phase diagrams

Communications Materials | Article | (2024) 5:139 | https://doi.org/10.1038/s43246-024-00580-7

Ryo Tamura¹,²,³, Haruhiko Morito⁴, Guillaume Deffrennes⁵, Masanobu Naito⁶, Yoshitaro Nose⁷, Taichi Abe⁸ & Kei Terayama³,⁹,¹⁰

¹Center for Basic Research on Materials, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki, 305-0044, Japan. ²Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba, 277-8561, Japan. ³RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihon-bashi, Chuo-ku, Tokyo, 103-0027, Japan. ⁴Institute for Materials Research, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai, 980-8577, Japan. ⁵Univ. Grenoble Alpes, CNRS, Grenoble INP, SIMaP, Grenoble, F-38000, France. ⁶Research Center for Macromolecules and Biomaterials, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki, 305-0047, Japan. ⁷Department of Materials Science and Engineering, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan. ⁸Research Center for Structural Materials, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki, 305-0047, Japan. ⁹Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Kanagawa, 230-0045, Japan. ¹⁰MDX Research Center for Element Strategy, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa, 226-8501, Japan.

e-mail: tamura.ryo@nims.go.jp; haruhiko.morito.b5@tohoku.ac.jp; terayama@yokohama-cu.ac.jp

Phase diagrams provide considerable information that is vital for materials exploration. However, the determination of multidimensional phase diagrams typically requires a significant investment of time, cost, and human resources owing to the necessity of numerous experiments or simulations. Machine learning and artificial intelligence techniques present a viable solution for expediting phase diagram investigations. Additionally, effective visualization is critical for understanding phase diagrams. This study reports the development of AIPHAD (Artificial Intelligence technique for PHAse Diagram), an open-source web application to assist in the investigation and visual understanding of phase diagrams using active learning. AIPHAD employs the PDC (Phase Diagram Construction) algorithm, which operates on the principle of uncertainty sampling in active learning. The AIPHAD application facilitates the examination of five diagram types: two-variable diagrams, three-variable diagrams, ternary sections, ternary phase diagrams, and quaternary sections. The efficacy of the application is demonstrated in the study of the Fe-Ti-Sn ternary system, where it efficiently identified the presence of the Heusler phase. The integration of machine learning tools with traditional materials science approaches showcased in this study has the potential to drive groundbreaking advancements in materials exploration and discovery.

A phase diagram serves as an essential tool in materials science, providing detailed mappings of various phases and their transformations on changes in thermodynamic variables such as temperature, pressure, and composition. Extensive research in materials science has led to the development of numerous phase diagrams for alloys and compounds [1–4]; the study of phase diagrams for magnetic structures is also prevalent in condensed-matter physics [5–7]. However, generating a phase diagram typically involves a multidimensional search space requiring extensive experiments or simulations, which can be resource-intensive in terms of time, cost, and human effort.

The advent of data-driven approaches in materials research [8–12] has seen the emerging application of these methodologies to phase diagrams. Machine learning techniques enable the prediction of phase diagrams for previously unexplored materials based on existing phase diagrams, thereby circumventing the need for additional experiments or simulations. Applications of machine learning in this domain include predicting phase formation in high-entropy alloys [13], stability of quasicrystals [14], coexisting phases in ternary sections [15], and phase boundaries in binary systems [16]. In condensed-matter physics, data-driven techniques have been employed to analyze simulation-based phase diagrams for studies on critical phenomena, including research on strongly correlated fermions [17] and topological quantum systems [18–20].

Developments in active learning for phase diagram analysis have led to methods where the algorithm proposes pivotal experiments for delineating a phase diagram. This process involves three iterative steps: (i) identifying the most informative experiments through a machine learning model, (ii) conducting these experiments, and (iii) retraining the machine learning model with the newly acquired data.
In uncertainty sampling, the informativeness of an experiment is gauged by the uncertainty in predictions, with Gaussian process regression commonly used for uncertainty evaluation. While active learning approaches based on Gaussian process regression have been explored for phase diagrams [21–23], adapting them for multiple discrete categories, typical in phase diagram investigations with various phase types, necessitates the implementation of an appropriate acquisition function.

To address the challenge of multiple discrete categories, an active learning method known as the PDC (Phase Diagram Construction) algorithm was developed [24–27]. This algorithm is based on the label propagation (LP) approach, a form of semi-supervised learning. Through LP, it becomes possible to determine the probabilities of unlabeled points belonging to each phase region. These probabilities are then utilized to assess uncertainty within the phase diagram, enabling the selection of the most uncertain points as informative experiments. The efficacy of the PDC algorithm was evidenced by its ability to reduce the number of required experiments by 20% compared to random experimentation [24].

The practical utility of the PDC algorithm was further validated in an experimental study focusing on the phase diagram for Zn–Sn–P film deposition using molecular beam epitaxy (MBE) [28]. Additionally, the algorithm has been applied to ascertain phase boundaries and property transitions. For instance, it facilitated the determination of temperature- and composition-dependent boundaries between the creep zone and the lower creep zone in cross-linked polymers [29]. Consequently, the PDC algorithm holds significant potential for broad application in both material development and fundamental scientific research, particularly where efficient investigation of boundaries across various categories is crucial.
Through the development of the PDC algorithm and its application to experiments, we recognized that visualizing the phase diagrams predicted by the PDC algorithm is important for understanding them. In particular, the two experimental phase diagrams constructed with the PDC algorithm [28,29] were two-dimensional, whereas constructing phase diagrams in a three-dimensional space requires an appropriate visualization technique, because such diagrams are difficult to picture even for researchers familiar with phase diagrams.

This paper reports the development of AIPHAD (Artificial Intelligence technique for PHAse Diagram), a web application based on the PDC algorithm, designed for the investigation and visual understanding of phase diagrams. It is accessible at https://aiphad.org/. AIPHAD streamlines the visualization of key experimental proposals, maps of uncertainty, and the estimated phase diagrams. The application encompasses five types of phase diagrams: (i) two-variable diagrams; (ii) three-variable diagrams; (iii) ternary sections; (iv) ternary phase diagrams; and (v) quaternary sections. Additionally, a Python version of AIPHAD is available on GitHub [30]. The utility of AIPHAD is illustrated through its application in the study of Fe-Ti-Sn ternary phase diagrams, which are known to contain Heusler compounds [31,32]. The emergence of the Heusler phase is significant, as it is associated with vital electronic and magnetic properties for practical applications [33,34]. AIPHAD's capability to create and verify estimated phase diagrams is successfully demonstrated. In developing AIPHAD, we focused on providing a framework as a web application that can be easily used by researchers and engineers who are not familiar with programming.
We hope that our user-friendly application will contribute to the efficient construction of phase diagrams.

The structure of this paper is as follows: the Methods section reviews the PDC algorithm and explains the usage of the AIPHAD web application and its Python version. The Results section presents the experimental findings obtained from the Fe-Ti-Sn system using the AIPHAD web application.

### Methods

#### Review of the PDC algorithm

The PDC algorithm commences with the discretization of the phase diagram and an initial setting. The estimation of phase regions and the computation of their uncertainties are conducted using machine learning techniques. Based on these estimations, informative experiments are suggested, which are then conducted to identify phase information. By iteratively executing these steps, phase diagrams are derived from a limited number of experiments, as illustrated in Fig. 1. Incorporating thermodynamics into this closed-loop investigation further accelerates the process. The details of each step are shown below.

**Initial setting.** The process begins by setting up the space for the phase diagram, with each dimension discretized into candidate points for experiments (Fig. 1). For a phase diagram of dimension $d$, the discretized position vector is represented as $x \in \mathbb{R}^d$; $X = \{x_i\}_{i=1,\ldots,N}$ denotes the dataset comprising $N$ candidate points. An initial training dataset of $M$ points, known as labeled data, is prepared from these candidate points, based on completed experiments. This initial dataset is either derived from pre-existing data or generated from preliminary experiments using random sampling. The indices of the labeled data points are denoted as $\{l_j\}_{j=1,\ldots,M}$. The remaining indices, $i = 1, \ldots, N$ excluding $\{l_j\}_{j=1,\ldots,M}$, correspond to unlabeled data points. For the labeled data, the experimentally determined categories within the phase diagram, such as phase names, coexisting phase names, and regions with large or small properties, are known.
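
As a minimal sketch of this initial setting, the snippet below discretizes a hypothetical two-variable diagram and separates labeled from unlabeled candidates. The axis ranges, the two labeled indices, and the helper name `make_grid` are illustrative assumptions and not part of AIPHAD; the −1 marker for unlabeled points follows the convention stated for the AIPHAD Python version later in this paper.

```python
import itertools

def make_grid(axes):
    """Discretize each axis given as (start, stop, step) and return all candidate points."""
    values = []
    for start, stop, step in axes:
        n_steps = round((stop - start) / step)  # integer step count avoids float drift
        values.append([start + k * step for k in range(n_steps + 1)])
    return list(itertools.product(*values))

# Hypothetical axes: composition 0..1 in steps of 0.25, temperature 300..500 in steps of 50.
X = make_grid([(0.0, 1.0, 0.25), (300.0, 500.0, 50.0)])  # N candidate points

# Hypothetical completed experiments: candidate index l_j -> phase label in {1, ..., C}.
initial_labels = {0: 1, 24: 2}

# Label list aligned with X; -1 marks points whose phase is not yet known.
y = [initial_labels.get(i, -1) for i in range(len(X))]
unlabeled = [i for i, label in enumerate(y) if label == -1]

print(len(X), len(initial_labels), len(unlabeled))
```

Here `X` plays the role of the dataset of $N$ candidates, the keys of `initial_labels` play the role of the labeled indices $l_j$, and `unlabeled` collects the remaining indices.
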
In the AIPHAD implementation, single phases and coexisting phases have to be categorized as distinct “phases.” For simplicity, all categories in the phase diagram are referred to as “phases.” Each category in the initial dataset is assigned an integer index from $L = \{1, \ldots, C\}$ when there are $C$ categories. This index serves as the label of the labeled data points, denoted as $y_{l_j} \in L$ for $j = 1, \ldots, M$.

**Phase estimation.** Phase estimation for unlabeled data within the PDC algorithm employs machine learning techniques, specifically LP and label spreading (LS). These methods function as follows:

**1. Label propagation (LP):** In LP, the labels of the labeled data $\{y_{l_j}\}_{j=1,\ldots,M}$ are propagated across the dataset $X$, estimating the probabilities of each unlabeled data point belonging to the various labels. This process begins by constructing a fully connected graph for $X$. The weight $w_{ij}$ for the edge connecting the $i$th and $j$th data points in this graph is defined using the RBF kernel as

$$w_{ij} = \exp\left(-\gamma \lvert x_i - x_j \rvert^2\right), \tag{1}$$

where $\gamma$ is a hyperparameter, set as $\gamma = 20$ in the AIPHAD web application, following the default value in the Scikit-learn package [35]. This value can be adjusted in the Python software. Using the weights $w_{ij}$, the transition matrix $T$ on the graph is defined, with each element $t_{ij}$ representing the transition probability from the $j$th to the $i$th data point, expressed as

$$t_{ij} = \frac{w_{ij}}{\sum_{k=1}^{N} w_{kj}}. \tag{2}$$

In the subsequent step, a vector $p_i \in \mathbb{R}^C$ is prepared for each data point; its elements represent the probabilities that the $i$th data point belongs to each phase in $L$. The probability matrix $P$ is then defined as $\{p_i\}_{i=1,\ldots,N}$. The initial state of $P$ is prepared as follows: for the labeled data in $\{l_j\}_{j=1,\ldots,M}$, the element of $p_i$ corresponding to the label $y_{l_j}$ is 1 and the other elements are 0. For the unlabeled data points, all elements of $p_i$ are set to 0. The probability matrix $P$ is updated through a series of steps using the transition matrix $T$.
These steps are as follows:

- (i) Update operation: Apply the operation $P \leftarrow TP$.
- (ii-1) Normalization: Normalize each $p_i$ such that the sum of the elements becomes 1 for unlabeled data.
- (ii-2) Normalization: Return $p_i$ for the labeled data in $\{l_j\}_{j=1,\ldots,M}$ to the initial state; that is, the element corresponding to the label $y_{l_j}$ is 1 and the other elements are 0.
- (iii) Convergence check: Repeat steps (i) and (ii) iteratively until the $p_i$ reach convergence.

Upon convergence, each vector $p_i$ represents the final probability distribution across different phase regions for the corresponding data point. Notably, in the LP method, due to the resetting mechanism in step (ii-2), the probabilities for labeled data points remain consistent with their initial values, ensuring that their original labels $\{y_{l_j}\}_{j=1,\ldots,M}$ are preserved throughout the process.

**2. Label spreading (LS):** The LS method is similar to the LP method, but the labels of the labeled data can be changed from $\{y_{l_j}\}_{j=1,\ldots,M}$, which makes the method more robust to noise in the labeled data. As in the LP method, we prepare a fully connected graph for $X$ and calculate the weight $w_{ij}$ for the edge between the $i$th and $j$th data points using Eq. (1) when $i \neq j$. For $i = j$, in contrast to the LP method, $w_{ij} = 0$. Using these $w_{ij}$, the transition matrix $T$ is defined by Eq. (2). The probability matrix $P$ is prepared as in the LP method.
The initial value of the probability matrix $P$ is defined as $P_0$. The probability matrix $P$ is propagated using the transition matrix $T$ as follows:

- (i) Update operation: Apply the operation $P \leftarrow \alpha T P + (1 - \alpha) P_0$, where $0 < \alpha < 1$ signifies the likelihood of changing the label of labeled data, functioning as a hyperparameter in the LS method.
- (ii) Normalization: Normalize each $p_i$ such that the sum of its elements equals 1, applicable across all data points.
- (iii) Convergence check: Repeat steps (i) and (ii) iteratively until the $p_i$ reach convergence.

In the AIPHAD web application, $\alpha$ is preset to 0.2, aligning with the default in the Scikit-learn package. However, users have the option to adjust this value in the Python software to fine-tune the LS process according to specific dataset characteristics or objectives.

Phase diagrams are predicted using the derived $p_i$ vectors. For the $i$th data point, the label $y_i = \arg\max_{L} p_i$, that is, the phase in $L$ with the highest probability, denotes the predicted phase region.

Fig. 1 | Flowchart illustrating the phase diagram construction using AIPHAD. The diagram shows the closed-loop process, which includes phase estimation, uncertainty score calculation, and conducting experiments.

**Uncertainty sampling.** The uncertainty map is generated from the obtained $p_i$ vectors, which contain the probabilities of belonging to each phase region in $L$ for the $i$th data point. In uncertainty sampling, the most uncertain point in dataset $X$ is selected for informative experiments. To quantify uncertainty, three types of uncertainty scores, each a function of the data point $x$, are commonly used. These scores are defined as follows:

$$u_{\mathrm{LC}}(x) = 1 - P(k_1 \mid x), \tag{3}$$

$$u_{\mathrm{MS}}(x) = 1 - \left[ P(k_1 \mid x) - P(k_2 \mid x) \right], \tag{4}$$

$$u_{\mathrm{EA}}(x) = -\sum_{k=1}^{C} P(k \mid x) \log P(k \mid x), \tag{5}$$

where the elements of $p_i$ are defined as $P(k \mid x_i)$ with $k \in L$. The indices $k_1$ and $k_2$ represent the elements with the highest and second-highest values of $P(k \mid x)$, respectively.
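
The LP iteration and the three uncertainty scores can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the AIPHAD implementation: the function names, the convergence tolerance, the iteration cap, and the toy one-dimensional dataset are all hypothetical, and classes are indexed from 0 (with −1 for unlabeled points) rather than from 1 as in the text.

```python
import math

def label_propagation(X, y, n_classes, gamma=20.0, tol=1e-10, max_iter=10000):
    """Iterate LP steps (i)-(iii). y[i] is a class in 0..n_classes-1, or -1 if unlabeled."""
    n = len(X)
    # Eq. (1): RBF weights on the fully connected graph.
    W = [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
          for j in range(n)] for i in range(n)]
    # Eq. (2): t_ij = w_ij / sum_k w_kj.
    col_sum = [sum(W[k][j] for k in range(n)) for j in range(n)]
    T = [[W[i][j] / col_sum[j] for j in range(n)] for i in range(n)]
    # Initial state: one-hot rows for labeled points, all zeros for unlabeled points.
    P0 = [[1.0 if y[i] == k else 0.0 for k in range(n_classes)] for i in range(n)]
    P = [row[:] for row in P0]
    for _ in range(max_iter):
        # Step (i): P <- T P.
        new_P = [[sum(T[i][j] * P[j][k] for j in range(n)) for k in range(n_classes)]
                 for i in range(n)]
        for i in range(n):
            if y[i] >= 0:
                new_P[i] = P0[i][:]                       # step (ii-2): reset labeled rows
            else:
                total = sum(new_P[i])
                if total > 0.0:
                    new_P[i] = [v / total for v in new_P[i]]  # step (ii-1): normalize
        change = max(abs(new_P[i][k] - P[i][k]) for i in range(n) for k in range(n_classes))
        P = new_P
        if change < tol:                                   # step (iii): convergence check
            break
    return P

def uncertainty_scores(p):
    """Return (u_LC, u_MS, u_EA) of Eqs. (3)-(5) for one probability vector p."""
    top = sorted(p, reverse=True)
    u_lc = 1.0 - top[0]
    u_ms = 1.0 - (top[0] - top[1])
    u_ea = -sum(v * math.log(v) for v in p if v > 0.0)   # 0 log 0 is treated as 0
    return u_lc, u_ms, u_ea

# Toy example: 11 points on a line, with the two end points labeled as different phases.
X = [(i / 10,) for i in range(11)]
y = [0] + [-1] * 9 + [1]
P = label_propagation(X, y, n_classes=2)
unlabeled = [i for i, label in enumerate(y) if label < 0]
for position, name in enumerate(("LC", "MS", "EA")):
    best = max(unlabeled, key=lambda i: uncertainty_scores(P[i])[position])
    print(name, "most uncertain point:", X[best])
```

Because the toy dataset is mirror-symmetric, all three scores should single out the midpoint between the two labeled points, which is where a phase boundary would be expected. For LS, the update line would instead combine `T P` with the initial matrix using the weight $\alpha$, and the reset of labeled rows would be dropped.
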
The uncertainty in the phase diagram is quantified using three different methods, as outlined in Eqs. (3)–(5): the least confident (LC), margin sampling (MS), and entropy-based approach (EA). These methods determine the most uncertain point, denoted as $x^*$, which is proposed for conducting the most informative experiment as

$$x^* = \arg\max_{x \in X} u(x). \tag{6}$$

An experiment conducted at $x^*$ results in the identification of the phase and an increase in the number of labeled data points. This process is pivotal for refining the phase diagram and enhancing the accuracy of the machine learning model.

In scenarios where experiments are conducted in parallel, the methodology requires multiple suggestions. As discussed in ref. 27, two straightforward but effective methods for selecting multiple candidates have been identified:

1. Only the uncertainty score (US) ranking: This approach involves selecting multiple candidates based on their descending order of uncertainty scores.
2. Neighbor exclusion method: In this method, multiple candidates are also selected based on their descending order of uncertainty scores. However, the neighboring points of the selected candidates are excluded to ensure diversity in the selection. This method incorporates a hyperparameter $K$, which determines the extent of exclusion. Data points that are closer than the $K$th nearest neighbor points are not included in the selection.

These strategies are crucial for efficiently exploring the phase diagram space, particularly when aiming to maximize the information gained from parallel experiments.

**Using thermodynamic considerations.** Reference 26 presents a study focusing on optimizing the investigation of phase diagrams through thermodynamic considerations. While utilizing the same algorithm as described earlier, this approach demonstrates that incorporating information about coexisting phases and the phase rule can lead to more efficient construction of phase diagrams. This methodology involves two key strategies:

1. Utilization of coexisting phases: When coexisting phases are identified through a proposed experiment, that single experiment generates a substantial volume of labeled data. For instance, if two coexisting phases are discovered, the tie line's endpoints indicate single phases, while points on the tie line represent the two-phase region. This information, when used as labeled data, enriches the machine learning model with extensive phase-related details from a single experiment.
2. Application of the Gibbs phase rule: The Gibbs phase rule serves as a tool to streamline the search process by excluding specific regions from the search space. In a ternary phase diagram, for instance, if three coexisting phases are identified, they form a triangle devoid of any other phases. Consequently, this area can be excluded from further investigation. This exclusion significantly reduces the number of candidate points, thereby enhancing the efficiency of phase diagram determination.

The PDC algorithm remains applicable, with the modification that data points deemed unnecessary for the search are omitted from the dataset $X$. This strategic approach not only optimizes the phase diagram exploration process but also maximizes the information extracted from each experimental result, contributing significantly to the advancement of materials science research.

#### Usage of the AIPHAD web application version

The AIPHAD web application can be used to investigate five types of phase diagrams: (i) two-variable diagrams, (ii) three-variable diagrams, (iii) ternary sections, (iv) ternary phase diagrams, and (v) quaternary sections. These diagrams are represented in either two- or three-dimensional spaces. The following steps outline the procedure for deriving phase diagrams, as depicted in Fig. 2:

1. Defining the search space: In the “Search Space” menu, users specify axis names, parameter ranges, and step sizes.
2. Inputting phase information: Experiments are labeled at specific points on the phase diagram. Users can select points directly on the diagram or input them in the “Data Table” menu. The application supports both numerical and textual input for phase names. Unnecessary unlabeled points can be excluded using the “Delete Mode” in the “Data Table” menu.
3. Selecting active learning conditions: The “Proposal Method” menu allows users to choose between LP (Label Propagation) and LS (Label Spreading) as the estimation method, and LC (Least Confident), MS (Margin Sampling), EA (Entropy-based Approach), and RS (Random Sampling) as the sampling method. If “RS” is selected, candidates are proposed randomly from unlabeled data. The number of candidates proposed for informative experiments is determined based on the “only US ranking” strategy. The “Neighbor exclusion” method can be used to remove points close to proposed candidates. Default values for hyperparameters $\gamma$ and $\alpha$ are set according to Scikit-learn.
4. Running calculations: Clicking the “Run” button initiates calculations, with candidate points for informative experiments displayed in both the phase diagram and the “Data Table.” Additionally, a map of the uncertainty score and the estimated phase diagram are shown. Labeled data do not appear on the uncertainty map. While LS may alter the labels of labeled data, the phase information displayed in the phase diagram remains consistent with the input information.
5. Viewing phase probabilities: The “Probability” menu ranks unlabeled points in descending order of probability for the selected phase. Probabilities are evaluated using the chosen phase estimation method.

This detailed procedure enables users to efficiently explore and analyze phase diagrams, significantly aiding in the understanding and advancement of materials research.

Fig. 2 | Control panel of the AIPHAD web application. A detailed procedure for investigating phase diagrams using the AIPHAD web application is presented, outlining each step in the process.

#### Usage of the AIPHAD Python version

The AIPHAD Python manual, accessible at https://nims-da.github.io/aiphad/docs/en/index.html, provides comprehensive guidance on its usage. Below is a basic overview of utilizing AIPHAD in Python.

**Install.** AIPHAD is developed in Python3 (requires version 3.6 or higher) and can be installed via PyPI as follows: `$ pip3 install aiphad`

**Single suggestion.** The program outlined in Scheme 1 describes the fundamental steps for using the AIPHAD Python package for phase diagram estimation and uncertainty sampling. The program flow can be summarized as follows:

1. Import libraries: Initially, `pdc_sampler` from AIPHAD and `numpy` are imported into the Python environment. This step prepares the necessary functions and data structures for phase diagram estimation.
2. Specify parameters for `pdc_sampler`:
   - `estimation`: Choose between “LP” or “LS” as the method for phase diagram estimation.
   - `sampling`: Select an uncertainty score from “LC”, “MS”, “EA”, or “RS”.
   - `proposal`: Define the number of proposals as an integer, which determines how many points will be suggested for experimental investigation.
3. Prepare dataset:
   - `X`: A list representing all candidate points in the discretized phase diagram. The method can handle datasets with arbitrary dimensions.
   - `y`: A one-dimensional list corresponding to `X` that contains the label data. Each data point where the phase is already known is assigned a phase index from the set $L = \{1, \ldots, C\}$. For unlabeled data points, an index of −1 is used.
4. Phase diagram estimation: Utilize `pdc.fit(X, y)` to estimate the phase diagram using the chosen LP or LS method on the input data arrays.
5. Uncertainty sampling: Uncertainty scores are calculated for the candidate points using `pdc.us()`. The indices of candidate points with the highest uncertainty scores are stored in `pdc.proposals`, and their corresponding position vectors ($x \in \mathbb{R}^d$) are in `pdc.proposals_X`. The uncertainty scores of the selected points can be accessed using `pdc.proposals_us`.

**Multiple suggestion.** To generate multiple proposals using the `pdc_sampler()` function, the `proposal` argument can be set to the desired number. Additionally, the function enables the specification of the following arguments to tailor the proposal strategy:

- `multi_method`: This argument defines the method for generating multiple proposals. Two options are available: (i) “OU” (Only US ranking), which selects unlabeled data points in descending order of their uncertainty scores; and (ii) “NE” (Neighbor Exclusion), which also ranks unlabeled points by their uncertainty scores but excludes points that are adjacent to already selected ones. If no method is specified, “OU” is chosen as the default.
- `NE_k`: Relevant only when the “NE” method is selected, this argument is set as an integer. It defines the exclusion radius around each selected point, ensuring that no data point within the nearest `NE_k` neighbors of any selected data point is included in the proposal. The default is 1.

**Hyperparameters.**
For phase estimation methods, the hyperparameters are $\gamma$ for the “LP” method, and both $\gamma$ and $\alpha$ for the “LS” method. In the Python version of AIPHAD, the `gamma` and `alpha` arguments allow users to modify these hyperparameters from their default values in scikit-learn.

**Probabilities of belonging to each phase.** During the analysis, uncertainty scores and probabilities for each label are computed for all unlabeled points, and the uncertainty scores are derived from these probabilities. AIPHAD stores the indices of unlabeled data in `pdc.unlabeled_index_list` and their associated uncertainty scores in `pdc.u_score_list`. Moreover, probabilities for each label are contained within `pdc.label_distributions`, arranged according to the phase index. The `pdc.label_distributions` attribute facilitates the identification of points with the highest probability of belonging to a specified phase.

### Results

The Fe-Ti-Sn system was subject to experimental exploration guided by AIPHAD. In this context, the Fe₂TiSn Heusler phase, known for its potential in thermoelectric materials, was examined [36–38]. The electronic properties of the Fe₂TiSn Heusler phase change with the composition ratio [39,40], which underscores the importance of a ternary phase diagram for accurately determining the compositional region where the Heusler phase is stably generated. Additionally, the ordering of each element significantly influences the properties of the Heusler phase [40,41], with the atomic ordering within the phase varying considerably based on synthesis and annealing temperatures [42]. Therefore, the temperature dependence of the phase diagram is a vital aspect of this study.

This research focused on the stability of the Fe₂TiSn Heusler phase within a ternary phase diagram, using guidance from AIPHAD. In the Fe-Ti-Sn system, extensive heat treatments, approximately 1000 h at temperatures ranging from 800 to 1000 °C, are required to achieve equilibrium [43].
Notably, unlike a previous report [44], this study did not investigate the equilibrium phase diagram. Instead, the objective was to delineate the metastable phase diagram, labeling phase regions based on the predominant phase(s) following short-duration heat treatments, and in particular to identify the region where the Heusler phase is stable. In the metastable phase diagram, metastable phases and unreacted raw materials often remain after thermal treatment. AIPHAD's flexible labeling system facilitates the accelerated determination of both equilibrium and metastable phase diagrams, as demonstrated in this study.

#### Experimental detail

The samples for this study were prepared via a solid–liquid reaction using high-purity elemental powders. Specifically, Fe, Ti, and Sn powders, each with a purity of 99.99% obtained from Kojundo Chemical Laboratory Co., were measured in a predetermined ratio. These powders were then placed into a boron nitride crucible, sourced from Zikusu Industry Co., Ltd., with a purity of 99.7%, an outer diameter of 8.5 mm, an inner diameter of 6.5 mm, and a depth of 18 mm. The crucible was sealed in a stainless-steel reaction container within an argon-filled glove box (with O₂ and H₂O levels below 1 ppm). The design of this reaction container aligns with that reported in a previous study [45]. To synthesize the sample, this container was heated for 24 h in an electric furnace in an air atmosphere. The crystalline phases present in the synthesized samples were identified through powder X-ray diffraction (XRD), utilizing a Bruker D2-Phaser system with Cu-Kα radiation at 30 kV and 10 mA.

#### Isothermal section at 900 °C

The construction of the phase diagram for the ternary section of the Fe-Ti-Sn system at 900 °C was undertaken using AIPHAD. The phase diagram was discretized into 231 points with composition increments of 5%. From these candidate points, initial experiments were conducted on seven compositions, including a Heusler composition.
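
The figure of 231 candidate points follows directly from the 5% composition step, as the short check below illustrates. The function name `ternary_grid`, the (Fe, Ti, Sn) ordering, and the reading of the Heusler composition as the stoichiometric Fe₂TiSn point at 50/25/25 atomic percent are assumptions made for this sketch.

```python
def ternary_grid(step=5):
    """All (Fe, Ti, Sn) compositions in percent, in multiples of `step`, that sum to 100."""
    n = 100 // step
    return [(i * step, j * step, 100 - (i + j) * step)
            for i in range(n + 1) for j in range(n + 1 - i)]

grid = ternary_grid(5)
print(len(grid))             # 231 candidate compositions
print((50, 25, 25) in grid)  # True: stoichiometric Fe2TiSn lies on the grid
```

With 20 steps per edge, the triangular grid contains 21 × 22 / 2 = 231 points, which matches the number quoted in the text.
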
Four distinct phases were identified: a Ti-rich phase, a Sn-rich phase, an Fe-rich phase, and a Heusler phase. The specific compositions at which these phases were found and the results of the XRD for each composition are summarized in Supplementary Note 1 and Supplementary Fig. 1.

Using these initial seven experiments as labeled data, AIPHAD was employed to propose the next promising points for further investigation of the phase diagram. Figure 3 illustrates the proposed points and the distributions of uncertainty scores based on the chosen phase-estimation method (LP or LS) and uncertainty scores (LC, MS, or EA). Notably, the selection of proposed points varied significantly with different uncertainty scores under the LP method. However, with the LS method, despite minor variations in the distribution of uncertainty scores, the most uncertain point remained consistent across different uncertainty score methods. Furthermore, the phase boundary at which the uncertainty score increased was more distinctly observed when using the LS method. This approach highlights the potential of AIPHAD in efficiently navigating the complex landscape of phase diagrams, particularly in systems with multiple phases, such as the Fe-Ti-Sn system.

Scheme 1 | A basic Python program for AIPHAD. Uncertainty sampling with a single proposal using AIPHAD.

In this study, LS combined with the LC method was selected to identify the next experimental points. This is because LS can clearly predict the phase boundaries, and LC is suitable for constructing ternary phase diagrams. From the definition of LC, the uncertainty score at boundary points of three or more phases is higher than that at the boundary of two phases, and to complete ternary phase diagrams, finding invariant points and monovariant lines is essential.
Figure 4 presents the results of three closed-loop cycles, showing the experimental points, the distribution of uncertainty scores, and the evolving predicted phase diagram in each cycle. The experimental points proposed by AIPHAD, which lie mainly around the predicted phase boundaries, indicate that the phase boundaries were delineated with increasing accuracy as the closed-loop cycles progressed. However, no new phases were discovered at the selected experimental conditions during these cycles, so the outline of the predicted phase diagram changed only minimally between the initial data and the three additional experiments.

#### Ternary phase diagram

The ternary phase diagram of the Fe-Ti-Sn system is represented as a triangular prism, with ternary sections stacked along the temperature axis. This diagram was discretized for temperatures ranging from 700 °C to 1000 °C in 100 °C increments and for compositions in 5% steps. The construction of the phase diagram started from the same initial data points as for the isothermal section at 900 °C, all of which lie at 900 °C. In AIPHAD, the LS method was employed with the LC uncertainty score. The evolution of the uncertainty scores and the estimated phase diagrams across iterations are summarized in Fig. 5.

Initially, temperatures were set at 700, 800, 900, and 1000 °C, with AIPHAD proposing 14 experimental points using the "Only US" option. The primary phases identified at each proposed point are detailed in Supplementary Table 1.
Subsequent experimental findings at 700 and 800 °C revealed three new phase regions: FeSn, FeSn2, and a mixed sample containing unreacted Fe and Ti, labeled as Fe+Ti. These discoveries expand the understanding of the Fe-Ti-Sn system, particularly in the lower temperature range, and highlight the efficacy of AIPHAD in guiding experimental investigations to uncover complex phase relationships in multicomponent systems.

Fig. 3 | Uncertainty scores for the Fe-Ti-Sn ternary system at 900 °C. Distributions of uncertainty scores in the Fe-Ti-Sn ternary system at 900 °C are shown, according to different phase estimation methods (LP or LS) and uncertainty scores (LC, MS, or EA). High uncertainty points are marked in dark green, with the red circle highlighting the most uncertain point proposed by AIPHAD for experimental validation.

The investigation into the formation of the Heusler phase in the Fe-Ti-Sn system revealed that the phase did not form at 700 °C, probably because the heat treatments were too short at this temperature. Consequently, further exploration at 700 °C was deemed unlikely to yield relevant results. The subsequent phase diagram analysis therefore focused on the temperature range of 800 °C to 1000 °C, with a finer temperature increment of 50 °C instead of the initial 100 °C steps. In this temperature range, AIPHAD proposed fourteen new experiments. Although these experiments did not lead to the discovery of additional phase regions, they were instrumental in further clarifying the existing phase boundaries. Utilizing all collected data, including that from the 700 °C experiments, a comprehensive and detailed metastable phase diagram was constructed, incorporating custom labeling. For each stage of this process, Supplementary Data 1-5, which are compatible with the AIPHAD web application, are available for enhanced visualization.
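To make the two discretizations of the triangular prism concrete, the following sketch builds both candidate sets and compares them. It is a minimal illustration in plain Python; the helper names `compositions` and `prism` are hypothetical and do not belong to the AIPHAD package, and candidate points are represented simply as (temperature, Fe, Ti, Sn) tuples.

```python
def compositions(step=5):
    """All ternary compositions (in percent) on a regular grid; hypothetical helper."""
    n = 100 // step
    return [(i * step, j * step, (n - i - j) * step)
            for i in range(n + 1)
            for j in range(n + 1 - i)]


def prism(t_min, t_max, t_step, comp_step=5):
    """Stack ternary sections along the temperature axis (temperatures in deg C)."""
    temperatures = range(t_min, t_max + 1, t_step)
    return [(t, *c) for t in temperatures for c in compositions(comp_step)]


coarse = prism(700, 1000, 100)  # initial grid: 700, 800, 900, 1000 deg C
fine = prism(800, 1000, 50)     # refined grid: 800, 850, 900, 950, 1000 deg C

print(len(coarse))                   # 924  (4 sections x 231 compositions)
print(len(fine))                     # 1155 (5 sections x 231 compositions)
print(len(set(fine) - set(coarse)))  # 462  (new sections at 850 and 950 deg C)
print(len(set(coarse) - set(fine)))  # 231  (the 700 deg C section leaves the candidate set)
```

The last line reflects the choice described above: points at 700 °C are no longer proposed as candidates, although the labels already measured there can still be used when estimating the final diagram.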
It is important to note that the phase diagrams shown in Figs. 4 and 5 are metastable ones because short heat treatments were used. Thus, equilibrium states are not guaranteed to be observed at each point, and the shape of the phase diagram differs from the equilibrium ones reported in refs. 43,44.

#### Search for a specific phase region

AIPHAD also supports targeting a particular phase region. It can display the probabilities that unlabeled data belong to already identified phases, enabling a focused search within a desired phase region. This capability was demonstrated in the search for stable regions of the Heusler phase in the 900 °C isothermal section of the Fe-Ti-Sn ternary system.

Using the same initial data as for the isothermal section at 900 °C and employing the LS method for phase estimation, six candidate points were identified as having a high probability of yielding the Heusler phase. Subsequent experiments at these points confirmed the presence of the Heusler phase at four of them, while the other two belonged to different phase regions. Figure 6 depicts the isothermal section before and after the experiments guided by AIPHAD. This exploration successfully delineated the region conducive to synthesizing the Heusler phase. In summary, AIPHAD is an effective tool for efficiently identifying specific phase regions of interest within complex multicomponent systems.

### Conclusions

In this study, the AIPHAD toolbox was introduced as an efficient means for phase diagram determination using machine learning, incorporating both a web application and a Python program. The underlying PDC algorithm in AIPHAD utilizes uncertainty sampling through label propagation and label spreading methods, which have been discussed in detail.
This study successfully demonstrated that phase diagrams, particularly that of the Fe-Ti-Sn system exhibiting the Heusler phase, can be efficiently constructed with fewer experiments using the PDC algorithm starting from no data. This demonstration shows that AIPHAD is a powerful tool when a phase diagram is constructed from scratch without prior knowledge of the target system. Sample preparation involved solid–liquid reactions of elemental components, with subsequent phase identification conducted through XRD measurements. The AIPHAD web application facilitated the visualization of the evolving phase diagram and the delineation of phase boundaries, offering an insightful representation of the material system under study. Additionally, this study incorporated NIMS-OS, which is designed to integrate robotic experiments and artificial intelligence seamlessly for autonomous materials exploration (ref. 46). The implementation of the PDC algorithm within NIMS-OS enables phase diagrams created through autonomous experiments to be visualized on the AIPHAD platform. AIPHAD's accessibility and user-friendly interface are anticipated to simplify the construction of phase diagrams using machine learning for a broad user base. The versatility of the developed method extends beyond the exploration of phase regions or equilibrium diagrams, suggesting wide-ranging applications in materials science. Moreover, the integration of this tool with the CALPHAD method holds significant importance, potentially enhancing the efficiency and accuracy of phase diagram predictions in diverse material systems. This study's findings underscore the potential of combining machine learning tools with traditional materials science approaches to advance the field of materials exploration and discovery.

Fig. 4 | AIPHAD-guided determination of the Fe-Ti-Sn ternary system at 900 °C. AIPHAD-guided determination of the Fe-Ti-Sn ternary section at 900 °C following short heat treatments is presented. The LS method and LC uncertainty score are used for estimation. The figure shows the labeled points, AIPHAD's experimental proposal (green point in red circle), uncertainty score distribution, and estimated phase diagrams.

Fig. 5 | AIPHAD-guided determination of the metastable ternary Fe-Ti-Sn system. Evolution of the uncertainty score distribution and the predicted phase diagram during the AIPHAD-guided determination of the metastable ternary Fe-Ti-Sn system is presented. The figure also highlights changes in the temperature range and dataset volume, with phase regions labeled according to the predominant phase(s) found after short heat treatments.

Fig. 6 | AIPHAD-guided determination of the Heusler phase. Progression of the 900 °C isothermal section in the Fe-Ti-Sn system before and after six targeted experiments to identify the Heusler phase is presented. The figure shows the labeled points and AIPHAD's proposals (red circles) and the estimated phase diagrams after experiments.

### Data availability

All data generated during this study are included in this published article and its Supplementary Data 1-5.

### Code availability

The underlying code for this study is available on GitHub at https://github.com/NIMS-DA/aiphad. The manual for the GitHub version is available at https://nims-da.github.io/aiphad/docs/en/index.html.

Received: 21 March 2024; Accepted: 17 July 2024

### References

1. Chang, Y. A. et al. Phase diagram calculation: past, present and future. Prog. Mater. Sci. 49, 313–345 (2004).
2. Kennedy, K., Stefansky, T., Davy, G., Zackay, V. F. & Parker, E. R. Rapid method for determining ternary-alloy phase diagrams. J. Appl. Phys. 36, 3808–3810 (1965).
3. Miracle, D. B. & Senkov, O. N.
A critical review of high entropy alloys and related concepts. Acta Mater. 122, 448–511 (2017).
4. Enoki, M., Minamoto, S., Ohnuma, I., Abe, T. & Ohtani, H. Current status and future scope of phase diagram studies. ISIJ Int. 63, 407–418 (2023).
5. Schiffer, P., Ramirez, A. P., Bao, W. & Cheong, S.-W. Low temperature magnetoresistance and the magnetic phase diagram of La1−xCaxMnO3. Phys. Rev. Lett. 75, 3336–3339 (1995).
6. Schmid, G., Todo, S., Troyer, M. & Dorneich, A. Finite-temperature phase diagram of hard-core bosons in two dimensions. Phys. Rev. Lett. 88, 167208 (2002).
7. Reuther, J., Thomale, R. & Trebst, S. Finite-temperature phase diagram of the Heisenberg-Kitaev model. Phys. Rev. B 84, 100406 (2011).
8. Rajan, K. Materials informatics. Mater. Today 8, 38–45 (2005).
9. Pilania, G., Wang, C., Jiang, X., Rajasekaran, S. & Ramprasad, R. Accelerating materials property predictions using machine learning. Sci. Rep. 3, 1–6 (2013).
10. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
11. Agrawal, A. & Choudhary, A. Perspective: materials informatics and big data: realization of the "fourth paradigm" of science in materials science. APL Mater. 4, 053208 (2016).
12. Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. Machine learning in materials informatics: recent applications and prospects. npj Comput. Mater. 3, 1–13 (2017).
13. Huang, W., Martin, P. & Zhuang, H. L. Machine-learning phase prediction of high-entropy alloys. Acta Mater. 169, 225–236 (2019).
14. Liu, C. et al. Machine learning to predict quasicrystals from chemical compositions. Adv. Mater. 33, 2102507 (2021).
15. Deffrennes, G., Terayama, K., Abe, T. & Tamura, R. A machine learning–based classification approach for phase diagram prediction. Mater. Des. 215, 110497 (2022).
16. Deffrennes, G., Terayama, K., Abe, T., Ogamino, E. & Tamura, R.
A framework to predict binary liquidus by combining machine learning and CALPHAD assessments. Mater. Des. 232, 112111 (2023).
17. Ch'ng, K., Carrasquilla, J., Melko, R. G. & Khatami, E. Machine learning phases of strongly correlated fermions. Phys. Rev. X 7, 031038 (2017).
18. Zhang, Y. & Kim, E.-A. Quantum loop topography for machine learning. Phys. Rev. Lett. 118, 216401 (2017).
19. Scheurer, M. S. & Slager, R.-J. Unsupervised machine learning and band topology. Phys. Rev. Lett. 124, 226401 (2020).
20. Wong, S., Olthaus, J., Bracht, T. K., Reiter, D. E. & Oh, S. S. A machine learning approach to drawing phase diagrams of topological lasing modes. Commun. Phys. 6, 1–7 (2023).
21. Dai, C. & Glotzer, S. C. Efficient phase diagram sampling by active learning. J. Phys. Chem. B 124, 1275–1284 (2020).
22. Ament, S. et al. Autonomous materials synthesis via hierarchical active learning of nonequilibrium phase diagrams. Sci. Adv. 7, eabg4930 (2021).
23. Tian, Y. et al. Determining multi-component phase diagrams with desired characteristics using active learning. Adv. Sci. 8, 2003165 (2021).
24. Terayama, K. et al. Efficient construction method for phase diagrams using uncertainty sampling. Phys. Rev. Mater. 3, 033802 (2019).
25. Terayama, K., Tsuda, K. & Tamura, R. Efficient recommendation tool of materials by an executable file based on machine learning. Jpn. J. Appl. Phys. 58, 098001 (2019).
26. Terayama, K. et al. Acceleration of phase diagram construction by machine learning incorporating Gibbs' phase rule. Scr. Materialia 208, 114335 (2022).
27. Tamura, R. et al. Machine-Learning-Based phase diagram construction for high-throughput batch experiments. Sci. Technol. Adv. Mater.: Methods 2, 153–161 (2022).
28. Katsube, R., Terayama, K., Tamura, R. & Nose, Y. Experimental establishment of phase diagrams guided by uncertainty sampling: an application to the deposition of Zn–Sn–P films by molecular beam epitaxy. ACS Mater. Lett. 2, 571–575 (2020).
29. Hu, W.-H. et al.
Topological alternation from structurally adaptable to mechanically stable crosslinked polymer. Sci. Tech. Adv. Mater. 23, 66–75 (2022).
30. NIMS-DA/aiphad: Artificial Intelligence techniques for PHAse Diagrams. https://github.com/NIMS-DA/aiphad.
31. Saito, T. & Kamishima, S. Magnetic and thermoelectric properties of Fe–Ti–Sn alloys. IEEE Trans. Magn. 55, 1–4 (2019).
32. Khovailo, A. et al. Structural properties of non-stoichiometric Fe–Ti–Sn and Fe–V–Al Heusler alloys. MRS Adv. 8, 681–685 (2023).
33. Graf, T., Felser, C. & Parkin, S. S. P. Simple rules for the understanding of Heusler compounds. Prog. Solid State Chem. 39, 1–50 (2011).
34. Heusler Alloys: Properties, Growth, Applications. (Springer Cham, 2016).
35. Label propagation on scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.semi_supervised.LabelPropagation.html.
36. Takeuchi, A. & Inoue, A. Classification of bulk metallic glasses by atomic size difference, heat of mixing and period of constituent elements and its application to characterization of the main alloying element. Mater. Trans. 46, 2817–2829 (2005).
37. Yabuuchi, S., Okamoto, M., Nishide, A., Kurosaki, Y. & Hayakawa, J. Large Seebeck coefficients of Fe2TiSn and Fe2TiSi: first-principles study. Appl. Phys. Express 6, 025504 (2013).
38. Bilc, D. I., Hautier, G., Waroquiers, D., Rignanese, G.-M. & Ghosez, P. Low-dimensional transport and large thermoelectric power factors in bulk semiconductors by band engineering of highly directional electronic states. Phys. Rev. Lett.
114, 136601 (2015).
39. Nakabayashi, M. et al. Magnetic and transport properties in Heusler-type Fe2TiSn compound. Phys. B: Condens. Matter 329–333, 1134–1135 (2003).
40. Buffon, M. L. C. et al. Thermoelectric performance and the role of anti-site disorder in the 24-electron Heusler TiFe2Sn. J. Phys.: Condens. Matter 29, 405702 (2017).
41. Ślebarski, A. et al. Weak ferromagnetism induced by atomic disorder in Fe2TiSn. Phys. Rev. B 62, 3296–3299 (2000).
42. Oikawa, K. et al. Phase equilibria and phase transition of the Ni–Fe–Ga ferromagnetic shape memory alloy system. Met. Mater. Trans. A 38, 767–776 (2007).
43. Cai, Y. et al. Phase equilibria in Fe–Sn–Ti ternary system at 1073 K and 1273 K. Calphad 49, 110–119 (2015).
44. Romaka, L., Romaka, V. V., Stadnyk, Y. & Melnychenko, N. On the formation of ternary phases in the Ti–Fe–Sn ternary system at 773 K. Chem. Met. Alloy. 6, 12–19 (2013).
45. Iwasaki, S. et al. Electric transport properties of NaAlB14 with covalent frameworks. Inorg. Chem. 61, 4378–4383 (2022).
46. Tamura, R., Tsuda, K. & Matsuda, S. NIMS-OS: an automation software to implement a closed loop between artificial intelligence and robotic experiments in materials science. Sci. Technol. Adv.
Mater.: Methods 3, 2232297 (2023).

### Acknowledgements

The authors extend their gratitude to Koji Tsuda, Fumiyasu Oba, Hiroki Taniguchi, Hidenori Hiramatsu, and Hideo Hosono for their valuable contributions and insightful discussions that significantly enhanced this study. In addition, we would like to strongly acknowledge Kaori Yoshida and Yoko Tachibana for experimental contributions. We would like to thank Naotoshi Tominaga for his help in developing the application. This research was made possible by the support of the Core Research for Evolutional Science and Technology (CREST), under the auspices of the Japan Science and Technology Agency (JST), and was funded by grants JPMJCR17J2, JPMJCR19J1, and JPMJCR19J3.

### Author contributions

R.T., H.M., and K.T. conceived the idea and designed the research. R.T. and K.T. developed the method for machine learning analysis and developed the applications. H.M. conducted the experiments on the Fe-Ti-Sn system. G.D., M.N., Y.N., and T.A. conducted the investigation and validation of the applications and method.
All members contributed to the preparation of the manuscript.

### Competing interests

The authors declare no competing interests.

### Additional information

Supplementary information: The online version contains supplementary material available at https://doi.org/10.1038/s43246-024-00580-7.

Correspondence and requests for materials should be addressed to Ryo Tamura, Haruhiko Morito or Kei Terayama.

Peer review information: Communications Materials thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Milica Todorović and Aldo Isidori.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2024

Communications Materials (2024) 5:139, https://doi.org/10.1038/s43246-024-00580-7