# Fileset

[manuscript.docx](https://mdr.nims.go.jp/filesets/9fbced2c-d34a-4bc2-82f4-8065910a2b43/download)

## Creator

[Yukinori Koyama](https://orcid.org/0000-0002-7090-4430), [Yukako Kohriki](https://orcid.org/0000-0002-6858-1273), [Masamichi Harada](https://orcid.org/0000-0002-7321-0733), [Naoto Hirosaki](https://orcid.org/0000-0001-9218-9557), [Takashi Takeda](https://orcid.org/0000-0003-2510-4562)

## Rights

This document is the Accepted Manuscript version of a Published Work that appeared in final form in Chemistry of Materials, copyright © 2024 American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see <https://doi.org/10.1021/acs.chemmater.4c01981>.

[In Copyright](http://rightsstatements.org/vocab/InC/1.0/)

## Other metadata

[Accelerating Materials Discovery of Novel Europium(II)-Activated Phosphors through Machine Learning Classification of Europium Valences](https://mdr.nims.go.jp/datasets/6ffc8a71-859d-4299-af90-08cf6a49cc7a)

## Fulltext

Accelerating Materials Discovery of Novel Europium(II)-Activated Phosphors through Machine Learning Classification of Europium Valences

Yukinori Koyama (a, \*), Yukako Kohriki (b), Masamichi Harada (b), Naoto Hirosaki (b), Takashi Takeda (b)

(a) Center for Basic Research on Materials, National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan

(b) Research Center for Electronic and Optical Materials, National Institute for Materials Science, Tsukuba, Ibaraki 305-0044, Japan

### Abstract

An approach is presented to accelerate the discovery of host compounds for novel Eu2+-activated phosphor materials by integrating systematic data collection, machine learning, and experimental validation. A dataset of Eu2+- and Eu3+-activated phosphors has been constructed from numerous academic articles using a systematic data collection methodology. A machine-learning classification model has been developed using the collected dataset to predict the oxidation states of Eu ions in potential hosts regarding luminescence. The model considers the non-exclusive nature of the divalent and trivalent oxidation states of Eu ions in phosphor applications. A comprehensive exploration of a material database has been conducted to identify host candidates for novel Eu2+-activated phosphor materials, leading to attempts to synthesize them. Photoluminescence analysis has revealed the successful synthesis of 12 new Eu2+-activated phosphors, demonstrating the potential of the proposed approach for accelerating material discovery.

### 1. Introduction

Phosphors are materials capable of converting external energy, often supplied by light or electron irradiation, into light. They typically consist of host materials with doped activators that serve as luminescent centers. Activators are commonly transition metals or lanthanoid elements, and the luminescent characteristics vary greatly with the activator.
Among the transition metal and lanthanoid activators, intensive research has been conducted on Ce3+ and Eu2+ activators [1-6] because their luminescence utilizes parity-allowed 4f-5d transitions, which are characterized by strong absorption and emission intensities. The broad spatial distribution of the 5d orbitals in the excited state makes the luminescence properties strongly dependent on the host material. Therefore, recent research includes data-driven exploration of novel hosts to design phosphors with the desired luminescent characteristics [7-15].

In our previous study [15], we reported a machine learning model that predicts the peak-top wavelengths of the emission spectra of Eu2+-activated phosphors. Based on the machine learning predictions, three new Eu2+-activated phosphor materials exhibiting green luminescence were successfully discovered. However, issues arise with nonluminous compounds and luminescence originating from Eu3+ ions instead of Eu2+. The luminescence of Eu activators can be derived from both divalent and trivalent oxidation states. In contrast to Eu2+ activators, the absorption and emission originating from Eu3+ activators are weak because of the parity-forbidden 4f-4f transitions. Additionally, the shielding effect of 5s and 5p electrons results in nearly constant emission wavelengths, irrespective of the host. Although different synthesis conditions can alter the oxidation states of doped Eu ions, many host materials tend to preferentially adopt one oxidation state. This may be due to limitations in the applicable synthesis conditions depending on the host materials, which could hinder precise control of the oxidation states of the Eu activators. Consequently, selecting suitable hosts is essential to achieve luminescence from a particular oxidation state. The ionic radii and oxidation states of the ions that Eu substitutes can serve as guidelines for host selection.
However, the actual host dependence of the oxidation states of Eu activators is complex.

This study aims to accelerate the exploration of new Eu2+-activated phosphors using machine learning to predict the likelihood of Eu2+-derived luminescence in potential host crystals. Figure 1 illustrates a schematic of how this study proceeded. A dataset of Eu-activated phosphors was constructed by extracting host chemical formulae from academic articles. A machine learning classification model was trained to predict whether luminescence was derived from Eu2+ or Eu3+. This classification model was applied to compounds registered in an inorganic material database, and new host candidates were selected based on the likelihood of Eu2+-derived luminescence. Subsequently, the machine learning predictions were validated through synthesis and analysis experiments. This strategy successfully discovered 12 new Eu2+-activated phosphors. This outcome demonstrates the capability of machine learning to effectively avoid candidates prone to Eu3+-derived luminescence, thereby enhancing the discovery of new Eu2+-activated phosphors.

Figure 1. Schematic diagram of the procedure used in this study to explore new Eu2+-activated phosphors using machine learning classification.

### 2. Methods

#### 2.1. Data collection

Phosphor materials are commonly denoted in academic articles in the form ‘SrLiAl3N4:Eu2+’, where the host chemical formula (SrLiAl3N4) is separated from the luminescent center (Eu2+) by a colon. Based on this standard notation, the following methodology was used to construct a dataset of Eu2+- and Eu3+-activated phosphors:

As the initial pool of academic articles on phosphor materials, papers containing the words ‘phosphor’ or ‘lumine’ in the title, abstract, or keywords were extracted using a citation database. Titles and abstracts were obtained in plain text from the citation database.
When articles were available in portable document format (PDF) from publishers' websites, the main body was extracted in plain text using robotic process automation and a self-written Python program. When the Data for Text and Data Mining (TDM data) provided by the National Institute for Materials Science (NIMS) [16] were available, the main body was obtained in plain text. From the obtained titles, abstracts, and main-body texts, notations in the format of chemical formula, colon, and luminescent center, or inversely, luminescent center, colon, and chemical formula were identified and extracted using a self-written rule-based Python program. The luminescent centers considered were limited to Eu2+ and Eu3+ ions. Phosphors with codoped luminescent centers containing Eu2+ or Eu3+ were included in the dataset. Eu luminescent centers without oxidation states were excluded. Additional information associated with luminescent centers, such as Eu fractions, was disregarded.

The following processing steps were undertaken to address data redundancy issues for similar but not identical chemical compositions and to eliminate potential errors arising from the simple mechanical approach to data extraction: (1) Chemical formulae listing multiple alternative elements were expanded into all possible combinations. (2) Compounds containing elements that served as luminescent centers were excluded. Specifically, transition metals Cr-Cu, Tc-Ag, and Re-Au; lanthanoid elements except La, Gd, and Lu; and elements with atomic numbers greater than or equal to that of Hg were designated luminescent center elements. (3) The variables in the chemical formulae were assigned a value of zero, and non-integer coefficients were rounded to the nearest integers. (4) Compounds that violated charge neutrality were excluded.
Only oxidation states with closed-shell electron configurations, specifically 1s2, ns2 np6 (n = 2-5), ns2 np6 nd10 (n = 3, 4), and 5s2 5p6 4f14, were allowed, with the trivalent oxidation state of Gd admitted as the sole exception. (5) To validate the appropriateness of the formula processing, the modified formulae were cross-referenced with compounds registered in the Inorganic Crystal Structure Database (ICSD) [17]. Formulae that did not correspond to any ICSD entry were excluded. This comparison was conducted after applying the aforementioned processing steps to the formulae registered in ICSD.

#### 2.2. Machine learning

Using the dataset of the collected Eu-activated phosphors, a classification model was developed to predict whether luminescence was derived from Eu2+ or Eu3+ ions based on their host compositions. In this study, Eu2+-derived luminescence was considered the positive class, and Eu3+-derived luminescence was considered the negative class. As will be discussed later, Eu2+- and Eu3+-derived luminescence is not necessarily exclusive. Therefore, both positive (Eu2+) and negative (Eu3+) instances were prepared for each host, and the sample weights were assigned based on the ratio of paper counts. The sum of the sample weights for the positive and negative instances of each host was normalized to the unit weight.

The input to the classification model is the chemical composition of the host, which must be converted into a numerical representation for machine learning applications. In this study, general-purpose descriptors [15, 18, 19] were employed, which are statistical transformations of elemental features that consider the atomic fractions of the constituent elements.
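
As a rough illustration of how such composition descriptors can be formed, the following minimal sketch computes fraction-weighted statistics of a single elemental feature. It is not the authors' implementation (the study reports using pymatgen and a customized XenonPy); the function name `composition_descriptors` and the choice of atomic number as the only feature are assumptions made for this example.

```python
import math

# Atomic numbers serve as the single illustrative elemental feature.
ATOMIC_NUMBER = {"Li": 3, "N": 7, "Al": 13, "Sr": 38}


def composition_descriptors(composition, feature):
    """Composition-weighted statistics of one elemental feature.

    composition: dict mapping element symbol to its amount in the formula.
    feature: dict mapping element symbol to a positive feature value.
    """
    total = sum(composition.values())
    # Atomic fractions act as the weights of the statistics.
    weights = {el: amount / total for el, amount in composition.items()}
    values = {el: float(feature[el]) for el in composition}

    mean = sum(weights[el] * values[el] for el in composition)
    geometric = math.exp(sum(weights[el] * math.log(values[el]) for el in composition))
    harmonic = 1.0 / sum(weights[el] / values[el] for el in composition)
    variance = sum(weights[el] * (values[el] - mean) ** 2 for el in composition)
    lowest, highest = min(values.values()), max(values.values())
    return {
        "mean": mean,
        "geometric_mean": geometric,
        "harmonic_mean": harmonic,
        "std": math.sqrt(variance),
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }


# Host of the SrLiAl3N4:Eu2+ example: Sr1 Li1 Al3 N4.
host = {"Sr": 1, "Li": 1, "Al": 3, "N": 4}
descriptors = composition_descriptors(host, ATOMIC_NUMBER)
print(descriptors)
```

Repeating this for every elemental feature and concatenating the results gives one fixed-length vector per host, regardless of how many elements the formula contains.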
The elemental features used were atomic number; period and group in the periodic table; atomic mass; number of valence electrons of s/p/d/f/all states; number of unoccupied valence states of s/p/d/f/all states; atomic/covalent/van der Waals radii; electronegativity; electron affinity; ionization energy; Mendeleev number; and polarizability. The statistical metrics used were weighted arithmetic/geometric/harmonic means; weighted standard deviation; and minimum, maximum, and range. After excluding features with nearly constant values, the features were standardized to zero mean and unit standard deviation.

General-purpose descriptors are expected to capture various aspects of the chemical characteristics of the hosts. However, because these descriptors are derived systematically, some may be irrelevant to the target variable. Therefore, recursive feature elimination (RFE) was employed for feature selection. RFE builds a machine learning model, evaluates feature importance, and iteratively removes the less important features, narrowing the set to useful features as a preprocessing step. The machine learning model used in the RFE step was the same as that in the final classification model. Logistic regression was used for classification, applying L2 regularization and using the logarithmic loss function.

In the machine learning pipeline, the number of features selected by the RFE and the (inverse) regularization strength of the logistic regression served as the parameters. These parameters were determined by a grid search to maximize the F-score for the validation data of the 10-fold cross-validation. To ensure that the positive and negative instances of each host were not split between the training and validation sets, the data were split per host in the cross-validation.

The pymatgen package [20] and a customized version of the XenonPy package [21] were used to calculate the features. The scikit-learn package [22] was used for machine learning.

#### 2.3. Host candidate selection

The developed classification model was used to select host candidates for new Eu2+-activated phosphors. This study focuses on known compounds that have not been reported as hosts for Eu2+-activated phosphors.

As candidates for new hosts, compounds that were composed of elements in the collected Eu-activated phosphor dataset and that met the charge neutrality criteria were selected from the AtomWork-Adv inorganic materials database [23]. In AtomWork-Adv, records deemed to be in the same phase (with the same constituent elements and crystal structure type) are grouped into a single ‘substance.’ In this study, candidate compounds were extracted at the substance level to aggregate entries with similar compositions. The probability of the positive class predicted by the machine learning model was used to measure the likelihood of Eu2+-derived luminescence. Substances with 50% or higher Eu2+-probability were selected as candidates for new hosts. The candidates were further narrowed down for validation experiments. The details of the selection are described below.

#### 2.4. Experiments

The candidate compounds were synthesized using a solid-state reaction method. Simple oxides, carbonates, sulfides, halides, phosphates, boron, sulfur, and boric acid were used as starting materials. The starting materials were mixed in stoichiometric compositions with 1 at% substitution of Ca, Sr, or Ba with Eu by adding Eu2O3, EuS, or Eu halides. The mixed samples were processed in several ways, namely firing in ambient atmospheres followed by post-treatment heating under reducing atmospheres, firing under reducing conditions, and heating in vacuum-sealed quartz ampoules. The choice of the synthesis condition, temperature, and time varied depending on the target compounds.

The obtained products were characterized using a powder X-ray diffractometer (XRD) (Bruker, D8 ADVANCE, Cu Kα1 radiation) and a spectrofluorometer (JASCO, FP-8600).
An airtight sample holder filled with nitrogen was used to protect air-sensitive samples. The phases and their fractions in the products were identified using the whole powder pattern fitting method (Rigaku, PDXL2). The synthesis was considered successful if the fraction of the target phase was 70 wt% or more. Photoluminescence (PL) measurements were performed for samples with 70 wt% or higher purity. After the emission peak-top wavelengths were identified using excitation-emission matrix (EEM) fluorescence spectroscopy, PL and PL excitation (PLE) spectra were obtained. All measurements were performed at room temperature.

### 3. Results and Discussion

#### 3.1. Data collection

From academic articles on phosphors, the notations for Eu2+- and Eu3+-activated phosphors were extracted, resulting in 831 host compositions from 21,607 papers on Eu2+-activated phosphors and 1,261 host compositions from 24,430 papers on Eu3+-activated phosphors. Among them, 387 hosts had papers on both Eu2+- and Eu3+-activated phosphors, resulting in a total of 1,705 hosts. It is essential to note that our host extraction method does not consider context. Given the nature of academic articles, we assumed that if a phosphor material was mentioned, it was likely to have been successfully synthesized and observed to luminesce, and that descriptions of failed synthesis or nonluminous behavior were infrequent. Despite the inaccuracy introduced by ignoring context, we expected that collecting a large amount of data would mitigate its impact.

The host with the largest paper count was Y2O3, with 2,179 papers (eight on Eu2+-activated phosphor and 2,171 on Eu3+-activated phosphor). Six hosts, including Y2O3, had 1,000 papers or more, whereas 390 hosts had only one paper. To analyze the distribution of paper counts, the number of hosts for each paper count, $f(n) = \lvert \{ h \in H \mid N_h = n \} \rvert$, where $H$ is the set of hosts and $N_h$ is the paper count of host $h$, is illustrated in Fig. 2a.
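
The number of hosts for each paper count described above can be tallied in a few lines. The sketch below is illustrative only: the host labels and counts are hypothetical toy data, and `hosts_per_paper_count` and `loglog_slope` are names introduced here, with the slope helper being an ordinary least-squares fit in log-log space of the kind commonly used to check for power-law behavior.

```python
from collections import Counter
import math


def hosts_per_paper_count(paper_counts):
    """Map each paper count n to the number of hosts with exactly n papers."""
    return Counter(paper_counts.values())


def loglog_slope(points):
    """Least-squares slope of log10(y) against log10(x) for (x, y) pairs."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data: host label -> total paper count.
toy_counts = {
    "host_a": 1, "host_b": 1, "host_c": 1, "host_d": 1,
    "host_e": 2, "host_f": 2,
    "host_g": 4,
}
f = hosts_per_paper_count(toy_counts)
slope = loglog_slope(sorted(f.items()))
print(dict(f), slope)
```

In this toy case the number of hosts is exactly 4/n, so the fitted slope comes out at -1, which is the signature of inverse proportionality in a log-log plot.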
In the region of fewer papers, roughly below 20, a nearly linear relationship was observed in the log-log plot. Linear regression in the range of 1-20 papers yielded a slope of -1.04 (standard error: 0.05), indicating an inverse proportionality between the number of hosts and paper count. In the larger paper count region, typically above 30, the scatter of the data points was larger. To avoid the impact of this scatter when examining the relationship between the number of hosts and the paper count, the number of hosts that exceeded the given paper count, $g(n) = \lvert \{ h \in H \mid N_h > n \} \rvert$, is illustrated in Fig. 2b. Note that f(n) is derived from g(n) as $f(n) = g(n-1) - g(n)$. In the range of 30-1,000 papers, linear regression in the log-log plot of Fig. 2b yielded a slope of -1.162 (standard error: 0.005). Because f(n) is the first difference of g(n), a power law in g(n) with an exponent of about -1.16 corresponds to an exponent of about -2.16 in f(n). Therefore, the number of hosts for each paper count was approximately inversely proportional to the square of the paper count in this range. The significance of the shift in the exponent of the paper count from -1 to -2 and that of the boundary around 20-30 papers are unclear. However, the findings indicate that the paper counts of the hosts follow long-tailed power-law distributions in both regions of few and many papers. In other words, most hosts have a small number of papers, whereas a few hosts have a large number of papers.

The ratio of the paper count on Eu2+-activated phosphor to the total paper count (hereinafter referred to as the Eu2+-paper ratio) for each host, plotted against the paper count, is shown in Fig. 2c. In the figure, the markers overlap, and many hosts belong exclusively to either Eu2+- or Eu3+-activated phosphors. However, as previously mentioned, 387 of 1,705 hosts (22.7%) have published papers on both Eu2+- and Eu3+-activated phosphors. Noise and mislabeling were probable in the data collected in this study.
For hosts such as Y2O3, in which the paper counts are large and either Eu2+- or Eu3+-activated phosphors dominate, it seems reasonable to disregard papers on their minor counterparts, assuming that the minor counterparts are due to mislabeling. However, there are phosphors such as CaF2, in which numerous papers have reported both Eu2+-activated (115 papers) and Eu3+-activated (119 papers) phosphors, challenging the assumption that the oxidation states of Eu ions are exclusively divalent or trivalent in all hosts. For hosts with few papers on both Eu2+- and Eu3+-activated phosphors, it is also challenging to determine whether this is due to mislabeling or whether the hosts support both oxidation states.

The oxidation states of Eu ions inferred from the paper count ratios appear to be more reliable when the host has a larger paper count. Therefore, it was assessed whether it was appropriate to assign more weight to hosts with larger paper counts. The hosts were divided into five groups based on their paper counts. The range of the paper counts for each group was determined to ensure an approximately equal number of hosts in each group. The Eu2+-paper ratio of each host was averaged within each group and compared among groups, as shown in Fig. 2d. The average Eu2+-paper ratio for the entire dataset was 38.8%. In the top group, with 27 or more papers, the average Eu2+-paper ratio was 45.7%, which was significantly higher than that in the other groups. It could be assumed that the oxidation state of Eu ions in a host is independent of its paper count. However, the paper count, which reflects researchers’ interests, is biased by the oxidation state, particularly in well-studied phosphor materials. Therefore, the emphasis on hosts with larger paper counts may have incorporated this bias.

Figure 2. Characteristics of the collected dataset of Eu-activated phosphors: (a) number of hosts for each paper count, (b) number of hosts exceeding the paper count, (c) ratio of paper count on Eu2+-activated phosphor against its paper count, and (d) average Eu2+-paper ratio of each group divided based on the paper count. The lines in panels (a) and (b) show linear regression on 1-20 and 30-1,000 papers, respectively. The dashed line in panel (d) denotes the average Eu2+-paper ratio for the entire dataset (38.8%).

Constituent elements of the collected hosts ranged from H to W, excluding transition metals Cr-Cu and Tc-Ag, lanthanoids other than La, Gd, and Lu, and noble gases. The elements present in more than 5% of hosts were O, Ca, Si, Ba, Sr, P, La, Al, B, Y, Na, Mg, Li, F, N, K, Gd, and Zn, in descending order of host count. The constituent elements were very similar to those in the recently published inorganic phosphor optical property (IPOP) dataset [24]. To examine the relationship between the constituent elements and Eu oxidation states, the Eu2+-paper ratios were analyzed, focusing on the frequently used constituent elements and some other characteristic elements. Here, hosts were divided into two groups for each element: hosts containing the element of interest and those that did not. The average Eu2+-paper ratio was compared between the two groups, as illustrated in Fig. 3. First, considering the anions, the average Eu2+-paper ratio for hosts containing O was significantly smaller than that for hosts without O. In contrast, hosts containing other anions (N, S, Cl, Br, or I) had higher average Eu2+-paper ratios. This suggests that the oxide hosts are more likely to be used in Eu3+-activated phosphors. Non-oxide hosts, by comparison, are generally synthesized in inert gases, which are suitable for obtaining the divalent oxidation state of the Eu ions.
Interestingly, the average Eu2+-paper ratio showed only a slight difference for F, indicating that fluoride hosts were studied in both the Eu2+- and Eu3+-activated phosphors.

Next, the impact of the cations was investigated. Hosts containing Mg, Ca, Sr, Ba, Al, Si, and P had higher average Eu2+-paper ratios, whereas those containing Y, La, Gd, Zn, and B had lower ratios. The cation elements can be categorized into two groups. The first group includes cations with large ionic radii, namely, Na, K, Ca, Sr, Ba, Y, La, and Gd, which are likely to be substituted with Eu. Among them, the alkaline earth metals Ca, Sr, and Ba, with ionic radii close to that of Eu2+ and the same divalent oxidation state, were suitable for substitution with Eu2+. Similarly, the rare-earth elements Y, La, and Gd served as substitution sites for Eu3+. Na and K showed minor differences in the average Eu2+-paper ratios between the hosts with and without these elements. These alkali metals may only occasionally be considered substitution sites for either divalent or trivalent Eu in material design because of their different oxidation states.

The second cation group includes Li, Mg, Zn, B, Al, Si, and P, which have relatively small ionic radii and contribute to the crystal structure framework. The higher preference for Mg, Al, Si, and P and the lower preference for Zn in Eu2+-activated phosphors may be related to achieving wide band gaps. Generally, the 4f level of Eu2+ is higher than that of Eu3+, and the 5d level is even higher [25]. Therefore, wide band gaps of the hosts are essential for Eu2+-activated phosphors to locate the 5d levels of the excited states within the band gaps. Hosts containing V, Nb, Ta, Mo, and W were exclusively used in the Eu3+-activated phosphors (not shown in Fig. 3). Oxyanion complexes of Group 5 and 6 cations are often used to enhance the absorption and emission intensities of Eu3+-activated phosphors by utilizing ligand-to-metal charge-transfer transitions.

Figure 3. Average Eu2+-paper ratios of hosts containing each constituent element (blue) and those that did not (red). The dashed line denotes the average Eu2+-paper ratio for the entire dataset (38.8%).

#### 3.2. Machine learning

Using the dataset of the collected Eu-activated phosphor materials, a machine learning classification model was developed to predict the oxidation states of Eu ions in the hosts regarding luminescence based on their host compositions. Given the focus on Eu2+-activated phosphor materials in this study, the luminescence derived from Eu2+ ions was treated as the positive class, and that from Eu3+ ions as the negative class. As discussed in the previous section, the oxidation state of Eu ions in a host is not necessarily exclusive to the divalent or trivalent states. Therefore, the data for each host were duplicated into a positive and a negative instance, and the sample weights of the two instances were assigned based on the ratio of paper counts. The sum of the sample weights for the positive and negative instances of each host was normalized to the unit weight. The classifier was designed to output not only a binary class of Eu2+ or Eu3+ but also the probability of the positive class (Eu2+), using a log-loss function to train the model to predict a probability aligned with the Eu2+-paper ratio. Our preliminary investigation examined training data in another format, in which a single binary label was assigned to each host according to the oxidation state with more papers. Accuracy, precision, recall, and F-score for the hold-out test data differed by less than 2% between the sample-weight and binary-label methods. Binarization requires assigning a single label even in cases where the numbers of papers on Eu2+ and Eu3+ are similar, as in the case of CaF2; therefore, the sample-weight method was adopted in this study. The average Eu2+-paper ratio of the entire dataset was 38.8%, and the collected dataset showed an imbalance between the positive and negative classes.
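
The duplication of each host into weighted positive and negative instances can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `weighted_instances` is a name introduced here, and only the paper counts quoted in the text for CaF2 and Y2O3 are used as input.

```python
def weighted_instances(paper_counts):
    """Duplicate each host into an Eu2+ (label 1) and an Eu3+ (label 0) instance.

    paper_counts: dict mapping host formula to (Eu2+ papers, Eu3+ papers).
    Returns (host, label, sample_weight) tuples; the two weights of a host
    sum to one and equal its paper-count ratios.
    """
    rows = []
    for host, (n_eu2, n_eu3) in paper_counts.items():
        total = n_eu2 + n_eu3
        if total == 0:
            continue  # no papers, so no ratio can be formed
        rows.append((host, 1, n_eu2 / total))
        rows.append((host, 0, n_eu3 / total))
    return rows


# Paper counts quoted in the text.
instances = weighted_instances({"CaF2": (115, 119), "Y2O3": (8, 2171)})
for row in instances:
    print(row)
```

With scikit-learn, weights of this kind would typically be passed through the `sample_weight` argument of `LogisticRegression.fit`, so that a host such as CaF2 contributes almost equally to both classes while Y2O3 is effectively a negative example.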
We deemed the class imbalance insignificant and did not apply any correction for it.

Notably, the classification model in this study was trained using paper ratios of Eu2+- and Eu3+-activated phosphors. Therefore, it predicts whether luminescence originating from Eu2+ or from Eu3+ is more plausible. This does not imply that the model predicts the ratio of Eu2+ to Eu3+ ions in the host. The model also assumes that luminescence is observed. Furthermore, the oxidation state of Eu ions in the host varies depending on the synthesis conditions. However, information regarding the synthesis process was not included in the collected dataset. Therefore, the model implicitly assumes appropriate synthesis conditions for each phosphor material.

Table 1 presents the parameter ranges of the machine learning model and the optimum values, and Table 2 shows the accuracy, precision, recall, and F-score for the training and validation datasets in cross-validation. In this evaluation, a predicted Eu2+-probability of 50% or more was considered a prediction of Eu2+-derived luminescence. An accuracy of 84.8% was achieved for the validation dataset. Given the emphasis on Eu2+-activated phosphor materials in this study, precision, recall, and F-score were prioritized for positive instances. Values of 80% or higher were obtained for these scores for the validation dataset. The differences in these metrics between the training and validation datasets were slight, suggesting that overfitting was insignificant in the present machine learning. The feature importance of the logistic regression is discussed in Section S1 of the Supporting Information.
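
The parameter search summarized in Table 1 can be sketched with scikit-learn roughly as follows. This is a simplified illustration and not the authors' pipeline: the data are synthetic, the grids are much smaller than those in Table 1, 5 folds are used instead of 10, and the sample weights are omitted for brevity, so each host appears as a single row and `GroupKFold` only indicates where the per-host split would enter.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 60 hosts with 12 descriptors each.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=60) > 0).astype(int)
groups = np.arange(60)  # one group id per host

pipeline = Pipeline([
    # Standardize features to zero mean and unit standard deviation.
    ("scale", StandardScaler()),
    # RFE ranks features with the same kind of model as the final classifier.
    ("rfe", RFE(LogisticRegression(max_iter=1000))),
    # LogisticRegression applies L2 regularization by default.
    ("clf", LogisticRegression(max_iter=1000)),
])

# Reduced, illustrative grids for the two tuned parameters.
param_grid = {
    "rfe__n_features_to_select": [4, 8],
    "clf__C": [0.1, 1.0, 10.0],  # inverse regularization strength
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="f1",                # F-score of the positive class
    cv=GroupKFold(n_splits=5),   # rows of one host never straddle a split
)
search.fit(X, y, groups=groups)
print(search.best_params_, round(search.best_score_, 3))
```

When each host is duplicated into positive and negative rows, giving both rows the same group id keeps them on the same side of every split, which is the purpose of the per-host splitting described in the Methods.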
While the general-purpose descriptors used in this study make it difficult to interpret the correlation between the features and chemical composition, the interpretation of the features with significant contributions was consistent with the average Eu2+-paper ratios shown in the previous section.

We also examined the gradient-boosted-trees method as a classifier in the machine learning pipeline instead of logistic regression. The results of 10-fold cross-validation using the gradient-boosted-trees method are summarized in Section S2 of the Supporting Information. The gradient-boosted-trees model exhibited higher scores for validation data by 1.3% to 3.7% than the logistic regression model. However, the classification results for the training data were almost perfect, suggesting overfitting. As described in the previous section, there is a variation in the oxidation states of Eu ions reported in academic papers. This variation affects the accuracy of machine learning. We believe that the inclusion of inaccurate information in real-world data is inevitable. Therefore, in this study, we did not aim to achieve perfect or state-of-the-art machine learning. Instead, we aimed at reasonable machine learning that was useful for materials search. To this end, we used logistic regression, which has a low risk of overfitting.

Table 1. Parameter ranges and optimum values of the machine learning classification model.

| Parameter | Candidates | Optimum |
| --- | --- | --- |
| Number of selected features in RFE | 10, 20, …, 100 | 90 |
| Inverse regularization strength in logistic regression | $10^{-5}$, $10^{-4}$, …, $10^{+5}$ | $10^{+3}$ |

Table 2. Classification metrics of the machine learning model with the optimum parameters evaluated for the training and validation datasets in 10-fold cross-validation.

| Metric | Training | Validation |
| --- | --- | --- |
| Accuracy | 86.6% (0.4%) | 84.8% (2.5%) |
| Precision | 82.3% (0.5%) | 80.3% (5.6%) |
| Recall | 83.3% (0.8%) | 80.4% (3.2%) |
| F-score | 82.8% (0.5%) | 80.3% (3.5%) |

The scores were averaged among the folds.
Standard deviations over folds are shown in parentheses.

#### 3.3. Host candidate selection

As candidate hosts for new Eu2+-activated phosphors, inorganic ionic crystals composed of elements appearing in the collected Eu-activated phosphor dataset were extracted from the AtomWork-Adv materials database, resulting in 26,305 compounds. Applying the machine learning model to these compounds, 8,947 compounds were predicted to have a Eu2+ probability of 50% or higher. To narrow the list of candidate compounds for experimental validation, the following procedure was employed:

Considering that most Eu-activated phosphor materials are designed with Eu ions as substitutional impurities, compounds containing the alkaline earth metals Ca, Sr, or Ba are deemed suitable for Eu2+-activated phosphors. This is also evidenced by the collected data (Fig. 3). Therefore, compounds that did not contain these alkaline earth metals were excluded. Non-oxide compounds are suitable for Eu2+, whereas oxides, which tend to exhibit lower Eu2+ probabilities, are preferable for practical applications. Hence, oxides, sulfides, halides, and mixed-anion compounds were selected to validate the machine learning predictions. Furthermore, compounds requiring special synthesis methods such as high-pressure synthesis, compounds requiring hard-to-obtain starting materials, and compounds inferred to be difficult to synthesize based on our prior experience and experimental equipment were excluded. Hosts for known Eu2+-activated phosphor materials were also excluded. Finally, 35 candidate compounds were selected (Table 3).

Table 3. Candidate compounds for Eu2+-activated phosphors with predicted Eu2+-probability, sample purity estimated by powder XRD analysis, space group of the target phase, and summary of photoluminescence measurements.

| Composition | Eu2+ probability (%) | Sample purity (wt%) | Space group | Photoluminescence | λem (nm) | λex (nm) |
| --- | --- | --- | --- | --- | --- | --- |
| **Fluoride** | | | | | | |
| Cs2CaF4 | 94 | 100 | I 4/m m m (139) | No | | |
| **Chloride** | | | | | | |
| Cs3Ca2Cl7 | 98 | 26 | I 4/m m m (139) | | | |
| Rb3Ca2Cl7 | 94 | 72 | I 4/m m m (139) | Eu2+ | 440 | 352 |
| RbSrCl3 | 91 | 98 | P n m a (62) | Eu2+ | 426 | 343 |
| **Bromide** | | | | | | |
| NaCaBr3 | 100 | 16 | R -3 (148) | | | |
| Na6CaBr8 | 100 | 0 | | | | |
| Cs2CaBr4 | 100 | 91 | I 4/m m m (139) | Eu2+ | 450 | 356 |
| Rb4CaBr6 | 99 | 85 | R -3 c (167) | Eu2+ | 451 | 363 |
| **Sulfide** | | | | | | |
| SrB2S4 | 99 | 51 | R -3 (148) | | | |
| BaB2S4 | 99 | 87 | C c (9) | Eu2+ | 623 | 321 |
| LiSrB3S6 | 99 | 89 | C c (9) | No | | |
| LiBaB3S6 | 99 | 76 | C c (9) | No | | |
| BaGe2S5 | 92 | 84 | F d -3 m (227) | No | | |
| Ba2Ga8GeS16 | 90 | 82 | P 63 m c (186) | No | | |
| Ba3Y2S6 | 88 | 0 | | | | |
| SrSc2S4 | 84 | 87 | P n m a (62) | No | | |
| CaSc2S4 | 74 | 85 | P n m a (62) | No | | |
| BaLa2ZnS5 | 68 | 77 | I 4/m c m (140) | Eu3+ | | |
| **Oxide** | | | | | | |
| NaLi3BaB6O12 | 54 | 97 | R -3 (148) | Eu2+ | 381 | 300 |
| **Oxyfluoride** | | | | | | |
| KNaCaMg5Si8O22F2 | 96 | 95 | C 2/m (12) | Eu2+ | 431 | 335 |
| Na3Ba2B6O12F | 62 | 100 | P 63/m (176) | No | | |
| **Oxybromide** | | | | | | |
| Ba2OBr2 | 100 | 69 | I b a m (72) | | | |
| **Oxyiodide** | | | | | | |
| Sr2OI2 | 100 | 73 | I b a m (72) | Eu2+ | 611 | 257 |
| Ba2OI2 | 100 | 81 | I b a m (72) | No | | |
| Sr4OI6 | 100 | 75 | P 63 m c (186) | Eu2+ | 471 | 318 |
| Ba4OI6 | 100 | 96 | P 63 m c (186) | Eu2+ | 476 | 325 |
| Ba2PO4I | 92 | 94 | P 21/c (14) | Eu2+ | 418 | 304 |
| Sr2PO4I | 77 | 97 | P 21/c (14) | Eu2+ | 420 | 293 |
| **Oxysulfide** | | | | | | |
| Sr2ZnGe2S6O | 100 | 98 | P -4 21 m (113) | No | | |
| Ba2ZnGe2S6O | 100 | 74 | P -4 21 m (113) | No | | |
| BaZnSO | 94 | 93 | C m c m (63) | No | | |
| Ba10(PO4)6S | 87 | 84 | P -3 (147) | No | | |
| Sr10(PO4)6S | 86 | 92 | P -3 (147) | No | | |
| Ca10(PO4)6S | 79 | 81 | P -3 (147) | No | | |
| **Sulfide-chloride** | | | | | | |
| NaBa4Ge3S10Cl | 99 | 71 | P 63 (173) | No | | |

#### 3.4. Experiments

The synthesis of the selected 35 candidate compounds was attempted using a solid-state reaction method. A summary of the powder XRD and PL measurements is provided in Table 3. The space groups of the identified target phases are listed in the table. However, it is vital to note that the collected data included only the chemical formulae of the hosts and no crystal structures.
Therefore, the machine learning model developed in this study used only chemical compositions as inputs and did not distinguish between polytypes.

Six candidates did not reach a sample purity of 70 wt%, and their syntheses were considered to have failed. Powder PL measurements were performed for the other 29 samples with a purity of 70 wt% or higher. The 16 samples denoted as ‘No’ in the table showed no significant luminescence intensities. Synthesis conditions, powder XRD patterns, and XRD analysis results of the 13 luminescent samples are summarized in Section S3 of the Supporting Information. Powder XRD analysis indicated that these powder products were mixtures of target compounds and impurity phases. Because the PL spectra of powder samples can be strongly influenced by impurity phases with bright luminescence, it was carefully confirmed that the PL intensities increased with increasing fractions of the target phases. It was also confirmed by microscopic observations that most particles exhibited photoluminescence under ultraviolet light. From these observations, we concluded that the photoluminescence originated from the target phases.

Figure 4 illustrates the PL and PLE spectra of the 13 luminescent samples. Relatively broad emission spectra were observed from 12 of the 13 luminescent samples, and it was determined that these were derived from the Eu2+ activators. These candidate compounds were two chlorides, Rb3Ca2Cl7:Eu2+ and RbSrCl3:Eu2+; two bromides, Cs2CaBr4:Eu2+ and Rb4CaBr6:Eu2+; a sulfide, BaB2S4:Eu2+; an oxide, NaLi3BaB6O12:Eu2+; an oxyfluoride, KNaCaMg5Si8O22F2:Eu2+; and five oxyiodides, Sr2OI2:Eu2+, Sr4OI6:Eu2+, Ba4OI6:Eu2+, Ba2PO4I:Eu2+, and Sr2PO4I:Eu2+. Most compounds showed blue to near-ultraviolet emissions (381-476 nm peak-top wavelengths), whereas BaB2S4:Eu2+ and Sr2OI2:Eu2+ showed red emissions (623 and 611 nm, respectively). The remaining sample exhibited a sharp emission spectrum characteristic of the Eu3+ activator.
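
The sample counts quoted above can be checked against Table 3 with a short tally. The following is only an illustrative sketch, not the authors' code: the names `TABLE_3`, `PURITY_THRESHOLD_WT`, and `summarize` are hypothetical, the rows are transcribed from Table 3, and a blank photoluminescence cell is represented as `None`.

```python
from collections import Counter

# Purity criterion stated in the text: below 70 wt% is treated as a failed synthesis.
PURITY_THRESHOLD_WT = 70

# Rows transcribed from Table 3:
# (composition, predicted Eu2+ probability in %, sample purity in wt%, PL result).
# PL result is None where the table cell is blank (no PL measurement).
TABLE_3 = [
    ("Cs2CaF4", 94, 100, "No"),
    ("Cs3Ca2Cl7", 98, 26, None),
    ("Rb3Ca2Cl7", 94, 72, "Eu2+"),
    ("RbSrCl3", 91, 98, "Eu2+"),
    ("NaCaBr3", 100, 16, None),
    ("Na6CaBr8", 100, 0, None),
    ("Cs2CaBr4", 100, 91, "Eu2+"),
    ("Rb4CaBr6", 99, 85, "Eu2+"),
    ("SrB2S4", 99, 51, None),
    ("BaB2S4", 99, 87, "Eu2+"),
    ("LiSrB3S6", 99, 89, "No"),
    ("LiBaB3S6", 99, 76, "No"),
    ("BaGe2S5", 92, 84, "No"),
    ("Ba2Ga8GeS16", 90, 82, "No"),
    ("Ba3Y2S6", 88, 0, None),
    ("SrSc2S4", 84, 87, "No"),
    ("CaSc2S4", 74, 85, "No"),
    ("BaLa2ZnS5", 68, 77, "Eu3+"),
    ("NaLi3BaB6O12", 54, 97, "Eu2+"),
    ("KNaCaMg5Si8O22F2", 96, 95, "Eu2+"),
    ("Na3Ba2B6O12F", 62, 100, "No"),
    ("Ba2OBr2", 100, 69, None),
    ("Sr2OI2", 100, 73, "Eu2+"),
    ("Ba2OI2", 100, 81, "No"),
    ("Sr4OI6", 100, 75, "Eu2+"),
    ("Ba4OI6", 100, 96, "Eu2+"),
    ("Ba2PO4I", 92, 94, "Eu2+"),
    ("Sr2PO4I", 77, 97, "Eu2+"),
    ("Sr2ZnGe2S6O", 100, 98, "No"),
    ("Ba2ZnGe2S6O", 100, 74, "No"),
    ("BaZnSO", 94, 93, "No"),
    ("Ba10(PO4)6S", 87, 84, "No"),
    ("Sr10(PO4)6S", 86, 92, "No"),
    ("Ca10(PO4)6S", 79, 81, "No"),
    ("NaBa4Ge3S10Cl", 99, 71, "No"),
]


def summarize(rows, threshold=PURITY_THRESHOLD_WT):
    """Split candidates by the purity criterion and count PL outcomes."""
    failed = [r for r in rows if r[2] < threshold]
    measured = [r for r in rows if r[2] >= threshold]
    outcome = Counter(r[3] for r in measured)
    luminescent = outcome["Eu2+"] + outcome["Eu3+"]
    return {
        "candidates": len(rows),
        "failed_synthesis": len(failed),
        "measured": len(measured),
        "no_luminescence": outcome["No"],
        "luminescent": luminescent,
        "eu2": outcome["Eu2+"],
        "eu3": outcome["Eu3+"],
        # Fraction of luminescent samples whose emission is assigned to Eu2+.
        "eu2_fraction": outcome["Eu2+"] / luminescent if luminescent else float("nan"),
    }


summary = summarize(TABLE_3)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the rows as transcribed, the tally gives 35 candidates, 6 failed syntheses, 29 measured samples, 16 without significant luminescence, and 13 luminescent samples (12 Eu2+ and 1 Eu3+), matching the counts in the text.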
The compound that exhibited the Eu3+-derived luminescence was BaLa2ZnS5:Eu3+.

Since the data used in this study were extracted from academic articles on phosphors, the present machine learning model performs a classification that is conditional on the host showing luminescence. Looking at the experimental results from this perspective, of the 13 candidate compounds that showed photoluminescence, 12 showed luminescence derived from Eu2+. The yield was as high as 12/13 (92%). Because compounds containing sulfur and halogens already show high average Eu2+-paper ratios, the discussion here focuses on O-containing compounds. The average Eu2+-paper ratio of O-containing compounds was 30.8%. In our previous study [15], three Eu2+-activated phosphors were discovered from ten luminescent oxides, and the yield of 3/10 was consistent with the average Eu2+-paper ratio of O-containing compounds. In contrast, this study discovered seven O-containing compounds showing photoluminescence, and in all of them the luminescence was derived from Eu2+. These results clearly demonstrate the potential of machine learning classification of the oxidation state of Eu ions to avoid candidates prone to Eu3+-derived luminescence.

In this study, we only tested compounds predicted to exhibit luminescence derived from Eu2+ to accelerate the discovery of Eu2+-activated phosphors. As a result, we validated the precision of the prediction but have yet to validate its recall from a machine learning perspective. We believe that precision is more important than recall in materials search, because finding new materials matters more than ensuring that no candidate is missed, and the cost of experiments is high. Therefore, we conducted the validation experiments focusing only on precision in this study.

As in the previous study, nonluminescent compounds pose a challenge. Negative data, such as nonluminescent results, are infrequently reported in academic articles, and there is no standardized notation method.
Therefore, collecting such data by automated extraction from text, as in this study, is challenging and remains an issue for the future.

Figure 4. PL (blue) and PLE (red) spectra of the 13 candidate compounds showing photoluminescence.

4. Conclusions

This study aimed to explore potential host compounds for Eu2+-activated phosphors. Utilizing a dataset of Eu2+- and Eu3+-activated phosphors collected from numerous academic articles, a machine learning classification model was developed to predict whether luminescence originates from Eu2+ or Eu3+ ions in potential hosts based on their compositions. The classification model accounts for the fact that the Eu2+ and Eu3+ oxidation states are not necessarily mutually exclusive, as evidenced by the collected data.

Following machine learning predictions, a comprehensive selection process was undertaken to identify host candidates for new Eu2+-activated phosphors. The criteria included the presence of the alkaline earth elements Ca, Sr, or Ba in the compounds; selection of oxides, sulfides, halides, and mixed-anion compounds with high Eu2+ probabilities; and exclusion of certain compounds based on synthetic challenges or unavailability of precursors. Experimental synthesis attempts were made for 35 selected candidate compounds. Subsequent analyses revealed the successful synthesis of 29 samples, on which PL measurements were performed. The observed luminescence results demonstrated a high yield rate for Eu2+-derived luminescence, with 12 of the 13 luminescent samples showing Eu2+ emission. In summary, this study integrated systematic data collection, machine learning, and experimental validation to advance the development of Eu2+-activated phosphors.
Notably, 12 new Eu2+-activated phosphor materials were discovered, which highlights the potential of this approach.

Supporting Information.

The following file is available free of charge.

Feature importance of the machine learning model; machine learning classification with gradient boosted trees method; synthesis conditions, powder XRD patterns, and XRD analysis results of the samples exhibiting photoluminescence. (PDF)

Corresponding Author

* E-mail: KOYAMA.Yukinori@nims.go.jp

Author Contributions

All authors have given approval to the final version of the manuscript.

Funding Sources

Part of this work was supported by the Japan Science and Technology Agency (JST), CREST Grant Number JPMJCR19J2.

Acknowledgements

Part of this work was supported by the Japan Science and Technology Agency (JST), CREST Grant Number JPMJCR19J2. Part of this study was conducted using Data for Text and Data Mining (TDM data) provided by the data platform for materials science research DICE of National Institute for Materials Science.

References

1 Li, G. G.; Tian, Y.; Zhao, Y.; Lin, J. Recent progress in luminescence tuning of Ce3+ and Eu2+-activated phosphors for pc-WLEDs. Chem. Soc. Rev. 2015, 44, 8688-8713.
2 Qin, X.; Liu, X. W.; Huang, W.; Bettinelli, M.; Liu, X. G. Lanthanide-Activated Phosphors Based on 4f-5d Optical Transitions: Theoretical and Experimental Aspects. Chem. Rev. 2017, 117, 4488-4527.
3 Li, S. X.; Xie, R. J.; Takeda, T.; Hirosaki, N. Review - Narrow-Band Nitride Phosphors for Wide Color-Gamut White LED Backlighting. ECS J. Solid State Sci. Technol. 2018, 7, R3064-R3078.
4 Leaño, J. L.; Fang, M. H.; Liu, R. S. Review - Narrow-Band Emission of Nitride Phosphors for Light-Emitting Diodes: Perspectives and Opportunities. ECS J. Solid State Sci. Technol. 2018, 7, R3111-R3133.
5 Luo, X. F.; Xie, R. J. Recent progress on discovery of novel phosphors for solid state lighting. J. Rare Earths 2020, 38, 464-473.
6 Zhou, X. Q.; Qiao, J. W.; Xia, Z. G.
Learning from Mineral Structures toward New Luminescence Materials for Light-Emitting Diode Applications. Chem. Mater. 2021, 33, 1083-1098.
7 Park, W. B.; Singh, S. P.; Kim, M.; Sohn, K. S. Phosphor Informatics Based on Confirmatory Factor Analysis. ACS Comb. Sci. 2015, 17, 317-325.
8 Ha, J. M.; Wang, Z. B.; Novitskaya, E.; Hirata, G. A.; Graeve, O. A.; Ong, S. P.; McKittrick, J. An integrated first principles and experimental investigation of the relationship between structural rigidity and quantum efficiency in phosphors for solid state lighting. J. Lumin. 2016, 179, 297-305.
9 Nakano, H.; Tanaka, K.; Miyao, T.; Funatsu, K.; Shirasawa, R.; Tomiya, S. Practical Models for Predicting the Emission Peak Wavelengths of Inorganic Phosphors Based on Stoichiometric Information. Chem. Lett. 2017, 46, 1482-1485.
10 Zhuo, Y.; Tehrani, A. M.; Oliynyk, A. O.; Duke, A. C.; Brgoch, J. Identifying an efficient, thermally robust inorganic phosphor host via machine learning. Nat. Commun. 2018, 9, 4377-1 – 4377-10.
11 Li, S. X.; Xia, Y. H.; Amachraa, M.; Hung, N. T.; Wang, Z. B.; Ong, S. P.; Xie, R. J. Data-Driven Discovery of Full-Visible-Spectrum Phosphor. Chem. Mater. 2019, 31, 6286-6294.
12 Lai, S. Q.; Zhao, M.; Qiao, J. W.; Molokeev, M. S.; Xia, Z. G. Data-Driven Photoluminescence Tuning in Eu2+-Doped Phosphors. J. Phys. Chem. Lett. 2020, 11, 5680-5685.
13 Park, C.; Lee, J. W.; Kim, M.; Lee, B. D.; Singh, S. P.; Park, W. B.; Sohn, K. S. A data-driven approach to predicting band gap, excitation, and emission energies for Eu2+-activated phosphors. Inorg. Chem. Front. 2021, 8, 4610-4624.
14 Zhang, X.; Zhou, Z.; Ming, C.; Sun, Y.-Y. GPT-Assisted Learning of Structure-Property Relationships by Graph Neural Networks: Application to Rare-Earth-Doped Phosphors. J. Phys. Chem. Lett. 2023, 14, 11342-11349.
15 Koyama, Y.; Ikeno, H.; Harada, M.; Funahashi, S.; Takeda, T.; Hirosaki, N. Rapid discovery of new Eu2+-activated phosphors with a designed luminescence color using a data-driven approach.
Mater. Adv. 2023, 4, 231-239.
16 Data for Text and Data Mining (TDM data). The data platform for materials science research DICE, National Institute for Materials Science, Japan. https://dice.nims.go.jp/services/TDM-PF/. (acquired on August 5, 2020)
17 Inorganic Crystal Structure Database (ICSD). FIZ Karlsruhe GmbH, Germany. https://icsd.products.fiz-karlsruhe.de/. (accessed on January 12, 2023)
18 Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput. Mater. 2016, 2, 16028-1 – 16028-7.
19 Seko, A.; Hayashi, H.; Nakayama, K.; Takahashi, A.; Tanaka, I. Representation of compounds for machine-learning prediction of physical properties. Phys. Rev. B 2017, 95, 144110-1 – 144110-11.
20 Ong, S. P.; Richards, W. D.; Jain, A.; Hautier, G.; Kocher, M.; Cholia, S.; Gunter, D.; Chevrier, V. L.; Persson, K. A.; Ceder, G. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 2013, 68, 314-319.
21 XenonPy. https://github.com/yoshida-lab/XenonPy/. (version 0.6.7)
22 Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, É. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825-2830.
23 AtomWork-Adv. National Institute for Materials Science, Japan. https://atomwork-adv.nims.go.jp/. (accessed on November 7, 2023)
24 Jang, S.; Na, G. S.; Choi, Y.; Chang, H. Optical property dataset of inorganic phosphor. Sci. Rep. 2024, 14, 7639-1 – 7639-10.
25 Dorenbos, P. A Review on How Lanthanide Impurity Levels Change with Chemistry and Structure of Inorganic Compounds. ECS J. Solid State Sci. Technol. 2013, 2, R3001-R3011.