# Fileset

[d3cp06298h.pdf](https://mdr.nims.go.jp/filesets/eebd8733-f17a-45b6-946e-64f5646c4caf/download)

## Creator

[Anh Khoa Augustin Lu](https://orcid.org/0000-0003-4702-0933), [Jianbo Lin](https://orcid.org/0000-0003-0769-9857), Yasunori Futamura, Tetsuya Sakurai, [Ryo Tamura](https://orcid.org/0000-0002-0349-358X), [Tsuyoshi Miyazaki](https://orcid.org/0000-0003-3534-4404)

## Rights

[Creative Commons BY Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/)

## Other metadata

[Extraction of local structure differences in silica based on unsupervised learning](https://mdr.nims.go.jp/datasets/b64b402c-8483-417d-8a21-c24d3b0854f9)

## Fulltext

Extraction of local structure differences in silica based on unsupervised learningThis journal is © the Owner Societies 2024 Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 |  11657Cite this: Phys. Chem. Chem. Phys.,2024, 26, 11657Extraction of local structure differences in silicabased on unsupervised learning†Anh Khoa Augustin Lu, *ab Jianbo Lin, c Yasunori Futamura,defTetsuya Sakurai,def Ryo Tamura *cg and Tsuyoshi Miyazaki *afhSilica exhibits a rich phase diagram with numerous stable structures existing at different temperature andpressure conditions, including its glassy form. In large-scale atomistic simulations, due to the smallenergy difference, several phases may coexist. While, in terms of long-range order, there are cleardifferences between these phases, their short- or medium-range structural properties are similar formany phases, thus making it difficult to detect the structural differences. In this study, a methodologybased on unsupervised learning is proposed to detect the differences in local structures between eightphases of silica, using atomic models prepared by molecular dynamics (MD) simulations. A combinationof two-step locality preserving projections (TS-LPP) and locally averaged atomic fingerprints (LAAF)descriptor was employed to find a low-dimensional space in which the differences among all the phasescan be detected. From the distance between each structure in the found low-dimensional space, thesimilarity between the structures can be discussed and subtle local changes in the structures can bedetected. Using the obtained low-dimensional space, the b-a transition in quartz at a low temperaturewas analyzed, as well as the structural evolution during the melt-quench process starting from a-quartz.The proper differentiation and ease of visualization make the present methodology promising forimproving the analysis of the structure and properties of glasses, where subtle differences in structureappear due to differences in the temperature and pressure conditions at which they were synthesized.1 IntroductionSilica (SiO2) is one of the most abundant materials on earth asthe major constituent of sand and is an important ingredientfor a wide range of applications, such as concrete (with silicafume1), photonics2 or as a gate dielectric in metal–oxide–semiconductors field effect transistors.3 It is the simplest ofthe silicates and yet it is one of the most technologicallyrelevant.4 Several crystalline forms exist at different pressureand temperature conditions (quartz, coesite, stishovite, cristo-balite, tridymite, etc.) as well as in glass form.5–11 For mostphases, Si atoms are four-coordinated and form tetrahedra withthe surrounding O atoms. However, high-pressure phases suchas stishovite and seifertite have six-coordinated Si atoms.Therefore, the external conditions under which the process tosynthesize a silica sample is conducted, has an enormousimpact on its atomic structure and therefore are stronglyrelated to their properties. Understanding the structuralchanges is thus a necessity in order to engineer new materialswith enhanced properties. Moreover, for disordered materialssuch as silica glass, the structural differences are subtle andmay be localized in a relatively small region, yet exceeding thefirst-neighbor shell.In recent times, the progress in atomistic simulations hasenabled studies of the dynamics of a system with first-principles accuracy.12,13 In order to properly analyze the struc-tural properties, defining quantities of interest is of utmostimportance.14–17 Over the last decades, several descriptors havebeen developed in order to describe local structures, such asthe smooth overlap of atomic positions (SOAP).18 Ring statisticsbased on graph theory was also proposed to study the con-nectivity of amorphous materials.19–21 The topological featuresa Research Center for Materials Nanoarchitectonics, National Institute for MaterialsScience, Tsukuba 305-8568, Japan. E-mail: LU.Augustin@nims.go.jp,augustinlu@gmail.com, TAMURA.Ryo@nims.go.jp,MIYAZAKI.Tsuyoshi@nims.go.jpb Mathematics for Advances Materials Open Innovation Laboratory, NationalInstitute of Advanced Industrial Science and Technology, Sendai 980–8577, Japanc Center for Basic Research on Materials, National Institute for Materials Science,Tsukuba 305-0047, Japand Department of Computer Science, University of Tsukuba, Tsukuba 305-8573, Japane Center for Artificial Intelligence, University of Tsukuba, Tsukuba 305-8573, Japanf Master’s/Doctoral Program in Life Science Innovation, University of Tsukuba,Tsukuba 305-8577, Japang Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8568,Japanh Graduate School of Engineering, Nagoya university, Nagoya 464-8603, Japan† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cp06298hReceived 27th December 2023,Accepted 21st March 2024DOI: 10.1039/d3cp06298hrsc.li/pccpPCCPPAPEROpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article OnlineView Journal  | View Issuehttps://orcid.org/0000-0003-4702-0933https://orcid.org/0000-0003-0769-9857https://orcid.org/0000-0002-0349-358Xhttps://orcid.org/0000-0003-3534-4404http://crossmark.crossref.org/dialog/?doi=10.1039/d3cp06298h&domain=pdf&date_stamp=2024-04-01https://doi.org/10.1039/d3cp06298hhttps://doi.org/10.1039/d3cp06298hhttps://rsc.li/pccphttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298hhttps://pubs.rsc.org/en/journals/journal/CPhttps://pubs.rsc.org/en/journals/journal/CP?issueid=CP02601511658 |  Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 This journal is © the Owner Societies 2024and patterns in materials were also studied by persistenthomology.22,23 However, a method that can properly capturethe structural differences between phases of silica is stilllacking. Most methods rely on averaged quantities and statis-tics, such as the structure factor or radial distributionfunctions.24–26 Extracting local information therefore remainsa challenging task, exacerbated by the fact that subtle changesmay occur locally in a given region. Other methods based onlocal structures, such as common-neighbor analysis (CNA)27,28and various functions29,30 have been developed. However, onlyrelatively simple phases such as face-centered cubic (FCC),body-centered cubic (BCC) or hexagonal closed-packed (HCP)could be recognized. For silica and its numerous complexphases, these methods would be unable to distinguish relevantcomplex phases.As an alternative method of structural analysis focusing onlocal structures, an unsupervised learning method combiningthe two-step locality preserving projections (TS-LPP) methodand locally-averaged atomic fingerprints (LAAF) has beenrecently developed.31 Using this technique, it has been shownthat liquid, crystalline, and amorphous structures could beclearly distinguished in Si and SiGe systems. To obtain theseresults, higher dimensional descriptors focusing on local struc-tures were projected to a two-dimensional space via dimension-ality reduction using TS-LPP to facilitate understanding of thelocal structures. In particular, it is generally difficult to distin-guish liquid and amorphous structures only from their localstructures, and conventional dimensionality reduction techni-ques are inadequate for addressing this problem.31 However,when conducting structural analysis of the numerous complexphases of silica, it would be impossible to find a two-dimensional space in which all phases can be distinguishedeven with TS-LPP.In this study, we perform a structural analysis for eightstructures in silica using the potential of the unsupervisedlearning method with TS-LPP and LAAF descriptor based onatom-centered symmetry functions (ACSF). The targeted struc-tures are a-quartz, b-quartz, coesite, b-tridymite, b-cristobalite,stishovite, liquid, and glass. It was found that finding a modelthat distinguishes all phases in a two-dimensional space wasunattainable. On the other hand, a four-dimensional space inwhich all of these phases are separated could be created. Fromthe locations of each phase in the low-dimensional space, thesimilarity or differences between different phases can be under-stood. Using this low-dimensional space, the structural evolu-tion from b-quartz to a-quartz at low temperature was followed,and the melt-quench process was analyzed, showing that localchanges can be detected by our methodology.The methodology developed for this study is has thepotential to unlock new prospects for elucidating the natureof amorphous materials in future investigations of disorderedmaterials where impurities and interfaces are present, such asin Si/SiO2 interfaces. For such systems, our methodologyshould be able to detect the subtle variations in local structuresthat may lead to previously unknown types of structures.2 Methods2.1 Structural analysis based on unsupervised learningAn overview of the methodology used for this study to performstructural analysis is depicted in Fig. 1. Atomic configurationsof the different phases were generated by molecular dynamics(MD) simulations. From these models, locally averaged atomicfingerprints (LAAF) based on the G2 atom-centered symmetryfunctions (ACSF) were calculated.32 As discussed by Tamura et al.,31LAAF are expected to express two different types of locality in thecoordination environment around each atom by two cutoffs,Rd and Ra. The first cutoff is used for the definition of ACSFshowing the chemical or coordination environments, while thesecond one is related to the similarity of the chemical environ-ment with the surrounding atoms. The definition of the LAAFFig. 1 Overview of the methodology used for the analysis of silica phases.Paper PCCPOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298hThis journal is © the Owner Societies 2024 Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 |  11659descriptor can be found in Appendix A. These descriptors, afterfeature selection by variance threshold and standardization,were transformed onto a low-dimensional space using the two-step locality preserving projections method (TS-LPP).31 TheTS-LPP method consists in using LPP twice, with a reductionfrom the LAAF dimensionality M (after feature selection) to theintermediate dimension dm and from dm to the target low-dimension dt. The details of LPP method are explained inAppendix B. In TS-LPP, two hyperparameters (dm and s) needto be defined. To determine appropriate values for these hyper-parameters, the Calinsky–Harabasz score (pseudo-F33) was usedfor the k-means clustering results in the found low-dimensionalspace by TS-LPP. A grid exploration was performed to determinethe most appropriate values for dm and s. The same value fors was used for both LPP steps in the present implementationof the TS-LPP method. Finally, clustering methods (such ask-means clustering or DBSCAN) were used to identify differentgroups of local structures in the reduced space.2.2 First-principles molecular dynamicsTo generate the target atomic configurations, first-principlemolecular dynamics (FPMD) simulations were performedusing the CONQUEST code.34–36 The Perdew–Burke–Ernzerhof(PBE)37 exchange–correlation functional was used with pseudo-potentials.38,39 G -only calculations were performed with thedouble-zeta polarized (DZP) basis sets.40 The energy cutoff forthe charge density was set to 200 Ha. The temperature wascontrolled using Nosé–Hoover chains41 and the pressure wascontrolled by a Parrinello–Rahman barostat.42 The atomicconfigurations typically contain around 200 to 400 atoms. Thetime step is 1 fs and simulations of 5000 time steps (5 ps) wereperformed for each configuration. For the crystalline phases,it was observed that the average positions during constant-volume (NVT) simulations were very close to their initial posi-tions (within 0.2 Å). This suggests that the original crystallinephases are preserved during NVT simulations even at 300 K and600 K, despite the fact that some of them are stable only athigher temperatures in experiments. Note that we observedsome phase transitions in constant-pressure (NPT) simulations,for example the transition from b-quartz to a-quartz structures.While this is of physical relevance, such changes are undesiredin this work where the aim is to generate trajectories represen-tative of each phase. Therefore, only NVT trajectories for crystalphases are included in the training set, as reported in Table 1.The change of local structures in the b-to-a quartz transition isanalyzed in Subsection 3.4. On the other hand, the trajectoriesfor the liquid and glass phases were generated using the NPTensemble.2.3 Classical molecular dynamicsTo generate the initial atomic configurations of liquid and glassphases for FPMD, classical molecular dynamics simulationswere performed using the Large-scale Atomic MolecularMassively Parallel Simulator (LAMMPS) software package withGPU acceleration43,44 to generate liquid and glassy structures ofSiO2. Munetoh potentials (Tersoff-type) were used to describethe inter-atomic interactions.45 The temperature and pressurewere controlled using Nosé–Hoover style non-Hamiltonianequations of motions.46,47 These structures were used as initialconfigurations for FPMD simulations. The time step was set to1 fs. From an initial a-quartz crystal, the system was first meltedat 5000 K, then cooled down to 300 K with a cooling rateranging from 1010 K s�1 to 1013 K s�1, resulting in simulationtimes ranging from to 0.5 ns to 370 ns. Liquid structures wererelaxed for 50 ps. These structures were used as initial states forFPMD simulations.3 Structural analysis of silica systems3.1 Target structures in silicaThe detailed composition of the training set is provided inTable 1. The target data set for the structural analysis for silicasystems includes eight different phases: a-quartz (stable atroom temperature) and b-quartz (high temperature), b-cristo-balite (high temperature), b-tridymite (high temperature), coe-site (high pressure), stishovite (high pressure), liquid and glass.A phase diagram of silica based on ref. 10 is shown in Fig. 2(left). For each phase, at least two independent trajectories wereincluded in the training set (see Table 1).The two-body radial distribution functions, g(r), calculatedfrom the MD simulations for the eight phases are shown inFig. 2 (right). Here the first and second peak correspond to thenearest Si–O and O–O distances, respectively. The position ofthe first and second peaks in g(r) is almost the same in thesephases, except the one for stishovite. This is reasonable becausemost Si atoms in the MD simulations of these phases aretetrahedrally coordinated with O atoms and the local environ-ments within this range should therefore be similar. The varietyTable 1 Details on the training data setsPhaseSiatoms/cell ConditionsNumber ofdata pointsa-quartz 108 300 K (NVT) and 600 K (NVT) 1080b-quartz 108 300 K (NVT) and 600 K (NVT) 1080b-cristobalite 64 300 K (NVT) and 600 K (NVT) 640b-tridymite 128 300 K (NVT) and 600 K (NVT) 1280Coesite 64 300 K (NVT) and 600 K (NVT) 640Stishovite 90 300 K (NVT) and 600 K (NVT) 900Liquid 108 3000 K (NPT) and 4000 K (NPT) 1080Glass 108 300 K (NPT) (cooling rateof 1010–13 K s�1)2160Fig. 2 (Left) Schematic phase diagram of SiO2. (Right) Radial distributionfunctions at 3000 K for liquid silica and 300 K for the seven other phases.PCCP PaperOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298h11660 |  Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 This journal is © the Owner Societies 2024of the silica structures comes from the different network(arrangement) of these SiO4 tetrahedra, resulting in differencesin g(r) for r beyond 2.8 Å. However, it is still difficult to find thedifferences among some different phases, for example betweenb-cristobalite and b-tridymite structures. In addition, it shouldbe noted that the analysis from g(r) is based on the statistics ofthe all atoms in the simulation cell. Even though g(r) is verydifferent between the glass and liquid phases, it is not obvious thatthey can be distinguished according to the local environmentaround atoms selected arbitrarily from the snapshot structures inthe MD simulations of the two phases. Therefore, it is unclearwhether it is possible or not to differentiate the atoms in thesnapshot structures of the MD simulations on various phases,only from their local information such as atomic fingerprints.Nevertheless, in this work we will show that this is possible, atleast for the eight phases of silica under consideration.In this work, the LAAF descriptors around Si atoms werecalculated for each configuration and included in the trainingset. Here, Rd = Ra = 6 Å unless stated otherwise. The resultsobtained by using other sets of parameters are discussed in theESI,† Section II and Section IIIA.In addition, feature selection based on variance threshold(set to 10�4 in this work) was performed on the resulting LAAFvector, resulting in a reduction of the number of features from100 to 63. Standardization was then performed to obtain unitvariance in all features. While variance threshold was found toimprove the stability of machine-learning based moleculardynamics simulations,48 we found that it is also effective toproperly differentiate all phases (see ESI†, Section IIIB).3.2 Low dimensional space obtained by TS-LPPAs mentioned in Subsection 3.1, it may be very difficult, if notimpossible, to differentiate the atoms in the different phasessolely from their local information. But, as shown below, wehave found that the all of the atoms in the MD simulations ofthe eight phases are perfectly distinguished in the seven-dimensional space made by the TS-LPP method using the LAAFdescriptors. In forming the low dimensional space using theTS-LPP method, k-means clustering for k = 8 has been per-formed for each target dimension dt.Fig. 3 shows the distribution of data points in the two-dimensional subspace made from the first seven components,which are obtained by the TS-LPP method using the targetdimension of seven (dt = 7). In the one-dimensional space madefrom the first component, the data points from the stishovitephase are clearly isolated from the other phases, being locatedat the bottom-left (negative values). This corroborates with thefact that stishovite is the only phase where Si atoms aresurrounded by six O atoms, as mentioned earlier. When thesecond component is considered, three groups are additionallydistinguished; the disordered phases (liquid and glass) as thefirst group, high-temperature b-cristobalite and b-tridymite asthe second group and then the final group (quartz, coesite)corresponds to the phases stable at low temperatures. Stisho-vite, which is also stable at low temperatures under pressure,also belongs to the third group if one considers solely thesecond component. Component 3 differentiates a-quartz fromb-quartz, as well as coesite. Components 4 and 5 apparentlybring no relevant contributions. On the other hand, component6 differentiates liquid from glass. Interestingly, despite beingseen as being of lower importance, component 7 distinguishesthe data points from b-cristobalite completely from those fromb-tridymite. It can be seen that in order to distinguish suchmany phases, it is necessary to increase the dimensionality ofthe embedding space. In particular, the structures of b-cristobalite and b-tridymite can hardly be differentiated bystudying the radial distribution function (RDF), although par-tial RDFs show differences at distances beyond 5 Å. It should beemphasized that the present method is not supervised anddoes not rely on any information how the data points arecreated. The present analysis clearly shows that local environ-ment of the atoms, expressed by LAAF, in the MD simulationsof the two similar phases have some clear differences.Fig. 4 shows the cluster indices obtained by the clusteringmethod for each target dimension dt. The present method isunsupervised and the cluster index here does not represent a givenphase, while the data points in Fig. 3 are ordered depending on theoriginal phase. It should be noted that the space of the reduceddimensions may be different for different dt even when the numberof dimensions is same, because the optimization of hyperpara-meters is performed independently for each value of dt. Theoptimized hyperparameters, dm and s, for different target dimen-sion dt are listed in Table S2 in the ESI.† Even though there aresome differences in the optimized hyperparameters, the results ofthe differentiation of atoms in the reduced dimensions are con-sistent with those in Fig. 3 when the dimension is higher than 2.In the space whose dimension is lower than 7D, it is difficult todistinguish the atoms in the liquid and amorphous phases, or theatoms in the b-tridymite and b-cristobalite structures. On the otherhand, all data points in the all eight phases are assigned to thecorrect phases in the 7D space made with dt = 7. It was alsoconfirmed that the same differentiation can be obtained by otherclustering methods such as the Density-Based Spatial Clusteringof Applications with Noise (DBSCAN49), when a proper low-dimensional space is generated. The results obtained by theDBSCAN method can be found in the ESI† (Fig. S12).It is noteworthy that the component 6 and component 7 areessential for achieving the differentiation of the eight phases,while the component 4 and component 5 make less importantcontributions, if any. This suggests that the seven-dimensionalspace obtained by TS-LPP could be further reduced while keepingthe needed information for differentiation. While such a procedureis system-dependent, it simplifies the visualization in the reducedspace. In this study, the four-dimensional space obtained bykeeping components 2, 3, 6, and 7 is enough to have a properdifferentiation of the eight phases, and these low-dimensionalspaces are indicated by the red dotted lines in Fig. 3.3.3 Analysis of the transformation (mapping) matricesfor TS-LPPAs discussed in the last section, the components 6 and7 are important for the differentiation of the eightPaper PCCPOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298hThis journal is © the Owner Societies 2024 Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 |  11661phases. The relationship between these components andthe original LAAF dimension can be expressed by themapping or transformation matrix explained in the ESI,†Section IIB.Fig. 4 Cluster indices by k-means clustering on the projected data in low-dimensional spaces with different numbers of dimensions. The color reflectsthe phase of the data point.Fig. 3 Distributions in the 2D subspaces of the 7D space generated by TS-LPP. Here, dm = 20 and s = 1. The two distributions highlighted by dotted linesare the four-dimensional space to have a proper differentiation of the eight phases.PCCP PaperOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298h11662 |  Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 This journal is © the Owner Societies 2024As the TS-LPP transformation consists of two linear trans-formations, the transformation matrices corresponding to eachLPP step can be analyzed. Fig. 5 shows the mapping matricesfor the first, second and finally the whole transformation. Thecomponents of the first LPP show that there is some degree ofredundancy, as can be seen from the occurrence of several nearidentical columns. The second LPP mixes the components fromthe first LPP. Except for the final 7th component, contributionsof the lower dimensional components (1st – 6th) in the inter-mediate dimension are large. Thus, the two-step strategy is notso relevant up to the 6th dimension. On the other hand, forfinal 7th component, 12th and 13th components in the inter-mediate dimension are important, and TS-LPP was essential forthe differentiation between b-cristobalite and b-tridymite. Infact, this differentiation could not be achieved with othermethods such as PCA or LPP.It should be noted that performing LPP to a much largenumber of dimension works as the relevant functions neededto capture the structural differences between b-cristobalitefrom b-tridymite are taken into account (ESI,† Fig. S10).However, the interpretation and visualization of such modelis difficult and makes this approach less promising for futureanalysis of systems where the actual phase is unknown.3.4 b-quartz to a-quartz transitionOnce the linear transformation matrix is obtained by TS-LPP,new local structures which are not included in the training setcan be projected onto the low-dimensional space. The modeldeveloped for differentiating all eight phases was subsequentlyused to analyze a trajectory starting from b-quartz and evolvedat 600 K under NPT conditions. It is well known that a-quartz isthe most stable phase at this temperature under atmosphericpressure (see Fig. 2). During the 5 ps run, a transition fromb-quartz to a-quartz was observed. The progressive phasetransition could be tracked, as shown in Fig. 6, where onlycomponents 2 and 3 of the seven-dimensional space are shown.In the initial stage (first 100 fs), the projections of the test dataappear in the same region of the subspace as b-quartz (blueregion in Fig. 3). In the final stage (4.9 ps to 5 ps), these pointslie in the same region as a-quartz (black region in Fig. 3).Considering the whole trajectory for a selection of five atoms,a smooth transition from the b-quartz region to the a-quartzone is observed, which is captured by component 3. FromFig. 5(right), it can be seen that component 3 (third column)is mainly made of contributions from features related to Si–Odistances (upper rows).So far, we have analyzed an MD simulation where the b-to-aquartz transition occurs using the low dimensional spacepresented in Subsection 3.2, which is based on the trainingdata of eight phases. However, it is also possible to performanother analysis using a different low dimensional space,obtained solely from the data of a single trajectory for theb-to-a quartz transition. Here, a two-dimensional space isobtained, and the results are shown in Fig. 7. In the early stageof the simulation, the structure evolves away from its initialconfiguration. If the time range is large enough, the wholetransition is captured as shown in Fig. 7c and d. More detailscan be found in the ESI.†This demonstrates that the change of local structures can becorrectly extracted in the low-dimensional space from onesingle MD trajectory, where the structures evolve over time.3.5 Melt-quench processIn this case, a trajectory of a classical molecular dynamicssimulation of melt and quench of silica (2592 atoms per cell)with a cooling rate of 109 K s�1 is analyzed. First, focusing onthe melting part of the simulation, the structure evolves follow-ing from the a-quartz region to b-quartz, and finally liquid, asshown in Fig. 8a. By visualizing components 2, 3, 6 and 7, theevolution from one region to another can be visualized, from theblue points (initial configuration) to the red ones (final configu-ration). It should be noted that the transitions are quite abrupt andthe melting process results in several distinct regions corres-ponding to different phases, quickly changing from a-quartz tob-quartz, then transitioning towards the region corresponding tothe liquid phase. For the quenching process (Fig. 8b), it can be seenFig. 5 Transformation matrices for (left) the first LPP, (center) the secondLPP, (right) the whole transformation. Each column is normalized.Fig. 6 Transition from b-quartz to a-quartz at 600 K followed by com-ponents 2 and 3 of the TS-LPP space. All atoms at (a) first 100 fs and (b) last100 fs are plotted. For (c), the whole process is shown for five selectedatoms. Data from the training set appear in light gray. Colors reflect thetime range shown, from blue for early snapshots to red for late snapshots.Paper PCCPOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298hThis journal is © the Owner Societies 2024 Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 |  11663that the system evolves from the liquid-like region to the glass-likeone in a continuous transition. No crystalline region was detectedduring the quench process, suggesting that the atomic structureremains disordered.4 ConclusionsWe have proposed a new methodology to extract the differ-ences in the local structures of silica system using a recentlydeveloped dimensionality reduction technique. Our approachconsists of locally averaged atomic fingerprints (LAAF) based onatom-centered symmetry functions (ACSF) and using two-steplocality preserving projections (TS-LPP) to generate a relativelylow-dimensional space that properly catches relevant structuralproperties of the different phases. The targeted data set forsilica system was constructed by the data points obtained fromeight phases. Importantly, unlike previous works,29,31 featureselection and at least seven dimensions were needed in orderto capture lower importance features that enables the properdifferentiation of all phases.Our results show that proper differentiation of many phasesof a given compound is possible via the combination of LAAFand an adequate dimensionality reduction. This method canbe applied to any kind of material or alloy and should beindependent of the simulation method (classical force fields,first-principles molecular dynamics, etc.). In the future, thisknowledge should be useful for analyzing the structure of silicaglass and its evolution with respect to external conditions,such as cooling rate or pressure during a quenching process.In addition, for systems such as metallic alloys, this approachcould provide useful indications on the transitions that occur forinstance during the glass transition, and how different factors(cooling rate, pressure, deformations, shear, etc.) can change thestructural properties. In general, the present methodology shouldextract valuable information from inhomogeneous systems, forinstance to detect the onset of nucleation in a supercooled liquid orthe coexistence of several phases in different regions. By thedetection of local changes, the relation between the structureand properties of atomic configurations that are typically difficultto analyze (ex: in the presence of impurities, vacancies or inter-faces) could be better understood. Our methodology should there-fore be applicable to inhomogeneous systems where phases maycoexist or during nucleation in a supercooled liquid.Our method is unsupervised, thus does not rely on anybiased information, and is promising for detecting particularphases or local structures, which may appear in any kind ofmaterial, as well as detecting new unknown phases dynamicallyor density fluctuations.4 Used as a supervised learning method,for classification, it may be a stepping stone towards bettermachine learning potentials or force fields50,51 by allowingclassification of the atoms beforehand and choosing an appro-priate force field for each atom over the course of a simulation.Author contributionsA. Lu: conceptualization, data curation, methodology, formal ana-lysis, writing – original draft. J. Lin: formal analysis. Y. Futamura:formal analysis. T. Sakurai; formal analysis, writing – review andediting. R. Tamura: methodology, formal analysis, writing – reviewand editing. T. Miyazaki: conceptualization, funding acquisition,supervision, formal analysis, writing – review and editing.Data availabilityThe data of this study are available from the correspondingauthors upon reasonable request.Fig. 7 Transition from b-quartz to a-quartz at 600 K followed by a TS-LPP space trained on a single trajectory, using the first (a) 100 fs, (b)2000 fs, (c) 4000 fs and (d) 5000 fs.Fig. 8 Structural evolution during a melt-quench process, followed bycomponents 2, 3, 6 and 7 of the TS-LPP space for (a) the melt part (b) thequench part. Data from the training set appear in light gray. Colors reflectthe time range shown, from blue for early snapshots to red for latesnapshots.PCCP PaperOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298h11664 |  Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 This journal is © the Owner Societies 2024Conflicts of interestThere are no conflicts to declare.AppendicesA DescriptorsIn order to capture the structural differences in silica, descrip-tors based on atom-centered symmetry functions were defined.A.1 Atom-centered symmetry functions (ACSF). To charac-terize the local environment of a given atom, G2 atom-centeredsymmetry functions (ACSF32,52) were calculated. For a single-component system, the atomic fingerprint vector Vi of atom i isdefined as:ViðZm; rdÞ ¼ Gi2ðZm; rdÞ ¼Xje�rijZm� �2fcðrij ; rdÞ; (1)where Zm(m = 1,. . .,M) is the decay rate with distance and fc(rij;rd) is a cutoff function defined as:fcðrij ; rdÞ ¼0:5 cosprijrd� �2þ1" #ðrij � rdÞ0 ðrij 4 rdÞ8>><>>: : (2)One can write the M-dimensional atomic fingerprint vector forthe ith atom as:Vi(rd) = (Vi(Z1;rd),Vi(Z2;rd),. . .,Vi(ZM;rd)). (3)For binary systems, the atomic fingerprints are generalized asfollows:Vai2aðZm; rdÞ ¼Xj2a; jaie�rijZm� �2fcðrij ; rdÞ; (4)Vbi2aðZm; rdÞ ¼Xj2be�rijZm� �2fcðrij ; rdÞ: (5)And the fingerprint vector becomes 2M-dimensional:ViAa(rd) = (VaiAa(Z1;rd), . . ., VaiAa(ZM;rd), VbiAa(Z1;rd), . . ., VbiAa(ZM;rd)).(6)In the present work, the distance cutoff rd is set to 6 Å, andthere are 50 Zm values arranged on an exponential grid between0.45 Å and rd = 6 Å, thus each atomic fingerprint vector has100 components.A.2 Locally averaged atomic fingerprints (LAAF). Whileproviding valuable information on the atomic structures,simple atomic fingerprints proved insufficient for properlydifferentiating all the phases considered in this study. This iswhy local averaging was performed.31 For the ith atom,the locally averaged atomic fingerprint (LAAF) vector can bewritten as:Vavi ðrd ; raÞ ¼1N2raXj2raVjðrdÞ; (7)where NArais the number of atoms within the average radius rafrom the ith atom. For two-element systems, the LAAF descrip-tor is expressed as:Vavi2aðrd ; raÞ ¼1Na;2raXj2raVj2aðrdÞ: (8)There are two important distance cutoff values:1. rd: the distance cutoff for the descriptor.2. ra: the distance cutoff for locally averaging the descriptor.These values were carefully chosen so that the characteristics ofeach phase are properly captured. In this work rd = ra = 6 Å.Finally, feature selection based on variance threshold (10�4)and standardization were performed on the resulting LAAFvector to obtain unit variance. This lead to a reduction of thenumber of features from 100 to 63.B Dimensionality reductionB.1 Locality preserving projections (LPP). In recent years,machine learning methods have become popular in materialsscience, and several methods for dimensionality reduction havebeen developed. For instance, principal component analysis (PCA)has become a popular unsupervised learning method, whichconsists in computing the principal components and using themto perform a change of basis on the data.53,54 By this process, thefirst principal components should capture the most importantinformation from a given data set. Most of the time, only a fewfirst are kept and analyzed. However, in several cases, this methodfailed to properly capture the difference between groups of datapoints. In order to address this shortcoming, new methods havebeen developed, such as the locality preserving projections (LPP)method.55 By keeping local information, it is less sensitive tooutliers in data than PCA. In this method, an input data matrix Xis built from M-dimensional feature vectors {xi}i = 1,. . .,N, where Nis the number of data points. Therefore, the dimension of X is(M � N). In this work, the feature vectors xi are the LAAF vectors.First, a weighted adjacency matrix W is defined as:Wij ¼expð�jjxi � xj jjÞ2s2ðiajÞ0 ði ¼ jÞ8><>: ; (9)where s is a hyperparameter. This square matrix is of size (N � N).Next, the knn-nearest neighbor graph is created. A typicalvalue is knn = 7. This similarity graph is based on W, where theoff-diagonal elements are forced to zero, except for thoserelated to the knn nearest neighbor data points. The weightedadjacency matrix is made symmetric by the operation56Wij = max(Wij,Wji). (10)The square diagonal matrix D, called the degree matrix, isdefined as:D ¼d1 . . . 0... . .. ...0 . . . dN0BBBB@1CCCCA; (11)Paper PCCPOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298hThis journal is © the Owner Societies 2024 Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 |  11665where the diagonal components are defined as:di ¼XNj¼1Wij : (12)The graph Laplacian is defined as:L = D�W. (13)In LPP, the following generalized eigenvalue problem needs tobe solved:XTLXy = lXTDXy, (14)where li and yi are the ith eigenvalue and eigenvector of thisproblem, respectively. They are arranged in ascending order.For a target dimension dr, the mapping matrix Y by LPP fromM to dr dimensions is obtained byY = (y1, y2,. . ., ydr). (15)Using the mapping matrix Y, the feature vector x is mapped tothe low-dimensional vector x0 asx0 = xY (16)B.2 Two-step locality preserving projections (TS-LPP). Inthe two-step locality-preserving projections (TS-LPP) method,31the LPP transformation is applied first to obtain an intermedi-ate space of dimension dm, then applied again on the resultingspace to obtain the final low-dimensional space, using thesame s value for both transformations. This method was shownto improve the results of single LPP and PCA.B.3 Optimization of the hyperparameters. The criterionused to determine appropriate values for s and dm is theCalinsky–Harabasz score (pseudo-F33), which characterizesthe ratio of the between-cluster variation to the within-clustervariation after performing clustering with an unsupervisedlearning method. The k-means clustering method was used topartition the projected data points using the implementation ofscikit-learn.57 For this method, a target number of clusters k isprovided. Here, we use k = 8. In TS-LPP. the number of clusterswas shown to have little effect on the low-dimensional space.31A grid exploration was performed to determine the mostappropriate values for dm and s. In the present implementationof the TS-LPP method, the same value for s was used for bothLPP steps.AcknowledgementsThe computations in the present work were performed onthe Numerical Materials Simulator at the National Institutefor Materials Science. This study was supported by the JSPSGrant-in-Aid for Transformative Research Areas (A) ‘Hyper-Ordered Structures Science’ (Grant No. 20H05883) and the JSPSGrant-in-Aid for Scientific Research (Grant No. 18H01143,21H01008).References1 M. Mazloom, A. Ramezanianpour and J. Brooks, Cem. Concr.Compos., 2004, 26, 347–357.2 A. J. Ikushima, T. Fujiwara and K. Saito, J. Appl. Phys., 2000,88, 1201–1213.3 J. E. Lilienfeld, Method and apparatus for controlling elec-tric currents, 1930, https://patents.google.com/patent/US1745175A/en.4 G. N. Greaves and S. Sen, Adv. Phys., 2007, 56, 1–166.5 G. Hart, J. Mineral. Soc. Am., 1927, 12, 383–395.6 G. A. Lager, J. D. Jorgensen and F. J. Rotella, J. Appl. Phys.,1982, 53, 6751–6756.7 A. Wright and M. Lehmann, J. Solid State Chem., 1981, 36,371–380.8 L. Levien and C. T. Prewitt, Am. Mineral., 1981, 66, 324–333.9 E. C. T. Chao, J. J. Fahey, J. Littler and D. J. Milton,J. Geophys. Res., 1962, 67, 419–421.10 E. Ringdalen, JOM, 2014, 67, 484–492.11 M. Kayama, H. Nagaoka and T. Niihara, Minerals, 2018, 8, 267.12 R. Car and M. Parrinello, Phys. Rev. Lett., 1985, 55, 2471–2474.13 R. Car and M. Parrinello, Phys. Rev. Lett., 1988, 60, 204–207.14 F. H. Stillinger and T. A. Weber, Phys. Rev. A: At., Mol., Opt.Phys., 1982, 25, 978–989.15 S.-H. Lee, J. Kim, S.-J. Kim, S. Kim and G.-S. Park, Phys. Rev.Lett., 2013, 110, 235502.16 K. Nishio, T. Miyazaki and H. Nakamura, Phys. Rev. Lett.,2013, 111, 155502.17 H. Tong and H. Tanaka, Phys. Rev. X, 2018, 8, 011041.18 A. P. Bartók, R. Kondor and G. Csányi, Phys. Rev. B: Condens.Matter Mater. Phys., 2013, 87, 184115.19 S. V. King, Nature, 1967, 213, 1112–1113.20 S. Le Roux and P. Jund, Comput. Mater. Sci., 2010, 49, 70–83.21 S. L. Roux and P. Jund, Comput. Mater. Sci., 2011, 50, 1217.22 Y. Hiraoka, T. Nakamura, A. Hirata, E. G. Escolar, K. Matsueand Y. Nishiura, Proc. Natl. Acad. Sci. U. S. A., 2016, 113,7035–7040.23 Y. Onodera, S. Kohara, P. S. Salmon, A. Hirata,N. Nishiyama, S. Kitani, A. Zeidler, M. Shiga, A. Masuno,H. Inoue, S. Tahara, A. Polidori, H. E. Fischer, T. Mori,S. Kojima, H. Kawaji, A. I. Kolesnikov, M. B. Stone,M. G. Tucker, M. T. McDonnell, A. C. Hannon, Y. Hiraoka,I. Obayashi, T. Nakamura, J. Akola, Y. Fujii, K. Ohara,T. Taniguchi and O. Sakata, NPG Asia Mater., 2020, 12, 1–16.24 J. Konnert, P. D’Antonio and J. Karle, J. Non-Cryst. Solids,1982, 53, 135–141.25 F. Li and J. S. Lannin, Phys. Rev. Lett., 1990, 65, 1905–1908.26 N. Lačević, F. W. Starr, T. B. Schrøder, V. N. Novikov andS. C. Glotzer, Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys.,2002, 66, 030101.27 J. D. Honeycutt and H. C. Andersen, J. Phys. Chem., 1987, 91,4950–4963.28 A. K. A. Lu and D. V. Louzguine-Luzgin, J. Chem. Phys., 2022,157, 014506.29 W. Lechner and C. Dellago, J. Chem. Phys., 2008, 129,114707.PCCP PaperOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttps://patents.google.com/patent/US1745175A/enhttps://patents.google.com/patent/US1745175A/enhttp://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298h11666 |  Phys. Chem. Chem. Phys., 2024, 26, 11657–11666 This journal is © the Owner Societies 202430 P. M. Piaggi and M. Parrinello, J. Chem. Phys., 2017, 147,114112.31 R. Tamura, M. Matsuda, J. Lin, Y. Futamura, T. Sakurai andT. Miyazaki, Phys. Rev. B, 2022, 105, 075107.32 J. Behler, J. Chem. Phys., 2011, 134, 074106.33 T. Caliński and J. Harabasz, Commun. Stn., 1974, 3, 1–27.34 CONQUEST: Linear Scaling DFT, https://ordern.github.io/,(accessed March 2024).35 D. R. Bowler, R. Choudhury, M. J. Gillan and T. Miyazaki,Phys. Status Solidi B, 2006, 243, 989–1000.36 A. Nakata, J. S. Baker, S. Y. Mujahed, J. T. L. Poulton,S. Arapan, J. Lin, Z. Raza, S. Yadav, L. Truflandier, T.Miyazaki and D. R. Bowler, J. Chem. Phys., 2020, 152,164112.37 J. P. Perdew, K. Burke and M. Ernzerhof, Phys. Rev. Lett.,1996, 77, 3865–3868.38 D. R. Hamann, Phys. Rev. B: Condens. Matter Mater. Phys.,2013, 88, 085117.39 M. van Setten, M. Giantomassi, E. Bousquet, M. Verstraete,D. Hamann, X. Gonze and G.-M. Rignanese, Comput. Phys.Commun., 2018, 226, 39–54.40 D. R. Bowler, J. S. Baker, J. T. L. Poulton, S. Y. Mujahed,J. Lin, S. Yadav, Z. Raza and T. Miyazaki, Jpn. J. Appl. Phys.,2019, 58, 100503.41 G. J. Martyna, M. L. Klein and M. Tuckerman, J. Chem. Phys.,1992, 97, 2635–2643.42 M. Parrinello and A. Rahman, J. Appl. Phys., 1981, 52,7182–7190.43 S. Plimpton, J. Comput. Phys., 1995, 117, 1–19.44 W. M. Brown, P. Wang, S. J. Plimpton and A. N. Tharrington,Comput. Phys. Commun., 2011, 182, 898–911.45 S. Munetoh, T. Motooka, K. Moriguchi and A. Shintani,Comput. Mater. Sci., 2007, 39, 334–339.46 G. J. Martyna, D. J. Tobias and M. L. Klein, J. Chem. Phys.,1994, 101, 4177–4189.47 W. Shinoda, M. Shiga and M. Mikami, Phys. Rev. B: Condens.Matter Mater. Phys., 2004, 69, 134103.48 J. Lin, R. Tamura, Y. Futamura, T. Sakurai and T. Miyazaki,Phys. Chem. Chem. Phys., 2023, 25, 17978–17986.49 M. Ester, H.-P. Kriegel, J. Sander and X. Xu, Proceedings ofthe Second International Conference on Knowledge Discov-ery and Data Mining, 1996, p. 226–231.50 J. Behler, J. Chem. Phys., 2016, 145, 170901.51 R. Tamura, J. Lin and T. Miyazaki, J. Phys. Soc. Jpn., 2019,88, 044601.52 J. Behler and M. Parrinello, Phys. Rev. Lett., 2007, 98, 146401.53 K. Pearson, London, Edinburgh Dublin Philos. Mag. J. Sci.,1901, 2, 559–572.54 H. Hotelling, Biometrika, 1936, 28, 321–377.55 X. He and P. Niyogi, Proceedings of the 16th InternationalConference on Neural Information Processing Systems,Cambridge, MA, USA, 2003, p. 153–160.56 U. von Luxburg, Stat. Comput., 2007, 17, 395–416.57 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B.Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss,V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,M. Brucher, M. Perrot and E. Duchesnay, J. Mach. Learn.Res., 2011, 12, 2825–2830.Paper PCCPOpen Access Article. Published on 28 March 2024. Downloaded on 8/9/2024 8:17:13 AM.  This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.View Article Onlinehttps://ordern.github.io/http://creativecommons.org/licenses/by/3.0/http://creativecommons.org/licenses/by/3.0/https://doi.org/10.1039/d3cp06298h