# Fileset

[Lambard_2026_J._Phys._Mater._9_021003.pdf](https://mdr.nims.go.jp/filesets/2d567b0f-7136-4cc6-864c-f32aab1444b0/download)

## Creator

[Guillaume Lambard](https://orcid.org/0000-0003-0275-4079)

## Rights

[Creative Commons BY Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/)

## Other metadata

[Beyond structure: revolutionising materials discovery via AI-driven synthesis protocol-property relationships](https://mdr.nims.go.jp/datasets/db14c5de-27d9-4758-8395-d61a6204fe42)

## Fulltext

Beyond structure: revolutionising materials discovery via AI-driven synthesis protocol-property relationships

PERSPECTIVE • OPEN ACCESS

Guillaume Lambard∗

Data-driven materials design group, Center for Basic Research on Materials (CBRM), National Institute for Materials Science (NIMS), Namiki 1-1, Tsukuba, Ibaraki 305-0044, Japan

∗ Author to whom any correspondence should be addressed. E-mail: LAMBARD.Guillaume@nims.go.jp

To cite this article: Guillaume Lambard 2026 J. Phys. Mater. 9 021003, https://doi.org/10.1088/2515-7639/ae6e72

Journal of Physics: Materials. Received 16 February 2026; revised 28 April 2026; accepted for publication 15 May 2026; published 29 May 2026.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Keywords: synthesis protocols, materials informatics, process–structure–property relationships, inverse design, closed-loop optimisation, self-driving laboratories

### Abstract

The current structure-centric paradigm in artificial intelligence-driven materials discovery, despite delivering thousands of candidate structures, is stalling at a critical barrier: the synthesisability gap. We argue that closing this gap demands a pivot to a synthesis-first paradigm in which executable synthesis protocols, not just atomic configurations, are treated as primary design variables. We outline a roadmap built on three pillars: (i) representing synthesis procedures as machine-readable protocols, (ii) deploying generative and inverse-design models to propose actionable reaction pathways and recipes, and (iii) integrating closed-loop optimisation to refine protocols against experimental realities and sustainability constraints.
Framed in terms of the causal backbone P→X→y from protocol P to structure X and properties y, this perspective sets out methodological building blocks, standards needs and self-driving laboratory integration strategies to accelerate reproducible, data-first materials discovery.

### 1. Introduction

Advanced materials underpin solutions to pressing global challenges in sustainable energy, healthcare, environmental remediation and information technologies, and progress increasingly depends on discovering materials with targeted properties faster than conventional trial-and-error allows. For decades, computational materials science has been dominated by a structure-property paradigm grounded in the premise that atomic structure dictates observable properties. High-throughput density functional theory (DFT) workflows and large databases such as the Materials Project, AFLOW and the open quantum materials database (OQMD) have delivered thousands of low-enthalpy candidates and enabled remarkable computational advances, yet many theoretically promising structures remain unrealised in the laboratory [1–4].

The field now confronts a decisive bottleneck: structure-centric approaches, whether traditional DFT screening or modern artificial intelligence (AI) generative models often trained on DFT data, routinely falter when their predictions meet experimental synthesis, exposing a critical synthesisability gap. Fewer than 2% of over 50 000 low-enthalpy phases identified in high-throughput surveys have been realised experimentally, underscoring the limitations of pipelines that neglect kinetics, precursor availability and practical constraints [5].
We contend that closing this gap demands a synthesis-first paradigm in which executable synthesis protocols, not just atomic configurations, are treated as the primary design variables.

© 2026 The Author(s). Published by IOP Publishing Ltd.

This motivates a shift toward a synthesis protocol-property framework centred on the causal backbone P→X→y, where a synthesis protocol P maps to structure, phase or morphology X and ultimately to properties y. For clarity, we define P as the complete, machine-readable specification of a material recipe: precursors, stoichiometries, sequence of operations (ADD, HEAT, COOL, FILTER, …), and quantitative conditions (temperature, time, atmosphere, pressure, pH). Whether a candidate originates from a deep generative model or an ab initio evolutionary search, thermodynamic stability alone is an insufficient metric of practical viability; hybrid pipelines that couple structure-centric tools with protocol-aware planners are needed to bridge from virtual candidates to executable recipes. Related ideas already exist in autonomous experimentation, retrosynthesis planning and closed-loop reaction optimisation [6–9]. Our claim is not that these ingredients are themselves new, but that in inorganic materials discovery they remain insufficiently unified by a protocol-first formalism in which synthesis procedures are treated as primary design variables and linked explicitly to intermediate structure through P→X→y. While the concept of protocol-centric design is universal, implementation differs between organic molecular synthesis and inorganic/solid-state materials.
This perspective targets the latter, using organic tools (e.g. the simplified molecular input line entry system (SMILES), SELF-referencIng embedded strings (SELFIES), and retrosynthesis planners) as analogies where instructive but not assuming their direct transfer to inorganic synthesis governed by phase equilibria, non-equilibrium processing and transport-limited kinetics.

To realise this vision, we pursue three specific objectives: (i) to critically assess the limitations of current structure-centric generative and predictive AI models; (ii) to formalise the synthesis protocol-property framework and survey the representations and learning algorithms that enable it; and (iii) to chart open challenges while outlining research directions toward a fully autonomous, synthesis-aware materials discovery ecosystem. In this perspective, we outline the AI methodologies, from protocol representation and generative synthesis planning to closed-loop optimisation, that will drive this synthesis-first paradigm and envision a future where AI seamlessly designs complete, experimentally realisable material recipes.

This perspective centres machine-readable synthesis protocols as first-class design objects, detailing representations, inverse-design methods and closed-loop strategies that directly support reproducible, data-first discovery in automated and human-in-the-loop labs.

Novel contributions of this perspective. Although many materials scientists recognise that synthesis governs what is experimentally accessible, we argue that the operational consequences for AI workflows remain under-specified.
This perspective therefore emphasises several concrete contributions:

- A protocol-first formalism anchored in P→X→y: we treat protocols as first-class design objects and use P→X→y to separate prediction-only workflows from mechanistic, characterisation-aware modelling.
- A systems view spanning models, automation and interoperability: we connect representations and inverse design to the practical realities of self-driving laboratories, including the need for interoperable protocol execution and provenance-aware logging across platforms.
- Inorganic-specific constraints and failure modes: we highlight modelling and data issues that are distinctive for inorganic/solid-state synthesis (multiphase outcomes, path dependence, reactor effects, and sparse or biased protocol corpora), and show how these shape representation and learning choices.

More broadly, synthesis- and process-centred representations also resemble how human experimentalists plan, troubleshoot and refine procedures, complementing structure-centric surrogates that often approximate upstream simulations. This framing differs from treating autonomous laboratories merely as execution layers or treating retrosynthesis tools merely as route generators: here, representation, forward modelling, inverse design, characterisation and execution are posed as one coupled protocol-design problem.

These objectives motivate the structure of the present perspective. Building on the challenges above, section 2 assesses limitations of structure-centric generative AI. Section 3 lays out the synthesis-centric paradigm. Section 4 surveys enabling AI and machine learning (AI/ML) methodologies, while section 5 provides application case studies. Section 6 discusses outstanding challenges, and section 7 concludes with a future outlook.
Figure 1 summarises the central shift argued for in this perspective, from structure-first screening with a downstream synthesisability gap to a synthesis protocol-centric workflow coupled to experimental feedback.

Figure 1. Paradigm shift from structure-centric to synthesis protocol-centric discovery. (A) Conventional workflow emphasises virtual structure generation and property prediction, leaving a dashed synthesisability gap to experimental realisation. (B) The proposed workflow elevates executable synthesis protocols to first-class design objects and closes the loop via autonomous experimentation.

### 2. Generative AI for materials discovery: the structure-property paradigm and its limitations

Having established the synthesisability gap as a fundamental limitation of structure-centric approaches, we now examine the current state of generative AI for materials discovery. This critical assessment reveals both the remarkable achievements and inherent constraints of structure-property models, providing the foundation for our synthesis-first alternative. While organic retrosynthesis heuristics (e.g. synthetic accessibility, synthetic complexity, and retrosynthetic accessibility scores) and planners (e.g. ASKCOS and AiZynthFinder [7, 10]) are mature for molecular synthesis, they are only partially applicable to solid-state and solvothermal materials. In this section we focus on inorganic-relevant approaches and treat organic tools as instructive analogues rather than direct solutions.

#### 2.1. Overview of generative models for materials structures

Generative AI models have emerged as powerful tools for exploring chemical space and proposing novel material structures de novo [11]. These models learn underlying patterns and distributions from existing materials data and generate new instances with potentially desirable properties. For inorganic crystals, symmetry-aware, crystallographic information file (CIF)-based and graph representations (e.g.
crystal graph convolution networks and equivariant architectures) are commonly used alongside composition-based encodings. Several architectures are widely adopted:

- Variational autoencoders (VAEs): VAEs learn a compressed, continuous latent representation of input structures (e.g. molecular graphs, crystal structures). By sampling points from this latent space and decoding them, novel structures can be generated. VAEs have been applied to design molecules, polymers, and even porous materials such as metal-organic frameworks (MOFs) [12].
- Generative adversarial networks (GANs): GANs employ a two-player game framework. A generator network creates candidate structures, while a discriminator network attempts to distinguish these generated structures from real ones in the training data. Through adversarial training, the generator learns to produce increasingly realistic and diverse structures [13].
- Diffusion models: Inspired by non-equilibrium thermodynamics, diffusion models learn to reverse a process that gradually adds noise to data until only noise remains. By starting with noise and applying the learned reverse process, highly realistic and diverse structures can be generated. These models currently achieve state-of-the-art performance for molecular generation [11].
- Autoregressive models: These models generate structures sequentially, predicting the next atom, bond, or fragment based on the previously generated parts. Recurrent neural networks and Transformer architectures operating on string representations such as SMILES or SELFIES belong in this class [11].
- Flow-based models: These learn an explicit, invertible transformation between the complex data distribution and a simple base distribution (e.g. a standard Gaussian). Generation involves sampling from the base distribution and applying the inverse transformation [11].

Beyond these archetypes, recent studies report strong performance from energy-based models, Transformer decoders, and 3D equivariant architectures, extending generative design capabilities [14, 15]. These models operate on different structural representations. For molecules and polymers, SMILES, SELFIES and graph representations are common; for inorganic crystals, CIF or graph-based abstractions are typical. Such generative approaches have successfully proposed novel candidates across drug discovery and energy materials.

#### 2.2. The Achilles’ heel: the synthesisability gap

Despite their success in generating structurally novel and potentially high-performing materials in silico, the practical utility of structure-centric generative models is limited by the synthesisability gap [16]. Experimental chemists frequently struggle to realise these computationally designed structures in the laboratory. Key contributing factors include:

- Unknown or infeasible reaction pathways: no known chemical transformation may exist that converts available starting materials into the target structure.
  Generative models often output molecules that violate established reactivity principles [17].
- Precursor availability and cost: required building blocks might be unavailable, prohibitively expensive, or unstable [18].
- Impractical reaction conditions: predicted syntheses may require extreme temperatures, pressures, or specialised equipment.
- Kinetic traps and competing reactions: models optimising for thermodynamic stability neglect kinetics; desired products can be kinetically inaccessible or obscured by side reactions [19].
- Purification and isolation challenges: even if formed, the target may be difficult to separate from complex mixtures.
- Scalability: a route that succeeds on milligram scale may fail at scale-up.
- Regulatory, safety or environmental constraints: some suggested precursors are toxic, explosive or legally restricted, barring practical implementation [20].

Thus, structure-only optimisation implicitly assumes synthesis is a solvable downstream task, whereas in reality synthetic constraints define the accessible chemical space. Even advanced surrogate models, such as energy-conserving equivariant graph neural networks [21] that provide symmetry-respecting predictions with near-DFT accuracy, remain fundamentally limited by their structure-centric foundation.

#### 2.3. Critiquing structure-centric AI

To compensate, post-hoc heuristics such as the synthetic accessibility score (SAScore) [18], synthetic complexity score (SCScore) [22], and retrosynthetic accessibility score (RAScore) [23] are often applied to filter generative outputs. These heuristics, however, are only loosely correlated with true experimental difficulty (e.g. r ≈ 0.3 for SAScore vs expert labels [17]).
Moreover, although graphics processing unit-accelerated Monte Carlo planners such as AiZynthFinder [10] can now process O(10^4) molecules per hour on a single node, explicit retrosynthesis for millions of candidates remains prohibitive at library-generation scale.

A complementary strategy is to predict synthesisability directly with supervised learning, treating it as a classification or ranking problem rather than as a downstream constraint applied post hoc. For crystalline materials, deep learning models trained on databases of known compounds have been used to estimate whether hypothetical compositions or structures are likely to be experimentally realisable, providing a fast prior that can be combined with generative design or DFT screening [24]. Such predictors are promising but must be interpreted carefully: they inherit biases in what the community has attempted and reported, and the target label (‘synthesisable’) is itself time-, platform- and effort-dependent.

Ultimately, the synthesisability gap reveals a fundamental limitation of the structure-property paradigm in de novo design: it explores theoretical chemical space without embedding the constraints that bound experimental reality.

Table 1 summarises major generative model families for structure design, highlighting their strengths and synthesisability limitations.

Table 1. Generative model families used for materials structure design together with their principal strengths and synthesisability limitations.

| Model type | Input repr. (molecular / inorganic) | Output | Strengths | Limitations (synthesisability) |
| --- | --- | --- | --- | --- |
| VAE | Graphs, SMILES / CIF, symmetry-aware graphs | Structure | Smooth latent space, property optimisation | May generate unsynthesisable structures; mode collapse |
| GAN | Latent vector, graphs / crystal graphs | Structure | Highly realistic, diverse outputs | Training instability; mode collapse; synthetic tractability unclear |
| Diffusion | Graphs, point clouds / crystal lattices | Structure | High quality and diversity | Expensive training/sampling (∼10^3 steps); no synth. guarantee |
| Autoregressive | SMILES, SELFIES / composition-sequence | Structure | Effective sequential generation | Sensitive to representation; validity issues |
| Flow | Graphs, SMILES / symmetry-aware graphs | Structure | Exact likelihood; invertible | Complex for discrete data; synth. not explicit† |

† Grammar-constrained flow models partially alleviate this limitation [25].

Therefore, while generative AI has achieved remarkable success in proposing novel structures with desirable computed properties, the persistent synthesisability gap exposes an architectural flaw in the structure-first approach. Post-hoc synthesisability filters provide only weak correlation with experimental difficulty, and computational retrosynthesis remains prohibitively expensive at scale. This systematic failure across all major generative architectures, from VAEs to diffusion models, indicates that the solution lies not in incremental improvements to structure-centric methods, but in a fundamental paradigm shift that treats synthesis protocols as primary design objects. The following section outlines this synthesis-first framework and its potential to bridge the gap between computational prediction and laboratory reality.

Additional protocol representations, including domain-specific language (DSL) and ontology formats as well as multimodal embeddings, are discussed in section 4.

### 3. The synthesis protocol–property paradigm: a necessary shift

The persistent failures of structure-centric approaches demand a fundamental paradigm shift. We propose elevating synthesis protocols to first-class design objects in a synthesis protocol–property framework that embeds experimental feasibility from the outset.
This approach directly addresses the synthesisability gap by treating the recipe, not just the product, as the primary design variable.

While one might object that structure is fundamental (atomic arrangement ultimately determines material properties), our paradigm treats structure as an indispensable intermediate outcome produced by a synthesis protocol, rather than an abstract starting point divorced from lab realities. Similarly, critics may argue that synthesis data does not exist at the necessary scale. While data scarcity poses challenges (see section 6), advances in self-driving laboratories and natural language processing (NLP)-driven literature mining are rapidly generating rich protocol-property datasets. This confluence makes the synthesis-first shift both timely and feasible.

#### 3.1. Conceptual framework

In the synthesis-centric view, a material’s structure is treated as an intermediate outcome that emerges from executing a specific synthesis protocol. Characterising this intermediate (e.g. through ex situ diffraction or in situ spectroscopy) remains scientifically vital, because the causal chain we ultimately seek to model is P→X→y, where X denotes structure, phase, or morphology and y the resulting properties.

Two operational modes are common in practice: (i) a purely predictive mode that prioritises P→y to obtain target properties efficiently, and (ii) a characterisation-aware mode that uses P→X→y to learn why a protocol yields a given outcome. Direct P→y models can succeed when the relevant intermediate structure is either weakly varying or implicitly absorbed into the training distribution, but explicit treatment of X becomes important when phase selection, morphology evolution, defect formation or multi-step transformations mediate the final property. While additional X data can in principle improve predictive performance, integrating heterogeneous and often sparse characterisation into P→y models is non-trivial; in many settings the primary benefit is mechanistic understanding and theory-building.

Rich structural data therefore act as mechanistic ground truth that fuels generalisable models. The core ML tasks then become (i) the forward mapping P→y to predict properties y from a protocol and (ii) the inverse mapping y⋆→P⋆ to design protocols P⋆ that deliver target properties y⋆. Because protocols explicitly encode precursors, sequences, temperatures, times, catalysts and post-treatments, the resulting models are automatically grounded in experimental reality. In practice a single protocol may yield multiple polymorphs or morphologies, so probabilistic forward models that capture this multimodality, together with controllers that repeatedly infer structure-sensitive proxies of X, remain an active research frontier.

Task taxonomy: optimisation vs de novo protocol design. We distinguish two tasks that are often conflated: (i) process/parameter optimisation for a known material and fixed procedural scaffold (continuous and discrete variables such as temperature, time, concentrations), and (ii) de novo protocol design, i.e. constructing the sequence of operations, reagent choices, and intermediate targets for a new material. Task (i) is well-suited to Bayesian optimisation (BO) in a constrained design space; Task (ii) is a high-dimensional, sequential decision problem that remains nascent for inorganic materials. We use this taxonomy throughout the rest of the perspective (see section 5).

Characterisation as the rate-limiting step. In practice, estimating X (phase, microstructure, defects, morphology) is often the slowest and most manual component of the loop.
Building robust P→X→y models therefore hinges on advances in high-throughput ex situ pipelines (automated Rietveld refinement, computer vision for morphology) and in situ/operando probes (synchrotron diffraction, small-angle and wide-angle x-ray scattering, inline spectroscopy, inline electron microscopy). These modalities provide time-resolved constraints that reduce ambiguity in X, enable causal attribution, and materially accelerate closed-loop optimisation.

On the intrinsic difficulty of P→X. Predicting X from P (spanning nucleation barriers, multi-step reaction pathways, polymorphic transformations, grain growth and porosity evolution, non-stoichiometry, and diffusion-limited transport) is frequently harder than the traditional structure→property task. The mapping is path-dependent and governed by non-equilibrium kinetics, multi-scale transport, and reactor/vessel effects. Practical strategies therefore require: (i) rich, time-resolved ground truth via in situ/operando probes to disambiguate pathways; (ii) hybrid, physics-guided machine learning with simulators acting as priors or constraints; (iii) multi-fidelity learning across literature, simulator outputs, and self-driving laboratory (SDL) data; and (iv) explicit uncertainty quantification to guide information gain. Validation must likewise go beyond random train/test splits and include chemically held-out systems, cross-laboratory transfer, calibration against operando or ex situ characterisation, and ablations that compare direct P→y predictors with variants that explicitly incorporate X.

#### 3.2. Advantages of the synthesis-centric approach

Within the synthesis protocol–property framework, several advantages over traditional structure-centric approaches emerge:

- Experimental executability, not guaranteed synthesisability: Protocols generated within a constrained action/reaction grammar are, in principle and subject to platform and safety constraints, executable in a laboratory setting.
  Executability does not imply the target will be synthesised; rather, it confines search to experimentally actionable procedures, improving relevance and enabling efficient optimisation over real operations [26]. In practice, higher success probability comes not from executability alone but from combining such constraints with uncertainty-aware model selection, structure-sensitive feedback, and iterative closed-loop refinement.
- Process parameters included: Properties that depend sensitively on temperature ramps, solvent, pH or ageing time can be learned because these variables are explicit inputs [27].
- Interface with automation: Protocol representations map naturally to robot instructions, enabling closed-loop SDLs [9].
- True inverse design: Instead of suggesting exotic structures and leaving synthesis an open problem, the model outputs a recipe that can be run the same day [8].
- Deeper scientific insight: Correlating process variables with performance uncovers non-equilibrium effects, defect formation and morphology control invisible to equilibrium structure-only models [28].
- Sustainability and economics: Green-chemistry metrics (E-factor, process mass intensity (PMI), energy footprint) and economic costs (precursor cost, throughput, labour, equipment amortisation) can be included directly in multi-objective optimisation, steering design toward practically deployable solutions [29].

These conceptual advantages position the synthesis-centric paradigm as a direct solution to the synthesisability gap. By embedding experimental constraints from the outset, this approach promises to bridge the persistent divide between computational prediction and laboratory reality. The following section details the AI/ML methodologies that enable this paradigm shift.

### 4. Enabling AI/ML methodologies for synthesis-driven discovery

Realising the vision of a synthesis-first paradigm requires a new class of AI/ML tools.
The emerging toolbox, capable of understanding, generating, and optimising procedural synthesis data, is already taking shape. Three key technical challenges emerge: (i) how to represent a synthesis protocol, (ii) how to predict properties from that representation, and (iii) how to invert the model to design new protocols.

#### 4.1. Representing synthesis protocols

Common synthesis protocol representations currently in the literature span four main paradigms, each reflecting distinct trade-offs between expressivity and structure. Text- and NLP-based formats treat the protocol as raw procedural prose or reaction SMILES strings, enabling transformer models pretrained on large chemical corpora to learn directly from unstructured text. Graph-based representations abstract protocols as reaction graphs, where molecules, operations, and vessels form nodes connected by temporal or chemical relationships, making them ideal for graph neural networks (GNNs). Action-sequence encodings view protocols as chronological lists of primitive operations (e.g. ADD, HEAT, FILTER), aligning naturally with reinforcement learning (RL) frameworks and robotic execution. Finally, tabular or vector representations distil well-defined design-of-experiments variables (such as temperature, time, and pH) into continuous vectors, facilitating efficient optimisation via conventional ML or BO.

Beyond these four canonical families, several specialised representation strategies are particularly relevant:

- DSL and ontology representations: Machine-executable DSLs and standards such as the chemical description language (XDL), autoprotocol, protocol activity modelling language (PAML) and standardisation in laboratory automation (SiLA) 2 provide structured grammars that map directly onto robotic hardware while maintaining human readability, while analytical information markup language (AnIML) supports standardised analytical data exchange [26, 30–33].
  Knowledge-graph approaches leveraging domain ontologies such as the Reaction Ontology (RXNO) and Chemical Entities of Biological Interest (ChEBI) likewise enable symbolic reasoning and constraint checking across large corpora [34, 35].
- Multimodal embeddings: Emerging work fuses free-text protocols with time-resolved sensor streams (spectra, images) to create rich joint embeddings that support closed-loop optimisation and anomaly detection [36].
- Inorganic-specific modelling considerations: Unlike molecular retrosynthesis, inorganic synthesis often lacks discrete reaction rules and features continuous non-stoichiometry, high-temperature processing, and diffusion-limited kinetics. Effective encodings therefore incorporate (i) thermochemical and phase-field-inspired features (activities, phase stability margins, oxygen chemical potential), (ii) transport-aware parameters (thermal ramps, dwell times, gas/flow conditions), (iii) reactor and furnace geometry (crucible materials, atmosphere zones, loading configuration), and (iv) tolerance for continuous composition spaces. Graph representations should include vessels, solids, melts, atmospheres and contact interfaces; action-sequence grammars should encode temperature/pressure profiles, grinding/milling, pelletising, soaking and controlled cooling. These additions centre solid-state realities (diffusion, sintering, volatilisation) often absent from molecular encodings.

Taken together, no single representation is universally best for inorganic synthesis. For many solid-state or thin-film optimisation loops, a hybrid of action sequences and tabular process variables is the most pragmatic starting point because it preserves ordered operations while exposing the thermochemical controls most often tuned in practice.
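
As a minimal sketch of such a hybrid encoding, the Python example below stores a protocol as an ordered list of primitive operations and flattens it into named process variables. The class names `Step` and `Protocol`, the operation vocabulary in `ALLOWED_OPS`, the parameter names and every numeric value are hypothetical and chosen purely for illustration; they are not taken from XDL, autoprotocol or any other standard mentioned in this perspective.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical action grammar: only these primitive operations are accepted.
ALLOWED_OPS = {"ADD", "MILL", "PELLETISE", "HEAT", "COOL", "FILTER"}


@dataclass
class Step:
    """One primitive operation with its quantitative conditions."""
    op: str
    params: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject operations outside the grammar so every protocol stays actionable.
        if self.op not in ALLOWED_OPS:
            raise ValueError(f"unknown operation: {self.op}")


@dataclass
class Protocol:
    """Precursors (name -> molar ratio) plus an ordered list of steps."""
    precursors: Dict[str, float]
    steps: List[Step]

    def action_sequence(self) -> List[str]:
        """Ordered operation names, e.g. as input to a sequence model."""
        return [step.op for step in self.steps]

    def tabular(self) -> Dict[str, float]:
        """Flat named variables, keyed by step index so order is recoverable."""
        row = {f"precursor:{name}": ratio for name, ratio in self.precursors.items()}
        for index, step in enumerate(self.steps):
            for name, value in step.params.items():
                row[f"{index}:{step.op}:{name}"] = value
        return row


def example_protocol() -> Protocol:
    """Illustrative solid-state recipe; all names and values are made up."""
    return Protocol(
        precursors={"oxide_A": 1.0, "carbonate_B": 1.0},
        steps=[
            Step("ADD"),
            Step("MILL", {"time_h": 2.0}),
            Step("HEAT", {"temperature_C": 900.0, "ramp_C_per_min": 5.0, "dwell_h": 12.0}),
            Step("COOL", {"rate_C_per_min": 2.0}),
        ],
    )


if __name__ == "__main__":
    protocol = example_protocol()
    print(protocol.action_sequence())
    for key, value in protocol.tabular().items():
        print(f"{key} = {value}")
```

In this sketch the same object serves both views: `action_sequence()` keeps the order of operations for sequence or RL models, while `tabular()` exposes ramps, dwell times and cooling rates as named variables that a tabular learner or a BO loop could tune. A real system would likely need units, vessel context and richer validation than this toy grammar check.
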
Graph or DSL-based formats become especially valuable when vessel context, equipment logic or cross-platform execution constraints must be preserved explicitly, whereas text-centric representations remain indispensable for mining legacy literature corpora.

4.2. Predicting properties from protocols

Once protocols are encoded, supervised models can be trained. Common choices are GNNs for graph inputs, transformers/long short-term memory networks for sequences, and gradient-boosted trees or feed-forward networks for tabular data. Model accuracy is limited less by algorithmic nuance than by the scarcity and quality of paired protocol-property datasets, an area where SDLs can contribute enormously.

Beyond model architectures, reproducibility is equally critical. Protocol-aware models should be accompanied by accessible code and data: wherever possible, authors should deposit protocol corpora, model weights, training scripts and figure-generation notebooks in open repositories such as Zenodo or institutional archives, providing digital object identifiers (DOIs) that can be cited alongside this perspective. Even when no new experimental datasets are generated, releasing synthetic benchmark corpora, configuration files and source data for figures materially lowers the barrier to reuse and supports the data-review processes now expected by journals.

Physics-informed neural networks (PINNs) and other hybrid surrogate models that embed differential-equation constraints offer a promising but challenging avenue for synthesis modelling; while they are well developed in principle, robust practical deployments for full, multi-step protocols remain difficult and comparatively scarce, with most demonstrations focusing on constrained unit operations rather than end-to-end recipes [37]. In parallel, explicit modelling of measurement noise and batch effects, for example via Gaussian-process discrepancy models, helps prevent overfitting to spurious experimental artefacts.
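The effect of an explicit noise term can be sketched with plain Gaussian-process regression. This is a deliberately simplified stand-in, not a full discrepancy or batch-effect model: it assumes a squared-exponential kernel with fixed hyperparameters, a single homoscedastic noise variance, and synthetic data (a sine curve plus Gaussian noise) invented for the example.

```python
import numpy as np


def rbf_kernel(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two 1D input arrays."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length) ** 2)


def gp_posterior_mean(x_train, y_train, x_test, noise_var):
    """Posterior mean of a zero-mean GP with observation-noise variance noise_var."""
    gram = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    alpha = np.linalg.solve(gram, y_train)
    return rbf_kernel(x_test, x_train) @ alpha


# Synthetic "measurements": a smooth response corrupted by noise (assumed data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 12)
y_obs = np.sin(x) + rng.normal(0.0, 0.3, size=x.shape)

# Near-zero noise variance: the fit is pushed towards every noisy observation.
mean_tight = gp_posterior_mean(x, y_obs, x, noise_var=1e-6)
# Noise variance matched to the assumed measurement scatter (0.3 ** 2).
mean_noisy = gp_posterior_mean(x, y_obs, x, noise_var=0.3 ** 2)

# Distance between each fit and the raw observations.
resid_tight = np.linalg.norm(y_obs - mean_tight)
resid_noisy = np.linalg.norm(y_obs - mean_noisy)
```

With the larger noise variance the posterior mean sits further from the raw observations (`resid_noisy` exceeds `resid_tight`), because part of the scatter is attributed to measurement error rather than to the underlying response. A batch-aware variant would add a per-batch offset or a separate discrepancy kernel, which is beyond this sketch.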
Equally critical is rigorous uncertainty quantification (via deep ensembles, Monte-Carlo dropout or evidential networks) and meta-learning approaches such as model-agnostic meta-learning, which enable rapid adaptation to new reaction families with only a handful of additional experiments [38, 39].

4.3. Physics-based simulators as priors and surrogates

Beyond data-driven predictors, physics-based synthesis models provide mechanistic constraints and synthetic inductive bias. These include kinetic Monte Carlo for surface reactions and nucleation, phase-field models for crystallisation, coarsening and porosity evolution, CALculation of PHAse diagrams (CALPHAD)-informed diffusion models for multi-component solids, and computational fluid dynamics for reactor-scale transport. Today, these tools are computationally intensive, difficult to parameterise for novel chemistries, and often limited in predictive power for unknown, multi-component systems. Nevertheless, they are essential: they can (i) generate synthetic multi-fidelity datasets to pre-train protocol-aware predictors, (ii) act as differentiable or policy-evaluable environment models for BO and RL, (iii) provide mechanistic priors and constraints for P→X models (e.g. linking thermal histories to microstructural evolution), and (iv) identify kinetic traps and metastable pathways. Hybrid approaches with PINNs and grey-box surrogates leverage partial differential equation structure for improved extrapolation and sample efficiency. These models should be judged not only by aggregate prediction error but also by whether they reproduce operando trajectories, transfer across chemistries or laboratories, and remain calibrated when X is only partially observed.

4.4. Hybrid integration of structural knowledge with protocol-centric models

Vast structural resources (e.g. Materials Project [2], OQMD [4]) and DFT workflows must be central to protocol-centric design, not peripheral.
They contribute: (i) Phase-diagram and thermochemical constraints: computed convex hulls, chemical potentials and phase boundaries restrict feasible target domains for P→X models; (ii) Metastable intermediates: enumerated metastable phases suggest intermediate waypoints and targets for multi-step synthesis planning; (iii) Descriptor bootstrapping: DFT-derived descriptors (formation energies, elastic moduli, redox potentials) augment P representations when target structures are known; (iv) Route screening: precursor compatibility and reaction driving forces from structure-property models prune infeasible branches in BO/RL planners; (v) Grey-box priors: surrogate X models warm-start on DFT/phase-field-consistent microstructural hypotheses that are refined by experimental data; and (vi) Generalisation: structural embeddings help transfer knowledge across chemistries by conditioning protocol models on target-structure attributes. In practice, however, hybrid integration is rarely plug-and-play: structural characterisation is often sparse, noisy, and multi-modal, and its primary value may be mechanistic understanding and hypothesis generation rather than immediate gains in purely predictive P→y accuracy. Accordingly, DFT databases and structure-derived embeddings should be treated as priors, filters or auxiliary descriptors rather than as substitutes for protocol-conditioned measurements, which remain the decisive ground truth for whether a synthesis route actually succeeds.

Table 2. AI/ML techniques for modelling and designing synthesis protocols.

| Technique | Typical input | Output | Strengths | Key challenges |
| --- | --- | --- | --- | --- |
| Transformers | Text, SMILES, actions | Property or protocol | Capture long-range context; pre-training | Need large data; hallucination |
| GNNs | Reaction graphs | Property features | Respect chemical topology | Graph design, scalability |
| Bayesian optimisation | Vector/tabular | Optimised parameters | Sample-efficient; uncertainty-aware | High-dimensional spaces; constraints; kernels for graphs/sequences |
| Reinforcement learning | Action sequences | Policy (recipe) | Handles multi-step logic | Reward design; sample inefficiency |
| Generative (VAE/GAN/Diff) | Latent vector + target property | Protocol | De-novo and conditional generation | Validity constraints; see section 2 |
| Symbolic planners | Target structure or property | Stepwise protocol | Chemically consistent; rule-based reasoning | Coverage limited by rule set; novelty constraints |
| Simulation-based surrogates | Process simulator, physics model | Property or protocol | Mechanistic fidelity; gradient access; multi-fidelity priors | Simulator fidelity; parameter identifiability; compute expense |
| Simulation-based optim. | Process simulator | Optimised protocol | In silico risk-free evaluation; gradient access | Simulator fidelity; compute expense |
| Causal ML | Protocol variables | Causal graph/effects | Mechanistic insight; extrapolation | Confounding; data requirements |
| Foundation models | Text, graphs | Embedding/protocol | Leverage massive pre-training; data-efficient fine-tuning | Compute cost; domain adaptation |

4.5. Inverse design of protocols

In the synthesis protocol–property setting, inverse design corresponds to mapping target properties (and, where relevant, target structures) to one or more candidate protocols, y⋆→P⋆, under experimental, safety and resource constraints. Table 2 should therefore be read as spanning three settings: direct P→y prediction, intermediate P→X modelling, and iterative or joint P→X→y workflows. Several complementary algorithmic families have emerged for this task:

• BO excels when the search space (continuous or discrete) is moderate in dimension and each experiment is costly. Extensions exist for multi-objective, multi-fidelity and constrained settings. Sequential model-based optimisation variants that allocate resources adaptively (such as Hyperband and BO with Hyperband) or rely on Thompson sampling are popular choices for high-throughput SDLs [40, 41]. Recent latent-space and trust-region formulations extend BO sampling efficiency to hundreds of design variables [42].

• RL treats synthesis planning as a sequential decision problem, ideal for multi-step routes and de novo protocol construction. Reward design and sample inefficiency remain open challenges; model-based RL algorithms (e.g. model-based policy optimisation (MBPO), Dreamer) mitigate sample inefficiency by learning differentiable environment models [43].

• Generative models (see section 2) can be conditioned on target properties to propose entirely new protocols. Ensuring chemical validity, experimental feasibility and safety of generated sequences is the foremost research hurdle.

• Invertible and probabilistic inverse models: conditional invertible neural networks and related flow-based approaches provide amortised approximate inversion with uncertainty estimates, complementing optimisation-based strategies when repeated inverse queries are required [44].

Figure 2. Integrated ecosystem linking heterogeneous data sources, protocol representations, AI/ML core and automated execution/characterisation in a closed-loop discovery cycle. Characterisation streams feed both P→X and X→y models, enabling time-resolved updating during closed-loop operation.

Rule-based symbolic retrosynthesis planners such as ASKCOS and AiZynthFinder provide chemistry-aware search primitives that can be combined with BO or RL for hybrid planning [7, 10]. When differentiable process simulators are available, gradient-based optimisation of continuous protocol parameters is also feasible and has been demonstrated for photochemical flow reactors [45]. For real-time robotic execution, optimisation algorithms must interface with job-shop scheduling or dead-time-aware planners and incorporate failure-recovery logic to maintain autonomous operation [46].

As a concrete illustration, consider inverse design of a solid electrolyte synthesis protocol subject to practical constraints on furnace temperature, atmosphere and available unit operations. A protocol-aware optimiser can search over both composition and processing schedules to maximise ionic conductivity while simultaneously minimising PMI and energy footprint, yielding Pareto fronts that trade off performance and sustainability. The algorithmic families summarised in table 2 provide complementary routes to solving such constrained, multi-objective y⋆→P⋆ design problems.

Figure 2 provides a high-level systems view of how these methodological components integrate into a cohesive synthesis-driven discovery workflow.

The methodological landscape surveyed in this section demonstrates that the AI/ML tools necessary for synthesis-first materials discovery are rapidly maturing.
From protocol representations that capture the full complexity of inorganic synthesis to physics-informed models that bridge data-driven learning with mechanistic understanding, the technical infrastructure for the paradigm shift is falling into place. The convergence of these capabilities, coupled with the hybrid integration strategies that leverage decades of structure-centric knowledge, positions the field to move beyond the synthesisability gap toward truly predictive, experimentally grounded materials design. The following section examines how these tools are already delivering practical impact across diverse materials domains.

5. From vision to reality: applications and early indicators

The synthesis-centric paradigm is beginning to demonstrate practical impact across multiple materials domains. This section examines both early proof-of-concept demonstrations and mature applications.

5.1. Early proof-of-concept demonstrations

Recent works provide compelling early indicators of protocol-aware AI's transformative potential. Gómez-Bombarelli et al [12] used VAEs to generate multi-step polymer syntheses (organic analogue), while Tadanki et al [47] introduced SYNFORMER, a Transformer that outputs full reaction sequences. More directly relevant to inorganic materials, SDL platforms such as MINERVA seamlessly link protocol generation with robotic execution [9], and autonomous flow platforms have demonstrated multi-step quantum-dot synthesis in which protocol variables are repeatedly updated using in-line spectroscopic measurements that act as proxies for composition and particle state [48]. These successes, though nascent, confirm that embedding synthesis constraints can drastically improve experimental relevance compared to structure-only approaches.

5.2. Mature applications

Below, we explicitly label application examples as (po) process/parameter optimisation or (dn) de novo protocol design.

5.2.1. Energy storage and conversion

Batteries (po-dn): Closed-loop optimisation of fast-charging protocols with ML improves performance and mitigates degradation in Li-ion cells [49]. In continuous-flow settings, combining BO with mechanistic surrogate models reduces experiments and navigates multi-objective trade-offs [27]. Here, the protocol variables themselves are the design space, while electrochemical response and process-state measurements provide the feedback needed to iteratively refine the search; this is often a direct P→y setting, but can be enriched toward P→X→y when degradation state or microstructural proxies are explicitly modelled.

Photovoltaics (po): Automated spin-coating coupled to BO has been used to co-optimise precursor composition and processing conditions for perovskite solar cells, including antisolvent handling, spin speed and annealing temperature [50]. This example makes the P→X→y logic explicit: the protocol governs crystallisation and film formation, time-resolved fluorescence and related film-quality measurements provide structure-sensitive proxies of X, and power-conversion efficiency supplies the downstream property target.

5.2.2. Functional electronic materials

Semiconductors and quantum dots (po): Machine-learning-assisted real-time feedback control has been used to steer InAs/GaAs quantum-dot growth by coupling growth conditions to in situ reflection high-energy electron diffraction videos, which act as structure-sensitive observations of the evolving surface and quantum-dot density [51]. This is precisely the regime in which explicit treatment of X is integral rather than optional: the controller does not merely learn P→y, but adjusts the protocol in response to measured proxies of intermediate structural state before the final material specification is achieved.

5.2.3. Catalytic and surface-active materials

Catalysis (po): Active learning coupled to high-throughput experimentation accelerates discovery and optimisation of electrocatalysts for CO2 reduction and hydrogen evolution [52]. Interpretable ML (e.g. Shapley additive explanations, SHAP) elucidates process-structure-property relationships to guide catalyst optimisation [53]. These examples highlight how protocol-aware approaches can reveal process-structure-performance correlations invisible to equilibrium structure-only models, especially when paired with automated electrochemical workstations and inline product analysis.

5.2.4. Advanced functional materials

Magnetic materials (po): While some examples remain structure-centric (ML-guided DFT screening for rare-earth-free magnet candidates [54]), AI methods are beginning to accelerate candidate identification in 2D magnetic materials where processing conditions critically affect magnetic ordering [55]. Protocol-centric approaches that couple processing schedules with magnetometry and transport measurements offer a natural path to closed-loop optimisation in this space.

MOFs (po, emerging dn): Closed-loop and BO-driven strategies demonstrated in related solvothermal and flow systems provide a template for MOF crystallisation optimisation [45]. MOFs represent an ideal testbed for synthesis-centric approaches because crystallisation outcomes depend sensitively on temperature ramps, pH evolution, and nucleation kinetics, all of which can be monitored via inline scattering or spectroscopy.

Biomaterials and drug delivery (po): The MINERVA SDL automates synthesis and inline characterisation of nanomaterials for delivery applications [9], while ML-guided photoinduced electron/energy transfer-reversible addition-fragmentation chain-transfer polymerisation in a robotic workstation rapidly generates polymer libraries for bio-interface screening [56]. Though focusing on organic polymers, these examples demonstrate the power of protocol-centric automation, where robotic execution logs, inline assays and high-throughput screening data jointly fuel protocol optimisation.

5.3. Emerging patterns and lessons learned

Several key patterns emerge from these applications: (i) Flow and continuous processing environments are particularly amenable to protocol-centric optimisation because they enable real-time parameter adjustment; (ii) Multi-objective optimisation (performance, sustainability, cost) is naturally handled when economic and environmental factors are embedded in protocol representations; (iii) Closed-loop operation with inline characterisation provides the rapid feedback necessary for effective learning; (iv) Interpretable models that reveal process-structure-property relationships offer scientific insights beyond pure optimisation.

The most successful examples share a common architecture: protocol representations that capture the essential physics, predictive models trained on high-quality experimental data, and optimisation algorithms that can navigate multi-dimensional parameter spaces efficiently. Critically, the highest-impact applications focus on problems where processing conditions strongly influence final properties, which is precisely where structure-only approaches fail.

These case studies demonstrate that protocol-aware AI is transitioning from proof-of-concept to practical impact across diverse materials domains. The synthesis-centric paradigm is not merely a theoretical alternative but a demonstrated pathway to bridging the synthesisability gap and accelerating materials discovery.

6. Challenges and future directions

Adoption of the synthesis protocol-property paradigm faces several non-trivial hurdles, which we categorise by priority and outline below. Table 3 summarises challenges and prospective research directions.

6.1. Critical near-term challenges

Data scarcity, quality and accessibility.
This represents the most immediate barrier to widespread adoption. Existing literature is dominated by unstructured prose, publication bias toward positive results, missing metadata, and proprietary restrictions. Furthermore, data pulled from text-mined literature are heterogeneous and biased toward successful experiments, whereas self-driving laboratories yield highly structured but domain-specific datasets. We recognise that, in contrast to million-entry structure databases such as Materials Project [2] or OQMD [4], protocol-property corpora remain smaller and noisier, but they are expanding rapidly via automated NLP extraction and SDL campaigns. Robust modelling under these conditions requires more than simple data accumulation: it demands deliberate capture of failed, null and low-yield experiments, chemistry-aware reweighting or stratified sampling, and evaluation protocols that distinguish interpolation within one laboratory from genuine transfer across platforms. The value of failed experiments for materials ML has already been demonstrated in adjacent settings, and protocol-centric discovery will benefit from adopting the same lesson early [57].

Why structure-centric ML has dominated. The community's historical fixation on structure is not accidental: crystal structures are comparatively well standardised; structure-derived labels (energies, band gaps, elastic constants) can be generated at scale with mature simulation workflows; and many property predictors are naturally formulated as structure→property mappings. By contrast, protocol data are heterogeneous, platform-dependent and often underspecified, making protocol-centred AI both harder and more brittle. Recognising these structural reasons is important for designing realistic roadmaps that prioritise standardisation, provenance and robust execution.

Provenance and event-driven architectures.
Protocol interoperability across laboratories requires more than shared ontologies: it requires reliable capture of execution provenance (what actually happened, with what deviations), and durable links between synthesis events and downstream characterisation. Event-sourced and streaming architectures adapted from distributed systems have been proposed for materials provenance and experiment control (e.g. ESAMP and MDML), but remain complex to implement and have not yet seen widespread community adoption [58, 59].

Heterogeneity and multi-fidelity learning. Text-mined protocols are noisy and incomplete, whereas SDL logs are structured but platform-specific. We advocate hierarchical Bayesian or Gaussian-process multi-fidelity modelling to fuse literature-scale low-fidelity data with high-fidelity SDL measurements, coupled to explicit measurement-error models and batch-effect correction. Standardised protocol normalisation (units, timing semantics, vessel context) and schema versioning are essential to prevent dataset drift and enable reproducible training. Ambiguous NLP parses benefit from human-in-the-loop curation and active error-correction workflows. Primary challenge: Achieving robust transfer across domains, from narrow SDL regimes to literature-scale chemistry, requires principled domain adaptation and transfer learning, with careful handling of covariate and label shifts.

Table 3. Summary of challenges and prospective research directions.

| Challenge area | Specific issues | Proposed directions |
| --- | --- | --- |
| Data availability | Unstructured text; few negative results | Findable, accessible, interoperable, and reusable (FAIR) protocols; NLP extraction; SDL data generation |
| Representation | No universal format | Community ontologies; conversion tools; benchmarking |
| Generalisability | Domain shift, black-box models | Physics-informed nets; uncertainty quantification; interpretable ML |
| Automation integration | Hardware/software incompatibility | Modular lab OS; human-in-the-loop; fault recovery |
| Data fusion & transferability | SDL narrow vs literature broad; domain shift | Multi-fidelity GP/Bayes; transfer learning; domain adaptation; meta-learning |
| Computational cost | Large models; expensive BO loops | Surrogate models; multi-fidelity; efficient architectures |

Standardisation of protocol representation. The diversity of text snippets, SMILES strings, reaction graphs and action sequences hampers model interoperability. Consensus ontologies and parsers capable of round-tripping between formats will be crucial to avoid siloed datasets [60].

6.2. Medium-term technical challenges

Model generalisability and interpretability. Deep models trained on narrow chemical domains often fail on unseen reactions. Incorporating physical priors, providing uncertainty quantification and employing interpretable architectures (e.g. latent-space attention visualisation, SHAP) will build trust and insight [53].

Integration with automated experimentation. SDLs promise high-throughput, reproducible data, yet real-world robotic platforms still incur nontrivial failure rates (10%–30%), require human-in-the-loop oversight, and often lack chemist-friendly interfaces. In the medium term we therefore envision human–AI collaboration as the dominant operating mode: AI assists chemists with planning, monitoring, and failure recovery rather than operating fully lights-out. Beyond algorithmic integration, there is an executability gap: mapping abstract protocols to reliable robotic execution under hardware variability (dispense precision, dead volumes, thermal lags, calibration drift). For inorganic workflows, solids handling (grinding/milling, weighing powders), pellet pressing, high-temperature furnace loading/unloading, atmosphere control and thermal profile verification remain difficult to automate robustly compared to liquid handling.
Practical adoption will require job-shop scheduling with dead-time awareness, automatic calibration and verification routines, and fail-safe recovery policies; these engineering controls should co-evolve with protocol-aware AI.

Scalability and computational cost. Training large generative models and running BO across high-dimensional spaces demand significant compute. Protocol-centric models, which often feature multimodal embeddings, simulator calls, or experiment-in-the-loop optimisation, can incur higher training and deployment costs than structure-only VAEs or diffusion networks; however, multi-fidelity, surrogate-model, and lightweight graph-based strategies can mitigate these burdens. The comparison is also subtle: structure-first models may be cheaper per inference, but protocol-aware systems can lower end-to-end discovery cost if they rapidly avoid uninformative or failed experiments. For this reason, the relevant metric is not model floating point operations in isolation but scientific yield per unit of combined laboratory and compute budget. We further note that scale-up challenges (heat/mass transfer, economic amortisation of equipment and reagents) must be considered when evaluating protocol viability.

Benchmarking and standards. Objective comparison of new algorithms demands curated benchmark suites and clear evaluation protocols. Although reaction-informatics and autonomous-experimentation communities have begun to assemble shared datasets and benchmark tasks, coverage across inorganic synthesis and protocol-aware inverse design remains sparse. The more immediate priority is therefore to define benchmark tasks, transfer settings, calibration metrics and reporting conventions that reflect real experimental constraints rather than purely computational scores.

6.3. Future research directions

The challenges outlined above point to several high-priority research directions that will determine the success of the synthesis-centric paradigm: How best to encode multi-step, multi-phase protocols for universal learning? Can generative AI be constrained to obey green chemistry metrics during protocol generation? How do we merge heterogeneous data streams (text, graphs, SDL logs) into unified training corpora that preserve the strengths of each modality? What active-learning strategies can navigate highly non-convex, constrained synthesis spaces most effectively? And perhaps most critically, how can we develop transferable models that generalise across chemical domains while maintaining the mechanistic insights that make protocol-centric approaches scientifically valuable?

Can there be a foundation model for protocol space? Large-capacity models spanning broad regions of molecular and materials structure space naturally motivate analogous questions for synthesis [21]. We consider protocol-space foundation models plausible, but harder to universalise than structure models because protocols depend on platform-specific action vocabularies, incomplete provenance, temporal control logic and the coupling of textual, sensor and hardware data. In our view, the most realistic route is not a single monolithic model trained on 'all protocols', but interoperable foundation models grounded in execution logs, characterisation streams and explicit ontologies that permit transfer across platforms while preserving uncertainty about out-of-distribution actions [26, 58, 59].

Can synthesis be simulated from 'structure-first' models? We view protocol-first and structure-first approaches as complementary. It is plausible that future multi-scale simulation, enabled by major advances in algorithms, parameterisation and compute, could capture aspects of nucleation, growth and processing with increasing fidelity.
However, present-day atomistic and continuum tools rarely provide end-to-end predictive control over synthesis outcomes for complex inorganic systems. Protocol-centred modelling therefore offers a pragmatic route to progress in the near and medium term, while leaving open the long-term possibility of deeper first-principles synthesis simulation.

Concrete community to-do list. We highlight three near-term priorities for standards and benchmarking: (i) convergence toward interoperable protocol ontologies and grammars (e.g. XDL, PAML, AnIML) with robust round-tripping between text, graph and action-sequence formats; (ii) SDL logging standards that capture fault-tolerant execution traces, including failures and recovery actions, to support reliable P→X→y model training; and (iii) open, community-maintained benchmark suites for protocol-aware modelling and inverse design that include negative and null results, not only optimised successes.

Addressing these challenges will require coordinated efforts across the materials science, machine learning, and automation communities. Success will depend not only on algorithmic advances but also on community-wide adoption of data standards, collaborative benchmark development, and sustained investment in the infrastructure necessary to support synthesis-aware discovery at scale.

7. Conclusion: toward a synthesis-first future

The prevailing structure-property paradigm has delivered profound scientific insights and enabled remarkable computational advances, but it falters when predictions confront experimental reality. The persistent synthesisability gap, where fewer than 2% of computationally predicted phases are ever realised, represents not merely a technical hurdle but a fundamental architectural limitation of structure-centric approaches.
By elevating synthesis protocols to first-class citizens in modelling and inverse design, the synthesis-first paradigm embeds feasibility constraints from the outset, offering a direct pathway to bridge this divide.

The evidence presented in sections 2–5 demonstrates that this paradigm shift is both necessary and achievable. Early successes, from protocol-optimised battery electrolytes to self-driving quantum dot synthesis, confirm that embedding experimental constraints dramatically improves hit rates compared to structure-only approaches. The methodological infrastructure is rapidly maturing: protocol representations that capture inorganic synthesis complexity, physics-informed models that bridge data-driven learning with mechanistic understanding, and hybrid integration strategies that leverage decades of structure-centric knowledge. These advances collectively point toward an accelerated, more reliable route to functional materials.

A vision for 2035. We envision a materials discovery ecosystem where synthesis protocols are the primary design variables, increasingly integrated with autonomous experimentation and real-time characterisation. In this future, materials scientists design complete, executable recipes for target properties more routinely than today, with AI systems that not only predict thermodynamic stability but also suggest feasible routes that account for kinetics, sustainability and cost. Self-driving laboratories operate as extensions of human creativity, exploring protocol spaces guided by interpretable models that reveal process-structure-property relationships. We expect the synthesisability gap to narrow substantially as these elements mature and interoperate.

A call to the community. Realising this vision requires coordinated action across the materials science, machine learning, and automation communities.
In particular, progress will be limited if advances remain purely algorithmic: robust, scalable experimentation platforms and interoperable protocol execution are equally central. We therefore call for: (i) sustained investment in reliable, high-throughput SDL infrastructure and fault-tolerant operation; (ii) interoperable protocol representations, execution layers and provenance/logging standards that enable workflows to transfer across labs; (iii) shared protocol-property databases and standardised representations; (iv) community benchmarks that prioritise experimental validation over computational metrics; (v) integration of synthesis-aware curricula in materials science education; (vi) collaborative development of open-source tools that democratise access to synthesis-centric AI; and (vii) policy frameworks that incentivise data sharing while protecting intellectual property. The transition from structure-centric to synthesis-first materials discovery represents not just a technological shift but a cultural transformation, one that will require the collective commitment of our community to achieve.

By embracing protocols as design objects and embedding experimental reality into our computational frameworks, we can transform materials science from a discipline of discovery to one of design, one executable recipe at a time.

Data availability statement

No new datasets, software, or code were generated or analysed in this perspective. All information is contained within the article and its references.
If future code or data deposits are created (e.g., figure source files or synthetic benchmark corpora), the corresponding repository links and DOIs will be provided in the final submission.

Conflicts of interest

The author declares no conflicts of interest.

Author contribution

Guillaume Lambard (ORCID: 0000-0003-0275-4079)

Conceptualization (equal), Data curation (equal), Formal analysis (equal), Investigation (equal), Methodology (equal), Project administration (equal), Resources (equal), Software (equal), Supervision (equal), Validation (equal), Visualization (equal), Writing – original draft (equal), Writing – review & editing (equal)