# Fileset

[D5DD00437C.pdf](https://mdr.nims.go.jp/filesets/b12564d8-1f99-4c10-aaf7-3d76170307dd/download)

## Creator

Sitanan Sartyoungkul, Balasubramaniyan Sakthivel, [Pavel Sidorov](https://orcid.org/0000-0001-6462-702X), [Yuuya Nagata](https://orcid.org/0000-0001-5926-5845)

## Rights



## Other metadata

[Automated synthesis and fragment descriptor-based machine learning for retention time prediction in supercritical fluid chromatography](https://mdr.nims.go.jp/datasets/ed143ff5-a867-415f-8b9a-3351fcf6a62b)

## Fulltext

Automated synthesis and fragment descriptor-based machine learning for retention time prediction in supercritical fluid chromatography

Sitanan Sartyoungkul (a, b), Balasubramaniyan Sakthivel (a), Pavel Sidorov* (a) and Yuuya Nagata* (a, b, c)

- (a) Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido 001-0021, Japan
- (b) JST, ERATO Maeda Artificial Intelligence in Chemical Reaction Design and Discovery Project, Sapporo, Hokkaido 060-0810, Japan
- (c) Autonomous Polymer Design and Discovery Group, Research Center for Macromolecules and Biomaterials, National Institute for Materials Science (NIMS), Tsukuba, Ibaraki 305-0047, Japan

Digital Discovery, Paper. Open Access Article. Published on 26 November 2025. Downloaded on 12/24/2025 5:03:12 AM. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Received 29th September 2025. Accepted 24th November 2025. DOI: 10.1039/d5dd00437c. rsc.li/digitaldiscovery. © 2025 The Author(s). Published by the Royal Society of Chemistry.

The integration of automated synthesis and machine learning (ML) is transforming analytical chemistry by enabling data-driven approaches to method development. Chromatographic column selection, a critical yet time-consuming step in separation science, stands to benefit substantially from such advances. Here, we report a workflow that combines automated synthesis of a structurally diverse amide library with fragment descriptor-based ML for retention time prediction in supercritical fluid chromatography (SFC). Retention data were systematically acquired on the recently developed DCpak® PBT column, providing one of the first structured datasets for this stationary phase. Benchmarking revealed that fragment-count descriptors (ChyLine and CircuS) substantially outperformed conventional molecular fingerprints, delivering higher predictive accuracy and more interpretable relationships between substructures and retention behavior. External validation underscored the role of chemical space coverage, while visualization techniques such as ColorAtom analysis offered mechanistic insight into model decisions. By uniting automated synthesis with chemoinformatics-driven ML, this study demonstrates a scalable approach to generating high-quality training data and predictive models for chromatography. Beyond retention prediction, the framework exemplifies how data-centric strategies can accelerate column characterization, reduce reliance on trial-and-error experimentation, and advance the development of autonomous, high-throughput analytical workflows.

### Introduction

Chromatography is an indispensable analytical technique for the separation and analysis of components within complex mixtures, with widespread applications in pharmaceuticals [1], food science [2], and environmental monitoring [3]. Among the various factors influencing the efficiency and success of chromatographic separations, column selection is paramount. Consequently, the characterization of both new and existing columns is a crucial step in optimizing separation conditions. High-throughput evaluation methods facilitate the rapid and efficient screening of numerous columns, significantly accelerating analytical workflows.

In recent years, artificial intelligence (AI) and machine learning (ML) have attracted considerable attention for their predictive capabilities across various scientific disciplines, including analytical chemistry [4–6]. In liquid chromatography (LC), AI and ML have emerged as powerful tools for retention time prediction, enabling faster, more accurate, and more efficient chromatographic method development.
Furthermore, supercritical fluid chromatography (SFC) has gained increasing attention due to its ability to provide even faster analyses, and its adoption has been expanding rapidly [7,8].

Despite these advancements, the adoption of newly developed chromatography columns remains challenging for analytical chemists, as their separation characteristics are often unknown. Consequently, trial-and-error experimentation with unfamiliar columns can be impractical and time-consuming. To address this issue, we propose a machine learning model capable of predicting retention times based on molecular structures, thereby providing analytical chemists with valuable insights into the separation characteristics of new columns and facilitating their selection and use.

In this study, we employed an automated synthesis robot to rapidly generate a diverse set of amide compounds with varying molecular structures. Retention times were measured using an SFC system, and a machine learning model was developed to predict retention times based on molecular structures. Furthermore, we explored the relationship between molecular substructures and retention times through visualization, which is also discussed in this study.

Fig. 1 Automated synthesis of amides 1a–8h. Carboxylic acids were dissolved in THF. Amines, DMAP, and EDC-HCl were dissolved in DCM. For SFC measurements, sample solutions were diluted with a heptane/2-propanol (50/50) mixture in 2 mL vials.

### Materials and methods

#### Experimental setup

To construct a library of amide compounds with diverse molecular structures, we obtained a comprehensive list of commercially available reagents from Tokyo Chemical Industry Co., Ltd (TCIDATA, no. 43, 202007). For both carboxylic acid derivatives and amine derivatives, we calculated Morgan fingerprints and selected eight structurally diverse compounds from each category based on Tanimoto similarity coefficients, ensuring minimal structural redundancy. The resulting amide library consisted mainly of aromatic amides. This composition reflects the structural bias present in commercially available amines and carboxylic acids, which tend to include a large proportion of aromatic derivatives. Such a bias likely originates from the pharmaceutical importance of aromatic amides: compounds such as aniracetam, agomelatine, and benorilate are well-known marketed drugs. Aromatic amides therefore play a significant role in screening libraries used in drug discovery. Nevertheless, the dataset also contains aliphatic amides such as 6c and amides with ester functionalities such as 8c, ensuring structural diversity for reliable model development.

Subsequently, the automated synthesis of the amide compounds was carried out through condensation reactions between the selected amines and carboxylic acids (Fig. 1 and Table 1). Tetrahydrofuran (THF) solutions of the eight selected carboxylic acid derivatives and dichloromethane (DCM) solutions of the eight selected amines were prepared. A dichloromethane solution of 4-dimethylaminopyridine (DMAP) and 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC-HCl) was then added, and the mixtures were stirred at 40 °C for 8 hours to synthesize the amide compounds. After the reaction was complete, 0.1 mol L⁻¹ aqueous HCl was added, and the mixture was shaken.
The organic layer was separated using a phase separation filter, collected, and diluted with a heptane/2-propanol (50/50) mixture to prepare the chromatography sample solutions. For the synthesis of compound 7b, a catalytic amount of hydroxybenzotriazole (HOBt) was additionally employed under otherwise identical conditions.

Chromatographic analysis was carried out on a Daicel DCpak® PBT column [9] (3 µm, 4.6 mm i.d. × 100 mm, fully porous particles) with supercritical CO₂ and 2-propanol (90:10, v/v) as the mobile phase at a flow rate of 2.0 mL min⁻¹. The column temperature was maintained at 40 °C. Samples (1 mg mL⁻¹ in n-hexane/2-propanol) were injected at a volume of 5 µL. Detection was performed using a two-dimensional photodiode array detector, and one-dimensional chromatograms were obtained at 220.0 nm. For samples that eluted at retention times close to that of the sample solvent, chromatograms were compared with those of previously measured samples to identify solvent-derived peaks, and the analyte retention times were determined accordingly. When the reaction did not proceed to completion, chromatograms of the starting materials were measured, and the newly appearing peak that was not derived from the starting materials was identified as the amide product.

Here, we employed the DCpak® PBT column, whose stationary phase is silica gel modified with polybutylene terephthalate (PBT). This column was developed relatively recently, and its use remains limited. The retention times of the 64 synthesized amide compounds are summarized in Table 1.

In general, compounds containing aromatic rings tended to exhibit strong retention, whereas those with alkyl chains showed shorter retention times. However, interpreting the column characteristics intuitively based solely on this retention time table is challenging.
Therefore, based on these results, we attempted to develop a machine learning model to predict retention times from molecular structures.

Table 1 Retention time tR (s) of the 64 amides (1a–8h). Column: DCpak® PBT; eluent: sCO₂/2-PrOH = 90/10; flow rate: 2.0 mL min⁻¹; column temperature: 40 °C; sample concentration: 1 mg mL⁻¹ in heptane/2-PrOH (50/50) mixture; injection volume: 5 µL; detection: absorption at 220 nm (footnote a).

| | | | | | | | |
|---|---|---|---|---|---|---|---|
| 148.0 | 97.8 | 130.4 | 201.6 | 149.2 | 81.6 | 174.8 | 88.0 |
| 132.6 | 105.6 | 131.8 | 240.6 | 150.6 | 77.0 | 180.8 | 86.6 |
| 76.0 | 61.8 | 66.2 | 103.0 | 78.8 | 56.4 | 88.0 | 59.6 |
| 220.2 | 133.4 | 179.4 | 315.6 | 221.1 | 102.8 | 224.4 | 107.8 |
| 132.2 | 104.8 | 130.4 | 228.2 | 149.4 | 98.0 | 168.6 | 110.2 |
| 381.0 | 238.2 | 300.0 | 654.4 | 408.8 | 158.6 | 488.6 | 183.8 |
| 504.4 | 262.0 | 371.8 | 719.8 | 503.2 | 179.2 | 534.0 | 193.2 |
| 161.0 | 107.6 | 168.8 | 260.4 | 185.6 | 80.6 | 182.4 | 87.6 |

Footnote a: Retention time tR (s), 220 nm, t0 ~ 39.8 s.

#### Computational details

Experimental data (chromatograms) were processed using in-house Python scripts that automatically extract retention times through peak detection and integration. The code and intermediate results are available in the SI.

The ML model for the prediction of retention time was built following best practices in QSPR modelling [10]. In this work, we chose structural descriptors to represent the molecules, as chemical structure is the most relevant information in our dataset.
Widely used molecular fingerprints (FP), binary vectors indicating the absence or presence of certain structural features, were selected for their simplicity [11]. We used Morgan FP [12] (capturing circular substructures), RDKit FP [13] (circular, linear, and branched substructures), AtomPairs [14] (pairs of atoms with the topological distance between them), Torsion [15] (substructures consisting of 4 connected atoms with torsion angles) and Avalon [16] (various drug-likeness features). The binary nature of the FP, however, limits their expressiveness and may lead to lower model performance. To circumvent this, we also used fragment features that account not only for the presence of a certain substructure but also count its occurrences in each molecule, enriching the information content of the descriptor vector. Two types of fragment descriptors were used: CircuS (Circular Substructures) to account for circular fragments, and ChyLine (Chython Linear) for linear substructures. Both fragment descriptors were calculated using the DOPtools library (ver. 1.2) [17]; all fingerprints were calculated using RDKit (ver. 2024.9.6). Each descriptor type generates a number of features for the dataset: for fingerprints, the length of the feature vector was set to 1024; for fragment counts, the number varies depending on the fragment topology and size. The calculated descriptor matrices for each setting are available in the SI.

The best descriptor type was selected in a benchmarking study performed with the DOPtools library, in which the following parameters were optimized: (1) descriptor space: only one descriptor type was used at a time by each model; (2) ML algorithm: Support Vector Machines (SVM) [18], Random Forest (RF) [19] and XGBoost (XGB) [20] were tested as regression models; (3) ML hyperparameters, depending on the algorithm. The models were scored by the prediction results of a repeated 5-fold cross-validation (CV, k = 5).
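
The scoring protocol described above (repeated 5-fold cross-validation, scored with RMSE and R²) can be sketched in plain Python. This is only an illustrative sketch, not the DOPtools implementation used in the study: the one-variable least-squares line is a stand-in for SVM, RF or XGBoost, the toy data are invented, and all function names (`rmse`, `r2`, `fit_line`, `repeated_kfold_scores`) are hypothetical. Pooling the out-of-fold predictions of each repeat into one score is an assumption about how a repeat is scored.

```python
import math
import random
from statistics import mean


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(y_obs, y_pred)))


def r2(y_obs, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    y_bar = mean(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - y_bar) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares line y = a*x + b (toy stand-in for the real regressor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return a, y_bar - a * x_bar


def repeated_kfold_scores(xs, ys, k=5, repeats=5, seed=0):
    """Return one (RMSE, R2) pair per repeat, from out-of-fold predictions."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)                      # new random split per repeat
        folds = [order[i::k] for i in range(k)]
        y_pred = [0.0] * n
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                y_pred[i] = a * xs[i] + b       # predict only unseen points
        scores.append((rmse(ys, y_pred), r2(ys, y_pred)))
    return scores


# Invented toy data: x = number of aromatic rings, y = a noisy ln k-like value.
noise = random.Random(42)
xs = [1 + i % 4 for i in range(40)]
ys = [0.8 * x - 0.5 + noise.gauss(0.0, 0.1) for x in xs]
for err, score in repeated_kfold_scores(xs, ys):
    print(f"RMSE = {err:.3f}, R2 = {score:.3f}")
```

In practice the same loop is usually delegated to a library; scikit-learn, for example, provides `RepeatedKFold` in `sklearn.model_selection` for generating the repeated splits.
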
The determination coefficient (R²) and root mean squared error (RMSE) are used to quantify the model's quality:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{\mathrm{obs},i} - y_{\mathrm{pred},i}\right)^{2}}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_{\mathrm{obs},i} - y_{\mathrm{pred},i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{\mathrm{obs},i} - \bar{y}_{\mathrm{obs}}\right)^{2}}$$

where $N$ is the number of points in the set, $y_{\mathrm{obs},i}$ is the experimentally observed value of the ith data point, $y_{\mathrm{pred},i}$ is the predicted value of the ith data point, and $\bar{y}_{\mathrm{obs}}$ is the average observed value across the set.

The following Python libraries were used for data processing and calculations: Chython (ver. 1.78) [21], RDKit (ver. 2024.9.6), DOPtools (ver. 1.2), Scikit-learn (ver. 1.5) [22], Optuna (ver. 3.6) [23]. Other libraries were installed as dependencies at their latest available versions.

### Results and discussion

#### Model benchmark

The model for the prediction of retention time was based on the full set of data presented in Table 1. Initially, we modelled the retention time directly, and a benchmark study on molecular descriptors was performed together with hyperparameter optimization, so that every descriptor type achieved its best possible predictivity in 5-fold CV. Additionally, when benchmarking different ML methods, we observed better predictive power for SVM, so the results henceforth are shown only for this method (all benchmark results are available in the SI). The results show a clear advantage of fragment descriptors over fingerprints: all fingerprints showed much higher prediction error and struggled especially with compounds in the higher ranges (see Fig. S141).
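
The difference between binary fingerprints and fragment counts can be illustrated with a toy example. The sketch below is not the CircuS or ChyLine algorithm: the molecules and fragment labels are invented, and the helper names are hypothetical. It only shows that a presence/absence encoding cannot distinguish molecules that differ in how often a substructure occurs, whereas a count encoding can.

```python
from collections import Counter

# Invented fragment lists; real descriptors are derived from molecular graphs.
molecules = {
    "one_ring": ["amide", "aryl"],
    "two_rings": ["amide", "aryl", "aryl"],
    "three_rings": ["amide", "aryl", "aryl", "aryl"],
}

# Fixed feature order shared by both encodings.
vocabulary = sorted({frag for frags in molecules.values() for frag in frags})


def binary_vector(frags):
    """Fingerprint-like encoding: 1 if the fragment is present, else 0."""
    present = set(frags)
    return [1 if frag in present else 0 for frag in vocabulary]


def count_vector(frags):
    """Fragment-count encoding: number of occurrences of each fragment."""
    counts = Counter(frags)
    return [counts[frag] for frag in vocabulary]


for name, frags in molecules.items():
    print(name, binary_vector(frags), count_vector(frags))
```

All three toy molecules share the same binary vector, so a model trained on it has no way to rank them, while the count vectors grow with the number of aryl groups.
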
Indeed, as fragment counts contain more information, they are expected to retain more knowledge of the relationship between the structure and the modelled property. Moreover, higher retention times are often associated with repeating substructures (e.g., in this case, more aromatic rings in a structure lead to higher RT), which the fingerprints fail to capture effectively, as they only encode the presence or absence of substructures.

Yet the retention time itself depends not only on the chemical structure but also on the experimental setup and conditions. To eliminate the effect of changes in the chromatography column size and eluent speed, we then selected the retention factor as the modelled property. The retention factor (k) is given by (tR − t0)/t0, where tR is the analyte retention time and t0 is the column dead time. Considering the range of the values, we also transformed the retention factor to a logarithmic scale (ln k) to reduce the effect of the range on the prediction error. As Fig. 2 shows, the fragment descriptors again showed the best performance in cross-validation, although the performance was excellent across the board. For this property, the fingerprints still have difficulty predicting values in the lower and higher ranges. Since the models for ln k with fragments showed the best performance, only these are discussed further.

Fig. 2 (Top) Benchmark results (RMSE in repeated 5-fold CV for the logarithmic retention factor model) for each descriptor type in SVM. Each boxplot represents the distribution of scores (RMSE, in log units) for 5 repeats of 5-fold CV on the training set with random shuffling (white square for the mean score, the box for the interquartile range IQR, whiskers for 1.5 IQR, other points are outliers). (Bottom) Observed vs. predicted RT in 5-fold CV for the three best models: ChyLine fragments (left), CircuS fragments (middle), Morgan features FP (right).

#### External predictions and interpretation

Fragment-based models for ln k were applied to a series of molecules from external sources to verify the chemical space coverage of the models. The compounds used here (1x–12x) included various amide compounds with structures relatively similar to those used in model training, as well as arbitrarily selected compounds with completely dissimilar structures. The prediction results are shown in Fig. 3. As the figure shows, both models struggle with this test set, with the RMSE being over 1 compared to 0.12 for cross-validation. However, this high prediction error is due to two main factors.

Fig. 3 The external predictions for the test set (1x–12x, right) made by the model built on ChyLine fragments (left plot) and CircuS fragments (right plot). The outliers (data points for which the prediction error is greater than 1 logarithmic unit) are indicated in red and annotated by text. Statistical scores in blue are for the subset excluding the outliers.

First, there are several notable outliers for both models, especially compounds 10x and 12x. If the outliers are removed, the statistical scores for the models improve significantly. Moreover, apart from these outliers, the model built on ChyLine performs quite well across most of the range of ln k values. The CircuS model, on the other hand, shows more restrictive coverage.

Second, the chemical space of the test set is quite different from that of the training set. First of all, not all molecules are amides, although amides are the main target of the model.
The special cases are the aforementioned compounds 10x and 12x: the former (9,10-diphenylanthracene) is a polycyclic aromatic compound, and the latter (1,4-bis(trimethylsilyl)benzene) contains trimethylsilyl groups, which are completely outside the initial chemical space. These errors can also be interpreted using the ColorAtom methodology [24], which assigns atomic contributions to predictions and colors the atoms according to their importance. Fig. 4 shows ColorAtom interpretations for the predictions on outliers by the ChyLine model.

Fig. 4 Interpretation of the ChyLine-based model for prediction of ln k by ColorAtom. The contributions of atoms are coded blue for negative contributions and red for positive, with the intensity of color indicating the scale of the effect. White-coded atoms have virtually no contribution to the prediction. The contributions are scaled to the maximum in the test set (colorbar on the right) to allow comparison of the effect.

Indeed, for compound 10x, the aromatic groups show a positive contribution, i.e., they increase the retention time, as would be expected. However, due to the high number of these groups compared to the training set, the model overestimates ln k, which leads to a high prediction error. On the other hand, the silyl groups in compound 12x are completely ignored by the model, and their contribution cannot be correctly estimated.
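
Why a fragment-count model can ignore a group such as the trimethylsilyl substituents of 12x can be shown with a toy sketch. It assumes that the descriptor vector is built over the fragment vocabulary seen in training, so fragments absent from that vocabulary have no column and therefore no learned weight. The fragment labels and helper names below are hypothetical and do not reproduce the DOPtools or ColorAtom implementations.

```python
from collections import Counter


def fit_vocabulary(training_fragments):
    """Collect every fragment label seen in the training molecules."""
    return sorted({frag for frags in training_fragments for frag in frags})


def to_counts(frags, vocabulary):
    """Count vector over the training vocabulary; unseen fragments are dropped."""
    counts = Counter(frags)
    return [counts[frag] for frag in vocabulary]


def unseen_fragments(frags, vocabulary):
    """Fragments of a query molecule that never occurred in training."""
    return sorted(set(frags) - set(vocabulary))


# Invented training set of amide-like fragment lists.
training = [["amide", "aryl"], ["amide", "alkyl"], ["amide", "aryl", "aryl"]]
vocab = fit_vocabulary(training)

# Invented query resembling a silylated arene: one ring, two silyl groups.
query = ["aryl", "SiMe3", "SiMe3"]

print(vocab)                           # columns the model knows about
print(to_counts(query, vocab))         # the silyl groups leave no trace
print(unseen_fragments(query, vocab))  # a non-empty list flags extrapolation
```

A non-empty list of unseen fragments is a simple warning sign that a prediction relies on only part of the molecule.
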
Similar observations can be made about the other outliers as well, where the contributions of some groups are over- or underestimated.

It could also be assumed that the compounds of the test set are outside the applicability domain (AD) [25] of the training set. Indeed, when estimating the Fragment Control (FC) [26] AD, which excludes compounds possessing new fragments, and the Bounding Box (BB) [27] AD, which excludes compounds with descriptor values outside the range of the training set, all compounds of the test set would be considered outside the AD, although these are very strict definitions (see details in the SI). To demonstrate that the AD of the model is not extremely restrictive, we performed validation by moving a random portion of the training set to an external test set and repeating the optimization and validation process on these new sets. The predictions for these sets are excellent, which shows that the model works well on external amide data, as expected (all details are presented in the SI).

### Conclusions

In this study, we demonstrated a machine learning-based approach for predicting the retention times of amide compounds in supercritical fluid chromatography (SFC), aimed at facilitating high-throughput evaluation of recently developed chromatography columns.
By combining automated synthesis of a structurally diverse amide library with rigorous chemoinformatics modelling, we generated a dataset of retention times measured on the DCpak® PBT column, which is a relatively new stationary phase with limited prior characterization.

We benchmarked a range of molecular descriptors and machine learning algorithms, showing that fragment-count-based descriptors (ChyLine and CircuS) substantially outperformed traditional molecular fingerprints in cross-validated prediction of both raw retention times and logarithmic retention factors (ln k). These fragment descriptors provided richer, more quantitative representations of structural features that correlate with chromatographic behavior, especially for compounds with repeating or aromatic substructures that drive retention on the PBT column.

External validation using structurally diverse test compounds highlighted important limitations of model extrapolation, with notable prediction errors for molecules well outside the training set's chemical space. Nonetheless, interpretation methods such as ColorAtom analysis clarified the origins of prediction errors, confirming that the model's learned relationships remain chemically meaningful within its applicability domain. Moreover, controlled experiments excluding subsets of the training data demonstrated robust predictive performance for amide structures within the expected chemical space.

Overall, our approach shows that machine learning models trained on systematically designed reaction libraries can provide accurate, interpretable predictions of SFC retention times for new columns. This can reduce the need for trial-and-error experimentation, accelerate method development, and improve column selection workflows. Future work will expand the training data to broader chemical classes and columns, refine applicability domain estimation, and integrate these predictive tools into automated analytical workflows for high-throughput chromatography.

### Author contributions

S.
S. was responsible for the synthesis and measurements. Y. N. contributed to the automated synthesis and measurements. S. B. and P. S. conducted the informatics analyses. P. S. and Y. N. contributed to the writing of the manuscript. Y. N. supervised and coordinated the overall project.

### Conflicts of interest

The authors declare no competing financial interest. (PBT columns were provided by Daicel Corporation, and a compound data list was provided by Tokyo Chemical Industry Co., Ltd; neither had any influence on the impartiality of this study.)

### Data availability

All experimental data and code for reproducing the modelling results are freely available in the GitHub repository https://github.com/icredd-cheminfo/chromatography-modeling, as well as in the Zenodo repository at DOI: https://doi.org/10.5281/zenodo.17655751.

Supplementary information (SI): experimental data and code for reproducing the modelling results. See DOI: https://doi.org/10.1039/d5dd00437c.

### Acknowledgements

This work was supported by JSPS KAKENHI grant numbers JP23H03810, JP23H03807 and JP23H03806. Support was also provided by JST-ERATO (JPMJER1903) and the Institute for Chemical Reaction Design and Discovery (ICReDD), which was established by the World Premier International Research Initiative (WPI), MEXT, Japan.

### Notes and references

1. H. H. Maurer, J. Chromatogr. A, 2013, 1292, 19–24.
2. H. M. Merken and G. R. Beecher, J. Agric. Food Chem., 2000, 48, 577–599.
3. S. Montesdeoca-Esponda, A. del Toro-Moreno, Z. Sosa-Ferrera and J. J. Santana-Rodríguez, J. Sep. Sci., 2013, 36, 2168–2175.
4. A. G. Usman, S. Işik, S. I. Abba and F. Meriçli, J. Sep. Sci., 2021, 44, 843–849.
5. Y. Fan, Y. Deng, Y. Yang, X. Deng, Q. Li, B. Xu, J. Pan, S. Liu, Y. Kong and C.-E. Chen, Environ. Sci.: Adv., 2024, 3, 198–207.
6. Z.-M. Win, A. M. Y. Cheong and W. S. Hopkins, J. Chem. Inf. Model., 2023, 63, 1906–1913.
7. L. T. Taylor, J. Supercrit. Fluids, 2009, 47, 566–573.
8. V. Desfontaine, D. Guillarme, E. Francotte and L. Nováková, J. Pharm. Biomed. Anal., 2015, 113, 56–71.
9. K. Nagai, T. Shibata, S. Shinkura and A. Ohnishi, J. Chromatogr. A, 2018, 1549, 85–92.
10. A. Tropsha, Mol. Inf., 2010, 29, 476–488.
11. Danishuddin and A. U. Khan, Drug Discovery Today, 2016, 21, 1291–1302.
12. D. Rogers and M. Hahn, J. Chem. Inf. Model., 2010, 50, 742–754.
13. G. Landrum, RDKit: Open-source cheminformatics, https://www.rdkit.org.
14. R. E. Carhart, D. H. Smith and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 1985, 25, 64–73.
15. R. Nilakantan, N. Bauman, J. S. Dixon and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 1987, 27, 82–85.
16. P. Gedeck, B. Rohde and C. Bartels, J. Chem. Inf. Model., 2006, 46, 1924–1936.
17. S. Byadi, P. Gantzer, T. Gimadiev and P. Sidorov, Digital Discovery, 2025, 4, 1188–1198.
18. H. Drucker, C. J. C. Burges, L. Kaufman, A. J. Smola and V. Vapnik, in Advances in neural information processing systems, 1996, pp. 155–161.
19. L. Breiman, Mach. Learn., 2001, 45, 5–32.
20. T. Chen and C. Guestrin, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2016, pp. 785–794.
21. R. Nugmanov, N. Dyubankova, A. Gedich and J. K. Wegner, J. Chem. Inf. Model., 2022, 62, 3307–3315.
22. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and É. Duchesnay, J. Mach. Learn. Res., 2011, 12, 2825–2830.
23. T. Akiba, S. Sano, T. Yanase, T. Ohta and M. Koyama, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, New York, NY, USA, 2019, pp. 2623–2631.
24. G. Marcou, D. Horvath, V. Solov’ev, A. Arrault, P. Vayer and A. Varnek, Mol. Inf., 2012, 31, 639–642.
25. T. I. Netzeva, A. P. Worth, T. Aldenberg, R. Benigni, T. D. Mark, P. Gramatica, J. S. Jaworska, S. Kahn, G. Klopman, A. Carol, G. Myatt, N. Nikolova-jeliazkova, G. Y. Patlewicz and R. Perkins, Altern. Lab. Anim., 2005, 2, 155–173.
26. P. Polishchuk, T. Madzhidov, T. Gimadiev, A. Bodrov, R. Nugmanov and A. Varnek, J. Comput.-Aided Mol. Des., 2017, 31, 829–839.
27. V. P. Solov’ev, I. Oprisiu, G. Marcou and A. Varnek, Ind. Eng. Chem. Res., 2011, 50, 14162–14167.