# Fileset

[SI_NL-RQMS.pdf](https://mdr.nims.go.jp/filesets/32a78177-92be-4373-8cdc-6a4d89411902/download)

## Creator

[Yusuke Hibi](https://orcid.org/0000-0003-4006-1070)

## Rights

[In Copyright](http://rightsstatements.org/vocab/InC/1.0/)

## Other metadata

[Reference-free quantitative mass spectrometry in the presence of nonlinear distortion caused by <i>in situ</i> chemical reactions among constituents](https://mdr.nims.go.jp/datasets/2c2282ec-a0e6-4b36-98a4-61f58459fdd7)

## Fulltext

Supporting Information for

Reference-free Quantitative Mass Spectrometry in the Presence of Nonlinear Distortion Caused by In-Situ Chemical Reactions among Constituents

Yusuke Hibi*

Data-driven Polymer Design Group, Research Center for Macromolecules and Biomaterials, National Institute for Materials Science (NIMS); 1-2-1, Sengen, Tsukuba, Ibaraki 305-0047, Japan.

*Corresponding author. Email: hibi.yusuke@nims.go.jp

This PDF file includes: Methods; Supplementary Figures S1–S5; Supplementary Tables S1–S2; the caption for Data S1; and the reference list cited in this supporting information file.

Other Supplementary Materials for this manuscript include the following: The raw spectra and sample information are available at Analyst, 2024, DOI: 10.1039/D4AN00624K (Data S3, Gly-Jeff-Silox dataset). The ready-to-use starting matrix obtained by processing this raw data is attached to this manuscript as Data S1.

### Methods

#### Mathematical notations

The notation largely follows the standard conventions of signal processing. The symbols $\mathbb{R}_+^{N\times M}$ and $\mathbb{R}^{N\times M}$ denote the sets of non-negative real matrices and real matrices, respectively, each with dimensions $N \times M$. For a matrix $\boldsymbol{X} \in \mathbb{R}^{N\times M}$, $\boldsymbol{X}_{n:} \in \mathbb{R}^{1\times M}$, $\boldsymbol{X}_{:m} \in \mathbb{R}^{N\times 1}$ and $X_{nm} \in \mathbb{R}$ represent the $n$th row vector, the $m$th column vector and the $(n, m)$ element of $\boldsymbol{X}$, respectively. The notation $\boldsymbol{X}^T$ stands for the transpose of $\boldsymbol{X}$. The Frobenius norm of $\boldsymbol{X}$ is indicated by $\|\boldsymbol{X}\|_F$. The symbols $\|\boldsymbol{X}_{n:}\|_1$ and $\|\boldsymbol{X}_{n:}\|_2$ represent the $\ell_1$- and $\ell_2$-norms of the $n$th row vector of $\boldsymbol{X}$, respectively. For a square matrix $\boldsymbol{X}$, $\mathrm{Tr}(\boldsymbol{X})$ represents its trace. $\mathbf{1}_N$, $\mathbf{11}_N$ and $\boldsymbol{I}_N$ stand for the $N$-dimensional all-ones vector, the $(N, N)$-dimensional all-ones matrix and the $N$-dimensional identity matrix, respectively.

#### Preparation of the starting matrix $\boldsymbol{A}$ representing fragment abundance (FA) (Fig. 2A)

The ready-to-use spectral dataset of the Gly-Jeff-Silox system is available in the previous report (1). The dataset $\boldsymbol{X}$ contained 310 spectra (31 samples × 10 temperature bands from 300 to 600 °C) with 2537 channels. $\boldsymbol{X}$ was therefore a matrix of size (310, 2537), which was subjected to the reported NMF algorithm, $\boldsymbol{X} \approx \boldsymbol{A}\boldsymbol{S}$, where $\boldsymbol{S} \in \mathbb{R}_+^{M\times D}$ represents the 27 fragment spectra (Fig. S1) and $\boldsymbol{A} \in \mathbb{R}_+^{10N\times M}$ represents their abundances in each sample (Fig. 2A), with $N = 31$, $M = 27$ and $D = 2537$. Note that $\boldsymbol{A}$ originally represented a spectrum-wise FA with dimensions (310, 27). It was converted into a sample-wise FA with dimensions (31, 27) and then into weight fractions using synchronized thermogravimetry data, as previously described (1). The NMF hyperparameters were the same as in the previous report and are presented in Table S1. The conventional linear RQMS performs a two-step NMF, $\boldsymbol{X} \approx \boldsymbol{A}\boldsymbol{S} \approx (\boldsymbol{C}\boldsymbol{B})\boldsymbol{S}$, as described in the main text (2). NL-RQMS aims to achieve better compositional analysis of interactive/reactive systems by introducing a bilinear model in the second factorization step, while starting from the same matrix $\boldsymbol{A}$ as conventional RQMS. The starting matrix $\boldsymbol{A}$ of size $(N, M)$ can be found in the attached Data S1.

#### Quick review of linear RQMS

The first NMF, $\boldsymbol{X} \approx \boldsymbol{A}\boldsymbol{S}$, extracts the $M$ most representative fragment spectra $\boldsymbol{S}$ and their abundances (FA) in each sample, $\boldsymbol{A}$. The second NMF, $\boldsymbol{A} \approx \boldsymbol{C}\boldsymbol{B}$, finds the FAs of the $K$ pure constituents, $\boldsymbol{B} \in \mathbb{R}_+^{K\times M}$, and their concentrations in each sample, $\boldsymbol{C} \in \mathbb{R}_+^{N\times K}$. The second NMF uses the so-called volume minimization algorithm (VolMin) (3), which finds the minimum-volume simplex that is spanned by the rows of $\boldsymbol{B}$ and encloses all the rows of $\boldsymbol{A}$. This can be formulated as follows:

$$
\min_{\boldsymbol{C},\boldsymbol{B}} \ \frac{1}{2}\|\boldsymbol{A}-\boldsymbol{C}\boldsymbol{B}\|_F^2 + \mathrm{vol}(\boldsymbol{B}) \quad \text{s.t.}\ \ \boldsymbol{B}\ge 0,\ \boldsymbol{C}\ge 0,\ \boldsymbol{C}\mathbf{1}_K=\mathbf{1}_N. \tag{1}
$$

The first term is the approximation residual and the second term is the volume regularization of the simplex.
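As an informal illustration of the geometry behind Eq. (1), the following sketch builds a synthetic, noise-free mixture and checks that each row of `A` lies in the simplex spanned by the rows of `B`. It is not the VolMin algorithm itself. The sizes, random values and variable names are assumptions made for illustration and are not taken from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 8, 3, 5                      # illustrative sizes, not the paper's

B = rng.random((K, M))                 # hypothetical FAs of K pure constituents
C = rng.dirichlet(np.ones(K), size=N)  # rows are non-negative and sum to one
A = C @ B                              # noise-free linear mixing, A = CB

# With B known, the simplex (barycentric) coordinates of every row of A
# follow from ordinary least squares on B^T c = a.
C_rec = np.linalg.lstsq(B.T, A.T, rcond=None)[0].T

print(np.allclose(C_rec, C))                # recovered coordinates match C
print(np.allclose(C_rec.sum(axis=1), 1.0))  # sum-to-one constraint holds
```

In the actual problem `B` is unknown, which is why the volume term is needed to select one enclosing simplex among the many that fit the data.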
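However, this formulation cannot yield an accurate composition, because $\boldsymbol{A}_{n:}$ represents the coordinates of the $n$th sample in the non-orthogonal coordinate system spanned by $\boldsymbol{S}$. To account for this non-orthogonality, which originates from the first NMF, the approximation residual $D(\boldsymbol{A}|\boldsymbol{C}\boldsymbol{B})$ should be evaluated in the Riemannian metric, i.e.:

$$
D(\boldsymbol{A}|\boldsymbol{C}\boldsymbol{B}) = \mathrm{Tr}\left[(\boldsymbol{A}-\boldsymbol{C}\boldsymbol{B})\,\boldsymbol{S}\boldsymbol{S}^T(\boldsymbol{A}-\boldsymbol{C}\boldsymbol{B})^T\right].
$$

The significance of using the Riemannian metric is that the distance between $\boldsymbol{A}$ and $\boldsymbol{C}\boldsymbol{B}$ is evaluated in the original spectral space, as depicted in Fig. 3. Using a lower triangular matrix $\boldsymbol{L} \in \mathbb{R}^{M\times M}$ obtained via Cholesky decomposition of $\boldsymbol{S}\boldsymbol{S}^T$, the residual can be written as follows:

$$
D(\boldsymbol{A}|\boldsymbol{C}\boldsymbol{B}) = \mathrm{Tr}\left[(\boldsymbol{A}-\boldsymbol{C}\boldsymbol{B})\,\boldsymbol{L}\boldsymbol{L}^T(\boldsymbol{A}-\boldsymbol{C}\boldsymbol{B})^T\right] = \|\hat{\boldsymbol{A}}-\boldsymbol{C}\hat{\boldsymbol{B}}\|_F^2,
$$

where $\hat{\boldsymbol{A}} = \boldsymbol{A}\boldsymbol{L} \in \mathbb{R}^{N\times M}$ and $\hat{\boldsymbol{B}} = \boldsymbol{B}\boldsymbol{L} \in \mathbb{R}^{K\times M}$. With an additional orthogonality constraint, equation (1) then becomes:

$$
\min_{\boldsymbol{C},\hat{\boldsymbol{B}}} \ \frac{1}{2}\|\hat{\boldsymbol{A}}-\boldsymbol{C}\hat{\boldsymbol{B}}\|_F^2 + \frac{\alpha}{2}\,\mathrm{vol}(\hat{\boldsymbol{B}}) + \frac{\beta}{2}\,\mathrm{nonorth}(\hat{\boldsymbol{B}}) \quad \text{s.t.}\ \ \hat{\boldsymbol{B}}=\boldsymbol{B}\boldsymbol{L},\ \boldsymbol{B}\ge 0,\ \boldsymbol{C}\ge 0,\ \boldsymbol{C}\mathbf{1}_K=\mathbf{1}_N, \tag{2}
$$

where $\mathrm{nonorth}(\hat{\boldsymbol{B}})$ is a non-orthogonality term for the row vectors of $\hat{\boldsymbol{B}}$, and $\alpha > 0$ and $1 > \beta > 0$ are regularization parameters that control the volume-shrinking and volume-expanding forces applied to the simplex, respectively. Although the choice of hyperparameters influences the final compositional results, a value of 0.1 for both $\alpha$ and $\beta$ was used consistently and gave sufficiently accurate results in the previous reports. These regularization terms were formulated as follows:

$$
\mathrm{vol}(\hat{\boldsymbol{B}}) = \log\left|\det\left(\hat{\boldsymbol{B}}\hat{\boldsymbol{B}}^T + \tau\boldsymbol{I}_K\right)\right|,
$$

$$
\mathrm{nonorth}(\hat{\boldsymbol{B}}) = \mathrm{Tr}\left(\boldsymbol{\Lambda}\left(\hat{\boldsymbol{B}}\hat{\boldsymbol{B}}^T - \mathrm{diag}(\hat{\boldsymbol{B}}\hat{\boldsymbol{B}}^T)\right)\right),
$$

where $\boldsymbol{\Lambda} \in \mathbb{R}^{K\times K}$ is a symmetric Lagrange multiplier matrix. Equation (2) cannot be solved directly since both $\boldsymbol{C}$ and $\hat{\boldsymbol{B}}$ are unknown. Therefore, they were updated alternately and iteratively, starting from the initial $\boldsymbol{C}$ and $\hat{\boldsymbol{B}}$ set by vertex component analysis. At the $t$th iteration, $\boldsymbol{C}^{(t+1)}$ can be obtained using a simple non-negative least squares (NNLS) algorithm (4) based on the current $\hat{\boldsymbol{B}}^{(t)}$, here denoted as $\boldsymbol{C}^{(t+1)} = \mathrm{NNLS}(\hat{\boldsymbol{A}}; \hat{\boldsymbol{B}}^{(t)})$.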
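The equivalence between the metric-weighted residual and the Frobenius norm after the Cholesky transform can be checked numerically. The sketch below uses random matrices of arbitrary size. The function names and the default value of `tau` are placeholders chosen for illustration, since no value of $\tau$ is stated in this text.

```python
import numpy as np

def riemann_residual(A, C, B, S):
    """Tr[(A - CB) S S^T (A - CB)^T], the residual measured in spectral space."""
    R = A - C @ B
    return np.trace(R @ S @ S.T @ R.T)

def whitened_residual(A, C, B, S):
    """||A L - C (B L)||_F^2 with S S^T = L L^T (L lower triangular)."""
    L = np.linalg.cholesky(S @ S.T)
    return np.linalg.norm(A @ L - C @ (B @ L), "fro") ** 2

def log_det_volume(B_hat, tau=1e-8):
    """log|det(B_hat B_hat^T + tau I_K)|; tau here is an assumed placeholder."""
    K = B_hat.shape[0]
    _, logabsdet = np.linalg.slogdet(B_hat @ B_hat.T + tau * np.eye(K))
    return logabsdet

rng = np.random.default_rng(1)
N, K, M, D = 6, 3, 4, 20               # illustrative sizes only
S = rng.random((M, D))                 # stand-in fragment spectra
A = rng.random((N, M))
B = rng.random((K, M))
C = rng.dirichlet(np.ones(K), size=N)  # rows sum to one

d_metric = riemann_residual(A, C, B, S)
d_whitened = whitened_residual(A, C, B, S)
print(np.isclose(d_metric, d_whitened))
```

The Cholesky factorization requires $\boldsymbol{S}\boldsymbol{S}^T$ to be positive definite, which holds when the rows of $\boldsymbol{S}$ are linearly independent.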
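On the other hand, the update rule for $\hat{\boldsymbol{B}}$ is much more complicated owing to the regularization terms and the non-negativity constraint. Nevertheless, it can be updated based on $\hat{\boldsymbol{A}}$ and $\boldsymbol{C}^{(t)}$ as derived in the previous report, here denoted as $\hat{\boldsymbol{B}}^{(t+1)} = \mathrm{volmin}(\hat{\boldsymbol{A}}; \boldsymbol{C}^{(t)})$.

#### Mathematical derivation of the NL-RQMS

To analyze interactive systems, a nonlinear correction term should be incorporated in the second factorization (5), i.e.:

$$
\boldsymbol{A} \approx \boldsymbol{C}\boldsymbol{B} + \boldsymbol{C}'\boldsymbol{B}', \tag{3}
$$

where $\boldsymbol{B}'_{k:}$ ($k = 1, 2, \dots, K'$) represents the $k$th cross-component, associated with the pair of components $(l, m)$; $K' = \binom{K}{2}$; and $(l, m)$ is the $k$th two-element combination of the $K$ component indices. The $k$th column of $\boldsymbol{C}'$ is calculated by element-wise multiplication of the corresponding columns of $\boldsymbol{C}$:

$$
\boldsymbol{C}'_{:k} = \boldsymbol{C}_{:l} \odot \boldsymbol{C}_{:m}, \quad \text{for } k = 1, 2, \dots, K', \tag{4}
$$

where $\odot$ represents the element-wise product.

As described above, the approximation residual should be evaluated using the Riemannian metric. Equation (3) becomes $\hat{\boldsymbol{A}} \approx \boldsymbol{C}\hat{\boldsymbol{B}} + \boldsymbol{C}'\hat{\boldsymbol{B}}'$, where $\hat{\boldsymbol{A}} = \boldsymbol{A}\boldsymbol{L}$, $\hat{\boldsymbol{B}} = \boldsymbol{B}\boldsymbol{L}$ and $\hat{\boldsymbol{B}}' = \boldsymbol{B}'\boldsymbol{L}$. Because the vertices of the pure-component spectra are the same regardless of whether interactions exist, the same regularization terms for $\hat{\boldsymbol{B}}$ as in (2) were applied. The NL-RQMS problem can then be formulated as follows:

$$
\min_{\boldsymbol{C},\hat{\boldsymbol{B}},\hat{\boldsymbol{B}}'} \ \frac{1}{2}\|\hat{\boldsymbol{A}}-\boldsymbol{C}\hat{\boldsymbol{B}}-\boldsymbol{C}'\hat{\boldsymbol{B}}'\|_F^2 + \frac{\alpha}{2}\,\mathrm{vol}(\hat{\boldsymbol{B}}) + \frac{\beta}{2}\,\mathrm{nonorth}(\hat{\boldsymbol{B}}) + \rho\|\boldsymbol{B}'\|_1 \tag{5}
$$

$$
\text{s.t.}\ \ \hat{\boldsymbol{B}}=\boldsymbol{B}\boldsymbol{L},\ \hat{\boldsymbol{B}}'=\boldsymbol{B}'\boldsymbol{L},\ \boldsymbol{B}\ge 0,\ \boldsymbol{C}\ge 0,\ \boldsymbol{C}\mathbf{1}_K=\mathbf{1}_N,
$$

where $\boldsymbol{C}'$ is calculated from $\boldsymbol{C}$ according to equation (4). The regularization term $\rho\|\boldsymbol{B}'\|_1$ expresses the chemical knowledge that the polymer substructures involved in reaction/interaction should be limited, and the hyperparameter $\rho$ controls the nonlinearity of the model. Introducing $\hat{\boldsymbol{A}}' \equiv \hat{\boldsymbol{A}} - \boldsymbol{C}'\hat{\boldsymbol{B}}'$ makes problem (5) very similar to problem (2). If $\hat{\boldsymbol{B}}'^{(t)}$ is given at the $t$th iteration, $\hat{\boldsymbol{B}}^{(t+1)}$ can be updated using exactly the same algorithm: $\hat{\boldsymbol{B}}^{(t+1)} = \mathrm{volmin}(\hat{\boldsymbol{A}}'^{(t)}; \boldsymbol{C}^{(t)})$. However, the update rule for $\boldsymbol{C}^{(t+1)}$ must be modified from the simple NNLS, because this update affects $\boldsymbol{C}'$ as well.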
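The construction of $\boldsymbol{C}'$ in Eq. (4) and the bilinear model of Eq. (3) can be sketched in a few lines. The helper names `cross_terms` and `bilinear_model` are hypothetical, and the ordering of the $(l, m)$ pairs is an assumption: the text does not specify an ordering, so this sketch uses the lexicographic order produced by `itertools.combinations`, with zero-based indices. It assumes $K \ge 2$.

```python
import numpy as np
from itertools import combinations

def cross_terms(C):
    """Return C' (N x K(K-1)/2) and the (l, m) index pairs, following Eq. 4."""
    K = C.shape[1]
    pairs = list(combinations(range(K), 2))   # assumed lexicographic ordering
    C_prime = np.stack([C[:, l] * C[:, m] for l, m in pairs], axis=1)
    return C_prime, pairs

def bilinear_model(C, B, B_prime):
    """Evaluate C B + C' B', the right-hand side of Eq. 3."""
    C_prime, _ = cross_terms(C)
    return C @ B + C_prime @ B_prime

# Two toy samples with K = 3 constituents (fractions sum to one per row).
C = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5]])
C_prime, pairs = cross_terms(C)
print(pairs)     # the three index pairs for K = 3
print(C_prime)   # pairwise products of the columns of C
```

For a binary sample such as the first row, only the cross-term of the two constituents actually present is non-zero, so the correction vanishes automatically for pairs that never coexist in a sample.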
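To derive the update rule for $\boldsymbol{C}$, the objective function is defined as follows:

$$
L \equiv \frac{1}{2}\|\hat{\boldsymbol{A}}-\boldsymbol{C}\hat{\boldsymbol{B}}-\boldsymbol{C}'\hat{\boldsymbol{B}}'\|_F^2 = \frac{1}{2}\|\hat{\boldsymbol{A}}-\tilde{\boldsymbol{C}}\tilde{\boldsymbol{B}}\|_F^2,
$$

where $\tilde{\boldsymbol{B}} \equiv \begin{pmatrix}\hat{\boldsymbol{B}} \\ \hat{\boldsymbol{B}}'\end{pmatrix}$ and $\tilde{\boldsymbol{C}} \equiv (\boldsymbol{C}\ \ \boldsymbol{C}')$. From $\tilde{\boldsymbol{C}}$, only the columns related to the $k$th component are extracted and denoted as $\boldsymbol{C}_k$, i.e., $\boldsymbol{C}_k \equiv (\boldsymbol{C}_{:k}\ \ \boldsymbol{C}_{:k}\odot\boldsymbol{C}_{:1}\ \ \boldsymbol{C}_{:k}\odot\boldsymbol{C}_{:2}\ \dots\ \boldsymbol{C}_{:k}\odot\boldsymbol{C}_{:K}) \in \mathbb{R}_+^{N\times K}$, where the products run over the $K-1$ components other than $k$. In the same manner, the rows of $\tilde{\boldsymbol{B}}$ related to the $k$th component are extracted and denoted as $\boldsymbol{B}_k$. Using $\boldsymbol{C}_k$ and $\boldsymbol{B}_k$, the objective function can be written as:

$$
L = \frac{1}{2}\|\boldsymbol{A}_k-\boldsymbol{C}_k\boldsymbol{B}_k\|_F^2,
$$

where $\boldsymbol{A}_k \equiv \hat{\boldsymbol{A}} - \tilde{\boldsymbol{C}}\tilde{\boldsymbol{B}} + \boldsymbol{C}_k\boldsymbol{B}_k$. The gradient of the objective function with respect to $C_{nk}$ is computed as follows:

$$
\frac{\partial L}{\partial C_{nk}} = \mathrm{Tr}\left[\frac{\partial \boldsymbol{C}_k}{\partial C_{nk}}\left(\frac{\partial L}{\partial \boldsymbol{C}_k}\right)^T\right]. \tag{6}
$$

The $n$th row of $\partial \boldsymbol{C}_k / \partial C_{nk}$ is $(1\ \ C_{n1}\ \ C_{n2}\ \dots\ C_{nK}) \equiv \boldsymbol{c}_{/k}^T$ (with $C_{nk}$ itself omitted), and all the other rows are zero. Therefore, Eq. 6 can be calculated as follows:

$$
\frac{\partial L}{\partial C_{nk}} = \boldsymbol{c}_{/k}^T\left[\boldsymbol{B}_k\boldsymbol{B}_k^T\boldsymbol{C}_k^T - \boldsymbol{B}_k\boldsymbol{A}_k^T\right]_{:n}.
$$

Because the $n$th row of $\boldsymbol{C}_k$ equals $C_{nk}\,\boldsymbol{c}_{/k}^T$, the optimal $C_{nk}$ is obtained by setting $\partial L / \partial C_{nk} = 0$:

$$
C_{nk} = \frac{\boldsymbol{c}_{/k}^T\boldsymbol{B}_k(\boldsymbol{A}_k^T)_{:n}}{\boldsymbol{c}_{/k}^T\boldsymbol{B}_k\boldsymbol{B}_k^T\boldsymbol{c}_{/k}}. \tag{7}
$$

If $C_{nk}$ becomes negative, it is set to zero.

Once $\hat{\boldsymbol{B}}^{(t)}$ and $\boldsymbol{C}^{(t)}$ are updated, $\hat{\boldsymbol{B}}'$ can be immediately updated using the generalized least absolute shrinkage and selection operator (LASSO) algorithm (6). With $\hat{\boldsymbol{B}}^{(t)}$ and $\boldsymbol{C}^{(t)}$ temporarily fixed, this problem is formulated as follows:

$$
\min_{\hat{\boldsymbol{B}}'} \ \frac{1}{2}\|\bar{\boldsymbol{A}}-\boldsymbol{C}'\hat{\boldsymbol{B}}'\|_F^2 + \rho\|\boldsymbol{B}'\|_1 \quad \text{s.t.}\ \ \hat{\boldsymbol{B}}'=\boldsymbol{B}'\boldsymbol{L}, \tag{8}
$$

where $\bar{\boldsymbol{A}} \equiv \hat{\boldsymbol{A}} - \boldsymbol{C}\hat{\boldsymbol{B}}$. The NL-RQMS procedure, which cyclically updates $\hat{\boldsymbol{B}}$, $\boldsymbol{C}$ and $\hat{\boldsymbol{B}}'$, is outlined as follows.

Algorithm 1: Pseudo-code for NL-RQMS

Input: sample-wise FA $\boldsymbol{A} \in \mathbb{R}_+^{N\times M}$; fragment spectra $\boldsymbol{S} \in \mathbb{R}_+^{M\times D}$; the number of system constituents $K$; regularization parameters $(\alpha, \beta, \rho)$.

Output: fractions $\boldsymbol{C} \in \mathbb{R}_+^{N\times K}$; FAs of the constituents $\boldsymbol{B} \in \mathbb{R}_+^{K\times M}$; FAs of the cross-components $\boldsymbol{B}' \in \mathbb{R}_+^{K'\times M}$.

Initialization:

1. Calculate $\boldsymbol{L}$ via Cholesky decomposition of $\boldsymbol{S}\boldsymbol{S}^T$.
2. Set $\hat{\boldsymbol{A}} = \boldsymbol{A}\boldsymbol{L}$.
3. Initialize $\hat{\boldsymbol{B}}$ and $\boldsymbol{C}$ using the linear RQMS algorithm.

Repeat until the convergence criterion is satisfied:

1. Calculate $\bar{\boldsymbol{A}} = \hat{\boldsymbol{A}} - \boldsymbol{C}\hat{\boldsymbol{B}}$.
2. Calculate $\boldsymbol{C}'$ following Eq. 4.
3. Update $\hat{\boldsymbol{B}}'$ by solving Eq. 8 using generalized LASSO.
4. Calculate $\hat{\boldsymbol{A}}' = \hat{\boldsymbol{A}} - \boldsymbol{C}'\hat{\boldsymbol{B}}'$.
5. Update $\hat{\boldsymbol{B}} = \mathrm{volmin}(\hat{\boldsymbol{A}}'; \boldsymbol{C})$.
6. Update $\boldsymbol{C}$ by Eq. 7.
7. Normalize $\boldsymbol{C}$ so that the sum-to-one constraint is satisfied.

Return $\boldsymbol{C}$, $\boldsymbol{B}$ and $\boldsymbol{B}'$.

### Supplementary Figures and Tables

Fig. S1. The 27 fragment spectra ($\boldsymbol{S}$) extracted from the spectral dataset $\boldsymbol{X}$ containing 310 spectra (31 samples × 10 temperature bands). The fragment abundances (FAs) of these 27 spectra in each sample are numerically presented in Data S1 and depicted as a heatmap in Fig. 2A.

Fig. S2. (NL-)RQMS flowchart and definition of terminologies. Channels: m/z positions of centroid peaks; fragment spectra: groups of peaks whose intensities fluctuate at a certain ratio across the entire spectral dataset; fragment abundance (FA): the abundances of the fragment spectra in each sample; reference: a pure system component; cross-component: the interaction effects caused by mixing two references, influencing the FAs in mixed samples. The incorporation of thermogravimetry data for converting FA to a weight basis is detailed in the previous report (1).

Fig. S3. Estimating the nonlinearity effect on fragment abundance (FA). Red points represent the measured FA for each sample, while the yellow triangular plane is the best-fit plane to these red points. Green points mark where perpendicular lines from the red points intersect the plane. Nonlinearity is evaluated by the deviation from the plane.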
(A) FA of a non-interacting fragment derived from Silox: the plane slopes towards the Silox vertex, and most sample FAs lie close to the best-fit plane. (B) FA of fragment 19, which results from Gly-Jeff interactions: most points deviate from the plane. Since the true Gly, Jeff, and Silox samples contain almost no fragment 19 (Fig. 2A), the actual plane should lie near the FA = 0 plane. This suggests that using a linear analysis to estimate the plane for such nonlinear data leads to significant errors. The weight fraction of fragments caused by nonlinear interactions in this dataset is calculated by dividing the total distance of the data points from the plane by the number of samples, showing that 41% of the total weight is influenced by nonlinear effects.

Fig. S4. Testing robustness against noise in the input data. The sample variance of each fragment in the input FA matrix was calculated, and Gaussian noise with a variance between 1% and 30% of the sample variance was randomly added to the input data. Noise was introduced ten times at each noise level, and the error from the true composition was measured as the RMSE of the estimated composition values. The average RMSE is plotted against the noise level, with error bars showing the standard deviation across the ten trials. The results indicate that estimation accuracy remains stable up to noise levels of around 10%, demonstrating the robustness of NL-RQMS to noise.

Fig. S5. Characterization of important fragment spectra identified as having positive or negative nonlinear effects by NL-RQMS analysis. The observed and calculated masses were highly consistent.

Table S1.
The hyperparameters of the first NMF $\boldsymbol{X} \approx \boldsymbol{A}\boldsymbol{S}$ and the second factorization $\boldsymbol{A} \approx \boldsymbol{C}\boldsymbol{B} + \boldsymbol{C}'\boldsymbol{B}'$. The columns $w_o$, merging threshold, initial $M$ and iteration belong to the first NMF used to calculate FA; the columns $K$, $\alpha = \beta$, $p$ and $\rho$ belong to the second factorization.

| Dataset | Sample number $N$ | $w_o$ | Merging threshold | Initial $M$ | Iteration | $K$ | $\alpha = \beta$ | $p$ | $\rho$ |
|---|---|---|---|---|---|---|---|---|---|
| Gly/Jeff/Silox | 31 | 0.2 | 0.99 | 30 | 3000 | 3 | 0.1 | 1.5 | 0.03 |

$w_o$: This parameter defines the strictness of orthogonality. A higher $w_o$ value indicates fewer shared peaks among the fragment spectra. The value of $w_o$ should range between 0 and 1, where 0 implies no orthogonality constraint and 1 represents the strictest constraint.

Merging threshold and initial $M$: These parameters are associated with the automatic relevance determination (ARD) mechanism. The initial number of bases should be set larger than the appropriate number, and the ARD mechanism reduces the number of basis spectra by merging them when the cosine similarity of the spectra exceeds the "merging threshold."

$K$: The number of system constituents.

$\alpha$: The regularization hyperparameter that controls the minimum-volume constraint for the simplex spanned by the reference spectra. A higher $\alpha$ value results in a more contracted simplex.

$\beta$: The regularization hyperparameter that controls the orthogonality among the reference spectra. A higher $\beta$ value leads to a more expanded simplex.

$p$: The hyperparameter that controls robustness against outliers, which should be specified between 0.5 and 2. A higher $p$ makes the second factorization more sensitive to data variation but less robust to outliers.

$\rho$: This parameter controls the sparseness of the interaction term $\boldsymbol{B}'$.

Table S2.
Known composition and compositions inferred by linear and nonlinear RQMS for each sample.

| FileNames | Known Gly | Known Jeff | Known Silox | Linear Gly | Linear Jeff | Linear Silox | Nonlinear Gly | Nonlinear Jeff | Nonlinear Silox |
|---|---|---|---|---|---|---|---|---|---|
| gly10jeff10silox80 | 0.15 | 0.05 | 0.80 | 0.11 | 0.01 | 0.89 | 0.13 | 0.06 | 0.81 |
| gly10jeff80silox10 | 0.05 | 0.80 | 0.15 | 0.00 | 0.79 | 0.21 | 0.07 | 0.75 | 0.18 |
| gly10jeff80silox10 | 0.15 | 0.80 | 0.05 | 0.09 | 0.83 | 0.07 | 0.15 | 0.75 | 0.10 |
| gly20jeff40silox40 | 0.21 | 0.39 | 0.40 | 0.20 | 0.37 | 0.43 | 0.25 | 0.38 | 0.38 |
| **gly20jeff80** | 0.20 | 0.80 | 0.00 | 0.15 | 0.85 | 0.00 | 0.21 | 0.79 | 0.00 |
| gly20silox80 | 0.20 | 0.00 | 0.80 | 0.17 | 0.00 | 0.83 | 0.19 | 0.00 | 0.81 |
| **gly30jeff70** | 0.29 | 0.71 | 0.00 | 0.23 | 0.77 | 0.00 | 0.29 | 0.71 | 0.00 |
| gly30silox70 | 0.32 | 0.00 | 0.68 | 0.23 | 0.00 | 0.77 | 0.27 | 0.00 | 0.73 |
| gly33jeff33silox33 | 0.33 | 0.33 | 0.34 | 0.29 | 0.32 | 0.38 | 0.34 | 0.31 | 0.34 |
| gly40jeff20silox40 | 0.40 | 0.20 | 0.41 | 0.35 | 0.20 | 0.45 | 0.40 | 0.20 | 0.39 |
| gly40jeff40silox20 | 0.40 | 0.40 | 0.20 | 0.37 | 0.43 | 0.20 | 0.41 | 0.38 | 0.20 |
| **gly40jeff60** | 0.40 | 0.60 | 0.00 | 0.29 | 0.71 | 0.00 | 0.38 | 0.62 | 0.00 |
| gly40silox60 | 0.41 | 0.00 | 0.59 | 0.36 | 0.00 | 0.64 | 0.42 | 0.00 | 0.58 |
| **gly50jeff50** | 0.51 | 0.49 | 0.00 | 0.39 | 0.61 | 0.00 | 0.51 | 0.49 | 0.00 |
| gly50silox50 | 0.50 | 0.00 | 0.51 | 0.45 | 0.00 | 0.55 | 0.52 | 0.00 | 0.48 |
| gly50silox50 | 0.52 | 0.00 | 0.48 | 0.44 | 0.00 | 0.56 | 0.52 | 0.00 | 0.48 |
| **gly60jeff40** | 0.60 | 0.40 | 0.00 | 0.43 | 0.57 | 0.00 | 0.58 | 0.42 | 0.00 |
| gly60silox40 | 0.60 | 0.00 | 0.40 | 0.55 | 0.00 | 0.45 | 0.62 | 0.00 | 0.38 |
| **gly70jeff30** | 0.68 | 0.32 | 0.00 | 0.49 | 0.51 | 0.00 | 0.62 | 0.38 | 0.00 |
| gly70silox30 | 0.70 | 0.00 | 0.30 | 0.69 | 0.00 | 0.31 | 0.75 | 0.00 | 0.25 |
| gly80jeff10silox10 | 0.81 | 0.04 | 0.14 | 0.88 | 0.01 | 0.11 | 0.86 | 0.03 | 0.11 |
| gly80jeff10silox10 | 0.78 | 0.16 | 0.07 | 0.80 | 0.20 | 0.00 | 0.83 | 0.14 | 0.03 |
| **gly80jeff20** | 0.79 | 0.21 | 0.00 | 0.62 | 0.38 | 0.00 | 0.74 | 0.26 | 0.00 |
| **gly80jeff20** | 0.80 | 0.20 | 0.00 | 0.70 | 0.30 | 0.00 | 0.71 | 0.28 | 0.02 |
| gly80silox20 | 0.80 | 0.00 | 0.20 | 0.83 | 0.00 | 0.17 | 0.87 | 0.00 | 0.13 |
| jeff20silox80 | 0.00 | 0.21 | 0.79 | 0.00 | 0.15 | 0.85 | 0.00 | 0.23 | 0.77 |
| jeff30silox70 | 0.00 | 0.29 | 0.71 | 0.00 | 0.24 | 0.76 | 0.00 | 0.33 | 0.67 |
| jeff40silox60 | 0.00 | 0.39 | 0.61 | 0.00 | 0.35 | 0.65 | 0.00 | 0.45 | 0.55 |
| jeff50silox50 | 0.00 | 0.51 | 0.49 | 0.00 | 0.49 | 0.51 | 0.00 | 0.58 | 0.42 |
| jeff60silox40 | 0.00 | 0.61 | 0.39 | 0.00 | 0.57 | 0.43 | 0.00 | 0.66 | 0.34 |
| jeff70silox30 | 0.00 | 0.71 | 0.30 | 0.00 | 0.66 | 0.34 | 0.00 | 0.74 | 0.26 |

The precision of linear and nonlinear RQMS was assessed by calculating the root mean squared error (RMSE) of $\boldsymbol{C}$ in comparison to the ground truth $\bar{\boldsymbol{C}}$, i.e.,

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{K}\sum_{n=1}^{N}\left(C_{nk}-\bar{C}_{nk}\right)^2},
$$

where $N$ is the dataset size ($N = 31$) and $K$ is the number of system components ($K = 3$). Data points on the Gly-Jeff edge are marked in bold (file names) to highlight the significant improvement in estimation accuracy achieved by updating to nonlinear RQMS.

Data S1. Numerical data of Fig. 2A. The values represent the abundances (weight fractions) of the 27 fragments in each sample, derived by the reported algorithm using the hyperparameters in Table S1.

### References

(1) Hibi, Y.; Uesaka, S.; Naito, M. Thermogravimetry-Synchronized, Reference-Free Quantitative Mass Spectrometry for Accurate Compositional Analysis of Polymer Systems Without Prior Knowledge of Constituents. Analyst 2024. https://doi.org/10.1039/D4AN00624K.

(2) Hibi, Y.; Uesaka, S.; Naito, M. A Data-Driven Sequencer That Unveils Latent "Codons" in Synthetic Copolymers. Chem. Sci. 2023. https://doi.org/10.1039/d2sc06974a.

(3) Fu, X.; Huang, K.; Yang, B.; Ma, W. K.; Sidiropoulos, N. D. Robust Volume Minimization-Based Matrix Factorization for Remote Sensing and Document Clustering. IEEE Trans. Signal Process. 2016, 64 (23), 6254–6268. https://doi.org/10.1109/TSP.2016.2602800.

(4) Heinz, D. C.; Chang, C.-I. Fully Constrained Least Squares Linear Spectral Mixture Analysis Method for Material Quantification in Hyperspectral Imagery. IEEE Trans. Geosci. Remote Sens. 2001, 39 (3), 529–545.

(5) Févotte, C.; Dobigeon, N. Nonlinear Hyperspectral Unmixing with Robust Nonnegative Matrix Factorization. IEEE Trans. Image Process. 2015, 24 (12), 4810–4819. https://doi.org/10.1109/TIP.2015.2468177.

(6) Roth, V. The Generalized LASSO. IEEE Trans. Neural Netw. 2004, 15 (1), 16–28. https://doi.org/10.1109/TNN.2003.809398.