QSPR for the prediction of critical micelle concentration of different classes of surfactants using machine learning algorithms

Surface-active agents, also known as surfactants, are vital chemicals that play a crucial role in many aspects of daily life. Depending on their net charge, surfactants are divided into four main classes: cationic, anionic, nonionic, and zwitterionic. Recently, researchers have shown considerable interest in a particular type called Gemini surfactants, which possess two hydrophobic tails and two head groups connected by a short spacer [1]. A fundamental characteristic of surfactants is their tendency to adsorb at interfaces, which stems from the presence of both hydrophilic and hydrophobic regions in their structure [2]. This adsorption lowers the surface or interfacial tension between two phases, and in principle, the stronger this tendency, the more effective the surfactant. The concentration of surfactant at an interface depends on both the surfactant structure and the nature of the two phases that meet there; a surfactant should therefore be chosen according to its intended use. Another essential characteristic of surfactants is micellization, the dynamic process by which micelles form once the surfactant concentration exceeds a critical value called the critical micelle concentration (CMC) [1]. The CMC is thus the concentration at which surfactant molecules begin to aggregate into micelles. At this point, various properties of the surfactant solution, such as foaming, interfacial tension, emulsification, and conductivity, change substantially [3]. Because of these properties, surfactants are widely used in industrial applications such as pharmaceuticals, detergents, personal care, food, and agriculture, where they facilitate wetting, foaming, emulsification, and lubrication [4-7].
The CMC can be significantly affected by external conditions such as temperature, pressure, pH, ionic strength, and solution volume, as well as by the structural characteristics of the surfactant itself, including hydrophobic tail length and head group area [7]. A variety of experimental techniques, such as tensiometry, conductometry, nuclear magnetic resonance spectroscopy, cyclic voltammetry, and fluorescence emission spectroscopy, can be used to determine the CMC [8].
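Several of these techniques locate the CMC as a break point in a plot of some solution property against surfactant concentration. For conductometry of an ionic surfactant, for example, a common approach fits one straight line to the pre-micellar points and another to the post-micellar points and takes their intersection as the CMC. The sketch below illustrates that idea under simplifying assumptions: the split between the two regions is supplied by hand through a hypothetical `split` index, and the data are synthetic rather than measured.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def cmc_from_break(conc, kappa, split):
    """Estimate the CMC as the intersection of two linear fits.

    conc  : surfactant concentrations, sorted ascending
    kappa : specific conductivity measured at each concentration
    split : hypothetical index separating pre- and post-micellar points
    """
    a1, b1 = fit_line(conc[:split], kappa[:split])  # below the CMC
    a2, b2 = fit_line(conc[split:], kappa[split:])  # above the CMC
    if b1 == b2:
        raise ValueError("parallel fits: no break point")
    # Solve a1 + b1*c = a2 + b2*c for c.
    return (a2 - a1) / (b1 - b2)


# Synthetic data: slope 0.07 below the break, 0.03 above it, break at c = 8
conc = [2, 4, 6, 8, 10, 12, 14]
kappa = [0.07 * c if c <= 8 else 0.56 + 0.03 * (c - 8) for c in conc]
print(round(cmc_from_break(conc, kappa, 4), 3))  # 8.0
```

In practice the split is often chosen automatically (for example, by scanning candidate splits and keeping the one with the smallest total residual), and curved transition regions may call for more elaborate fitting.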

Among the many methods in the literature for predicting the properties of substances from their chemical structure, quantitative structure-property relationship (QSPR) modeling is an important field of research in computational chemistry [9], because it offers a faster and less expensive alternative to experimental measurement for identifying and quantifying the structural features that affect a physical property. In the QSPR framework, molecular descriptors of many kinds, such as topological, constitutional, geometrical, and electronic descriptors, encode the details of molecular structure [8]. A selected set of these descriptors is then statistically correlated with the experimental property under study, yielding a mathematical model that reveals useful structure-property relationships and can be used to predict the property for new compounds.
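The simplest form of such a model is a multiple linear regression (MLR), in which the property is written as y = b0 + b1*d1 + ... + bk*dk for descriptors d1..dk. This section does not say which algorithm the present study uses, so the sketch below uses plain MLR purely as an illustration: it fits the coefficients by solving the normal equations (X^T X) b = X^T y with Gaussian elimination. The descriptor values and the property values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(descriptors, y):
    """Fit y = b0 + b1*d1 + ... + bk*dk by ordinary least squares.

    descriptors : one row of k descriptor values per molecule
    y           : property values (e.g. the negative logarithm of the CMC)
    Returns [b0, b1, ..., bk].
    """
    X = [[1.0] + list(row) for row in descriptors]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Apply a fitted MLR model to one molecule's descriptor row."""
    return coefs[0] + sum(b * d for b, d in zip(coefs[1:], row))


# Hypothetical descriptors (d1, d2) for five molecules; y generated as 1 + 0.5*d1 - 2*d2
D = [[8, 0.1], [10, 0.2], [12, 0.1], [14, 0.3], [16, 0.2]]
y = [1 + 0.5 * d1 - 2 * d2 for d1, d2 in D]
coefs = fit_mlr(D, y)
print([round(c, 6) for c in coefs])  # approximately [1.0, 0.5, -2.0]
```

Because the synthetic y values are an exact linear function of the descriptors, the fit recovers the generating coefficients; with real data, descriptor selection and validation decide whether such a model is trustworthy.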

Multiple QSPR models have been developed to predict the CMC of surfactants [2, 3, 10-14], but few published models can accurately predict CMC values across all surfactant classes. Qin et al. [15] used a graph representation of molecules, in which atoms are nodes and chemical bonds are edges, as the input to a graph convolutional neural network (GCN) to predict the CMC of 202 surfactants, including anionic, cationic, nonionic, and zwitterionic compounds. On the test set, the model achieved R² = 0.92 and RMSE = 0.3, demonstrating the robustness of the GCN approach. A study published in 2002 [16] presented a QSPR model based on seven molecular descriptors to predict the CMC of 49 surfactants. Using partial least squares (PLS) regression on a data set covering nonionic, anionic, cationic, and zwitterionic surfactants, the study obtained a coefficient of determination of 0.90 for the training set.
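The R² and RMSE values quoted above are the standard goodness-of-fit statistics for QSPR models. The sketch below assumes the common definitions R² = 1 - SS_res/SS_tot and RMSE = sqrt(SS_res/n); individual studies sometimes use variants (for example, different reference means for external test sets), so these functions are illustrative rather than a reproduction of any cited paper's exact formulas. The observed and predicted values are hypothetical.

```python
import math


def r2_score(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    """Root-mean-square error, in the same units as the property."""
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return math.sqrt(ss_res / len(y_obs))


# Hypothetical observed and predicted values of the negative logarithm of the CMC
obs = [2.0, 3.0, 4.0, 5.0]
pred = [2.5, 3.0, 4.0, 4.5]
print(r2_score(obs, pred))  # 0.9
print(rmse(obs, pred))
```

Note that R² is scale-free, whereas RMSE carries the units of the modeled property, so an RMSE on a logarithmic CMC scale corresponds to a multiplicative error in the CMC itself.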

This study aims to develop QSPR models that correlate the negative logarithm of the CMC with the molecular structure of 593 surfactants belonging to different classes (anionic, cationic, zwitterionic, nonionic, and Gemini), using various molecular descriptors calculated from the two-dimensional (2D) representation of the molecules. The best model was selected on the basis of both internal and external validation, and its applicability domain was then analyzed using a Williams plot.
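A Williams plot places each compound's leverage h on the horizontal axis and its standardized residual on the vertical axis. The sketch below assumes widely used conventions that this section does not spell out: leverage h_i = x_i^T (X^T X)^-1 x_i computed with an intercept column, a warning leverage h* = 3(p + 1)/n for p descriptors and n training compounds, residuals standardized by the residual standard deviation, and a cutoff of ±3 standardized-residual units. The study's exact thresholds may differ, and all data below are hypothetical.

```python
import math


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment [a | I]; after elimination the right half holds the inverse.
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverages(train_X, query_X):
    """Leverage of each query row relative to the training descriptor matrix.

    An intercept column of ones is prepended (assumed convention).
    """
    X = [[1.0] + list(r) for r in train_X]
    p = len(X[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    out = []
    for q in query_X:
        x = [1.0] + list(q)
        out.append(sum(x[i] * xtx_inv[i][j] * x[j] for i in range(p) for j in range(p)))
    return out


def williams_points(train_X, train_res, pts_X, pts_res):
    """Return h* and a (leverage, standardized residual, inside_domain) tuple per point."""
    n, p = len(train_X), len(train_X[0])
    h_star = 3 * (p + 1) / n  # warning leverage
    # Residual standard deviation of the training fit (n - p - 1 degrees of freedom)
    s = math.sqrt(sum(e * e for e in train_res) / (n - p - 1))
    result = []
    for h, e in zip(leverages(train_X, pts_X), pts_res):
        r = e / s
        result.append((h, r, h <= h_star and abs(r) <= 3))
    return h_star, result


# Hypothetical one-descriptor training set (n = 10) and its fitted residuals
train_X = [[float(x)] for x in range(1, 11)]
train_res = [0.1, -0.1] * 5
h_star, pts = williams_points(train_X, train_res,
                              [[5.5], [20.0], [6.0]],  # compounds to place on the plot
                              [0.05, 0.05, 0.5])       # their residuals
print(h_star)  # 0.6
for h, r, ok in pts:
    print(round(h, 3), round(r, 2), ok)
```

In this toy example, the second compound lies far outside the training descriptor range (high leverage), and the third is a response outlier (large standardized residual), so both fall outside the applicability domain under these assumed cutoffs.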
