Title: Relevance of Machine Learning to Predict the Inhibitory Activity of Small Thiazole Chemicals on Estrogen Receptor
Volume: 19
Issue: 1
Author(s): Venkatesan Jayaprakash, Thangavelu Saravanan, Karuppaiyan Ravindran, Thangavelu Prabha*, Jubie Selvaraj, Sudeepan Jayapalan, M.V.N.L. Chaitanya and Thangavel Sivakumar
Affiliation:
- Department of Pharmaceutical Chemistry, Nandha College of Pharmacy, Erode, 638052, Tamilnadu, India
Keywords:
QSAR, Python, supervised machine learning, thiazole derivatives, MCF-7, breast cancer, cytotoxicity.
Abstract:
Background: Drug discovery relies on hybrid technologies to identify new chemical substances. One promising strategy is quantitative structure-activity relationship (QSAR) modeling, in which artificial intelligence methods predict in silico how chemical modifications affect biological activity.
Aim: The present study applied a current machine learning approach, implemented in a new open-source Python data analysis script, to build a QSAR model from 53 thiazole derivatives for the discovery of anticancer leads.
Methods: A Python script was run on 53 small thiazole chemicals using the Google Colaboratory interface. A total of 82 CDK molecular descriptors were downloaded from the ChemDes web server and used in this study. After training, model performance was assessed by validation on an external test set.
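The Methods workflow can be sketched roughly as follows. This is a minimal illustration, not the authors' script: it assumes an 80/20 split into training and external test sets, standardized descriptors, and principal component analysis (PCA) reducing the 82 descriptors to the five components PC1 to PC5 named in the Results. It runs on random placeholder data rather than the ChemDes descriptors, and the column name `activity` is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 53 compounds x 82 descriptors and one activity
# value per compound. Random numbers, NOT the study's ChemDes descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(53, 82))
y = 4.8 + 0.05 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(scale=0.05, size=53)

# Assumed 80/20 split into a training set and an external test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize with training-set statistics only, then keep 5 principal components.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=5).fit(scaler.transform(X_train))
cols = [f"PC{i}" for i in range(1, 6)]
train = pd.DataFrame(pca.transform(scaler.transform(X_train)), columns=cols)
train["activity"] = y_train  # hypothetical name for the response column

# The same fitted scaler and PCA are reused for the external test set.
pc_test = pca.transform(scaler.transform(X_test))

# Fit OLS through the statsmodels formula interface on the PC scores.
ols = smf.ols("activity ~ PC1 + PC2 + PC3 + PC4 + PC5", data=train).fit()
print(f"R2={ols.rsquared:.3f}  adj R2={ols.rsquared_adj:.3f}  F={ols.fvalue:.3f}")
print(ols.params)  # intercept and PC1..PC5 coefficients
```

Fitting the scaler and PCA on the training compounds only keeps information from the external test set out of the model.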
Results: The QSAR model, fitted by ordinary least squares (OLS) regression with the statsmodels formula module, gave R² = 0.542, F = 8.773, adjusted R² (Q²) = 0.481, and standard error = 0.061. The regression coefficients (reg.coef_) were -0.00064 (PC1), -0.07753 (PC2), -0.09078 (PC3), -0.08986 (PC4), and 0.05044 (PC5), with an intercept (reg.intercept_) of 4.79279. Test-set prediction was evaluated with multiple linear regression, support vector machine, and partial least squares regression models from the scikit-learn (sklearn) module, which gave model scores of 0.5424, 0.6422, and 0.6422, respectively.
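Written out, the reported coefficients and intercept correspond to the fitted equation below. Here ŷ denotes the predicted activity; the abstract does not name the response variable, so the symbol is only a placeholder. The adjusted R² and F statistic follow from R² through the standard formulas, with n training compounds and p = 5 predictors:

```latex
\hat{y} = 4.79279 - 0.00064\,\mathrm{PC}_1 - 0.07753\,\mathrm{PC}_2 - 0.09078\,\mathrm{PC}_3 - 0.08986\,\mathrm{PC}_4 + 0.05044\,\mathrm{PC}_5

R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
F = \frac{R^2 / p}{\left(1 - R^2\right) / (n - p - 1)}
```

As a rough arithmetic check (a back-calculation, not a number given in the abstract), R² = 0.542 with p = 5 and n = 43 gives an adjusted R² of about 0.480 and F of about 8.76, close to the reported 0.481 and 8.773.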
Conclusion: We therefore conclude that the R² values (i.e., the model scores) obtained with this script from three different algorithms agreed well, with little difference between them, and that the approach may be useful in designing similar thiazole derivatives as anticancer agents.