Title: A Machine Learning Language to Build a QSAR Model of Pyrazoline Derivative Inhibitors Targeting Mycobacterium tuberculosis Strain H37Rv
Volume: 20
Issue: 2
Author(s): Jayaprakash Venkatesan, Prabha Thangavelu*, Selvaraj Jubie, Sudeepan Jayapalan and Thangavel Sivakumar
Affiliation:
- Department of Pharmaceutical Chemistry, Nandha College of Pharmacy, Erode-638052, Tamilnadu, India
Keywords:
Machine learning, QSAR, Python, H37Rv strain, Mycobacterium tuberculosis, Pyrazoline derivatives.
Abstract:
Background: Machine learning has become an essential tool in drug research for generating the structural information needed to design drugs with higher biological activity. Quantitative structure-activity relationship (QSAR) modeling is one such technique. A QSAR study involves two main steps: first, generating molecular descriptors; second, building and validating the models.
Aim: To build a QSAR model of pyrazoline derivatives in the Python programming language, using inhibition data for Mycobacterium tuberculosis collected from diverse literature sources. Pyrazoline, a small-molecule scaffold, could block the biosynthesis of mycolic acids, resulting in mycobacterial death, which makes it a promising starting point for anti-tubercular drug discovery.
Methods: We developed a new Python script that uses CDK descriptors as the independent variables and anti-tubercular bioactivity as the dependent variable to build and validate the best QSAR model. The model was then further validated using external test set compounds. Three algorithms, viz. multiple linear regression (MLR), support vector machine (SVM), and partial least squares (PLS), were applied and their R² values compared.
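The MLR step of this workflow can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: it assumes the descriptor matrix (for example, CDK descriptors or principal-component scores derived from them) has already been computed, and the function names `fit_mlr`, `predict`, and `r_squared`, as well as the toy data, are hypothetical. It fits the model on a training set by solving the normal equations and reports R² on both the training set and an external test set.

```python
"""Minimal multiple linear regression (MLR) sketch for a QSAR data set.

ASSUMPTION: descriptors are already computed; the toy data below are made up
(noise-free) purely so the recovered coefficients can be checked.
"""
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] by solving (A^T A) b = A^T y."""
    # Design matrix A with a leading column of ones for the intercept.
    A = [[1.0, *row] for row in X]
    p = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


def predict(b: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply intercept b[0] and coefficients b[1:] to each descriptor row."""
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data generated from y = 4.0 + 0.5*x1 - 0.2*x2.
    X_train = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2]]
    y_train = [4.0 + 0.5 * a - 0.2 * b for a, b in X_train]
    X_test = [[1.5, 2.0], [3.5, 0.5]]  # external test set
    y_test = [4.0 + 0.5 * a - 0.2 * b for a, b in X_test]

    coef = fit_mlr(X_train, y_train)
    print("intercept and coefficients:", coef)
    print("train R2:", r_squared(y_train, predict(coef, X_train)))
    print("test R2:", r_squared(y_test, predict(coef, X_test)))
```

The attribute names `reg.coef_` and `reg.intercept_` quoted in the abstract are consistent with a scikit-learn linear model such as `LinearRegression`, whose `score` method returns R²; in a scikit-learn workflow the SVM and PLS models could likewise come from `sklearn.svm.SVR` and `sklearn.cross_decomposition.PLSRegression`, though the exact classes the authors used are not stated here.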
Results: The QSAR model generated with this open-source Python program predicted the external test set compounds well. The ordinary least squares (OLS) regression model gave an R² of 0.514, an F value of 5.083, an adjusted R² of 0.413, and a standard error of 0.092. The multiple linear regression gave an R² of 0.5143, coefficients (reg.coef_) of -0.07795 (PC1), 0.01619 (PC2), 0.03763 (PC3), 0.07849 (PC4), and -0.09726 (PC5), and an intercept (reg.intercept_) of 4.8324. Model performance was also evaluated with the support vector machine model of the sklearn module, which gave a model score of 0.5901, and was further supported by partial least squares regression, which likewise gave an R² of 0.5901. On validation, the model predicted values similar to those of the training set, and a plot of predicted versus actual pMIC50 values showed all data points lying close to the central regression line.
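The reported OLS statistics are linked by standard formulas: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) and F = (R²/k) / ((1 - R²)/(n - k - 1)), where n is the number of training compounds and k the number of predictors. The sketch below checks them with k = 5 (PC1 to PC5) and R² = 0.5143, assuming the OLS and MLR results describe the same five-predictor model. The training-set size n = 30 is an assumption, since the abstract does not state it; it is simply the value for which the reported adjusted R² (0.413) and F (5.083) are reproduced. The sketch also encodes the reported MLR equation, pMIC50 = 4.8324 - 0.07795·PC1 + 0.01619·PC2 + 0.03763·PC3 + 0.07849·PC4 - 0.09726·PC5, as a hypothetical helper `predict_pmic50` that takes principal-component scores as input, because the PCA loadings that map CDK descriptors to PC1 to PC5 are not given.

```python
"""Consistency check of the reported OLS statistics and the reported MLR equation.

ASSUMPTION: n = 30 training compounds (not stated in the abstract).
"""
from typing import Sequence


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F statistic of an OLS fit with an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Intercept and PC1..PC5 coefficients as reported in the abstract.
INTERCEPT = 4.8324
COEF = (-0.07795, 0.01619, 0.03763, 0.07849, -0.09726)


def predict_pmic50(pc_scores: Sequence[float]) -> float:
    """Hypothetical helper: pMIC50 from five principal-component scores.

    The PCA loadings that produce PC1..PC5 from CDK descriptors are not
    given in the abstract, so the scores must be supplied by the caller.
    """
    if len(pc_scores) != len(COEF):
        raise ValueError("expected five PC scores (PC1..PC5)")
    return INTERCEPT + sum(c * s for c, s in zip(COEF, pc_scores))


if __name__ == "__main__":
    r2, k = 0.5143, 5  # reported MLR R^2; five predictors (PC1..PC5)
    n = 30             # ASSUMPTION: training-set size not given in the abstract
    print(round(adjusted_r2(r2, n, k), 3))  # 0.413 (reported: 0.413)
    print(round(f_statistic(r2, n, k), 3))  # 5.083 (reported: 5.083)
    print(predict_pmic50([0.0] * 5))        # intercept only: 4.8324
```

The check only shows that the reported figures are mutually consistent for some plausible n; it does not confirm the actual size of the training set or reproduce the reported standard error.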
Conclusion: The model scores obtained with this script using three different algorithms agreed well, with little difference between them. The model may therefore be useful in designing similar pyrazoline analogs as anti-tubercular agents.