Title: A Developed Model Based on Machine Learning Algorithms for Phishing Website Detection
Volume: 18
Issue: 2
Author(s): Hussein Abdel-Jaber, Hussein Al Bazar*, Muawya Naser
Affiliation:
- Faculty of Computer Studies, Arab Open University (AOU), Riyadh, Saudi Arabia
Keywords:
Phishing websites, machine learning, phishing detection, classification metrics, classification report, cybersecurity.
Abstract:
Introduction: Users access websites for many purposes, such as finding information
about a particular topic, buying items, and accessing their accounts. Cybercriminals use
phishing websites to obtain users' sensitive information, such as usernames, passwords,
and credit card details. Detecting phishing websites helps protect people's information and
money. Machine learning algorithms can be applied to detect phishing websites.
Methods: In this paper, a model based on several machine learning algorithms is developed to
detect phishing websites. The algorithms used in this model are Decision
Tree, Random Forest, Extra Trees, K-Nearest Neighbors, Multilayer Perceptron, and Support
Vector Machine (SVM). The phishing website dataset is taken from Kaggle. The algorithms
of the developed model are compared with one another to identify which
offers the best classification results.
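The comparison described in the Methods can be illustrated with a minimal sketch, assuming scikit-learn. The abstract does not give the dataset name, features, train/test split, or hyperparameters, so everything below (the synthetic data from `make_classification`, the 80/20 split, default settings, and the feature scaling applied to KNN, MLP, and SVM) is an assumption and not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data stands in for the Kaggle phishing dataset.
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The six algorithms named in the abstract, with default hyperparameters
# (assumption). Distance- and gradient-based models get feature scaling.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Multilayer Perceptron": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=500, random_state=0)
    ),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model on the same split and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

# Print one row per algorithm so the metrics can be compared side by side.
for name, scores in results.items():
    row = "  ".join(f"{metric}={value:.3f}" for metric, value in scores.items())
    print(f"{name:<24}{row}")
```

Scores on synthetic data say nothing about the ranking reported in the paper; the sketch only shows how six classifiers can be evaluated on one shared split.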
Results: The Extra Trees algorithm offers the best results for accuracy, precision, and
F1-Score. This paper also compares the developed model with a previous model that uses the
same dataset and relies on Decision Tree, Random Forest, and Support Vector Machine, to determine
which model has better classification report results. The developed model, when using
Decision Tree and SVM, offers better classification results than the previous
model. The developed model is also compared with another earlier model relying on
Decision Tree and Random Forest algorithms to determine which model generates better results
for accuracy, precision, recall/sensitivity, and F1-Score.
Conclusion: The developed model, when using Decision Tree, achieves better accuracy,
recall, and F1-Score than the corresponding accuracy, sensitivity, and F1-Score of
the preceding Decision Tree-based model.
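The metrics named in the abstract (accuracy, precision, recall, also called sensitivity, and F1-Score) follow from the four confusion-matrix counts using their standard definitions. The sketch below is illustrative only: the function name `classification_metrics` and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives, and
    true negatives, with "phishing" treated as the positive class (assumption).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall and sensitivity are two names for the same quantity.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
m = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for metric, value in m.items():
    print(f"{metric}: {value:.4f}")
```

Because recall and sensitivity are identical, a recall value from one model can be compared directly with a sensitivity value reported by another.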