Comparison of machine learning techniques in email spam detection

  • This report compares the performance of three machine learning techniques for spam detection including Random Forest (RF), k-Nearest Neighbours (kNN) and Support Vector Machines (SVM).
  • The idea of automatically classifying spam and non-spam emails by applying machine learning methods has been popular in academia and has been a topic of interest for many researchers.
  • This comparison is a real-time process, and therefore the main drawback of this approach is that the kNN algorithm must compute the distance and sort all the training data for each prediction, which can be slow if given a large training dataset (James, Witten, Hastie, & Tibshirani, 2013, pp.
  • We determine from the results that k-Nearest Neighbours (kNN) and Support Vector Machine (SVM) perform similar weak regarding accuracy and Random Forest (RF) outperforms both.
  • Therefore due to its design Random Forest performs relatively well "out-of-the-box" compared to k-Nearest Neighbours and Support Vector Machine.

save | report | share on

Top 200 comments