Saturday, 13 April 2024

Spam Detection by Combining Bayesian Method and Regression Analysis | Chapter 5 | Research Updates in Mathematics and Computer Science Vol. 3

 This study proposes a new method that uses the correlation structure between the number of words in a mail and its Bayesian score. Spam mails usually do not have a stable style or stable features, because the spammers who send them keep changing those features. The most often used statistical filter for email filtering is the Naive Bayesian filter. However, the filter's design depends on the training data and the word corpus that the filter designer used. A new mail of unknown nature is classified as spam (unsolicited mail) or ham (legitimate mail) based on a score that combines the conditional probabilities of the tokens in the mail. The statistical behavior of this score shows some interesting features, which can be explored to improve the performance of the filter. We report the results of an experiment using the Enron data set and highlight the advantages of the new filter. We also propose a new method of testing the model using random data sets.
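
To make the idea of a Bayesian score concrete, here is a minimal Python sketch. It is not the chapter's implementation: the abstract does not give the exact formulas, so the add-one smoothing, the equal class priors, the log-odds combination, and all function names (token_spam_probability, bayesian_score, fit_line) are assumptions chosen for illustration. The last function fits an ordinary least-squares line of score against word count, which is one plausible way to study the relationship the abstract mentions.

```python
import math


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token), assuming equal priors and add-one smoothing.

    spam_counts / ham_counts map a token to the number of training mails
    of that class containing it; n_spam / n_ham are the class sizes.
    """
    p_tok_spam = (spam_counts.get(token, 0) + 1) / (n_spam + 2)
    p_tok_ham = (ham_counts.get(token, 0) + 1) / (n_ham + 2)
    return p_tok_spam / (p_tok_spam + p_tok_ham)


def bayesian_score(tokens, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token probabilities into one score in (0, 1).

    Summing log-odds is equivalent to prod(p) / (prod(p) + prod(1 - p))
    but avoids numeric underflow on long mails.
    """
    log_odds = 0.0
    for tok in set(tokens):
        p = token_spam_probability(tok, spam_counts, ham_counts, n_spam, n_ham)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In this sketch, a mail would be tokenized, scored with bayesian_score, and the pairs (word count, score) from a labelled corpus such as Enron would be passed to fit_line. How the chapter actually uses the fitted relationship to classify mails is described in the chapter itself, not here.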


Author(s) Details:

K. Srikanth,
Department of Computer Science, SV University, Tirupati, India.

S. Ramakrishna,
Department of Computer Science, SV University, Tirupati, India.

K. V. S. Sarma,
Department of Statistics, SV University, Tirupati, India.

Please see the link here: https://stm.bookpi.org/RUMCS-V3/article/view/13983
