This paper presents an experiment on spam filtering with Logistic Regression, in which the filter's effectiveness depends on the characteristics of the token frequency distribution. The discussion focuses on the importance of data cleaning before model development, and in particular on excluding inconsistent features before they are included in the model. The experiment uses the UCI dataset, which records token frequencies in each email as percentages. The model's discriminative performance is evaluated with an ROC curve. The UCI dataset provided insight into how token counts influence spam classification, and the ROC curve analysis gave a clear view of the model's discriminative power.
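As a rough illustration of the workflow the abstract describes, the following is a minimal sketch of logistic regression on token-frequency features, scored by the area under the ROC curve. It is not the author's implementation. The data are synthetic stand-ins generated by a hypothetical helper, `make_synthetic_emails`, and the two features, the learning rate, and the epoch count are all assumptions chosen only to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Predicted probability of the spam class for each row.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic_emails(n, rng):
    """Hypothetical data: two token-frequency percentages per email.

    Assumption for illustration only: spam uses token A more often
    and token B less often than legitimate mail.
    """
    X, y = [], []
    for _ in range(n):
        spam = rng.random() < 0.4
        freq_a = max(0.0, rng.gauss(2.0 if spam else 0.5, 0.8))
        freq_b = max(0.0, rng.gauss(0.3 if spam else 1.5, 0.8))
        X.append([freq_a, freq_b])
        y.append(1 if spam else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic_emails(400, rng)
X_test, y_test = make_synthetic_emails(200, rng)

w, b = fit_logistic(X_train, y_train)
auc = roc_auc(y_test, predict_proba(X_test, w, b))
print(f"test AUC: {auc:.3f}")
```

The AUC is computed on held-out emails, so it reflects how well the fitted scores rank spam above legitimate mail rather than how well the model memorised the training set. In practice a library such as scikit-learn would likely replace the hand-written fitting and scoring functions.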
Author(s) Details
K. Srikanth
Department of Data Science, Malla Reddy University, Telangana, India.
Please see the book here: https://doi.org/10.9734/bpi/mcsru/v2/3819