Phishing emails remain a growing cybersecurity threat in online communication, and their increasing sophistication often lets them bypass traditional filters. This study presents a comparative evaluation of machine learning (ML) models and transformer-based large language models (LLMs) for phishing email detection, including analysis of embedded URLs. We assessed ML training and LLM fine-tuning on both balanced and imbalanced datasets. The ML models evaluated were Random Forest, Logistic Regression, Support Vector Machine, Naïve Bayes, Gradient Boosting, Decision Tree, and K-Nearest Neighbors; the transformer-based LLMs were DistilBERT, ALBERT, BERT-Tiny, ELECTRA, MiniLM, and RoBERTa. To make the evaluation more realistic, we also included phishing emails generated by LLMs. Across all configurations, both the ML models and the fine-tuned LLMs showed robust performance. Random Forest achieved over 98% accuracy in both email detection and URL classification, and DistilBERT scored nearly as high on both tasks. Balancing the dataset produced slight accuracy gains for the ML models but minor decreases for the LLMs, likely because the LLMs are sensitive to the reduction of majority-class examples during training. Overall, LLMs are highly effective at capturing complex language patterns, while traditional ML models remain efficient and require few computational resources. Combining the two approaches in a hybrid or ensemble method could further improve phishing detection.
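To make the notion of "majority class reduction" concrete, the following is a minimal sketch of balancing by random undersampling. It assumes the balanced datasets were produced by dropping majority-class examples, which the abstract suggests but does not state. The function name `undersample`, the label strings, and the toy emails are hypothetical and serve only as illustration; they are not the study's data or pipeline.

```python
import random
from collections import Counter


def undersample(samples, labels, seed=0):
    """Randomly drop items so that every class has as many items as the smallest class.

    Hypothetical helper for illustration; not the study's actual balancing code.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for text, label in zip(samples, labels):
        by_class.setdefault(label, []).append(text)

    # The smallest class sets the per-class target size.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        kept = rng.sample(items, target)  # sample without replacement
        balanced.extend((text, label) for text in kept)

    rng.shuffle(balanced)
    return balanced


# Toy, made-up data: 6 legitimate emails and 2 phishing emails.
emails = [f"legit message {i}" for i in range(6)] + [
    "verify your account now",
    "urgent: reset your password",
]
labels = ["legitimate"] * 6 + ["phishing"] * 2

balanced = undersample(emails, labels)
counts = Counter(label for _, label in balanced)
print(counts)
```

In this toy case, four of the six legitimate emails are discarded to match the two phishing emails. This illustrates the trade-off noted above: balancing removes class skew, but a model fine-tuned on the balanced set sees fewer majority-class examples.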
