An-Najah Staff

Publication Type

Original research

Authors

Small-scale sentiment classification often suffers from data scarcity, which limits the generalization ability of the models. This study evaluates and compares the effectiveness of three data augmentation strategies: Easy Data Augmentation (EDA), back-translation, and contextual token substitution (nlpaug-style), with both traditional machine learning classifiers (Logistic Regression, Random Forest) and transformer-based models (BERT). We perform a comprehensive empirical comparison on low-resource sentiment datasets by summarizing the results of recent studies and performing targeted head-to-head experiments. Our findings indicate that all augmentation methods improve performance. Contextual augmentation yields the most consistent gains for BERT models, while EDA and back-translation provide greater benefits for traditional classifiers. These insights help guide the selection of data augmentation techniques tailored to model type and dataset size, filling a critical gap in research on data augmentation for sentiment classification on small datasets.
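
To make the simplest of the three strategies concrete, the following is a minimal sketch of two EDA-style operations, random swap and random deletion, written in plain Python. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the `alpha` and `n_aug` parameters, and the alternation between the two operations are all hypothetical choices, and the synonym-based EDA operations are left out because they require an external lexicon.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with two random positions swapped n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    tokens = list(tokens)
    if len(tokens) <= 1:
        return tokens
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_aug=4, alpha=0.1, seed=0):
    """Produce n_aug noisy variants of a sentence (hypothetical helper).

    alpha controls both the share of tokens swapped and the deletion
    probability; the default is an assumption, not a value from the study.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    n_swaps = max(1, int(alpha * len(tokens)))
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            variants.append(" ".join(random_swap(tokens, n_swaps, rng)))
        else:
            variants.append(" ".join(random_deletion(tokens, alpha, rng)))
    return variants
```

Each variant would keep the label of its source sentence and be appended to the training set. Whether such surface-level noise helps is exactly what the comparison above examines, so the sketch should be read as a starting point rather than a recommended configuration.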

Journal