Enhancing Sentiment Classification on Small Datasets through Data Augmentation and Transfer Learning
Publication Type
Original research
Authors

Small-scale sentiment classification often suffers from data scarcity, which limits the generalization ability of the models. This study evaluates and compares the effectiveness of three data augmentation strategies: Easy Data Augmentation (EDA), back-translation, and contextual token substitution (nlpaug-style), with both traditional machine learning classifiers (Logistic Regression, Random Forest) and transformer-based models (BERT). We perform a comprehensive empirical comparison on low-resource sentiment datasets by summarizing the results of recent studies and performing targeted head-to-head experiments. Our findings indicate that all augmentation methods improve performance. Contextual augmentation yields the most consistent gains for BERT models, while EDA and back-translation provide greater benefits for traditional classifiers. These insights help guide the selection of data augmentation techniques tailored to model type and dataset size, filling a critical gap in research on data augmentation for sentiment classification on small datasets.

Journal
Title
Discover Artificial Intelligence
Publisher
Springer Nature
Publisher Country
Switzerland
Indexing
Scopus
Impact Factor
6.0
Publication Type
Both (Printed and Online)
Volume
--
Year
--
Pages
--