Ensuring data quality is critical for reliable decision-making, analytics, and machine learning applications. Traditional data validation methods often depend on manually defined quality rules, a process that is time-consuming, error-prone, and difficult to scale. Great Expectations (GE) is a widely adopted framework for data validation; however, authoring its rules by hand poses challenges of scalability, domain adaptability, and syntactic complexity.
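For context, a single GE rule binds an expectation type to column-level parameters. The sketch below, a minimal example using the classic pandas-backed GE API (pre-1.0) with hypothetical telecom column names, shows the kind of rule that a natural-language requirement such as "call durations must lie between 0 and 7200 seconds" maps to.

```python
import pandas as pd
import great_expectations as ge

# Hypothetical telecom usage records; column names are illustrative only.
df = pd.DataFrame({
    "customer_id": ["C001", "C002", None],
    "call_duration_sec": [120, 85, 9999],
})

# Wrap the frame with the classic (pre-1.0) pandas-backed GE API.
dataset = ge.from_pandas(df)

# "customer_id must never be null" expressed as a GE expectation.
dataset.expect_column_values_to_not_be_null("customer_id")

# "call_duration_sec must lie between 0 and 7200 seconds."
dataset.expect_column_values_to_be_between(
    "call_duration_sec", min_value=0, max_value=7200
)

# Run validation and inspect the per-expectation results.
print(dataset.validate())
```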
This study explores the use of Large Language Models (LLMs) to automate the conversion of natural language data quality requirements into structured GE validation rules. We fine-tune the LLaMA-3.2-3B-bnb-4bit model using Low-Rank Adaptation (LoRA) on real-world datasets sourced from the telecommunications and IT sectors. To evaluate the effectiveness of this approach, we apply standard NLP metrics (ROUGE, BLEU, METEOR, and BERTScore) alongside practical quality-assurance metrics such as rule completeness and manual effort reduction.
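As a rough illustration of the fine-tuning setup, not the exact configuration used in this study, a LoRA adapter can be attached to the 4-bit quantized base model with the Hugging Face peft library; the hub id, LoRA hyperparameters, and prompt/target format below are assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed Hugging Face hub id for the 4-bit quantized base model.
BASE_MODEL = "unsloth/Llama-3.2-3B-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")

# Illustrative LoRA hyperparameters; the study's actual values may differ.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable

# A hypothetical training pair: natural-language requirement -> GE rule.
prompt = (
    "Requirement: customer_id must never be null.\n"
    "Great Expectations rule:"
)
target = (
    '{"expectation_type": "expect_column_values_to_not_be_null", '
    '"kwargs": {"column": "customer_id"}}'
)
```

Generated rules can then be scored against reference rules with standard text-generation metrics (e.g., the Hugging Face evaluate implementations of ROUGE, BLEU, METEOR, and BERTScore), complemented by the QA-oriented measures above.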
Our results demonstrate that the fine-tuned LLM significantly outperforms generic, non-fine-tuned models, generating rules with greater fluency, accuracy, and domain alignment.