Ensuring data quality is critical for reliable decision-making, analytics, and machine learning applications. Traditional data validation methods often depend on manually defined quality rules, a process that is time-consuming, error-prone, and difficult to scale. Great Expectations (GE) is a widely adopted framework for data validation; however, authoring its rules by hand poses challenges of scalability, domain adaptability, and syntactic complexity.
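For context, a single GE rule binds an expectation type to column-level parameters. The sketch below, a minimal example using the classic pandas-backed GE API (pre-1.0) with hypothetical telecom column names, shows the kind of rule that a natural-language requirement such as "call durations must lie between 0 and 7200 seconds" maps to.

```python
import pandas as pd
import great_expectations as ge

# Hypothetical telecom usage records; column names are illustrative only.
df = pd.DataFrame({
    "customer_id": ["C001", "C002", None],
    "call_duration_sec": [120, 85, 9999],
})

# Wrap the frame with the classic (pre-1.0) pandas-backed GE API.
dataset = ge.from_pandas(df)

# "customer_id must never be null" expressed as a GE expectation.
dataset.expect_column_values_to_not_be_null("customer_id")

# "call_duration_sec must lie between 0 and 7200 seconds."
dataset.expect_column_values_to_be_between(
    "call_duration_sec", min_value=0, max_value=7200
)

# Run validation and inspect the per-expectation results.
print(dataset.validate())
```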
This study explores the use of Large Language Models (LLMs) to automate the conversion of natural language data quality requirements into structured GE validation rules. We fine-tune the LLaMA-3.2-3B-bnb-4bit model using Low-Rank Adaptation (LoRA) on real-world datasets sourced from the telecommunications and IT sectors. To evaluate the effectiveness of this approach, we apply standard NLP metrics (ROUGE, BLEU, METEOR, and BERTScore) alongside practical quality-assurance metrics such as rule completeness and manual effort reduction.
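As a rough illustration of the fine-tuning setup, not the exact configuration used in this study, a LoRA adapter can be attached to the 4-bit quantized base model with the Hugging Face peft library; the hub id, LoRA hyperparameters, and prompt/target format below are assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed Hugging Face hub id for the 4-bit quantized base model.
BASE_MODEL = "unsloth/Llama-3.2-3B-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")

# Illustrative LoRA hyperparameters; the study's actual values may differ.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable

# A hypothetical training pair: natural-language requirement -> GE rule.
prompt = (
    "Requirement: customer_id must never be null.\n"
    "Great Expectations rule:"
)
target = (
    '{"expectation_type": "expect_column_values_to_not_be_null", '
    '"kwargs": {"column": "customer_id"}}'
)
```

Generated rules can then be scored against reference rules with standard text-generation metrics (e.g., the Hugging Face evaluate implementations of ROUGE, BLEU, METEOR, and BERTScore), complemented by the QA-oriented measures above.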
Our results demonstrate that the fine-tuned LLM significantly outperforms generic, non-fine-tuned models, generating rules with greater fluency, accuracy, and domain alignment.