Smart cities generate massive volumes of heterogeneous data from sources such as traffic systems, environmental sensors, public transport, and citizen applications. Ensuring the quality of this urban data is crucial for reliable analytics, service optimization, and policy-making. However, data validation in smart city systems remains largely manual, error-prone, and difficult to scale, because schemas evolve frequently and data standards vary across departments.
In this paper, we apply the DQGen framework to automate data quality validation in smart city environments. Using metadata extracted from open urban datasets, the framework maps standard quality dimensions (completeness, consistency, validity, and timeliness) to executable validation rules expressed as Great Expectations checks. The generated scripts can be integrated into city dashboards or batch pipelines, allowing continuous, transparent, and repeatable validation as datasets evolve.
We evaluate the framework on datasets from a municipal open data portal, covering traffic flow, air quality, and public transportation usage records.
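To make the metadata-to-rule mapping concrete, the following is a minimal sketch and not the DQGen implementation. It assumes a hypothetical column-metadata format (the keys `required`, `min`, `max`, `allowed_values`, `unique`, and `latest_not_before` are illustrative) and emits rule descriptions that name standard Great Expectations expectation types. It does not call the Great Expectations library itself, so running these rules against data is left to the surrounding pipeline.

```python
from __future__ import annotations

from typing import Any


def generate_rules(columns: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map hypothetical column metadata to rule specs, one per quality check.

    Each rule records the quality dimension it covers, the name of a
    Great Expectations expectation type, and the keyword arguments for it.
    """
    rules: list[dict[str, Any]] = []

    def add(dimension: str, expectation_type: str, **kwargs: Any) -> None:
        rules.append(
            {
                "dimension": dimension,
                "expectation_type": expectation_type,
                "kwargs": kwargs,
            }
        )

    for col in columns:
        name = col["name"]

        # Completeness: required columns must not contain nulls.
        if col.get("required", False):
            add("completeness", "expect_column_values_to_not_be_null", column=name)

        # Validity: numeric values must fall inside the documented range.
        if "min" in col or "max" in col:
            add(
                "validity",
                "expect_column_values_to_be_between",
                column=name,
                min_value=col.get("min"),
                max_value=col.get("max"),
            )

        # Validity: categorical values must come from the documented set.
        if "allowed_values" in col:
            add(
                "validity",
                "expect_column_values_to_be_in_set",
                column=name,
                value_set=list(col["allowed_values"]),
            )

        # Consistency: identifier columns must not contain duplicates.
        if col.get("unique", False):
            add("consistency", "expect_column_values_to_be_unique", column=name)

        # Timeliness: the newest timestamp must be no older than a cutoff.
        if "latest_not_before" in col:
            add(
                "timeliness",
                "expect_column_max_to_be_between",
                column=name,
                min_value=col["latest_not_before"],
            )

    return rules


# Illustrative metadata for an air quality dataset (values are made up).
air_quality_metadata = [
    {"name": "station_id", "required": True},
    {"name": "pm25", "required": True, "min": 0, "max": 1000},
    {"name": "measured_at", "required": True, "latest_not_before": "2024-01-01"},
]
air_quality_rules = generate_rules(air_quality_metadata)
```

Keeping the rules as plain dictionaries separates rule generation from rule execution: when a dataset's schema changes, only the metadata needs to be re-extracted and the rules regenerated.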