Validation and Error Detection in Relational Data using a Hybrid Rule-based System
DOI: https://doi.org/10.32520/stmsi.v15i6.6561

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.