TITLE:
Data Evaluation in Artificial Intelligence
AUTHORS:
Philip de Melo
KEYWORDS:
Artificial Intelligence, Accuracy, PM GenAI Algorithm
JOURNAL NAME:
Journal of Data Analysis and Information Processing, Vol.13 No.3, August 7, 2025
ABSTRACT: High-quality data is essential for hospitals, public health agencies, and governments to improve services, train AI models, and boost efficiency. However, real data comes with challenges: strict privacy laws, high storage costs, legal constraints, and issues such as bias or incompleteness, all of which can reduce the reliability of AI systems. As a result, artificial datasets are gaining importance. Synthetic and augmented data offer alternatives, yet their differences and potential are not fully understood. Data quality refers to how well a dataset is suited to its intended purpose in an AI pipeline. Key attributes include accuracy (how correct and error-free the data is); completeness (whether all required data fields are present); consistency (uniformity across datasets, e.g., same format or scale); timeliness (relevance of the data in time, which is significant in real-time systems); validity (whether the data follows defined formats or constraints); and uniqueness (absence of duplicate records). This paper examines how synthetic and augmented data are generated and used, showcasing their characteristics through practical examples.
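Three of the quality attributes listed in the abstract (completeness, uniqueness, and validity) can be measured directly on a table of records. The following is a minimal sketch and is not taken from the paper. The function names, the sample records, and the age rule are all hypothetical and chosen only for illustration. Each score is a fraction between 0 and 1.

```python
from typing import Callable, Dict, List, Optional

# A record is a flat mapping of field name to string value (hypothetical layout).
Record = Dict[str, Optional[str]]


def _is_filled(value: Optional[str]) -> bool:
    """A value counts as present if it is neither None nor an empty string."""
    return value is not None and value != ""


def completeness(records: List[Record], required: List[str]) -> float:
    """Fraction of required field slots that are filled."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in required if _is_filled(r.get(f)))
    return filled / total


def uniqueness(records: List[Record], key: List[str]) -> float:
    """Fraction of records that remain after collapsing duplicates on the key fields."""
    if not records:
        return 1.0
    distinct = {tuple(r.get(f) for f in key) for r in records}
    return len(distinct) / len(records)


def validity(records: List[Record], field: str, rule: Callable[[str], bool]) -> float:
    """Fraction of non-missing values in `field` that satisfy `rule`."""
    values = [r[field] for r in records if _is_filled(r.get(field))]
    if not values:
        return 1.0
    return sum(1 for v in values if rule(v)) / len(values)


# Hypothetical sample data: one duplicate row, two missing ages, one invalid age.
sample = [
    {"id": "1", "age": "34", "visit_date": "2025-01-10"},
    {"id": "2", "age": "", "visit_date": "2025-01-11"},
    {"id": "2", "age": "", "visit_date": "2025-01-11"},
    {"id": "3", "age": "-5", "visit_date": "2025-01-12"},
]


def plausible_age(v: str) -> bool:
    """Assumed validity rule: a non-negative integer no greater than 120."""
    return v.isdigit() and int(v) <= 120


completeness_score = completeness(sample, ["id", "age", "visit_date"])  # 10 of 12 slots
uniqueness_score = uniqueness(sample, ["id"])                           # 3 of 4 records
validity_score = validity(sample, "age", plausible_age)                 # 1 of 2 values
```

Accuracy, consistency, and timeliness are left out of this sketch because they need an external reference (ground truth, a second dataset, or a clock) rather than the table alone.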