A Quality Assurance Reference Framework for Assessing Educational Data

Digital educational content is gaining importance as an incubator of pedagogical methodologies in formal and informal online educational settings. Its educational efficiency depends directly on its quality; however, educational content is more than information and data. This paper presents a new data quality framework for assessing digital educational content used for teaching in distance learning environments. The model relies on the ISO 25000 series quality standard. Besides providing the mechanisms for multi-facet quality assessment, it also supports organizations that design, create, manage and use educational content with quality tools (expressed as quality metrics and measurement methods) to provide a more efficient distance education experience. The model describes the quality characteristics of the educational material content using data and software quality characteristics.


Introduction
Distance learning spearheaded the effort to overcome several barriers posed by the COVID-19 pandemic, when many educational institutions and training organizations transferred their processes online [1]. Educational information sharing in such distance and online educational settings proved to go beyond the traditional understanding of data design and management. The quality of information and data used during online and distance learning played a primary role in raising business and educational value [2]. Besides being a collection of data (videos, text, Open Educational Resources, simulations, online lectures), educational content should be mapped to well specified pedagogical goals [3]. Educational content is stored in content management systems and, as such, it can be assessed by data quality models such as ISO 25012 [6] [7] and ISO 8000-61 [8]. However, such an assessment falls short because contextual information is not considered. The literature on the quality assessment of educational content contains only a few works on the quality of educational information (i.e. digital educational content) and focuses instead on educational processes, educational goals, data volume, or specific aspects such as security [9] [10]. ISO/IEC 25012, 25024 and 25040 provide a very good base for a data quality assessment framework. This paper presents a new framework for the assessment of digital educational content used for teaching in distance learning settings. The model relies on the ISO 25000 series quality standard. Besides providing the mechanisms for multi-facet quality assessment, it also supports organizations that design, create, manage and use educational content with quality tools (expressed as quality metrics and measurement methods) to provide a more efficient distance education experience. The model describes the quality characteristics of the educational material content by re-using or configuring existing data quality characteristics and by designing new practical measurement methods (metrics).
The framework aims to pave the way for a quality model for educational content used for teaching in distance learning environments.
The contribution of this paper lies in the novelty of the proposed data model. Its importance to data quality evaluation is that it allows organizations to acquire a deeper knowledge of their educational processes while mitigating risks relating to early learner drop-out, unfair assessment, reduced capacity to reach set educational goals, usability and accessibility.
The paper is structured as follows: Section 2 analyses the role of digital educational content (in the form of data and information) in the distance and online educational process. A short review of relevant quality standards is also provided. Section 3 presents the structure of the proposed framework, while Section 4 discusses practical issues for its application. Finally, conclusions are drawn in the final section.

Types of Educational Content
The educational material is generally divided into printed and digital material.
The printed material is composed of textbooks, which are available to students in printed or electronic form. As digital material we consider the artifact that combines digital content, a means of diffusing that content (media), and a specific purpose or application. If the purpose is educational, then we are referring to Digital Educational Material (DES).
A. Stefani, B. Vassiliadis Journal of Data Analysis and Information Processing
For example, in a digital educational material the content is the text, the technological means is an application (e.g. an LMS) while the pedagogical/didactic application is the context in which the application is used (e.g. for additional training in a specific topic).
Digital content is also provided to end users in many types, each of which is stored in a set of different formats. The basic types of digital content are:
1) Text: a coherent set of characters, words or paragraphs, which may also contain static visual material. The text can be:
• monolithic, which appears in the form of text files (doc, pdf, odt) and allows hierarchical or sequential access;
• hypertext, i.e. a set of sections of text with references (links) both between its sections and to external texts and objects;
• dynamic hypertext (wiki), which is a special category of hypertext and allows users to modify sections or links or add new ones.
2) Audio: refers mainly to recordings that are available to the end user for playback. The audio material is divided into:
• static audio playback, which requires the audio file to be downloaded to the user's computer and played locally;
• streaming audio playback, which allows audio to be played from a remote web site.
3) Static visual material (Graphics): static digital material based on visual (and not verbal or audio) representation. It includes photos, images, maps, diagrams, etc.
4) Audiovisual material (Video): material that has been produced with the help of audiovisual recording media. It is distinguished into:
• interactive or non-interactive material;
• two-dimensional or three-dimensional (2D/3D) material.
5) Animation: material in the form of animation, which is not considered audiovisual material. A prime example is animation that represents processes or experiments that are difficult to videotape in a real environment. The final file types are similar to those of the audiovisual material. It is distinguished into:
• interactive or non-interactive material;
• two-dimensional or three-dimensional (2D/3D) material.

The Educational Content Lifecycle
The design of the proposed framework requires a careful examination of the characteristics of digital educational material in terms of the definition of data quality in general. It must also consider learning theories (because they are incorporated in the data) and the characteristics of distance education (where the "disappearing tutor" needs to be replaced by information flow within the data).
End users (the learners) are the main consumers of data and information in this context; thus, educational content should exhibit several different quality facets, including:
• Content, which contains text, graphics and interactive objects; this is a pure data facet.
• Presentation, which defines the characteristics of the content (format, fonts, colors, images) and the way the information is encoded.
• Structure, which defines the way in which the educational material is organised in educational units and how the user/learner navigates and interacts with them.
• Educational context, which defines the form of educational content in relation to the target/user group (workplace, training, technological training, learning pace).
• Pedagogy, where pedagogical strategies are incorporated in the context (e.g. learning by doing using simulations, knowledge building using collaborative project assignments etc.).

Standards and Quality
Specific standards can be used for the evaluation of the multifaceted attributes of digital educational content. These are:
• the ISO 19796 standard, which provides a common quality assurance framework for education (learning, education, training);
• the ISO 9126 standard, which is primarily applied to software (and thus can be applied to the means of diffusing the information in an educational context). It can be configured and partially re-used for the internal and external evaluation of educational material as well as for the evaluation of the quality in use (the way that end users consume the information).
The selection of standards contributes significantly to the proper development of educational material and its quality evaluation through the control of internal and external metrics and evaluation indicators. Our methodology for designing the proposed framework combines elements of the two aforementioned standards to create a hierarchical structure composed of quality characteristics and sub-characteristics, identical to those proposed by ISO in its standards. The hierarchy permits the organized assessment of the digital educational material, the full coverage of all facets, and the mapping of practical measurement methods (metrics) to quality dimensions. To date, there is no recorded case of an adoption or design of a template for the evaluation of educational material from a data/information point of view. At the international level, the evaluation of educational material, mainly that of secondary education, is done by specialized Research Institutes, which operate autonomously and were established mainly by university pedagogical departments [11] [12] [13] [14]. At a higher educational level, relevant research is very limited. In all these efforts, two dimensions appear to be important and, as such, they are considered (albeit in much more detail and formality) in the design of the framework:
1) The process dimension. Modern international research on educational materials, whether qualitative or quantitative, empirical or interpretive, is usually focused on the process of designing and producing the data [15] [16]. Details concerning content characteristics (educational and technical) are also explored [17]. Since the focus is on the process and on the data per se, evaluation based on such approaches concentrates on:
• the owners of the processes: those in charge should know the suitability and effectiveness of the material they use;
• pedagogical principles derived from the literature, institutional or individual experience and regulations;
• specific quality characteristics such as security, usability and accessibility.
2) The delivery medium dimension. Information and Communication Technology (ICT) in education (from the internet and electronic texts to specialized e-learning systems) is the main delivery medium not only of data but also of teaching practices, which can be quite diverse depending on the target group at hand. From higher education to labor upskilling and competence learning in vocational education, ICT has been used in conjunction with traditional didactic models (such as information consumption, knowledge building, learning by doing etc.). The role of data and information is largely dependent on the didactical approach: in the simplest of models, data is transmitted in a one-way mode to the learner along with pieces of information (relating to the educational goals and key definitions). In more advanced models, data and information are used to create new data in the form of knowledge which, in turn, enriches existing data collections (e.g. through metadata annotations). Finally, in the most advanced models, data and information flow is bi-directional, created in significant volumes in real time and diverse in nature (e.g. via the use of social learning tools).
To the best of our knowledge, there are no concrete frameworks based on formal models for assessing digital educational content by considering both the data/information dimension and the educational/pedagogical perspective. Formality as expressed by ISO standards is, in our opinion, of paramount importance to the design of such frameworks. It encompasses significant experience in designing quality assurance processes from multiple domains. Besides the generality of the definitions of quality dimensions (flexibility), ISO standards also permit the use of a wealth of methods where appropriate (practicality, fitness for use). The next section discusses these characteristics.

The Structure of Formal Standards
The role of standards is to provide guidance to ensure the quality of data or software. The standard is a documented contract that contains technical specifications or precise criteria that can be used as rules for their evaluation.
The structure of ISO standards is usually hierarchical. Quality characteristics are placed at the top of the hierarchical structure. They are classes of quality components that are non-overlapping. Each characteristic contains (or is broken down into) a set of quality sub-characteristics that are also non-overlapping. Because the characteristics do not overlap, the relation of characteristics to sub-characteristics is one to many. These two levels generally describe quality components that cannot be measured by assigning absolute values during the evaluation of a target; only descriptive values can be used. This is necessary to ensure the generality of the standards, i.e. their independence from specific techniques or technologies used to implement the object of evaluation. The standard is independent of the technical methods for implementing quality; it only depicts what is needed for the target to be of high quality, not how it should be built to achieve it. The third level of the structure consists of metrics, which have a "one to many" relationship to the sub-characteristics. Metrics may be assigned absolute values and serve as quality measurements. In many cases, their practical value is great, as they can provide accurate information or instructions for the design and/or the development of quality items. But, as with all absolute measures, a numerical value is never a fully accurate representation of reality. So, metrics in quality assessment should be used with caution.
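The three-level structure described above can be illustrated with a small data model. The following is only a minimal sketch: the class names and the helper `overlapping_names` are hypothetical, and the two sample entries simply reuse the metric names mentioned later in this paper; it is not the full hierarchy of the framework.

```python
from dataclasses import dataclass, field


@dataclass
class SubCharacteristic:
    """Level 2: a sub-characteristic with its attached metrics (level 3)."""
    name: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Characteristic:
    """Level 1: a characteristic that is broken down into sub-characteristics."""
    name: str
    subs: list[SubCharacteristic] = field(default_factory=list)


def overlapping_names(model: list[Characteristic]) -> set[str]:
    """Return sub-characteristic names that appear more than once in the model.

    An empty result means the one-to-many (non-overlapping) property holds.
    """
    seen: set[str] = set()
    duplicates: set[str] = set()
    for characteristic in model:
        for sub in characteristic.subs:
            if sub.name in seen:
                duplicates.add(sub.name)
            seen.add(sub.name)
    return duplicates


# Illustrative content only (assumption): two branches of a hierarchy.
model = [
    Characteristic(
        "Functionality",
        [SubCharacteristic("Currency", ["Currency of references per section"])],
    ),
    Characteristic(
        "Efficiency",
        [SubCharacteristic("Time Behavior", ["Workload per section"])],
    ),
]

print(overlapping_names(model))  # set()
```

Keeping the check separate from the data classes means the same model can be validated again whenever an organization extends it with its own sub-characteristics.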
One of the more important standards produced by ISO is the ISO25000 series, a standard that is used for the design of the framework.

The ISO 25000 Series Standard
The ISO 25000 series, or Software Product Quality Requirements and Evaluation (SQuaRE), is the newer generation of standards for the quality of software systems. It was designed to replace the ISO/IEC 9126 and ISO/IEC 14598 series of standards, with the aim of homogenizing quality assessment and eliminating its gaps. The series' organizational structure is presented in Table 1.
Of particular importance to our research is ISO/IEC 25012:2008, which sets the general data quality standards for data stored in a structured form within a computer system. It can be used to set data quality requirements, to define data quality measures, or to design and conduct data quality assessments. For example, it can be used to set data quality requirements during production, acquisition and completion processes; to identify quality assurance criteria that are useful for reusing, verifying and improving data; to reorganize, evaluate and improve the data; and to assess the compliance of the data with legislation and/or requirements.
The standard categorizes data quality into fifteen characteristics viewed from two perspectives: inherent and system dependent. It is intended to be used in conjunction with the other parts of the 25000 series, especially ISO/IEC 25010. The ISO 25012 data quality model organizes these characteristics into quality dimensions, the first of which is Internal Data Quality.
Table 1. Organizational structure of the ISO 25000 series (excerpt).
2500n Quality Management Division: The standards in this category define all common models, terms and concepts of the 25000 series. They also provide requirements and guidelines for the management of requirements, specifications and evaluation of software products.
2501n Quality Model Division: The standards in this category present detailed quality models for software, quality in use, and data. They also provide practical instructions for implementing the models.
We argue that the ISO 25000 series standard is an appropriate basis for designing the framework because it includes a quality structure for assessing data and the software that manages them.

What Should Be Assessed and How?
The concept of quality is neither strictly measurable nor clearly defined in the field of distance education, because achieving high-quality results through the diffusion of data/information, as described in previous sections, presupposes a quality model, quality assurance procedures, adherence to the quality cycle and, of course, the commitment of the human factor. In such a multidimensional context, a clear description of which components should be assessed, and how, is needed. An appropriate framework must be designed to determine and evaluate the quality of the educational material, defined as the product of the educational methodology and process.
The definition and initial analysis of quality characteristics from ISO standards helps towards the design and development of better-quality systems and products. Standards define quality properties (for example, a software product must have [...]). In addition, they state what qualities quality software should possess and not how it should be built to have those qualities. The standards are therefore general enough to maintain their correctness regardless of how they are implemented.
The choice of a model for capturing the quality of the educational data/information is particularly difficult since the educational material as a product is a complex creation of the distance education system.
The following steps were initially taken to define the framework:
• Step 1: Analysis of key components of the educational process.
In summary, the basic components of a framework model, which interact with each other and compose its reference structure, are:
• quality characteristics;
• quality sub-characteristics;
• metrics;
• quality measurement data;
• measurement functions; and
• quality measurements.
For example, Figure 1 shows the correlations between the quality model, its quality characteristics and measurement tools. The quality measurement model (as defined in ISO 25010) defines the inherent properties of the software that manages the data, which can be distinguished quantitatively or qualitatively as characteristics. Quality characteristics are inherent properties that contribute to quality, and they are categorized into one or more sub-characteristics. Quality characteristics are measured by applying a measurement method. A measurement method is a logical sequence of operations used to quantify an attribute in relation to a particular scale. The result of applying a measurement method is called a quality measure element. Quality characteristics and sub-characteristics can be quantified by applying measurement functions to these elements. A measurement function is essentially an algorithm used to combine elements. The result of applying a measurement function is called a quality measure. The resulting quality measures quantify the status of the quality characteristics and sub-characteristics. More than one measure can be used to assess a characteristic or sub-characteristic.
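The chain from measurement method to quality measure element to measurement function to quality measure can be sketched in a few lines. This is a hedged illustration only: the attribute being measured (working hyperlinks in a wiki section), the observations, and all function names are assumptions made for the example and do not come from the standard or from the framework.

```python
from typing import Sequence

# Hypothetical raw observations for one wiki section:
# True means the internal hyperlink resolves correctly.
link_checks = [True, True, False, True]


def count_working(checks: Sequence[bool]) -> int:
    """Measurement method: quantify one attribute (number of working links)."""
    return sum(checks)


def count_total(checks: Sequence[bool]) -> int:
    """Measurement method: quantify a second attribute (number of links)."""
    return len(checks)


def ratio(numerator: int, denominator: int) -> float:
    """Measurement function: combine two quality measure elements."""
    if denominator == 0:
        raise ValueError("no observations to measure")
    return numerator / denominator


# Quality measure elements: the results of the measurement methods.
elements = {
    "working": count_working(link_checks),
    "total": count_total(link_checks),
}

# Quality measure: the result of applying the measurement function.
quality_measure = ratio(elements["working"], elements["total"])
print(quality_measure)  # 0.75
```

Separating the methods from the function mirrors the model: the same elements could feed a different measurement function, and several quality measures could be attached to one sub-characteristic.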

Design Rationale
The quality model of the educational material is based on the quality of the data and the quality of the software (delivery method). Therefore, the quality model is a synthesis of the quality characteristics of the two standards, properly adapted.
The hierarchical structure of the quality framework is organized into 5 characteristics (level 1) and 27 sub-characteristics (level 2), as depicted in Table 2.
Table 2 (fragment): Confidentiality, Integrity, Non-repudiation, Accountability, Authenticity, Security Compliance
The interpretation of level 1 characteristics is aligned with the particularities exhibited by digital educational material:
1) Functionality is defined as the ability of educational material to meet functional requirements and goals. Functional requirements for data and software relate to the set of functional requirements (or capabilities) of the user served by the educational material. The importance of this characteristic is summarized in the question "What functional requirements of the user does the educational material meet?" This means that the educational material should serve the educational goals of the set educational context. For example, including an index of terms in the educational material of a digital book improves the functionality of the book. The use of proper internal hyperlinks in an educational wiki both supports user navigation in the educational material and pinpoints which elements (e.g. concepts, terms) should be further elaborated (setting the scope of the study and respecting learner workload).
2) Reliability is defined as the degree to which the educational material performs specified functions under specified educational and technical conditions for a specified period of time. Reliability focuses on the question "When does the educational material work properly and is acceptable to learners?" This question is twofold: from an educational perspective, the educational material should be free of errors (e.g. vague definitions, unnecessary workload, examples out of scope etc.); from a technical perspective, it should be reliably delivered to the end users through the medium used (e.g. an LMS) and/or in its format (audible video, text and images that are readable etc.).
3) Usability is defined as the degree to which the digital educational material can be used by specified learners to achieve specified educational and technical goals with effectiveness, efficiency and satisfaction in a specified context. For example, text should be readable by learners with visual impairments, scrolling in a page of an educational wiki should be limited to a maximum (e.g. three pages), and storing/downloading of data should be possible in popular formats.
4) Efficiency is the ability of the educational data to help learners reach their objectives in a given period of time and within a specified educational context. Efficiency is achieved when educational data are accurate, up to date, and within the scope of the educational topic (especially when the didactical approach is adequately realized). For example, when bibliographic data are up to date, the educational goal of providing resources for further reading or research can be better served.
5) The quality characteristic of maintainability refers to the ability of educational material designers to modify it for use (or re-use) in different educational or technical contexts. Educational context maintainability may refer to its re-use for teaching educational topics different than those it was initially designed for (this can be accomplished for example, by designing modular content).
6) The quality characteristic of portability refers to the ability of digital educational material to be adapted from one technological environment to another. Portability refers to the ability to reuse or reconfigure technical attributes of the data (e.g. format) for delivery through other software platforms (e.g. using another Learning Management System).

Practical Application
The metrics measure the quality of the educational material with quantitative indicators and can be measured at different stages of its lifecycle (inception, design, delivery etc.), using different methods (statistical, LMS, log analysis, surveys) and by different stakeholders (e.g. designers, tutors etc.). The measurement of the metrics and the correct interpretation of the measured values is a critical point for the transition of data/information to the next stage of the cycle or for the feedback of the cycle. Of particular importance is the internal evaluation step: the point where data/information is about to be delivered to the final, external users (the learners).
The current process of development and evaluation of data/information in the form of digital educational content lags behind in the definition of quality metrics linked to learning characteristics and learning objectives. Usually, more emphasis is placed on technical characteristics and quantitative characteristics such as volume. By defining, measuring, evaluating and interpreting the metrics correctly, an answer can be provided not only to the general question "is the data/information of high quality?" but also to more detailed questions such as "which specific components lack quality?" and "why are they of low quality?". Thus, the framework provides leads to the causes of low quality per component.
The metrics defined and analyzed below were the result of two different independent approaches to the development of quality models.
1) Initially with a top-down approach: the high-quality objectives were set based on the ISO 25000 quality standards and the measurements needed to support them were produced.
2) Bottom-up approach: starting with measurable observations in the educational material, the quality objectives are derived.
The analysis of metrics is done through tables (according to ISO standards) whose columns describe the characteristics of each metric. The interpretation and use of metrics are highly dependent on the context of application. For example, the "Currency of references per section" metric, under the Functionality characteristic and the Currency sub-characteristic, measures whether the references included in a specific part of the educational material (e.g. a section in a textbook or a video lecture) are state of the art. The values (max-min) within which quality of data is maximized are set by the organization using the framework. These values may be derived from regulations, historical data, or organizational or national best practices.
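As an illustration of how such a metric could be computed and checked against organizational limits, consider the sketch below. The formula (share of references no older than a given age), the age limit of five years, and the acceptable range of 0.6 to 1.0 are all assumptions chosen for the example; the framework leaves these choices to the organization.

```python
from typing import Sequence


def currency_of_references(
    publication_years: Sequence[int], current_year: int, max_age_years: int
) -> float:
    """Share of a section's references that are at most max_age_years old.

    Assumed formula for illustration; the organization may define currency differently.
    """
    if not publication_years:
        raise ValueError("the section has no references")
    recent = sum(
        1 for year in publication_years if current_year - year <= max_age_years
    )
    return recent / len(publication_years)


def within_range(value: float, lower: float, upper: float) -> bool:
    """Check a measured value against the (min, max) limits set by the organization."""
    return lower <= value <= upper


# Hypothetical section with four references, evaluated in 2023.
section_years = [2012, 2019, 2021, 2022]
score = currency_of_references(section_years, current_year=2023, max_age_years=5)

print(score)                          # 0.75
print(within_range(score, 0.6, 1.0))  # True
```

The limits are passed in as parameters rather than hard-coded, so the same function can serve organizations whose regulations or historical data suggest different ranges.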

An example of a metric that is even more highly dependent on the educational context is the "Workload per section" metric under the Efficiency characteristic and the Time Behavior sub-characteristic. Although this metric may seem better suited to software assessment, it measures a very important aspect of the use of educational material: the amount of workload (effort) a learner needs to comprehend a unit of educational content (e.g. a video lecture). In this example, comprehension is different from observation, since the context is educational and the goal is to learn and not just to consume information. This metric is usually measured in hours per time unit (e.g. hours per week) when referring to volumes of information, but it can also be applied to smaller educational units (e.g. a book chapter, a video lecture etc.). In the case of a video lecture (a very common and effective way of teaching), the workload is not necessarily equal to the duration of the data stream itself. The learner may need to re-run specific segments or study additional resources to reach the set educational goals. Again, the upper and lower limits are to be set by the organization providing the data and are usually derived for the average learner (learning pace is another important parameter in this case). The mode of learning also influences the limits (e.g. full-time vs. part-time, or supervised vs. self-paced learning). Table 3 presents examples of metrics (organized by characteristic and sub-characteristic) that are derived from the framework.
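The observation that workload exceeds the raw duration of a video lecture can be made concrete with a short sketch. The breakdown into viewing, replay and additional study time, the sample values, and the weekly limits of 2 to 6 hours are assumptions for illustration only, not values proposed by the framework.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VideoLectureUnit:
    """One educational unit; all times are estimates for the average learner."""
    duration_min: float      # length of the video stream
    replay_min: float        # time spent re-running specific segments
    extra_study_min: float   # time spent on additional resources


def workload_hours(unit: VideoLectureUnit) -> float:
    """Workload of one unit in hours: viewing plus replay plus additional study."""
    total_min = unit.duration_min + unit.replay_min + unit.extra_study_min
    return total_min / 60


def weekly_workload_hours(units: Sequence[VideoLectureUnit]) -> float:
    """Total workload of all units scheduled in one week, in hours."""
    return sum(workload_hours(unit) for unit in units)


# Hypothetical week with two video lectures.
week = [
    VideoLectureUnit(duration_min=45, replay_min=15, extra_study_min=60),
    VideoLectureUnit(duration_min=30, replay_min=0, extra_study_min=30),
]

total = weekly_workload_hours(week)
lower_limit, upper_limit = 2.0, 6.0  # assumed limits set by the organization

print(workload_hours(week[0]))                # 2.0, although the video lasts 45 minutes
print(total)                                  # 3.0
print(lower_limit <= total <= upper_limit)    # True
```

Different limits could be supplied for full-time, part-time or self-paced learners, since the mode of learning influences the acceptable range.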
The application of the metrics optimizes the use of current data or facilitates the design of improved versions by providing practical guidelines and limits within which data quality is maximized in a given organizational and educational context.
A limitation of the current research is that it has not been tested on real data so that ranges for the metrics can be derived. It is, however, important to note that the ranges within which metrics take their values depend on the context and on the organizational, educational and operational objectives of the organisation using the framework. Nevertheless, some best practices may help towards the efficient application of the framework. Table 4 presents range values for some of the key metrics of the framework.

Conclusions
Data becomes information in a given context and, as such, its quality attributes need to be seen under a specific lens. Information processing also differs in the sense that delivery to the user is highly connected to the way that data is consumed. Educational content is a type of information with an increased added value but also with specific requirements for its optimization and design. The latter are facilitated by quality assessment, ideally supported by formal standards. Given the lack of standards specific to the educational context, we argue that this paper contributes towards a practical assessment tool in the form of a framework. We presented a framework for the assessment of educational material and corresponding metrics for its practical application, based on the hierarchical structure of the ISO 25000 series standard.
The contribution of this work is twofold: it provides a new approach in regarding educational data as contextual information, and it presents a practical, configurable method for assessing the quality of these data. The framework can be used as a basis for extending the set of metrics briefly presented in Section 4 and for deriving border values within which high quality of data is feasible.