Analysis of Students’ Misconception Based on Rough Set Theory

This study analyzed students' misconceptions using rough set theory combined with interpretive structural modeling (ISM) in order to compare the levels of two classes, and thereby provides teachers with an effective diagnostic assessment tool. The participants were 30 fourth-grade students in Central Taiwan, and the test instruments were mathematics exams written by teachers. The study proposed three methods for finding the misconceptions common to the students in a class: "Deleting conditional attributes", "Using Boolean logic to calculate discernable matrix", and "Calculating significance of conditional attributes". The results showed that the students of Class A had common misconceptions, whereas the students of Class B did not. In addition, remedial decisions for the two classes are given. When the remedial decisions of the two classes were mapped onto the structural graph of concepts, the overall performance of Class B was found to be higher than that of Class A.


Introduction
"Misconception" is also called "alternative conception" or "alternative framework". A number of scholars have shown that, before formal education, students already possess a systematic structure for understanding scientific phenomena. There is a basic difference between this systematic structure and the knowledge structure to be learned [1]. The Ministry of Education of Taiwan [2] has emphasized that teachers should investigate the reasons for the mistakes that students easily make in learning. In recent years, diagnostic teaching has been developing, and many experiments based on it have been implemented [3][4][5][6].
There are many methods of cognitive diagnosis. Among quantitative approaches, Item Response Theory is frequently used, but it is limited by its requirement of a large sample size. Among qualitative approaches, the interview is a common method. However, because the number of teachers is limited, teachers are often unable to provide individual remedial teaching. Moreover, in order to identify misconceptions in learning and make an accurate diagnosis of students' problems, a number of scholars have designed various kinds of diagnostic tests [7,8]. However, developing a valid and reliable test requires interviews, paper-and-pencil item design, pretesting, and revision. This process consumes time, budget, and resources. Therefore, these methods are difficult to apply in the classroom, and teachers cannot receive timely feedback from the diagnostic exams.
To overcome the problems mentioned above, and unlike previous psychometric research, this study analyzed students' responses using rough set theory. Rough set theory is a practical mathematical tool, proposed by Pawlak in 1982, for processing vagueness and uncertainty. It has developed rapidly in recent years, is an important method in artificial intelligence and cognitive science, and is often used in the medical field and in industrial management. For example, Yeh and Cheng [9] applied rough set theory to classify appendicitis and found that, through approximation sets and reducts, multi-attribute diseases can be classified well. In addition, rough set theory has produced fruitful results in many fields, particularly approximate reasoning [10], mathematical logic analysis and reduction [11][12][13], the building of predictive models [14], decision support systems [15][16][17][18][19], and other areas. Many studies have shown that rough set theory not only helps formulate clear decisions [12] but also enhances the effectiveness of research involving optimization [19,20]. Among them, the education-related research of Qu and Wang [16] provided a basis for personalized teaching strategies on a distance learning website by analyzing reducts and attribute significance.
Rough set theory assumes that the set of analyzed objects itself implies knowledge, and that knowledge is the ability to classify objects [21]. The main aim of rough set theory is to retrieve the rules in an information system through the difference between the lower and upper approximations in set theory and the concept of conditional probability [22][23][24][25][26]. An information system is composed of various objects and their corresponding attributes. The rules describe under which condition attributes each object can be classified. Rough set theory can obtain the same knowledge as the original decision system without losing any information. The result has a minimal set of condition attributes and keeps the simplest form that has the same classification ability as the original decision system [21]. Compared with probability and statistics, evidence theory, fuzzy sets, and other mathematical tools, rough set theory can not only find relationships between objects but also has an advantage that the other theories lack. Statistics needs a probability distribution, evidence theory needs a basic belief assignment, and fuzzy sets need a membership function, but rough sets do not rely on any such assumptions. That is, rough sets do not need a quantitative description of certain characteristics or attributes, or a related probability distribution, to be given in advance [21].
For these reasons, although rough set theory is rarely used in education, its characteristics (it does not rely on any assumptions and can obtain the same knowledge as the original decision system without losing any information) make it very suitable for small-class teaching. Therefore, this study analyzed students' misconceptions using rough set theory combined with interpretive structural modeling (ISM). The sample was 30 fourth-grade students in Central Taiwan, and the test instruments were mathematics exams written by teachers. The study analyzed the individual misconceptions of two classes, set appropriate remedial teaching decisions according to the level of each class, and used the ISM structural graph to compare the levels of the two classes, in order to provide teachers with an effective tool for teaching diagnosis.

Fundamental Theory
This section briefly introduces the fundamental theories used in the study: rough set theory and interpretive structural modeling.

Rough Set Theory
Rough set theory was proposed by Pawlak in 1982. It is a mathematical tool for dealing with problems of vagueness and uncertainty [21]. It does not require a quantitative description or statistical probability distribution of characteristics or attributes to be given in advance, and it does not rely on any assumptions. Rough set theory assumes that the set of analyzed objects itself implies knowledge, and that knowledge is the ability to classify objects. Its main purpose is to extract from the information system the rules that describe under which attributes each object is classified [27,28]. The important concepts of rough set theory are as follows [21]:

Information System (IS)
Generally, the information formed by the objects of study and their characteristics is known as an information system (IS), also called an approximation space. Formally, an information system is a four-tuple, defined as $IS = (U, R, V, f)$, where U is the set of objects, R is the set of attributes, V is the set of attribute values, and f is the information function that assigns a value to each object under each attribute. If the attribute set R can be further divided into condition attributes C and decision attributes D, satisfying $R = C \cup D$ and $C \cap D = \emptyset$, the information system is also called a decision system or decision table (Table 1). If the decision table contains only one decision attribute, it is called a single-decision table; otherwise, it is called a multi-decision table.

 
The indiscernibility relation, represented by $ind(X)$, is an equivalence relation. It divides U into a finite number of equivalence classes, and the objects within each equivalence class are indiscernible from one another. The first step of classification in rough set theory is to use $ind(X)$ to construct the basic sets.
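The partition induced by an indiscernibility relation can be sketched in a few lines. This is a minimal illustration only: the table `toy`, the student names, and the function `partition` are hypothetical and are not taken from the paper's tables.

```python
from collections import defaultdict

def partition(table, attributes):
    """Group object names into the equivalence classes of ind(attributes)."""
    classes = defaultdict(list)
    for name, row in table.items():
        # Objects with identical values on every chosen attribute are indiscernible.
        key = tuple(row[a] for a in attributes)
        classes[key].append(name)
    return sorted(sorted(c) for c in classes.values())

# Hypothetical toy data: 1 = misconception present, 0 = absent.
toy = {
    "s1": {"C2": 1, "C3": 1},
    "s2": {"C2": 0, "C3": 0},
    "s3": {"C2": 1, "C3": 1},
    "s4": {"C2": 0, "C3": 1},
}

classes = partition(toy, ["C2", "C3"])
print(classes)
```

Using fewer attributes gives a coarser partition, which is why removing an attribute can make previously distinguishable students indiscernible.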

Upper Approximation and Lower Approximation
The positive domain $pos_R(X)$, or lower approximation of X, is the set of elements of U that can be classified with certainty as belonging to X under R. It is defined as $pos_R(X) = \underline{R}(X) = \{x \in U : [x]_R \subseteq X\}$, or equivalently $\underline{R}(X) = \bigcup \{Y \in U/R : Y \subseteq X\}$. The upper approximation of X is the set of elements of U that possibly belong to X under R. It is defined as $\overline{R}(X) = \{x \in U : [x]_R \cap X \neq \emptyset\}$. The boundary is the set of elements that can be classified with certainty neither into X nor into $U - X$ under R. It is defined as $bn_R(X) = \overline{R}(X) - \underline{R}(X)$.
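The three sets can be computed directly from the equivalence classes. The sketch below assumes the standard Pawlak definitions; the data and the names `approximations` and `needs_remedial` are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(table, attributes):
    """Return the equivalence classes of ind(attributes) as sets of names."""
    groups = defaultdict(set)
    for name, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(name)
    return list(groups.values())

def approximations(table, attributes, target):
    """Return (lower, upper, boundary) of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper, upper - lower

# Hypothetical toy data: s2 and s4 are indiscernible but only s4 is in the target.
toy = {
    "s1": {"C2": 1, "C3": 1},
    "s2": {"C2": 0, "C3": 0},
    "s3": {"C2": 1, "C3": 1},
    "s4": {"C2": 0, "C3": 0},
    "s5": {"C2": 0, "C3": 1},
}
needs_remedial = {"s1", "s3", "s4"}

lower, upper, boundary = approximations(toy, ["C2", "C3"], needs_remedial)
print(sorted(lower), sorted(upper), sorted(boundary))
```

Here s2 and s4 fall in the boundary: the condition attributes alone cannot decide whether either of them belongs to the target set.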

Dispensable and Independent
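If $ind(R - \{r\}) = ind(R)$, r is said to be dispensable in R; otherwise, r is indispensable in R. If every r in R is indispensable, R is said to be independent.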

Dependency and Significance of Attributes
When the dependency of D on C is less than 1, D is said to be not completely derivable under C. The dependency of an attribute determines its significance. The usual practice is to delete an attribute $C_i$ and calculate the impact on the positive domain under C.
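A small sketch may make the two quantities concrete. It assumes the common definitions $\gamma_C(D) = |pos_C(D)| / |U|$ for dependency and $\gamma_C(D) - \gamma_{C - \{C_i\}}(D)$ for the significance of $C_i$; the paper's own significance formula is not preserved in the text, so this choice is an assumption, and the data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def blocks(table, attributes):
    """Equivalence classes of ind(attributes) as sets of object names."""
    groups = defaultdict(set)
    for name, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(name)
    return list(groups.values())

def dependency(table, condition, decision):
    """gamma_C(D): share of objects in the positive domain of D under C."""
    decision_blocks = blocks(table, [decision])
    positive = set()
    for block in blocks(table, condition):
        if any(block <= d for d in decision_blocks):
            positive |= block
    return Fraction(len(positive), len(table))

def significance(table, condition, decision, attribute):
    """Assumed definition: drop in dependency when `attribute` is removed."""
    reduced = [a for a in condition if a != attribute]
    return dependency(table, condition, decision) - dependency(table, reduced, decision)

# Hypothetical toy decision table (D = 1 means remedial teaching is needed).
toy = {
    "s1": {"C2": 1, "C3": 1, "D": 1},
    "s2": {"C2": 0, "C3": 0, "D": 0},
    "s3": {"C2": 1, "C3": 0, "D": 1},
    "s4": {"C2": 0, "C3": 1, "D": 1},
}

print(dependency(toy, ["C2", "C3"], "D"))
print(significance(toy, ["C2", "C3"], "D", "C2"))
```

An attribute with significance 0 under this definition can be removed without shrinking the positive domain.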

Reduct and Core of Rough Sets
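For a given decision system, if $R \subseteq C$ is independent and $pos_R(D) = pos_C(D)$, then R is called a reduct of C, represented by $red(C)$. An attribute set may have several reducts. The intersection of all reducts is called the core of C, represented by $core(C) = \bigcap red(C)$. The core can be interpreted as the most important part of the knowledge; it cannot be deleted during reduction.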
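For a small table, reducts and the core can be found by brute force. The sketch below assumes a consistent decision table, so that "preserving the positive domain" reduces to "introducing no contradiction"; the rows and the function names are hypothetical.

```python
from itertools import combinations

def is_consistent(rows, attributes):
    """True if no two rows agree on `attributes` but differ in decision."""
    seen = {}
    for values, decision in rows:
        key = tuple(values[a] for a in attributes)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def reducts(rows, attributes):
    """All minimal attribute subsets that keep the table consistent."""
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if is_consistent(rows, subset):
                found.append(subset)
    return found

def core(rows, attributes):
    """Intersection of all reducts."""
    result = set(attributes)
    for r in reducts(rows, attributes):
        result &= set(r)
    return result

# Hypothetical toy data: the decision equals "C2 or C3", and C4 is irrelevant.
rows = [
    ({"C2": 1, "C3": 0, "C4": 0}, 1),
    ({"C2": 0, "C3": 1, "C4": 0}, 1),
    ({"C2": 0, "C3": 0, "C4": 0}, 0),
    ({"C2": 0, "C3": 0, "C4": 1}, 0),
    ({"C2": 1, "C3": 1, "C4": 1}, 1),
]
attrs = ["C2", "C3", "C4"]

print(reducts(rows, attrs), core(rows, attrs))
```

The search is exponential in the number of attributes, which is acceptable for seven concepts but not for large attribute sets.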

The Most Efficient Decision Rules
After the reduct and core of the decision system have been calculated, rules can be extracted from the reduced decision system. A rule has the form: if "attribute values after reduction", then "a class of the decision attribute". These rules express the knowledge extracted from the raw data.

Interpretive Structural Modeling (ISM)
Interpretive Structural Modeling (ISM) was proposed by Warfield in 1976. Its mathematical analysis transforms the relationships between the different elements of a complex system into an associated structural diagram [29]. The analysis uses the hierarchical digraph of graph theory to describe the relationships between the elements. As a result, ISM transforms fragmentary and abstract elements into a specific and comprehensive structural diagram that clarifies the structure of a complex situation [30].
In the calculation, the relationships between the elements are first arranged in a causality analysis table in the form of a binary matrix, where "1" and "0" mean that two elements are related or unrelated. This matrix is represented by the symbol A. To apply graph theory, the adjacency matrix A is added to the unit matrix I, which gives the matrix that "contains its own causality", represented by the symbol B. Through Boolean power operations, B is transformed into the reachable matrix T, that is, $T = B^n$, where $B^n$ is B multiplied by itself n times. Finally, all the structural elements are arranged into the associated structural hierarchy chart, which gives the position of each structural element [29].
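The matrix steps above can be sketched as follows. This is a minimal illustration that forms B from A and the unit matrix and then multiplies B by itself under Boolean arithmetic until it stops changing; the three-element chain is hypothetical and is not the paper's Table 2.

```python
def boolean_product(x, y):
    """Boolean matrix product: entry (i, j) is 1 if some k links i to j."""
    n = len(x)
    return [
        [int(any(x[i][k] and y[k][j] for k in range(n))) for j in range(n)]
        for i in range(n)
    ]

def reachability(adjacency):
    """Reachable matrix T obtained from B = A + I by repeated Boolean squaring."""
    n = len(adjacency)
    b = [[int(adjacency[i][j] or i == j) for j in range(n)] for i in range(n)]
    while True:
        nxt = boolean_product(b, b)
        if nxt == b:          # no new paths were added, so b is the closure
            return b
        b = nxt

# Hypothetical chain of three concepts: 0 -> 1 -> 2.
A = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

T = reachability(A)
print(T)
```

The entry T[0][2] becomes 1 even though A[0][2] is 0, because element 2 is reachable from element 0 through element 1; the hierarchy levels of ISM are then read off from the rows and columns of T.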
In recent years, many studies have applied ISM in education. Examples include making a structural graph of the factors of learning interest in mathematics and proposing guidance programs for different kinds of learners [31], structured analysis of teaching content [32], and proposing learning paths for concepts by combining students' misconceptions with the ISM structural graph of concepts [33].
In this study, the ISM structural graph of concepts was made from the teaching content. By mapping the remedial decision-making of the two classes onto the ISM structural graph of concepts, the researchers compared the levels of the two classes. This method is innovative in that it differs from the traditional method of comparing averages.

Research Methods
In this section, the researchers first test the reliability of the data to ensure that it is dependable, and then describe the research procedures.

Reliability Test of Data
This paper took two fourth-grade classes in the same school as an example. There were 24 items and 7 concepts, and each class had 15 students on average. Before the analysis, the researchers tested the reliability of the students' responses for the two classes. Cronbach's α was 0.854 for Class A and 0.849 for Class B, which indicates high reliability.
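For readers who want to reproduce this kind of check, Cronbach's α can be computed from a student-by-item score matrix with the usual formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$. The sketch below uses hypothetical 0/1 responses, not the paper's data, and the function name is illustrative.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per student, one 0/1 column per item."""
    k = len(scores[0])
    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: items that always agree, and items that are unrelated.
consistent = [[1, 1], [0, 0], [1, 1], [0, 0]]
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

The function fails with a division by zero if every student has the same total score, so such degenerate data would need to be handled separately.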

Research Procedures
The study analyzed students' misconceptions using rough set theory combined with ISM in order to compare the levels of the two classes. The research structure is shown in Figure 1.

Results and Discussion
Based on the research structure, this section is divided into several parts. For the problems, the researchers analyze the structural graph of concepts based on ISM. For the students' responses of Classes A and B, the researchers analyze the SCD table, find the common misconceptions, and then formulate the remedial decision-making. Finally, the researchers compare the levels of the two classes.

Production and Analyses of the Structural Graph of Concepts Based on ISM
In this section, the researchers used the ISM model to build the structural graph of concepts for the problems. The study used seven concepts of fractions in fourth-grade mathematics. The relationships between the concepts are shown in Table 2.
The number "1" indicates a connection between two concepts, and "0" indicates the lack of a connection. The researchers used ISM software to perform the matrix calculation and obtain the causal linking structure between the concepts, which is the ISM structural graph of concepts (Figure 2). As Figure 2 shows, the structural graph of concepts has 5 layers; the lowest layer is the basic concept of this unit, and the top layer is the most difficult concept. When teachers teach this unit, there are three distinct teaching sequences: C1 → C2 → C4 → C6, C1 → C2 → C3 → C5 → C6, and C1 → C2 → C3 → C5 → C7.

SCD Table of Class A
First, the teachers judged the relationship between each concept and each problem. If a concept is connected to a problem, the corresponding column shows "1"; otherwise, it shows "0" (Table 3). In this paper, the students of the two classes took the same test, so they share the same problem-concept relationship.
Students' responses of Class A are shown in Table 4, where "1" means the student answered correctly and "0" means the student answered incorrectly. In this paper, a student whose score was lower than the average score was marked as needing remedial teaching. The SCD table was formulated based on the IS of rough set theory, and the SC table is needed first (Table 5). The SC table was obtained by combining the problem-concept relationship (Table 3) and the students' responses of Class A (Table 4). It gives the number of incorrect responses for C1 to C7, respectively. For example, Table 4 shows that A−5 answered P17 incorrectly, and C3 and C7 are both required to answer P17 correctly. Thus, the corresponding columns (C3 and C7) reflect the number "1" in the SC table of Class A.
In the SC table, the number of incorrect answers varied greatly between concepts. If rough set computation were performed directly on the SC table, no rules could be obtained; in other words, the common misconceptions of the students could not be identified. Thus, in this paper, the researchers assumed that a student who answered a certain concept incorrectly only once had made an accidental error, whereas a student who answered a concept incorrectly two or more times had a misconception. Based on this assumption, the SC table was converted to the SCD table (Table 6). In the SCD table, a column shows "1" if the student answered the concept incorrectly two or more times, and "0" otherwise.
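The construction of the SC table and its conversion to 0/1 values can be sketched as follows. The problems, concepts, and responses are hypothetical, and the cut-off `threshold=2` reflects the reading that one wrong answer is treated as accidental and repeated wrong answers as a misconception, which is an assumption about the paper's rule.

```python
def build_sc(responses, problem_concepts, concepts):
    """Count, per student and concept, the wrong answers that involve the concept."""
    sc = {}
    for student, answers in responses.items():
        counts = {c: 0 for c in concepts}
        for problem, correct in answers.items():
            if not correct:
                for c in problem_concepts[problem]:
                    counts[c] += 1
        sc[student] = counts
    return sc

def to_scd(sc, threshold=2):
    """Mark a concept with 1 when it was missed at least `threshold` times."""
    return {
        student: {c: int(n >= threshold) for c, n in counts.items()}
        for student, counts in sc.items()
    }

# Hypothetical problem-concept relationship and responses (1 = correct).
problem_concepts = {"P1": ["C1"], "P2": ["C1", "C2"], "P3": ["C2", "C3"], "P4": ["C3"]}
responses = {
    "s1": {"P1": 1, "P2": 0, "P3": 0, "P4": 1},
    "s2": {"P1": 1, "P2": 1, "P3": 1, "P4": 0},
}

sc = build_sc(responses, problem_concepts, ["C1", "C2", "C3"])
scd = to_scd(sc)
print(sc)
print(scd)
```

The decision column D (whether the student's score is below the class average) would be appended to each row to complete the SCD table; it is omitted here to keep the sketch short.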
The indiscernibility relation can be obtained according to rough set theory. It is the equivalence relation on S under the condition attributes. The average score of Class A was 21.9.
Taking the intersection of the equivalence relations for the individual condition attributes gives the equivalence relation on S under C. It shows that the students of Class A were classified into six categories based on C1 to C7.
The upper approximation and the lower approximation were then calculated. Similarly, for D = NO, the equivalence relation was calculated based on conditional probability and the indiscernibility relation, and the upper approximation and the lower approximation were obtained. In addition, the students for whom it is not sure whether remedial teaching is needed were identified for Class A.

Common Misconceptions of Class A
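In order to find the core of Class A, the researchers reduced the SCD table of Class A according to rough set theory; the core represents the common misconception of Class A. The SCD table was thereby simplified (Table 7).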

Deleting a Condition Attribute
The first method is to delete a condition attribute from the SCD summary table and check whether new conflicts arise. The result of deleting and checking each attribute is given below, with the deletion of C2 and C4 shown as examples (Tables 8 and 9).
1) After deleting C2, there was a new contradiction between A−1 and A−3. This means that C2 could not be omitted.
2) After deleting C3, there was a new contradiction between A−1 and A−8. This means that C3 could not be omitted.
3) After deleting C4, there was no new contradiction. This means that C4 can be omitted.
4) After deleting C6, there was no new contradiction. This means that C6 can be omitted.
5) After deleting C7, there was no new contradiction. This means that C7 can be omitted.
By this analysis, C2 and C3 could not be omitted.
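The delete-and-check procedure can be sketched as follows. The table is hypothetical (it is not Table 7), and a "contradiction" is taken to mean two students who agree on all remaining condition attributes but differ on the decision D.

```python
from itertools import combinations

def conflicts(table, attributes, decision="D"):
    """Pairs of objects that match on `attributes` but differ on the decision."""
    return [
        (a, b)
        for a, b in combinations(sorted(table), 2)
        if all(table[a][k] == table[b][k] for k in attributes)
        and table[a][decision] != table[b][decision]
    ]

def check_deletions(table, attributes, decision="D"):
    """For each attribute, list the new contradictions caused by deleting it."""
    baseline = set(conflicts(table, attributes, decision))
    report = {}
    for attr in attributes:
        rest = [a for a in attributes if a != attr]
        report[attr] = [p for p in conflicts(table, rest, decision) if p not in baseline]
    return report

# Hypothetical SCD-style table: D = 1 means remedial teaching is needed.
toy = {
    "s1": {"C2": 1, "C3": 0, "C4": 0, "D": 1},
    "s2": {"C2": 0, "C3": 1, "C4": 0, "D": 1},
    "s3": {"C2": 0, "C3": 0, "C4": 0, "D": 0},
    "s4": {"C2": 0, "C3": 0, "C4": 1, "D": 0},
    "s5": {"C2": 1, "C3": 1, "C4": 1, "D": 1},
}

report = check_deletions(toy, ["C2", "C3", "C4"])
print(report)
```

An attribute whose list is empty can be omitted; an attribute with at least one new contradiction cannot. In this toy table C4 can be omitted, while C2 and C3 cannot.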

 
$red(C)$ may therefore be any set containing C2 and C3.
From the SCD summary table of Class A, the students can be classified correctly, and the remedial teaching decision can be made, based on C2 and C3 alone. Hence $red(C) = \{C2, C3\}$.

Using Boolean Logic to Calculate Discernable Matrix
The second method is to use Boolean logic to calculate the discernable matrix. First, the discernable matrix of Class A must be established. The researchers compared the students pairwise and recorded the condition attributes on which they differ in the matrix. For example, in Table 7, students A−1 and A−2 differ in C2, C3, C6, and C7. Table 10 was completed in this way. From the discernable matrix, the union of the attributes that distinguish each student from the other students can be calculated. It indicates the concepts in which the student may have a misconception. Then, the intersection of all the unions in the matrix gives the possible $red(C)$ of Class A.

Table 10. Discernable matrix of Class A.
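A discernable (discernibility) matrix can be built with a pairwise comparison. The sketch below records entries only for pairs of students with different decisions and reads the core off the single-attribute entries; both choices follow the usual decision-relative construction and are assumptions about the paper's procedure. The table is hypothetical.

```python
from itertools import combinations

def discernibility_matrix(table, attributes, decision="D"):
    """Map each pair with different decisions to the attributes that differ."""
    matrix = {}
    for a, b in combinations(sorted(table), 2):
        if table[a][decision] != table[b][decision]:
            matrix[(a, b)] = {k for k in attributes if table[a][k] != table[b][k]}
    return matrix

def core_from_matrix(matrix):
    """Attributes that are the only way to tell some pair apart."""
    return set().union(*(entry for entry in matrix.values() if len(entry) == 1))

# Hypothetical SCD-style table: D = 1 means remedial teaching is needed.
toy = {
    "s1": {"C2": 1, "C3": 0, "C4": 0, "D": 1},
    "s2": {"C2": 0, "C3": 1, "C4": 0, "D": 1},
    "s3": {"C2": 0, "C3": 0, "C4": 0, "D": 0},
    "s4": {"C2": 0, "C3": 0, "C4": 1, "D": 0},
    "s5": {"C2": 1, "C3": 1, "C4": 1, "D": 1},
}

matrix = discernibility_matrix(toy, ["C2", "C3", "C4"])
found_core = core_from_matrix(matrix)
print(matrix)
print(found_core)
```

Read as Boolean logic, each entry is a disjunction of attributes and the whole matrix is their conjunction; a single-attribute entry forces that attribute into every reduct, which is why it belongs to the core.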

Calculating Significance of Condition Attributes
The dependency of an attribute determines its significance. The significance is calculated by removing an attribute $C_i$ from C and finding the degree to which this affects the positive domain $pos_C(D)$. The equivalence relations $ind(D)$ and $ind(C - \{C_i\})$ of S are calculated individually for Class A, and then $pos_{C - \{C_i\}}(D)$ is calculated for each removed attribute. After that, the dependency of each attribute is obtained from the dependency formula. The dependency of each attribute is shown in Table 11.
Finally, the dependency of each attribute is substituted into the significance formula. The significance of each attribute is shown in Table 12.
From Table 12, C1, C4, C5, C6, and C7 were redundant and could be removed. The researchers therefore obtained {C2, C3}. The three methods above gave the same result. It showed that the students of Class A had common misconceptions in C2 and C3 in the fraction unit. In Class A, the teacher can give remedial teaching directly for the two parts "Identify the proper fractions, improper fractions and mixed numbers" and "Improper fraction change integer or mixed numbers".

Formulate Remedial Decision-Making of Class A
The analysis of Class A showed that C2 and C3 could not be omitted. To make remedial teaching more efficient, the researchers extracted rules from these two condition attributes. In this paper, $[x]_P$ denotes the set of students that have the same value of attribute P as x.
The reduct was calculated for Student A−1 and, in the same way, for each of the other students.
Taking the union of the reducts of the students above, the remedial decision-making obtained for Class A consists of the following two points:
1) If C2 = 1 or C3 = 1, then the student needs remedial teaching.
Based on the ISM structural graph of concepts, C2 is simpler than C3. Therefore, in the remedial process, teachers can adjust the remedial instruction according to the students' level. For students at a lower level, remediation of C2 can be given first; for students at a higher level, remediation of C3 can be given directly.
2) If C2 = 0 and C7 = 0, then the student does not need remedial teaching.
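The two rules for Class A can be written as a small function. The order in which the rules are checked and the "undetermined" result for cases covered by neither rule are assumptions added for illustration; the function names are hypothetical.

```python
def class_a_decision(c2, c3, c7):
    """Apply the two extracted rules to one student's SCD values (0 or 1)."""
    if c2 == 1 or c3 == 1:
        return "remedial"
    if c2 == 0 and c7 == 0:
        return "no remedial"
    # Neither rule fires (for example C2 = 0, C3 = 0, C7 = 1).
    return "undetermined"

def first_concept_to_remedy(student_level):
    """C2 lies below C3 in the ISM graph, so lower-level students start with C2."""
    return "C2" if student_level == "lower" else "C3"

print(class_a_decision(1, 0, 0))
print(class_a_decision(0, 0, 0))
print(first_concept_to_remedy("lower"))
```

Writing the rules this way makes it easy to see which combinations of values they leave uncovered.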

SCD Table of Class B
Similarly, the students' responses of Class B are shown in Table 13, where "1" means the student answered correctly and "0" means the student answered incorrectly. A student whose score was lower than the average score was marked as needing remedial teaching. The researchers judged that the students who need remedial teaching are B−1, B−8, and B−9.
Similarly, the SCD table was formulated based on the IS of rough set theory, and the SC table is needed first (Table 14). The SC table was obtained by combining the problem-concept relationship (Table 3) and the students' responses of Class B (Table 13). It gives the number of incorrect responses for C1 to C7, respectively. Following the previous procedure, the SC table was converted to the SCD table (Table 15). In other words, a student who answered a certain concept incorrectly only once was regarded as having made an accidental error, whereas a student who answered a concept incorrectly two or more times was regarded as having a misconception.
Similarly, the indiscernibility relation can be obtained according to rough set theory. It is the equivalence relation on S under the condition attributes.
Taking the intersection of the equivalence relations for the individual condition attributes gives the equivalence relation on S under C. It shows that the students of Class B were classified into five categories based on C1 to C7.
The upper approximation and the lower approximation were then calculated. Similarly, for D = NO, the equivalence relation was calculated based on conditional probability and the indiscernibility relation, and the upper approximation and the lower approximation were obtained. In addition, the students for whom it is not sure whether remedial teaching is needed were identified for Class B.

Common Misconceptions of Class B
In order to find the core of Class B, the researchers again reduced the SCD table of Class B according to rough set theory. The core represents the common misconception of Class B. Similarly, the SCD table was simplified (Table 16).

Deleting a Condition Attribute
The first method is to delete a condition attribute from the SCD summary table and check whether new conflicts arise. The steps were the same as for Class A. The following results were obtained.
1) After deleting C1, there was no new contradiction. This means that C1 can be omitted.
2) After deleting C2, there was no new contradiction. This means that C2 can be omitted.
3) After deleting C3, there was no new contradiction. This means that C3 can be omitted.
4) After deleting C6, there was a new contradiction between B−1 and B−3. This means that C6 could not be omitted.
By this analysis, C6 could not be omitted.

 
$red(C)$ may therefore be any set containing C6. From the SCD summary table of Class B, it is easy to classify the students correctly and to decide on remedial teaching based on C6 alone.

Using Boolean Logic to Calculate Discernable Matrix
The second method is to use Boolean logic to calculate the discernable matrix. Similarly, the discernable matrix of Class B must be established. The researchers compared the students pairwise and recorded the condition attributes on which they differ in the matrix. Table 17 was completed in this way.
From the discernable matrix, the union of the attributes that distinguish each student from the other students can be calculated. Then, the intersection of all the unions in the matrix gives the possible $red(C)$ of Class B.
If attribute set is dependent, the smallest set of attribute is

Calculating Significance of Condition Attributes
Following the analysis of Class A, the dependency of an attribute determines its significance, so the dependency must be calculated before the significance. Similarly, $\gamma_C(D)$ must be calculated first. Then $pos_{C - \{C_i\}}(D)$ is calculated for each removed attribute individually. The dependency of each attribute is shown in Table 18.
Finally, the dependency of each attribute is substituted into the significance formula. The significance of each attribute is shown in Table 19.
From Table 19, C1, C2, C3, and C7 were redundant and could be removed. The three methods above again gave the same result. It showed that the students of Class B had possible misconceptions in C4, C5, or C6 but had no common misconception in the fraction unit. That is, the teacher must give remedial teaching directly for the three parts "Mixed fraction or integer change improper fraction", "Solve addition problem of the same denominator fraction", and "Solve subtraction problem of the same denominator fraction" in Class B.

Formulate Remedial Decision-Making of Class B
Similarly, the analysis of Class B showed that C4, C5, and C6 could not be omitted. To make remedial teaching more efficient, the researchers extracted rules from these three condition attributes. In this paper, $[x]_P$ again denotes the set of students that have the same value of attribute P as x.
The reducts were calculated for Students B−1, B−8, and B−9, and the remedial decision-making obtained for Class B consists of the following two points:
1) If C4 = 1 or C5 = 1 or C6 = 1, then the student needs remedial teaching.
Based on the ISM structural graph of concepts, C4 and C5 are simpler than C6. Therefore, in the remedial process, teachers can adjust the remedial instruction according to the students' level. For students at a lower level, remediation of C4 or C5 can be given first; for students at a higher level, remediation of C6 can be given directly.
2) If C4 = 0 or C5 = 0 or C6 = 0, then the student does not need remedial teaching.

Compare Students' Degree of Two Classes
In the traditional approach, which looks only at the averages of the two classes, the difference between the two classes is insignificant. However, by calculating the reduct and core of rough set theory and mapping the remedial decision-making of the two classes onto the ISM structural graph of concepts, it was found that the overall performance of Class B is higher than that of Class A. According to the decision content, the basic concepts of the students of Class A are weaker, because these students needed remediation in the lower-layer concepts C2 or C3 (red part of Figure 5). The basic concepts of the students of Class B are stronger, because these students needed remediation in the higher-layer concepts C4, C5, or C6 (green part of Figure 5).

Conclusions
The study analyzed students' misconceptions using rough set theory combined with interpretive structural modeling (ISM). The sample was 30 fourth-grade students in Central Taiwan, and the test instruments were mathematics exams written by teachers. The study first suggested three methods for finding the common misconceptions of the students in a class, then made appropriate remedial teaching decisions according to the level of each class, and finally used the ISM structural graph to compare the levels of the two classes, in order to provide teachers with an effective tool for teaching diagnosis. The results are as follows:
1) The ISM structural graph of concepts for the fraction unit has 5 layers. When teachers teach this unit, there are three distinct teaching sequences: C1 → C2 → C4 → C6, C1 → C2 → C3 → C5 → C6, and C1 → C2 → C3 → C5 → C7.
2) In applying the information system of rough set theory, this study differed from previous studies, which analyzed students' responses directly. This study established the SC table by combining the problem-concept relationship with the students' responses. The SC table gives the number of incorrect responses for C1 to C7, respectively. The SCD table is a combination of the SC table and the researchers' decision-making. In practical terms, the SCD table shows whether each student has a misconception in concepts C1 to C7 and whether the teachers judge that the student needs remedial teaching.
3) Based on the calculation of the indiscernibility relation, a distribution of students is obtained. From the distribution, researchers can determine the students who need remedial teaching, the students who do not need remedial teaching, and the students for whom the need for remedial teaching cannot be determined. The results for the two classes show which students fall into this third group.
4) This study provided three methods for finding the common misconceptions of the students in a class: "Deleting a condition attribute", "Using Boolean logic to calculate discernable matrix", and "Calculating significance of condition attributes". The reduct and core are obtained from the calculation, and the core represents the common misconception of the students in the class.
5) Calculating the reduct and core showed that the students of Class A had common misconceptions in C2 and C3 in the fraction unit. In Class A, the teacher can give remedial teaching directly for the two parts "Identify the proper fractions, improper fractions and mixed numbers" and "Improper fraction change integer or mixed numbers". The researchers therefore extracted rules based on three attributes in the SCD summary table of Class A, and the remedial decision-making obtained for Class A consists of the following two points: a) If C2 = 1 or C3 = 1, then the student needs remedial teaching. b) If C2 = 0 and C7 = 0, then the student does not need remedial teaching.
6) Calculating the reduct and core showed that the students of Class B had possible misconceptions in C4, C5, or C6 but had no common misconception in the fraction unit. That is, the teacher must give remedial teaching directly for the three parts "Mixed fraction or integer change improper fraction", "Solve addition problem of the same denominator fraction", and "Solve subtraction problem of the same denominator fraction" in Class B. The researchers therefore extracted rules based on three attributes in the SCD summary table of Class B, and the remedial decision-making obtained for Class B consists of the following two points: a) If C4 = 1 or C5 = 1 or C6 = 1, then the student needs remedial teaching. b) If C4 = 0 or C5 = 0 or C6 = 0, then the student does not need remedial teaching.
7) After mapping the remedial decision-making of the two classes onto the ISM structural graph of concepts, it was found that the overall performance of Class B is superior to that of Class A. The students of Class A needed remediation in the lower-layer concepts C2 or C3, and those of Class B needed remediation in the higher-layer concepts C4, C5, or C6.
8) Because this analysis method is based on rough set theory, the input data must be discrete. If the data to be analyzed are continuous, they must be discretized before the analysis.


In a decision system, the positive domain $pos_C(D)$ of decision attribute D under condition attribute set C is the part of U that can be classified by the knowledge of C. It is defined as $pos_C(D) = \bigcup_{X \in U/D} \underline{C}(X)$.
$\gamma_C(D)$ means the dependence of decision attribute D on condition attribute set C. It is defined as $\gamma_C(D) = |pos_C(D)| / |U|$.
($B^n$ expresses B multiplied by itself n times.) B is transformed into the reachable matrix, represented by the symbol T. It means $T = B^n$.

Figure 2. The structural graph of concepts.
by C. Before calculating the significance, the dependency must first be calculated. The calculation process is as follows.

Figure 4. A distribution of students of Class B.
removing an attribute. After that, the dependency of each attribute is obtained from the dependency formula. The dependency of each attribute is shown in Table

Figure 5. Comparison of the remedial decision-making of the two classes.

Table 6. SCD table of Class A.
SCD table of Class A according to rough set theory. The core means the common misconception of Class A. Three methods were provided as cross-validation during the calculation. First, the researchers reduce the data of the SCD table. According to of students of Class A.

Table 12. Significance of each attribute.
could not be omitted.

table was formulated based on the IS of rough set theory. As previously mentioned, the SC table is needed first (Table 14) before the SCD table can be formulated. The SC table was obtained by combining the problem-concept relationship (Table 3) and students' responses of

Table 14. SC table of Class B.
to calculate equivalence relation based on conditional probability and indiscernibility relation can obtain the following results.

Table 17. Discernable matrix of Class B.
and C6 could be omitted.
and C5 could be omitted.