Factors Influencing Students Decisions to Enrollment in Sudanese Higher Education Institutions
1. Introduction
Enrollment management is one of the most important phases of the education process. It dates back to the late nineteenth century, when Harvard University founded the Board of Freshman Advisors in 1889. The board’s purpose was to establish orientation, provide advising and counseling, and develop social events for freshmen [1] . Since then, enrollment management has increasingly attracted the attention of scholars in educational data mining, particularly in computer-related disciplines.
Data mining is a field of computer science that focuses on detecting patterns and hidden knowledge in very large data sets and presenting the resulting information in a logical form. Artificial intelligence, machine learning, and statistics are among the techniques applied in data mining [2] .
One emerging interdisciplinary study area is educational data mining, which has a growing research community [3] . Educational data mining is concerned with developing and using methods that can explore or extract interesting information from educational data [4] . One of these methods is association rule mining.
Association rule mining is an unsupervised learning method used for pattern discovery, which in turn may reveal new knowledge. An association rule is considered important for decision making if its support and confidence are at least equal to minimum support and confidence thresholds defined by the user. Association rule mining was first introduced by Agrawal et al. [5] .
There are various association rule mining algorithms, such as AIS, SETM, Apriori, AprioriTid, AprioriHybrid, and FP-growth [6] . The Apriori algorithm is the best-known association rule algorithm; it finds frequent itemsets using candidate generation [7] .
Using association rules to discover new knowledge in educational data is a common method [4] [8] [9] . More specifically, many studies apply association rule algorithms to enrollment data sets [10] [11] [12] .
This paper applies the Apriori algorithm to a student enrollment data set that was created using a questionnaire administered to a sample of students enrolled in governmental and private Sudanese universities. The Apriori algorithm is selected because it is the most frequently used association rule algorithm. The aim is to discover the most influential category among the enrollment-related factor categories, and further to discover the essential factors within each influential category. The paper also shows the correlations between factors in the different categories that influence a student’s decision to enroll in Sudanese universities.
The extracted knowledge can be of great value and can offer helpful and constructive recommendations to academic planners seeking to improve enrollment in their institutions. The rest of the paper is organized as follows: Section 2 discusses related work that applies association rules as a mining tool to enrollment data sets. Section 3 reviews association rules and the Apriori algorithm. Section 4 explains how the enrollment data set for students enrolled in Sudanese universities was created. Section 5 presents how association rule mining is applied to the Sudanese universities enrollment data set. Section 6 presents the results and discussion, and Section 7 draws the conclusions.
2. Related Work
Many researchers have studied the enrollment-related factors that influence students’ decisions to enroll in higher education institutions. Recently, there has been more interest in extracting various relations from educational data using association rules as the data mining tool.
Many researchers have examined the influence of student- and society-related factors on the enrollment decision. Researchers have shown that students’ educational aspirations are positively related to Higher Educational Institution (HEI) selection decisions [8] [9] [10] . Some studies compare students’ aptitude and ability as factors in the selection of an HEI [11] [12] . Others recognize the critical role played by students’ guardians, family, and friends in directing a student’s decision to enroll in a specific HEI [12] [13] [14] . Students’ interests, motivations, and occupational plans, as well as class level, socioeconomic level, ethnicity, and institutional characteristics, have been examined for their effects on an institution’s total enrollment [15] [16] [17] .
Students’ selection of colleges depends on several criteria, including academic quality, facilities, campus surroundings, and personal characteristics [12] [18] . Income also affects students’ choice between public and private education. College location is a significant predictor of HEI selection, and a visit to the university campus was found to be an important factor [19] .
The reputation of an HEI has a significant influence on the student’s enrollment decision. Many researchers have examined this influence and found that engaging in international partnerships attracts larger numbers of international students [20] [21] .
Recent evidence suggests a need for greater diversification at the program level, with more general programs adopted on the basis of the diversity of the student population and of multiple regional, social, and economic needs [22] . Studies of institutional image also emphasize the significance of building positive emotions in achieving an institution’s enrollment goals, and the availability of postgraduate studies affects students’ selection of an HEI [23] [24] [25] .
A considerable amount of literature has been published on admission-related factors and their influence on the student’s decision to enroll. National or international recognition of the academic degree or program, degree flexibility, the diversity of courses, and the flexibility of entry requirements have all been found to influence enrollment [18] . The marketing mix, marketing efforts, channels, and advertisement are found to be important factors that influence students’ college selection [25] [26] . Financial aid also induces more enrollments in colleges than other factors [27] [28] .
Finding a job has recently become a central issue for students and their families. The decision to enroll in a specific educational institution is affected by many employment-related factors, and employment opportunities are a strong predictor of enrollment decisions [19] [29] .
Association rules have been used to discover relations among the admission system attributes at King Abdul-Aziz University (KAU) [30] , students’ preliminary knowledge [31] , specialties and students’ interests [32] , the factors that affect postgraduate study and assessment [33] , and courses and failed students [34] .
Association rules, as a data mining technique, are used to investigate the correlation between different enrollment factors. Some studies have applied the Apriori algorithm to enrollment data to examine the behavior of low- and high-income students [35] , the quality of talent training and the overall competitiveness of colleges and universities [36] , and student performance for a certain outcome (pass or fail) [37] .
This paper draws on previous studies to determine the different factors influencing the enrollment decision, and then provides a new categorization of these factors. To determine the most influential factors and the correlations between them, the paper uses association rules in a way similar to [4] [38] [39] . The paper deals with an enrollment data set, as in [30] [40] [41] .
The paper is unique in that it defines a categorization of enrollment factors and uses the Apriori algorithm to extract association rules that determine which category plays the largest role in students’ enrollment decisions and which factors within this category are most important.
3. Association Rules and the Apriori Algorithm
Association rule mining is a descriptive data mining technique for finding patterns, associations, and correlations among sets of items in a database. A standard association rule is a rule of the form X→Y, which means that if X is true for an instance in a database, then Y is true for the same instance, with a certain level of significance [5] [13] .
Typically, an association rule is called strong if it satisfies both a minimum support (min-supp) threshold and a minimum confidence (min-conf) threshold, both of which are determined by the user [2] .
The minimum support is defined as the minimum percentage of occurrences of the item/itemset in the database, while the minimum confidence is defined as the minimum certainty or trustworthiness associated with each discovered pattern [6] .
Let A and B be itemsets in the database D. An association rule between itemsets A and B is an implication of the form $A \rightarrow B$, where $A \cap B = \emptyset$. The support for an association rule $A \rightarrow B$, denoted as sup($A \rightarrow B$), is defined as the number of transactions in D that contain $A \cup B$ [9] . The itemsets A and B are called the antecedent and the consequent, respectively.
The support of an itemset A, supp(A), is the proportion of transactions in the database in which the itemset A appears. It signifies the popularity of an itemset.
The confidence determines how frequently the items in B appear in the transactions that contain A; it is the ratio of the number of transactions that include all items in the association rule (both A and B) to the number of transactions that include A.
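Written in symbols, the two measures read as follows. The notation count(X), for the number of transactions in D that contain every item of X, and |D|, for the total number of transactions, is introduced here only for illustration:

$$\mathrm{supp}(A) = \frac{\mathrm{count}(A)}{|D|}, \qquad \mathrm{conf}(A \rightarrow B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)} = \frac{\mathrm{count}(A \cup B)}{\mathrm{count}(A)}$$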
Moreover, one of the simplest correlation measures is lift, which measures the interestingness of a rule. Lift tells us whether the left-hand side (LHS) of a rule influences the right-hand side (RHS) positively or negatively. The lift between the occurrence of itemsets A and B can be defined as [42] :
$$\mathrm{Lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$
where P(A ∪ B) is the probability that a transaction contains both A and B, so that the lift equals P(B/A) divided by P(B).
The strength of the correlation is read from the lift value as follows [42] :
- If Lift(A, B) = 1, or P(B/A) = P(B) (or P(A/B) = P(A)), then B and A are independent and there is no correlation between them.
- If Lift(A, B) > 1, or P(B/A) > P(B) (or P(A/B) > P(A)), then A and B are positively correlated, meaning the occurrence of one implies the occurrence of the other.
- If Lift(A, B) < 1, or P(B/A) < P(B) (or P(A/B) < P(A)), then A and B are negatively correlated, meaning the occurrence of one discourages the occurrence of the other.
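The three measures and the lift interpretation above can be illustrated with a minimal Python sketch. The transactions and item names are hypothetical toy data, not the survey records, and the function names are chosen here for illustration only.

```python
from typing import FrozenSet, Iterable, List

# Toy transactions; the item names are hypothetical and not taken from the survey.
transactions: List[FrozenSet[str]] = [
    frozenset({"reputation", "facilities", "approved_degree"}),
    frozenset({"reputation", "approved_degree"}),
    frozenset({"facilities", "approved_degree"}),
    frozenset({"reputation", "facilities"}),
    frozenset({"location"}),
]


def support(itemset: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Proportion of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t) / len(db)


def confidence(a: Iterable[str], b: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """supp(A and B together) divided by supp(A)."""
    return support(frozenset(a) | frozenset(b), db) / support(a, db)


def lift(a: Iterable[str], b: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """supp(A and B together) divided by supp(A) * supp(B)."""
    joint = support(frozenset(a) | frozenset(b), db)
    return joint / (support(a, db) * support(b, db))


def correlation_type(lift_value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three cases listed in the text."""
    if abs(lift_value - 1.0) <= tol:
        return "independent"
    return "positive" if lift_value > 1.0 else "negative"


A, B = {"reputation"}, {"approved_degree"}
print(support(A, transactions))
print(confidence(A, B, transactions))
print(lift(A, B, transactions), correlation_type(lift(A, B, transactions)))
```

In this toy data, "reputation" and "approved_degree" each appear in 3 of 5 transactions and together in 2, so the lift is 0.4 / 0.36, slightly above one, and the pair is classified as positively correlated.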
The Apriori algorithm was proposed in 1994 [10] . The algorithm identifies the frequent items in the database and extends them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The Apriori algorithm selects association rules using two parameters: the minimum support threshold and the minimum confidence threshold. The frequent itemsets found by Apriori can be used to determine association rules that highlight general trends in the database [15] .
The Apriori algorithm uses a level-wise search, where k-itemsets (itemsets that contain k items) are used to explore (k + 1)-itemsets, to mine frequent itemsets from the transactional database for Boolean association rules. In this algorithm, frequent itemsets are extended one item at a time, a step known as candidate generation. Groups of candidates are then tested against the data [6] .
To count candidate itemsets efficiently, Apriori uses a breadth-first search and a hash tree structure [6] .
Mining association rules from a database consists of finding all rules that meet the user-specified support and confidence thresholds. The problem of mining association rules can be stated as in the algorithm below.
Algorithm:
1) Find all sets of items that occur with a frequency greater than or equal to the user-specified support threshold, s.
2) Generate the desired rules from the large itemsets, keeping those that meet the user-specified confidence threshold, α.
The Apriori algorithm starts by scanning the data set to find all the items and count their support as candidates of size 1 (i.e., C1), and it removes the infrequent items (those with count < min-supp).
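The level-wise search described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it counts candidates with plain scans instead of a hash tree, and the function name, the toy database, and the threshold are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori_frequent_itemsets(
    db: List[FrozenSet[str]], min_supp: float
) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `min_supp`."""
    n = len(db)

    # Pass 1: count the size-1 candidates (C1) and drop the infrequent ones.
    counts: Dict[FrozenSet[str], int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_supp}
    frequent = dict(current)

    k = 1
    while current:
        # Join step: merge frequent k-itemsets into (k + 1)-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k)):
                candidates.add(union)
        # Test the surviving candidates against the data.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        current = {s: c / n for s, c in counts.items() if c / n >= min_supp}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy database with three items.
toy_db = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
]
result = apriori_frequent_itemsets(toy_db, min_supp=0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

With a minimum support of 0.5, the three single items and the three pairs are frequent, while the triple {a, b, c} appears in only 2 of 5 transactions and is discarded.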
4. The Enrollment Dataset
We classify the enrollment factors into four categories that contain 32 factors; Table 1 shows these categories and the factors that constitute each category.
Based on the categorization in Table 1, the enrollment data were collected via a questionnaire that was subjected to a rigorous assessment by experts and professionals. The questionnaire has two sections. The first section covers demographic factors related to the enrolled student, namely the name of the university in which the student is enrolled, the type of institution (governmental/private), gender (male/female), age (from 18 to 20), academic year class (1/2/3/4/5), and academic level (B.Sc./Diploma). The second section covers the enrollment-related factors.
The sampled students are enrolled in computer-related studies at eight Sudanese universities in Khartoum State. From each of the 8 selected universities, 125 students were selected randomly from different classes. The resulting sample of 1000 responses was then reviewed and organized, and, to make analysis easier, each type of data in the questionnaire was encoded.
The demographic information is encoded as follows: universities take codes {1, 2, 3, 4, 5, 6, 7, 8}, institution type {1, 2}, gender {1, 2}, age {1, 2, 3}, academic class {1, 2, 3, 4, 5}, and academic level {1, 2}.
For the enrollment-related factors, the questionnaire uses a five-point Likert scale (strongly agree, agree, Na, disagree, and strongly disagree), encoded as {5, 4, 3, 2, 1}.
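As an illustration of how such encoded answers can become the items of a transaction for association rule mining, the sketch below turns one questionnaire response into a set of "attribute=code" items. The attribute names, the helper function, and the item format are hypothetical; only the Likert codes follow the encoding stated above.

```python
from typing import Dict, FrozenSet

# Likert codes as stated in the text; keys are lower-cased answer labels.
LIKERT_CODES: Dict[str, int] = {
    "strongly agree": 5,
    "agree": 4,
    "na": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def encode_response(
    demographics: Dict[str, int], answers: Dict[str, str]
) -> FrozenSet[str]:
    """Build one transaction: each item is an 'attribute=code' string."""
    items = {f"{name}={code}" for name, code in demographics.items()}
    items |= {
        f"{factor}={LIKERT_CODES[answer.strip().lower()]}"
        for factor, answer in answers.items()
    }
    return frozenset(items)


# Hypothetical single response; the factor names are placeholders.
record = encode_response(
    demographics={"age": 3, "academic_level": 1},
    answers={"reputation": "Strongly agree", "location": "Disagree"},
)
print(sorted(record))
```

Representing every attribute-value pair as its own item is one common way to apply Boolean association rule mining to categorical survey data; the paper does not state which representation was used.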
The questionnaire data were transformed into tables to form a data set. Statistical adjustments were applied to data that required scale transformations. Validation tests were used to evaluate the questionnaire scale and contents as follows:
· Cronbach’s alpha (α) was used to estimate the reliability coefficient for the questionnaire scale. Reliability means obtaining the same values when the measuring instrument is re-used under the same circumstances [43] . The Cronbach’s alpha value for most factors is above 70%.
· Moreover, the value for the main categories of enrollment-related factors is 92%. This shows that the enrollment factors form a highly reliable and consistent measuring tool.
· The Kaiser-Meyer-Olkin (KMO) and Bartlett’s tests were used to check the adequacy of the sample contents. KMO is a statistic that indicates the proportion of variance in the variables that might be caused by underlying factors [44] . KMO returns values between 0 and 1. In this sample, the KMO values lie between 0.8 and 1, which indicates that the sampling is adequate and that a factor analysis may be useful with this data set.
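For readers unfamiliar with the reliability coefficient mentioned above, the following sketch computes Cronbach’s alpha with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score), where k is the number of items. The response matrix is made-up toy data, and the paper does not say which software was used for this step.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = len(rows[0])  # number of questionnaire items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert answers: 5 respondents, 3 items.
responses = [
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values close to 1 indicate that the items vary together across respondents, which is the sense in which the text treats values above 70% as reliable.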
5. Applying Association Rule Mining to the Sudanese Universities Enrollment Dataset
To determine which enrollment factor category is the most influential, we apply the Apriori association rule algorithm with a suitable minimum support value to find all the frequent itemsets in the data set that meet that threshold, and then extract the association rules.
The factors within the extracted rules define the set of most critical factors and thus the most influential category. In addition, the extracted rules determine the correlations between these factors.
Two types of relationships are of interest to this study: the first is the relationship between the demographic factors and the enrollment-related factors, and the second is the relationship among the enrollment-related factors themselves.
In extracting association rules, three steps are followed:
1) Determine the minimum support: A fundamental problem in Apriori is how to choose a minimum support (min-supp) value that yields interesting patterns. There is no easy way to determine the best min-supp threshold. In this paper, the minimum support is determined by trial and error.
2) Create the itemset list: In association rule mining, a frequent set of items is also called a large itemset. An itemset is said to be frequent if its support is at least the predefined minimum support.
3) Extract the association rules:
a) Generate the standard association rules from the frequent itemsets by using the Apriori algorithm.
b) Measure the association rules: each rule is evaluated against the minimum support and minimum confidence, and the correlation between itemsets A and B is assessed using the lift measure.
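Step 3 can be sketched as follows, assuming the frequent itemsets and their supports have already been found in step 2. The `Rule` type, the function name, and the support values are hypothetical and serve only to show how confidence and lift are derived from stored supports.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float
    confidence: float
    lift: float


def generate_rules(
    supports: Dict[FrozenSet[str], float], min_conf: float
) -> List[Rule]:
    """Derive rules A -> B from frequent itemsets and their supports.

    `supports` must contain every non-empty subset of each frequent itemset,
    which holds for the output of a frequent-itemset search such as Apriori.
    """
    rules: List[Rule] = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent.
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                b = itemset - a
                conf = supp / supports[a]
                if conf >= min_conf:
                    rules.append(Rule(a, b, supp, conf, conf / supports[b]))
    return rules


# Hypothetical supports for two items and their pair.
supports = {
    frozenset({"x"}): 0.5,
    frozenset({"y"}): 0.8,
    frozenset({"x", "y"}): 0.45,
}
rules = generate_rules(supports, min_conf=0.8)
for rule in rules:
    print(sorted(rule.antecedent), "->", sorted(rule.consequent), rule.confidence, rule.lift)
```

Only x → y passes the confidence threshold here (0.45 / 0.5 = 0.9), while y → x (0.45 / 0.8) is rejected, which shows that confidence, unlike lift, depends on the direction of the rule.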
5.1. Extraction of Association Rules between Demographic Factors and Enrollment Related Factors
To determine a suitable minimum support value, we try different values of minimum support and minimum confidence and record the number of association rules generated. Table 2 shows the number of association rules for values of minimum support and minimum confidence ranging from 0.1 to 1. Since the number of generated rules is similar for support from 0.1 to 0.3 and confidence from 0.1 to 0.9, we select min-supp = 0.3 and min-conf = 0.9 as the thresholds.
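The trial-and-error search over thresholds can be pictured as a simple grid sweep that records the rule count for every pair of values. The sketch below is a simplified stand-in, not the procedure behind Table 2: it counts only rules with a single item on each side, uses a hypothetical toy database, and all names are chosen for illustration.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def count_pair_rules(db: List[FrozenSet[str]], min_supp: float, min_conf: float) -> int:
    """Count rules {a} -> {b} that meet both thresholds (single items only)."""
    n = len(db)
    items = sorted({item for t in db for item in t})
    count = 0
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in db if a in t)
        n_ab = sum(1 for t in db if a in t and b in t)
        if n_ab / n >= min_supp and n_ab / n_a >= min_conf:
            count += 1
    return count


def sweep(
    db: List[FrozenSet[str]], thresholds: Sequence[float]
) -> Dict[Tuple[float, float], int]:
    """Rule count for every (min_supp, min_conf) pair on the grid."""
    return {
        (s, c): count_pair_rules(db, s, c) for s in thresholds for c in thresholds
    }


# Hypothetical toy database.
toy_db = [
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a"}),
    frozenset({"c"}),
]
table = sweep(toy_db, thresholds=[0.1, 0.5, 0.9])
for (s, c), n_rules in sorted(table.items()):
    print(f"min-supp={s} min-conf={c} rules={n_rules}")
```

Reading such a grid shows where the rule count stops changing as the thresholds rise, which is the kind of pattern the selection of thresholds above relies on.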
To create the candidate set, called the large itemset L(1), the set is generated from the demographic factors and the enrollment-related factors. The size of L(1) is 19 items, as shown in Table 3, and the size of the large itemsets L(2) is 14 items, as shown in Table 4.
Table 2. The number of association rules extracted for support and confidence values (from 0.1 to 1).
The Apriori association rules between demographic factors and enrollment-related factors, generated using min-supp = 0.3 and min-conf = 0.9 and evaluated with the lift measure, are shown in Table 5.
Table 5 shows 10 association rules. Six rules relate demographic factors to one another, and four rules relate demographic factors to enrollment factors. The numbers within each rule show how many records contain the corresponding itemset. For example, in rule 1, Age = 3 (406) → Academic level = 1 (391), 406 is the total number of records with Age = 3, and 391 of them also have Academic level = 1.
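As a quick check of rule 1 using only the two counts quoted above (406 records with Age = 3, of which 391 also have Academic level = 1), and assuming that all 1000 sampled records are in the mined data set, the confidence and support of the rule work out as:

$$\mathrm{conf} = \frac{391}{406} \approx 0.963 \geq 0.9, \qquad \mathrm{supp} = \frac{391}{1000} = 0.391 \geq 0.3$$

Both values clear the selected min-conf and min-supp thresholds. The lift cannot be recomputed from these two counts alone, because it also requires the overall number of records with Academic level = 1.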
The Lift column is used to assess the rules and to specify the type of each association rule. The lift values of rules 1 and 2 are higher than one, which means that the factors are positively correlated. In a positive correlation, the antecedent and consequent factors move in the same direction. The lift values of rules 3, 4, 5, and 6 are equal to one, which means that the factors are independent: the probability of occurrence of the antecedent and that of the consequent do not depend on each other. The lift values of rules 7, 8, 9, and 10 are less than one, which means that the factors are negatively correlated despite high confidence. A negative correlation is an inverse relationship between the two factors.
5.2. Extraction of Association Rules between Enrollment Related Factors
To determine the minimum support value, association rules were generated by trying different values of min-supp and min-conf; the number of rules generated is shown in Table 6. The most appropriate values found are min-supp = 0.2 and min-conf = 0.6.
Table 5. Association rules of demographic factors and enrollment factors.
Table 6. The number of association rules extracted for support and confidence values (from 0.1 to 1).
To extract the association rules, the candidate sets, called large itemsets (L), are generated from the enrollment-related factors. The size of L(1) is 73 items, and the size of the large itemsets L(2) is 29 items.
The Apriori association rules among enrollment-related factors were extracted using min-supp = 0.2, min-conf = 0.6, and lift > 1, as shown in Table 7.
6. Results and Discussion
Table 5 shows that only 7 factors influence students’ enrollment: 4 of them are demographic factors, and 3 are enrollment-related factors. These factors are shown in Table 8.
Table 7 shows that the Student and society related factors category did not appear in all association rules, Educational Institution related factors appeared in five rules, Admission related factors appeared in four rules, and employment-related factors appeared in one rule. These factors are shown in Table 9.
In addition, we found that the two factors reputation of universities and education facilities have a positive influence on the factor degree approved by the ministry. Besides this positive correlation, there is a correlation between the factors good position, retention, degree approved by the ministry, aptitude test, and feasibility, and the factor admission requirements.
Table 7. Association rules found between enrollment-related factors.
Table 8. The demographic and enrollment-related factors influencing students’ decision to enroll.
Table 9. The enrollment related factors influencing student’s decision.
7. Conclusions
This paper shows that only 4 of the 6 demographic factors, namely age, gender, type of institution, and academic level, have a strong influence on students’ enrollment. Moreover, the most important categories of factors that affect Sudanese students’ decisions to enroll in higher education institutions are the Educational Institution and Admission categories of enrollment-related factors.
The findings can be used by higher education institutions as a guideline in offering students the appropriate knowledge about enrollment related factors.