Factors Influencing Students' Decisions to Enroll in Sudanese Higher Education Institutions

There is a growing body of literature that recognizes the importance of data mining in educational systems, and this recognition has made educational data mining a new and growing research community. One way to achieve the highest level of quality in a higher education system is to discover knowledge from educational data such as students' enrollment data. Many mining tools have been proposed that aim to discover interesting correlations, frequent patterns, associations, or causal structures among sets of items in educational data sets. One of the most widely used tools is association rules. In this paper, the Apriori algorithm is used to generate association rules that reveal the importance of, and the correlation between, the factors that influence students' decisions to enroll in higher education institutions in Sudan. The algorithm is applied to a student enrollment data set that was created using a questionnaire and a sample of 800 students enrolled in governmental and private sector universities. This paper classifies the factors that influence enrollment into students' demographic factors and four categories of enrollment related factors (Student and Society, Educational Institution, Admission, and Employment related factors), and determines the most influential factors in students' decisions to enroll in Sudanese universities. The analysis shows that the Educational Institution related factors (50%) and Admission related factors (40%) strongly influence students' enrollment decisions, while the Employment related factors (10%) and Student and Society related factors (0%) have a weak influence. Of the 14 Educational Institution related factors, those with a high impact are reputation, diversity of study, quality of education, education facilities, and feasibility.


Introduction
Enrollment management is one of the most important phases of the education process. Its origins date back to the late nineteenth century, when Harvard University founded the board of freshman advisors in 1889. The board's purpose was to establish orientation, provide advising and counseling, and develop social events for freshmen [1]. Since then, enrollment management has attracted increasing attention from scholars in educational data mining, particularly in computer-related disciplines.
Data mining is a field of computer science that focuses on detecting patterns and hidden knowledge in very large data sets and presenting that information in a logical form. Artificial intelligence, machine learning, and statistics are some of the techniques applied in data mining [2].
One emerging interdisciplinary study area is educational data mining, a new and growing research community [3]. Educational data mining is concerned with developing and using methods that can explore or extract interesting information from educational data [4]. One of these methods is association rules.
Association rule mining is an unsupervised learning method used for pattern discovery, which in turn may reveal new knowledge. An association rule is considered important for decision making if its support and confidence are at least equal to minimum support and confidence thresholds defined by the user. Association rule mining was first introduced by Agrawal et al. [5].
There are various association rule mining algorithms, such as AIS, SETM, Apriori, AprioriTid, AprioriHybrid, and FP-growth [6]. The Apriori algorithm is the best-known association rule algorithm for finding frequent itemsets with candidate generation [7].
Using association rules to discover new knowledge in educational data is a common approach [4] [8] [9]. More specifically, many studies apply association rule algorithms to enrollment data sets [10] [11] [12]. This paper applies the Apriori algorithm to a student enrollment data set that was created using a questionnaire and a sample of students enrolled in governmental and private sector Sudanese universities. The Apriori algorithm was selected because it is the most frequently used association rule algorithm.
The aim is to discover the most influential category among the enrollment related factor categories, and further to discover the essential factors within each influential category. The paper also shows the correlation between factors within the different categories that influence students' decisions to enroll in Sudanese universities. The extracted knowledge can be of great value and can offer helpful, constructive recommendations to academic planners seeking to improve enrollment in their institutions. The rest of the paper is organized as follows: Section 2 discusses related work that applies association rules as a mining tool to enrollment data sets. Section 3 reviews association rules and the Apriori algorithm. Section 4 explains how the enrollment data set for students enrolled in Sudanese universities was created. Section 5 presents how association rule mining is applied to the Sudanese universities enrollment data set. Section 6 presents the results and discussion, and Section 7 draws the conclusion.

Related Work
Many researchers have studied the enrollment related factors that influence students' decisions to enroll in higher education institutions. Recently, there has been growing interest in extracting various relations from educational data using association rules as the data mining tool.
Many researchers have examined the influence of student and society related factors on the enrollment decision. Studies show that students' educational aspirations are positively related to Higher Educational Institution (HEI) selection decisions [8] [9] [10]. Some studies compare students' aptitude and ability as factors in the selection of an HEI [11] [12]. Others recognize the critical role played by students' guardians, family, and friends in directing students to enroll in a specific HEI [12] [13] [14]. Students' interests, motivations, and occupational plans, their class level, socioeconomic level, and ethnicity, and institutional characteristics have been examined for their effects on an institution's total enrollment [15] [16] [17].
Students' selection of colleges depends on several criteria, including academic quality, facilities, campus surroundings, and personal characteristics [12] [18].
Income also affects students' choices along the public-private education divide. College location is a significant predictor of HEI selection, and a visit to the university campus was found to be an important factor [19].
The reputation of an HEI has a significant influence on students' enrollment decisions. Many researchers have examined this factor and found that engaging in international partnerships attracts larger numbers of international students [20] [21].
Recent evidence suggests a need for increasing diversification at the program level and for adopting more general programs based on the diversity of the student population and on multiple regional, social, and economic needs [22]. Also, the institutional image emphasized the significance of building [...] nationally or internationally; the degree flexibility, the diversity of courses, and the flexibility of entry requirements have influenced enrollment [18]. The marketing mix, marketing efforts, channels, and advertisements are found to be important factors that influence students' college selection [25] [26]. Financial aid also induces more enrollments in colleges than other factors [27] [28].
Finding a job has recently become a central issue for students and their families. The decision to enroll in a specific educational institution is affected by many employment related factors, and employment opportunities are a stronger predictor of enrollment decisions [19] [29].
Association rules have been used to discover relations between admission system attributes at King Abdul-Aziz University (KAU) [30], students' preliminary knowledge [31], specialties and students' interests [32], the factors that affect postgraduate study and assessment [33], and courses and failed students [34].
Association rules, as a data mining technique, are used to investigate the correlation between different enrollment factors. Some studies have applied the Apriori algorithm to enrollment data to extract the behavior of low- and high-income students [35], to examine the quality of talent training and enhance the overall competitiveness of colleges and universities [36], and to analyze student performance for a certain outcome (Pass or Fail) [37].
This paper draws on previous studies to determine the different factors influencing the enrollment decision, and then provides a new categorization of these factors. To determine the most influential factors and the correlations between them, the paper uses association rules in a way similar to [4] [38] [39]. The paper deals with an enrollment data set, as in [30] [40] [41].
The paper is unique in that it defines a categorization of enrollment factors and uses the Apriori algorithm to extract association rules that determine which category plays the largest role in students' enrollment decisions and which factors within this category are most important.

Association Rules and the Apriori Algorithm
Association rule mining is a descriptive data mining technique for finding patterns, associations, and correlations among sets of items in a database. A standard association rule is a rule of the form X→Y, which means that if X is true for an instance in a database, then Y is true for the same instance, with a certain level of significance [5] [13].
Typically, an association rule is called strong if it satisfies both a minimum support (min-supp) threshold and a minimum confidence (min-conf) threshold, both of which are determined by the user [2].
The minimum support is defined as the minimum percentage of occurrences of the item or itemset in the database, while the minimum confidence is the corresponding lower bound on the confidence of a rule. The support of an itemset A, supp(A), is the proportion of transactions in the database in which A appears. It signifies the popularity of an itemset.
The confidence of a rule A→B determines how frequently the items in B appear in the transactions that contain A; it is the ratio of the number of transactions that include all items in the association rule (both A and B) to the number of transactions that include A, that is, conf(A→B) = supp(A ∪ B) / supp(A).
The strength of correlation was measured from the lift value as follows [42]: lift(A→B) = conf(A→B) / supp(B) = supp(A ∪ B) / (supp(A) × supp(B)). A lift greater than 1 indicates a positive correlation between A and B.
The Apriori algorithm was proposed in 1994 [10]. The algorithm identifies the frequent items in the database and extends them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The Apriori algorithm decides which association rules to retain using two parameters: the minimum support threshold and the minimum confidence threshold. The frequent itemsets found by Apriori can be used to determine association rules that highlight general trends in the database [15].
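As an illustration only, the support, confidence, and lift measures discussed in this section can be computed on a small set of transactions. The following Python sketch is not the implementation used in this study: the toy transactions, the factor names, and the helper names support, confidence, and lift are all assumptions introduced here.

```python
# Hypothetical toy data: each transaction is the set of factors one
# respondent marked as influential (not taken from the study's data set).
transactions = [
    frozenset({"reputation", "approved_degree", "education_facilities"}),
    frozenset({"reputation", "approved_degree"}),
    frozenset({"reputation", "quality_of_education"}),
    frozenset({"approved_degree", "education_facilities"}),
    frozenset({"location"}),
]


def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A -> B) = supp(A u B) / supp(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """lift(A -> B) = conf(A -> B) / supp(B); values above 1 suggest a positive correlation."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


rule_conf = confidence({"reputation"}, {"approved_degree"}, transactions)
rule_lift = lift({"reputation"}, {"approved_degree"}, transactions)
```

For this toy data, supp({reputation}) is 3/5, the confidence of reputation→approved_degree is 2/3, and the lift is 10/9, which is slightly above 1.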
The Apriori algorithm uses a level-wise search, in which k-itemsets (itemsets that contain k items) are used to explore (k + 1)-itemsets, to mine frequent itemsets from a transactional database for Boolean association rules. In this algorithm, frequent subsets are extended one item at a time, a step known as the candidate generation process. Groups of candidates are then tested against the data [6].
Intelligent Information Management
To count candidate itemsets efficiently, Apriori uses a breadth-first search and a hash tree structure [6].
Mining association rules from a database consists of finding all rules that meet the user-specified support and confidence thresholds. The problem of mining association rules can be stated as the algorithm below.
Algorithm:
1) Find all sets of items that occur with a frequency greater than or equal to the user-specified threshold support, s.
2) Generate the desired rules from the large itemsets, keeping those that meet the user-specified threshold confidence, α.
The Apriori algorithm starts by scanning the data set to find all the items and count their support as candidates of size 1 (i.e., C1), and it removes the infrequent items (count < min-supp).
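The level-wise search described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in this study: candidate supports are counted with a plain scan of the transactions rather than a hash tree, and the function name apriori_frequent_itemsets and the toy data in demo_transactions are hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_supp):
    """Return {itemset: support} for every itemset with support >= min_supp."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        # Plain scan of all transactions (the hash tree is omitted for brevity).
        return sum(1 for t in transactions if itemset <= t) / n

    # C1: every distinct item is a candidate of size 1; infrequent items are removed.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = supp(frozenset([item]))
        if s >= min_supp:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        # Candidate generation: join frequent k-itemsets into (k + 1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Pruning: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        # Test the surviving candidates against the data.
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_supp:
                current[c] = s
        k += 1
    return frequent


# Hypothetical toy data, not taken from the study's data set.
demo_transactions = [
    {"reputation", "approved_degree", "education_facilities"},
    {"reputation", "approved_degree"},
    {"reputation", "education_facilities"},
    {"approved_degree", "education_facilities"},
    {"reputation", "approved_degree", "education_facilities"},
]
demo_frequent = apriori_frequent_itemsets(demo_transactions, min_supp=0.5)
```

With min_supp = 0.5, the three single factors and the three pairs are frequent, while the three-factor itemset (support 0.4) is discarded.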

The Enrollment Dataset
We classify the enrollment factors into four categories that contain 32 factors; Table 1 below shows these categories and the factors that constitute each category.
Table 1. Enrollment factor categories and the factors in each category.
Category (number of factors): Factors
Student and Society (7): Aspiration; Family/society motivation; Parent's occupation/qualification; Family income; Family social class; Proudness/academic prestige; Ethnic/religion
Educational Institution (14): Location; Reputation; Education facilities; International partnership; Diversity of study; Quality of education; Image/campus; Activity facilities; Social life; Postgraduate and institutional research; Systematic and organized; Academic assistance; Modern/feasibility; Campus visiting
Admission (7): Aptitude test; Education/tuition cost; Financial aid; Approved degree; Admission requirements; Educational institution representative and advertising; Retention
Employment (4): Good job opportunity; High income; Good position; Employed promotion and planning

Based on the categorization in Table 1, the enrollment data were collected via a questionnaire that was subjected to a rigorous assessment by experts and to the following statistical checks:
• Cronbach's alpha (α) was used to estimate the reliability coefficient of the questionnaire scale. Reliability means obtaining the same values when the measuring instrument is reused under the same circumstances [43]. The Cronbach's alpha value for most factors is above 70%.
• Moreover, the percentage for the main categories of enrollment related factors is 92%. These values show that the enrollment factor scale is a highly reliable and consistent measuring tool.
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy [44] returns values between 0 and 1. In this sample, the KMO values are between 0.8 and 1, which indicates that the sampling is adequate and that a factor analysis may be useful with this data set.
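The paper does not state how Cronbach's alpha was computed. As a hedged illustration, the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of questionnaire items, can be sketched in Python as follows. The function name cronbach_alpha and the example scores are assumptions introduced here, not data from the questionnaire.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`, a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of four respondents to a two-item scale.
example_scores = [[1, 2], [2, 1], [3, 4], [4, 3]]
example_alpha = cronbach_alpha(example_scores)
```

For the hypothetical scores above the function returns 0.75, which would be read as 75% in the notation used in this section.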

Applying Association Rule Mining to the Sudanese Universities Enrollment Dataset
To determine which enrollment factor category is the most influential, we apply the Apriori association rule algorithm with a suitable minimum support value to find all the frequent itemsets in the data set, and then extract the association rules.
The factors within the extracted rules define the set of most critical factors and thus the most influential category. In addition, the extracted rules determine the correlations between these factors.
Two types of relationships are of interest to this study: the first is the relationship between the demographic factors and the enrollment related factors, and the second is the relationship among the enrollment related factors themselves.
In extracting association rules, three steps have to be followed:
1) Determine the minimum support: A fundamental problem in Apriori is how to choose a minimum support (min-supp) value that yields interesting patterns. There is no easy way to determine the best min-supp threshold. In this paper, the minimum support is determined by trial and error.
2) Create the itemset list: In association rule mining, a frequent itemset is also called a large itemset. An itemset is said to be frequent if its support is at least the predefined minimum support.
3) Extract the association rules:
a) Generate the standard association rules from the frequent itemsets by using the Apriori algorithm.
b) Measure the association rules: each rule is evaluated based on the minimum support, the minimum confidence, and the correlation between itemsets A and B as given by the lift measure.
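Step 3 can be illustrated with a short Python sketch that derives rules from a table of frequent itemsets and keeps those that pass a confidence threshold and have a lift above 1. This is a sketch under stated assumptions rather than the procedure actually run in this study: the function name extract_rules and the supports in toy_frequent are hypothetical.

```python
from itertools import combinations


def extract_rules(frequent, min_conf, min_lift=1.0):
    """Derive rules A -> B from `frequent`, a {frozenset: support} table.

    The table must contain every frequent itemset together with all of its
    non-empty subsets, which is what a level-wise Apriori search produces.
    """
    rules = []
    for itemset, supp_ab in frequent.items():
        if len(itemset) < 2:
            continue
        # Split the itemset into every non-empty antecedent / consequent pair.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                conf = supp_ab / frequent[antecedent]
                lift = conf / frequent[consequent]
                if conf >= min_conf and lift > min_lift:
                    rules.append((antecedent, consequent, supp_ab, conf, lift))
    return rules


# Hypothetical supports, not taken from the study's tables.
toy_frequent = {
    frozenset({"reputation"}): 0.6,
    frozenset({"approved_degree"}): 0.5,
    frozenset({"reputation", "approved_degree"}): 0.4,
}
toy_rules = extract_rules(toy_frequent, min_conf=0.7)
```

With these hypothetical supports, only approved_degree→reputation is kept (confidence 0.8); the reverse rule has the same lift but a confidence of about 0.67, which falls below the 0.7 threshold.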

Extraction of Association Rules between Demographic Factors and Enrollment Related Factors
To determine a suitable minimum support value, we try different values of minimum support and minimum confidence and record the number of association rules that can be generated, as shown in Table 2.
To create the candidate set, called the large itemset L(1), the set is generated for demographic factors and enrollment related factors. The size of L(1) is 19 items, as shown in Table 3, and the size of the large itemset L(2) is 14 items, as shown in Table 4 below.
The common Apriori association rules between demographic factors and enrollment related factors, generated using min-supp = 0.3 and min-conf = 0.9 and evaluated using the lift measure, are shown in Table 5.
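The trial-and-error search over thresholds can be pictured as a small grid: for each pair of min-supp and min-conf values, count how many rules survive. The Python sketch below is illustrative only. For brevity it enumerates itemsets by brute force instead of running Apriori, which is feasible only for a handful of items, and the names count_rules, threshold_grid, and grid_transactions, as well as the toy data, are assumptions introduced here.

```python
from itertools import combinations


def count_rules(transactions, min_supp, min_conf):
    """Count rules A -> B whose itemset support and confidence meet the thresholds."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    count = 0
    # Brute-force enumeration of all itemsets with at least two items.
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            supp_ab = supp(frozenset(combo))
            if supp_ab < min_supp:
                continue
            # Every non-empty proper subset is tried as an antecedent.
            for r in range(1, size):
                for antecedent in combinations(combo, r):
                    if supp_ab / supp(frozenset(antecedent)) >= min_conf:
                        count += 1
    return count


def threshold_grid(transactions, supports, confidences):
    """Number of rules for every (min_supp, min_conf) pair."""
    return {
        (s, c): count_rules(transactions, s, c)
        for s in supports
        for c in confidences
    }


# Hypothetical toy data, not taken from the study's data set.
grid_transactions = [
    {"reputation", "approved_degree", "education_facilities"},
    {"reputation", "approved_degree"},
    {"reputation", "education_facilities"},
    {"approved_degree", "education_facilities"},
    {"reputation", "approved_degree", "education_facilities"},
]
grid = threshold_grid(grid_transactions, supports=[0.3, 0.5], confidences=[0.6, 0.7])
```

On this toy data, lowering min-supp from 0.5 to 0.3 at min-conf = 0.6 raises the rule count from 6 to 9, which is the kind of trade-off recorded when thresholds are tuned by trial and error.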

Extraction of Association Rules between Enrollment Related Factors
To determine the minimum support value, association rules were generated by trying different values of min-supp and min-conf; the number of rules generated is shown in Table 6. The most appropriate values found are min-supp = 0.2 and min-conf = 0.6.
The common Apriori association rules between enrollment related factors were extracted using min-supp = 0.2, min-conf = 0.6, and lift > 1, as shown in Table 7.
Table 5 shows that only 7 factors influence students' enrollment: 4 of them are demographic factors and 3 are enrollment related factors. These factors are shown in Table 8.
Among the rules between enrollment related factors, Educational Institution related factors appeared in five rules, Admission related factors appeared in four rules, and Employment related factors appeared in one rule. These factors are shown in Table 9.

Results and Discussion
In addition, we found a positive influence from the two factors reputation of universities and education facilities on the factor degree approved by the ministry. Besides this positive correlation, there is a correlation between the factors good position, retention, degree approved by the ministry, aptitude test, and feasibility and the factor admission requirements.
The findings can be used by higher education institutions as a guideline for offering students appropriate knowledge about enrollment related factors.