Logistic Regression Model for the Academic Performance of First-Year Medical Students in the Biomedical Area

In medical education, identifying the variables that affect students' academic performance is highly important, because support programs can then be implemented to prevent dropouts and failing scores. Several studies have confirmed the relationship between student performance during the first months at college and performance afterwards; nevertheless, every medical school has its particularities. The objective is to develop a logistic regression model to predict first-year medical students' performance using academic, psychological and vocational variables as well as learning and self-motivation strategies. The study is observational, cross-sectional and descriptive. The study group consisted of 1205 first-year medical students. Participants completed questionnaires dealing with general knowledge, psychosocial factors and factors associated with career choice, as well as an inventory of study and self-regulation strategies. Participation was fully voluntary and the results were used under a confidentiality agreement (NDA). The multiple regression model considered academic background, percentage of correct answers in general knowledge, perceived efficiency (categorical, with 3 levels), aptitude, interest in biological sciences and health, adherence to established regulations, drive and the pursuit of social prestige as covariables. We conclude that a logistic regression model to predict academic performance, particularly for medical students at academic risk in the biomedical area, is an efficient tool, as it allows valid conclusions for appropriate decision making.


Introduction
The logistic regression model aims to describe the relationship between a binary outcome variable and one or more independent variables that may be continuous, categorical, or binary (Hosmer, Stanley, & Rodney, 2000).
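For reference, the standard form of the model can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the authors: $Y$ is the binary outcome, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \dots, \beta_k$ are the coefficients to be estimated, and $\pi(\mathbf{x})$ is the probability that $Y = 1$.

```latex
\pi(\mathbf{x}) = P(Y = 1 \mid \mathbf{x})
  = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}
         {1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)},
\qquad
\log \frac{\pi(\mathbf{x})}{1 - \pi(\mathbf{x})}
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k .
```

The second expression shows that the model is linear on the log-odds scale, which is why each coefficient is usually read as a change in the log-odds of the outcome per unit change in the corresponding variable.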
Owing to its versatility, the logistic regression model has become a useful tool in business, medicine, epidemiology, sociology and marketing, among other fields, as it can predict, for example, the presence of a disease, the granting of credit to a specific individual, or the success or failure of a business.
In the field of medical education, identifying the variables that affect students' academic performance is highly important, because support programs can then be implemented to prevent dropouts and failing scores. A study carried out in the United Kingdom found a 10% dropout rate among first-year medical students. When different cohorts were analyzed (Arulampalam, Naylor, & Smith, 2004), an association was observed between the risk of academic failure and the level of knowledge in biology, chemistry and physics, as well as gender (Oliver, Smith, Winston, Geranmayeh, Behjati, Kingston, & Pollara, 2010).
Research on the prediction of academic success in medical students has generated a substantial body of knowledge on the factors predicting school performance (Ferguson, James, & Madeley, 2002; O'Neill, Wallstedt, Eika, & Hartvigsen, 2011). This allows medical schools to offer prompt and tailored support for students at risk of academic failure.
Several studies have confirmed the relationship between student performance during the first months at college and performance afterwards (Horn & Carroll, 1998; Murtaugh, Burns, & Schuster, 1999; Winston, van der Vleuten, & Scherpbier, 2014); nevertheless, every medical school has its particularities. In the Faculty of Medicine at UNAM, there is a high failure rate in the biomedical area during the first year of studies and, consequently, an increased number of repeaters in the following academic year.
In a previous study, we showed that emotional variables had the highest correlation with the academic performance of first-year medical students (Urrutia-Aguilar, Ortiz, Fouilloux, Ponce, & Guevara-Guzmán, 2014). However, we consider that this type of educational research should include other types of variables.
To meet this need, in this study we use a logistic regression model to predict medical students' performance in their first year from academic, psychological and vocational variables as well as learning and motivational strategies.

Methodology
The population under study included 1205 first-year medical students at Universidad Nacional Autónoma de México (UNAM), 2013-2014. The academic curriculum consists of a six-year training program: the first two years are biomedical studies, the following three years are clinical clerkships, and the last year is social service.
The predictor variables considered in this research work were categorized into four main groups: 1) general knowledge; 2) psychosocial factors; 3) factors associated with career choice; and 4) an inventory of study and self-motivation strategies. The outcome variable for our model was academic performance, coded as "0" if the student had failed at least one subject of the biomedical area taken during the first year and "1" if the student had successfully completed all the coursework.
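The coding rule for the outcome variable can be sketched as follows. The authors report using R; this Python snippet is only an illustration, and the function name, student identifiers and subject labels are hypothetical.

```python
def code_outcome(passed_by_subject):
    """Code academic performance: 1 if every subject was passed, 0 otherwise."""
    if not passed_by_subject:
        # Assumption: a student without any grade cannot be coded.
        raise ValueError("at least one subject result is required")
    return 1 if all(passed_by_subject.values()) else 0


# Hypothetical records: subject label -> whether the student passed it.
students = {
    "s1": {"subject_a": True, "subject_b": True, "subject_c": True},
    "s2": {"subject_a": True, "subject_b": False, "subject_c": True},
}

outcomes = {sid: code_outcome(record) for sid, record in students.items()}
print(outcomes)
```

Under this rule a single failed subject is enough to place a student in the "0" group, so the outcome identifies students at academic risk rather than students with a low average.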
The study was observational, cross-sectional, and descriptive. In order to assess psychological characteristics, including signs of depression, two different questionnaires were administered to all students at the same place and time after they signed an informed consent form: 1) the Beck Depression Inventory (Beck, Steer, & Garbin, 1988; Jurado, Villegas, Mendez, Rodriguez et al., 1998); and 2) the Symptom Checklist SCL-90. The Beck Depression Inventory can be self-administered and has 21 items to assess depression symptoms in adolescents and adults. The total score varies from 0 to 63, with a score ≥ 13 identifying a probable case of depression. The Symptom Checklist SCL-90 is a screening test to identify symptoms of different psychopathologies; it has 90 items, with a Cronbach's alpha above 0.713 for all subscales (Cruz, López, Blas, González et al., 2005). For this study, the cutoff considered for the depression subscale was a score greater than or equal to 1.514.
Study strategies and self-motivation were categorized using a 91-item multiple-choice questionnaire that assessed how students acquire, organize, remember and apply what they learn. The questionnaire also assessed the way they evaluate, plan and control their study strategies, with a Cronbach's alpha of 0.97 (Castañeda, Pineda, Gutiérrez, Romero, & Peñalosa, 2010). Students self-assessed their study strategies and self-motivation in four main aspects: 1) data gathering; 2) methods of recovering learned data when facing tasks and exams; 3) methods of processing, which may be convergent or divergent; and 4) metacognitive and metamotivational self-regulation.
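Since the reliability of the instruments is reported through Cronbach's alpha, a minimal sketch of how that coefficient is computed may be helpful. It uses the usual formula based on item variances and the variance of the total score; the function name and the response data are hypothetical and unrelated to the questionnaires used in the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers of five respondents to a three-item scale.
responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values close to 1 indicate that the items vary together, which is the sense in which the reported coefficients describe internal consistency.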
To assess student performance, scores were gathered during the academic year. Data were analyzed using the R statistical software. The statistical analysis was descriptive, inferential and multivariate.
Inclusion criteria: Data from students who had completed all the surveys and had at least one grade were included. Participation was fully voluntary and the results were used under a confidentiality agreement (NDA).

Results
A total of 925 students met the inclusion criteria: 37% men and 63% women; 34.7% of the students came from CCH, 52% from ENP and 13.3% from "other". The database contained 72 variables.
Figure 1 shows that coming from CCH is positively associated with failing scores, whereas coming from "other" schools is negatively associated with failure.
Two different models with their corresponding covariables were developed. One included all the variables (i.e., academic background, percentage of correct answers in general knowledge, perceived efficiency, aptitude, interest in biological sciences and health, adherence to established regulations, drive and the pursuit of social prestige) and the second measured each variable independently.
High school is the type of academic background at entry. Perceived efficiency is the factor that considers one's conviction that a task can be carried out successfully, that is, that one can perform the behavior needed to produce results.
The aptitude test is described as the student's ability in reasoning (data analysis and synthesis), mechanical aptitude (identification of physics principles) and assembly of shapes (spatial perception).
Figure 1. Academic success and high school of origin.
The motivational factor is understood as an orientation towards achievement, that is, setting high goals and striving to reach them, and, as a skill, as the interest in achieving good academic performance.
The social prestige factor assesses the behaviors that the subject shows in order to feel accepted by the reference social group, that is, the sense of belonging.
As our two models are nested (model one contains all the covariables of model two), we compared them through the difference of deviances. The difference was statistically significant (p ≤ 0.02), and thus we concluded that the model with the most parameters, model one, contains relevant information that is not contained in model two and is the better predictor of academic failure or success.
In the figures, the abscissa indicates the test score, for the probability of success explained by the knowledge test, the skills test, the biological sciences and health factor and the motivators aspect.
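The comparison of nested models through the difference of deviances can be sketched as below. This is an illustrative Python version, not the authors' R code: the deviance values and the difference in the number of parameters are hypothetical, and the chi-square tail probability is computed with a simple series for the incomplete gamma function so that the example needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def deviance_test(dev_reduced, dev_full, df_diff):
    """Likelihood-ratio test: deviance of the smaller model minus deviance of the larger one."""
    g = dev_reduced - dev_full
    return g, chi2_sf(g, df_diff)


# Hypothetical deviances; the larger model has 4 additional parameters.
g_stat, p_value = deviance_test(dev_reduced=1050.0, dev_full=1040.0, df_diff=4)
print(round(g_stat, 2), round(p_value, 4))
```

A small p-value indicates that the additional parameters of the larger model reduce the deviance by more than would be expected by chance.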
The Hosmer-Lemeshow test yielded p ≤ 0.17. As this result is not significant at the 0.05 level, there is no evidence of lack of fit, which indicates that the model fits the data adequately.
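A minimal sketch of the Hosmer-Lemeshow statistic is given below for readers unfamiliar with it. It is an assumption-laden illustration rather than the procedure actually run in R: observations are split into equal-sized groups ordered by predicted probability, the data are hypothetical, and the p-value uses a closed form that is valid only when the degrees of freedom (groups minus 2) are even.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; closed form valid for positive even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(y_true, p_hat, groups=10):
    """Return (statistic, df, p-value) comparing observed and expected counts per group."""
    pairs = sorted(zip(p_hat, y_true), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        obs1 = sum(y for _, y in chunk)      # observed successes
        exp1 = sum(p for p, _ in chunk)      # expected successes
        obs0 = len(chunk) - obs1             # observed failures
        exp0 = len(chunk) - exp1             # expected failures
        # Assumes 0 < exp1 < len(chunk); degenerate groups would need merging.
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Hypothetical, perfectly calibrated data: four groups of five students each.
p_hat = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y_true = [1, 0, 0, 0, 0,  1, 1, 0, 0, 0,  1, 1, 1, 0, 0,  1, 1, 1, 1, 0]
hl_stat, hl_df, hl_p = hosmer_lemeshow(y_true, p_hat, groups=4)
print(round(hl_stat, 6), hl_df, round(hl_p, 4))
```

Large p-values mean that the observed counts are close to those the model expects, so the test is read in the opposite direction from most significance tests.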
Regarding the predictive power of the model, a response variable of "1" means that the student received a passing average in the subjects considered in this study, and "0" means that he/she failed at least one subject. In this context, it is of great value to have a higher specificity (zeros correctly classified as zeros) rather than a higher sensitivity (ones correctly classified as ones), as our main objective is to predict the population under academic risk (Table 1). Considering π₀ = 0.5 as the cutoff point, sensitivity was 0.73, specificity 0.68 and accuracy 0.71. These values show that the predictive power of the model is good. When the ROC curve was created (Figure 7), we found that the area under the curve was 0.7736, which was considered adequate.
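The classification measures discussed above can be illustrated with a short sketch. The data are hypothetical and do not reproduce the reported values; the functions assume that a predicted probability at or above the cutoff is classified as "1", and the area under the ROC curve is computed as the proportion of (passing, failing) pairs in which the passing student has the higher predicted probability.

```python
def classification_metrics(y_true, p_hat, cutoff=0.5):
    """Sensitivity, specificity and accuracy at a given probability cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    return {
        "sensitivity": tp / (tp + fn),   # ones classified as ones
        "specificity": tn / (tn + fp),   # zeros classified as zeros
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, p_hat):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities of success.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
p_hat = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
metrics = classification_metrics(y_true, p_hat, cutoff=0.5)
area = auc(y_true, p_hat)
print(metrics, area)
```

Because the goal is to detect students at risk, a lower cutoff than 0.5 could be explored to trade sensitivity for specificity; the appropriate value depends on the cost assigned to each type of error.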

Discussion
The use of a logistic regression model to predict academic performance among first-year medical students, in particular to identify those at academic risk, is an efficient tool, as it allows valid conclusions for appropriate decision making.
It was not surprising that the variables with a higher weight of prediction had been
In this study, we showed that students' self-perception of their own ability to perform the required tasks of a course (perceived efficiency) is a factor that predicts academic performance, and it is undoubtedly a cognitive variable that determines learning in the classroom (Castañeda, Pineda, Gutiérrez, Romero, & Peñalosa, 2010). The concept of perceived efficiency is linked to Social Cognitive Theory (Bandura, 1986), which establishes that learning is produced through a reciprocal interaction between environmental, behavioral and cognitive influences.
The self-perception students have of their own abilities influences their choice of tasks, the goals they set, their effort, and their persistence in working towards those goals. Therefore, professors must consider the importance of reinforcing students' self-esteem when they learn new concepts. Self-perception plays a significant role in the academic performance of medical students, as demonstrated through the implementation of a one-term program geared towards developing their metacognitive and self-regulation abilities (Winston, van der Vleuten, & Scherpbier, 2014; Winston, Van der Vleuten, & Scherpbier, 2010).

Conclusions
The logistic regression model in this study identified variables that predicted academic performance among first-year medical students. These findings have practical implications for both the development of in-person or online/distance preparatory courses designed for high school students matriculating into medical school and the implementation of corrective measures during the first year of medical school. These are based on the need to develop didactic materials that will help correct academic deficiencies. Additionally, teacher training courses could be adopted to address teaching strategies that help students improve their self-perception.
We are confident that the model identified in this study can be used across national medical schools to identify students at risk of academic failure and to provide efficient strategies to ensure that students complete their medical training.
A limitation of this study is that departmental exams are mainly structured to assess declarative content. Future research should undertake validation of logistic regression models.

The study was approved by the Ethics and Research Commissions of the Faculty of Medicine at UNAM (reference number 061/2014). The type of academic background was considered and classified into Colegio de Ciencias y Humanidades (CCH) and Escuela Nacional Preparatoria (ENP), both of which are part of UNAM; any other school system, including foreign schools, was named "Other". The scores obtained in the diagnostic exam (designed and validated by the General Directorate of Orientation and Educational Services, UNAM) were gathered. This is a 120-item multiple-choice exam on general knowledge in the following areas: Physics, Chemistry, Math, Biology, Spanish, Universal History, History of Mexico and Geography. The instrument of career choice factors (designed and validated by the General Directorate of Orientation and Educational Services, UNAM) was applied. Its objective was to assess 44 influential factors within aptitude, interest and expectations regarding career choice.
The scores used to assess student performance were gathered from August 2014 to May 2015 in the following subjects: Cell Biology and Medical Histology, Biochemistry and Molecular Biology, and Human Embryology. It is important to highlight that the departmental exams were administered at the same time to all students and each contained 50 to 70 items chosen by a group of experts from an item bank (Cronbach's alpha between 0.87 and 0.92; level of difficulty between 30 and 70; positive discrimination between 70 and 90). The assessment was carried out through departmental exams (50%) and the professor's criterion (50%). A score of 8.5 was considered a passing score.

Figures 2-6 show the probability of success according to different values of the continuous variables in the model. It can be observed that the probability of success increases with the score obtained in general knowledge, with the aptitude test score, with the student's interest in biological sciences and health, and with the motivational factor.

Figure 3. Probability of success explained by the skills test.

Figure 4. Probability of success explained by the factor biological sciences and health.

Figure 5. Probability of success explained by motivators.

Figure 6. Probability of success explained by the social prestige factor.
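Curves such as those in the figures above relate a continuous score to a probability of success through the logistic function. The following sketch shows the calculation for a single variable; the intercept and slope are hypothetical values chosen for illustration and are not the coefficients estimated in this study.

```python
import math


def success_probability(score, intercept, slope):
    """Probability of success from a one-variable logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


# Hypothetical coefficients: the probability reaches 0.5 at a score of 60.
intercept, slope = -3.0, 0.05
curve = [(score, success_probability(score, intercept, slope))
         for score in range(0, 101, 20)]
for score, prob in curve:
    print(score, round(prob, 3))
```

With a positive slope the probability rises monotonically with the score, but the rise is S-shaped rather than linear: the same increase in score changes the probability most near the middle of the curve and least at the extremes.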