Prediction Model of School Drop Out Factors Using Classification Techniques in Selangor

  • Siti Rafidah Sariman Fakulti Pengajian Pendidikan, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia.
  • Habibah Ab Jalil Fakulti Pengajian Pendidikan, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia.
  • Erzam Marlisah Fakulti Sains Komputer Dan Teknologi Maklumat, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia.
Keywords: Predicting drop out, Classification, Data mining


Malaysia has been struggling to sustain the number of students graduating from school, and an increasing number are leaving early, which poses significant concerns for the future generation. Several factors contribute to this issue, such as economic constraints, geographical challenges, transportation problems, and sociocultural norms. This paper uses a data mining approach to identify the attributes that lead to school dropout and to determine which predictive model forecasts dropout with the highest accuracy. Data mining approaches have proven effective in predicting students at risk of dropping out in general education. Nevertheless, there is a shortage of data mining studies on student attrition in public schools in Malaysia. The study used student and school datasets with consent from the respective departments in the Ministry of Education. The data cover 2,482 students across various primary schools in Selangor, with 22 attributes initially collected. After feature selection using InfoGainAttributeEval, 12 features remained, including the class attribute Status_DO. The collected data encompass student demographics, academic performance, and socioeconomic background. The experiments used Decision Trees (J48), Naïve Bayes, and Random Forest, and all attributes from the dataset were tested with the classification techniques available in WEKA. The results showed that Random Forest achieved the highest accuracy, 79.5729%, in predicting student dropout, which supports the use of this approach as a decision support tool.
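The feature selection step named in the abstract ranks attributes by information gain with respect to the class attribute. The following is a minimal Python sketch of that idea only. It is not WEKA's InfoGainAttributeEval implementation and not the authors' code. The toy records and the attribute names `attendance` and `gender` are hypothetical, and only `Status_DO` is taken from the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(rows, target):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    attributes = [a for a in rows[0] if a != target]
    scores = {a: information_gain(rows, a, target) for a in attributes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records; these are NOT values from the study's dataset.
STUDENTS = [
    {"attendance": "low", "gender": "M", "Status_DO": "yes"},
    {"attendance": "low", "gender": "F", "Status_DO": "yes"},
    {"attendance": "high", "gender": "M", "Status_DO": "no"},
    {"attendance": "high", "gender": "F", "Status_DO": "no"},
]

if __name__ == "__main__":
    for name, gain in rank_attributes(STUDENTS, "Status_DO"):
        print(f"{name}: {gain:.4f}")
```

In this toy data, `attendance` separates the two classes perfectly while `gender` carries no information about `Status_DO`, so a ranking-based selection would keep the former and could discard the latter. The sketch handles nominal attributes only; numeric attributes would need to be discretized first.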





How to Cite
Sariman, S. R., Ab Jalil, H. and Marlisah, E. (2024) “Prediction Model of School Drop Out Factors Using Classification Techniques in Selangor”, Malaysian Journal of Social Sciences and Humanities (MJSSH), 9(6), p. e002867. doi: 10.47405/mjssh.v9i6.2867.