A Comparative Review of the CEFR and CET4 Writing Assessment with Insights from Task Complexity Theories

  • Changlin Li, Centre for Modern Languages, Universiti Malaysia Pahang Al-Sultan Abdullah, 26000 Pekan, Pahang, Malaysia
  • Nik Aloesnita Nik Mohd Alwi, Centre for Modern Languages, Universiti Malaysia Pahang Al-Sultan Abdullah, 26000 Pekan, Pahang, Malaysia
  • Mohammad Musab Azmat Ali, Centre for Modern Languages, Universiti Malaysia Pahang Al-Sultan Abdullah, 26000 Pekan, Pahang, Malaysia
Keywords: CEFR, CET4, Writing assessment, Comparison

Abstract

The CEFR level descriptors are applied globally in language assessment and have already been aligned with international tests such as IELTS and TOEFL. Meanwhile, the CET4 is essential to proficiency assessment in language learning and teaching in China. Building on previous research, this study examines the relationship between the CEFR level descriptors and the CET4 writing rubrics, focusing mainly on essay writing assessment over the past decade. Despite the CET4's broad use in universities, its comparison with the CEFR level descriptors remains underexplored. To address this gap, the study reviews task complexity theories, automated and manual scoring systems, and recent studies on essay writing. Findings indicate that CET4 writing scores correspond roughly to CEFR levels A1–B2, though comparisons with the higher proficiency levels (C1–C2) remain inconsistent. While automated scoring systems reliably evaluate basic linguistic dimensions, they struggle with higher-order aspects of the CEFR and CET4 writing assessments, such as description, argumentation, task relevance, and clarity, and they lack the capacity to capture the nuanced features of advanced writing. These findings underscore the necessity of human evaluation, particularly in assessing essay content, while highlighting opportunities to refine grading methodologies and task design to enhance essay writing instruction.

Published
2025-03-29
How to Cite
Li, C., Nik Mohd Alwi, N. A. and Azmat Ali, M. M. (2025) “A Comparative Review of the CEFR and CET4 Writing Assessment with Insights from Task Complexity Theories”, Malaysian Journal of Social Sciences and Humanities (MJSSH), 10(3), p. e003251. doi: 10.47405/mjssh.v10i3.3251.
Section
Articles