A Comparative Review of the CEFR and CET4 Writing Assessment with Insights from Task Complexity Theories
Abstract
The CEFR level descriptors are applied globally in language assessment and have already been aligned with international tests such as IELTS and TOEFL. Meanwhile, the CET4 is central to the assessment of language learning and teaching proficiency in China. Building on previous research, this study examines the relationship between the CEFR level descriptors and the CET4 writing rubrics, focusing mainly on essay writing assessment over the past decade. Despite the broad use of the CET4 in universities, its comparison with the CEFR level descriptors remains underexplored. Against this background, the study reviews task complexity theories, automated and manual scoring systems, and recent studies on essay writing. Findings indicate that CET4 writing scores correspond roughly to CEFR levels A1–B2, though comparisons with the higher proficiency levels (C1–C2) remain inconsistent. While automated scoring systems reliably evaluate basic linguistic dimensions, they struggle to assess higher-order aspects, such as description, argumentation, task relevance, and clarity, under the CEFR and CET4 writing assessments. Furthermore, automated scoring systems lack the capacity to capture the nuanced features of advanced writing. These findings underscore the necessity of human evaluation, particularly in assessing essay content, while highlighting opportunities to refine grading methodologies and task design to enhance essay writing instruction.