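Chinese title:	小学阅读认知诊断计算机化自适应测评的开发与验证
Author:	Li Yan (李燕)
Confidentiality level:	Public
Thesis language:	Chinese
Discipline code:	04020005
Discipline:	05 Psychometrics (040200)
Student type:	Doctoral
Degree:	Doctor of Education
Degree type:	Academic degree
Degree year:	2022
Campus:	Beijing campus
School:	Faculty of Psychology
Research area:	Cognitive diagnostic adaptive testing
First supervisor:	Liu Jia (刘嘉)
First supervisor's affiliation:	Currently Department of Psychology, Tsinghua University; formerly Faculty of Psychology, Beijing Normal University
Submission date:	2022-06-20
Defense date:	2022-06-20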
English title:	The development and validation of Chinese reading cognitive diagnostic computerized adaptive testing for primary students
Chinese keywords:	小学生 ; 阅读 ; 认知诊断计算机化自适应测评 ; 认知诊断评估 ; 测验等值 ; 测评开发与验证 ; 蒙特卡洛模拟技术
English keywords:	Primary students ; Reading ; Cognitive Diagnostic Computerized Adaptive Testing ; Cognitive Diagnostic Assessment ; Test Equating ; Test Development and Validation ; Monte-Carlo Simulation
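Abstract (translated from the Chinese):	︿Reading ability is widely recognized as a prerequisite for an individual's academic development and career success. In primary school, a critical period of development, diagnosing strengths and weaknesses in reading and monitoring reading growth are real and urgent needs of teachers and students. Yet in practical settings in China such as educational quality monitoring, classroom assessment, and examinations, reports that derive a single score and ranking from an isolated test paper still dominate. This single form of evaluation puts scores above all else, adds to the exam burden, and harms students' autonomy and self-esteem. It does not support students' all-round development, and it is even less able to meet the new challenge of "reducing burden and increasing efficiency" set out in China's long-term goal of building a high-quality education system. Cognitive diagnostic computerized adaptive testing (CD-CAT) is currently the most advanced personalized intelligent assessment technology. CD-CAT combines the strengths of cognitive diagnostic assessment and computerized adaptive testing: it updates its estimates and recommends items adaptively based on each response as it arrives, and so gives students more precise and more detailed diagnostic information about their strengths and weaknesses with fewer items and in less time. Over the past decade CD-CAT has seen substantial methodological progress, but the field leans heavily on simulation, empirical studies are scarce, and evidence of reliability and validity is insufficient. As a result, its value is hard to demonstrate in real educational settings, educators' confidence in using it is hard to raise, and it has yet to drive real reform in educational assessment. To address these problems, this thesis designs its studies around three perspectives (theoretical foundation, application preparation, and application effectiveness) and systematically explores how CD-CAT and other cutting-edge assessment technologies perform in real Chinese reading assessment for primary students. Specifically, Study 1 followed the development framework of cognitive diagnostic assessment, constructed and compared several attribute structures and cognitive diagnostic models for Chinese reading, and developed and validated the Diagnostic Chinese Reading Comprehension Assessment (DCRCA). A total of 23,018 students in grades 2-6 took part in Study 1. The results indicate that Chinese reading across the three primary school stages comprises six stable cognitive attributes: retrieving information, making inferences, integration and summation, reflective evaluation, literary texts, and practical texts. The GDINA model performed significantly and consistently better than four representative parsimonious models (DINA, DINO, A-CDM, and R-RUM), and the DCRCA showed good diagnostic reliability, validity, and usability, reliably distinguishing the reading strengths and weaknesses of students within each stage. Building on Study 1 and linking it to Study 3, Study 2 compiled a cognitive diagnostic reading item bank for students in grades 2-6. Using several equating designs and assessment datasets from 28,485 students in grades 2-6 at 45 schools, the parameters of 195 high-quality items were calibrated to meet the needs of cognitive diagnostic adaptive testing and of monitoring reading development. The outcome was a calibrated primary school reading item bank that is balanced in content and strong in diagnostic discrimination, together with the distribution and development of reading attribute mastery patterns among students in grades 2-6, which makes it possible to give primary students reliable developmental diagnostic information on reading. Based on the real item bank parameters and the distribution of attribute mastery patterns from Study 2, Study 3 used Monte Carlo simulation to examine the application effectiveness of the reading CD-CAT system. In two simulation experiments, the researcher compared the performance of several calibration models, item selection strategies, exposure control methods, and termination strategies. The item bank calibrated with mixed models clearly outperformed the one calibrated with the GDINA model in classification accuracy. The JSD item selection strategy was slightly better than the GDI method, and both were significantly better than the MPWKL, PWKL, and random selection strategies. Exposure control strategies such as the restrictive progressive method effectively improved indicators such as the uniformity of item bank usage and testing efficiency. Further comparison showed that, relative to random item selection and a traditional fixed test form, CD-CAT effectively improved pattern-level diagnostic reliability and attribute classification accuracy, maintaining accuracy while reducing students' response burden. In sum, this study is the first to develop and validate a cognitive diagnostic adaptive reading assessment for primary students that meets curriculum requirements, provides diagnostic functions, and can monitor development. The expected performance of this CD-CAT system is better than that of traditional assessment, and the system has considerable practical value for application and deployment, supporting effective diagnosis of students' reading ability and skills. The study also explores three directions, namely improving outcome evaluation, strengthening process evaluation, and exploring value-added evaluation, and offers a useful reference for implementing the educational reform of "reducing burden and increasing efficiency".﹀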
English abstract:	︿ Reading ability is widely recognized as a prerequisite for an individual's academic development and career success. Because primary education is the critical stage of reading development, diagnosing reading ability and monitoring its growth are real and urgent needs of teachers and students. In China, reports consisting only of test scores and rankings still dominate assessment applications such as national educational monitoring, classroom assessment, and examinations. This single form of evaluation has led to many problems, including the supremacy of scores, an increased testing burden, and harm to students' autonomy and self-esteem, none of which is conducive to their all-round development. It also poses a challenge to the long-term goal of "reducing burden and increasing efficiency" in the construction of a high-quality education system in China. Cognitive diagnostic computerized adaptive testing (CD-CAT) is a world-leading personalized intelligent assessment technology that combines the advantages of cognitive diagnostic assessment (CDA) and computerized adaptive testing (CAT). By adaptively recommending items based on real-time estimates of the attribute mastery pattern, it provides more accurate and detailed diagnostic information on students' strengths and weaknesses with fewer items and a lighter testing burden. In the past decade, CD-CAT has made rapid methodological progress, but it is also plagued by a reliance on simulation, scarce empirical evidence, and limited evidence of reliability and validity. These problems make it difficult to bring its value to real-life scenarios such as classrooms and examinations, and thus to improve educators' confidence in using it. 
To address these issues, this thesis systematically explored the effectiveness of CDA, CD-CAT, and other cutting-edge assessment technologies in Chinese reading assessment for actual primary school students from three perspectives: theoretical foundation, application preparation, and application effectiveness. Specifically, Study 1 followed the Chinese curriculum standards and the development framework of CDA, explored the structure of reading attributes and the optimal cognitive diagnostic models in three key primary school stages, and developed and validated the Diagnostic Chinese Reading Comprehension Assessment (DCRCA). Results showed that Chinese reading at the primary level contains six stable cognitive attributes, namely, retrieving information, making inferences, integration and summation, reflective evaluation, literary texts, and practical texts. Meanwhile, the assessment data of 23,018 students in grades 2-6 showed that the GDINA model performed significantly and consistently better than four representative parsimonious models (i.e., DINA, DINO, A-CDM, and R-RUM). The DCRCA has good diagnostic reliability, validity, and usability, and can reliably distinguish the reading strengths and weaknesses of primary students. Building on Study 1, Study 2 constructed a calibrated reading item bank for primary students. Item parameters were calibrated through multiple equating designs and three assessment datasets to meet the application needs of CD-CAT and reading growth monitoring. The final item bank consists of 195 reading items that are content-balanced, high-quality, and calibrated. Based on the distribution and development of reading attribute mastery patterns of 28,485 primary students in grades 2-6 from 45 schools, this item bank can provide reliable developmental diagnostic information for Chinese primary students. 
Using Monte Carlo simulation techniques, Study 3 compared the effectiveness of various CD-CAT configurations, based on the real calibrated item parameters and the students' distribution of attribute mastery patterns from Study 2. The results showed that CD-CAT effectively improved attribute and pattern classification accuracy compared with a random selection strategy and traditional assessments. An in-depth comparison of calibration CDMs, item selection strategies, exposure control methods, and termination strategies found that the item bank calibrated by item-level mixed models significantly outperformed the one calibrated by the test-level GDINA model in terms of classification accuracy. The JSD item selection strategy was slightly better than the GDI method, and both JSD and GDI were significantly better than the MPWKL, PWKL, and random selection strategies. Exposure control strategies such as the restrictive progressive method effectively improved the uniformity of item bank usage and testing efficiency. In conclusion, this study developed and validated the DCRCA and the settings of a reading CD-CAT system, which can intelligently provide fine-grained diagnostic feedback on students' reading growth. The expected effects of the CD-CAT system are significantly better than those of the random strategy and the traditional fixed assessment, providing valuable evidence for the implementation of the educational reform of "reducing burden and increasing efficiency". ﹀
Total references:	176
Holding location:	Library thesis reading area (Main Library, South Zone, 3rd floor, Area BC)
Call number:	博040200-05/22004
Release date:	2023-06-20