Thesis information

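Chinese title:	多维多级计算机化自适应测验的在线标定
Author:	原露
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	04020005
Discipline and major:	05 Psychometrics (040200)
Student type:	Doctoral
Degree:	Doctorate in Education
Degree type:	Academic degree
Degree year:	2024
Campus:	Beijing campus
School:	中国基础教育质量监测协同创新中心
Research area:	Psychological measurement
First supervisor:	陈平
First supervisor's affiliation:	中国基础教育质量监测协同创新中心
Submission date:	2024-06-27
Defense date:	2024-05-26
Foreign-language title:	Online calibration in multidimensional computerized adaptive testing with polytomously scored items
Chinese keywords (translated):	computerized adaptive testing ; online calibration ; item bank construction and maintenance ; item response theory ; neural network
Foreign-language keywords:	computerized adaptive testing ; online calibration ; construction and maintenance of item bank ; item response theory ; neural network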
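Chinese abstract (translated):	︿Building a high-quality item bank is a cornerstone of efficient and fair computerized adaptive testing (CAT). To keep a CAT item bank viable and sustainable, problematic old items should be replaced in time with sound new items, and calibrating those new items accurately and efficiently is both a research focus and a technical difficulty. Online calibration is the core technique for calibrating new items in CAT and is now widely used in many forms of CAT, including unidimensional CAT (UCAT), multidimensional CAT (MCAT), polytomously scored CAT, and cognitive diagnostic CAT. However, existing online calibration methods and designs still fall short of the demands of complex real-world testing. Compared with unidimensional or dichotomously scored assessments, multidimensional polytomous assessment data are more common in psychology and education, yet only a few studies have addressed online calibration for MCAT with polytomously scored items (P-MCAT). There is therefore an urgent need to extend existing online calibration methods and designs to the P-MCAT context. Traditional online calibration methods are generally based on marginal maximum likelihood estimation with an expectation-maximization algorithm (MMLE/EM). In theory they calibrate new items well when the numbers of dimensions and response categories are small and the calibration sample is large, but they may fail to calibrate accurately or to converge in other situations, especially when the calibration sample is small. Thus, although introducing MMLE/EM-based online calibration methods for P-MCAT is important, new ideas and perspectives are also needed to calibrate new items with small calibration samples. In addition, existing online calibration studies all rely on Monte Carlo simulation and assume that the parameter values in the item bank are accurate, so they hardly reflect the challenges met in practice (such as estimation and labeling errors in item parameters, and guessing, practice, and fatigue effects in examinees' responses). To address these problems, this thesis proposes new online calibration methods and designs for P-MCAT and conducts three simulation studies and one empirical study to examine their performance comprehensively. Study 1 generalizes four existing MMLE/EM-based MCAT online calibration methods and two item-centered UCAT online calibration designs to the P-MCAT context, yielding four new P-MCAT online calibration methods (PM-OEM, PM-MEM, PM-OEM-BME, and PM-MEM-BME) and two new P-MCAT online calibration designs (PM-D-VR and PM-D-c). Study 1 also proposes two ways of setting the initial values of new-item parameters (the O setting and the M setting). Two simulation sub-studies then examine the methods and designs under different experimental conditions (numbers of ability dimensions, item bank structures, correlations between dimensions, and sample sizes). The results show that all new methods converge markedly better under the M setting than under the O setting; all new methods recover item parameters fairly accurately with fewer dimensions and larger calibration samples, but are prone to non-convergence or poor calibration precision with more dimensions and smaller calibration samples; PM-OEM and PM-OEM-BME have the highest calibration efficiency; and the adaptive designs outperform the random design under most conditions. Because the P-MCAT online calibration methods proposed in Study 1 do not yet fully meet the needs of complex real-world testing, Study 2 proposes a neural network-based online calibration framework. Neural network-based methods differ substantially from MMLE/EM-based methods: the new-item parameter estimates are obtained by directly learning the pattern between input and output data rather than by solving for the maximum of the log marginal likelihood. Moreover, because a small item bank in practice may provide too few training samples (that is, too few old items) to calibrate new items well, the study proposes multiple imputation to improve calibration precision. Two simulation sub-studies compare the neural network-based and MMLE/EM-based methods in different contexts and further explore the properties of the neural network-based methods. The results show that the neural network-based methods greatly alleviate the convergence problems that MMLE/EM-based methods encounter in multidimensional polytomous contexts; the neural network-based methods have some advantage in recovering new-item parameters and in calibration efficiency; and multiple imputation further improves their calibration precision. Studies 1 and 2 both assume that the parameter values of the old items in the bank are entirely error-free. Study 3 examines how errors in the item bank affect calibration, considering two error scenarios: old items with incorrect dimension labels and old items with estimation error in their parameters. The first scenario is further divided into three settings: under-labeling, over-labeling, and mislabeling. Two simulation sub-studies examine the newly proposed online calibration methods under the two error scenarios. The results show that the newly proposed P-MCAT online calibration methods (both MMLE/EM-based and neural network-based) are highly robust under both error scenarios, and more robust under the under-labeling setting than under the over-labeling and mislabeling settings. The newly proposed P-MCAT online calibration methods can therefore be applied in practice. To further verify the P-MCAT online calibration methods and designs in a real setting, Study 4 develops a real P-MCAT item bank based on the Mathematics Curriculum Standards for Compulsory Education (2022 Edition), designs a P-MCAT online calibration system platform, and collects data online to test the methods in practice. The results show that the newly proposed P-MCAT online calibration methods perform well in real testing. Finally, the thesis offers guidance for practice based on the simulation and empirical findings.﹀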
Foreign-language abstract:	︿The construction of a high-quality item bank is essential for the efficient and fair implementation of computerized adaptive testing (CAT). To keep a CAT item bank viable and sustainable, problematic operational items should be replaced promptly with good new items. The accurate and efficient calibration of these new items is both a research focus and a technical challenge. Online calibration is a key technology for calibrating new items in CAT and has been widely used across various forms of CAT, including unidimensional CAT (UCAT), multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, existing online calibration methods and designs are still insufficient for the complex testing scenarios encountered in practice. Although multidimensional and polytomous assessment data are more prevalent in psychology and education than unidimensional or dichotomous data, only a few studies have addressed online calibration in MCAT with polytomously scored items (P-MCAT). Therefore, there is an urgent need to extend existing online calibration methods and designs to the P-MCAT context. Traditional online calibration methods are generally based on marginal maximum likelihood estimation with an expectation-maximization algorithm (MMLE/EM). In theory they are well suited to calibrating new items when the numbers of dimensions and response categories are small and the calibration sample is large; however, they may fail to calibrate accurately or to converge in other scenarios (especially with small calibration samples). Therefore, although introducing MMLE/EM-based online calibration methods for P-MCAT is of great significance, new ideas and perspectives are needed to calibrate new items when the calibration sample is small.
In addition, existing online calibration studies have all relied on Monte Carlo simulations and assumed that the parameter values in the item bank are accurate, so they hardly reflect the challenges faced in practice (e.g., estimation and labeling errors in the item parameters, and effects of guessing, practice, and fatigue during examinees’ responding processes). To address these issues, we proposed new online calibration methods/designs in the P-MCAT context and conducted three simulation studies and one empirical study to comprehensively explore their performance. Study 1 extended four existing MMLE/EM-based MCAT online calibration methods and two “item-centered” UCAT online calibration designs to the P-MCAT context. This resulted in four new P-MCAT online calibration methods (PM-OEM, PM-MEM, PM-OEM-BME, and PM-MEM-BME) and two new P-MCAT online calibration designs (PM-D-VR and PM-D-c). In addition, Study 1 proposed two approaches for setting the initial values of the new item parameters (O setting and M setting). Finally, two simulation sub-studies were conducted to examine the performance of the online calibration methods/designs under different experimental conditions (i.e., different numbers of ability dimensions, item bank structures, correlations between dimensions, and sample sizes). The results showed that all the new methods converged markedly better in the M setting than in the O setting; all the new methods recovered item parameters fairly accurately in scenarios with fewer dimensions and larger calibration samples, but they were prone to non-convergence or poor calibration accuracy in scenarios with more dimensions and smaller calibration samples; PM-OEM and PM-OEM-BME had the highest calibration efficiency; and the adaptive designs outperformed the random design in most conditions.
Because the P-MCAT online calibration methods proposed in Study 1 did not fully meet the requirements of complex real-world practice, Study 2 proposed a neural network-based online calibration framework. The neural network-based methods differ substantially from the MMLE/EM-based methods in that the parameter estimates of new items are obtained by directly learning the patterns between input and output data instead of solving for the maximum of the log marginal likelihood. In addition, because small item banks in real situations may impair the calibration of new items owing to an insufficient training sample size (i.e., the number of operational items), this study proposed using multiple imputation to improve calibration accuracy. Through two simulation sub-studies, we compared the performance of the neural network-based and MMLE/EM-based methods under various scenarios and further explored the properties of the neural network-based methods. The results showed that the neural network-based methods greatly alleviated the non-convergence issue encountered by MMLE/EM-based methods in multidimensional polytomous contexts; the neural network-based methods had some advantages in recovering new item parameters and in calibration efficiency; and multiple imputation further improved the calibration accuracy of the neural network-based methods. Both Study 1 and Study 2 assumed that the parameters of the operational items in the item bank were completely error-free, whereas Study 3 explored the impact of errors in the item bank on the calibration process. Two error scenarios in the item bank were examined: operational items with incorrect dimension labels and operational items with estimation errors in their parameters. The first error scenario was further subdivided into three settings: under-labeling, over-labeling, and mislabeling.
Finally, two simulation sub-studies examined the performance of the newly proposed online calibration methods under the two error scenarios of the operational items, respectively. The results showed that the newly proposed P-MCAT online calibration methods (both the MMLE/EM-based and the neural network-based methods) were highly robust in both error scenarios, with greater robustness in the under-labeling setting than in the over-labeling and mislabeling settings. Therefore, the newly proposed P-MCAT online calibration methods could be applied in practice. Study 4 was conducted to further validate the P-MCAT online calibration methods/designs in real-world scenarios. Based on the “Mathematics Curriculum Standards for Compulsory Education (2022 Edition)”, a real P-MCAT item bank was developed. Additionally, a P-MCAT online calibration system was designed to collect data online and evaluate the practical performance of the P-MCAT online calibration methods. The results indicated that the newly proposed P-MCAT online calibration methods performed well in real tests. Finally, the thesis provided practical guidance based on the results of the simulation and empirical studies.﹀
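Total number of references:	124
Holding location:	Library thesis reading area (Main Library, South Wing, 3rd floor, Zone BC)
Call number:	博040200-05/24003
Open-access date:	2025-06-28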
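Illustrative sketch (not part of the catalog record): the abstract describes MMLE/EM-based online calibration, in which operational items with known parameters inform each examinee's ability distribution and the new item's parameters are then estimated from the responses to it. The Python sketch below shows that general idea with a single E-step and a single M-step. It is deliberately simplified and is not the thesis's PM-OEM method: it assumes a unidimensional, dichotomous 2PL model, a fixed (non-adaptive) set of operational items, a standard normal ability prior, and a fixed quadrature grid. All function names, parameter values, and the simulated data are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def e_step(responses, items, grid):
    """E-step: posterior weights over the ability grid for one examinee,
    computed from the operational items only (parameters treated as known)."""
    post = [math.exp(-0.5 * t * t) for t in grid]  # N(0, 1) prior, unnormalized
    for u, (a, b) in zip(responses, items):
        for q, t in enumerate(grid):
            p = sigmoid(a * (t - b))
            post[q] *= p if u == 1 else 1.0 - p
    total = sum(post)
    return [w / total for w in post]


def m_step(posteriors, new_responses, grid, n_iter=25):
    """M-step: Newton-Raphson on the expected log-likelihood of the new item,
    parameterized as logit P = a * theta + d, so that b = -d / a."""
    n = [0.0] * len(grid)  # expected number of examinees at each grid point
    r = [0.0] * len(grid)  # expected number of correct responses there
    for post, u in zip(posteriors, new_responses):
        for q, w in enumerate(post):
            n[q] += w
            r[q] += w * u
    a, d = 1.0, 0.0  # hypothetical starting values
    for _ in range(n_iter):
        g_a = g_d = h_aa = h_ad = h_dd = 0.0
        for t, n_q, r_q in zip(grid, n, r):
            p = sigmoid(a * t + d)
            resid = r_q - n_q * p
            w = n_q * p * (1.0 - p)
            g_a += resid * t
            g_d += resid
            h_aa += w * t * t
            h_ad += w * t
            h_dd += w
        det = h_aa * h_dd - h_ad * h_ad
        a += (h_dd * g_a - h_ad * g_d) / det
        d += (h_aa * g_d - h_ad * g_a) / det
    return a, -d / a


def simulate(seed=0, n_examinees=1000, n_items=40):
    """Simulate responses and calibrate one new item (all values hypothetical)."""
    rng = random.Random(seed)
    items = [(rng.uniform(0.8, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(n_items)]
    true_a, true_b = 1.2, 0.5
    grid = [-4.0 + 0.25 * q for q in range(33)]
    posteriors, new_responses = [], []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        resp = [int(rng.random() < sigmoid(a * (theta - b))) for a, b in items]
        posteriors.append(e_step(resp, items, grid))
        new_responses.append(int(rng.random() < sigmoid(true_a * (theta - true_b))))
    return m_step(posteriors, new_responses, grid)


a_hat, b_hat = simulate()
print(f"estimated a = {a_hat:.3f}, estimated b = {b_hat:.3f}")
```

In this sketch the posterior from the E-step ignores the examinee's response to the new item, so the estimated discrimination can be somewhat smaller than the generating value; iterating the two steps with the new item included in the posterior is one way such a scheme could be refined.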