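Chinese title: | Comparison and Application of Methods for Detecting Test Mode Effects |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 04020005 |
Discipline/major: | |
Student type: | Master's |
Degree: | Master of Education |
Degree type: | |
Degree year: | 2020 |
Campus: | |
School: | |
Research direction: | Comparison and application of methods for detecting test mode effects |
First supervisor: | |
First supervisor's affiliation: | |
Submission date: | 2020-06-23 |
Defense date: | 2020-06-03 |
English title: | The Comparison and Application of Detecting Methods for Test Mode Effect |
Chinese keywords: | |
English keywords: | Test Mode Effect ; Test Mode ; Measurement Invariance ; Parameter Invariance |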
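Chinese abstract: |
Many tests are currently undergoing a transition from paper-based testing (PBT) to computer-based testing (CBT). However, many studies have found that when the same test is presented in different modes, the response results may differ and cannot be compared directly. The test mode effect (TME) refers to such differences in test functioning caused by differences in test mode. TME can affect test fairness, selection criteria, item bank construction, and other areas, so it is important to detect TME with effective methods before switching test modes. Current TME detection methods include analysis of variance (ANOVA), multigroup confirmatory factor analysis (MCFA), differential functioning of items and tests (DFIT), and the mode effect model method. Despite the variety of methods, research on TME detection remains limited: on the one hand, the accuracy of parameter estimation in the relatively new mode effect model method has not been examined; on the other hand, the different TME detection methods have not been comprehensively compared or jointly applied to empirical data. This thesis therefore uses two simulation studies and one empirical study to examine parameter estimation in the mode effect model method and to compare and apply the four TME detection methods comprehensively, providing a basis for future researchers to choose appropriate detection methods in TME research.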
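Study 1 took Model 2 of the mode effect model method as an example to examine its parameter estimation accuracy under different simulation conditions. The EM algorithm for finite mixtures (EM-FM algorithm) was used to estimate the TME parameters under different experimental designs, sample sizes, and numbers of items; the TME parameter estimates were also tested for significance with reference to the method for computing empirical Type I error. The results showed that under most experimental conditions the EM-FM algorithm estimated the TME parameters in Model 2 accurately, and that estimation accuracy was affected by the experimental design and the sample size.
Study 2 used simulated data to compare the detection performance of the four TME detection methods (ANOVA, MCFA, DFIT, and the mode effect model method) under different experimental conditions, evaluated by Type I error rate and statistical power. The results showed that MCFA had the highest power, accompanied by a higher Type I error rate; ANOVA and DFIT performed similarly, both with high power and a low Type I error rate; and the performance of the mode effect model method depended strongly on the experimental design, being better than the other three methods in the within-subject design and worse than them in the between-subject design. In addition, the statistical power and Type I error rates of all four methods were affected by the experimental design and the sample size.
Study 3 used a randomized between-subject design to collect mathematics test responses, on either PBT or CBT, from 756 fourth-grade primary school students from different regions and schools. The four detection methods above were used to detect test-level and item-level TME, and the factors influencing TME were explored. The results showed that (1) the different TME detection methods gave highly consistent results, all indicating that test-level TME was not significant (i.e., examinees' results on PBT and CBT were comparable); although 20% of the items showed significant TME, this did not affect the overall results; and (2) factors such as gender, region/school, and computer proficiency influenced TME, and item difficulty significantly predicted the size of TME: the more difficult the item, the larger the TME.
In summary, this thesis used simulated and empirical data to analyze the applicability of the EM-FM algorithm for parameter estimation in Model 2, and compared in depth and jointly applied different TME detection methods. The results can provide a basis for measurement researchers and practitioners to choose appropriate TME detection methods in different contexts. Although this series of studies yielded some meaningful results, future work should examine the other models in the mode effect model method (Model 1 and Model 3) to further improve the method's practical application and support the computerization of large-scale educational assessment in China. |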
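As a rough illustration of how the Type I error rate and statistical power in a simulation study like Study 2 are obtained, the Python sketch below repeatedly simulates a between-subject PBT/CBT comparison and counts how often a two-group ANOVA rejects the hypothesis of no mode effect. It is a hypothetical simplification, not the thesis's simulation design: total scores are assumed to be normally distributed, the TME is assumed to be a uniform mean shift of `effect` standard deviations, the F critical value uses a large-sample chi-square approximation, and all function names are illustrative.

```python
import random
from statistics import NormalDist, fmean


def anova_two_groups(a, b):
    """One-way ANOVA F statistic (and within-group df) for two independent groups."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = fmean(a), fmean(b)
    grand = fmean(a + b)
    ss_between = n_a * (m_a - grand) ** 2 + n_b * (m_b - grand) ** 2
    ss_within = sum((x - m_a) ** 2 for x in a) + sum((x - m_b) ** 2 for x in b)
    df_within = n_a + n_b - 2
    # With two groups the between-group df is 1, so MS_between == SS_between
    return ss_between / (ss_within / df_within), df_within


def rejects_no_tme(pbt_scores, cbt_scores, alpha=0.05):
    """True if the ANOVA rejects 'no mode effect' at level alpha.

    Large-sample approximation: F(1, df) tends to chi-square(1) = z**2,
    so the critical value is the squared two-sided normal quantile.
    """
    f, _ = anova_two_groups(pbt_scores, cbt_scores)
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    return f > critical


def rejection_rate(effect, n_per_group, reps, alpha=0.05, seed=0):
    """Share of replications flagging TME.

    effect == 0 gives the empirical Type I error rate; effect != 0 gives power.
    Scores are hypothetical standard-normal totals; CBT is shifted by -effect SD.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        pbt = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        cbt = [rng.gauss(-effect, 1.0) for _ in range(n_per_group)]
        hits += rejects_no_tme(pbt, cbt, alpha)
    return hits / reps


if __name__ == "__main__":
    print("Type I error:", rejection_rate(effect=0.0, n_per_group=500, reps=400))
    print("Power:", rejection_rate(effect=0.3, n_per_group=500, reps=200, seed=1))
```

Under this setup both criteria are simply rejection proportions, under the no-TME and the true-TME conditions respectively; comparing another detector (for example an MCFA or DFIT test) would in principle only require replacing `rejects_no_tme`.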
English abstract: |
Nowadays, many tests are moving from paper-based testing (PBT) to computer-based testing (CBT), which provides a platform for new item types and for collecting additional process and timing data. Although CBT offers many advantages over traditional PBT, researchers have found that performance in different test modes may not be directly comparable. Test mode effect (TME) refers to the observation that tasks presented in one test mode (e.g., PBT) may function differently when presented in another test mode (e.g., CBT), which can affect test fairness, selection criteria, and item bank construction. Therefore, before switching a test from PBT to CBT, it is important to detect TME with effective methods. Four methods have been proposed to detect TME: analysis of variance (ANOVA), multigroup confirmatory factor analysis (MCFA), differential functioning of items and tests (DFIT), and the mode effect model method. Despite this variety, research on TME detection methods remains limited. On the one hand, there is little evidence on the accuracy of parameter estimation in the mode effect model method; on the other hand, the four detection methods have not yet been thoroughly compared or jointly applied to empirical data. This thesis therefore used two simulation studies and one empirical study to explore parameter estimation in the mode effect model method and to compare and apply the four TME detection methods comprehensively, providing a basis for future researchers to choose appropriate detection methods in TME research.
In Study 1, Model 2 of the mode effect model method was taken as an example to examine the accuracy of parameter estimation under different simulation conditions. The EM algorithm for finite mixtures (EM-FM) was used to estimate the TME parameters directly under different experimental designs, sample sizes, and numbers of items. In addition, the TME parameter estimates were tested for significance following the method used to compute the empirical Type I error. The results showed that the EM-FM algorithm estimated the TME parameters in Model 2 accurately under most simulation conditions, and that estimation accuracy was affected by the experimental design and the sample size. Study 2 used simulated data to compare the four TME detection methods (i.e., ANOVA, MCFA, DFIT, and the mode effect model method) under different conditions in terms of Type I error rate and statistical power. The results indicated that (1) both the power and the Type I error rate of the detection methods were affected by the experimental design and the sample size; (2) MCFA had the highest power, accompanied by a higher Type I error rate; (3) ANOVA and DFIT performed similarly, both with high power and a low Type I error rate; and (4) the performance of the mode effect model method depended strongly on the experimental design: it performed better than the other three methods in the within-subject design and worse than them in the between-subject design. In Study 3, a randomized between-subject design was used to assign 756 fourth-grade primary school students from different regions and schools to a PBT or CBT mathematics test. The four TME detection methods from Study 2 were then used to detect both test-level and item-level TME, and the factors that affect TME were explored.
The results revealed that (1) the four TME detection methods produced highly consistent results: test-level TME was not significant, meaning that the results of PBT and CBT were comparable. Although 20% of the items showed significant TME, they did not affect the overall results. (2) Factors such as gender, region/school, and computer proficiency had no significant influence on test-level TME. Item difficulty significantly predicted the size of TME: the greater the item difficulty, the larger the TME. In summary, this thesis used simulated and empirical data to analyze the applicability of the EM-FM algorithm for parameter estimation in Model 2, to compare in depth and jointly apply different TME detection methods, and to explore the factors that might affect TME. The results can help measurement researchers and practitioners choose appropriate TME detection methods in different situations. Although this thesis obtained some meaningful results, future work should explore the other models in the mode effect model method (Model 1 and Model 3) and further improve the method's practical application, with the goal of supporting the computerization of large-scale educational assessment in China. |
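The abstract says only that Study 1 tested the TME parameter estimates for significance "following the method used to compute the empirical Type I error", without further detail. One common way to do this, shown here purely as a hypothetical Python sketch, is to simulate data with no mode effect, collect the resulting TME estimates, and use the (1 − α) quantile of their absolute values as an empirical critical value; an estimate is then flagged as significant when it exceeds that value. The function names, the two-sided absolute-value rule, and the normal noise standing in for EM-FM estimates are all assumptions, not the thesis's procedure.

```python
import math
import random


def empirical_critical_value(null_estimates, alpha=0.05):
    """(1 - alpha) quantile of |TME estimate| across no-TME replications."""
    if not null_estimates:
        raise ValueError("need at least one null replication")
    abs_sorted = sorted(abs(x) for x in null_estimates)
    # Smallest order statistic with at least (1 - alpha) of the values at or below it
    idx = max(math.ceil((1 - alpha) * len(abs_sorted)) - 1, 0)
    return abs_sorted[idx]


def is_significant(estimate, critical_value):
    """Two-sided decision: flag TME when |estimate| exceeds the critical value."""
    return abs(estimate) > critical_value


def empirical_type_i_rate(null_estimates, critical_value):
    """Share of no-TME replications that would be wrongly flagged."""
    flagged = sum(is_significant(e, critical_value) for e in null_estimates)
    return flagged / len(null_estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for EM-FM estimates from data simulated without TME
    null_est = [rng.gauss(0.0, 0.05) for _ in range(1000)]
    crit = empirical_critical_value(null_est, alpha=0.05)
    print("critical value:", round(crit, 3))
    print("estimate 0.25 significant?", is_significant(0.25, crit))
    print("empirical Type I error:", empirical_type_i_rate(null_est, crit))
```

By construction, applying the critical value back to the null replications yields an empirical Type I error rate of at most α, which is what makes it usable as a significance threshold when no analytic standard error is available.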
Number of references: | 88 |
Call number: | 硕040200-05/20007 |
Open date: | 2021-06-23 |