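View thesis information

Chinese title:

 Validity of the IRT_Δb Method and the Adapted LR Method for DIF Detection in Matrix-Sampled Tests (IRT_Δb法和修正LR法进行矩阵取样测验DIF检验的有效性研究)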

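Name:

 Zhang Xun (张勋)

Discipline code:

 040202

Discipline:

 Developmental and Educational Psychology

Student type:

 Master's

Degree:

 Master of Education

Degree year:

 2011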

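Campus:

 Beijing campus

School:

 Institute of Cognitive Neuroscience and Learning

Research area:

 Psychological and educational measurement and evaluation

First supervisor:

 Li Lingyan (李凌艳)

First supervisor's affiliation:

 Institute of Cognitive Neuroscience and Learning, Beijing Normal University

Submission date:

 2011-06-29

Defense date:

 2011-05-24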

Foreign-language title:

 APPLYING IRT_ΔB PROCEDURE AND ADAPTED LR PROCEDURE TO TESTS WITH MATRIX SAMPLING

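Chinese abstract:
Differential item functioning (DIF) detection is an important method for safeguarding test fairness. For large-scale educational measurement and evaluation programs, whose examinee populations are very broad, running DIF detection on the test items is an important step in ensuring fairness. However, such programs usually adopt matrix sampling: each examinee completes only one of the several booklets that make up the full test, and the total score from that single booklet cannot accurately measure the true ability of the examinee groups. As a result, traditional DIF detection methods based on the total score of a single booklet cannot be applied directly to matrix-sampled tests. In a matrix-sampled test, the ability estimate obtained from an IRT model is a more accurate estimate of the true ability of the examinee groups, and using it as the matching variable in DIF detection controls for true ability more precisely. Of the DIF detection methods used in traditional single-booklet tests, those based on IRT models can therefore be applied directly to matrix-sampled tests through concurrent estimation, as with the IRT_Δb method used by PISA; those based on observed scores (booklet total scores), such as the LR and MH methods, can be adapted by replacing the booklet total score with the IRT-based ability estimate as the matching variable, which improves the accuracy of ability matching. Although both approaches lend themselves well to improving the validity of DIF detection in matrix-sampled tests, few existing studies have verified them. Using simulated and empirical data, this study therefore makes an initial exploration of the validity of the IRT_Δb method used by PISA and of the LR method matched on IRT-based ability estimates, together with related issues. The study comprises two parts. Study 1, based on the Rasch model, simulates response data conforming to a partially balanced incomplete block (pBIB) design in order to explore the validity of the IRT_Δb method and the adapted LR method for DIF detection in matrix-sampled tests, the factors that influence them, and the relationship between the results of the two methods. Study 2, based on actual PISA 2003 test data, further verifies the validity of the results of the two methods and uses the LR method for an initial exploration of macro-level social context factors that may cause items to show DIF. In Study 1, different DIF effect sizes were set to determine whether the IRT_Δb method and the adapted LR method can effectively distinguish items with different DIF effect sizes; in addition, three conditions were manipulated (the proportion of DIF items, the true ability difference between examinee groups, and sample size) to examine their influence on DIF detection with the two methods. The results show that both methods effectively distinguish items with different amounts of DIF; that the results of the IRT_Δb method are affected by the proportion of DIF items in a booklet; and that the results of the adapted LR method are affected by both the proportion of DIF items in a booklet and the true ability difference between examinee groups. In addition, by examining the relationship between the results of the two methods, the study derives classification criteria for DIF detection results under the IRT_Δb method. Study 2 takes the responses of students from the United States and Korea on the PISA 2003 mathematics test and analyzes the items for DIF with both the IRT_Δb method and the adapted LR method. The results of the two methods are highly consistent, which further confirms the validity of the adapted LR method in matrix-sampled tests and the reasonableness of the IRT_Δb classification criteria. Finally, the LR analysis suggests that extra study after school may be a cause of DIF on some items between US and Korean students.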
Foreign-language abstract:
Matrix sampling is a technique widely used in large-scale educational assessment. Under this design, each examinee takes only one of multiple booklets, so the raw score on a single booklet is an inaccurate estimate of examinees' ability. This makes methods based on the observed score inappropriate for detecting differential item functioning (DIF) in tests with matrix sampling. Group-level ability estimates based on an IRT model are much more accurate in such tests, so it is reasonable to use them as the matching variable when detecting DIF. IRT-based DIF detection methods can be applied to tests with matrix sampling through concurrent estimation, as in the IRT_Δb procedure used by PISA. Methods based on the observed score can be adapted by replacing the matching variable with an IRT-based ability estimate. Simulation and empirical studies were conducted to analyze the efficacy of the IRT_Δb procedure and the adapted logistic regression (LR) procedure in tests with matrix sampling. In Study 1, response data following a partially balanced incomplete block (pBIB) design were simulated under the Rasch model. DIF magnitude, DIF percentage, ability differences, and sample size were manipulated. The results indicated that both methods were sensitive to different levels of DIF magnitude; the DIF effect obtained by the IRT_Δb procedure (Δb) was influenced by DIF percentage, and the DIF effect obtained by the adapted LR procedure (ΔR²) was influenced by both DIF percentage and ability differences. In addition, there was a stable relationship between Δb and ΔR². In Study 2, the IRT_Δb procedure and the adapted LR procedure were used to detect DIF between Korea and the United States in the Programme for International Student Assessment (PISA) 2003 mathematics test. The results suggested that the DIF items flagged by the two procedures were almost the same, which confirmed the efficacy of the adapted LR procedure and the reasonableness of the classification criteria. Furthermore, the adapted LR procedure identified Extra Lesson Hours after School (ELHAS) as a potential source of DIF between Korea and the United States in the PISA 2003 mathematics test.
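As a rough illustration of the adapted LR idea described in the abstract, the Python sketch below fits nested logistic regressions with an ability value as the matching variable and reports the change in R² when group terms are added. It is a minimal sketch under stated assumptions, not the thesis's implementation: the choice of the Nagelkerke pseudo-R², the function names, the simulated sample, and the use of the true simulated ability in place of an IRT-based estimate are all assumptions introduced here.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke pseudo-R^2 (assumed here; the abstract does not say which R^2 is used)."""
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - np.exp(2.0 * ll_null / n))


def lr_dif_delta_r2(y, theta, group):
    """Delta R^2 between a model with ability only and one adding group and interaction."""
    n = len(y)
    p0 = y.mean()
    ll_null = n * (p0 * np.log(p0) + (1.0 - p0) * np.log(1.0 - p0))  # intercept-only model
    ll_base = fit_logistic(theta[:, None], y)
    ll_full = fit_logistic(np.column_stack([theta, group, theta * group]), y)
    return nagelkerke_r2(ll_full, ll_null, n) - nagelkerke_r2(ll_base, ll_null, n)


def simulate_item(rng, n, dif_shift):
    """Simulate Rasch responses to one item; dif_shift raises difficulty for group 1."""
    group = rng.integers(0, 2, n).astype(float)
    theta = rng.normal(0.0, 1.0, n)  # stands in for an IRT-based ability estimate
    b = 0.0 + dif_shift * group      # item difficulty, shifted for the focal group
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    y = (rng.random(n) < p).astype(float)
    return y, theta, group


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for shift in (0.0, 0.8):
        y, theta, group = simulate_item(rng, 4000, shift)
        print(f"difficulty shift {shift}: delta R^2 = {lr_dif_delta_r2(y, theta, group):.4f}")
```

In a matrix-sampled setting, `theta` would be replaced by ability values estimated from the full booklet design rather than by a booklet total score; how those values are obtained is outside this sketch.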
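Total number of references:

 41

About the author:

 I completed my undergraduate studies in psychology at Fudan University and graduated with a Bachelor of Science. For my master's degree I studied at the Institute of Cognitive Neuroscience and Learning, Beijing Normal University, majoring in developmental and educational psychology, with a research focus on psychological and educational measurement and evaluation; the topic of my thesis research is test fairness. During the master's program I took part on several occasions in the work of the Basic Education Quality Monitoring Center of the Ministry of Education (教育部基础教育质量监测中心) and helped complete several of its projects. I also published one article in the journal 考试研究 during this period: "Common problems in the practical application of DIF analysis and recent research advances" (DIF分析实际应用中的常见问题及其研究新进展), 考试研究, 2010, No. 2 (April, Vol. 6).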

Call number:

 硕040202/1165

Release date:

 2011-06-29
