Thesis information

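Chinese title:	基于多选题的考试作弊甄别方法
Name:	Huang Meiwei (黄美薇)
Confidentiality level:	Public
Thesis language:	Chinese
Discipline code:	040203
Discipline:	Applied Psychology
Student type:	Master's
Degree:	Master of Education
Degree type:	Academic degree
Degree year:	2019
Campus:	Beijing campus
School:	Collaborative Innovation Center of Assessment toward Basic Education Quality (中国基础教育质量监测协同创新中心)
Research direction:	Psychometrics
First supervisor:	Luo Fang (骆方)
First supervisor's affiliation:	Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University
Submission date:	2019-06-12
Defense date:	2019-05-31
Foreign-language title:	CHEATING DETECTION METHOD BASED ON MULTIPLE-MARK ITEMS
Keywords (translated from Chinese):	cheating detection ; answer-copying detection ; multiple-mark items ; within-room cheating ; cross-room cheating ; testlet model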
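Chinese abstract (translated):	︿Large-scale examinations abroad consist almost entirely of single-choice items, and the cheating detection indices developed for them apply only to single-choice items. Many large-scale examinations in China, however, also include multiple-mark items, short-answer items, and other item types. Relying on a limited number of single-choice items severely restricts the statistical power of cheating detection; making effective use of the other item types would substantially improve it. Compared with a single-choice item, a multiple-mark item has more possible responses, so two examinees who have not copied from each other are far less likely to give the same incorrect response to a multiple-mark item than to a single-choice item. Identical responses produced by copying on multiple-mark items therefore characterize anomalous agreement between examinees more effectively, and a detection index built on multiple-mark items should be more sensitive. This study sets out to adapt existing cheating detection indices to multiple-mark items. Previous studies usually treated the eleven possible responses to a multiple-mark item (that is, the different combinations of options) as independent options. In fact, these combinations do not satisfy the local independence assumption: the responses AB and ABD, for example, both contain AB and are therefore strongly related. Ignoring this relationship and treating the combinations as independent, mutually exclusive options when estimating response probabilities may bias the estimates. To address this problem, the study treats a multiple-mark item, whose options share one stem and context, as a testlet, and its four options as four sub-items to be answered separately. A testlet model controls for the testlet effect so that the four options satisfy local independence; the response probabilities of the options are estimated simultaneously, and the joint probability of each option combination is then computed and used as the response probability for the multiple-mark item. With these probabilities, the ω index can be used to compute the response agreement of examinee pairs and flag cheaters, which yields a detection index suited to multiple-mark items. In addition, the K index and the ω index computed from the nominal response model are used to detect cheating on multiple-mark items, and the detection performance of the three methods is compared. The study simulates examinee responses from item parameters estimated on real examination data. Based on the items of a psychology postgraduate entrance examination from one year, the parameters of 65 single-choice items and 10 multiple-mark items were estimated with the nominal response model; the 67 items with the better properties were selected, and responses were simulated for 2,100 examinees in 70 examination rooms. Two forms of cheating were designed: within-room cheating and cross-room cheating. Within each form, the ability of the source (or surrogate test-taker) was varied (60%, 80%, and 100%), as was the number of copied items (six levels for single-choice items and six levels for multiple-mark items), giving 108 copying conditions in total. These conditions were used to examine how well multiple-mark items detect cheaters, which factors affect detection, and under what conditions the method is applicable. The results show that, for within-room cheating, the ω index computed with the testlet model performs best, and multiple-mark items contribute most to this index. With a source ability of 60%, only 20 single-choice items plus 10 multiple-mark items are needed for the index to detect all cheaters with almost no false positives. Under certain conditions, adding 10 single-choice items raises the detection rate by only 4.9%, whereas adding 10 multiple-mark items raises it by 42.3%, so multiple-mark items are markedly effective for detecting cheating. For cross-room cheating, detection is better than for within-room cheating: with only 10 single-choice items and 10 multiple-mark items copied, the detection rate reaches its upper limit and the false-positive rate its lower limit, which indicates that cross-room cheating groups are easier to identify. Because the detection rate reaches its upper limit with very few items, multiple-mark items contribute little to the ω index; by contrast, the K index, which performs worse overall, improves substantially with multiple-mark items once it draws on the internal similarity of the cheating group. Future research could consider more cross-room scenarios, such as simulating more cheating groups, reducing the number of members in each group, or reducing the number of identical copied items. In these scenarios the overall agreement within a group is lower and harder to detect, and multiple-mark items could then be used to improve detection. The study concludes that making full use of copying on multiple-mark items greatly improves cheating detection; multiple-mark items far outperform single-choice items, especially when few items are copied and the source's ability is high. For within-room cheating, the ω index computed with the testlet model performs best. For cross-room cheating, the similarity among group members helps identify cheaters and the number of copied items has little effect on detection, so the two ω computations perform similarly: both reach the upper limit of the detection rate and the lower limit of the false-positive rate with few items, and multiple-mark items contribute little. When the K index is used to detect cross-room cheaters, multiple-mark items contribute more.﹀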
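The abstract describes computing the joint probability of each option combination from per-option response probabilities once the options are treated as locally independent. The following is a minimal sketch of that step only, not the thesis's implementation. The function name `combination_probabilities`, the input format, and the renormalization over the eleven valid combinations (those with at least two selected options) are assumptions made for illustration; in the thesis the per-option probabilities would come from a fitted testlet model, which is not shown here.

```python
from itertools import combinations
from math import prod

OPTIONS = "ABCD"  # four options per multiple-mark item, as in the abstract


def combination_probabilities(p_select, min_selected=2):
    """Joint probability of each option combination.

    p_select maps each option to the (hypothetical) probability that an
    examinee selects it, already conditioned on ability and on the testlet
    effect, so the four selections are treated as independent.
    Assumption: only combinations with at least `min_selected` options are
    valid responses, and probabilities are renormalized over that set.
    """
    combos = [
        "".join(c)
        for k in range(min_selected, len(OPTIONS) + 1)
        for c in combinations(OPTIONS, k)
    ]
    raw = {
        combo: prod(
            p_select[o] if o in combo else 1.0 - p_select[o] for o in OPTIONS
        )
        for combo in combos
    }
    total = sum(raw.values())
    return {combo: value / total for combo, value in raw.items()}


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the thesis.
    probs = combination_probabilities({"A": 0.9, "B": 0.8, "C": 0.2, "D": 0.1})
    for combo, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{combo}: {p:.4f}")
```

With four options and at least two selected, the function enumerates 6 + 4 + 1 = 11 combinations, which matches the eleven response results mentioned in the abstract.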
Foreign-language abstract:	︿ Large-scale examinations abroad consist almost entirely of single-choice questions, and the indices developed for them apply only to detecting cheating on single-choice questions. In China, however, in addition to single-choice items, many examinations include multiple-mark questions, constructed-response questions, and other question types. Relying on single-choice questions alone is therefore a serious limitation when detecting cheating candidates. If the other questions in the test paper could be used effectively, the power of detection would improve considerably. Compared with single-choice questions, multiple-mark questions have more possible answers. If two candidates have not cheated, they are much less likely to give the same incorrect answer to a multiple-mark item than to a single-choice question. Identical answers on multiple-mark questions are therefore more indicative of abnormal consistency between candidates, and cheating indices constructed from multiple-mark questions should be more sensitive. The present study aims to adapt existing cheating detection indices to multiple-mark questions. First, a suitable response probability model was constructed for multiple-mark questions in two ways: (1) the eleven response results of a multiple-mark question (i.e., the combinations of different options) were treated as independent options, as in a single-choice question, and the Nominal Response Model was used to estimate the probability of each answer; (2) the multiple-mark question was treated as a testlet and its four options as four separate sub-questions, the response probability of each option was estimated, and the joint probability of each option combination was then calculated. The ω index was then calculated to evaluate the consistency between candidates and to detect cheating candidates. 
In addition, the K index was used to identify candidates cheating on multiple-mark questions, and its detection power was compared with that of the ω index to explore the factors that influence the detection performance of the different indices. Specifically, based on data from the 2007 Psychological Postgraduate Entrance Examination, this study estimated the item parameters of 65 single-choice items and 10 multiple-mark items using the Nominal Response Model, selected the 67 items with the better properties, and simulated the answers of 2,100 candidates in 70 examination rooms. Two forms of cheating were designed: cheating within one examination room and cheating across examination rooms. For each form, the ability of the source (or surrogate test-taker) was varied (60%, 80%, and 100%), as was the number of copied items (6 levels for single-choice items × 6 levels for multiple-mark items), for a total of 108 cheating conditions (3 × 6 × 6). In this way, the factors affecting detection and the conditions under which the method is applicable were examined. The results show that, for cheating within one examination room, the ω index calculated with the Testlet Model has the best detection power of the three indices, and multiple-mark items contribute most to this index. For cheating across examination rooms, every index performs very well and the detection rate reaches its upper limit even when few items are copied, which indicates that groups cheating across examination rooms are more easily identified; multiple-mark items therefore show no obvious advantage. Future research could consider increasing the number of cheating groups, reducing the number of individuals in each group, or reducing the number of copied items, so that multiple-mark items may play a greater role. ﹀
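Both abstracts rely on the ω index to quantify how far the observed agreement between a suspected copier and a source exceeds what is expected by chance. The sketch below uses the commonly cited standardized form of ω: observed matches minus the expected number of matches, divided by the standard deviation of the match count. The record does not state the thesis's exact computation, so this is an illustration only; the function name `omega_index` and the input format are assumptions. Each entry of `match_probs` stands for the model-based probability that the copier would independently give the source's response on one item (for a multiple-mark item, the probability of the source's option combination).

```python
from math import sqrt


def omega_index(observed_matches, match_probs):
    """Standardized agreement between a suspected copier and a source.

    observed_matches: number of items on which the two responses are identical.
    match_probs: for each item, the (hypothetical) model-based probability that
        the copier gives the source's response without copying.
    Returns (observed - expected) / standard deviation, which is compared with
    a standard normal critical value in the usual formulation.
    """
    expected = sum(match_probs)
    variance = sum(p * (1.0 - p) for p in match_probs)
    if variance <= 0.0:
        raise ValueError("match probabilities must not all be 0 or 1")
    return (observed_matches - expected) / sqrt(variance)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the thesis.
    # Multiple-mark items spread probability over eleven combinations, so
    # chance matches are less likely than on four-option single-choice items.
    single_choice = [0.40] * 10
    multiple_mark = [0.15] * 10
    print(omega_index(10, single_choice))
    print(omega_index(10, multiple_mark))
```

For the same number of identical responses, the lower chance-match probabilities of the multiple-mark items give a larger ω, which is in line with the abstract's argument that matches on multiple-mark items are stronger evidence of copying.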
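Total number of references:	46
Author's publications:	Huang Meiwei (黄美薇), Luo Fang (骆方), Pan Yiqin (潘逸沁), 2019, A two-stage cheating detection method combining multiple-choice and constructed-response items (结合选择题和主观题的两阶段作弊甄别方法), Journal of Psychological Science (心理科学), 0(0), 0-0.
Call number:	硕040203/19017
Open-access date:	2020-07-09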
