Thesis Information

Title (Chinese):

 高中生思维能力测试题效验研究:以英语阅读测试为例

Name:

 Zhao Siqi (赵思奇)

Confidentiality:

 Public

Language:

 Chinese

Discipline Code:

 050211

Discipline:

 Foreign Linguistics and Applied Linguistics

Student Type:

 Doctoral

Degree:

 Doctor of Literature

Degree Type:

 Academic

Degree Year:

 2022

Campus:

 Beijing Campus

School:

 School of Foreign Languages and Literature

Research Direction:

 Language Testing

First Supervisor:

 Cheng Xiaotang (程晓堂)

First Supervisor's Institution:

 School of Foreign Languages and Literature, Beijing Normal University

Submission Date:

 2021-12-23

Defense Date:

 2021-12-23

Title (English):

 Validating Items for Testing Chinese High School Students' Thinking Capacity: Using an English Reading Test as an Example

Keywords (Chinese):

 思维能力 ; 英语阅读测试 ; 题型 ; 效度论证模式

Keywords (English):

 Thinking capacity ; English reading test ; Item type ; Argument-based approach to validity

Abstract (Chinese):

Thinking capacity is an essential component of citizens' key competencies. Every subject in basic education is exploring assessment methods for thinking capacity suited to its own disciplinary characteristics, but there is as yet little mature experience in integrating the assessment of thinking capacity into English subject examinations. To explore such assessment methods, this study developed and validated item types for a senior high school English reading test that incorporates the assessment of thinking capacity. The test comprises 11 new item types covering four ability areas: understanding, logical thinking, critical thinking and creative thinking. Response formats include single-answer multiple choice, true-or-false, matching, two-tiered multi-answer multiple choice and short answer. The scoring rubrics and the options of some selected-response items were designed on the basis of the SOLO Taxonomy to reflect different levels of thinking capacity.

Guided by the Assessment Use Argument (AUA) as its theoretical framework for item design and validation, this study evaluated the validity of the item development argument by examining the warrants that support each inferential link. The warrants cover three aspects: the appropriateness of the construct definition, the appropriateness of the items and scoring rubrics, and the reliability of scoring. Data collection and analysis combined qualitative and quantitative methods, drawing on feedback from three expert judges, response data from 343 examinees, think-aloud protocols from nine examinees, and ratings from four raters. The construct definition is appropriate: it is grounded in the National English Curriculum Standards for Senior High School, integrates relevant theories, and was endorsed by the experts. The items fit the Rasch model well; think-aloud protocol analysis showed that the expected cognitive operations were coded most frequently and that using them was a necessary condition for answering correctly, supporting the appropriateness of item design. For the scoring rubrics, each score category was used more than 10 times, category fit and difficulty accorded with Rasch model expectations, difficulty increased monotonically with score category, and the Rasch-Andrich thresholds advanced steadily in increments of 1.4-5.0 logits, supporting the appropriateness of rubric design. The scoring process strictly followed the rubrics; rater fit and discrimination statistics fell within the acceptable ranges of 0.4-1.6 and 0.5-1.5 respectively, indicating good intra-rater consistency; differences in rater severity were markedly smaller than the span of examinee ability, and the observed exact agreement rate was moderately higher than the expected rate, indicating good inter-rater consistency and supporting the reliability of scoring.
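
The category-level criteria just described (every score category used more than 10 times, category difficulty increasing monotonically, and Rasch-Andrich thresholds advancing by 1.4-5.0 logits) lend themselves to simple programmatic checks. The sketch below is illustrative only: the function, the 0.5-1.5 infit heuristic and the sample numbers are assumptions, not the dissertation's actual analysis, which would typically be run in many-facet Rasch software such as Facets.

```python
# Illustrative re-implementation of the category-level checks reported
# above, on hypothetical rating-scale output (not the dissertation's data).

def check_rating_scale(categories):
    """Each entry: observed count, infit mean-square, and the
    Rasch-Andrich threshold (logits) at which the category becomes
    the most probable response. The bottom category has no threshold."""
    issues = []
    for i, c in enumerate(categories):
        if c["count"] <= 10:  # every score category should be used more than 10 times
            issues.append(f"category {i}: only {c['count']} observations")
        if not 0.5 <= c["infit"] <= 1.5:  # conventional mean-square heuristic (assumption)
            issues.append(f"category {i}: infit {c['infit']} outside 0.5-1.5")
    thresholds = [c["threshold"] for c in categories[1:]]
    for lo, hi in zip(thresholds, thresholds[1:]):
        step = hi - lo
        if step <= 0:
            issues.append(f"disordered thresholds: {lo} -> {hi}")
        elif not 1.4 <= step <= 5.0:  # the advance range reported in the abstract
            issues.append(f"threshold advance {step:.1f} logits outside 1.4-5.0")
    return issues or ["all category checks passed"]

# Hypothetical four-category rubric (score points 0-3)
rubric = [
    {"count": 58, "infit": 1.02, "threshold": None},
    {"count": 41, "infit": 0.97, "threshold": -2.1},
    {"count": 27, "infit": 1.10, "threshold": 0.3},
    {"count": 15, "infit": 0.91, "threshold": 2.8},
]
for msg in check_rating_scale(rubric):
    print(msg)
```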

The study makes theoretical contributions. It defined the test construct of thinking capacity on the basis of the Curriculum Standards and theories of thinking ability, specifying the components of thinking capacity and the definition of each component. Drawing on theories of reading ability, it clarified the correspondence between thinking capacity and reading processes and skills, providing a theoretical foundation for integrating the cultivation of thinking capacity into English reading activities and delineating the construct domain for testing thinking capacity in English reading examinations. With reference to existing systems and frameworks for assessing thinking and reading, it proposed test objectives for assessing thinking capacity in English reading examinations, identified typical reading tasks that embody thinking capacity, and incorporated subject knowledge of English, especially discourse and language knowledge. This provides a theoretical basis for deeply integrating thinking capacity with the characteristics of the English subject and renders an abstract concept concrete and operational.

The study also contributes to the practice of assessing thinking capacity. First, it developed new item types for testing thinking capacity in English reading tests, diversifying and enriching the assessment methods of the English subject. Understanding is best tested with single-answer multiple-choice, matching and true-or-false items. Logical thinking is best tested with single-answer multiple-choice items, mainly for elaborative-inference tasks. Critical thinking can be tested with single-answer multiple-choice, short-answer and two-tiered multi-answer multiple-choice items: single-answer multiple choice suits tasks that judge the internal consistency of opinions and information, while short-answer and two-tiered multi-answer multiple-choice items suit tasks that evaluate opinions and information against external criteria. Creative thinking is best tested with short-answer questions.

Second, the 11 new item types were found to differ in development difficulty and applicable contexts, which can inform the development of English reading tests that assess thinking capacity. Ordered from easiest to hardest to develop, they are: true-or-false, matching, single-answer multiple choice, SOLO-based short answer, two-tiered multi-answer multiple choice, and creativity-focused short answer. Most of the new item types suit achievement and proficiency tests; single-answer multiple-choice, true-or-false and SOLO-based short-answer items can be used in summative examinations and high-stakes testing contexts, while the remaining types suit low-stakes diagnostic tests or can be adapted for formative assessment.

Finally, the new two-tiered multi-answer multiple-choice format developed in this study strengthens the assessment power of selected-response items and offers an alternative to short-answer questions for testing complex thinking abilities. The format both surfaces error-prone response content and reflects different levels of thinking capacity, replacing the all-or-nothing scoring of conventional selected-response items, so that the options carry richer information. It therefore has potential for use in diagnostic, achievement, proficiency and other testing contexts.

Abstract (English):

Thinking capacity (TC) is an essential component of key competencies. Test methods that integrate TC into tests of K-12 school subjects have been the focus of recent research, but empirical studies remain scarce. To investigate such test methods, the present study developed and validated 11 English reading-based item types for testing TC.

The new item types target four ability areas: understanding, logical thinking, critical thinking and creative thinking. Response formats include single-answer multiple choice (SMC), multiple true-or-false (MTF), matching, two-tiered multi-answer multiple choice (TTMMC) and short-answer question (SAQ). Scoring rubrics, as well as the options of some selected-response items, are based on the SOLO Taxonomy and reflect different levels of TC.

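As a concrete illustration of how SOLO-based options can carry graded information, the sketch below encodes a hypothetical SMC item whose options are keyed to SOLO levels. The enum follows the standard Biggs and Collis taxonomy; the item, options and level assignments are invented for illustration and are not taken from the dissertation.

```python
from enum import IntEnum

# SOLO levels per Biggs & Collis; the integer value doubles as the score point.
class SOLOLevel(IntEnum):
    PRESTRUCTURAL = 0      # misses the point of the task
    UNISTRUCTURAL = 1      # one relevant aspect
    MULTISTRUCTURAL = 2    # several aspects, not integrated
    RELATIONAL = 3         # aspects integrated into a coherent whole
    EXTENDED_ABSTRACT = 4  # generalises beyond the given text

# Hypothetical SMC item: each option is keyed to a SOLO level, so the
# selected option yields a graded score rather than a right/wrong mark.
options = {
    "A": SOLOLevel.UNISTRUCTURAL,    # restates one detail from the text
    "B": SOLOLevel.MULTISTRUCTURAL,  # lists several details, unconnected
    "C": SOLOLevel.RELATIONAL,       # relates the details to the main claim
    "D": SOLOLevel.PRESTRUCTURAL,    # irrelevant to the task
}

def score_smc(selected: str) -> int:
    """Score an SMC response by the SOLO level of the chosen option."""
    return int(options[selected])

print(score_smc("C"))  # -> 3
```
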
The study was guided by the Assessment Use Argument (AUA), on the basis of which an Item Development Argument (IDA) was constructed and validated. Validation focused on evaluating the warrants supporting the claims made within the IDA: the appropriateness of the construct definition, the appropriateness of the items and scoring rubrics, and the consistency of scoring. Both quantitative and qualitative data were collected and analysed, including evaluations and feedback from three expert judges, test responses from 343 examinees, think-aloud protocols from nine participants and four raters' ratings of the subjectively scored items.

Findings generally support the validity of the IDA. First, the construct definition is appropriate: it combines the instructional syllabus with relevant theories and was confirmed by the expert judges. Second, the items are appropriately designed: they exhibit acceptable fit to the Rasch model, and think-aloud protocol analysis shows that the expected cognitive operations were used most frequently by the participants and were a necessary condition for answering the items correctly. Third, the scoring rubrics are appropriate: many-facet Rasch analysis shows that each grade-level has a frequency of more than ten observations, that the fit statistics and difficulty of the grade-levels accord with model expectations, that difficulty advances monotonically across grade-levels, and that the Rasch-Andrich thresholds advance by 1.4-5.0 logits. Fourth, the consistency of scoring is supported by procedural and empirical backing alike: scoring procedures were strictly adhered to; rater fit statistics and estimated discrimination were within the acceptable ranges of 0.4-1.6 and 0.5-1.5 respectively, indicating good intra-rater consistency; and rater severity differences were significantly smaller than the spread of test-taker ability, while the observed percentage of exact agreements was slightly higher than expected, indicating good inter-rater consistency.

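The inter-rater check reported here (observed exact agreement slightly above expectation) can be illustrated with a simplified computation. In the sketch below, expected agreement is derived from the raters' marginal score distributions, as in the chance term of Cohen's kappa; a many-facet Rasch analysis would compare against model-expected agreement instead, so this is an approximation on hypothetical ratings.

```python
# Simplified sketch of the inter-rater agreement check on hypothetical
# ratings: observed exact agreement versus the agreement expected from
# the raters' marginal score distributions alone (the chance term of
# Cohen's kappa).
from collections import Counter

def exact_agreement(r1, r2):
    """Return (observed, expected) proportions of exact agreement."""
    assert len(r1) == len(r2)
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    m1, m2 = Counter(r1), Counter(r2)
    expected = sum((m1[c] / n) * (m2[c] / n) for c in set(r1) | set(r2))
    return observed, expected

rater_a = [2, 3, 1, 0, 2, 3, 2, 1, 3, 2]  # invented score assignments
rater_b = [2, 3, 1, 1, 2, 3, 2, 1, 3, 1]
obs, exp = exact_agreement(rater_a, rater_b)
print(f"observed {obs:.2f} vs expected {exp:.2f}")  # observed should exceed expected
```
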
The present study makes theoretical contributions. First, it defined the construct of TC based on the National English Curriculum Standards for Senior High School (NECSHS) and relevant theories of thinking ability and intelligence, specifying the components of TC and their definitions. Second, it drew on theories of reading to clarify the relationship between TC and reading processes and skills, which provides a theoretical foundation for cultivating TC through English reading activities and specifies the construct domain of TC in English reading tests. Third, by consulting assessment frameworks for thinking and reading competency, it defined 15 sub-constructs with corresponding reading item types, incorporating knowledge of the English subject, especially discourse knowledge and knowledge of language use. This serves as a theoretical basis for integrating thinking capacity with the English subject and turns an abstract concept into something concrete and operational.

The present study also makes practical contributions to methods of testing TC. Firstly, it adds new item types to the pool of English reading test tasks, diversifying English reading test methods. Items testing understanding can be designed in the SMC, matching and MTF formats. Logical thinking can be tested with the SMC format, which suits complex inference tasks, including inductive and deductive inferences. Critical thinking can be tested with the SMC, SAQ and TTMMC formats: SMC is better suited to tasks requiring judgments of internal consistency, while SAQ and TTMMC suit evaluations made against external criteria. Items for creative thinking should use the SAQ format.

Secondly, the findings reveal differences among the 11 new item types in ease of development and applicability to particular assessment situations, which can serve as a basis for developing English reading-based TC tests. Ordered from easiest to most difficult to develop, they are: MTF, matching, SMC, SOLO-based SAQ, TTMMC and creativity-based SAQ. Most of the new item types suit achievement tests and proficiency tests; SMC, MTF and SAQs with SOLO-based scoring rubrics are suitable for summative tests and can be used in high-stakes situations, while the rest suit low-stakes test situations or can be adapted for formative assessment.

Last but not least, the TTMMC format is unique in that it enhances the effectiveness of selected-response items and can serve as an alternative to SAQs for testing complex ability areas. Its options capture examinees' common misconceptions while also reflecting different levels of TC, moving beyond the all-or-nothing scoring of conventional selected-response items. TTMMC items can therefore provide richer information, making them potentially suitable for various test purposes such as diagnostic, achievement and proficiency tests.
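
A minimal sketch of how such a two-tiered item might be scored follows. The two-tier structure (an answer tier plus a justification tier, scored jointly for partial credit) matches the description above; the option keys, credit formula and level cut-offs are assumptions for illustration, not the dissertation's actual rubric.

```python
# Sketch of partial-credit scoring for a two-tiered multi-answer
# multiple-choice (TTMMC) item: tier 1 holds the judgment itself,
# tier 2 holds the supporting reasons. All keys and weights are
# hypothetical.

def score_ttmmc(answer_sel, reason_sel, answer_key, reason_key):
    """Return a graded score 0-3 instead of an all-or-nothing mark."""
    def tier_credit(selected, key):
        hits = len(selected & key)          # correct options chosen
        false_alarms = len(selected - key)  # distractors chosen
        return max(hits - false_alarms, 0) / len(key)
    a = tier_credit(answer_sel, answer_key)  # tier 1: the judgment
    r = tier_credit(reason_sel, reason_key)  # tier 2: the reasons
    if a == 1 and r == 1:
        return 3  # correct judgment, fully justified
    if a == 1 and r > 0:
        return 2  # correct judgment, partially justified
    if a > 0:
        return 1  # partially correct judgment
    return 0

# Hypothetical item: correct answers {A, C}, correct reasons {ii, iv}
print(score_ttmmc({"A", "C"}, {"ii"}, {"A", "C"}, {"ii", "iv"}))  # -> 2
```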

Total References:

 261

Library Location:

 Dissertation Reading Area (Sections B-C, 3rd Floor, South Wing, Main Library)

Call Number:

 博050211/22001

Open Access Date:

 2022-12-23
