Chinese title: | 基于论证的高中英语概要写作测试效度验证 |
Name: | |
Discipline code: | 050211 |
Major: | |
Student type: | Doctoral |
Degree: | Doctor of Literature |
Degree year: | 2015 |
Campus: | |
School: | |
Research area: | Language testing |
First supervisor: | |
First supervisor's affiliation: | |
Date submitted: | 2016-01-04 |
Date of defense: | 2015-12-14 |
English title: | An argument-based validation of an English summary writing test for senior high school graduates |
Abstract (translated from Chinese): |
Integrated writing tasks have become the direction of reform for the writing component of large-scale high-stakes language tests. Summary writing, one of the most widely used integrated writing tasks, requires candidates to summarize the information of a source text in their own words. Although summarization has been studied in depth in education and psycholinguistics, research on it in language testing remains scarce. Following the argument-based validation framework (Kane, Crooks & Cohen, 1999; Kane, 1992, 2004, 2006, 2012), this study examined the constructs of the senior high school English summary writing test developed by the National Education Examinations Authority, the reliability and validity of its scores, and the cognitive processes test-takers engage in when completing the task, in order to supply supporting or rebutting evidence for four inferences: evaluation, generalization, extrapolation and explanation. The aim was to validate the test and to support its feasibility.

The study combined qualitative and quantitative methods to collect and analyze the data. 216 students from three schools of different types completed a test paper comprising reading, language use, practical writing and summary writing; think-aloud protocols on summary writing were collected from 6 students, together with summaries from 7 model summary writers, and 4 raters took part in the scoring. The data were analyzed with textual analysis, generalizability analysis, Rasch model analysis and regression analysis.

The construct analysis showed that core idea units, higher-level summarization rules (generalization, construction and abstraction), error-free T-units, grammatical complexity, lexical richness and difficulty, macro-structure quality, paraphrase quality and the amount of borrowing distinguished test-takers at different proficiency levels. On the amount of grammatical errors, deep cohesion and mean length of T-units, the lowest-level test-takers performed slightly better than the neighboring level, because they borrowed more from the source text. These results indicate that the test tapped the constructs represented by these textual features. However, the amount of comment, mean length of clause, the coordination index and the referential cohesion index did not behave as theory predicts and could not be used to judge summary quality. On this basis, the researcher proposed improvements to the rating scale.

Regarding reliability, Rasch model and generalizability theory analyses showed that both holistic and analytic ratings met the test purpose: in the G-study, summary writing ability explained 70.2% (holistic) and 62.2%-74.5% (analytic) of the score variance, and with two raters the G-coefficients all exceeded 0.80. The analytic rating nevertheless outperformed the holistic rating in both reliability (G-coefficient) and measurement precision (person separation index and the number of fitting persons).

Correlation analysis found no significant correlation, and a small effect size, between the summary scores and the reading scores on the same paper, but significant correlations with the language use and practical writing scores, indicating that the summary writing test mainly measures language use ability and writing ability built on reading. Regression analysis showed that macro-structure, paraphrase quality, core idea units, the number of borrowed words, error-free T-units and comprehension errors effectively predicted summary band and explained 61% of the score variance; the constructs these textual features represent correspond to the four dimensions of the rating scale, demonstrating that the scores can be interpreted clearly. Holistic scores, analytic scores and self-assessments correlated significantly with medium effect sizes, indicating that the summary writing scores have concurrent validity and that domain scores can be extrapolated to target scores.

Analysis of the think-aloud protocols showed that test-takers all went through three stages (reading, reading-to-write and writing) when completing the task; global careful reading was the foundation of summary writing, the use of the selection, generalization and construction macro-rules was the key to success, and the more successful summary writers all used a knowledge-transforming writing process. These findings agree with cognitive theories of summarization. The study also found that the combined number of local careful reading episodes and uses of the summarization macro-rules decreased as the summary band decreased, and that summaries whose macro-structure diverged from that of the source text resulted from the writer's choice of the wrong reading method, showing that the scores reflect the intended construct. The sheer quantity of writing processes, however, was no mark of successful summarization.

Triangulating the textual feature analysis, the correlation analyses and the think-aloud protocol analysis, the researcher found that summary writing essentially tests the strategic ability to repeatedly apply the selection, generalization and construction macro-rules to extract the macro-structure of the source text, and the ability to use one's language resources to control and transform language in restating the source text; the triangulation also showed that the scores can be attributed to senior high school students' summary writing ability. From the justification of the rating scale and its use, the consistency of scores across rating conditions and across raters, and the degree to which the scores embody the construct, to their correlations with external criteria, the multiple strands of evidence form a complete and coherent chain of inferences, while the rebuttals were weakened, demonstrating the validity of the English summary writing test and its feasibility for large-scale high-stakes testing at senior high school level. Overall, the summary writing scores reflect senior high school students' ability to write summaries in English, and the summary writing test developed by the National Education Examinations Authority can help universities make selection decisions. This study has articulated the constructs of the Chinese senior high school English summary writing test, provided clear interpretations of its scores and proposed improvements to the rating scale; it has also explored several under-researched areas of summary writing assessment, with corresponding results, helping stakeholders understand how Chinese senior high school students perform on summarization tasks.
|
English abstract: |
Current approaches to testing writing in large-scale high-stakes tests have called for integrating writing with reading or listening. One such integrated writing task is summary writing, which requires candidates to summarize the information in a source text using their own words. While summarization has been extensively investigated in literacy education and psycholinguistics, little has been done to study summary writing as a language assessment task. Applying Kane's argument-based validation framework (Kane, Crooks & Cohen, 1999; Kane, 1992, 2004, 2006, 2012), this study validated a summary writing task developed by the National Education Examinations Authority before it becomes operational, examining its constructs, the reliability and validity of the test scores, and the cognitive processing involved in summarization in order to collect evidence for or against the evaluation, generalization, extrapolation and explanation inferences, and thereby to improve the test and its interpretation. Both quantitative and qualitative methods were used to collect and analyze the data, integrate the findings and draw inferences. A test comprising MC reading comprehension, language use, an essay writing task and a summarization task was administered to 216 participants from three schools of different types. The 216 summaries, the test scores, think-aloud protocols from 6 students, 7 summaries by model writers and scores from 4 raters were analyzed. The construct analysis indicated that core idea units, macro-rules, error-free T-units, grammatical complexity, lexical richness and difficulty, the macro-structure index, paraphrase quality and the amount of borrowing effectively distinguished participants according to their level of English proficiency.
There were exceptions, however: on grammatical errors, the deep cohesion index and mean length of T-units, the lowest-level students outperformed their neighboring level owing to their higher amount of borrowing. Overall, the results indicated that the test elicited the intended constructs. By contrast, the amount of comment, the referential cohesion index, the coordination index and the mean length of clause ran counter to theoretical expectations, so these features could not be used in human or computer-assisted rating; suggestions were made to improve the rating scale accordingly. Rasch model analysis and a generalizability study showed that both the holistic and the analytic rating scales were adequate for the test purpose, though the analytic scale outperformed the holistic one in reliability (G-coefficient) and measurement precision (person separation statistics and the number of students with acceptable fit). Person variance accounted for 70.2% (holistic) and 62.2%-74.5% (analytic) of the total score variance, and with two raters the score reliability exceeded 0.80 under both rating scales. Furthermore, the summary score did not correlate significantly with the MC reading score but did correlate significantly with the language use and essay writing scores, suggesting that the summary writing task tested language use ability and writing ability with reading ability as a prerequisite. Moreover, regression analysis revealed that the macro-structure index, paraphrase quality, core idea units, the amount of borrowing, error-free T-units and the amount of misunderstanding were statistically significant predictors of summarization performance, explaining 61% of the score variance. These results provide empirical support for the interpretation of the test score.
Additionally, the relationship between the summary scores and the students' self-assessments was statistically significant with a medium effect size, suggesting that the scores were relevant to language performance of interest beyond the test setting. The think-aloud protocol analysis showed that global careful reading and the use of macro-rules were crucial to successful summarization, and that the more successful summary writers used a knowledge-transforming strategy. These findings are commensurate with the skilled summary writing process evidenced by research in cognitive psychology. The combined number of local careful reading episodes and uses of the selection, generalization and construction macro-rules also distinguished learners according to their summary performance, and the wrong choice of reading methods produced summaries whose macro-structure was incongruent with that of the source text. The sheer quantity of writing processes, however, failed to discriminate among the proficiency levels. Triangulation of the evidence from the textual feature analysis, the correlation studies and the think-aloud protocol analysis revealed that 1) summary writing involves the ability to strategically employ the selection, generalization and construction macro-rules to extract the macro-structure of the source text, together with the ability to use one's language resources to restate the source text, and 2) the expected scores can be attributed to these constructs. The justification of the rating scale, the consistency of scores across rating conditions and raters, and the extent to which the scores index the construct have been adequately backed with evidence, forming an adequate and coherent chain of inferences with rebuttals weakened and showing that the test is viable as a high-stakes English test in China. In sum, these results suggest that the summarization task score reflects test-takers' ability to write a summary in English and is useful for aiding selection decisions.
This study has explored the constructs of a summary writing task for Chinese senior high school graduates, provided transparent interpretations for the test score and offered suggestions to improve the rating scale. More importantly, it has filled several research gaps in summary writing tests, contributing to a better understanding of how Chinese senior high school students perform in a summarization task.
|
Total references: | 269 |
About the author: | The author taught English at senior high school level for many years; research interests lie in language testing, with multiple papers published in journals including 《现代外语》 (Modern Foreign Languages), 《中小学外语教学》 (Foreign Language Teaching in Schools), 《中国外语教育》 (Foreign Language Education in China) and 《中国考试》 (China Examinations). |
Library location: | Thesis and dissertation reading area (Main Building, south wing, 3rd floor, sections B-C) |
Call number: | 博050211/1502 |
Open access date: | 2016-01-04 |