Thesis information

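Chinese title:	初中生生物学科学论证能力的自动评分和即时反馈评价研究
Author:	Wang Cong (王聪)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	040102
Discipline:	Curriculum and Teaching Methodology
Student type:	Doctoral
Degree:	Doctorate in Education
Degree type:	Academic degree
Degree year:	2023
Campus:	Beijing campus
School:	College of Chemistry
Research direction:	Biology education
First supervisor:	Wang Lei (王磊)
First supervisor's affiliation:	College of Chemistry
Submission date:	2023-06-18
Defense date:	2023-05-28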
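English title:	USE OF AUTOMATED SCORING AND FEEDBACK IN FORMATIVE ASSESSMENT OF CHINESE GRADES 7-9 STUDENTS' COMPETENCE IN BIOLOGICAL SCIENTIFIC ARGUMENTATION
Chinese keywords:	初中生 (middle school students) ; 生物学 (biology) ; 科学论证能力 (scientific argumentation competence) ; 形成性评价 (formative assessment) ; 自动评分 (automated scoring) ; 即时反馈 (instant feedback)
English keywords:	Middle school ; Biology ; Formative assessment ; Scientific argumentation ; Automated scoring ; Instant feedback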
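Abstract (translated from Chinese):	︿This study examines how effectively automated scoring of constructed-response items, combined with instant feedback, supports formative assessment of middle school (Grades 7-9) students' competence in biological scientific argumentation. The research proceeds through four main tasks: (1) constructing a machine-learning-based theoretical model for the formative assessment of scientific argumentation; (2) developing an instrument to measure middle school students' competence in biological scientific argumentation and evaluating its quality; (3) evaluating the accuracy of the automated scoring model and implementing instant feedback; and (4) conducting a quasi-experimental teaching and assessment study in the Grade 7 biology unit "Green Plants in the Biosphere" to investigate how the automated scoring and instant feedback platform developed in this study affects middle school students' competence in biological scientific argumentation.
In Task 1, starting from theory, the study combines evidence-centered design with a general framework for formative assessment, gives a preliminary account of the requirements of the instruction space and the technique space, and constructs the "Formative Assessment Model for Scientific Argumentation Based on Machine Learning" (FAMSA-ML). The model adds an "Instruction Space" and a "Technique Space" to the established three-space evidence-centered assessment design model. For each space, the key concepts and factors relevant to the formative assessment of scientific argumentation are discussed in detail and their importance is explained. The model enriches the theory of formative assessment in machine learning environments and provides theoretical guidance for the later work in this study.
In Task 2, guided by FAMSA-ML and following the "Four Pillars" approach, an instrument for assessing middle school students' competence in biological scientific argumentation was systematically designed and developed. Once the construct of scientific argumentation to be assessed had been defined, formative assessment tasks and expected performances were designed around the needs of unit teaching, with the argumentation framework and expected performances tailored to Grade 7 students. Two rounds of Rasch-model-based quality checks progressively improved the instrument's reliability and validity. The first, small-scale round involved 61 participants; overall indicators were good, and some items were deleted and some scoring criteria adjusted. The second, larger round involved 137 participants and provided further empirical support. The result is a formative assessment instrument with high reliability and validity for middle school students' competence in biological scientific argumentation, which lays the foundation for addressing the machine learning questions in the later stages of the study and for designing useful curriculum materials.
Task 3 focuses on building the automated scoring model for constructed responses on scientific argumentation and on implementing feedback. It has three parts: evaluating the scoring performance of the automated scoring model on item responses, the strategy for generating feedback content, and building the feedback platform. The machine-learning-based automated scoring model was built over two rounds of research. For feedback generation, the study designed feedback content based on analytic scoring rubrics and implemented context-specific feedback. Empirical tests and comparisons of scoring performance found that at least 800 human-scored student responses are needed to build an accurate scoring model, that scoring based on analytic rubrics is more accurate than scoring based on holistic rubrics, and that scoring accuracy differs little across response lengths. To support the improvement of students' scientific argumentation competence during formative assessment, this chapter also discusses key issues such as the basis for assessment, the design and implementation of feedback content, and the construction of the platform.
Task 4 is a teaching experiment on the formative assessment of scientific argumentation using the automated scoring system. It asks how formative assessment of biological scientific argumentation competence can be designed and implemented around unit teaching in real classrooms, and then observes its effects. It has three parts: the design of the unit course, the design and implementation of the teaching experiment, and the effects of the experiment. The study first sets out its unit-based approach to formative assessment design. The middle school biology unit "Green Plants in the Biosphere" was selected, the course was designed systematically following a "unit theme-objectives-activities-materials" sequence, and the steps and procedures for embedding automated scoring and instant feedback into the course and the teaching process were specified. Using a quasi-experimental design, the study set up experimental and control groups: a "machine feedback group", a "human-machine collaborative feedback group", and a "traditional manual teaching feedback group". Pre- and post-test ability estimates of scientific argumentation were compared across the three groups to gauge how teaching with automated scoring and instant feedback affects actual teaching outcomes. Course selection, learning content, student level, and teaching pace were kept consistent across the experimental and control groups. The results show that computer-generated instant feedback significantly improved middle school students' performance in biological scientific argumentation and outperformed conventional teacher feedback. In addition, students with higher revision rates had higher post-test ability estimates, which the study takes to indicate that revising promotes students' performance in biological scientific argumentation. Quantitative and qualitative evidence on the three ways of combining feedback with teaching ("traditional manual teaching feedback", "machine feedback", and "human-machine collaborative feedback") shows that all three had some success in developing students' argumentation competence, but that "human-machine collaborative feedback" had a clearer advantage. This approach combines the computer's repeated prompts on argument structure with the teacher's metacognitive prompts, drawing on the strengths of both and so promoting students' competence in biological scientific argumentation more effectively.
This study proposes the machine-learning-based theoretical model for the formative assessment of scientific argumentation (FAMSA-ML), which offers guidance for future machine-learning-based formative assessment of scientific practices in science education and extends applied research on machine learning in the field. The development of the formative assessment instrument offers a reference for designing formative assessments of scientific argumentation, and the construction of an automated scoring model for constructed responses provides an example of applying Chinese-language automated scoring in science education assessment. The implementation of instant feedback, and the different ways of combining feedback with teaching, offer new ideas for formative assessment in teaching practice.﹀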
English abstract:	︿This study explores the effectiveness of using automated scoring and instant feedback in the formative assessment of Grades 7-9 students' competence in biological scientific argumentation, based on constructed-response (CR) items. The research unfolds through four main tasks: (1) constructing a machine-learning-based theoretical model for the formative assessment of scientific argumentation; (2) developing an assessment tool to measure Grades 7-9 students' competence in biological scientific argumentation and evaluating its quality; (3) assessing the accuracy of the automated scoring model and implementing instant feedback; and (4) conducting a quasi-experimental study of teaching and assessment in the Grade 7 biology unit "Green Plants in the Biosphere" to investigate how the automated scoring and instant feedback platform developed in this study affects students' competence in biological scientific argumentation. In Task 1, the study starts from theory, combining evidence-centered design principles with a general framework for formative assessment. It gives a preliminary account of the requirements of the Instruction Space and the Technique Space and constructs the "Formative Assessment Model for Scientific Argumentation Based on Machine Learning" (FAMSA-ML). This model adds an "Instruction Space" and a "Technique Space" to the established three-space evidence-centered assessment design model. Key concepts and factors related to the formative assessment of scientific argumentation are discussed in detail for each space, and their importance in each space is explained. The model enriches the theory of formative assessment in machine learning environments and provides theoretical guidance for the subsequent work in this study.
In Task 2, under the guidance of FAMSA-ML, an assessment tool for Grades 7-9 students' competence in biological scientific argumentation was systematically designed and developed following the "Four Pillars" framework. Once the construct of scientific argumentation to be assessed had been defined, formative assessment tasks and expected performances were designed around the requirements of unit teaching, with the scientific argumentation framework and expected performances customized for Grade 7 students. Two rounds of Rasch-model-based quality tests were conducted, gradually improving the tool's reliability and validity. The first, small-scale test involved 61 participants; overall indicators were satisfactory, and some items were deleted and some scoring criteria adjusted. The second, large-scale test involved 137 participants and provided further empirical support. After these stages and the corresponding adjustments, a formative assessment tool with high reliability and validity for Grades 7-9 students' competence in biological scientific argumentation was developed. This laid a foundation for addressing the machine learning problems in the subsequent research and for designing useful curriculum materials. Task 3 focuses on building the computer-automated scoring model for responses to CR items on scientific argumentation and on implementing feedback. The main content is divided into three parts: evaluating the scoring efficacy of the automated scoring model on item responses, the strategy for generating feedback content, and building the feedback platform. The machine-learning-based automated scoring model for scientific argumentation was constructed over two rounds of research.
In terms of feedback content generation, this study designs feedback content based on analytic scoring rubrics and implements context-based feedback. Empirical tests and comparative studies of the efficacy of the automated scoring model found that at least 800 human-scored student responses are required to construct an accurate scoring model. Evaluation based on analytic scoring is more accurate than evaluation based on holistic scoring. The Cohen's kappa between human raters and the machine does not vary greatly across student response lengths. To promote the improvement of Grades 7-9 students' scientific argumentation skills during formative assessment, this chapter also discusses key issues such as the basis for assessment, the design and implementation of feedback content, and the construction of the platform. Task 4 is a teaching experiment on the formative assessment of scientific argumentation for Grades 7-9 students based on the automated scoring system. It focuses on how to design and implement formative assessment of biological scientific argumentation ability around unit teaching in a real classroom environment, and then observes the effect of the assessment. The content is divided into three parts: the design of the unit course, the design and implementation of the teaching experiment, and the effects of the teaching experiment. First, the unit-based approach to formative assessment design is explained. The Grades 7-9 biology unit "Green Plants in the Biosphere" is selected, and the course is systematically designed following a "unit theme-objectives-activities-materials" sequence. The implementation steps and specific procedures for embedding automated scoring and instant feedback into the course and the teaching process are clarified.
Using a quasi-experimental method, experimental and control groups are set up: a "Machine Feedback Group," a "Human-Machine Collaborative Feedback Group," and a "Traditional Manual Teaching Feedback Group." The pre- and post-test ability values for scientific argumentation are compared across the three groups to understand how teaching with computer-automated scoring and instant feedback affects actual teaching outcomes. During the experiment, the experimental and control groups were kept consistent in course selection, learning content, student level, and teaching progress. The results show that computer-generated instant feedback significantly improves Grades 7-9 students' performance in biological scientific argumentation and is more effective than traditional manual teaching feedback. In addition, students with higher revision rates had higher post-test ability values, which the study takes to indicate that revision promotes middle school students' performance in biological scientific argumentation. Quantitative and qualitative evidence on the three ways of combining feedback with teaching ("Traditional Manual Teaching Feedback," "Machine Feedback," and "Human-Machine Collaborative Feedback") shows that all three were effective to some degree in developing Grades 7-9 students' biological scientific argumentation ability. However, the "Human-Machine Collaborative Feedback" method shows a clearer advantage. This method combines the computer's repeated prompts on argument structure with the teacher's metacognitive prompts, drawing on the strengths of both and thus promoting students' competence in biological scientific argumentation more effectively.
This study proposes the Formative Assessment Model for Scientific Argumentation Based on Machine Learning (FAMSA-ML), which provides guidance for future machine-learning-based formative assessment of scientific practices in science education and extends applied research on machine learning in this field. The development of a formative assessment tool for Grades 7-9 students' competence in biological scientific argumentation offers a reference for the design of formative assessments of scientific argumentation. The construction of an automated scoring model for CR item responses provides an example of applying Chinese-language computer-automated scoring in science education assessment. The implementation of instant feedback, and the different ways of combining feedback with teaching, offer new ideas for formative assessment in teaching practice.﹀
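Total number of references:	232
Holding location:	Library thesis reading area (Main Library, South Wing, 3rd floor, Area BC)
Call number:	博040102/23015
Open access date:	2024-06-18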
