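Chinese title: | 多模态数据驱动的形成性评价研究 ——以英语演讲能力测评为例 |
Name: | |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 040110 |
Discipline and major: | |
Student type: | Doctoral |
Degree: | Doctorate in Education |
Degree type: | |
Degree year: | 2023 |
Campus: | |
School: | |
Research direction: | Multimodal learning analytics; basic theory of educational technology |
First supervisor name: | |
First supervisor affiliation: | |
Submission date: | 2023-06-15 |
Defense date: | 2023-06-03 |
Foreign-language title: | Research on Formative Assessment Driven by Multimodal Data: Taking English Public Speaking Competence Assessment as an Example |
Chinese keywords: | |
Foreign-language keywords: | Multimodal data; Formative assessment; English public speaking competence; Human-AI collaboration |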
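Chinese abstract: |
The continuous innovation of intelligent technologies has given learners new intelligent assessment environments, and multimodal data-driven assessment offers new opportunities for education. This study combines multimodal data from educational technology with formative assessment from educational evaluation and applies them to English public speaking instruction in applied linguistics. Around three rounds of public speaking instruction, two multimodal data-driven combined assessment modes were designed: "self-assessment + peer assessment + teacher assessment" and "self-assessment + automated assessment + teacher assessment". In a quasi-experimental study, data were collected with a public speaking rubric and questionnaires: score data from self-, peer, automated, and teacher assessment; four types of questionnaire data (English public speaking self-efficacy, learning engagement in English public speaking, attitudes towards the assessment modes, and technology acceptance of the multimodal teaching and assessment platform); and learners' responses to open-ended questionnaire items. Quantitative data were analyzed with descriptive statistics, t-tests, and analysis of covariance (ANCOVA), and qualitative data with coding analysis and chi-square tests, in an attempt to reveal the mechanisms and patterns by which multimodal data drive formative assessment. In the first phase, the pairwise relationships among the assessment forms in the two multimodal data-driven combined assessment modes were analyzed. First, in the "self-assessment + peer assessment + teacher assessment" mode, Pearson correlations and t-tests showed that peer assessment and teacher assessment were correlated to some degree on all dimensions, with teachers scoring more strictly than peers; self-assessment and teacher assessment were significantly correlated on some dimensions, with teachers scoring more strictly than the learners themselves; and self-assessment and peer assessment were significantly correlated on some dimensions. Second, in the "self-assessment + automated assessment + teacher assessment" mode, Pearson correlations and t-tests showed that automated assessment and teacher assessment were correlated to some degree on only some dimensions; likewise, self-assessment and teacher assessment were significantly correlated on some dimensions, with teachers scoring more strictly; and self-assessment and automated assessment were negatively correlated only on the dimension of speech anxiety regulation. In the second phase, descriptive statistics, t-tests, and ANCOVA were used to verify the effects of the two modes. The first important finding was that automated assessment can play a role similar to that of peer assessment. The second was that, compared with peer assessment, automated assessment had a significant effect on arousing passionate, expressive emotion in learners during their speeches. The third was that, compared with automated assessment, peer assessment had a significant effect on social engagement in English public speaking. In the third phase, quantitative methods (descriptive analysis, t-tests, and ANCOVA) and qualitative coding analysis were used to examine learners' attitudes towards the different assessment modes and their technology acceptance of the multimodal teaching and assessment platform. First, learners' post-test scores for positive attitude towards peer assessment were significantly higher than their pre-test scores. Second, the same held for positive attitude towards automated assessment. Third, ANCOVA on attitudes towards peer or automated assessment showed no significant differences between the two groups in positive attitude, understanding and action, or relevance. Fourth, perceived usefulness rose significantly in both the self + peer + teacher group and the self + automated + teacher group. Fifth, ANCOVA showed that, after controlling for pre-test scores, post-test service quality scores were significantly higher in the self + peer + teacher group than in the self + automated + teacher group. This study combines multimodal data with formative assessment and applies them to English public speaking instruction. Through literature review and theoretical analysis, two multimodal data-driven combined modes of formative assessment of English public speaking competence were constructed, and a quasi-experimental study was designed and conducted. Established public speaking rubrics and questionnaire instruments were validated and used to collect score and questionnaire data during the quasi-experiment. Data analysis revealed relevant educational and teaching patterns. The main contributions are as follows. First, the study constructs multimodal data-driven assessment modes for English public speaking competence, providing empirical evidence and insights for educational technology on how to design and use multimodal data, how to integrate formative assessment and other educational assessment theory and practice, how to engage deeply with teaching in specific subjects, and how to adopt human-machine collaborative teaching. Second, through quasi-experimental research, it establishes the methods and patterns of multimodal data-driven human-machine collaborative teaching and proposes that a combined "self-assessment + peer assessment + automated assessment + teacher assessment" mode may be the best current solution for formative assessment. Third, it offers a theoretical interpretation of how multimodal data drive formative assessment from three perspectives: classical theory, technology-supported process assessment theory, and complex skill assessment theory. |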
Foreign-language abstract: |
The continuing development of AI technologies has given learners ample exposure to AI-supported learning environments, and formative assessment driven by multimodal data has injected new vitality into educational research. This study is multidisciplinary: it combines multimodal data from educational technology, formative assessment from educational testing and assessment, and English public speaking instruction from applied linguistics. The instructional intervention lasted sixteen weeks and comprised three rounds of English public speaking tests and a series of formative assessments. The first group carried out self-, peer, and teacher assessment of their public speaking performance, while the second group undertook self-, automated, and teacher assessment. The quasi-experimental study drew on multi-source data. Quantitative data included public speaking performance scores from self-assessment, peer assessment, automated assessment, and teacher assessment, as well as students’ questionnaire responses on English public speaking self-efficacy, engagement, attitudes towards the assessment modes, and technology acceptance of the self-designed multimodal data-driven assessment platform. Qualitative data included learners’ responses to open-ended questions in the questionnaire. Descriptive statistics, t-tests, and ANCOVA were used to analyze the quantitative data, and content analysis and chi-square tests were used to analyze the qualitative data. In the first phase, the pairwise relationships between the assessment practices in the two groups were analyzed. First, in the group using self-, peer, and teacher assessment, Pearson correlations and t-tests showed that peer assessment was correlated with teacher assessment to some extent on all dimensions, and that peer assessment was more lenient than teacher assessment. 
Self-assessment and teacher assessment were significantly correlated on certain dimensions, and self-assessment was more lenient than teacher assessment. Self-assessment and peer assessment were significantly correlated on some dimensions. Second, in the group using self-, automated, and teacher assessment, Pearson correlations and t-tests showed that automated assessment was correlated with teacher assessment on only some dimensions. Similarly, self-assessment and teacher assessment were significantly correlated on some dimensions, and self-assessment was more lenient than teacher assessment. Self-assessment was negatively correlated with automated assessment only on the dimension of speech anxiety regulation. In the second phase, descriptive statistics, t-tests, and ANCOVA were used to verify the effects of the two modes of multimodal data-driven assessment. The first finding was that automated assessment can play a role similar to that of peer assessment. The second was that automated assessment had a significantly greater effect than peer assessment on arousing learners’ passion and expressive emotion during their speeches. The third was that peer assessment had a significantly greater effect than automated assessment on social engagement in learners’ English public speaking learning. In the third phase, quantitative methods (descriptive analysis, t-tests, and ANCOVA) and qualitative methods (content analysis and chi-square tests) were used to explore learners’ attitudes towards the different assessment modes and their technology acceptance of the assessment platform. First, learners’ post-test scores for positive attitude towards peer assessment were significantly higher than their pre-test scores. 
Second, learners’ post-test scores for positive attitude towards automated assessment were significantly higher than their pre-test scores. Third, there was no significant difference between the two groups in positive attitude, understanding and action, or relevance. Fourth, perceived usefulness improved significantly in both groups. Lastly, ANCOVA results showed that, after controlling for pre-test scores, perceived service quality was significantly higher in the first group than in the second. By integrating multimodal data into formative assessment, this study explored the effects of two multimodal data-driven modes of formative assessment on learners’ English public speaking. Public speaking rubrics and questionnaires were validated and used to collect performance and questionnaire data during the quasi-experiment, and several pedagogical implications emerged. First, the multimodal data-driven assessment modes provide empirical evidence for, and shed light on, how to design and use multimodal data in educational technology, how to integrate educational assessment theory and practice such as formative assessment, how to engage deeply with the teaching of specific subjects, and how to adopt a human-machine collaborative teaching approach. Second, through quasi-experimental research, the methods and patterns of multimodal data-driven human-machine collaborative teaching were established, and a combined mode of self-, peer, automated, and teacher assessment is proposed, which may be the best current solution for formative assessment. Third, the study offers a theoretical interpretation of how multimodal data drive formative assessment from three perspectives: classical theory, technology-supported process assessment theory, and complex skill assessment theory. |
Total references: | 343 |
Holding location: | Library thesis reading area (Main Library, South Area, 3rd floor, Zone BC) |
Call number: | 博040110/23005 |
Open-access date: | 2024-06-14 |