Thesis Information

Chinese Title:

 Exploring the Best Prompts for GPT Question Generation: Taking Grade 4 and Grade 8 Mathematics Test Questions as Examples

Author:

 王珞宇    

Confidentiality Level:

 Public

Thesis Language:

 chi    

Discipline Code:

 025200    

Discipline:

 Applied Statistics

Student Type:

 Master's

Degree:

 Master of Applied Statistics

Degree Type:

 Professional Degree

Degree Year:

 2024    

Campus:

 Zhuhai Campus

School:

 School of Statistics

Research Direction:

 Educational Measurement and Big Data Mining

First Supervisor:

 李峰    

First Supervisor's Institution:

 Collaborative Innovation Center of Assessment for Basic Education Quality

Submission Date:

 2024-06-15    

Defense Date:

 2024-05-22    

Foreign Title:

 EXPLORE THE BEST PROMPT FOR GPT QUESTION GENERATION—TAKING THE 4th AND 8th GRADE MATHEMATICS TEST QUESTIONS AS EXAMPLES    

Chinese Keywords:

 GPT question generation; prompts; automatic question generation; effectiveness of question generation; evaluation of test questions

Foreign Keywords:

 GPT question generation; prompts; automatic question generation; effectiveness of question generation; evaluation of test questions

Chinese Abstract:

Research on the automatic generation of mathematical word problems by algorithmic systems has made clear progress; related studies have demonstrated the effectiveness of the generated questions in terms of clarity of expression, authenticity of contextual settings, question difficulty, and usability. In the past two years, researchers have begun to examine how well GPT generates mathematics test questions, comparing the results with human-written questions and finding no significant difference between them; however, the prompts that affect the quality of GPT question generation have received little study.

This study selected four prompt factors (Role, Content Requirements, Difficulty, and Steps-and-Answers) for an orthogonal experimental design, yielding eight prompt combinations; GPT-4.0 was then prompted under each combination to generate questions. To evaluate the effectiveness of GPT question generation, five questions were randomly drawn from the 20 questions generated under the eight prompt combinations, and ten graduate students majoring in mathematics education at a university were invited to rate them. To test whether human-written questions differ significantly from those generated under the eight prompt combinations, five human-written questions were included in the evaluation. Before the evaluation, each rater was given the key Grade 4 and Grade 8 content and example problems from the new Curriculum Standards for reference, to avoid bias from raters' insufficient familiarity with the knowledge points of those grades.

The results show no significant difference between GPT-generated and human-written questions in their scores on the evaluation criteria. The Role and Steps-and-Answers prompts affect the questions' scores on the evaluation criteria: under the Role prompt, GPT-generated questions show higher knowledge accuracy, clearer and more concise expression, and better direct applicability; the Steps-and-Answers prompt affects the questions' computational steps, improving the accuracy of the mathematical operations. Content Requirements and Difficulty are prompts related to question difficulty and influence the difficulty level of the generated questions; prompts containing these two factors generate questions whose difficulty levels are close to expectations.

Foreign Abstract:

Research on the automatic generation of mathematical word problems by algorithmic systems has made significant progress, and relevant studies have demonstrated the effectiveness of the generated test questions in terms of clear expression, authentic contextual settings, question difficulty, and usability. In the past two years, researchers have begun to explore the effectiveness of GPT in generating mathematical test questions, comparing it with manually created ones and finding no significant difference between them. However, there has been little research on the prompts that affect the quality of GPT's question generation.

In this study, we conducted an orthogonal experimental design with four prompt factors: Role, Content, Difficulty, and Steps-and-Answers. Eight prompt combinations were obtained, and GPT-4.0 was prompted with each combination to generate questions. To assess the effectiveness of GPT-generated questions, five questions were randomly selected from the 20 questions generated under the eight prompt combinations, and 10 graduate students majoring in mathematics education at a university were invited to evaluate them. To investigate whether there were significant differences between manually crafted questions and the questions generated under the eight prompt combinations, five human-created questions were included in the evaluation. Before the evaluation, each evaluator was provided with important content and example questions related to the fourth and eighth grades in the Curriculum Standards for reference, to avoid biases caused by insufficient understanding of the knowledge at these grade levels.
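The design described above (four two-level prompt factors reduced to eight runs) can be sketched as a standard 2^(4-1) half-fraction factorial. This is a minimal illustration, not the thesis's actual design matrix: the factor names come from the abstract, while the choice of generator (D = A·B·C) is an assumption made here for demonstration.

```python
from itertools import product

# Factors named in the abstract, each at two levels:
# +1 = clause included in the prompt, -1 = clause omitted.
factors = ["Role", "Content", "Difficulty", "StepsAnswers"]

def half_fraction_design():
    """Build a 2^(4-1) half-fraction design (8 runs).

    The fourth factor is generated as D = A*B*C, which keeps all
    four main-effect columns mutually orthogonal while halving
    the 16-run full factorial to 8 runs.
    """
    runs = []
    for a, b, c in product([-1, 1], repeat=3):
        d = a * b * c  # defining relation for the fourth factor
        runs.append((a, b, c, d))
    return runs

design = half_fraction_design()
for run in design:
    included = [name for name, level in zip(factors, run) if level == 1]
    print(included or ["(baseline prompt)"])
```

Orthogonality here means the inner product of any two factor columns is zero, so each factor's main effect can be estimated independently from only eight GPT prompting runs.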

The results indicate no significant difference between GPT-generated and manually created questions in their evaluation-criteria scores. The Role and Steps-and-Answers prompts influence the scores on the evaluation criteria. Under the Role prompt, GPT-generated questions exhibit higher knowledge accuracy, clearer and more concise expression, and better direct applicability; the Steps-and-Answers prompt affects the computational steps of the questions, yielding better accuracy in mathematical operations. Content and Difficulty are prompts related to question difficulty and influence the difficulty level of the generated questions; prompts containing these two factors generate questions with difficulty levels close to expectations.
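As a rough illustration of the kind of GPT-versus-human comparison reported above, here is a minimal two-sided permutation test on mean rating scores. The abstract does not specify which significance test the thesis used, and the score lists below are invented placeholders, not the thesis data.

```python
import random

def perm_test(x, y, n_perm=5000, seed=0):
    """Approximate two-sided permutation test for a difference
    in mean scores between two groups of ratings."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = x + y
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        diff = abs(sum(px) / len(px) - sum(py) / len(py))
        if diff >= observed:
            count += 1
    return count / n_perm

# Hypothetical mean evaluator scores on a 1-5 scale (placeholders).
gpt_scores = [4.2, 3.8, 4.0, 4.5, 3.9]
human_scores = [4.1, 4.0, 4.3, 3.7, 4.2]
p = perm_test(gpt_scores, human_scores)
print(f"approximate p-value: {p:.3f}")
```

With small samples like five questions per group, a permutation test avoids the normality assumption of a t-test; a p-value above the usual 0.05 threshold would correspond to the "no significant difference" finding.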

Total References:

 54    

About the Author:

 王珞宇, born in June 1999, earned a bachelor's degree in Advertising from the School of Journalism and Communication at Shanghai International Studies University and a master's degree in Applied Statistics from Beijing Normal University.

Library Location:

 Main Library B301

Call Number:

 硕025200/24113Z    

Release Date:

 2025-06-15    
