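Chinese title:	基于作文文本分析的青少年抑郁风险预测研究
Name:	张桂婷 (Zhang Guiting)
Confidentiality level:	Public
Thesis language:	Chinese
Discipline code:	04020001
Discipline:	01 Basic Psychology (040200)
Student type:	Master's
Degree:	Master of Education
Degree type:	Academic degree
Degree year:	2022
Campus:	Beijing campus
School:	Faculty of Psychology
Research direction:	Basic Psychology
First supervisor:	周可 (Zhou Ke)
First supervisor's affiliation:	Faculty of Psychology, Beijing Normal University
Submission date:	2022-06-23
Defense date:	2022-06-23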
English title:	RISK PREDICTION OF ADOLESCENT DEPRESSION BASED ON COMPOSITION TEXT ANALYSIS
Chinese keywords:	青少年抑郁 ; 作文 ; 文本分析 ; 自然语言处理 ; 机器学习 ; 深度神经网络
English keywords:	Adolescent depression ; Composition ; Text analysis ; Natural language processing ; Machine learning ; Deep neural networks
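Chinese abstract (English translation):	︿The number of people living with mental illness worldwide is rising year by year, and mental health has become a major concern for countries around the world. Among mental illnesses, depression currently receives the most attention. Because it severely harms psychological, physiological, and social functioning, depression has become an emerging public health problem. Notably, depression has become increasingly prevalent among adolescents in recent years. Adolescent depression impairs cognitive ability, disrupts normal physical and mental development, and may lead to other mental disorders in adulthood, contributing to serious social problems such as unemployment and suicide. Early screening and diagnosis of adolescent depression is therefore crucial to healthy adolescent development. At present, depression is diagnosed mainly through expert assessment and questionnaires. These methods have limitations when applied to large populations and are not suited to broad screening. A convenient method that works for adolescents in school is therefore urgently needed to assist in detecting adolescent depression. Previous research has shown that language can reflect a person's inner psychological state. In this thesis, the author therefore sets out to predict adolescent depression through text analysis of adolescents' compositions. Specifically, the author collected depression scale data and composition texts from 4,257 students of different ages. Participants were divided into a high depression risk group and a low depression risk group according to their scale scores, and three types of natural language processing models were built to predict each participant's depression risk from their composition text. In Study 1, the author used the LIWC dictionary to capture, a priori and through word-frequency counts, the psycholinguistic features of the compositions, and used classical machine learning models for classification, to examine whether this approach could predict participants' depression risk. The results showed that psycholinguistic features extracted from compositions can be used to predict depression risk, with an F-measure of 0.60. In Study 2, the author went on to use word2vec, an unsupervised learning model, to produce word embeddings of the texts and extract abstract information from them, again classifying with classical machine learning models. The results showed that this method predicted depression risk better, with an F-measure of 0.64. In Study 3, the author again used word2vec to extract text features, but combined it with recent deep neural network models, such as a convolutional neural network (TextCNN) and a recurrent neural network (TextRNN), to examine whether depression risk could be predicted better still. The results showed that TextCNN performed similarly to Study 2, with an F-measure of 0.64, whereas TextRNN performed best, with an F-measure of 0.74. In sum, across different feature extraction methods combined with different machine learning models, the author found that word2vec combined with TextRNN outperformed all other models tested in predicting depression risk. This may be because the RNN captures bidirectional sequence information and, together with a gating structure that helps retain long-term information, can effectively extract and classify contextual information in the text. Through experimental data and computational analysis, this study verified the feasibility of using compositions to detect depression risk. It offers a convenient new approach to adolescent depression screening that scales to large populations and can assist detection during adolescents' everyday schooling.﹀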
English abstract:	︿ The number of people suffering from mental disorders is rising globally year by year. Mental disorders are now among the leading causes of the global health-related burden, with depressive and anxiety disorders being leading contributors to this burden. Depression can cause significant damage to a person's psychological, physiological, and social functioning. In recent years, depression has become increasingly prevalent among adolescents. Adolescent depression can impair an individual's cognitive capacities and affect normal physical and mental development. It is also one of the main factors that may cause other psychological disorders in adulthood and result in serious social problems such as unemployment and suicide. Therefore, early screening and diagnosis of adolescent depression is crucial to the normal development of adolescents. At present, the diagnosis of depression relies mainly on expert evaluation and questionnaire measurement. These methods are difficult to apply to large populations for screening purposes because they require one-to-one testing. A convenient method to assist in the detection of adolescent depression is thus needed. Previous studies have shown that language can reflect people's internal psychological state. Therefore, in this thesis, the author aims to predict adolescent depression through text analysis of adolescents' compositions. Specifically, the author collected depression scale data and composition data from a total of 4,257 students of different ages. Based on their depression scale scores, participants were divided into a high-risk depression group and a low-risk depression group. The author constructed three kinds of natural language processing models and evaluated their performance in predicting the risk of depression through text analysis of participants' compositions. 
In Study 1, the author used the LIWC dictionary to capture the psycholinguistic features of participants' compositions. These features then served as the input to classical machine learning models for classification. In this way, the author investigated whether these models could predict participants' risk of depression from psycholinguistic features. The results showed that psycholinguistic features extracted from the compositions could be used to predict participants' risk of depression, with an F-measure value of 0.60. In Study 2, the author used word2vec, an unsupervised learning model, to extract features of the compositions through word embedding. The same classical machine learning models were then used for classification. The results showed that this method outperformed the method used in Study 1 and could predict the risk of depression with an F-measure value of 0.64. In Study 3, word2vec was again used to extract text features, and state-of-the-art deep neural network models, such as a convolutional neural network (TextCNN) and a recurrent neural network (TextRNN), were used to classify whether participants belonged to the high-risk or the low-risk group. The results showed that the prediction performance of TextCNN was similar to that of Study 2, with an F-measure value of 0.64, and that TextRNN outperformed all the other models, with an F-measure value of 0.74. In conclusion, combining different feature extraction methods with various machine learning models, the author found that the TextRNN model with features extracted using word2vec outperformed all other models in predicting the risk of depression. This may be because the RNN model captures bidirectional sequence information in the text and its gating structure helps retain long-term information. 
This study verified the feasibility of using compositions to detect adolescents' risk of depression. It offers a convenient tool, applicable to large populations, for the early screening of adolescent depression, which can assist in detecting depression among students in school. ﹀
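All three studies are compared on the F-measure, so a small worked example of that metric may help in reading the reported values (0.60, 0.64, 0.74). The sketch below is illustrative only: the abstract does not say which F-measure variant or averaging scheme the thesis used, so the balanced F1 score with the high-risk group as the positive class is an assumption, and the function name `f_measure` and the toy labels are hypothetical.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Return (precision, recall, F-beta) for the `positive` class.

    Assumption: the high-risk group is treated as the positive class
    and beta=1 (balanced F1); the thesis may have used another variant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical toy labels: 1 = high-risk group, 0 = low-risk group.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

precision, recall, f1 = f_measure(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With these toy labels there are three true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 0.75. Because the F-measure is the harmonic mean of precision and recall, a model cannot score well by favoring one at the expense of the other.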
Total references:	114
Library holding number:	硕040200-01/22002
Public release date:	2023-06-23
