Thesis information

Chinese title:

 融合文本特征的职员离职预测研究 ——基于生存分析方法    

Name:

 柯倩    

Confidentiality level:

 Public

Thesis language:

 chi    

Discipline code:

 045400    

Discipline:

 Applied Psychology

Student type:

 Master's

Degree:

 Master of Applied Psychology

Degree type:

 Professional degree

Degree year:

 2024    

Campus:

 Zhuhai campus

School:

 Faculty of Psychology

Research direction:

 Applied Psychology

First supervisor's name:

 徐永泽    

First supervisor's affiliation:

 Faculty of Psychology

Submission date:

 2024-06-20    

Defense date:

 2024-05-23    

Foreign-language title:

 A Study of Employee Turnover Prediction by Fusing Textual Features - Based on Survival Analysis    

Chinese keywords:

 职员离职 ; 离职预测模型 ; 文本特征 ; 职场社交平台 ; 生存分析    

Foreign-language keywords:

 Employee turnover ; Turnover prediction model ; Text feature ; Workplace social platforms ; Survival analysis    

Chinese abstract:

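In today's business environment, the impact of talent loss on enterprises cannot be ignored. A high turnover rate raises costs and undermines business stability. Using modern technology to predict employees' turnover risk, and acting on those predictions to reduce turnover, has therefore become a vital part of enterprise management.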
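However, building turnover prediction models still faces several challenges. First, the field lacks a unified theoretical framework, and researchers differ in their choice of predictor variables and methods. Second, traditional turnover prediction tends to focus too heavily on employees' behavioral data and overlooks the important influence of emotional and psychological states on turnover behavior. Third, turnover prediction datasets are small, drawn from a single source, and contain relatively few types of variables, which to some extent limits the models' predictive accuracy and range of application. In addition, many existing models are static: they do not fully account for how the probability of turnover varies across time periods, so they adapt poorly to rapid changes in the workplace and to possible shifts in employees' motives for leaving.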
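Building on and integrating prior work, and aiming to construct a turnover prediction model with considerable predictive ability, this study carried out a four-stage investigation using public user data from Maimai (脉脉), one of the largest workplace social platforms in China. In the data acquisition and preprocessing stage, basic information and text data for 4,087 work events between January 1, 2020 and December 31, 2022 were cleaned and extracted. In the feature extraction stage, classical demographic features were selected, and text features were mined with three natural language processing methods: sentiment analysis, lexicon-based TF-IDF topic extraction, and semantic representation analysis. In the model construction stage, two major survival analysis models, the Cox proportional hazards model and the random survival forest model, were each fitted with different feature sets to model turnover risk. In the model evaluation and comparison stage, multiple metrics and tools, including the concordance index, the integrated Brier score, and the mean cumulative/dynamic AUC, were used to evaluate the models' predictive ability comprehensively.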
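The results show that the random survival forest model performed best on this study's dataset. Compared with models relying only on demographic features, turnover prediction models that incorporated text features performed markedly better: the concordance index improved by up to 3.38%, and the cumulative/dynamic AUC improved by 3.43%. The study also found that, among the text feature extraction methods, combining semantic representation vectors with sentiment scores was the most effective at improving predictive performance.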

Foreign-language abstract:

In today's business environment, the impact of talent loss on organizations cannot be ignored. A high turnover rate leads to higher costs and lower business stability. Using modern technology to predict employees' turnover risk, and taking appropriate measures to reduce turnover, has therefore become a critical part of business management.
However, the construction of turnover prediction models still faces several challenges. First, the field lacks a unified theoretical framework, and researchers differ in their selection of predictor variables and methods. Second, traditional turnover prediction tends to focus too much on employee behavioral data, ignoring the important influence of emotional and psychological states on turnover behavior. Third, the small size, single source, and relatively few variable types of turnover prediction datasets limit, to some extent, the prediction accuracy and application scope of the models. In addition, many existing models are static and do not adequately consider how the probability of turnover varies across time periods, which makes it difficult for them to adapt to rapid changes in the workplace and to possible shifts in employees' motivation to leave.
Building on and integrating previous work, this study conducted a four-stage investigation based on public user data from one of the largest workplace social platforms in China, with the aim of constructing a turnover prediction model with considerable predictive ability. In the data acquisition and preprocessing stage, basic user information and post text datasets were extracted and cleaned. In the feature extraction stage, classical demographic features were selected, and text features were extracted with three natural language processing methods: sentiment analysis, lexicon-based TF-IDF topic extraction, and semantic representation analysis. In the model construction stage, two major survival analysis models, the Cox proportional hazards model and the random survival forest (RSF) model, were each fitted with different feature sets to model turnover risk. In the model evaluation and comparison stage, several evaluation metrics and tools, including the concordance index (C-index), the integrated Brier score, and the mean cumulative/dynamic AUC, were used to comprehensively evaluate the predictive ability of the models.
The results indicate that the RSF model had stronger fitting ability and prediction accuracy on the dataset used in this study. Compared with the model that relies solely on demographic features, the turnover prediction model that incorporates text features performed better: the concordance index improved by up to 3.38%, and the cumulative/dynamic AUC improved by 3.43%. Among the text feature extraction methods, the combination of semantic representation vectors and sentiment scores was the most effective at improving the performance of the turnover prediction model.
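
The abstract reports its headline gain in terms of the concordance index. As a minimal sketch of what that metric measures for right-censored data such as job tenures, the following pure-Python function computes Harrell's C-index. The function name, the toy durations, event flags, and risk scores are all illustrative assumptions; they are not taken from the thesis, which does not state here how the metric was implemented, and pairs with tied durations are simply skipped for brevity.

```python
from typing import Sequence


def concordance_index(
    durations: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when subject i has the shorter observed
    time and actually experienced the event (here: left the job).
    The pair is concordant when i also has the higher predicted risk;
    ties in predicted risk count as half. Tied durations are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: tenure in months, whether turnover was observed,
# and a model's predicted risk score (higher means more likely to leave).
toy_durations = [6, 12, 18, 24, 30]
toy_events = [True, True, False, True, False]
toy_risks = [0.9, 0.4, 0.6, 0.3, 0.1]

c_index = concordance_index(toy_durations, toy_events, toy_risks)
print(f"C-index: {c_index:.3f}")  # 7 of 8 comparable pairs are concordant: 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a gain of a few percent in this index means the model orders employees by turnover risk correctly in a few percent more of the comparable pairs.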

Total number of references:

 70    

Holding location:

 Main Library B301

Call number:

 硕045400/24069Z    

Open-access date:

 2025-06-20    
