Thesis Information

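Chinese title:	数据驱动的在线系统用户冷启动问题研究
Author:	Li Dongxue (李冬雪)
Confidentiality level:	Public
Thesis language:	chi
Discipline code:	071101
Discipline:	Systems Theory
Student type:	Master's
Degree:	Master of Science
Degree type:	Academic degree
Degree year:	2024
Campus:	Beijing campus
School:	School of Systems Science
Research area:	Information Mining
Primary supervisor:	Chen Jiawei (陈家伟)
Primary supervisor's affiliation:	School of Systems Science
Submission date:	2024-06-27
Defense date:	2024-05-29
Foreign-language title:	Research on Cold-Start Problem in Online Systems Driven by Data
Chinese keywords:	冷启动问题 ; 信息挖掘 ; Bootstrap ; 热门项目
Foreign-language keywords:	Cold-Start Problem ; Information Mining ; Bootstrap ; Popular Item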
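Chinese abstract (English translation):	︿As the Internet has surged forward, quickly finding the content one needs in a vast ocean of information has become a practical difficulty that people are forced to face. Recommender systems arrived as a timely remedy and thereby secured an indispensable place in online systems. When a user has only just joined a system, however, the user cold-start problem emerges: the system lacks data about that user and cannot offer effective recommendations. In tackling the cold-start problem of recommender systems, researchers have pioneered many different methods. We observe, though, that the great majority of these methods lean toward popular items, since recommending popular items is a fairly effective and low-risk choice when nothing at all is known about a new user. But are popular items necessarily good for new users? Targeting the user cold-start problem, this thesis constructs a statistical model with time information and network structure, uses it to identify the first-recommended items in an online system, compares them with popular items, and thereby addresses the cold-start problem for new users. The work comprises four main parts. (1) The thesis first lays out the background, significance, and current state of research on the cold-start problem in recommender systems, examines existing surveys of information mining in online systems, and summarizes three broad classes of cold-start strategies: traditional strategies, data-driven strategies, and method-driven strategies. Although these strategies have contributed a great deal to easing the cold-start problem, we find that they have one thing in common: all of them take popular items into account. To verify this, we registered on three platforms as new users and observed the items recommended to us; in every case the platform recommended popular items. (2) Building on the model and taking a system-level view centered on the links between nodes, the thesis constructs an indicator that measures the specialness of an item and designs Bootstrap simulation experiments for comparison. This uncovers a distinctive class of items, which we take as first-recommended items, and shows that items are heterogeneous in the way they connect users. Typical examples are found in both the Amazon dataset and the Delicious dataset. In the Bootstrap simulations with conditional sampling, the maximum specialness never exceeds the actual maximum specialness, and 148 and 255 first-recommended items are identified in the Amazon and Delicious datasets, respectively. (3) The thesis compares the future activity of new users connected to first-recommended items with that of new users connected to ordinary items and finds that the former are indeed more active later on. The model is therefore used to recommend first-recommended items to new users, improving the user experience so as to strengthen users' stickiness to the platform and their future activity, and thus addressing the cold-start problem. A statistical analysis of the specialness of popular items in the Amazon data is also carried out. (4) The thesis analyzes the properties of first-recommended items. They are highly stable and persistent, and some of them do not lose their first-recommended status as time passes. Popular items and first-recommended items overlap only slightly, so first-recommended items are not the same as popular items. The time window for updating the set of first-recommended items is calculated to be 63 days.﹀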
Foreign-language abstract:	︿With the rise of the Internet, quickly finding the content one needs in a vast sea of information has become an unavoidable practical problem. The emergence of recommender systems has met this urgent need and established their indispensable position in online systems. However, when a user has just joined the system, the user cold-start problem appears: the system lacks data on that user and cannot provide effective recommendations. Researchers have developed a variety of methods for the cold-start problem in recommender systems. We find, however, that the vast majority of these methods tend to rely on popular items, because recommending popular items is a relatively effective and less error-prone approach when nothing is known about a new user. But are popular items necessarily good for new users? This thesis focuses on the user cold-start problem. It builds a statistical model with time information and network structure, uses the model to identify first-recommended items in online systems, and compares them with popular items in order to address the cold-start problem for new users. The main contents of the thesis are the following four points: (1) The thesis first describes the background, significance, and research status of the cold-start problem in recommender systems, analyzes related reviews of information mining in online systems, and summarizes three major categories of strategies for the cold-start problem: traditional strategies, data-driven strategies, and method-driven strategies. Although these strategies have made great contributions to mitigating the cold-start problem, we find that they have one thing in common: they all consider popular items. To check this, we registered as new users on three platforms and observed the items recommended to us; all three platforms recommended popular items. (2) Based on the model, the thesis takes a system perspective that focuses on the links between nodes, constructs an indicator to measure the specialness of an item, and designs Bootstrap simulation experiments for comparison. This reveals a unique type of item, which we regard as first-recommended items, and demonstrates that items are heterogeneous in connecting users. Typical examples are found in both the Amazon dataset and the Delicious dataset. The maximum specialness in the Bootstrap simulations with conditional sampling is never greater than the actual maximum specialness, and 148 and 255 first-recommended items are identified in the Amazon and Delicious datasets, respectively. (3) The thesis compares the future activity levels of new users connected to first-recommended items and to ordinary items, and finds that new users connected to first-recommended items are indeed more active in the future. The model is therefore used to recommend first-recommended items to new users, improving the user experience and thereby enhancing users' stickiness to the platform and their future activity, which addresses the cold-start problem. A statistical feature analysis is also conducted on the specialness of popular items in the Amazon data. (4) The thesis analyzes the properties of first-recommended items. They have high stability and strong persistence, and some of them do not lose their first-recommended status over time. There is only a small overlap between popular items and first-recommended items, so first-recommended items are not simply popular items. The length of the time window for updating the set of first-recommended items is calculated to be 63 days.﹀
Total references:	53
Call number:	硕071101/24001
Open-access date:	2025-06-27
