Thesis Information

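Chinese title (translated):

 A Comparative Study of Topic Mining and Temporal Evolution of the Information Needs of Domestic and International Museum Users Based on Large Language Models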

Author:

 俞剑俐    

Confidentiality level:

 Public

Thesis language:

 chi (Chinese)

Discipline code:

 120102    

Major:

 Information Management and Information Systems

Student type:

 Undergraduate

Degree:

 Bachelor of Management

Degree year:

 2024    

Campus:

 Beijing campus

School:

 School of Government

Primary supervisor:

 李蕾    

Primary supervisor's affiliation:

 School of Government

Submission date:

 2024-07-02    

Defense date:

 2024-04-26    

English title:

 Comparative Study of Theme Mining and Temporal Evolution of Museum Visitor Information Needs Based on Large Language Models    

Chinese keywords (translated):

 Large language models; museum users; information needs; topic mining; temporal evolution

English keywords:

 Large Language Models; Museum Visitors; Information Needs; Topic Mining; Temporal Evolution

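Chinese abstract (translated):

With the rapid development of information technology, big data and artificial intelligence are increasingly applied in the cultural sector. In particular, the emergence of large language models (LLMs) offers a new research approach for understanding the information needs of museum users in depth. This study uses LLMs to explore and compare the topics of information needs among domestic and international museum users and how those topics evolve over time, with the aim of providing theoretical support and practical guidance for optimizing and personalizing museum services.

First, focusing on museums, the study collected a wide range of user questions posted on several platforms: Zhihu, Baidu Zhidao, Ctrip Q&A, Quora, and Tripadvisor. The data were then carefully preprocessed by hand to remove noise and invalid information, ensuring their accuracy and reliability and laying a solid foundation for the subsequent topic analysis. Next, three models, "llama-2-7B," "text-embedding-3-small," and "bge-m3," were used to embed the text, producing document-level semantic representations based on sentence vectors. Key topics in the user questions were then identified through UMAP dimensionality reduction, HDBSCAN topic clustering, and topic-word extraction. Finally, the study compares domestic and international museum users in terms of topic popularity and the trends in how that popularity evolves.

The study finds that, in terms of topic popularity, the core needs of both domestic and international users regarding Chinese museums are "visiting information," "exhibition information," "collection information," and "historical and background information," all of which attract considerable attention, although the two groups differ somewhat in the subcategories of these needs.

In terms of topic distribution, domestic and international museum users pay similar attention to "educational resource information" and "shopping and merchandise information." International users, however, focus more on "visiting information," which accounts for 75%, and "work information" is not among their concerns.

In terms of temporal evolution, apart from "educational information" and "shopping and merchandise information," the topics that domestic and international users attend to are similar in composition, concentrating mainly on "visiting information," "exhibition information," "collection information," and "historical and background information." Their evolution trends are also broadly consistent, with relatively stable distributions.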
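
The abstract names topic-word extraction as the step after UMAP reduction and HDBSCAN clustering but does not say how words were weighted. The sketch below is one plausible version, assuming a class-based TF-IDF (c-TF-IDF) weighting in the style popularized by BERTopic: all questions in a cluster are merged into one class document, and term t in cluster c gets weight (tf_{t,c} / n_c) · log(1 + A / f_t), where n_c is the number of tokens in the cluster, f_t is the term's frequency across all clusters, and A is the average number of tokens per cluster. The function name `ctfidf_top_words`, the pre-segmented token input, and the toy questions are hypothetical; the cluster labels are assumed to come from an earlier embedding, UMAP, and HDBSCAN run, with -1 marking noise.

```python
import math
from collections import Counter, defaultdict


def ctfidf_top_words(docs, labels, top_n=3):
    """Rank keywords per cluster with a class-based TF-IDF (c-TF-IDF) weight.

    docs   -- list of token lists (assumed already segmented and stop-word filtered)
    labels -- one cluster id per document; -1 marks HDBSCAN noise and is skipped
    """
    # Merge every question in a cluster into one "class document" of term counts.
    class_tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        if label != -1:
            class_tf[label].update(tokens)
    if not class_tf:
        return {}

    # f_t: frequency of each term summed over all clusters.
    term_freq = Counter()
    for counts in class_tf.values():
        term_freq.update(counts)

    # A: average number of tokens per cluster.
    avg_tokens = sum(term_freq.values()) / len(class_tf)

    topics = {}
    for label, counts in class_tf.items():
        n_c = sum(counts.values())
        scores = {
            term: (count / n_c) * math.log(1 + avg_tokens / term_freq[term])
            for term, count in counts.items()
        }
        # Highest weight first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        topics[label] = ranked[:top_n]
    return topics


# Hypothetical toy input: tokenized questions and HDBSCAN-style labels.
docs = [
    ["ticket", "booking", "opening", "hours"],
    ["ticket", "price", "opening", "hours"],
    ["exhibition", "schedule", "ticket"],
    ["exhibition", "bronze", "collection"],
    ["random", "noise"],
]
labels = [0, 0, 1, 1, -1]
print(ctfidf_top_words(docs, labels))
# {0: ['hours', 'opening', 'ticket'], 1: ['exhibition', 'bronze', 'collection']}
```

Because f_t is pooled across clusters, a word such as "ticket" that occurs in both toy clusters is down-weighted relative to words specific to one cluster, which is what makes this kind of weighting useful for labeling topics.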

English abstract:

With the rapid development of information technology, big data and artificial intelligence technologies are increasingly applied in the cultural sector. In particular, the emergence of Large Language Models (LLMs) provides new avenues for understanding the information needs of museum visitors in depth. This study uses LLMs to explore and compare the topics of information needs among domestic and international museum visitors, focusing on how these topics evolve over time. The objective is to offer theoretical support and practical guidance for optimizing and personalizing museum services.

First, focusing on museums, this study collects a broad set of user questions from platforms including Zhihu, Baidu Zhidao, Ctrip Q&A, Quora, and Tripadvisor. The collected data are then preprocessed manually to remove noise and irrelevant information, ensuring accuracy and reliability and laying a solid foundation for the subsequent topic analysis. Second, three models, "llama-2-7B," "text-embedding-3-small," and "bge-m3," are used for text embedding to obtain sentence-level semantic representations. UMAP dimensionality reduction, HDBSCAN topic clustering, and keyword extraction are then applied to identify the key topics in user questions. Finally, a comparative analysis examines the differences in topic popularity, and in its evolution over time, between domestic and international museum visitors.

The study finds that, in terms of topic popularity, the core information needs of both domestic and international visitors regarding Chinese museums are "visiting information," "exhibition information," "collection information," and "historical and background information," although the two groups differ in the subcategories of these needs.

From the perspective of topic distribution, domestic and international museum visitors show similar levels of interest in "educational resource information" and "shopping and merchandise information." International users, however, focus more heavily on "visiting information," which accounts for 75%, while "work information" is not a primary concern for them. In terms of temporal evolution, apart from "educational information" and "shopping and merchandise information," domestic and international users are interested in a similar set of topics, primarily "visiting information," "exhibition information," "collection information," and "historical and background information." The evolution trends of these topics are also largely consistent, with relatively stable distributions over time.
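
Topic popularity and its temporal evolution, as summarized above, can be read as each topic's share of a user group's questions, either pooled over all time or split by period. The record does not state the exact measure used, so the following is only a minimal sketch assuming shares are simple proportions of question counts. The function `topic_shares`, the `(group, period, topic)` record layout, and the toy records are hypothetical and are not the thesis data.

```python
from collections import Counter, defaultdict


def topic_shares(records, by_period=True):
    """Compute each topic's share of questions.

    records   -- iterable of (group, period, topic) tuples
    by_period -- if True, key results by (group, period) to trace temporal evolution;
                 if False, pool all periods and key by group to compare overall popularity
    """
    buckets = defaultdict(Counter)
    for group, period, topic in records:
        key = (group, period) if by_period else group
        buckets[key][topic] += 1

    shares = {}
    for key, counts in buckets.items():
        total = sum(counts.values())
        # Shares within each bucket sum to 1.
        shares[key] = {topic: count / total for topic, count in counts.items()}
    return shares


# Hypothetical toy records (not from the thesis).
records = [
    ("international", 2022, "visit"),
    ("international", 2022, "visit"),
    ("international", 2023, "visit"),
    ("international", 2023, "exhibition"),
    ("international", 2023, "collection"),
    ("domestic", 2022, "collection"),
    ("domestic", 2022, "visit"),
    ("domestic", 2023, "exhibition"),
    ("domestic", 2023, "exhibition"),
]
print(topic_shares(records, by_period=False)["international"])
# {'visit': 0.6, 'exhibition': 0.2, 'collection': 0.2}
print(topic_shares(records)[("domestic", 2023)])
# {'exhibition': 1.0}
```

Switching the key from `(group, period)` to `group` is the only difference between the popularity comparison and the temporal view, so both share the same counting logic.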

Total number of references:

 30    

Library call number:

 本120102/24014    

Open access date:

 2025-07-02    
