Thesis Information

Chinese Title:

 A Review of the Theory of Linear Analogies in Word Vector Models

Author:

 马倩

Confidentiality Level:

 Public

Thesis Language:

 Chinese (chi)

Discipline Code:

 070101

Major:

 Mathematics and Applied Mathematics

Student Type:

 Bachelor's

Degree:

 Bachelor of Science

Degree Year:

 2024

Campus:

 Beijing Campus

School:

 School of Mathematical Sciences

First Supervisor:

 蔡永强

First Supervisor's Affiliation:

 School of Mathematical Sciences

Submission Date:

 2024-05-17

Defense Date:

 2024-05-07

English Title:

 Review of Linear Analogies in Word Vector Models    

Chinese Keywords:

 Natural Language Processing ; Word2vec Model ; SGNS Model ; GloVe Model ; Linear Word Analogies

English Keywords:

 NLP ; Word2vec Model ; SGNS Model ; GloVe Model ; Linear Word Analogies    

Chinese Abstract:

Natural language processing (NLP) aims to enable computers to understand natural language. To this end, word vectors were introduced to convert the symbolic information of natural language into numerical vectors. The earliest one-hot representation of words could not express similarity between words, which motivated distributed word representations; the neural network language model (NNLM), the recurrent neural network language model (RNNLM), and the log-linear models (the Word2vec model) were then proposed in succession. The phenomenon of linear word analogies was first observed in the Word2vec model, for example vec("king") − vec("man") + vec("woman") ≈ vec("queen"). This discovery offers guidance for obtaining high-quality word vectors for understanding natural language, but the theoretical mechanism behind the phenomenon remains unclear.
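
For reference, the linear analogy mentioned above is usually written in vector-offset form. The sketch below uses the standard notation $v_w$ for the embedding of word $w$ (this notation is assumed here, not taken from the thesis):

```latex
% Canonical Word2vec analogy: "man is to king as woman is to queen".
\[
  v_{\mathrm{king}} - v_{\mathrm{man}} + v_{\mathrm{woman}} \approx v_{\mathrm{queen}},
  \qquad\text{equivalently}\qquad
  v_{\mathrm{king}} - v_{\mathrm{man}} \approx v_{\mathrm{queen}} - v_{\mathrm{woman}}.
\]
% In analogy tests the answer word is retrieved by cosine similarity over the
% vocabulary V, excluding the three query words:
\[
  d^{*} = \operatorname*{arg\,max}_{d \in V \setminus \{\mathrm{king},\,\mathrm{man},\,\mathrm{woman}\}}
          \cos\!\bigl(v_{d},\; v_{\mathrm{king}} - v_{\mathrm{man}} + v_{\mathrm{woman}}\bigr).
\]
```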

This thesis reviews the existing theoretical studies of this phenomenon and extends them. Levy et al. showed that the skip-gram model with negative sampling (the SGNS model) is equivalent to implicitly factorizing a word-context matrix; this thesis extends that result to the CBOW model with negative sampling and to the GloVe model. Allen et al. provided a basic theory for the existence of linear word analogies in the SGNS model; inspired by their work, we generalize the existence result to the GloVe model. We further prove that this theory is equivalent to that of Ethayarajh et al., and give necessary and sufficient conditions for linear word analogies to exist in the SGNS and GloVe models.
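
The word-context matrix mentioned above is, in Levy and Goldberg's result, the shifted pointwise mutual information (PMI) matrix. A minimal sketch of that statement in standard notation (the symbols are chosen here for illustration, not the thesis's own):

```latex
% SGNS with k negative samples implicitly factorizes M = W C^{\top}, where the
% rows of W are word vectors, the rows of C are context vectors, and
\[
  M_{ij} \;=\; \mathrm{PMI}(w_i, c_j) - \log k
         \;=\; \log\frac{P(w_i, c_j)}{P(w_i)\,P(c_j)} - \log k .
\]
```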

This thesis aims to provide a theoretical foundation for interpreting how language models learn concept representations and store knowledge.

English Abstract:

Natural Language Processing (NLP) aims to enable computers to understand natural language. To this end, word vectors were introduced to transform the symbolic information of language into numerical vectors. The early one-hot representation of words could not express similarity between words, which motivated distributed word representations; the Neural Network Language Model (NNLM), the Recurrent Neural Network Language Model (RNNLM), and log-linear models (the Word2vec model), among others, were proposed in succession. The phenomenon of linear word analogies, exemplified by vec("king") − vec("man") + vec("woman") ≈ vec("queen"), was first observed in the Word2vec model. This discovery offers insight into how to obtain high-quality word vectors for understanding natural language, yet the theoretical mechanism behind the phenomenon remains unclear.
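
As a concrete illustration of the analogy test described above, the following sketch retrieves the answer word by vector offset and cosine similarity. The vocabulary `vocab` and embedding matrix `emb` are hypothetical placeholders, not data or code from the thesis:

```python
import numpy as np

def solve_analogy(a, b, c, vocab, emb):
    """Return d such that a : b :: c : d, i.e. emb[b] - emb[a] + emb[c] ~ emb[d].

    vocab: list of words; emb: (len(vocab), dim) array with unit-normalized rows.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    query = emb[idx[b]] - emb[idx[a]] + emb[idx[c]]   # vector-offset query
    query /= np.linalg.norm(query)
    scores = emb @ query                              # cosine similarity (rows are unit vectors)
    for w in (a, b, c):                               # exclude the query words themselves
        scores[idx[w]] = -np.inf
    return vocab[int(np.argmax(scores))]

# With real Word2vec or GloVe vectors this is expected to return "queen":
# solve_analogy("man", "king", "woman", vocab, emb)
```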

This paper reviews the existing theoretical research on this phenomenon and extends it. Levy et al. showed that the Skip-gram model with negative sampling (the SGNS model) is equivalent to implicitly factorizing a word-context matrix; this paper extends that result to the CBOW model with negative sampling and to the GloVe model. Allen et al. provided a fundamental theory for the existence of linear word analogies in the SGNS model. Inspired by their work, we generalize the existence result to the GloVe model, prove that this theory is equivalent to that of Ethayarajh et al., and propose necessary and sufficient conditions for linear word analogies to exist in the SGNS and GloVe models.
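
For completeness, the GloVe model referred to above fits word and context vectors to log co-occurrence counts. Its standard objective, written here in the notation of the original GloVe paper rather than the thesis, is:

```latex
% X_{ij}: co-occurrence count of word i with context word j; w_i, \tilde{w}_j: word and
% context vectors; b_i, \tilde{b}_j: biases; f: a weighting function damping frequent pairs.
\[
  J \;=\; \sum_{i,j=1}^{|V|} f(X_{ij})
          \bigl( w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \bigr)^{2},
  \qquad
  f(x) \;=\;
  \begin{cases}
    (x/x_{\max})^{\alpha}, & x < x_{\max},\\
    1, & \text{otherwise}.
  \end{cases}
\]
```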

This paper aims to provide a theoretical foundation for interpreting how language models learn concept representations and store knowledge.

Total References:

 23

Call Number:

 本070101/24097

Open Access Date:

 2025-05-18

