Thesis information

Chinese title (translated):

 Hadoop-Based Social Network Analysis: A Case Study of Telecom CDR (Call Detail Record)

Name:

 Zhou Lidong (周立东)

Discipline code:

 085212    

Major:

 Software Engineering

Student type:

 Master's

Degree:

 Master of Engineering

Degree year:

 2014    

Campus:

 Zhuhai campus

School:

 Graduate School, Zhuhai Branch

Research area:

 Hadoop-based big data applications

First supervisor:

 He Zhijun (贺志军)

First supervisor's affiliation:

 Beijing Normal University, Zhuhai

Submission date:

 2014-06-26    

Defense date:

 2014-05-18    

English title:

 SOCIAL NETWORK ANALYSIS WITHIN TELECOM NETWORKS UTILIZING HADOOP    

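Chinese abstract (translated):
With the development of mobile communication networks and the growing number of mobile terminals, telecom operators generate large and ever-growing volumes of data, especially service data about mobile subscribers. For an operator, more data means more specific information about each subscriber, and this information supports many applications, such as subscriber segmentation and personalized recommendation for marketing, churn prediction, and early warning of customer complaints. Processing and analyzing network data can therefore create great value, and social network analysis is one way of doing so. Because social network analysis lacks a precise definition, in this thesis the term refers to a technique that builds on the acquisition of massive relational data and combines data mining with social network analysis methods. The thesis aims to implement a prototype social network analysis method for telecom CDR data. It develops a processing flow that finds the core persons of each community: the flow takes telecom CDR data as input, discovers communities with a label-propagation-based community detection algorithm, and computes the core persons of each community with the PageRank algorithm. Finally, it identifies user groups by combining multidimensional association with a density-based clustering algorithm. Because telecom networks and online social networks are large, the tools and algorithms used must scale linearly as the network grows. The thesis discusses the scalability of different social network analysis algorithms; in particular, it compares the running efficiency of different algorithms on the Hadoop platform and concludes by pointing out Hadoop's weakness for iterative algorithms and its advantage for scaling linear algorithms.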
English abstract:
With the development of mobile communication networks and the increasing number of mobile terminals, telecommunication operators produce large amounts of data, especially data about mobile subscribers. For a telecommunication operator, this massive data means obtaining more information about specific subscribers. The applications are wide-ranging, such as segmentation for marketing purposes, personalized information recommendation, early identification of subscribers about to switch operators, and early warning of consumer complaints. Analyzing this data and extracting information from it is therefore of great value, and one approach to this analysis is social network analysis (SNA). Social network analysis is not commonly understood because it has no clear definition; the SNA concept used here is based on massive data and combines data mining techniques with social network theory and analysis techniques. This thesis implements a prototype of social network analysis on the CDRs of telecommunication networks, developing a complete process flow for finding core subscribers. The flow uses inputs readily available to a telecommunication operator. Within this analysis, the label propagation algorithm (LPA) is employed to discover communities and PageRank to find core subscribers. Finally, multidimensional association techniques are combined with density-based clustering algorithms to identify specific user groups. As these networks can be very large, the methods used to study them must scale linearly as the network size increases. An integral part of the study is therefore to determine which social network analysis algorithms have this scalability. Moreover, software solutions are compared to find products suitable for these specific tasks.
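The processing flow summarized in the abstracts (CDR input, label-propagation community detection, PageRank within each community) can be sketched on a single machine as below. This is a minimal illustration under stated assumptions, not the thesis's Hadoop implementation. It assumes each CDR has already been reduced to a (caller, callee) pair. It treats the call graph as undirected for label propagation and as directed and weighted by call count for PageRank, and it uses a damping factor of 0.85. It also breaks label ties deterministically (keep the current label if it is among the most frequent, otherwise take the smallest), whereas standard LPA uses random node order and random tie-breaking. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_graphs(cdr_pairs):
    """Hypothetical helper: build graphs from (caller, callee) pairs.

    Returns an undirected adjacency map (used by label propagation) and a
    directed out-edge map weighted by call count (used by PageRank).
    """
    undirected = defaultdict(set)
    directed = defaultdict(Counter)
    for caller, callee in cdr_pairs:
        if caller == callee:  # ignore self-calls
            continue
        undirected[caller].add(callee)
        undirected[callee].add(caller)
        directed[caller][callee] += 1
    return undirected, directed


def label_propagation(adj, max_iter=20):
    """Asynchronous label propagation with deterministic tie-breaking."""
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            if counts[labels[v]] == best:
                new = labels[v]  # keep current label if it is already a majority label
            else:
                new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:  # converged: no node changed its label
            break
    return labels


def pagerank(nodes, out_edges, d=0.85, iters=50):
    """Power-iteration PageRank on the subgraph induced by `nodes` (a set)."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            # keep only calls that stay inside this community
            targets = {u: w for u, w in out_edges.get(v, {}).items() if u in nodes}
            total = sum(targets.values())
            if total == 0:
                dangling += rank[v]  # no outgoing calls: spread rank evenly
                continue
            for u, w in targets.items():
                new[u] += d * rank[v] * w / total
        for v in nodes:
            new[v] += d * dangling / n
        rank = new
    return rank


def core_subscribers(cdr_pairs):
    """Map each community label to its highest-PageRank member."""
    undirected, directed = build_graphs(cdr_pairs)
    labels = label_propagation(undirected)
    communities = defaultdict(set)
    for v, lab in labels.items():
        communities[lab].add(v)
    cores = {}
    for lab, members in communities.items():
        ranks = pagerank(members, directed)
        # sorted() makes the choice deterministic when ranks tie
        cores[lab] = max(sorted(members), key=lambda v: ranks[v])
    return cores


if __name__ == "__main__":
    # Two tightly connected groups of callers joined by a single d -> z call.
    calls = [("a", "b"), ("c", "b"), ("d", "b"), ("a", "c"), ("c", "d"), ("d", "a"),
             ("d", "z"),
             ("w", "y"), ("x", "y"), ("z", "y"), ("w", "x"), ("x", "z"), ("z", "w")]
    print(core_subscribers(calls))
```

Both the label-propagation loop and the PageRank loop are iterative. This is the pattern the abstract identifies as a weakness of Hadoop: in plain MapReduce each iteration would run as a separate job. On real CDR graphs the deterministic tie-breaking used here can also let one label spread across weak bridges, which randomized node order is meant to reduce.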
Total number of references:

 48    

Library location:

 Main Library, B301

Call number:

 硕430113/1428    

Release date:

 2014-06-26    
