Chinese Title: | Research on Database Indexes on Heterogeneous Platforms |
Name: | |
Confidentiality Level: | Public |
Thesis Language: | Chinese |
Discipline Code: | 081202 |
Discipline: | |
Student Type: | Master's |
Degree: | Master of Engineering |
Degree Type: | |
Degree Year: | 2021 |
Campus: | |
School: | |
Research Direction: | Database indexes |
First Supervisor: | |
First Supervisor's Affiliation: | |
Submission Date: | 2021-06-28 |
Defense Date: | 2021-06-03 |
English Title: | Research of Database Indexes on the Coupled CPU-GPU Architecture |
Chinese Keywords: | Heterogeneous coupled CPU-GPU processors ; Integrated GPU ; Parallel computing ; B+-tree index |
English Keywords: | Coupled CPU-GPU architectures ; Integrated GPU ; Co-processing ; B+-trees |
Chinese Abstract: |
The B+-tree is an important database index that is widely used in many areas, such as database systems, data warehouses, and file systems. With the development of hardware technology, many high-performance hardware architectures have emerged. In this new hardware environment, how to use hardware resources effectively to improve the performance of B+-tree operations should be reconsidered. In this thesis, we focus on B+-tree search and insertion operations on heterogeneous coupled CPU-GPU processors. Our main contributions are as follows:
(1) Optimizing B+-tree search on heterogeneous coupled CPU-GPU processors. First, we propose a hierarchical B+-tree search method for the integrated GPU. The search process is divided into two parts: searching inner nodes and searching leaf nodes. Each workgroup searches more keys during the inner-node phase and fewer keys during the leaf-node phase, which exploits the GPU's parallelism while reducing irregular memory accesses and work-item divergence. Our experiments show that, on the integrated GPU alone, the hierarchical B+-tree search method outperforms the coarse-grained B+-tree search method by 36%. Second, to use the computing resources of both the CPU and the integrated GPU in the coupled processor, we propose a pipelined CPU-GPU parallel B+-tree search method, which hides the key-sorting time and part of the B+-tree search time to improve the overall search throughput. In addition, a CPU-GPU workload distribution model based on real-time performance is adopted to distribute the workload between the CPU and the integrated GPU appropriately. As a result, the pipelined CPU-GPU parallel B+-tree search method improves search throughput by a factor of 1.8 compared with the coarse-grained parallel B+-tree search method.

(2) Optimizing B+-tree batch insertion on heterogeneous coupled CPU-GPU processors. First, we propose a phase-based parallel B+-tree insertion method that divides the insertion process into six phases: sorting the insertion keys, searching for the insertion keys, computing the start positions of the workgroups, inserting into leaf nodes, updating inner nodes, and updating the root node. In the node-insertion phase, each workgroup inserts into one node, which reduces irregular memory accesses within a workgroup. Our experiments show that, compared with PALM, the phase-based B+-tree batch insertion method improves performance by up to a factor of 2.55 on a single CPU and by up to a factor of 2.27 on the GPU. Furthermore, building on the phase-based insertion method, we propose a CPU-GPU co-processing parallel B+-tree batch insertion method that runs the CPU and GPU in parallel during the searching, leaf-node insertion, inner-node updating, and next-batch sorting phases, hiding part of the time. In addition, a dynamic distribution model is adopted for the CPU and GPU workloads to make full use of the computing resources of both processors. The experimental results show that the CPU-GPU co-processing phase-based B+-tree insertion method with the dynamic distribution model improves performance by 33% compared with the phase-based insertion method on a single CPU, and by 63% compared with the phase-based insertion method on a single integrated GPU.

In summary, this thesis proposes B+-tree search and insertion methods for coupled CPU-GPU processors that make effective use of modern processor resources. The experimental results show that the proposed methods improve the performance of both search and insertion. |
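The abstract does not spell out the exact form of the real-time-performance-based CPU-GPU workload distribution model mentioned in contribution (1). As one possible illustration only, the C++ sketch below splits each batch of search keys between the CPU and the integrated GPU in proportion to their most recently measured throughput; the names (DeviceStats, gpu_share, the keys-per-millisecond fields) and the proportional formula are assumptions for this sketch, not details taken from the thesis.

```cpp
// Illustrative sketch only: a throughput-proportional CPU/GPU split.
// The struct, field names, and formula are assumptions; the thesis may
// define its real-time distribution model differently.
#include <cstddef>
#include <iostream>

struct DeviceStats {
    double cpu_keys_per_ms;  // CPU search throughput measured on the previous batch
    double gpu_keys_per_ms;  // integrated-GPU search throughput measured on the previous batch
};

// Number of keys from a batch of `batch_size` to assign to the integrated GPU;
// the CPU processes the remainder in parallel.
std::size_t gpu_share(const DeviceStats& s, std::size_t batch_size) {
    const double total = s.cpu_keys_per_ms + s.gpu_keys_per_ms;
    if (total <= 0.0) return batch_size / 2;             // no measurements yet: split evenly
    const double gpu_ratio = s.gpu_keys_per_ms / total;  // proportional-to-throughput split
    return static_cast<std::size_t>(batch_size * gpu_ratio);
}

int main() {
    DeviceStats stats{120.0, 280.0};  // hypothetical measurements from the last batch
    const std::size_t batch = 100000;
    const std::size_t to_gpu = gpu_share(stats, batch);
    std::cout << "keys to GPU: " << to_gpu
              << ", keys to CPU: " << (batch - to_gpu) << '\n';
    return 0;
}
```

In a pipelined setting, such measurements would be refreshed after every batch so that the split adapts as the relative speed of the two devices changes.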
English Abstract: |
The B+-Tree is an important database index structure that is widely used in various areas, such as database systems, data warehouses, and file systems. With the development of hardware technology, many high-performance hardware architectures have emerged. In the new hardware era, it is important to reconsider how to use hardware resources efficiently to improve the performance of B+-Tree operations. In this dissertation, we focus on optimizing B+-Tree search and insertion on the coupled CPU-GPU architecture. Our contributions are as follows.
(1) Optimizing B+-Tree searches on the coupled CPU-GPU architecture. First, we proposed a hierarchical B+-Tree search approach for the coupled GPU. We divided the B+-Tree search into two parts: searching inner nodes and searching leaf nodes. Each workgroup searches more keys in the inner-node phase and fewer keys in the leaf-node phase, which takes advantage of the GPU's parallelism and reduces irregular memory accesses and work-item divergence within the workgroups. Our performance study shows that the hierarchical search scheme provides an improvement of up to 36% on the GPU compared with the coarse-grained B+-Tree search method. Next, we presented a co-processing pipeline B+-Tree search method for the coupled architecture. With the co-processing of the CPU and the GPU, the sorting time and part of the search time can be hidden during B+-Tree search to improve the search throughput. In addition, a distribution model based on real-time performance is designed to support appropriate workload distribution between the CPU and the GPU. As a result, the co-processing pipeline method further increases the throughput by a factor of 1.8 compared with the baseline, the coarse-grained B+-Tree search method.

(2) Improving B+-Tree batch insertion on the coupled architecture. First, we proposed a phase-based B+-Tree batch insertion approach, in which the insertion process is divided into six phases: sorting the keys, searching for the keys, calculating the start indexes for the workgroups, inserting into leaf nodes, updating inner nodes, and updating the root. In the insertion phase, processing one node per workgroup reduces irregular memory accesses among different work items. Our performance study shows that, compared with PALM, the phase-based B+-Tree insertion approach increases the insertion performance by a factor of up to 2.28 on the CPU and up to 2.55 on the coupled GPU. Furthermore, based on the phase-based B+-Tree batch insertion method, we presented a co-processing B+-Tree batch insertion approach for the coupled architecture, in which part of the time spent searching, inserting into leaf nodes, updating inner nodes, and sorting the next batch of keys can be hidden. A dynamic distribution model is adopted to make use of the computing resources of both the CPU and the coupled GPU. Our performance results show that the pipeline B+-Tree insertion approach with the dynamic distribution model further decreases the insertion time by 33% compared with the phase-based B+-Tree insertion method on the CPU alone, and by 63% compared with the phase-based B+-Tree insertion method on the coupled GPU alone.

To sum up, in this dissertation we proposed new methods for B+-Tree search and insertion on coupled CPU-GPU architectures, which effectively utilize modern processor resources. The experimental results show that our methods improve the performance of both search and insertion.
|
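As a rough illustration of how the six insertion phases and the CPU-GPU overlap described in contribution (2) could fit together, the following C++ sketch drives a sequence of batches through the six phases while sorting the next batch on a separate CPU thread. The phase bodies are stubs and every name is a placeholder; in the thesis the heavy phases run as GPU kernels with one workgroup per leaf node, which this host-side sketch does not attempt to reproduce.

```cpp
// Minimal host-side sketch of a six-phase batch insertion with the next
// batch's sort overlapped on the CPU. Assumptions only; not the thesis code.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using Batch = std::vector<std::uint64_t>;

// Phase 1: sort the keys of a batch. In the overlapped scheme this runs on a
// CPU thread for batch i+1 while batch i is still being inserted.
void sort_keys(Batch& b) { std::sort(b.begin(), b.end()); }

// Phases 2-6 for the current (already sorted) batch; stubs standing in for
// the GPU kernels described in the abstract.
void search_keys(const Batch&) {}
void compute_workgroup_starts(const Batch&) {}
void insert_into_leaf_nodes(const Batch&) {}
void update_inner_nodes() {}
void update_root() {}

void insert_batches(std::vector<Batch>& batches) {
    if (batches.empty()) return;
    sort_keys(batches[0]);  // the first batch has no earlier batch to hide behind
    for (std::size_t i = 0; i < batches.size(); ++i) {
        std::thread next_sorter;
        if (i + 1 < batches.size())
            next_sorter = std::thread(sort_keys, std::ref(batches[i + 1]));  // hide phase 1 of the next batch

        search_keys(batches[i]);               // phase 2: locate target leaves
        compute_workgroup_starts(batches[i]);  // phase 3: start position per workgroup
        insert_into_leaf_nodes(batches[i]);    // phase 4: one workgroup per leaf node
        update_inner_nodes();                  // phase 5
        update_root();                         // phase 6

        if (next_sorter.joinable()) next_sorter.join();
    }
}

int main() {
    std::vector<Batch> batches = {{5, 1, 3}, {9, 2, 8}, {7, 4, 6}};
    insert_batches(batches);
    return 0;
}
```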
Total References: | 32 |
Call Number: | 硕081202/21004 |
Open Access Date: | 2022-06-28 |