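Chinese title: | 基于深度学习的票据识别方法研究 |
Name: | |
Confidentiality level: | Public |
Thesis language: | Chinese |
Discipline code: | 025200 |
Discipline and major: | |
Student type: | Master's |
Degree: | Master of Applied Statistics |
Degree type: | |
Degree year: | 2020 |
Campus: | |
School: | |
Research direction: | Applied statistics |
First supervisor name: | |
First supervisor affiliation: | |
Second supervisor name: | |
Submission date: | 2020-06-24 |
Defense date: | 2020-06-24 |
Foreign-language title: | Research on bill recognition method based on deep learning |
Chinese keywords: | |
Foreign-language keywords: | Attention mechanisms; Convolutional neural networks; Recurrent neural networks; Text detection; Text recognition |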
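Chinese abstract: |
Bills are found throughout our work and daily lives and serve several specific functions: they are commonly used as means of payment, remittance instruments, and vouchers for purchases and transfers. Every industry therefore has to review and archive bills, and processing huge volumes of paper bills by hand is tedious, repetitive work for the staff involved. As artificial intelligence spreads into every industry, intelligent recognition and review of bills will greatly reduce labor costs and improve efficiency. Intelligent bill recognition can be treated as an image text recognition task. Traditional image text recognition algorithms generally rely on hand-crafted features, involve cumbersome steps, and achieve limited recognition performance. In recent years, research on deep-learning-based image text recognition has advanced steadily and achieves higher accuracy and robustness than traditional methods. An image text recognition task is generally split into two parts: text detection, which locates the text in the image, and text recognition, which recognizes the content of the detected text regions. In the text detection task, bill text is typically long and dense, and some bills contain complex backgrounds such as seals, which makes recognizing the text in bills highly challenging. In the text recognition task, bill text typically contains multiple words and many kinds of characters, and is often unclear. To address these problems and difficulties, this thesis carries out the following work: (1) Chapter 3 uses a CTPN bill text detection algorithm with a multi-scale training strategy. Because the training samples differ widely in scale, images are fed into the network at different sizes during training, which strengthens the model's ability to detect small-scale text. We also carry out a comparative study of the segmentation-based EAST algorithm, replacing its original PVANet backbone with ResNet, which has stronger feature extraction capability. Experiments are run on the SROIE dataset from the ICDAR 2019 bill recognition task, and the models are evaluated with the DetEval method. The results show that CTPN with multi-scale training performs best, with an F1-score of 91.20%, compared with 80.12% for EAST and 84.80% for CTPN without multi-scale training. (2) Chapter 4 builds a DenseNet-based text recognition model following the CRNN architecture and compares it with an attention encoder-decoder (AED) model, evaluating both on character accuracy, word accuracy, and average test time. The results show that the DenseNet-based text recognition model has a significant advantage in bill text recognition, with a character accuracy of 96.38% and a word accuracy of 81.36%. In the same environment, the average test times of the AED model and the DenseNet-based model are close: each takes about 0.5 s per bill and about 0.01 s per text line. Overall, the DenseNet-based text recognition model has a clear advantage in the bill recognition scenario. |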
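The multi-scale training strategy in item (1) of the abstract can be illustrated with a minimal sketch. The abstract does not state which sizes were used, so the function name `pick_training_size`, the candidate short-side lengths, and the long-side cap below are all hypothetical. The only grounded detail is the 16-pixel alignment, which matches the fixed 16-pixel anchor width of the original CTPN design.

```python
import random


def pick_training_size(width, height, short_sides=(480, 600, 720),
                       max_long_side=1200, stride=16, rng=random):
    """Pick a resized (width, height) for one training iteration.

    short_sides and max_long_side are illustrative assumptions,
    not values taken from the thesis.
    """
    # Draw a target length for the shorter image side at random,
    # so the same image is seen at several scales over training.
    target = rng.choice(short_sides)
    scale = target / min(width, height)

    # Keep the longer side within a fixed budget.
    if max(width, height) * scale > max_long_side:
        scale = max_long_side / max(width, height)

    # Round both sides to a multiple of the stride so the feature
    # map lines up with 16-pixel-wide anchors.
    new_w = max(stride, int(round(width * scale / stride)) * stride)
    new_h = max(stride, int(round(height * scale / stride)) * stride)
    return new_w, new_h


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(pick_training_size(1000, 2000, rng=rng))
```

In a training loop, each image (and its box annotations) would be resized to the returned size before being fed to the detector; scaling the annotations by the same factors is left out of this sketch.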
Foreign-language abstract: |
Bills are ubiquitous in our work and life. They serve specific functions and are commonly used as means of payment, exchange instruments, and vouchers for purchases and transfers. Consequently, bills must be reviewed and filed in every industry, and handling massive volumes of paper bills by hand is a tedious and repetitive task for the staff involved. As artificial intelligence technology permeates every industry, intelligent recognition and review of bills will greatly reduce labor costs and improve work efficiency. The intelligent recognition of bills can be regarded as an image text recognition task. Traditional image text recognition algorithms are generally based on hand-crafted features; their steps are cumbersome and their recognition performance is limited. In recent years, research on deep-learning-based text recognition has improved continuously and, compared with traditional text recognition methods, achieves higher accuracy and robustness. Text recognition tasks are generally divided into two parts: text detection, which locates the text in the image, and text recognition, which recognizes the content of the detected text regions. In the text detection task, bill text is typically long and dense, and some bills contain complex backgrounds such as seals, which poses a great challenge to recognizing the text in bills. In the text recognition task, bill text generally contains multiple words and many kinds of characters, and is often unclear. In view of these problems and difficulties, this thesis does the following work: (1) Chapter 3 proposes a bill text detection algorithm based on CTPN with a multi-scale training strategy. Because the training samples differ widely in scale, the images are resized to different scales during training, which enhances the model's ability to detect small-scale text. At the same time, we conduct a comparative study of EAST, a text detection algorithm based on semantic segmentation, and replace its original PVANet with ResNet, which has stronger feature extraction capability. We conduct experiments on the SROIE dataset of the ICDAR 2019 bill recognition task and evaluate the models using the DetEval method. The results show that CTPN with multi-scale training achieves the best performance, with an F1-score of 91.20%, compared with 80.12% for EAST and 84.80% for CTPN without multi-scale training. (2) Chapter 4 builds a DenseNet-based text recognition model that follows the CRNN architecture, compares it with the attention encoder-decoder (AED) model, and evaluates both models on character accuracy, word accuracy, and average testing time. The results show that the DenseNet-based text recognition model has a significant advantage in bill text recognition, with a character accuracy of 96.38% and a word accuracy of 81.36%. In the same environment, the average testing times of the AED model and the DenseNet-based text recognition model are close: recognizing one bill takes about 0.5 s, and recognizing one text line takes about 0.01 s. Taken together, the DenseNet-based text recognition model has clear advantages in bill recognition. |
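The character-accuracy and word-accuracy metrics named in item (2) of the abstract are not defined in this record. The sketch below shows one common way such metrics are computed, under stated assumptions: character accuracy is taken as one minus the total edit distance divided by the total number of ground-truth characters, and word accuracy as the share of whitespace-separated ground-truth words matched exactly at the same position. The function names and sample strings are hypothetical, and the thesis may define both metrics differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_accuracy(preds, truths):
    """Assumed definition: 1 - (total edit distance / total true chars)."""
    total_chars = sum(len(t) for t in truths)
    errors = sum(edit_distance(p, t) for p, t in zip(preds, truths))
    return max(0.0, 1.0 - errors / total_chars)


def word_accuracy(preds, truths):
    """Assumed definition: share of true words matched at the same position."""
    correct = 0
    total = 0
    for p, t in zip(preds, truths):
        p_words, t_words = p.split(), t.split()
        total += len(t_words)
        correct += sum(pw == tw for pw, tw in zip(p_words, t_words))
    return correct / total


if __name__ == "__main__":
    # Made-up text lines for illustration only, not data from the thesis.
    truths = ["TOTAL 12.50", "CASH 20.00"]
    preds = ["TOTAL 12.50", "CASH 2O.00"]
    print(character_accuracy(preds, truths), word_accuracy(preds, truths))
```

Under these definitions a single wrong character costs little character accuracy but invalidates a whole word, which is one plausible reason a word accuracy can sit well below the character accuracy for the same model.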
Total references: | 51 |
Author profile: | 万嘉伟 |
Call number: | 硕025200/20040 |
Open-access date: | 2021-06-24 |