A Mountain-Shaped Network for Semantic Segmentation of Book Spines on-Shelves
DOI: 10.12677/JISP.2020.94026
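Authors: Wenwen Zeng: Shenzhen University Library, Shenzhen, Guangdong; Yang Yang, Xiaopin Zhong*: College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen, Guangdong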
Keywords: Smart Library, Book Spine, Semantic Segmentation, Deep Neural Network
Abstract: Recognizing on-shelf book spines in images enables more convenient book inventory and may support a smoother borrowing experience for readers, such as take-and-go checkout. Accurate segmentation of the spine regions is an important prerequisite. Unlike ordinary object segmentation, the difficulty here is that the spines are densely packed and repetitive. This paper proposes a mountain-shaped deep neural network consisting of one encoder and two decoders. One decoder is the main channel for spine segmentation; the other uses spine boundary information to incorporate more spine edge detail. In addition, this work builds a spine image dataset of 661 images containing 15,454 manually annotated spine instances. Experimental results show that the proposed model segments densely packed objects such as book spines with high accuracy, reaching a mean intersection over union (mIoU) of about 90% and a mean pixel accuracy of about 95% on this dataset. It outperforms classical segmentation models, which supports the effectiveness of the proposed model.
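The two metrics reported in the abstract can be illustrated with a small sketch. It assumes the commonly used definitions (per-class intersection over union and per-class pixel accuracy, each averaged over classes); the abstract does not state which exact formulas the authors used. The function names and the toy labels are hypothetical, and images are represented as flat lists of class indices (0 = background, 1 = spine).

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average over classes of TP / (TP + FP + FN); classes with no pixels are skipped."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # class-c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            scores.append(tp / union)
    return sum(scores) / len(scores)


def mean_pixel_accuracy(matrix):
    """Average over classes of TP / (number of ground-truth pixels of that class)."""
    scores = []
    for c, row in enumerate(matrix):
        total = sum(row)
        if total > 0:
            scores.append(row[c] / total)
    return sum(scores) / len(scores)


# Hypothetical 8-pixel example, not data from the paper.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]
m = confusion_matrix(truth, pred, num_classes=2)
print(m)                       # [[3, 1], [1, 3]]
print(mean_iou(m))             # 0.6
print(mean_pixel_accuracy(m))  # 0.75
```

In this toy case each class has 3 correctly labeled pixels, 1 false positive, and 1 false negative, so the per-class IoU is 3/5 and the per-class accuracy is 3/4. The IoU is lower because it also penalizes false positives.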
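Cite this article: Zeng Wenwen, Yang Yang, Zhong Xiaopin. A Mountain-Shaped Network for Semantic Segmentation of Book Spines on-Shelves [J]. Journal of Image and Signal Processing, 2020, 9(4): 218-225. https://doi.org/10.12677/JISP.2020.94026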
