The Automatic Object Detection System Based on TLD Framework
DOI: 10.12677/CSA.2016.64031
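Authors: Li Xueyan, Wang Chunnan, Wang Changdong: School of Data Science and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong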
Keywords: Object Detection; Voice Processing; Tracking-Learning-Detection (TLD); Dynamic Time Warping (DTW); Dynamic Programming Algorithm
Abstract: In the era of big data, growing computing power has driven steady progress in sensing, audio, and automatic control technologies. Video frames and images, one of the main sources from which people obtain information about the world, have received particular attention. Computer vision is currently an active research area with many technical challenges, including recognition, motion analysis, scene reconstruction, and image restoration; object detection is among the most important of these. However, many existing object detection systems focus only on detection accuracy and lack auxiliary functions, which results in a poor human-computer interaction experience. How to build an automatic object detection system with better human-computer interaction and broader market prospects is therefore a question that many developers are exploring. In this paper, we propose a system that combines object detection with voice processing. First, we improve the Tracking-Learning-Detection (TLD) algorithm: starting from an image data set of a given class of objects, we train a classifier suited to that class and use it to decide whether a new object is the target, thereby detecting the specified class of objects. Second, the system uses speech recognition as the basis of human-computer interaction, so that the user can add image data to the training set and update the classifier by voice. To ensure the accuracy of speech recognition, speech features are matched with Dynamic Time Warping (DTW), a dynamic programming method.
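The DTW matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dtw_distance` and `match_command`, the command labels, the use of Euclidean distance between feature frames, and the toy one-dimensional features are all assumptions made for illustration. A real system would typically compare sequences of acoustic feature vectors extracted from the audio.

```python
import math


def dtw_distance(a, b):
    """Return the DTW alignment cost between two sequences of feature vectors.

    Assumption: frames are compared with Euclidean distance, and the classic
    step pattern (insertion, deletion, match) is used without any window.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligned with b[j-1]'s predecessor path
                cost[i][j - 1],      # b[j-1] repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def match_command(query, templates):
    """Return the label of the stored template closest to the query under DTW."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))


# Hypothetical templates: each is a sequence of 1-D feature frames.
templates = {
    "add": [(0.0,), (1.0,), (2.0,)],
    "update": [(5.0,), (5.0,), (5.0,)],
}
# A slower utterance of the "add" pattern, with the first and last frames held longer.
query = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
best = match_command(query, templates)
```

Because DTW allows frames to be repeated during alignment, the time-stretched query still aligns with the "add" template at zero cost, which is why the technique tolerates variation in speaking rate.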
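Cite this article: Li Xueyan, Wang Chunnan, Xie Min, Wang Changdong. The Automatic Object Detection System Based on TLD Framework [J]. Computer Science and Application, 2016, 6(4): 248-264. http://dx.doi.org/10.12677/CSA.2016.64031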
