Algorithm Design of Restoring Two-Way Single/Double-Sided Shredded Documents
DOI: 10.12677/AAM.2016.52021
Authors: Zhang Chen, Wang Shiyun: College of Science, Shenyang Aerospace University, Shenyang, Liaoning
Keywords: Peak Weight; Row Spacing Weight; Clustering Credibility; Jeffreys & Matusita Distance; l1 Norm
Abstract: This paper designs a splicing algorithm to restore shredded single- or double-sided English text documents that have been cut in both directions (horizontally and vertically). First, two geometric features, the structural characteristics of English letters and the blank row spacing, are used to cluster the fragments that lay on the same row of the original page. Next, within each cluster, an l1-norm difference model on vectors is used to splice the fragments column by column into a single horizontal strip. Finally, the horizontal strips are spliced row by row to recover the complete page. In the numerical test, we take the problem from the 2013 national undergraduate mathematical modeling contest as an example and restore the 209 single-/double-sided English fragments produced by the horizontal and vertical cuts. The results confirm that the algorithm is simple to implement and that clustering is reliable, with a clustering accuracy above 93%.
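The column-splicing step can be illustrated with a minimal sketch. It assumes that each fragment is a grayscale pixel grid (row-major lists, 255 = white), that the l1-norm difference is taken between the right edge column of one fragment and the left edge column of the next, and that fragments within one row cluster are chained greedily starting from the piece with the whitest left edge. The names `l1_edge_distance` and `order_row`, the greedy chaining, and the start heuristic are illustrative assumptions, not details given in the abstract.

```python
from typing import List, Sequence

Fragment = List[List[int]]  # grayscale pixels, row-major; 255 = white (assumed)


def l1_edge_distance(left: Fragment, right: Fragment) -> int:
    """l1 norm of (right edge column of `left`) minus (left edge column of `right`)."""
    return sum(abs(row_l[-1] - row_r[0]) for row_l, row_r in zip(left, right))


def order_row(fragments: Sequence[Fragment]) -> List[int]:
    """Greedily chain the fragments of one row cluster; returns fragment indices."""
    # Assumed start heuristic: the leftmost piece borders the page margin,
    # so its left edge column is the brightest.
    start = max(range(len(fragments)),
                key=lambda i: sum(row[0] for row in fragments[i]))
    order = [start]
    unused = set(range(len(fragments))) - {start}
    while unused:
        last = fragments[order[-1]]
        # Pick the unused fragment whose left edge best matches the current right edge.
        nxt = min(sorted(unused), key=lambda j: l1_edge_distance(last, fragments[j]))
        order.append(nxt)
        unused.remove(nxt)
    return order
```

Calling `order_row` on the fragments of one cluster returns the left-to-right order of their indices. A greedy chain can go wrong when several edges are nearly blank, which is one reason a practical system may need manual correction or a global assignment step.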
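Citation: Zhang Chen, Wang Shiyun. Algorithm Design of Restoring Two-Way Single/Double-Sided Shredded Documents [J]. Advances in Applied Mathematics, 2016, 5(2): 159-165. http://dx.doi.org/10.12677/AAM.2016.52021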
