Image Recognition Combining Deep Learning with Fractal Theory
Abstract: As the AI industry grows, deep learning models are being applied in more and more fields. Most current neural network models, however, are built on the multilayer perceptron (MLP) architecture, so high-capability models carry large parameter counts and consume substantial computational resources. Moreover, capability does not improve markedly as the parameter count grows. Although MLP-based deep learning models exhibit emergent abilities, many applications do not need high precision, for example simple everyday hand-gesture discrimination or license plate and road sign recognition; scaling a model up to the parameter count at which those abilities appear raises cost and lowers its practical value in such settings. This project uses the Paddle and PyTorch frameworks and builds on the ResNet model, combining it with a cross-layer self-similarity statistical model based on the multifractal spectrum computed from box-counting dimensions. Experiments are carried out on a digit hand-gesture recognition task. The aim is to reduce the computational resource consumption of neural network models by adding fractal dimension computation to the network, and to offer a new line of thought for research in related fields. A hand-gesture recognition model based on this scheme was implemented; it verifies the feasibility of the approach and indicates that it is feasible for use on portable devices.
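The box-counting dimension named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `box_counting_dimension`, the choice of box sizes, and the use of a binarized 2D mask as input are assumptions made for illustration. The multifractal spectrum and the cross-layer statistics described in the abstract are not shown.

```python
import numpy as np


def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a 2D binary mask.

    Hypothetical helper for illustration only; `mask` must contain at
    least one nonzero pixel so that every box count is positive.
    """
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for s in box_sizes:
        # Crop so both sides are multiples of s, then tile into s x s boxes.
        h = (mask.shape[0] // s) * s
        w = (mask.shape[1] // s) * s
        boxes = mask[:h, :w].reshape(h // s, s, w // s, s)
        # A box is counted if any pixel inside it is set.
        counts.append(boxes.any(axis=(1, 3)).sum())
    # The slope of log N(s) against log(1/s) is the dimension estimate.
    log_inv_size = np.log(1.0 / np.array(box_sizes))
    slope, _ = np.polyfit(log_inv_size, np.log(counts), 1)
    return float(slope)


# Sanity checks on shapes whose dimension is known in advance.
filled_square = np.ones((64, 64), dtype=bool)
single_row = np.zeros((64, 64), dtype=bool)
single_row[0, :] = True

square_dim = box_counting_dimension(filled_square)
row_dim = box_counting_dimension(single_row)
```

On these two test shapes the estimate should come out close to 2 for the filled square and close to 1 for the single row, which is the expected behavior of a box-counting estimator. On real feature maps or gesture images the result depends on how the input is binarized and on the range of box sizes used in the fit.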