# Research on Algorithm of Image Classification Model Selection for UAV Patrol

DOI: 10.12677/CSA.2020.109162

Abstract: Random forest is a classic, widely used classification algorithm with high classification accuracy. Two factors, however, strongly affect its final classification performance: the classification performance of each decision tree and the difference between any two decision trees. When several decision trees make similar misclassifications and all of them take part in the final vote, the classification performance of the model is reduced. To address this problem, this paper proposes a method for measuring the similarity of decision trees based on the confusion matrix. The method takes into account, for each class, the number of samples a tree classifies correctly and incorrectly. It is used to select decision trees with weak similarity; decision trees with poor classification results are then removed, which completes the model selection of the random forest. Experimental results show that the proposed method achieves a higher average classification accuracy and higher stability on three datasets.

1. Introduction

2. Ensemble Learning

${E}_{total}={E}_{bay}+\frac{1+\rho \left(N-1\right)}{N}{E}_{each}$ (1)
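Eq. (1) can be evaluated directly to see how the correlation between base classifiers limits the benefit of adding more of them. The following is a minimal Python sketch, assuming the usual reading of the symbols ($E_{bay}$ as the Bayes error, $E_{each}$ as the added error of a single classifier, $N$ as the number of classifiers, and $\rho$ as their average correlation). The function name `ensemble_error` is hypothetical.

```python
def ensemble_error(e_bay, e_each, rho, n):
    """Evaluate Eq. (1): E_total = E_bay + (1 + rho * (N - 1)) / N * E_each.

    The meaning attached to each argument is an assumption of this sketch:
    e_bay  -- Bayes error
    e_each -- added error of one base classifier
    rho    -- average correlation between base classifiers
    n      -- number of base classifiers
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return e_bay + (1.0 + rho * (n - 1)) / n * e_each
```

With $E_{bay}=0.05$, $E_{each}=0.20$ and $N=10$, the sketch gives 0.07 for $\rho=0$ and 0.25 for $\rho=1$. Fully correlated classifiers therefore gain nothing from being combined, which is why the similarity between trees matters.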

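3. Random Forest Algorithm

1) Use the bootstrap method (sampling with replacement) to draw a number of samples from the training set and form a training subset.

2) For the training subset, randomly draw a number of features from the feature set without replacement; these features serve as the basis for the split at each node of the decision tree.

3) Repeat steps 1) and 2) to obtain several training subsets and grow several decision trees, then combine the decision trees into a random forest.

4) Feed a sample from the test set into the random forest, let every decision tree make a decision on the sample, and vote on the results to obtain the classification of the sample.

5) Repeat step 4) until the whole test set has been classified.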
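The five steps above can be illustrated with a short Python sketch. It is a simplification, not the implementation used in the paper: each "tree" is reduced to a single split node (a decision stump) so that the example stays short, and all function names and parameters (`fit_forest`, `n_try`, `seed`, and so on) are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most frequent label (the voting rule of step 4)."""
    return Counter(labels).most_common(1)[0][0]

def bootstrap_sample(rows, rng):
    """Step 1: draw len(rows) samples with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows, n_features, n_try, rng):
    """Step 2, simplified to one node: pick the best threshold split
    among n_try features drawn without replacement."""
    feats = rng.sample(range(n_features), n_try)
    default = majority([y for _, y in rows])
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feats:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        return lambda x: default
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def fit_forest(rows, n_trees, n_try, seed=0):
    """Step 3: repeat steps 1 and 2 and collect the resulting trees."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    return [fit_stump(bootstrap_sample(rows, rng), n_features, n_try, rng)
            for _ in range(n_trees)]

def predict(forest, x):
    """Steps 4 and 5: every tree decides, then the results are voted on."""
    return majority([tree(x) for tree in forest])
```

Here `rows` is a list of `(features, label)` pairs, for example `[((i, 2 * i), 0 if i < 5 else 1) for i in range(10)]`. A full decision tree would apply the feature draw of step 2 recursively at every node.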

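4. Random Forest Model Selection

4.1. The Proposed Model Selection Method

4.1.1. Judging the Similarity of Classification Trees Based on the Confusion Matrix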

$CN=\left[\begin{array}{ccc}c{n}_{11}& \cdots & c{n}_{1M}\\ \vdots & c{n}_{ii}& \vdots \\ c{n}_{M1}& \cdots & c{n}_{MM}\end{array}\right]$ (2)

$DC{N}^{\left(i,j\right)}=C{N}^{\left(i\right)}-C{N}^{\left(j\right)}=\left[\begin{array}{cccc}c{n}_{11}^{\left(i\right)}-c{n}_{11}^{\left(j\right)}& c{n}_{12}^{\left(i\right)}-c{n}_{12}^{\left(j\right)}& \cdots & c{n}_{1M}^{\left(i\right)}-c{n}_{1M}^{\left(j\right)}\\ c{n}_{21}^{\left(i\right)}-c{n}_{21}^{\left(j\right)}& c{n}_{22}^{\left(i\right)}-c{n}_{22}^{\left(j\right)}& \cdots & c{n}_{2M}^{\left(i\right)}-c{n}_{2M}^{\left(j\right)}\\ \vdots & \vdots & \ddots & \vdots \\ c{n}_{M1}^{\left(i\right)}-c{n}_{M1}^{\left(j\right)}& c{n}_{M2}^{\left(i\right)}-c{n}_{M2}^{\left(j\right)}& \cdots & c{n}_{MM}^{\left(i\right)}-c{n}_{MM}^{\left(j\right)}\end{array}\right]$ (3)

$dc{{n}^{\prime }}_{mn}=\frac{dc{n}_{mn}}{{\mathrm{max}}_{m}}$ (4)

${\mathrm{max}}_{m}=\underset{n}{\mathrm{max}}\left(dc{n}_{mn}\right)$ (5)

$r{f}_{ij}=\left\{\begin{array}{ll}0,& \left(i\ge j\right)\\ {‖DC{{N}^{\prime }}_{i,j}‖}_{F}=\sqrt{{\sum }_{m=1}^{M}{\sum }_{n=1}^{M}dc{{n}^{\prime }}_{mn}^{2}},& \left(i<j\right)\end{array}\right.$ (6)

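The smaller $r{f}_{ij}$ is, the more similar tree i and tree j are, and the closer the classification results of the two classifiers on the samples.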
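Eqs. (2) to (6) can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements from the paper: rows of the confusion matrix index the true class and columns the predicted class, class labels are integers starting at 0, the row scale in Eq. (5) is taken as the largest absolute difference in the row, and a row of zeros is left unchanged to avoid a division by zero. The function names are hypothetical.

```python
import math

def confusion_matrix(y_true, y_pred, n_classes):
    """Eq. (2): cn[a][b] counts samples of true class a predicted as b.
    (Which axis holds the true class is an assumption of this sketch.)"""
    cn = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cn[t][p] += 1
    return cn

def tree_distance(cn_i, cn_j):
    """Eqs. (3)-(6): Frobenius norm of the row-normalised difference."""
    total = 0.0
    for row_i, row_j in zip(cn_i, cn_j):
        diff = [a - b for a, b in zip(row_i, row_j)]   # Eq. (3)
        scale = max(abs(d) for d in diff)              # Eq. (5), abs assumed
        if scale == 0:
            continue                                   # identical rows add 0
        total += sum((d / scale) ** 2 for d in diff)   # Eqs. (4) and (6)
    return math.sqrt(total)

def similarity_matrix(cms):
    """Build R_F: rf[i][j] holds the distance for i < j and 0 otherwise."""
    l = len(cms)
    rf = [[0.0] * l for _ in range(l)]
    for i in range(l):
        for j in range(i + 1, l):
            rf[i][j] = tree_distance(cms[i], cms[j])
    return rf
```

For two trees that predict `[0, 0, 1, 1]` and `[0, 1, 1, 1]` on samples with true labels `[0, 0, 1, 1]`, the confusion matrices differ only in the first row, and `tree_distance` returns $\sqrt{2}$.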

Figure 1. Random forest selection model

4.1.2. Model Selection Based on the "Remove the Inferior" Strategy

4.1.3. Description of the Model Selection Algorithm

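1: Classify the test samples with each decision tree;

2: From the classification results, build the confusion matrix $CN$ for each decision tree;

3: Create the similarity measurement matrix ${R}_{F}$;

4: for (( $i,j=1$ to l) & ( $i<j$ ))

5: Let ${m}_{ij}$ be the smallest non-zero element of ${R}_{F}$;

6: for ( ${m}_{ij}<\dots$ )

if (classification performance of decision tree i $<\beta$ )

remove tree i from ${R}_{F}$

${m}_{ij}=$ the next smallest non-zero element of ${R}_{F}$

7: otherwise stop; the decision trees that have not been removed form the random forest RF.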

5. Experiments and Analysis

5.1. Description of the Experimental Data

Table 1. Experimental data set taken from UCI

5.2. Experimental Results and Analysis

Figure 2. Accuracy comparison results on the Iris dataset

Figure 3. Accuracy comparison results on the Breast-cancer dataset

Figure 4. Accuracy comparison results on the Anneal dataset

6. Conclusion

