# Mathematical Theory and Analogy Analysis of Least Square Method and Its Related Methods

DOI: 10.12677/PM.2018.83029 · Downloads: 1,012 · Views: 2,877

Abstract: When statistical analysis and modeling of various problems are carried out, error analysis is often performed with the least squares method or one of its related methods. In this paper, the mathematical theory behind least squares, partial least squares, and principal component analysis is described in detail. We sketch the applications of these methods and explain the cases in which they are not applicable. We outline the correlations among the methods and the respects in which they differ. Finally, the parameter tests for the regression equations are briefly explained.

1. Introduction

2. Mathematical Principles of the Least Squares Method, Principal Component Analysis, and Partial Least Squares

2.1. The Least Squares Method

Suppose the fitted model is $y=\bar{\beta}_0+\bar{\beta}_1 r_1(x)+\bar{\beta}_2 r_2(x)+\cdots+\bar{\beta}_m r_m(x)$, where the coefficients $\bar{\beta}_i$ satisfy

$L(\bar{\beta}_0,\bar{\beta}_1,\cdots,\bar{\beta}_m)\triangleq \min_{\beta_i,\,0\le i\le m}\sum_j \left(y(j)-\left(\beta_0+\beta_1 r_1(x(j))+\beta_2 r_2(x(j))+\cdots+\beta_m r_m(x(j))\right)\right)^2 .$

Let $Y=[y_1,y_2,\cdots,y_n]^{T}$ and $L(\beta_0,\cdots,\beta_m)=\sum_j\left(y_j-\left(\beta_0+\beta_1 r_1(x_j)+\beta_2 r_2(x_j)+\cdots+\beta_m r_m(x_j)\right)\right)^2$. Take the partial derivative of $L$ with respect to each $\beta_i$ $(i=0,\cdots,m)$ and set it to zero:

$\frac{\partial L}{\partial\beta_0}=-2\sum_j\left(y_j-\left(\beta_0+\beta_1 r_1(x(j))+\cdots+\beta_i r_i(x(j))+\cdots+\beta_m r_m(x(j))\right)\right)=0$

$\frac{\partial L}{\partial\beta_i}=-2\sum_j r_i(x(j))\left(y_j-\left(\beta_0+\beta_1 r_1(x(j))+\cdots+\beta_i r_i(x(j))+\cdots+\beta_m r_m(x(j))\right)\right)=0 \quad (i=1,2,\cdots,m)$

$\beta=(\bar{\beta}_0,\bar{\beta}_1,\cdots,\bar{\beta}_m)=(X^{T}X)^{-1}\cdot X^{T}Y$ (1)

When $m=n$, $y=\beta_0+\beta_1 x_1+\beta_2 x_2+\cdots+\beta_n x_n=\beta^{T}\cdot(1,x^{T})^{T}$, where $\beta=(\beta_0,\beta_1,\cdots,\beta_n)^{T}$.

When $\beta=(X^{T}X)^{-1}\cdot X^{T}Y\triangleq\bar{\beta}$, the sum $\sum_j\left(y_j-\left(\beta_0+\beta_1 x_{j1}+\beta_2 x_{j2}+\cdots+\beta_n x_{jn}\right)\right)^2$ attains its minimum, and the fitted equation is

$y={\stackrel{¯}{\beta }}_{0}+{\stackrel{¯}{\beta }}_{1}{x}_{1}+{\stackrel{¯}{\beta }}_{2}{x}_{2}+\cdots +{\stackrel{¯}{\beta }}_{n}{x}_{n}$ ,
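Equation (1) can be checked numerically. The following is a minimal sketch, in which the design matrix and the simulated data (true intercept 2, true slope 3) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Simulated data for a one-variable linear model y = 2 + 3x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=50)

# Design matrix with a leading column of ones for beta_0.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (X^T X) beta = X^T Y from Eq. (1);
# np.linalg.solve is preferred over forming the explicit inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # approximately [2, 3]
```

In practice `np.linalg.lstsq` or a QR factorization is numerically safer than the normal equations when $X^{T}X$ is ill-conditioned.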

2.2. Principal Component Analysis

The first principal component is $\beta_1\triangleq\alpha_1^{T}x=\alpha_{11}x_1+\alpha_{12}x_2+\cdots+\alpha_{1n}x_n$,

s.t. $\operatorname{var}(\beta_1)$ is maximal and $\alpha_1^{T}\cdot\alpha_1=1$.

The second principal component is $\beta_2\triangleq\alpha_2^{T}x=\alpha_{21}x_1+\alpha_{22}x_2+\cdots+\alpha_{2n}x_n$,

s.t. $\operatorname{var}(\beta_2)$ is maximal, $\alpha_2^{T}\cdot\alpha_2=1$, and $\beta_2$ is uncorrelated with $\beta_1$.

For $\alpha_1$, form the Lagrangian $L(\alpha_1)\triangleq\operatorname{var}(\beta_1)-\lambda(\alpha_1^{T}\alpha_1-1)=\alpha_1^{T}\Sigma\,\alpha_1-\lambda(\alpha_1^{T}\alpha_1-1)$,

$\frac{1}{2}\cdot\frac{\partial L(\alpha_1)}{\partial\alpha_1}=\Sigma\alpha_1-\lambda\alpha_1=(\Sigma-\lambda I_n)\,\alpha_1=0\;\Rightarrow\;\Sigma\alpha_1=\lambda\alpha_1 ,$ so $\alpha_1$ is an eigenvector of $\Sigma$, and $\operatorname{var}(\beta_1)=\alpha_1^{T}\Sigma\alpha_1=\lambda$ is maximized by taking the largest eigenvalue $\lambda_1$.

Uncorrelatedness of $\beta_1$ and $\beta_2$ requires $\alpha_1^{T}\Sigma\,\alpha_2=\alpha_2^{T}\Sigma\,\alpha_1=0=\alpha_2^{T}\alpha_1=\alpha_1^{T}\alpha_2 .$

$L(\alpha_2)\triangleq\operatorname{var}(\beta_2)-\lambda(\alpha_2^{T}\alpha_2-1)-\gamma(\alpha_2^{T}\alpha_1-0)=\alpha_2^{T}\Sigma\,\alpha_2-\lambda(\alpha_2^{T}\alpha_2-1)-\gamma(\alpha_2^{T}\alpha_1-0)$

$\frac{1}{2}\cdot\frac{\partial L(\alpha_2)}{\partial\alpha_2}=\Sigma\alpha_2-\lambda\alpha_2-\gamma\alpha_1=0$ (2)

Left-multiplying (2) by $\alpha_1^{T}$ gives $\alpha_1^{T}\Sigma\,\alpha_2-\lambda\alpha_1^{T}\alpha_2-\gamma\alpha_1^{T}\alpha_1=0\;\Rightarrow\;\gamma\,\alpha_1^{T}\alpha_1=0\;\Rightarrow\;\gamma=0 .$

Hence $\operatorname{var}(\beta_2)=\alpha_2^{T}\Sigma\,\alpha_2=\lambda\alpha_2^{T}\alpha_2=\lambda$, so $\max\operatorname{var}(\beta_2)=(\max\lambda)\,\alpha_2^{T}\alpha_2=\max\lambda\triangleq\lambda_2$. Assuming $\Sigma$ has no repeated eigenvalues, $\lambda_1>\lambda_2$: $\lambda_2$ is the second-largest eigenvalue of $\Sigma$, and the corresponding eigenvector $\alpha_2$ gives the coefficients of the second principal component (Note 4).
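The derivation above can be verified numerically: the coefficient vectors $\alpha_1,\alpha_2,\ldots$ are the eigenvectors of $\Sigma$ ordered by eigenvalue, with $\operatorname{var}(\beta_i)=\lambda_i$. A minimal sketch, in which the toy data and the mixing matrix are assumptions:

```python
import numpy as np

# Toy data: 200 samples of 3 correlated variables.
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                             [1.0, 1.0, 0.0],
                                             [0.5, 0.2, 0.1]])

sigma = np.cov(data, rowvar=False)        # sample covariance matrix Sigma
eigvals, eigvecs = np.linalg.eigh(sigma)  # eigh: ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder descending
lam, alpha = eigvals[order], eigvecs[:, order]

# var(beta_1) = alpha_1^T Sigma alpha_1 = lambda_1, the largest eigenvalue.
beta1_var = alpha[:, 0] @ sigma @ alpha[:, 0]

# Eigenvectors of distinct eigenvalues are orthogonal (cf. Note 4).
cross = alpha[:, 0] @ alpha[:, 1]
```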

2.3. The Partial Least Squares Method

The first component extracted from $X$ is $\beta_1\triangleq X\alpha_1=\alpha_{11}x_1+\alpha_{12}x_2+\cdots+\alpha_{1n}x_n$,

s.t. $\max\operatorname{var}(\beta_1)$ and $\alpha_1^{T}\cdot\alpha_1=1$;

the first component extracted from $Y$ is $\tau_1\triangleq Y\mu_1=\mu_{11}y_1+\mu_{12}y_2+\cdots+\mu_{1q}y_q$,

s.t. $\max\operatorname{var}(\tau_1)$ and $\mu_1^{T}\cdot\mu_1=1$;

and, jointly, $\max\operatorname{cov}(\beta_1,\tau_1)$.

$X=\beta_1 P^{T}+E_1,\quad Y=\tau_1 Q^{T}+F_1^{(0)},\quad Y=\beta_1 R_1^{T}+F_1$

When $E_1,F_1^{(0)},F_1$ are all zero, clearly $P=\frac{X^{T}\beta_1}{\beta_1^{T}\beta_1}=\frac{X^{T}\beta_1}{\lVert\beta_1\rVert^2}$, $Q=\frac{Y^{T}\tau_1}{\tau_1^{T}\tau_1}=\frac{Y^{T}\tau_1}{\lVert\tau_1\rVert^2}$, and $R_1=\frac{Y^{T}\beta_1}{\beta_1^{T}\beta_1}=\frac{Y^{T}\beta_1}{\lVert\beta_1\rVert^2}$; in general, $E_1,F_1^{(0)},F_1$ are the residual matrices of these regression equations.

To maximize $\operatorname{cov}(\beta_1,\tau_1)=\alpha_1^{T}X^{T}Y\mu_1$ under the unit-norm constraints, form the Lagrangian $L(\alpha_1,\mu_1)\triangleq\alpha_1^{T}X^{T}Y\mu_1-\lambda_1(\alpha_1^{T}\alpha_1-1)-\lambda_2(\mu_1^{T}\mu_1-1)$, whence
$\frac{\partial L}{\partial\alpha_1}=X^{T}Y\mu_1-2\lambda_1\alpha_1=0\;\Rightarrow\;X^{T}Y\mu_1=2\lambda_1\alpha_1,$ (i)

$\frac{\partial L}{\partial\mu_1}=Y^{T}X\alpha_1-2\lambda_2\mu_1=0\;\Rightarrow\;Y^{T}X\alpha_1=2\lambda_2\mu_1,$ (ii)

$(X^{T}Y)(Y^{T}X)\alpha_1=(X^{T}Y)\cdot(2\lambda_2\mu_1)=2\lambda_2(X^{T}Y)\mu_1=2\lambda_2(2\lambda_1\alpha_1)=2\lambda_2\cdot2\lambda_1\,\alpha_1$

$(Y^{T}X)(X^{T}Y)\mu_1=(Y^{T}X)\cdot(2\lambda_1\alpha_1)=2\lambda_1(Y^{T}X)\alpha_1=2\lambda_1(2\lambda_2\mu_1)=2\lambda_2\cdot2\lambda_1\,\mu_1$

Moreover, $\mu_1^{T}(Y^{T}X)\alpha_1=\mu_1^{T}(2\lambda_2\mu_1)=2\lambda_2\,\mu_1^{T}\mu_1=2\lambda_2$ and $\alpha_1^{T}(X^{T}Y)\mu_1=\alpha_1^{T}(2\lambda_1\alpha_1)=2\lambda_1(\alpha_1^{T}\alpha_1)=2\lambda_1$; since these two scalars are transposes of each other, $2\lambda_1=2\lambda_2$.
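Equations (i)-(ii) say that $\alpha_1$ and $\mu_1$ are the dominant left and right singular vectors of $X^{T}Y$, and that $\alpha_1$ is an eigenvector of $(X^{T}Y)(Y^{T}X)$ with eigenvalue $4\lambda_1\lambda_2$. A numerical check of these identities, on arbitrary toy matrices (an assumption, not the paper's data):

```python
import numpy as np

# Arbitrary toy data: 30 samples, 4 X-variables, 2 Y-variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
Y = rng.normal(size=(30, 2))

# alpha_1, mu_1 are the dominant singular vectors of X^T Y.
U, s, Vt = np.linalg.svd(X.T @ Y)
alpha1, mu1 = U[:, 0], Vt[0, :]

# (X^T Y)(Y^T X) alpha_1 = s_1^2 alpha_1, i.e. 4*lambda_1*lambda_2 = s_1^2.
lhs = (X.T @ Y) @ (Y.T @ X) @ alpha1
```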

The PLS regression iteration (for one pair of components $t=Xw$, $u=Yc$) repeats the following steps until convergence:

1) $w=X^{T}u/(u^{T}u)$
2) $\lVert w\rVert\to1$ $(w=w/\lVert w\rVert)$
3) $t=Xw$
4) $c=Y^{T}t/(t^{T}t)$
5) $\lVert c\rVert\to1$ $(c=c/\lVert c\rVert)$
6) $u=Yc$

$\begin{aligned}w^{(k+1)}&=X^{T}u^{(k)}/(u^{(k)T}u^{(k)})\\&=X^{T}Yc^{(k)}/\big((u^{(k)T}u^{(k)})\cdot\lVert c^{(k)}\rVert\big)\\&=X^{T}YY^{T}t^{(k)}/\big((u^{(k)T}u^{(k)})\cdot\lVert c^{(k)}\rVert\cdot(t^{(k)T}t^{(k)})\big)\\&=X^{T}YY^{T}Xw^{(k)}/\big((u^{(k)T}u^{(k)})\cdot\lVert c^{(k)}\rVert\cdot(t^{(k)T}t^{(k)})\cdot\lVert w^{(k)}\rVert\big),\end{aligned}$

so that

$(X^{T}YY^{T}X)\,w^{(k)}=\big((u^{(k)T}u^{(k)})\cdot\lVert c^{(k)}\rVert\cdot(t^{(k)T}t^{(k)})\cdot\lVert w^{(k)}\rVert\big)\cdot w^{(k+1)} .$

Similarly,

$(Y^{T}XX^{T}Y)\,c^{(k)}=\big((t^{(k+1)T}t^{(k+1)})\cdot\lVert w^{(k+1)}\rVert\cdot(u^{(k)T}u^{(k)})\cdot\lVert c^{(k)}\rVert\big)\cdot c^{(k+1)} .$

The iteration is therefore a power iteration: $w^{(k)}$ converges (up to normalization) to the dominant eigenvector of $X^{T}YY^{T}X$, and $c^{(k)}$ to that of $Y^{T}XX^{T}Y$.
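Steps 1)-6) can be sketched directly for the first component; the toy data, the initialization of $u$ with a column of $Y$, and the fixed iteration count are assumptions made for illustration:

```python
import numpy as np

# Toy data: 50 samples, 5 X-variables, 2 Y-variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5))
Y = rng.normal(size=(50, 2))

u = Y[:, [0]]                    # initialize u with a column of Y
for _ in range(1000):
    w = X.T @ u / (u.T @ u)      # 1) w = X^T u / (u^T u)
    w = w / np.linalg.norm(w)    # 2) normalize w
    t = X @ w                    # 3) t = X w
    c = Y.T @ t / (t.T @ t)      # 4) c = Y^T t / (t^T t)
    c = c / np.linalg.norm(c)    # 5) normalize c
    u = Y @ c                    # 6) u = Y c

# As derived above, w converges (up to sign) to the dominant
# eigenvector of X^T Y Y^T X.
evals, evecs = np.linalg.eigh(X.T @ Y @ Y.T @ X)
v = evecs[:, [-1]]               # eigenvector of the largest eigenvalue
alignment = abs((w.T @ v).item())
```

In production code one would iterate until $\lVert w^{(k+1)}-w^{(k)}\rVert$ falls below a tolerance rather than for a fixed count, and then deflate $X$ and $Y$ by the rank-one terms $\beta_1P^{T}$ and $\beta_1R_1^{T}$ before extracting the next component.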

3. Comparison of the Least Squares Method, Principal Component Analysis, and Partial Least Squares

$y_i=\beta_0+\beta_1 r_1(x_{i1})+\beta_2 r_2(x_{i2})+\cdots+\beta_k r_k(x_{ik})+\epsilon_i$

When the columns of X are multicollinear, principal component analysis can remove the adverse effects of this collinearity. PCA searches among linear combinations of the original independent variables (composite indicators, or composite independent variables) and, under the principle of minimal information loss, retains only a few such combinations. PCA thus eliminates, to some extent, the multicollinearity among the original independent variables and also reduces the dimensionality of the data to be processed. For data with periodic patterns, such as the various series produced by a running economy, PCA can be combined with spectral analysis [9] to study the trends and regularities in the data.
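This use of PCA against collinearity can be sketched as principal component regression: regress $y$ on the leading component scores instead of the nearly collinear columns of $X$. The simulated data, the two underlying factors, and the choice of two retained components below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(size=(100, 2))
# Three predictors built from two factors; columns 1 and 2 are nearly
# identical, so X^T X is ill-conditioned.
X = np.column_stack([z[:, 0],
                     z[:, 0] + 1e-3 * rng.normal(size=100),
                     z[:, 1]])
y = X @ np.array([1.0, 1.0, 2.0]) + 0.1 * rng.normal(size=100)

Xc = X - X.mean(axis=0)                         # center the predictors
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
V = eigvecs[:, np.argsort(eigvals)[::-1][:2]]   # top-2 loading vectors
T = Xc @ V                                      # component scores

# Ordinary least squares on the well-conditioned scores.
gamma = np.linalg.solve(T.T @ T, T.T @ (y - y.mean()))
y_hat = y.mean() + T @ gamma
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

Despite the near-singular $X^{T}X$, the two retained components carry almost all of the predictive information, so the fit remains accurate.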

The PLSR method was proposed by S. Wold and C. Albano [4] in 1983. Many researchers [5] [6] [7] [8] have studied PLSR, including the iterative computation of its components (see the PLS regression iteration above). PLSR combines principal component analysis with least squares: it extracts components from both the independent and the dependent variables, and it accounts for the role of the dependent variables and for the explanatory power of X over them, which removes, to some extent, the unreliability of regression models based on principal components alone.

4. Tests of the Goodness of Fit of the Regression Model

$y=X\beta +\epsilon$

$s^2=\frac{SSE}{n-k-1}$, where $SSE=\sum_{i=1}^{n}e_i^2=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$

$\underbrace{\sum_{i=1}^{n}(y_i-\bar{y})^2}_{SST}=\underbrace{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}_{SSR}+\underbrace{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}_{SSE}$

$R^2=\frac{SSR}{SST}=\frac{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}=1-\frac{SSE}{SST}$: the proportion of the total variation explained by the linear regression equation.

$R_{adj}^2=1-\frac{SSE/(n-k-1)}{SST/(n-1)}$: the $R^2$ value adjusted for degrees of freedom.
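These fit statistics can be computed directly from an OLS fit. A minimal sketch, in which the simulated data, the sample size $n=60$, and the $k=2$ predictors are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + 0.5 * rng.normal(size=n)

# OLS fit via the normal equations.
beta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta

sse = np.sum((y - y_hat) ** 2)        # residual sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2) # regression sum of squares
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares

s2 = sse / (n - k - 1)                             # error-variance estimate
r2 = 1 - sse / sst                                 # R^2
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1)) # adjusted R^2
```

For an OLS fit with an intercept, the decomposition $SST=SSR+SSE$ holds exactly, and $R_{adj}^2<R^2$ whenever $k>0$ and $R^2<1$.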

NOTES



1. If there is only one independent variable, then n = 1.

2. The x here has already been standardized, i.e., centered and normalized.



3. Although principal component analysis does not ignore covariance and correlation, it places more emphasis on variance.

4. The covariance matrix $\Sigma$ is a real symmetric matrix; by linear algebra, eigenvectors of $\Sigma$ belonging to distinct eigenvalues must be linearly independent.

5. The X and Y here have a different meaning from the X and Y in Eq. (1).



6. See the appendix for the meanings of $R^2$ and $R_{adj}^2$.

[1] Walpole, R.E., Myers, R.H., Myers, S.L. and Ye, K.Y. Probability and Statistics for Engineers and Scientists [M]. Translated by Zhou, Y., Ma, Y.B., Xie, S.Y. and Wang, X.J. Beijing: China Machine Press, 2010.
[2] Hotelling, H. (1933) Analysis of a Complex of Statistical Variables into Principal Components. Journal of Educational Psychology, 24, 417-444. https://doi.org/10.1037/h0071325
[3] Massy, W.F. (1965) Principal Components Regression in Exploratory Statistical Research. Journal of the American Statistical Association, 60, 234-256. https://doi.org/10.1080/01621459.1965.10480787
[4] Wold, S., Albano, C. and Dun, M. (1983) Pattern Recognition: Finding and Using Regularities in Multivariate Data. Applied Science Publishers, London.
[5] Rosipal, R. and Krämer, N. (2006) Overview and Recent Advances in Partial Least Squares. Subspace, Latent Structure and Feature Selection, Bohinj, Slovenia, 23-25 February 2005, 34-51. https://doi.org/10.1007/11752790_2
[6] Wold, H. (1982) Soft Modeling: The Basic Design and Some Extensions. In: Jöreskog, K.G. and Wold, H., Eds., Systems under Indirect Observation, Volume 2, North-Holland, Amsterdam, 1-53.
[7] Wold, H. (1985) Partial Least Squares. In: Kotz, S. and Johnson, N.L., Eds., Encyclopedia of Statistical Sciences, Vol. 6, John Wiley, New York, 581-591.
[8] Wold, S., Ruhe, A., Wold, H. and Dunn III, W.J. (1984) The Collinearity Problem in Linear Regression. The Partial Least Squares (PLS) Approach to Generalized Inverses. SIAM Journal on Scientific and Statistical Computing, 5, 735-743. https://doi.org/10.1137/0905052
[9] Zhang, H. and Xie, N. (2008) Research on Real Estate Market Cycles Based on Principal Component Analysis and Spectral Analysis. Journal of Tsinghua University (Science and Technology), 48(9), 24-27.