一种新的三项共轭梯度法求解非线性方程组

doi:10.12677/AAM.2019.85097

期刊菜单

一种新的三项共轭梯度法求解非线性方程组
A New Three-Term Conjugate Gradient Method for Solving Nonlinear Equations

DOI: 10.12677/AAM.2019.85097, PDF, HTML, XML, 下载: 1,024 浏览: 1,439
作者: 廖若沙：广西大学，数学与信息科学学院，广西南宁
关键词: 共轭梯度法；充分下降性；全局收敛性；Conjugate Gradient Method； Sufficient Descent Property； Global Convergence

摘要: 本文在现有的三项共轭梯度法的基础上，设计了一种新的共轭梯度法JG求解非线性方程组问题，并在一定的假设条件下证明了JG算法的充分下降性和全局收敛性。通过数值实验的结果我们可以看到，JG算法与PRP算法相比具有更好的性质。

Abstract: Based on the existing three-term conjugate gradient method, this paper designs a new conjugate gradient method JG to solve the problem of nonlinear equations, and proves the sufficient descent and global convergence of JG algorithm under certain assumptions. From the results of numerical experiments, we can see that JG algorithm has better properties than PRP algorithm.

文章引用：廖若沙. 一种新的三项共轭梯度法求解非线性方程组[J]. 应用数学进展, 2019, 8(5): 869-875. https://doi.org/10.12677/AAM.2019.85097

1. 引言

我们考虑如下非线性方程组：

$h (x) = 0, x \in R^{n}$ (1)

其中 $h (x) = 0$ 单调并连续可微。令 $F (x) = \frac{1}{2} {‖ h (x) ‖}^{2}$ ，则(1)式等价于求解(2)式的全局最优解：

$\min F (x), x \in R^{n}$ (2)

通常求解非线性方程组的迭代公式为：

$x_{k + 1} = x_{k} + α_{k} d_{k}$

其中 $α_{k}$ 为沿搜索方向上的步长， $d_{k}$ 为搜索方向。

随着共轭梯度法的发展，产生了一系列的求解 $d_{k}$ 和 $α_{k}$ 的方法 [1] [2] [3] [4] [5] ，例如标准的Wolfe线搜索：

$\begin{array}{l} F (x_{k + 1}) - F (x_{k}) \leq c_{1} α_{k} g_{k}^{T} d_{k} \\ g_{k + 1}^{T} d_{k} \geq c_{2} g_{k}^{T} d_{k} \end{array}$

其中 $0 < c_{1} < c_{2} < 1$ 为任意常数， $g_{k} : = \nabla F (x_{k})$ 。

我们利用Yuan和Lu [6] 给出的一种新的线搜索：

$- g {(x_{k} + α_{k} d_{k})}^{T} d_{k} \geq σ α_{k} ‖ g (x_{k} + α_{k} d_{k}) ‖ {‖ d_{k} ‖}^{2}$ (3)

其中 $α_{k} = \max {s, ρ s, ρ^{2} s, \dots}$ ， $σ, s > 0$ ， $ρ \in (0, 1)$ 。在一定的假设条件下，得到全局收敛性和超线性收敛。

Zhang [7] 提出了一种三项共轭梯度算法：

$d_{k} = {\begin{cases} - g_{k} + β_{k}^{P R P} d_{k - 1} - v_{k} y_{k} if k \geq 1 \\ - g_{k} if k = 0 \end{cases}$

其中 $g_{k} = g (x_{k})$ ， $β_{k}^{P R P} = \frac{g_{k - 1}^{T} y_{k}}{{‖ g_{k - 1} ‖}^{2}}$ ， $v_{k} = \frac{g_{k}^{T} d_{k - 1}}{{‖ g_{k - 1} ‖}^{2}}$ ， $y_{k} = g_{k} - g_{k - 1}$ 。由 $d_{k}$ 的定义易得：

$d_{k}^{T} g_{k} = - {‖ g_{k} ‖}^{2}$

根据上述算法及线搜索，本文给出一种新的三项共轭梯度算法JG：

$d_{k} = {\begin{cases} - g_{k} + \frac{g_{k}^{T} y_{k}^{*} d_{k - 1} - g_{k}^{T} d_{k - 1} y_{k}^{*}}{μ ‖ d_{k - 1} ‖ ‖ y_{k}^{*} ‖ + v {‖ y_{k}^{*} ‖}^{2} + {‖ g_{k - 1} ‖}^{2} + η ‖ g_{k - 1} ‖ ‖ d_{k - 1} ‖ + r {‖ d_{k - 1} ‖}^{2}} if k \geq 1 \\ - g_{k} if k = 0 \end{cases}$ (4)

其中 $μ > 0$ ， $v > 0$ ， $η > 0$ ， $r > 0$ ， $y_{k}^{*} = g_{k} - \frac{‖ g_{k - 1} ‖}{‖ g_{k} ‖} g_{k - 1}$ 。

2. 算法

新的三项共轭梯度算法JG的步骤如下：

Step 0：令初始点 $x_{0} \in R$ ， $μ > 0$ ， $v > 0$ ， $η > 0$ ， $r > 0$ ； $ρ, ε \in (0, 1)$ 。 $k : = 1$ ；

Step 1：若 $‖ g (x) ‖ \leq ε$ ，停止；否则转到Step 2；

Step 2：通过(4)式计算搜索方向 $d_{k}$ ；

Step 3：选择满足条件(3)的步长 $α_{k}$ ；

Step 4：令迭代公式为 $w_{k} = x_{k} + α_{k} d_{k}$ ；

Step 5：若 $‖ g (x) ‖ \leq ε$ ，停止，令 $x_{k + 1} = w_{k}$ ；否则令

$x_{k + 1} = x_{k} - \frac{g {(w_{k})}^{T} (x_{k} - w_{k})}{{‖ g (w_{k}) ‖}^{2}} g (w_{k})$ (5)

Step 6：令 $k : = k + 1$ ，转Step 1。

注：(5)式 [8] 为 $x_{k}$ 在超平面 $H_{k} = {x \in R^{n} | 〈 g (w_{k}), x - w_{k} 〉 = 0}$ 上的投影。

3. 充分下降性和全局收敛性

我们给出如下两个假设条件：

假设1：(1)式的解集非空。

假设2： $g (x)$ 在 $R^{n}$ 上Lipschitz连续，即存在 $L > 0$ 使得

$‖ g (x) - g (y) ‖ \leq L ‖ x - y ‖, \forall x, y \in R^{n}$ (6)

于是可以得到

$‖ g_{k} ‖ \leq ς$

其中 $ς > 0$ 。

引理1：(4)式中的 $d_{k}$ 满足：

$g_{k + 1} d_{k + 1} \leq - {‖ g_{k + 1} ‖}^{2}$ (7)

和

$‖ g_{k} ‖ \leq ‖ d_{k} ‖ \leq (1 + \frac{2}{μ}) ‖ g_{k} ‖$ (8)

证明：由 $d_{k}$ 的定义可以直接得(7)式的结果，而利用(7)式可以得到(8)式的左半部分，接下来证明(8)式的右半部分。

$\begin{matrix} ‖ d_{k} ‖ = ‖ - g_{k} + \frac{g_{k}^{T} y_{k}^{*} d_{k - 1} - g_{k}^{T} d_{k - 1} y_{k}^{*}}{μ ‖ d_{k - 1} ‖ ‖ y_{k}^{*} ‖ + v ‖ y_{k}^{*} ‖ + {‖ g_{k - 1} ‖}^{2} + η ‖ g_{k - 1} ‖ ‖ d_{k - 1} ‖ + r {‖ d_{k - 1} ‖}^{2}} ‖ \\ \leq ‖ - g_{k} ‖ + ‖ \frac{g_{k}^{T} y_{k}^{*} d_{k - 1} - g_{k}^{T} d_{k - 1} y_{k}^{*}}{μ ‖ d_{k - 1} ‖ ‖ y_{k}^{*} ‖ + v ‖ y_{k}^{*} ‖ + {‖ g_{k - 1} ‖}^{2} + η ‖ g_{k - 1} ‖ ‖ d_{k - 1} ‖ + r {‖ d_{k - 1} ‖}^{2}} ‖ \\ \leq ‖ - g_{k} ‖ + \frac{2 ‖ g_{k}^{T} ‖ ‖ y_{k}^{*} ‖ ‖ d_{k - 1} ‖}{μ ‖ d_{k - 1} ‖ ‖ y_{k}^{*} ‖} = (1 + \frac{2}{μ}) ‖ g_{k} ‖ \end{matrix}$

得证，故(8)式成立。 □

引理2：若假设1，假设2均成立， ${x_{k}}$ 和 ${w_{k}}$ 是由算法JG产生的点列，可得：

$α_{k} \geq \min {s, \frac{ρ}{L + σ ‖ g ({\hat{w}}_{k}) ‖} \frac{{‖ g (x_{k}) ‖}^{2}}{{‖ d_{k} ‖}^{2}}}$

其中 ${\hat{w}}_{k} = x_{k} + {\hat{α}}_{k} d_{k}$ ， ${\hat{α}}_{k} = α_{k} ρ^{- 1}$ 。

证明：这里 $α_{k}$ 由(3)式给出，如果 $α_{k} \neq s$ ，那么 ${\hat{α}}_{k} = α_{k} ρ^{- 1}$ 不满足(3)式，即：

$- g {(x_{k} + α_{k} d_{k})}^{T} d_{k} < σ α_{k} ‖ g (x_{k} + α_{k} d_{k}) ‖ {‖ d_{k} ‖}^{2}$

成立。

由Lipschitz条件和(7)式：

${‖ g_{k} ‖}^{2} \leq - g_{k}^{T} d_{k} \leq {(g ({\hat{w}}_{k}) - g (x_{k}))}^{T} d_{k} + σ {\hat{α}}_{k} ‖ g ({\hat{w}}_{k}) ‖ {‖ d_{k} ‖}^{2} \leq {\hat{α}}_{k} (L + σ ‖ g ({\hat{w}}_{k}) ‖) {‖ d_{k} ‖}^{2}$

则

$α_{k} \geq \frac{ρ}{L + σ ‖ g ({\hat{w}}_{k}) ‖} \frac{{‖ g (x_{k}) ‖}^{2}}{{‖ d_{k} ‖}^{2}}$

得证。 □

引理3：若假设1，假设2均成立， ${x_{k}}$ 是由算法JG产生的点列，若 $x_{*}$ 是(1)的解，即 $h (x_{*}) = 0$ 成立，那么：

${‖ x_{k + 1} - x_{*} ‖}^{2} \leq {‖ x_{k} - x_{*} ‖}^{2} - {‖ x_{k + 1} - x_{k} ‖}^{2}$ (9)

和

$\sum_{k = 0}^{\infty} {‖ x_{k + 1} - x_{k} ‖}^{2} < \infty$ (10)

均成立。

证明：由g和超平面 $H_{k}$ 的单调性，可得：

$〈 g (w_{k}) - g (x_{*}), x_{*} - w_{k} 〉 = 〈 g (w_{k}), x_{*} - w_{k} 〉 \leq 0$

$x_{k + 1}$ 为 $x_{k}$ 在 $M_{k} = {x \in R^{n} | 〈 g (w_{k}), x - w_{k} 〉 \leq 0}$ 上的投影。如果 $x_{*}$ 在 $M_{k}$ 上，则 $〈 x_{k} - x_{k + 1}, x_{k + 1} - x_{*} 〉 \geq 0$ 。可得：

${‖ x_{k} - x_{*} ‖}^{2} = {‖ x_{k} - x_{k + 1} ‖}^{2} + {‖ x_{k + 1} - x_{*} ‖}^{2} + 2 〈 x_{k} - x_{k + 1}, x_{k + 1} - x_{*} 〉 \geq {‖ x_{k} - x_{k + 1} ‖}^{2} + {‖ x_{k + 1} - x_{*} ‖}^{2}$

整理得：

${‖ x_{k + 1} - x_{k} ‖}^{2} \leq {‖ x_{k} - x_{*} ‖}^{2} - {‖ x_{k + 1} - x_{*} ‖}^{2}$

故(9)式成立。令 $k \to + \infty$ ，对该式进行累加，得(10)式成立，得证。 □

由(3)、(5)二式，得：

$‖ x_{k + 1} - x_{k} ‖ = \frac{| g {(w_{k})}^{T} (x_{k} - w_{k}) |}{‖ g (w_{k}) ‖} = \frac{- α_{k} g {(w_{k})}^{T} d_{k}}{‖ g (w_{k}) ‖} \geq σ α_{k}^{2} {‖ d_{k} ‖}^{2}$

由(10)式可知，当 $k \to + \infty$ 时：

$‖ x_{k + 1} - x_{k} ‖ \to 0$

综上可得：

$\lim_{k \to + \infty} α_{k} d_{k} = 0$ (11)

定理1若假设1，假设2均成立，序列 ${α_{k}, d_{k}, x_{k + 1}, g_{k + 1}}$ 由算法JG产生，则有

$\lim_{k \to \infty} \inf ‖ g_{k} ‖ = 0$ (12)

证明：若(12)式不成立，则存在 $δ > 0$ 使得对任意 $K \geq 0$ 都有 $‖ g_{k} ‖ \geq δ$ 。结合式(8)，有：

$c_{3} δ \leq c_{3} ‖ g_{k} ‖ \leq ‖ d_{k} ‖ \leq c_{4} ‖ g_{k} ‖ \leq c_{4} ζ$

这里 $0 < c_{3} < 1$ ， $c_{4} = 1 + \frac{2}{μ}$ 。由引理2可得：

$\begin{matrix} α_{k} ‖ d_{k} ‖ \geq \min {s, \frac{ρ}{L + σ ‖ g ({\hat{w}}_{k}) ‖} \frac{{‖ g (x_{k}) ‖}^{2}}{{‖ d_{k} ‖}^{2}}} ‖ d_{k} ‖ \\ \geq \min {c_{3} δ s, \frac{ρ δ}{c_{4} ζ (L + σ ς)}} > 0 \end{matrix}$

与(11)式矛盾，故(12)式成立，得证。 □

4. 数值实验

下面我们将JG算法与PRP算法作比较，选取九个测试函数 [9] 进行测试，结果见表1：

Table 1. Test functions

表1. 测试函数

数值实验的参数设置： $μ = 0.0001$ ， $v = 0.0001$ ， $η = 0.0001$ ， $r = 0.0001$ ， $ε = 10^{- 5}$ ，终止条件为 $‖ g (x) ‖ \leq 10^{- 5}$ ，其中CI为循环次数；CT为计算次数；GN为梯度值的范数。结果见表2：

Table 2. Numerical results

表2. 数值结果

我们根据计算次数(CT)给出图1，可以直观看到在相同条件下，算法JG所需迭代次数更少，图1如下所示：

Figure 1. Algorithm JG and algorithm PRP performance chart (CT)

图1. JG算法与PRP算法的性能比较(CT)

5. 结论

针对求解非线性单调方程组，本文采用文献6的线搜索，给出了一个三项共轭梯度算法JG，并且在一定的假设条件下得到了充分下降性和全局收敛性，从数值实验的结果可以看到，JG算法在与PRP算法比较起来具有更好的性质。

参考文献

[1]	Polyak, B.T. (1969) The Conjugate Gradient Method in Extremal Problems. Ussr Computational Mathematics & Mathematical Physics, 9, 94-112. https://doi.org/10.1016/0041-5553(69)90035-4
[2]	Dai, Y. and Yuan, Y. (1999) A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. Siam Journal on Optimization, 10, 177-182. https://doi.org/10.1137/S1052623497318992
[3]	Fletcher, R. and Reeves, C.M. (1964) Function Minimization by Conjugate Gradients. Computer Journal, 7, 149-154. https://doi.org/10.1093/comjnl/7.2.149
[4]	Polak, E. and Ribiere, G. (1968) Note sur la convergence de methodes de directions conjuguees. Revue Française D’-informatique et de Recherche Opérationnelle, 16, 35-43. https://doi.org/10.1051/m2an/196903R100351
[5]	Wei, Z., Yao, S. and Liu, L. (2006) The Convergence Prop-erties of Some New Conjugate Gradient Methods. Applied Mathematics & Computation, 183, 1341-1350. https://doi.org/10.1016/j.amc.2006.05.150
[6]	Yuan, G. and Lu, X. (2008) A New Backtracking Inexact BFGS Method for Symmetric Nonlinear Equations. Computers & Mathematics with Applications, 55, 116-129. https://doi.org/10.1016/j.camwa.2006.12.081
[7]	Zhang, L., Zhou, W. and Li, D.H. (2006) A Descent Modified Polak-Ribiere-Polyak Conjugate Gradient Method and Its Global Convergence. IMA Journal of Numerical Analysis, 26, 629-640. https://doi.org/10.1093/imanum/drl016
[8]	Solodov, M.V. and Svaiter, B.F. (1998) A Globally Con-vergent Inexact Newton Method for Systems of Monotone Equations. Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods. Springer, US, 355-369. https://doi.org/10.1007/978-1-4757-6388-1_18
[9]	Yuan, G., Wei, Z. and Lu, X. (2011) A BFGS Trust-Region Method for Nonlinear Equations. Computing, 92, 317-333. https://doi.org/10.1007/s00607-011-0146-z

为你推荐

友情链接