求解大型线性最小二乘问题的贪婪随机坐标下降法

doi:10.12677/aam.2024.136267

期刊菜单

求解大型线性最小二乘问题的贪婪随机坐标下降法
A Greedy Randomized Coordinate Descent Method for Solving Large Linear Least Squares Problem

DOI: 10.12677/aam.2024.136267, PDF, HTML, XML,
作者: 董勤：温州大学数理学院，浙江温州
关键词: 最小二乘问题；贪婪随机坐标下降法；松弛因子；Least Squares Problem； The Greedy Randomized Coordinate Descent Method； Relaxation Parameter

摘要: 贪婪随机坐标下降法(GRCD)是求解大型线性最小二乘问题的有效迭代方法之一。本文在GRCD算法中引入松弛因子，构造了一种含参数的贪婪随机坐标下降法。并证明了当线性最小二乘问题的系数矩阵为列满秩时该方法依期望的收敛性。数值实验表明，当选取适当的松弛因子时，该算法在迭代步数和计算时间比GRCD方法更有效。

Abstract: The greedy randomized coordinate descent method (GRCD) is one of the effective iterative methods to solve large linear least squares problem. A greedy randomized coordinate descent method with parameters was constructed by introducing a relaxation parameter in the GRCD algorithm. It is also proved that the method has the expected convergence when the coefficient matrix of the linear least squares problem is of full column rank. Numerical experiments show that the proposed algorithm is more effective than the GRCD method in terms of iterative steps and calculation time when the appropriate relaxation parameter is selected.

文章引用：董勤. 求解大型线性最小二乘问题的贪婪随机坐标下降法[J]. 应用数学进展, 2024, 13(6): 2780-2790. https://doi.org/10.12677/aam.2024.136267

1. 引言

对于大型线性最小二乘问题

$f (x) = \min_{x \in R} {‖ b - A x ‖}_{2}^{2}$ (1)

其中系数矩阵 $A \in R^{m \times n}$ 列满秩， $b \in R^{m}$ 。 $x_{*} = A^{+} b$ ，为其唯一最小范数最小二乘解，其中 $A^{+} = {(A^{T} A)}^{- 1} A^{T}$ 表示 $A$ 的Moore-Penrose广义逆。众所周知，线性最小二乘问题在数学上等同于正规方程 $A^{T} A x = A^{T} b$ ，也等价于无约束二次优化问题

$\min_{x \in R^{n}} f (x) : = \frac{1}{2} x^{T} A^{T} A x - b^{T} A x$ (2)

对于矩阵 $G = (g_{i j}) \in R^{m \times n}$ ， $G_{(j)}$ 、 ${‖ G ‖}_{2}$ 和 ${‖ G ‖}_{F}$ 分别代表矩阵的第j列、2范数以及F范数， $e_{i}$ 表示第i个坐标向量， ${(\cdot)}^{(j_{k})}$ 表示第k次迭代向量的第j个元素，用 $λ_{\min} (G^{T} G)$ 表示 $G^{T} G$ 最小的正特征值。

最小二乘问题在岭回归[1]-[3]，计算机断层扫描[4]-[6]，机器学习[7]，生物特征选择[8]及其他领域都有广泛的应用[9]-[12]。对于最小二乘问题(1)，坐标下降法是求解这一类问题的重要方法之一，它将经典的Gauss-Seidel迭代方法直接应用于求解(2) [13]。在2009年Strohmer和Vershynin [14]最先通过随机选择系数矩阵A的一行作为工作行提出随机Kaczmarz算法。2010年Leventhal和Lewis [15]受此启发，通过随机选择系数矩阵A的一列作为工作列提出随机坐标下降(RCD)方法。之后，随机坐标下降法被进一步推广或拓展[16]-[18]。2019年Bai和Wu [19]用了一种更有效的概率准则选取系数矩阵A的列，提出了贪婪随机坐标下降法(GRCD)。本文中在GRCD的算法分析中引入松弛因子，构造求解系统(1)的含参数的GRCD算法，并给出了该算法的收敛性。

2. 参数的贪婪随机坐标下降算法

Bai和Wu提出了求解大型线性最小二乘问题(1)的GRCD方法，该算法如下。

算法1 GRCD算法

输入： $A, b, l$ 和 $x_{0}$

输出： $x_{l}$

1：对 $k = 0, 1, \dots, l - 1$ 做

2：计算

$δ_{k} = \frac{1}{2} (\frac{1}{{‖ A^{T} r_{k} ‖}_{2}^{2}}) \max_{1 \leq j \leq n} (\frac{{| A_{(j)}^{T} r_{k} |}^{2}}{{‖ A_{(j)} ‖}_{2}^{2}} + \frac{1}{{‖ A ‖}_{F}^{2}})$

3：构造集合

$ν_{k} = {j | {| A_{(j)}^{T} r_{k} |}^{2} \geq δ_{k} {‖ A^{T} r_{k} ‖}_{2}^{2} {‖ A_{(j)} ‖}_{2}^{2}}$

4：令 $s_{k} = A^{T} r_{k}$ ，根据如下公式计算 ${\tilde{s}}_{k}^{(j_{k})}$

${\tilde{s}}_{k}^{(j_{k})} = {\begin{cases} s_{k}^{(j_{k})}, j_{k} \in ν_{k}, \\ 0, j_{k} \notin ν_{k}, \end{cases}$

5：依概率 $\Pr (c o l u m n = j_{k}) = \frac{{| {\tilde{s}}_{k}^{(j_{k})} |}^{2}}{{‖ {\tilde{s}}_{k} ‖}_{2}^{2}}$ 选择 $j_{k} \in ν_{k}$

6：计算 $x_{k + 1} = x_{k} + \frac{s_{k}^{(j_{k})}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} e_{j_{k}}$

7：结束

本文中含参数的GRCD算法(GRCD(ω))是在GRCD算法迭代公式中添加一个参数 $ω \in (0, 2)$ ，见如下算法2。

算法2 含参数的贪婪随机坐标下降法

输入： $A, b, l, ω$ 和 $x_{0}$

输出： $x_{l}$

1：对 $k = 0, 1, \dots, l - 1$ 做

2：计算

$δ_{k} = \frac{1}{2} (\frac{1}{{‖ A^{T} r_{k} ‖}_{2}^{2}}) \max_{1 \leq j \leq n} (\frac{{| A_{(j)}^{T} r_{k} |}^{2}}{{‖ A_{(j)} ‖}_{2}^{2}} + \frac{1}{{‖ A ‖}_{F}^{2}})$

3：构造集合

$ν_{k} = {j | {| A_{(j)}^{T} r_{k} |}^{2} \geq δ_{k} {‖ A^{T} r_{k} ‖}_{2}^{2} {‖ A_{(j)} ‖}_{2}^{2}}$

4：令 $s_{k} = A^{T} r_{k}$ ，根据如下公式计算 ${\tilde{s}}_{k}^{(j_{k})}$

${\tilde{s}}_{k}^{(j_{k})} = {\begin{cases} s_{k}^{(j_{k})}, j_{k} \in ν_{k}, \\ 0, j_{k} \notin ν_{k}, \end{cases}$

5：依概率 $\Pr (c o l u m n = j_{k}) = \frac{{| {\tilde{s}}_{k}^{(j_{k})} |}^{2}}{{‖ {\tilde{s}}_{k} ‖}_{2}^{2}}$ 选择 $j_{k} \in ν_{k}$

6：计算 $x_{k + 1} = (1 - ω) x_{k} + ω (x_{k} + \frac{s_{k}^{(j_{k})}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} e_{j_{k}})$

7：结束

注1：上述算法中依概率选取工作列时采用Matlab的randsrc命令。

注2：当 $ω = 1$ 时，GRCD(ω)算法简化为GRCD算法。

3. GRCD(ω)算法收敛性分析

关于算法2有如下收敛定理。

定理1对于线性最小二乘问题(1)，设 $ω \in (0, 2)$ 。那么由算法2迭代生成的序列 ${x_{k}}_{k = 0}^{\infty}$ 依期望指数收敛到 $x_{*}$ ，且解误差依期望满足

$E_{k} {‖ x_{k + 1} - x_{*} ‖}_{A^{T} A}^{2} \leq (1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}}) {‖ x_{k} - x_{*} ‖}_{A^{T} A}^{2}$ (3)

以及

$E {‖ x_{k} - x_{*} ‖}_{A^{T} A}^{2} \leq {(1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}})}^{k} {‖ x_{0} - x_{*} ‖}_{A^{T} A}^{2}$ (4)

这里用 $E_{k}$ 表示前k次迭代的期望值，即 $E_{k} [\cdot] = E_{k} [\cdot | j_{0}, j_{1}, \dots, j_{k - 1} |]$ ， $j_{l} (l = 0, 1, \dots, k - 1)$ 表示算法在第l次迭代时选择的第 $j_{l}$ 列，同时也易知 $E [E_{k} [\cdot]] = E [\cdot]$ 。

证明：令 $P_{j_{k}} = \frac{A_{(j_{k})} A_{(j_{k})}^{T}}{{‖ A_{(j_{k})} ‖}_{2}^{2}}$ ，则

$\begin{matrix} A (x_{k + 1} - x_{*}) = A (x_{k} - x_{*}) + ω A \frac{s_{k}^{(j_{k})}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} e_{j_{k}} \\ = A (x_{k} - x_{*}) + ω \frac{A A_{(j_{k})}^{T} r_{k}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} e_{j_{k}} = A (x_{k} - x_{*}) + ω \frac{A_{(j_{k})}^{T} r_{k} A_{(j_{k})}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} \\ = A (x_{k} - x_{*}) + ω \frac{A_{(j_{k})}^{T} (A x_{*} - A x_{k}) A_{(j_{k})}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} \\ = A (x_{k} - x_{*}) - ω \frac{A_{(j_{k})} A_{(j_{k})}^{T}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} A (x_{k} - x_{*}) \\ = A (x_{k} - x_{*}) - ω P_{j_{k}} A (x_{k} - x_{*}) \end{matrix}$

由 $P_{j_{k}}$ 的定义，可知 $P_{j_{k}}^{T} P_{j_{k}} = P_{j_{k}}$ ，因此

$\begin{array}{l} {‖ A (x_{k + 1} - x_{*}) ‖}_{2}^{2} = {‖ A (x_{k} - x_{*}) - ω P_{j_{k}} A (x_{k} - x_{*}) ‖}_{2}^{2} \\ = {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} - 2 ω {[A (x_{k} - x_{*})]}^{T} P_{j_{k}} A (x_{k} - x_{*}) + ω^{2} {[P_{j_{k}} A (x_{k} - x_{*})]}^{T} P_{j_{k}} A (x_{k} - x_{*}) \\ = {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} - 2 ω {[A (x_{k} - x_{*})]}^{T} P_{j_{k}}^{T} P_{j_{k}} A (x_{k} - x_{*}) + ω^{2} {[P_{j_{k}} A (x_{k} - x_{*})]}^{T} P_{j_{k}} A (x_{k} - x_{*}) \\ = {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} - 2 ω {‖ P_{j_{k}} A (x_{k} - x_{*}) ‖}_{2}^{2} + ω^{2} {‖ P_{j_{k}} A (x_{k} - x_{*}) ‖}_{2}^{2} \\ = {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} - ω (2 - ω) {‖ P_{j_{k}} A (x_{k} - x_{*}) ‖}_{2}^{2} \end{array}$ (5)

又

$\begin{matrix} E_{k} {‖ P_{j_{k}} A (x_{k} - x_{*}) ‖}_{2}^{2} = E_{k} {‖ \frac{A_{(j_{k})} A_{(j_{k})}^{T}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} A (x_{k} - x_{*}) ‖}_{2}^{2} \\ = E_{k} {‖ \frac{A_{(j_{k})}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} | A_{(j_{k})}^{T} A (x_{k} - x_{*}) | ‖}_{2}^{2} \\ = E_{k} ({(\frac{| A_{(j_{k})}^{T} A (x_{k} - x_{*}) |}{{‖ A_{(j_{k})} ‖}_{2}^{2}} A_{(j_{k})})}^{T} (\frac{| A_{(j_{k})}^{T} A (x_{k} - x_{*}) |}{{‖ A_{(j_{k})} ‖}_{2}^{2}} A_{(j_{k})})) \\ = E_{k} (\frac{1}{{‖ A_{(j_{k})} ‖}_{2}^{2}} {| A_{(j_{k})}^{T} A (x_{k} - x_{*}) |}^{2}) \end{matrix}$

$\begin{matrix} = \sum_{j_{k} = 1}^{n} \frac{{| {\tilde{s}}_{k}^{(j_{k})} |}^{2}}{\sum_{j \in ν_{k}} {| {\tilde{s}}_{k}^{(j)} |}^{2}} (\frac{1}{{‖ A_{(j_{k})} ‖}_{2}^{2}} {| A_{(j_{k})}^{T} A (x_{k} - x_{*}) |}^{2}) \\ = \sum_{j_{k} \in ν_{k}} \frac{{| s_{k}^{(j_{k})} |}^{2}}{\sum_{j \in ν_{k}} {| s_{k}^{(j)} |}^{2}} (\frac{1}{{‖ A_{(j_{k})} ‖}_{2}^{2}} {| A_{(j_{k})}^{T} A (x_{k} - x_{*}) |}^{2}) \\ = \sum_{j_{k} \in ν_{k}} \frac{{| s_{k}^{(j_{k})} |}^{2}}{\sum_{j \in ν_{k}} {| s_{k}^{(j)} |}^{2}} \frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} \end{matrix}$

由 $ν_{k}$ 的定义可知，若 $j_{k} \in ν_{k}$ ，则 ${| s_{k}^{(j_{k})} |}^{2} \geq δ_{k} {‖ s_{k} ‖}_{2}^{2} {‖ A_{(j_{k})} ‖}_{2}^{2}$ ，即 $\frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} \geq δ_{k} {‖ s_{k} ‖}_{2}^{2}$

因此 $\sum_{j_{k} \in ν_{k}} \frac{{| s_{k}^{(j_{k})} |}^{2}}{\sum_{j \in ν_{k}} {| s_{k}^{(j)} |}^{2}} \frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} \geq \sum_{j_{k} \in ν_{k}} \frac{{| s_{k}^{(j_{k})} |}^{2}}{\sum_{j \in ν_{k}} {| s_{k}^{(j)} |}^{2}} δ_{k} {‖ s_{k} ‖}_{2}^{2} = δ_{k} {‖ s_{k} ‖}_{2}^{2}$

即 $E_{k} {‖ P_{j_{k}} A (x_{k} - x_{*}) ‖}_{2}^{2} \geq δ_{k} {‖ s_{k} ‖}_{2}^{2} = δ_{k} {‖ A^{T} A (x_{k} - x_{*}) ‖}_{2}^{2}$

对(5)式左右两边求期望可得

$\begin{array}{l} E_{k} {‖ A (x_{k + 1} - x_{*}) ‖}_{2}^{2} \\ = {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} - ω (2 - ω) E_{k} {‖ P_{j_{k}} A (x_{k} - x_{*}) ‖}_{2}^{2} \\ \leq {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} - ω (2 - ω) δ_{k} {‖ A^{T} A (x_{k} - x_{*}) ‖}_{2}^{2} \\ \leq (1 - ω (2 - ω) δ_{k} λ_{\min} (A^{T} A)) {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} \end{array}$ (6)

上式最后一步是由下面的不等式得到

${‖ A^{T} u ‖}_{2}^{2} \geq λ_{\min} (A^{T} A) {‖ u ‖}_{2}^{2}, u \in R (A^{T})$

又由于

$\begin{matrix} δ_{k} {‖ A ‖}_{F}^{2} = \frac{1}{2} (\frac{1}{{‖ A^{T} r_{k} ‖}_{2}^{2}} \max_{1 \leq j_{k} \leq n} \frac{{| A_{(j_{k})}^{T} r_{k} |}^{2}}{{‖ A_{(j_{k})} ‖}_{2}^{2}} + \frac{1}{{‖ A ‖}_{F}^{2}}) {‖ A ‖}_{F}^{2} \\ = \frac{\max_{1 \leq j_{k} \leq n} \frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A_{(j_{k})} ‖}_{2}^{2}}}{2 \sum_{j_{k} = 1}^{n} \frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A ‖}_{F}^{2}}} + \frac{1}{2} = \frac{\max_{1 \leq j_{k} \leq n} \frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A_{(j_{k})} ‖}_{2}^{2}}}{2 \sum_{j_{k} = 1}^{n} \frac{{‖ A_{(j_{k})} ‖}_{2}^{2}}{{‖ A ‖}_{F}^{2}} \frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A_{(j_{k})} ‖}_{2}^{2}}} + \frac{1}{2} \\ \geq \frac{\max_{1 \leq j_{k} \leq n} \frac{{| s_{k}^{(j_{k})} |}^{2}}{{‖ A_{(j)} ‖}_{2}^{2}}}{2 \sum_{j_{k} = 1}^{n} \frac{{‖ A_{(j_{k})} ‖}_{2}^{2}}{{‖ A ‖}_{F}^{2}} \max_{1 \leq i \leq n} \frac{{| s_{k}^{(i)} |}^{2}}{{‖ A_{(i)} ‖}_{2}^{2}}} + \frac{1}{2} = \frac{1}{2} + \frac{1}{2} = 1 \end{matrix}$

因此 $δ_{k} \geq \frac{1}{{‖ A ‖}_{F}^{2}}$ ， $k = 0, 1, \dots$ ，

对(6)式进一步估计，可得到

$\begin{matrix} E_{k} {‖ A (x_{k + 1} - x_{*}) ‖}_{2}^{2} \leq (1 - ω (2 - ω) δ_{k} λ_{\min} (A^{T} A)) {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} \\ \leq (1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}}) {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} \end{matrix}$

因此(3)式得证。

对上式两边求全期望可得

$E {‖ A (x_{k + 1} - x_{*}) ‖}_{2}^{2} \leq (1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}}) E {‖ A (x_{k} - x_{*}) ‖}_{2}^{2}$

又由上式可得

$\begin{matrix} E {‖ A (x_{k} - x_{*}) ‖}_{2}^{2} \leq (1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}}) E {‖ A (x_{k - 1} - x_{*}) ‖}_{2}^{2} \\ \leq {(1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}})}^{2} E {‖ A (x_{k - 2} - x_{*}) ‖}_{2}^{2} \\ \leq \dots \leq {(1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}})}^{k} E {‖ A (x_{0} - x_{*}) ‖}_{2}^{2} \end{matrix}$

即 $E {‖ x_{k} - x_{*} ‖}_{A^{T} A}^{2} \leq {(1 - ω (2 - ω) \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}})}^{k} E {‖ x_{0} - x_{*} ‖}_{A^{T} A}^{2}$

最后(4)式得证。

4. 数值实验

在本节中，我们通过数值实验来验证GRCD(ω)方法的有效性，所有的数值结果均是使用软件MATLAB(版本R2022b)完成的，所使用的计算机搭载Windows10操作系统和16 G内存以及3.00 GHz中央处理器(Intel I CoITM) i5-12500 CPU。

用GRCD方法和GRCD(ω)方法求解线性最小二乘系统(1)的算法算例是由Matlab函数randn服从标准正态分布n(0, 1)随机生成的矩阵，以及从文献中选取的5个具有应用背景的稀疏矩阵[20]。分别是Worldcities、divorce、cage5、Cities、abtaha1。

GRCD(ω)和GRCD方法的迭代步骤表示为IT，运行时间以秒为单位表示为CPU。表格中IT和CPU是重复运行50次所需迭代步骤数和CPU时间的中位数，为了直观地展示GRCD(ω)方法的优势，我们将GRCD(ω)方法相对于GRCD方法的加速定义为： $Speed-up = \frac{CPU of GRCD}{CPU of GRCD (ω)}$ ，以此表示GRCD相对于GRCD(ω)在计算时间这一指标上的加速程度，其数值越大表明GRCD(ω)对同一测试矩阵加速得越快。对于选取的稀疏矩阵，表7也给出了测试矩阵的一些基本信息，例如矩阵的条件数cond(A)，稠密性 $density = \frac{m \times n 矩阵 A 中非零元素的个数}{m \times n}$ 。在数值实验中，解向量 $x_{*}$ 是利用MATLAB函数randn(n, 1)随机生成的。当线性系统相容时，令 $b = A x_{*}$ 。当线性系统不相容时取 $b = A x_{*} - r_{0}$ ，其中 $r_{0}$ 是属于 $A^{T}$ 的零空间中的非零向量，null ( $A^{T}$ )是利用MATLAB函数null生成的。在迭代运算中选取的初始向量为 $x_{0} = 0$ ，当相对误差 $RSE = \frac{{‖ x_{k} - x_{*} ‖}_{2}^{2}}{{‖ x_{*} ‖}_{2}^{2}} \leq 10^{- 6}$ 或者迭代步数超过10万步迭代停止。

对于随机生成的矩阵，我们将线性系统相容时GRCD和GRCD(ω)方法的迭代步数和计算时间列于表1~3，线性系统不相容时列于表4~6。从这些表中，线性系统是相容的还是不相容的，GRCD(ω)方法在迭代步数和CPU时间方面几乎是优于GRCD方法。我们观察到，当线性系统相容时，加速最大达到1.28；当线性系统不相容时，加速最大为1.26。

对于Davis和Hu的稀疏全秩矩阵，在表7和表8中分别列出了线性系统相容时和线性系统不相容时GRCD和GRCD(ω)方法的迭代步数和计算时间。数值实验结果表明GRCD(ω)方法在迭代步数和CPU时间方面都明显优于GRCD方法。在表7中，加速至少为2.55，最大达到5.34；在表8中，加速度至少为1.98，最大达到3.91。

Table 1. The matrix A is randomly generated, when n = 50, m is different, the numerical result of the compatible system is solved

表1. 随机生成矩阵A当n = 50，m不同时求解相容系统的数值结果

$m \times n$		1000 × 50	2000 × 50	3000 × 50	4000 × 50	5000 × 50
ω		1.04	1.03	1.01	1.01	1.01
GRCD	IT CPU	130.5 0.002578	114 0.002676	100 0.002805	98 0.003731	103 0.005030
GRCD(ω)	IT CPU	120 0.002013	108 0.002365	99 0.002564	97 0.003694	100 0.004784
Speed-up		1.28	1.13	1.09	1.03	1.05

Table 2. The matrix A is randomly generated, when n = 100, m is different, the numerical result of the compatible system is solved

表2. 随机生成矩阵A当n = 100，m不同时求解相容系统的数值结果

$m \times n$		1000 × 100	2000 × 100	3000 × 100	4000 × 100	5000 × 100
ω		1.09	1.03	1.03	1.01	1.02
GRCD	IT CPU	348.5 0.006807	246 0.006185	239 0.008076	213 0.010942	216 0.015263
GRCD(ω)	IT CPU	277.5 0.005704	236 0.005671	226 0.007313	210 0.010639	210 0.014583
Speed-up		1.20	1.09	1.10	1.03	1.05

Table 3. The matrix A is randomly generated, and when n = 150, m is different, the numerical result of the compatible system is solved

表3. 随机生成矩阵A当n = 150，m不同时求解相容系统的数值结果

$m \times n$		1000 × 150	2000 × 150	3000 × 150	4000 × 150	5000 × 150
ω		1.15	1.07	1.04	1.03	1.02
GRCD	IT CPU	612.5 0.013390	423 0.013721	371 0.018007	365.5 0.029131	338 0.032732
GRCD(ω)	IT CPU	476 0.012004	389 0.012360	354 0.015663	353 0.027459	325 0.031827
Speed-up		1.12	1.11	1.15	1.06	1.03

Table 4. The matrix A is randomly generated, and when n = 50, m is different, the numerical result of the incompatible system is solved

表4. 随机生成矩阵A当n = 50，m不同时求解不相容系统的数值结果

$m \times n$		1000 × 50	2000 × 50	3000 × 50	4000 × 50	5000 × 50
ω		1.03	1.01	1.01	1.01	1.02
GRCD	IT CPU	122 0.001985	106 0.002142	105 0.002708	99 0.003863	100 0.004740
GRCD(ω)	IT CPU	118 0.001877	105 0.002094	103 0.002662	97 0.003615	96 0.004498
Speed-up		1.06	1.02	1.02	1.07	1.05

Table 5. The matrix A is randomly generated, and when n = 100, m is different, the numerical result of the incompatible system is solved

表5. 随机生成矩阵A当n = 100，m不同时求解不相容系统的数值结果

$m \times n$		1000 × 100	2000 × 100	3000 × 100	4000 × 100	5000 × 100
ω		1.05	1.03	1.03	1.01	1.01
GRCD	IT CPU	325 0.005651	268 0.006633	242 0.008443	205 0.010361	210 0.014445
GRCD(ω)	IT CPU	283 0.005175	241 0.006070	226 0.007721	201 0.010205	207 0.013776
Speed-up		1.09	1.09	1.09	1.02	1.05

Table 6. The matrix A is randomly generated, and when n = 150, m is different, the numerical result of the incompatible system is solved

表6. 随机生成矩阵 × 当n = 150，m不同时求解不相容系统的数值结果

$m \times n$		1000 × 150	2000 × 150	3000 × 150	4000 × 150	5000 × 150
ω		1.1	1.08	1.04	1.03	1.02
GRCD	IT CPU	602 0.012388	457 0.013413	386 0.016221	358 0.024975	333 0.322180
GRCD(ω)	IT CPU	475 0.009764	400 0.011613	353 0.015425	340 0.023833	322 0.030974
Speed-up		1.26	1.16	1.05	1.05	1.04

Table 7. The GRCD(ω) and GRCD methods solve the numerical results of the compatible system in the specific example matrix

表7. GRCD(ω)和GRCD两种方法在具体算例矩阵时求解相容系统的数值结果

名称		Worldcities	divorce	cage5	Cities	abtaha1
$m \times n$		315 × 100	50 × 99	37 × 37	55 × 46	14596 × 209
density		23.87%	50.00%	17.02%	50.34%	1.68%
Cond(A)		66.00	19.39	15.42	207.15	12.23
ω		1.8	1.8	1.6	1.8	1.7
GRCD	IT CPU	3834.5 0.067084	594 0.005716	2235 0.022035	60,142 0.669865	10,342 0.554940
GRCD(ω)	IT CPU	1185.5 0.020597	162.5 0.001580	760 0.007437	11,260 0.123660	4063.5 0.554940
Speed-up		3.26	3.66	2.94	5.34	2.55

Table 8. The GRCD(ω) and GRCD methods solve the numerical results of the incompatible system in the specific example matrix

表8. GRCD(ω)和GRCD两种方法在具体算例矩阵时求解不相容系统的数值结果

名称		Worldcities	divorce	Cities	abtaha1
$m \times n$		315 × 100	50 × 99	55 × 46	14,596 × 209
density		23.87%	50.00%	50.34%	1.68%
Cond(A)		66.00	19.39	207.15	12.23
ω		1.8	1.8	1.8	1.7
GRCD	IT CPU	4568 0.077293	601 0.005446	46,131 0.495157	13816.5 0.710776
GRCD(ω)	IT CPU	1267 0.021641	173.5 0.002749	11,973 0.130001	4283 0.228498
Speed-up		3.57	1.98	3.91	3.11

5. 总结

本文在GRCD算法的基础上提出了GRCD(ω)算法，分析了算法的收敛性。数值实验结果表明，在选择适当的参数ω后，GRCD(ω)算法在迭代步数和计算时间方面明显优于GRCD算法。

参考文献

[1]	Duan, L.-X. and Zhang, G.-F. (2021) Variant of Greedy Randomized Gauss-Seidel Method for Ridge Regression. Numerical Mathematics: Theory, Methods and Applications, 14, 714-737. https://doi.org/10.4208/nmtma.oa-2020-0095
[2]	Hefny, A., Needell, D. and Ramdas, A. (2017) Rows versus Columns: Randomized Kaczmarz or Gauss-Seidel for Ridge Regression. SIAM Journal on Scientific Computing, 39, S528-S542. https://doi.org/10.1137/16m1077891
[3]	Liu, Y. and Gu, C. (2019) Variant of Greedy Randomized Kaczmarz for Ridge Regression. Applied Numerical Mathematics, 143, 223-246. https://doi.org/10.1016/j.apnum.2019.04.008
[4]	Byrne, C. (2003) A Unified Treatment of Some Iterative Algorithms in Signal Processing and Image Reconstruction. Inverse Problems, 20, 103-120. https://doi.org/10.1088/0266-5611/20/1/006
[5]	Bouman, C.A. and Sauer, K. (1996) A Unified Approach to Statistical Tomography Using Coordinate Descent Optimization. IEEE Transactions on Image Processing, 5, 480-492. https://doi.org/10.1109/83.491321
[6]	Ye, J.C., Webb, K.J., Bouman, C.A. and Millane, R.P. (1999) Optical Diffusion Tomography by Iterative-Coordinate-Descent Optimization in a Bayesian Framework. Journal of the Optical Society of America A, 16, 2400-2412. https://doi.org/10.1364/josaa.16.002400
[7]	Chang, K.W., Hsieh, C.J. and Lin, C.J. (2008) Coordinate Descent Method for Large-Scale L2-Loss Linear Support Vector Machines. Journal of Machine Learning Research, 9, 1369-1398.
[8]	Breheny, P. and Huang, J. (2011) Coordinate Descent Algorithms for Nonconvex Penalized Regression, with Applications to Biological Feature Selection. The Annals of Applied Statistics, 5, 232-253. https://doi.org/10.1214/10-aoas388
[9]	Canutescu, A.A. and Dunbrack, R.L. (2003) Cyclic Coordinate Descent: A Robotics Algorithm for Protein Loop Closure. Protein Science, 12, 963-972. https://doi.org/10.1110/ps.0242703
[10]	Elad, M., Matalon, B. and Zibulevsky, M. (2007) Coordinate and Subspace Optimization Methods for Linear Least Squares with Non-Quadratic Regularization. Applied and Computational Harmonic Analysis, 23, 346-367. https://doi.org/10.1016/j.acha.2007.02.002
[11]	Scott, J.A. and Tůma, M. (2019) Sparse Stretching for Solving Sparse-Dense Linear Least-Squares Problems. SIAM Journal on Scientific Computing, 41, A1604-A1625. https://doi.org/10.1137/18m1181353
[12]	Scott, J. and Tůma, M. (2022) Solving Large Linear Least Squares Problems with Linear Equality Constraints. BIT Numerical Mathematics, 62, 1765-1787. https://doi.org/10.1007/s10543-022-00930-2
[13]	Ruhe, A. (1983) Numerical Aspects of Gram-Schmidt Orthogonalization of Vectors. Linear Algebra and Its Applications, 52, 591-601. https://doi.org/10.1016/0024-3795(83)80037-8
[14]	Strohmer, T. and Vershynin, R. (2008) A Randomized Kaczmarz Algorithm with Exponential Convergence. Journal of Fourier Analysis and Applications, 15, 262-278. https://doi.org/10.1007/s00041-008-9030-4
[15]	Leventhal, D. and Lewis, A.S. (2010) Randomized Methods for Linear Constraints: Convergence Rates and Conditioning. Mathematics of Operations Research, 35, 641-654. https://doi.org/10.1287/moor.1100.0456
[16]	Gower, R.M. and Richtárik, P. (2015) Randomized Iterative Methods for Linear Systems. SIAM Journal on Matrix Analysis and Applications, 36, 1660-1690. https://doi.org/10.1137/15m1025487
[17]	Liu, Y., Jiang, X. and Gu, C. (2021) On Maximum Residual Block and Two-Step Gauss-Seidel Algorithms for Linear Least-Squares Problems. Calcolo, 58, Article No. 13. https://doi.org/10.1007/s10092-021-00404-x
[18]	Du, K. and Sun, X. (2021) A Doubly Stochastic Block Gauss-Seidel Algorithm for Solving Linear Equations. Applied Mathematics and Computation, 408, Article 126373. https://doi.org/10.1016/j.amc.2021.126373
[19]	Bai, Z. and Wu, W. (2019) On Greedy Randomized Coordinate Descent Methods for Solving Large Linear Least-Squares Problems. Numerical Linear Algebra with Applications, 26, e2237. https://doi.org/10.1002/nla.2237
[20]	Davis, T.A. and Hu, Y. (2011) The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software, 38, Article No. 1. https://doi.org/10.1145/2049662.2049663

为你推荐

友情链接