# Tutorial
`speedmapping(x0; kwargs...)` solves three types of problems:
- Accelerating convergent mapping iterations
- Solving non-linear systems of equations
- Minimizing a function, possibly with box constraints
using two algorithms:
- Alternating cyclic extrapolations (ACX) (Lepage-Saucier, 2024)
- Anderson Acceleration (AA) (Anderson, 1965)
This tutorial demonstrates the main functionality on simple problems. To see which specification may perform best for your problem, the Benchmarks section compares all of them, along with other Julia packages offering similar functionality.
## Accelerating convergent mapping iterations
Let's find the dominant eigenvalue of a matrix $A$ using accelerated power iteration.
```julia
using LinearAlgebra
n = 10;
A = ones(n, n) .+ Diagonal(1:n);

# An in-place mapping function to avoid allocations
function power_iteration!(xout, xin, A)
    mul!(xout, A, xin)
    maxabs = 0.
    for xi in xout
        abs(xi) > maxabs && (maxabs = abs(xi))
    end
    xout ./= maxabs
end;

x0 = ones(n);
```
`speedmapping` has one mandatory argument: the starting point `x0`. The mapping is specified with the keyword argument `m!`.
```julia
using SpeedMapping
res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A));
display(res)
```
```
• minimizer: 10-element Vector{Float64}:
  0.4121491397959738
  0.4409506028458422
  0.47407986435268584
  0.5125916155854076
  0.5579135744375137
  0.6120273732149385
  0.6777660401920018
  0.7593262778315447
  0.8632012008768285
  1.0
Result of SpeedMapping, acx algo
• maps: 16 • iterations: 6 • residual_norm: 3.664942616680641e-9
• status: first_order
```
The dominant eigenvalue is:
```julia
v = res.minimizer; # The dominant eigenvector
dominant_eigenvalue = (v' * A * v) / (v' * v);
eigen(A).values[10] ≈ dominant_eigenvalue
```
```
true
```
With `m!`, the default algorithm is `algo = :acx`. To switch, set `algo = :aa`.
```julia
res_aa = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa);
display(res_aa)
```
```
• minimizer: 10-element Vector{Float64}:
  0.412149137089775
  0.44095060327353647
  0.47407986082156256
  0.5125916088599349
  0.5579135698434506
  0.6120273645599557
  0.6777660340915559
  0.7593262700274771
  0.8632011949395236
  1.0
Result of SpeedMapping, aa algo
• maps: 12 • iterations: 11 • residual_norm: 7.238700125150564e-9
• status: first_order
```
By default, AA uses adaptive relaxation, which can reduce the number of iterations. It is specified by the keyword argument `ada_relax = :minimum_distance`. For constant relaxation, set `ada_relax = :none`.
```julia
res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa, ada_relax = :none);
```
Another recent development for AA is Composite AA (Chen and Vuik, 2022): a one-step AA iteration (using 2 maps) is inserted between two full AA steps, which reduces computation and can speed up some applications. The default is `composite = :none`. Two versions are available:
```julia
res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa, composite = :aa1);
res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa, composite = :acx2);
```
Some mapping iterations maximize or minimize an objective function. Since some AA steps can deteriorate the objective, it is best to avoid them by falling back to the last map. This can be done by supplying an objective function (assumed to be a minimization problem) with the keyword argument `f`. Here is an illustrative EM-algorithm example from Hasselblad (1969), estimating a mixture of two Poisson distributions.
```julia
function neg_log_likelihood(x)
    freq = (162, 267, 271, 185, 111, 61, 27, 8, 3, 1) # Counts of y = 0, 1, ..., 9
    p, μ1, μ2 = x
    yfact = μ1expy = μ2expy = 1 # Running values of y!, μ1^y and μ2^y
    log_lik = 0.
    for y in eachindex(freq)
        log_lik += freq[y] * log((p * exp(-μ1) * μ1expy + (1 - p) * exp(-μ2) * μ2expy) / yfact)
        yfact *= y
        μ1expy *= μ1
        μ2expy *= μ2
    end
    return -log_lik # Negative log-likelihood to get a minimization problem
end
```
```julia
function EM_map!(xout, xin)
    freq = (162, 267, 271, 185, 111, 61, 27, 8, 3, 1)
    p, μ1, μ2 = xin
    sum_freq_z1 = sum_freq_z2 = sum_freq_y_z1 = sum_freq_y_z2 = 0
    μ1expy = μ2expy = 1
    for i in eachindex(freq)
        # E-step: posterior probability that an observation with y = i - 1
        # belongs to the first component
        z = p * exp(-μ1) * μ1expy / (p * exp(-μ1) * μ1expy + (1 - p) * exp(-μ2) * μ2expy)
        sum_freq_z1 += freq[i] * z
        sum_freq_z2 += freq[i] * (1 - z)
        sum_freq_y_z1 += (i - 1) * freq[i] * z
        sum_freq_y_z2 += (i - 1) * freq[i] * (1 - z)
        μ1expy *= μ1
        μ2expy *= μ2
    end
    # M-step: update the mixing probability and the two component means
    xout .= (sum_freq_z1 / sum(freq), sum_freq_y_z1 / sum_freq_z1, sum_freq_y_z2 / sum_freq_z2)
end
```
```julia
res_with_objective = speedmapping([0.25, 1., 2.]; f = neg_log_likelihood, m! = EM_map!, algo = :aa);
display(res_with_objective)
```
```
• minimizer: 3-element Vector{Float64}:
  0.35988539821029
  1.2560951029223146
  2.6634043582726785
Result of SpeedMapping, aa algo
• maps: 18 • f_calls: 24 • iterations: 17 • residual_norm: 1.2074325435679254e-10
• status: first_order
```
### Simple constraints on parameters
In the previous example, the parameters being estimated, $p, μ₁, μ₂$, have restricted domains: $0 < p < 1$ is a probability (where the degenerate values $p = 0$ and $p = 1$ are hopefully avoided), and $μ₁, μ₂ > 0$ are the means of the two Poisson components. To avoid the risk that an iterate falls outside of its domain, bounds can be supplied using the keyword arguments `lower` and `upper`:
```julia
res_with_objective = speedmapping([0.25, 1., 2.]; f = neg_log_likelihood, m! = EM_map!, algo = :aa,
    lower = [0, 0, 0], upper = [1., Inf, Inf], buffer = 0.05);
```
Here, the keyword argument `buffer` ($= 0.05$ by default for mapping applications) ensures that a parameter $x_i$ stays at least at a distance $\text{buffer} × |x_i^{\text{prev}} - \text{bound}_i|$ from $\text{bound}_i$, where $x_i^{\text{prev}}$ is $x_i$'s previous value. This safeguard avoids jumping to $\text{bound}_i$ instantly (unless $\text{buffer} = 0$), in case $\text{bound}_i$ is infeasible or is a saddle point. For instance, if $x_i^{\text{prev}} = 0.2$ and $\text{lower}_i = 0$, but AA's unconstrained next iterate would be $x_i^{\text{try}} = -0.1$, then $x_i$ is set to $\max(x_i^{\text{try}},\ \text{buffer} × x_i^{\text{prev}} + (1 - \text{buffer}) × \text{lower}_i) = \max(-0.1,\ 0.05 × 0.2 + 0.95 × 0) = 0.01$.
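As a minimal sketch of this rule (the helper `apply_buffer` is illustrative, not part of the SpeedMapping API):

```julia
# Illustrative helper, not part of the SpeedMapping API: clip a candidate
# iterate so it keeps a buffer-sized distance from a lower bound
apply_buffer(x_try, x_prev, lower, buffer) = max(x_try, buffer * x_prev + (1 - buffer) * lower)

apply_buffer(-0.1, 0.2, 0., 0.05) # Returns 0.01, as in the example above
```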
### Avoiding memory allocation
For similar problems solved many times, working memory can be preallocated and supplied with the `cache` keyword argument. Each algorithm has its own cache:
```julia
acx_cache = AcxCache(x0);
aa_cache = AaCache(x0);
```
Note that `x0` must still be supplied to `speedmapping`.
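For instance, the same cache can be reused across many solves of the power-iteration problem above (a minimal sketch; in practice the problem data would change between calls):

```julia
for _ in 1:100
    # The working memory in acx_cache is reused, so repeated solves avoid reallocation
    speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), cache = acx_cache)
end
```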
For small-sized problems with ACX, heap allocation can be avoided by supplying a static array as the starting point and using the keyword argument `m` for the mapping function:
```julia
using StaticArrays
function power_iteration(xin, A)
    xout = A * xin
    maxabs = 0.
    for xi in xout
        abs(xi) > maxabs && (maxabs = abs(xi))
    end
    return xout / maxabs
end

As = @SMatrix ones(10, 10); # Dimensions written as literals
As += Diagonal(1:10);
x0s = @SVector ones(10);
res_static = speedmapping(x0s; m = x -> power_iteration(x, As));
```
### Comparing speed gains
```julia
using BenchmarkTools, Unitful
bench_eigen = @benchmark eigen($A);
bench_alloc = @benchmark speedmapping($x0; m! = (xout, xin) -> power_iteration!(xout, xin, $A));
bench_prealloc = @benchmark speedmapping($x0; m! = (xout, xin) -> power_iteration!(xout, xin, $A), cache = $acx_cache);
bench_nonalloc = @benchmark speedmapping($x0s; m = x -> power_iteration(x, $As));
times = Int.(round.(median.([bench_eigen.times, bench_alloc.times, bench_prealloc.times, bench_nonalloc.times]))) / 1000 .* u"μs";
hcat(["eigen", "Allocating", "Pre-allocated", "Non allocating"], times)
```
```
4×2 Matrix{Any}:
 "eigen"           14.647 μs
 "Allocating"       2.648 μs
 "Pre-allocated"    2.268 μs
 "Non allocating"   1.605 μs
```
### Working with immutable types
Along with `StaticArray`, `m` accepts other immutable types like `Real` and `Tuple`.
```julia
speedmapping(0.5; m = cos);
speedmapping((0.5, 0.5); m = x -> (cos(x[1]), sin(x[2])));
```
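For instance, the first iteration converges to the fixed point of $\cos$, the Dottie number (the name `res_cos` is just illustrative):

```julia
res_cos = speedmapping(0.5; m = cos);
res_cos.minimizer ≈ 0.7390851332151607 # The solution of cos(x) = x, up to the solver's tolerance
```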
## Solving non-linear systems of equations
For non-linear systems of equations (finding $x^*$ such that $G(x^*) = 0$), only AA with constant relaxation should be used (and it is set by default). The keyword argument to supply $G$ is `r!`.
```julia
function r!(resid, x)
    resid[1] = x[1]^2
    resid[2] = (x[2] + x[1])^3
end
```
```julia
res_nl = speedmapping([1., 2.]; r! = r!);
display(res_nl)
```
```
• minimizer: 2-element Vector{Float64}:
 -1.1940587873918054e-5
  0.0007185277937482261
Result of SpeedMapping, aa algo
• maps: 39 • iterations: 28 • residual_norm: 3.804974396652084e-10
• status: first_order
```
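As a quick sanity check, the residual of the system at the returned point should be near zero (reusing the `r!` defined above):

```julia
resid = zeros(2);
r!(resid, res_nl.minimizer);
maximum(abs, resid) # On the order of the reported residual norm
```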
## Minimizing a function
To minimize a function (using ACX), the function and its in-place gradient are supplied with the keyword arguments `f` and `g!`. The Hessian cannot be supplied. Compared to other quasi-Newton algorithms like L-BFGS, ACX iterations are very fast, but the algorithm may struggle on ill-conditioned problems.
```julia
f_Rosenbrock(x) = 100 * (x[1]^2 - x[2])^2 + (x[1] - 1.)^2;

function g_Rosenbrock!(grad, x) # Rosenbrock gradient
    grad[1] = 400 * (x[1]^2 - x[2]) * x[1] + 2 * (x[1] - 1)
    grad[2] = -200 * (x[1]^2 - x[2])
end
```
```julia
res_optim = speedmapping([-1.2, 1.]; f = f_Rosenbrock, g! = g_Rosenbrock!);
display(res_optim)
```
```
• minimizer: 2-element Vector{Float64}:
 0.9999999999999285
 0.9999999999998567
Result of SpeedMapping, acx algo
• maps: 88 • f_calls: 5 • iterations: 33 • residual_norm: 6.732603331104633e-14
• status: first_order
```
The objective function is only used to compute a safer initial learning rate; it can be omitted.
```julia
speedmapping([-1.2, 1.]; g! = g_Rosenbrock!);
```
If only the objective is supplied, the gradient is computed using ForwardDiff.
```julia
speedmapping([-1.2, 1.]; f = f_Rosenbrock);
```
The keyword argument `g` can be used with static arrays or tuples to supply a non-allocating gradient.
```julia
using StaticArrays
g_Rosenbrock(x :: StaticArray) = SA[400 * (x[1]^2 - x[2]) * x[1] + 2 * (x[1] - 1), -200 * (x[1]^2 - x[2])];
speedmapping(SA[-1.2, 1.]; g = g_Rosenbrock);

g_Rosenbrock(x :: Tuple) = (400 * (x[1]^2 - x[2]) * x[1] + 2 * (x[1] - 1), -200 * (x[1]^2 - x[2]));
speedmapping((-1.2, 1.); g = g_Rosenbrock);
```
Scalar functions can also be supplied, e.g. $f(x) = e^x + x^2$:
```julia
res_scalar = speedmapping(0.; f = x -> exp(x) + x^2, g = x -> exp(x) + 2x);
display(res_scalar)
```
```
• minimizer: -0.3517337140823905
Result of SpeedMapping, acx algo
• maps: 8 • f_calls: 5 • iterations: 3 • residual_norm: 7.659449519081818e-9
• status: first_order
```
### Adding box constraints
An advantage of ACX is that constraints on parameters have little impact on estimation speed. They are added via the keyword arguments `lower` and `upper` (`nothing` by default). The starting point does not need to be in the feasible domain.
```julia
speedmapping([-1.2, 1.]; f = f_Rosenbrock, g! = g_Rosenbrock!, lower = [2., -Inf]);
speedmapping(0.; g = x -> exp(x) + 2x, upper = -1);
```
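For the first call, the constrained minimizer can be derived analytically: with $x_1 ≥ 2$, choosing $x_2 = x_1^2$ zeroes the first term of the objective, and $(x_1 - 1)^2$ is then increasing in $x_1$, so the solution is $(2, 4)$. A quick check (the name `res_box` is illustrative):

```julia
res_box = speedmapping([-1.2, 1.]; f = f_Rosenbrock, g! = g_Rosenbrock!, lower = [2., -Inf]);
res_box.minimizer # Expected ≈ [2.0, 4.0]
```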