Tutorial

speedmapping(x0; kwargs...) solves three types of problems:

  1. Accelerating convergent mapping iterations
  2. Solving non-linear systems of equations
  3. Minimizing a function, possibly with box constraints

using two algorithms:

  1. ACX (alternating cyclic extrapolation), selected with algo = :acx
  2. AA (Anderson acceleration), selected with algo = :aa

This tutorial demonstrates the main functionality on simple problems. To see which specification may be most performant for your problem, the Benchmarks section compares all of them, along with other Julia packages offering similar functionality.

Accelerating convergent mapping iterations

Let's find the dominant eigenvalue of a matrix $A$ using an accelerated power iteration.

using LinearAlgebra

n = 10;
A = ones(n,n) .+ Diagonal(1:n);

# An in-place mapping function to avoid allocations
function power_iteration!(xout, xin, A)
    mul!(xout, A, xin)
    maxabs = 0.
    for xi in xout
        abs(xi) > maxabs && (maxabs = abs(xi))
    end
    xout ./= maxabs
end;
x0 = ones(n);

speedmapping has one mandatory argument: the starting point x0. The mapping is specified with the keyword argument m!.

using SpeedMapping
res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A));
display(res)
• minimizer: 10-element Vector{Float64}:
 0.4121491397959738
 0.4409506028458422
 0.47407986435268584
 0.5125916155854076
 0.5579135744375137
 0.6120273732149385
 0.6777660401920018
 0.7593262778315447
 0.8632012008768285
 1.0

Result of SpeedMapping, acx algo
• maps: 16  • iterations: 6  • residual_norm: = 3.664942616680641e-9
• status: first_order

The dominant eigenvalue is:

v = res.minimizer; # The dominant eigenvector
dominant_eigenvalue = v'A*v/v'v;
eigen(A).values[10] ≈ dominant_eigenvalue
true

With m!, the default algorithm is algo = :acx. To switch, set algo = :aa.

res_aa = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa);
display(res_aa)
• minimizer: 10-element Vector{Float64}:
 0.412149137089775
 0.44095060327353647
 0.47407986082156256
 0.5125916088599349
 0.5579135698434506
 0.6120273645599557
 0.6777660340915559
 0.7593262700274771
 0.8632011949395236
 1.0

Result of SpeedMapping, aa algo
• maps: 12  • iterations: 11  • residual_norm: = 7.238700125150564e-9
• status: first_order

By default, AA uses adaptive relaxation (ada_relax = :minimum_distance), which can reduce the number of iterations. For constant relaxation, set ada_relax = :none.

res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa, ada_relax = :none);

Another recent development for AA is Composite AA (Chen and Vuik, 2022): a one-step AA iteration (using 2 maps) is inserted between 2 full AA steps, which reduces computation and can speed up some applications. The default is composite = :none. Two versions are available:

res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa, composite = :aa1);
res = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), algo = :aa, composite = :acx2);

Some mapping iterations maximize or minimize a certain objective function. Since some AA steps can deteriorate the objective, it is best to avoid them by falling back to the last map. This can be done by supplying the objective (assumed to be a minimization problem) with the keyword argument f. Here is an illustrative EM-algorithm example from Hasselblad (1969), estimating a mixture of two Poisson distributions.

function neg_log_likelihood(x)
    freq = (162, 267, 271, 185, 111, 61, 27, 8, 3, 1)
    p, μ1, μ2 = x
    yfact = μ1expy = μ2expy = 1
    log_lik = 0
    for y in eachindex(freq)
        log_lik += freq[y] * log((p * exp(-μ1) * μ1expy + (1 - p) * exp(-μ2) * μ2expy) / yfact)
        yfact *= y
        μ1expy *= μ1
        μ2expy *= μ2
    end
    return -log_lik # Negative log likelihood to get a minimization problem
end

function EM_map!(xout, xin)
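    # One EM iteration. E-step: z is the posterior probability that a count (i - 1) comes
    # from the first Poisson component. M-step: p, μ1 and μ2 are updated from the z-weighted sums.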
    freq = (162, 267, 271, 185, 111, 61, 27, 8, 3, 1)
    p, μ1, μ2 = xin
    sum_freq_z1 = sum_freq_z2 = sum_freq_y_z1 = sum_freq_y_z2 = 0
    μ1expy = μ2expy = 1
    for i in eachindex(freq)
        z = p * exp(-μ1) * μ1expy / (p * exp(-μ1) * μ1expy + (1 - p) * exp(-μ2) * μ2expy)
        sum_freq_z1   += freq[i] * z
        sum_freq_z2   += freq[i] * (1 - z)
        sum_freq_y_z1 += (i - 1) * freq[i] * z
        sum_freq_y_z2 += (i - 1) * freq[i] * (1 - z)
        μ1expy *= μ1
        μ2expy *= μ2
    end
    xout .= (sum_freq_z1 / sum(freq), sum_freq_y_z1 / sum_freq_z1, sum_freq_y_z2 / sum_freq_z2)
end

res_with_objective = speedmapping([0.25, 1., 2.]; f = neg_log_likelihood, m! = EM_map!, algo = :aa);
display(res_with_objective)
• minimizer: 3-element Vector{Float64}:
 0.35988539821029
 1.2560951029223146
 2.6634043582726785

Result of SpeedMapping, aa algo
• maps: 18  • f_calls: 24  • iterations: 17  • residual_norm: = 1.2074325435679254e-10
• status: first_order

Simple constraints on parameters

In the previous example, the parameters being estimated, $p, μ₁, μ₂$, have restricted domains: $0 < p < 1$ is a probability (where the degenerate values $p = 0$ and $p = 1$ are hopefully avoided), and $μ₁, μ₂ > 0$ are the means of the two Poisson distributions. To avoid the risk that an iterate falls outside its domain, bounds can be supplied using the keyword arguments lower and upper:

res_with_objective = speedmapping([0.25, 1., 2.]; f = neg_log_likelihood, m! = EM_map!, algo = :aa,
    lower = [0, 0, 0], upper = [1., Inf, Inf], buffer = 0.05);

Here, the keyword argument buffer (= 0.05 by default for mapping applications) ensures that a parameter xᵢ stays at a distance of at least buffer × |xᵢprev - boundᵢ| from boundᵢ, where xᵢprev is xᵢ's previous value. This safeguard avoids jumping to boundᵢ instantly (unless buffer = 0), in case boundᵢ is infeasible or is a saddle point. For instance, if xᵢprev = 0.2 and lowerᵢ = 0, but AA's unconstrained next iterate would be xᵢtry = -0.1, then xᵢ is set to max(xᵢtry, buffer × xᵢprev + (1 - buffer) × lowerᵢ) = max(-0.1, 0.05 × 0.2 + 0.95 × 0) = 0.01.
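
A minimal sketch of this rule in plain Julia (the variable names below are illustrative, not the package's internals):

buffer, x_prev, lower_i = 0.05, 0.2, 0.0;
x_try = -0.1;  # AA's unconstrained proposal for xᵢ
x_new = max(x_try, buffer * x_prev + (1 - buffer) * lower_i)  # 0.01, as in the calculation above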

Avoiding memory allocation

For similar problems solved many times, it is possible to preallocate working memory and feed it using the cache keyword argument. Each algorithm has its own cache:

acx_cache = AcxCache(x0);
aa_cache = AaCache(x0);

Note that x0 must still be supplied to speedmapping.
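
For instance, the ACX cache can be reused when re-solving the power iteration from above:

res_cached = speedmapping(x0; m! = (xout, xin) -> power_iteration!(xout, xin, A), cache = acx_cache);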

For small-sized problems with ACX, heap allocation can be avoided by supplying a static array as starting point and using the keyword argument m for the mapping function:

using StaticArrays

function power_iteration(xin, A)
    xout = A * xin
    maxabs = 0.
    for xi in xout
        abs(xi) > maxabs && (maxabs = abs(xi))
    end
    return xout / maxabs;
end

As = @SMatrix ones(n,n);
As += Diagonal(1:n);
x0s = @SVector ones(n);

res_static = speedmapping(x0s; m = x -> power_iteration(x, As));

Comparing speed gains

using BenchmarkTools, Unitful

bench_eigen = @benchmark eigen($A);
bench_alloc = @benchmark speedmapping($x0; m! = (xout, xin) -> power_iteration!(xout, xin, $A));
bench_prealloc = @benchmark speedmapping($x0; m! = (xout, xin) -> power_iteration!(xout, xin, $A), cache = $acx_cache);
bench_nonalloc = @benchmark speedmapping($x0s; m = x -> power_iteration(x, $As));
times = Int.(round.(median.([bench_eigen.times, bench_alloc.times, bench_prealloc.times, bench_nonalloc.times])))/1000 .* u"μs";
hcat(["eigen", "Allocating", "Pre-allocated", "Non allocating"], times)
4×2 Matrix{Any}:
 "eigen"           14.647 μs
 "Allocating"       2.648 μs
 "Pre-allocated"    2.268 μs
 "Non allocating"   1.605 μs

Working with immutable types

Along with StaticArray, m accepts other immutable types like Real and Tuple.

speedmapping(0.5; m = cos);
speedmapping((0.5, 0.5); m = x -> (cos(x[1]), sin(x[2])));

Solving non-linear systems of equations

For non-linear systems of equations (finding $x^*$ such that $G(x^*) = 0$), only AA with constant relaxation should be used (and is set by default). The keyword argument to supply $G$ is r!.

function r!(resid, x)
	resid[1] = x[1]^2;
	resid[2] = (x[2] + x[1])^3;
end

res_nl = speedmapping([1.,2.]; r! = r!);
display(res_nl)
• minimizer: 2-element Vector{Float64}:
 -1.1940587873918054e-5
  0.0007185277937482261

Result of SpeedMapping, aa algo
• maps: 39  • iterations: 28  • residual_norm: = 3.804974396652084e-10
• status: first_order

Minimizing a function

To minimize a function (using ACX), the function and its in-place gradient are supplied with the keyword arguments f and g!. The Hessian cannot be supplied. Compared to quasi-Newton algorithms like L-BFGS, ACX iterations are very fast, but the algorithm may struggle on ill-conditioned problems.

f_Rosenbrock(x) = 100 * (x[1]^2 - x[2])^2 + (x[1] - 1.)^2;

function g_Rosenbrock!(grad, x) # Rosenbrock gradient
	grad[1] = 400 * (x[1]^2 - x[2]) * x[1] + 2 * (x[1] - 1);
	grad[2] = -200 * (x[1]^2 - x[2]);
end

res_optim = speedmapping([-1.2, 1.]; f = f_Rosenbrock, g! = g_Rosenbrock!);
display(res_optim)
• minimizer: 2-element Vector{Float64}:
 0.9999999999999285
 0.9999999999998567

Result of SpeedMapping, acx algo
• maps: 88  • f_calls: 5  • iterations: 33  • residual_norm: = 6.732603331104633e-14
• status: first_order

The objective function is only used to compute a safer initial learning rate; it can be omitted.

speedmapping([-1.2, 1.]; g! = g_Rosenbrock!);

If only the objective is supplied, the gradient is computed using ForwardDiff.

speedmapping([-1.2, 1.]; f = f_Rosenbrock);

The keyword argument g can be used with static arrays or tuples to supply a non-allocating gradient.

using StaticArrays
g_Rosenbrock(x :: StaticArray) = SA[400 * (x[1]^2 - x[2]) * x[1] + 2 * (x[1] - 1), -200 * (x[1]^2 - x[2])];
speedmapping(SA[-1.2, 1.]; g = g_Rosenbrock);

g_Rosenbrock(x :: Tuple) = (400 * (x[1]^2 - x[2]) * x[1] + 2 * (x[1] - 1), -200 * (x[1]^2 - x[2]));
speedmapping((-1.2, 1.); g = g_Rosenbrock);

Scalar functions can also be supplied, e.g. $f(x) = e^x + x^2$:

res_scalar = speedmapping(0.; f = x -> exp(x) + x^2, g = x -> exp(x) + 2x);
display(res_scalar)
• minimizer: -0.3517337140823905

Result of SpeedMapping, acx algo
• maps: 8  • f_calls: 5  • iterations: 3  • residual_norm: = 7.659449519081818e-9
• status: first_order

Adding box constraints

An advantage of ACX is that constraints on parameters have little impact on estimation speed. They are added via the keyword arguments lower and upper (= nothing by default). The starting point does not need to be in the feasible domain.

speedmapping([-1.2, 1.]; f = f_Rosenbrock, g! = g_Rosenbrock!, lower = [2., -Inf]);
speedmapping(0.; g = x -> exp(x) + 2x, upper = -1);
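
For instance, assigning the first call above to an illustrative name res_box, the solution can be checked against its lower bound:

res_box = speedmapping([-1.2, 1.]; f = f_Rosenbrock, g! = g_Rosenbrock!, lower = [2., -Inf]);
all(res_box.minimizer .>= [2., -Inf])  # expected to be true: the returned minimizer respects the bounds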