Solving Algebraic Equations
Stan provides a built-in mechanism for specifying systems of algebraic equations. These systems can be solved either with the Newton method, as implemented in the Kinsol package (Hindmarsh et al. 2005), or with the Powell hybrid method (Powell 1970). The function signatures for Stan’s algebraic solvers are fully described in the algebraic solver section of the reference manual.
Solving any system of algebraic equations can be translated into a root-finding problem, that is, given a function \(f\), we wish to find \(y\) such that \(f(y) = 0\).
Example: system of nonlinear algebraic equations
For systems of linear algebraic equations, we recommend solving the system using matrix division. The algebraic solver becomes handy when we want to solve nonlinear equations.
As an illustrative example, we consider the following nonlinear system of two equations with two unknowns: \[\begin{align*} z_1 &= y_1 - \theta_1 \\ z_2 &= y_1 y_2 + \theta_2 \end{align*}\]
Our goal is to simultaneously solve all equations for \(y_1\) and \(y_2\), such that the vector \(z\) goes to 0.
Coding an algebraic system
A system of algebraic equations is coded directly in Stan as a function with a strictly specified signature. For example, the nonlinear system given above can be coded using the following function in Stan (see the user-defined functions section for more information on coding user-defined functions).
vector system(vector y, // unknowns
vector theta, // parameters
data array[] real x_r, // data (real)
array[] int x_i) { // data (integer)
vector[2] z;
1] = y[1] - theta[1];
z[2] = y[1] * y[2] - theta[2];
z[return z;
}
The function takes the unknowns we wish to solve for in y
(a vector), the system parameters in theta
(a vector), the real data in x_r
(a real array) and the integer data in x_i
(an integer array). The system function returns the value of the function (a vector), for which we want to compute the roots. Our example does not use real or integer data. Nevertheless, these unused arguments must be included in the system function with exactly the signature above.
The body of the system function here could also be coded using a row vector constructor and transposition,
return [ y[1] - theta[1],
1] * y[2] - theta[2] ]'; y[
As systems get more complicated, naming the intermediate expressions goes a long way toward readability.
Strict signature
The function defining the system must have exactly these argument types and return type. This may require passing in zero-length arrays for data or a zero-length vector for parameters if the system does not involve data or parameters.
Calling the algebraic solver
Let’s suppose \(\theta = (3, 6)\). To call the algebraic solver, we need to provide an initial guess. This varies on a case-by-case basis, but in general a good guess will speed up the solver and, in pathological cases, even determine whether the solver converges or not. If the solver does not converge, the Metropolis proposal gets rejected and a warning message, stating no acceptable solution was found, is issued.
The solver has three tuning parameters to determine convergence: the relative tolerance, the function tolerance, and the maximum number of steps. Their behavior is explained in the section about algebraic solvers with control parameters.
The following code returns the solution to our nonlinear algebraic system:
transformed data {
vector[2] y_guess = [1, 1]';
array[0] real x_r;
array[0] int x_i;
}
transformed parameters {
vector[2] theta = [3, 6]';
vector[2] y;
y = solve_newton(system, y_guess, theta, x_r, x_i); }
which returns \(y = (3, -2)\).
Data versus parameters
The arguments for the real data x_r
and the integer data x_i
must be expressions that only involve data or transformed data variables. theta
, on the other hand, must only involve parameters. Note there are no restrictions on the initial guess, y_guess
, which may be a data or a parameter vector.
Length of the algebraic function and of the vector of unknowns
The Jacobian of the solution with respect to the parameters is computed using the implicit function theorem, which imposes certain restrictions. In particular, the Jacobian of the algebraic function \(f\) with respect to the unknowns \(x\) must be invertible. This requires the Jacobian to be square, meaning \(f(y)\) and \(y\) have the same length or, in other words the number of equations in the system is the same as the number of unknowns.
Pathological solutions
Certain systems may be degenerate, meaning they have multiple solutions. The algebraic solver will not report these cases, as the algorithm stops once it has found an acceptable solution. The initial guess will often determine which solution gets found first. The degeneracy may be broken by putting additional constraints on the solution. For instance, it might make “physical sense” for a solution to be positive or negative.
On the other hand, a system may not have a solution (for a given point in the parameter space). In that case, the solver will not converge to a solution. When the solver fails to do so, the current Metropolis proposal gets rejected.
Control parameters for the algebraic solver
The call to the algebraic solver shown previously uses the default control settings. The _tol
variant of the solver function allows three additional parameters, all of which must be supplied.
y = solve_newton_tol(system, y_guess, theta, x_r, x_i, scaling_step, f_tol, max_steps);
For the Newton solver the three control arguments are scaling step, function tolerance, and maximum number of steps. For the Powell’s hybrid method the three control arguments are relative tolerance, function tolerance, and maximum number of steps. If a Newton step is smaller than the scaling step tolerance, the code breaks, assuming the solver is no longer making significant progress. If set to 0, this constraint is ignored. For Powell’s hybrid method the relative tolerance is the estimated relative error of the solver and serves to test if a satisfactory solution has been found. After convergence of the either solver, the proposed solution is plugged into the algebraic system and its norm is compared to the function tolerance. If the norm is below the function tolerance, the solution is deemed acceptable. If the solver solver reaches the maximum number of steps, it stops and returns an error message. If one of the criteria is not met, the Metropolis proposal gets rejected with a warning message explaining which criterion was not satisfied.
The default values for the control arguments are respectively scaling_step = 1e-3
(\(10^{-3}\)), rel_tol = 1e-10
(\(10^{-10}\)), f_tol = 1e-6
(\(10^{-6}\)), and max_steps = 200
(\(200\)).
Tolerance
The relative and function tolerances control the accuracy of the solution generated by the solver. Relative tolerances are relative to the solution value. The function tolerance is the norm of the algebraic function, once we plug in the proposed solution. This norm should go to 0 (equivalently, all elements of the vector function are 0). It helps to think about this geometrically. Ideally the output of the algebraic function is at the origin; the norm measures deviations from this ideal. As the length of the return vector increases, a certain function tolerance becomes an increasingly difficult criterion to meet, given each individual element of the vector contribute to the norm.
Smaller relative tolerances produce more accurate solutions but require more computational time.
Sensitivity analysis
The tolerances should be set low enough that setting them lower does not change the statistical properties of posterior samples generated by the Stan program. The sensitivity can be analysed using importance sampling without need to re-run MCMC with different tolerances as shown by Timonen et al. (2023).
Maximum number of steps
The maximum number of steps can be used to stop a runaway simulation. This can arise in MCMC when a bad jump is taken, particularly during warmup. If the limit is hit, the current Metropolis proposal gets rejected. Users will see a warning message stating the maximum number of steps has been exceeded.