Alternating minimization is a simple, easy-to-implement method for computing minima of a function of 2 or more variables. Although it looks like a heuristic, convergence can be proved if the function you are trying to optimize satisfies the 5-point property. In this post I use the alternating minimization approach to find numerical minima of a simple problem. The purpose is to demo the Alternating Minimization (AM) approach.
The core idea behind AM is that you start with an initial guess for just one of the optimization variables, say the first one. Holding it fixed, you compute the value of the other optimization variable that results in a minimum. In the second step, you fix the 2nd optimization variable to the value obtained in the previous step and minimize over the first. You cycle alternately between optimizing the first and 2nd variables until convergence. That’s pretty much it…!
Example
Let’s explore this by trying to find a numerical minimum of a simple two-variable function. We take a simple example because its minimum is also easy to find in closed form. See the following for how one could do it.

Now, let’s explore how to find a numerical minimum of this using the alternating minimization approach. Of course, if you could compute the full gradient of the cost function, iterative methods like Gauss-Newton, gradient descent, etc. would work. The AM method is a simpler way to achieve this.
Next I work out how the AM method can be used to numerically solve this problem:


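Since the worked-out equations did not survive in this draft, here is a minimal numerical sketch of what the alternating updates look like. It assumes a hypothetical cost f(x, y) = x^2 + xy + y^2 (not necessarily the function used above), chosen because each one-variable sub-problem then has a closed-form solution.

```python
# A minimal AM sketch on the assumed cost f(x, y) = x^2 + x*y + y^2.
# Holding x fixed:  df/dy = x + 2y = 0  =>  y = -x / 2
# Holding y fixed:  df/dx = 2x + y = 0  =>  x = -y / 2

def f(x, y):
    return x**2 + x * y + y**2

x, y = 5.0, 0.0                 # initial guess
for i in range(100):
    y = -x / 2.0                # minimize f(x, .) over y with x held fixed
    x_new = -y / 2.0            # minimize f(., y) over x with y held fixed
    if abs(x_new - x) < 1e-12:  # x has stopped moving: converged
        break
    x = x_new

print(x, y, f(x, y))            # approaches the closed-form minimum at (0, 0)
```

For this particular quadratic, each full cycle of the alternation shrinks x by a factor of 4, so the iterates converge quickly to (0, 0), matching the closed-form answer.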
Pseudo Code
In general, say you are interested in solving min_{X,Y} f(X, Y); the pseudo code for doing this will look something like:
a. start with an initial guess for X, say X_0
b. loop for i = 1 to n
   1. Y_i = minimize_{Y} f( X_{i-1}, Y )
   2. X_i = minimize_{X} f( X, Y_i )
   3. if converged, break from the loop
c. return X_n, Y_n as the solution to the original optimization problem.
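Translated into runnable Python, the pseudo code above might look like the following sketch. It assumes scalar X and Y and uses scipy.optimize.minimize_scalar for each one-dimensional sub-problem; the function name and tolerance are illustrative, not from the original post.

```python
from scipy.optimize import minimize_scalar

def alternating_minimization(f, x0, n=100, tol=1e-9):
    """Minimize f(X, Y) by alternating over the two variables."""
    x = x0                                            # a. initial guess X_0
    y = None                                          # set on the first pass
    for i in range(n):                                # b. loop for i = 1 to n
        y = minimize_scalar(lambda v: f(x, v)).x      # b.1  Y_i = argmin_Y f(X_{i-1}, Y)
        x_new = minimize_scalar(lambda u: f(u, y)).x  # b.2  X_i = argmin_X f(X, Y_i)
        converged = abs(x_new - x) < tol              # b.3  stop once X stops moving
        x = x_new
        if converged:
            break
    return x, y                                       # c. return X_n, Y_n

# Usage on the same hypothetical quadratic as in the example above:
x_star, y_star = alternating_minimization(lambda x, y: x**2 + x * y + y**2, x0=5.0)
print(x_star, y_star)  # approximately (0.0, 0.0)
```

Each call to minimize_scalar solves one of the inner sub-problems numerically; for cost functions where those sub-problems have closed-form solutions (as in the quadratic example), you would substitute the closed-form updates instead.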
Note that X and Y could each also stand for a bunch of optimization variables. Note also that this won’t always converge to the minimum; it does so only under certain conditions, discussed below.
Why Alternating Minimization (AM) converges to the Optimal Values
On the surface this seems like a heuristic approach; however, there is a concrete proof of the conditions the cost function must satisfy for it to converge. I myself am not 100% clear on the proof. The main questions to ask in this context are: a) Does this algorithm converge? b) If it converges, is the convergence point the same as the minimum of the original function?
Csiszár derived a 5-point property which your cost function should satisfy for AM to return the optimum (proved in Csiszár’s paper). There are also a 3-point property and a 4-point property; see Charles Byrne’s paper. The 3- and 4-point properties present a simpler way to prove convergence of the AM method to a minimum: proving the 3- and 4-point properties is equivalent to proving the 5-point property.
I WILL UPDATE THIS SECTION AS I UNDERSTAND MORE IN THIS REGARD.
AS NOTED BY OTHERS, THIS IS SIMILAR TO COORDINATE DESCENT. EXPLORE MORE ON THAT.
References
- Authoritative reference: I. Csiszár and G. Tusnády, “Information Geometry and Alternating Minimization Procedures”, Statistics and Decisions, Supplement Issue 1, 1984.
- Charles L. Byrne, “Alternating Minimization and Alternating Projection Algorithms: A Tutorial”, 2011.
- http://curtis.ml.cmu.edu/w/courses/index.php/Alternating_Minimization
- Amarnag Subramanya and Jeff Bilmes, “Soft-Supervised Learning for Text Classification”, EMNLP 2008. A more recent reference which uses AM.
- http://blog.geomblog.org/2008/01/important-heuristics-alternating.html