Limited-memory BFGS (L-BFGS or LM-BFGS) is an optimization algorithm in the family of quasi-Newton methods that approximates the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm using a limited amount of computer memory. It is a popular algorithm for parameter estimation in machine learning. Like the original BFGS, L-BFGS uses an estimate of the inverse Hessian matrix to steer its search through variable space, but where BFGS stores a dense ''n''×''n'' approximation to the inverse Hessian (''n'' being the number of variables in the problem), L-BFGS stores only a few vectors that represent the approximation implicitly. Due to its resulting linear memory requirement, the L-BFGS method is particularly well suited for optimization problems with a large number of variables. Instead of the inverse Hessian <math>H_k</math>, L-BFGS maintains a history of the past ''m'' updates of the position <math>x</math> and gradient <math>\nabla f(x)</math>, where generally the history size ''m'' can be small (often <math>m < 10</math>). These updates are used to implicitly perform operations requiring the <math>H_k</math>-vector product.

==Algorithm==
L-BFGS shares many features with other quasi-Newton algorithms, but differs in how the matrix–vector multiplication for finding the search direction is carried out. There are multiple published approaches using a history of updates to form this direction vector. Here, we give a common approach, the so-called "two-loop recursion."

We take as given <math>x_k</math>, the position at the ''k''-th iteration, and <math>g_k \equiv \nabla f(x_k)</math>, where ''f'' is the function being minimized and all vectors are column vectors. We also assume that we have stored the last ''m'' updates of the form

:<math>s_i = x_{i+1} - x_i</math>
:<math>y_i = g_{i+1} - g_i.</math>

We define <math>\rho_i = \frac{1}{y_i^\top s_i}</math>, and <math>H_k^0</math> will be the 'initial' approximation of the inverse Hessian that our estimate at iteration ''k'' begins with. Then we can compute the (uphill) direction <math>z = H_k g_k</math> as follows:

:<math>q = g_k</math>
:For <math>i = k-1, k-2, \ldots, k-m</math>:
::<math>\alpha_i = \rho_i s_i^\top q</math>
::<math>q = q - \alpha_i y_i</math>
:<math>z = H_k^0 q</math>
:For <math>i = k-m, k-m+1, \ldots, k-1</math>:
::<math>\beta_i = \rho_i y_i^\top z</math>
::<math>z = z + s_i(\alpha_i - \beta_i)</math>
:Stop with <math>H_k g_k = z</math>.

This formulation is valid whether we are minimizing or maximizing. Note that if we are minimizing, the search direction would be the negative of <math>z</math> (since <math>z</math> is "uphill"), and if we are maximizing, <math>H_k^0</math> should be negative definite rather than positive definite. We would typically perform a backtracking line search in the search direction (any line search would be valid, but L-BFGS does not require exact line searches in order to converge).

Commonly, the initial inverse Hessian <math>H_k^0</math> is represented as a diagonal matrix, so that initially setting <math>z = H_k^0 q</math> requires only an element-by-element multiplication; a common choice is <math>H_k^0 = \gamma_k I</math> with <math>\gamma_k = \frac{s_{k-1}^\top y_{k-1}}{y_{k-1}^\top y_{k-1}}</math>. This two-loop update only works for the inverse Hessian. Approaches to implementing L-BFGS using the direct approximate Hessian have also been developed, as have other means of approximating the inverse Hessian.
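The following is a minimal sketch of the two-loop recursion in Python with NumPy (an illustration, not part of the article; the function name <code>lbfgs_direction</code> and the argument layout are assumptions). It computes <math>z = H_k g_k</math> from the stored <math>(s_i, y_i)</math> pairs, using the diagonal scaling <math>H_k^0 = \gamma_k I</math> described above.

<syntaxhighlight lang="python">
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    """Two-loop recursion: return z = H_k g (the "uphill" direction).

    g      -- current gradient g_k, shape (n,)
    s_hist -- last m position updates s_i = x_{i+1} - x_i, oldest first
    y_hist -- last m gradient updates y_i = g_{i+1} - g_i, oldest first
    """
    if not s_hist:                      # no history yet: use H_k^0 = I
        return g.copy()

    rho = [1.0 / np.dot(y, s) for s, y in zip(s_hist, y_hist)]
    alpha = [0.0] * len(s_hist)

    # First loop: walk the history from newest to oldest update.
    q = g.copy()
    for i in reversed(range(len(s_hist))):
        alpha[i] = rho[i] * np.dot(s_hist[i], q)
        q -= alpha[i] * y_hist[i]

    # Initial approximation H_k^0 = gamma_k * I (common diagonal scaling).
    gamma = np.dot(s_hist[-1], y_hist[-1]) / np.dot(y_hist[-1], y_hist[-1])
    z = gamma * q

    # Second loop: walk the history from oldest to newest update.
    for i in range(len(s_hist)):
        beta = rho[i] * np.dot(y_hist[i], z)
        z += s_hist[i] * (alpha[i] - beta)

    return z   # for minimization, step along -z
</syntaxhighlight>

A typical minimization step would move along <math>-z</math> with a backtracking line search, append the new pair <math>(s_k, y_k)</math> to the history, and discard the oldest pair once more than ''m'' updates are stored.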