## Theory

Osprey is designed to optimize the hyperparameters of machine learning models
by maximizing a cross-validation score. Viewed as an optimization problem, the
key factors here are:

- very expensive objective function evaluations (minutes to hours, or more)
- no gradient information is available
- tension between exploration of parameter space and local optimization (explore / exploit dilemma)

A good, if somewhat dated, overview of this problem setting can be found in
Jones, Schonlau, Welch (1998). The key idea is that we can proceed by
fitting a **surrogate function** or **response surface**. This surrogate
function needs to provide both our best guess of the function and our
degree of belief (our uncertainty) in the parts of parameter space that we
haven’t yet explored. Does the maximum lie over there? Then, at each iteration,
a new point can be selected by maximizing the **expected improvement** over
our current best solution, by maximizing the expected **entropy reduction in
the distribution of maxima**, or by maximizing a similar so-called acquisition function.
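
To make the expected improvement criterion concrete, here is a minimal sketch in
plain Python. It assumes the surrogate's prediction at a candidate point is
Gaussian with mean `mu` and standard deviation `sigma`, and uses the standard
closed form for a maximization problem. The function name, the candidate labels,
and all numbers are hypothetical and chosen for illustration; this is not
Osprey's internal code.

```python
import math

def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2).

    Assumes we are maximizing the objective (e.g. a cross-validation score).
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf

# Hypothetical surrogate predictions (mean, std) at three candidate points.
candidates = {
    "exploit": (0.82, 0.01),  # slightly better than the best so far, very certain
    "explore": (0.70, 0.20),  # lower predicted score, but highly uncertain
    "poor": (0.50, 0.02),     # clearly worse, fairly certain
}
best_so_far = 0.80

scores = {name: expected_improvement(mu, sigma, best_so_far)
          for name, (mu, sigma) in candidates.items()}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

With these toy numbers the highly uncertain candidate receives the largest
expected improvement, even though its predicted mean is below the current best.
This is how the acquisition function trades exploration against exploitation.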

`osprey` supports multiple *search strategies* for choosing
the next set of hyperparameters to evaluate your model at. The most
theoretically elegant of the supported methods, Gaussian process expected improvement using the MOE backend, attacks this problem directly by modeling
the objective function as a draw from a Gaussian process.
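
As a rough illustration of what modeling the objective as a draw from a Gaussian
process provides, the sketch below computes the posterior mean and standard
deviation of a zero-mean Gaussian process in one dimension. It assumes NumPy is
available and uses a squared-exponential kernel with a hand-picked length scale.
The helper names (`rbf_kernel`, `gp_posterior`) and the data are hypothetical;
this is not the MOE implementation, and a real backend would also fit the kernel
hyperparameters.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at the points in x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    # Solve K @ alpha = y_train using the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance is k(x, x) = 1 for this kernel.
    var = 1.0 - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Hypothetical scores observed at three values of a single hyperparameter.
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.array([0.62, 0.80, 0.55])
x_query = np.linspace(0.0, 1.0, 11)

mean, std = gp_posterior(x_train, y_train, x_query)
for x, m, s in zip(x_query, mean, std):
    print(f"x={x:.1f}  mean={m:.3f}  std={s:.3f}")
```

The posterior mean plays the role of the "best guess" and the posterior standard
deviation the "degree of belief": the uncertainty shrinks toward zero at the
evaluated points and grows between them. These are the two quantities an
acquisition function such as expected improvement consumes.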