What about using a neural network as an optimizer, where the input is a sampled neighborhood of a random starting point and the output is the next point to evaluate? That network could be trained on billions of generated input functions. You could then use the resulting optimizer to optimize the training of the optimizer itself!
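To make the proposed loop concrete, here is a minimal sketch of the sample-neighborhood-then-step idea. Everything here is illustrative: the toy objective, the neighborhood size, and especially `policy_step`, which is a hand-coded placeholder (a softmax-weighted average of neighbors) standing in for the trained neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Toy 2-D quadratic bowl, standing in for one "generated input function".
    return float(np.sum(x ** 2))

def sample_neighborhood(x, k=8, radius=0.5):
    # Evaluate k random perturbations around the current point.
    points = x + rng.normal(scale=radius, size=(k, x.size))
    values = np.array([objective(p) for p in points])
    return points, values

def policy_step(points, values):
    # Placeholder for the trained network: softmax-weight the neighbors
    # by negative objective value and move to their weighted average.
    weights = np.exp(-(values - values.min()))
    weights /= weights.sum()
    return weights @ points

start = rng.normal(size=2) * 3.0
start_value = objective(start)
x, best_value = start, start_value
for _ in range(50):
    points, values = sample_neighborhood(x)
    x = policy_step(points, values)
    best_value = min(best_value, float(values.min()), objective(x))
```

In the actual proposal, `policy_step` would be a network whose weights are meta-trained across many generated objectives, rather than this fixed heuristic.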
Your algorithm description sounds like a variant of greedy local search with an NN heuristic. While I personally do not know of a paper that does this kind of local search (I have not looked), I do want to add that there is a rich literature on "learning to learn", including "learning to optimize" approaches, e.g., https://arxiv.org/abs/1606.04474.
I'm not sure exactly what you are suggesting, but it seems conceptually similar to Bayesian Optimization, which fits a probabilistic surrogate model (typically a Gaussian process) to previous evaluations of the objective function and uses it to estimate the best point at which to evaluate next.
This is computationally expensive, so it is typically used only when the objective function is itself very expensive to evaluate, such as when tuning hyperparameters.
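For reference, Bayesian Optimization can be sketched in a few dozen lines. This is a minimal illustrative version, not a production implementation: the 1-D test function, the RBF length-scale, and the lower-confidence-bound acquisition rule are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(x):
    # Pretend-expensive black-box function to minimize on [0, 2]:
    # a quadratic with a sinusoidal wiggle (multimodal).
    return (x - 1.2) ** 2 + 0.2 * np.sin(8 * x)

def rbf_kernel(a, b, length=0.15, var=1.0):
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard Gaussian-process regression equations.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, Ks)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(Ks * v, axis=0)
    return mu, np.maximum(var, 1e-12)

# Start from a few random evaluations, then let the surrogate pick the rest.
X = rng.uniform(0, 2, size=3)
Y = objective(X)
grid = np.linspace(0, 2, 200)

for _ in range(10):
    mu, var = gp_posterior(X, Y, grid)
    # Lower confidence bound: favor points with low predicted mean
    # or high uncertainty (exploration vs. exploitation).
    lcb = mu - 2.0 * np.sqrt(var)
    x_next = grid[np.argmin(lcb)]
    X = np.append(X, x_next)
    Y = np.append(Y, objective(x_next))

best_x, best_y = X[np.argmin(Y)], Y.min()
```

Note the cost structure: each acquisition step involves solving a linear system in the kernel matrix, which is why the method pays off only when evaluating `objective` dwarfs that overhead.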
If you're suggesting training a single neural network to use as a general-purpose optimizer for arbitrary problems, you should consider the No Free Lunch Theorem: https://en.wikipedia.org/wiki/No_free_lunch_theorem
The NFL theorem only applies to settings where the task distribution is uniform over all possible tasks. My intuition is that such a distribution is almost surely not something we would encounter in practice.