One way to optimize a black-box system: algebraic surrogate models identified with symbolic regression.

Tim Forster

Towards Data Science

Optimization with Surrogate Models via Symbolic Regression | by Tim Forster | Jan, 2024
Photo by Jeremy Bishop on Unsplash

Performing an optimization is a very interesting task. In our daily life, we might be interested in the best way to get to work in the shortest amount of time, or maybe in the best particle size of our ground coffee to achieve a very tasty cup of coffee ☕. Industries are also interested in optimizing things, such as supply chains, carbon emissions, or waste accumulation.

There are many ways to set up an optimization, depending on the particular situation. For this article, let me divide these situations into two cases:

On the one hand, we might have knowledge about the physics, chemistry, or biology that drives the system under study. With this, we could set up algebraic equations that accurately describe what we observe (first principles). These situations allow the use of off-the-shelf solvers, such as GLPK, BARON, ANTIGONE, SBB, or others, since we have closed-form expressions and can calculate their derivatives.
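To make the first case concrete, here is a minimal sketch of what a closed-form description buys us. The objective below is a made-up placeholder, not a model from any real system, and a plain Newton iteration stands in for the dedicated solvers named above. The point is only that with an explicit expression we can write down the derivatives and use them directly.

```python
def objective(x):
    """Hypothetical closed-form model of a cost for one design variable x > 0."""
    return (x - 2.0) ** 2 + 1.0 / x


def gradient(x):
    """Analytic first derivative of the objective."""
    return 2.0 * (x - 2.0) - 1.0 / x ** 2


def hessian(x):
    """Analytic second derivative; positive for all x > 0, so the model is convex there."""
    return 2.0 + 2.0 / x ** 3


def newton_minimize(x0, tol=1e-10, max_iter=50):
    """Minimize the objective with Newton's method, using the exact derivatives."""
    x = x0
    for _ in range(max_iter):
        step = gradient(x) / hessian(x)
        x -= step
        if abs(step) < tol:
            break
    return x


x_opt = newton_minimize(x0=1.0)
print(x_opt, objective(x_opt))
```

A real study would hand the same expressions to a proper solver, which also takes care of constraints, integer variables, and global optimality.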

On the other hand, we might not really have an idea of how our system looks or behaves. One way to get some information out of it would be to perform experiments, that is, to define some inputs and observe the resulting outputs. To optimize such a system, we could use heuristics, like particle swarm optimization, apply a genetic algorithm, or use powerful techniques like Bayesian optimization.
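As a rough illustration of the second case, the sketch below runs a basic particle swarm on a function that the optimizer can only evaluate. Everything in it is an assumption made for the example: the `black_box` function is a made-up stand-in for an experiment, and the swarm settings (`w`, `c1`, `c2`, 20 particles, 100 iterations) are common textbook defaults, not tuned values.

```python
import random


def black_box(x):
    """Stand-in for an experiment: the optimizer may only call it, not inspect it.
    The quadratic inside is a made-up placeholder."""
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2


def particle_swarm(f, bounds, n_particles=20, n_iter=100, seed=0):
    """Minimize f inside box bounds with a basic particle swarm (no derivatives needed)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]       # best position seen by each particle
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # best position seen by the swarm
    w, c1, c2 = 0.7, 1.5, 1.5            # inertia, personal pull, social pull
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it back into the box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


x_best, f_best = particle_swarm(black_box, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(x_best, f_best)
```

Note that this sketch calls the black box a couple of thousand times, which is exactly what becomes a problem when each evaluation is an expensive experiment.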

We could dive deeply into the literature and its many discussions now. But let us keep it simple here. Let us focus only on the second case, where we do not have a nice and accurate closed-form mathematical description of our system, or we don’t have time to come up with one because we are busy drinking coffee ☕. Let us also assume we have some past observations, but we cannot sample new data from our system for whatever reason.

Such a situation might arise when you are working with very expensive material, such as pharmaceuticals. You might have produced some batches of drug product in the past, but you cannot produce another batch just for the sake…
