Optimizing Nightlife
The first weeks at my new job in data analytics inspired me to start using the 'minimize' optimization module from Python's 'scipy.optimize' package. What is it good for? As the name suggests, it's about finding the optimal values for us to solve a problem. More mathematically speaking, it searches for the minima of a given function, including functions of multiple variables. And we can also set conditions for the search. I found this module particularly useful for finding optimal values of variables when an exact analytical solution led to values that are not achievable in real life.
The Principle
The 'minimize' module minimizes the value of a function. This can be, for example, the difference between some desired values and values that we generate and that we cannot achieve completely, as in the situation described below. The solution is restricted by certain conditions and depends on several variables. An algorithm searches for the minimum of the function, which could look like in Fig. 1, for example (in the picture with a maximum, the function can be always reverted to search for minimum).
Fig 1. A paraboloid with maximum at [0, 0, 4]. From wikimedia by IkamusumeFan under the CC BY-SA 4.0 Deed license.
It is important to note that the algorithm needs to start somewhere. The starting point is set by the initial guess argument. We can of course imagine functions with more complicated shape and multiple local minima. The initial guess would then determine which minimum is found, so it is important to come up with a good estimate or try more options. The algorithm chooses a mathematical approach that is suitable for the specific problem, but we can also give it the desired method as an argument. And one last comment on the theory. The package uses two types of conditions: bounds and constraints. Bounds are simply the limits between which a certain variable can lie. Constraints are additional conditions for the relationship between the variables.
An Example: Optimizing the Berlin Nightlife 🕺
Let's imagine something specific. I could have a plan for visiting clubs in Berlin in certain ratio of nights:
-
SchwuZ: 28 %
-
KitKat 22 %
-
Grosse Freiheit 15 %
-
Prinzknecht 25 %
-
Berghain 10 %
-
SchwuZ: 16.2 %
-
KitKat 23.5 %
-
Grosse Freiheit 14.8 %
-
Prinzknecht 27.6 %
-
Berghain 17.9 %
If you do some simple calculations, you will see that with the remaining number of weeks it is impossible to reach the original goal exactly. The numbers would have to follow the formula derived in Fig. 2 and some values would fall outside of the 0 - 100 % interval. But don't worry, it's not a big deal, let's try to approach the goal as much as possible. For this we will use the optimization.
Fig 2. The math behind the problem (see also the notebook below).
We need to minimize the difference between the targets and between values which are determined by the equation above. So this difference is the function to minimize! It has five variables, corresponding to the five clubs, and we are looking for their optimal values. We mustn't forget to set the boundaries (all values between 0 and 100 %) and our constraint (all five values together need to sum up to 100 %). And that's basically it. The algorithm will search for the minimum of the function in the given range of values that satisfy our constraint. The computed differences between the goal and final shares range from about 0.9 to 3.4 %.
A note: in this case, a different initial guess leads virtually to the same result. In other situations, it is advisable to try various values of the guess to prevent the algorithm from falling into a local minimum instead of the global one.
Interested in the code? Just check the notebook on my github!