A few years ago we published on this blog a riddle attributed to Einstein that only 2% of people are said to be able to solve. It remains unsaid that this is so when the problem is approached with a “human strategy” of resolution, as we would solve a sudoku puzzle.
However, this problem and any of its variants can be successfully solved by prescriptive analytics in tenths of a second. And the reason why we are telling you this is that there are many industries that still solve their problems as if they were a sudoku puzzle, despite the fact that there is a whole discipline dedicated to solving these problems in very short times and with excellent results.
The seemingly simple puzzle is based on a series of considerations and a question: who owns the fish?
- The Brit lives in the red house.
- The Swede has a dog.
- The Dane drinks tea.
- The Norwegian lives in the first house.
- The German smokes Prince.
- The green house is immediately to the left of the white one.
- The owner of the green house drinks coffee.
- The person who smokes Pall Mall keeps birds.
- The owner of the yellow house smokes Durnhill.
- The man who lives in the house in the centre drinks milk.
- The man who smokes Blends lives next door to the one who has a cat.
- The man who owns a horse lives next door to the one who smokes Dunhill.
- The man who smokes Bluemaster drinks beer.
- The man who smokes Blends is a neighbour of the man who drinks water.
- The Norwegian lives next to the blue house.
Einstein’s problem is a combinatorial problem. Combinatorial problems are characterized by the existence of a set of elements (for example, each person’s drink) for which there are different alternatives (water, beer, tea, coffee or milk). Let’s see what a solution to this problem looks like.
By filling in the two tables below, we obtain a solution. On the one hand, we assign a drink, a pet, a house and a brand of tobacco to each person and, on the other, we decide the relative position of each house.
There are five candidate drinks in this table, so each of the cells in the drink column can take one of these five options and, in turn, the set of all possible combinations with which we can fill that column is 55 = 3125. Similarly there are 3125 other possible pet assignments to the five people, and the same for houses and tobacco brands. Therefore, the total number of combinations with which we can fill this first table is (55)5=525, that is, a total of 298 023 223 876 953 000.
Note: Obviously, many of these combinations are not valid and they are discarded by the human mind (nobody would fill in dog, dog, dog, dog, dog). However, when we delegate the task to a machine, as we will see later, we must explicitly exclude them from the search.
Likewise, we have to assign a position to each house by filling in the second table. Since there are five possible values for each cell, there are 55 = 3125 possible combinations.
Finally, the set of all possible combinations for the two tables is the products of the combinations of both, that is, 525 x 55 or 931 322 574 615 479 000 000 possible combinations. If we needed one nanosecond (10-9 seconds) to build each one of them and verify the fulfilment of all the rules, it would take us almost 30,000 years to evaluate them all.
This conclusion can be generalized to all combinatorial problems: the combination of all possible allocations for every element usually returns a solution space that is too large to manage by brute force, that is, exploring each and every combination.
For this reason, when the human mind tries to solve a problem like this, it does not try every possible allocations one by one in the hope of finding the solution, but rather does so sequentially, trying to comply with the rules as it builds the solution (as it fills in the table), like a sudoku puzzle.
In this way, although we avoid exploring solutions that clearly do not comply with the rules, like for example, assigning water as a drink to all people, the “human resolution approach” has serious drawbacks:
- In the first place it is “short-sighted”. As it tries to allocate values by taking into account the previous rules and allocations, it may reach a dead end where there is no way to make more allocations without violating any rules. When this happens, it has to go back or even start over. That is, there is no guarantee of success.
- Furthermore, it is slow and uncertain. As it is not a procedure for which there is proof of convergence, the time it takes is uncertain, variable and, especially if it is carried out manually, very slow.
- Finally, it depends on the expertise of the person solving the problem.
How to solve the puzzle using prescriptive analytics
Solving a problem like Einstein’s without any mathematical aid is a good exercise to keep an active brain, but not more.
Combinatorial problems are becoming more and more complex in this time and age. Delivery routes, workforce schedules, production plans … are problems of the same nature as Einstein’s riddle: they are combinatorial problems albeit typically much larger; instead of five people, we have hundreds of customers to visit, and dozens of vehicles to load thousands of packages… The number of possible combinations exceeds almost any imaginable number.
On many occasions, those responsible for solving these problems approach them as the human mind approaches Einstein’s riddle, that is, sequentially. In the case of route planning for example, routes can be created by adding customers one by one, so that each new customer on a route is the closest to the last one added. However, if customers require delivery windows, this method will often not work and it will be necessary to start over.
Prescriptive analytics on the other hand approaches the problem globally, not sequentially. It consists on describing mathematically all the possibilities through variables, equations and inequations, and then intelligently exploring the values taken by those variables. By “intelligently” we mean not looking for the needle in the haystack by going through every nook and cranny, but identifying in which areas the needle cannot be in order not to waste time on them.
A mathematical optimization model comprises three elements: variables, constraints, and an objective function. Let’s take a closer look at each of them in Einstein’s example.
The variables represent the solution by assigning values to them. In the case of Einstein’s puzzle, one of the variables in the problem is xpc, so xpc=1 means that the person p lives in the house c and xpc=0 means that person p does not live in house c. We can define analogous variables for the drink, the tobacco brand and the pet and, on the other hand, the houses’ positions.
The constraints represent the requirements or rules that a solution, that is, the variables’ values, must meet. Continuing with the example of variable xpc, we know that one and only one person lives in each house – that is a constraint. Therefore, the following equation must be met in the case of the green house:
xenglish green+xswede green+xdane green+xnorwegian green+xgerman green=1
The sum of all five variables corresponding to the green house assignment must add up 1. That is, one and only one variable must take the value 1, which translates into one and only one person living in the green house.
The objective function.
The objective function is the indicator that we want to maximize or minimize, such as, for example, the costs or times associated with some activity. In the case of this puzzle there is no objective function since finding a feasible solution is enough. In reality however, more often than not there is some KPI that allows solutions to be differentiated and which can be used to build an objective function.
We are searching for the optimal solution to our problem, which is the feasible solution with the best value of the objective function (if it is not feasible, it cannot be implemented, it does not meet the requirements and we are therefore not interested in it).
Mathematical decision models in real situations share the same structure.
Thus, we can find variables such as xvcc’ which takes the value 1 if vehicle v visits customer c’ after having visited customer c or, for example, which represents the amount of product transported from plant p to distribution centre d.
Similarly, business requirements translate into relationships between variables. For example, if a vehicle arrives at customer c’ from customer c, it must necessarily depart from c’ to some other customer c”. That is, xvcc’= ∑c”xvc’c” must be met for any vehicle v and customers c and c’.
Finally, there is an objective function. In the case of routes, this might be the total distance travelled. Thus, if we know the distance between any two clients c and c’ (Dcc’), we can calculate the sum of all the distances travelled by all vehicles by using an expression similar to ∑v∑c∑c’Dcc’xvcc’
Once the model is formulated, the next step is to solve it, that is, to find the set of values that all the variables must take (the decision) that meets all the constraints (customer requirements) and that minimize the objective function (the KPI that we want to optimize).
George Dantzig, the pioneer in this area, formulated the so-called Simplex method in 1947. Since then, the resolution techniques have been extended and refined and today we have at our disposal different solvers that incorporate these methods.
In this case, we offer a Python implementation using the PuLP library and the CBC solver. It is developed in a Google Colaboratory Jupyter notebook, whose code can be run through the option ‘Run all’ in the ‘Runtime’ menu (or with the shortcut Control + F9). In this way, all cells of code in the notebook are executed.
The last cell shows the tables below which reflect the allocation that respects all the rules.
By the way: it is the German that has the fish.