The Change-Making Problem

“Money is the most universal and efficient system of mutual trust ever devised.” This is how the author Yuval Noah Harari describes money in his bestselling book on the history of humanity—Sapiens.

In mathematics, we tend to use abstractions to make problems more approachable and avoid getting lost in the minutiae of the real world. We strip away the meaning and instead focus on the equations. However, sometimes, we may look to add some kind of metaphor to our equations and techniques—such is the case of the change-making problem. In this article, we present an overview of the problem of making change. Although the problem may have arisen directly from the real-world task of making change, this aspect should be seen only as a metaphor for the underlying mathematics and techniques. We go beyond the necessary optimizations and improvements that one may need in day to day life, but keep the terminology of coins and change simply because they are useful in illustrating examples and concepts.

All of this being said, it is still good practice to investigate why a problem is named the way it is. Money, of course, is an essential aspect of human life and gave rise to many fascinating mathematical problems. Let us briefly examine its history.

A Very Short History of Money

Money, historically, was a driving force for the development of numeral systems in various cultures, although not the only one. Its commonly used definition is ambiguous, referring to anything that may be used as payment. Chronologically, the first kind that fits this description is commodity money. Commodity money obtains its value intrinsically, i.e., we can make practical use of the currency in question—we can eat it, wear it, smoke it, use it in battle, and so on. Anything with an intrinsic value that a large group of people is willing to accept as payment, we call commodity money. One interesting example is barley, which seems to be the closest to what we would call an official currency, originating around 3000 BC in the Sumerian civilization in the historical region of southern Mesopotamia, now southern Iraq. Alongside barley money, the Sumerians developed a sexagesimal numeral system, which means they counted in base 60 instead of the base 10 we use today. There are many theories about why the Sumerians developed such a system, but nobody knows for sure. They used a large unit of roughly 30 kilograms of barley called a talent, which they divided into 60 minas, which they divided further into 60 shekels.

Green barley in a field.

Commodity money evolved into something we now call representative money. We may think of representative money as one layer of indirection on top of an intrinsically valuable asset. Instead of giving you three barley talents, which is arguably hard for me to carry, I may give you a certificate that proves you now own the barley in question. Sumerians used to maintain warehouses of commodities, which issued such certificates to the owners.

Nowadays, we use coins and banknotes, which do not have an intrinsic value. You can't eat a £2 coin. Its strong point is that you can exchange it for some rolled oats at the supermarket. This £2 coin is what we call fiat money since it's backed by the Bank of England, the banker of the government of the United Kingdom.

fiat /ˈfiːat,ˈfʌɪat/
noun
a formal authorization or proposition; a decree.

The history of fiat money began in the 11th century when the Song government in China introduced the first government-backed paper currency, called Jiaozi. The Jiaozi didn't have any intrinsic value at all. It didn't represent anything in particular. Its value was instead backed by the government. This is, to some extent, the kind of money we use today. What Harari is referring to in his book when he describes money as a system of mutual trust is this lack of intrinsic value of the most widespread currencies. In the £2 example, you can indeed go to the local store and buy a kilogram of oats, but if tomorrow suddenly everybody else refuses to accept pounds as a method of payment, the value of your coin is reduced to nothing. This is unlikely, but an interesting thought experiment. The most you could do with your coin is throw it in a well and wish for better luck.

Change-Making

The British pound comes in coins of 1p, 2p, 5p, 10p, 20p, 50p, £1, and £2, and banknotes of £5, £10, £20, and £50. This is what we call the denominations of the currency, or simply a coin system. Let us imagine a vending machine. In goes a £5 note for a £1.20 pack of Oreo cookies. The vending machine then needs to give back £3.80. The customer would rather not be flooded with 380 1p coins. In fact, it would be nice if the vending machine gave back £3.80 in as few coins as possible, in this case £2 + £1 + 50p + 20p + 10p = £3.80. This is what we call the change-making problem. Mathematically, this problem is hard; NP-hard, in formal terms. However, there is an interesting aspect to its complexity: it becomes easier when the coin system is of a special kind, which we call canonical. Most modern currencies use a canonical system, including the British one, although this wasn't the case until recent times.

Soda vending machines in Japan.

Interestingly enough, the day of February 15th, 1971 came to be known as Decimal Day in the United Kingdom and Ireland. This was the day both countries switched to a decimal (and canonical) coin system. Some might be surprised by how recently this happened, considering that China had been using one for at least 2,000 years by then, Russia switched in 1704, the United States of America in 1792, and France in 1795.

The Problem, in Formal Terms

Let us define a coin system formally as a vector \( C = (c_1, c_2, \dots, c_n) \). For the sake of simplicity, we also add the condition that \( c_1 > c_2 > \dots > c_n \) and that \( c_n = 1 \). The latter condition ensures that any given value can be represented. We say that a vector of non-negative integers \( R = (r_1, r_2, \dots, r_n) \) is a representation of t if \[ t = C \cdot R = c_1 r_1 + c_2 r_2 + \dots + c_n r_n. \] The problem asks us to, given a value t, find a representation R such that the total number of coins used, \[ \vert R \vert = r_1 + r_2 + \dots + r_n, \] is minimized.

It turns out that we don't know any algorithm that solves this problem in time polynomial in the size of C for an arbitrary coin system. However, there are systems for which a simple greedy algorithm is guaranteed to find an optimal answer; by definition, we call these systems canonical. The greedy algorithm works by repeatedly taking the largest coin whose value is not larger than the remaining amount. To see that this approach does not work in the general case, take the example of the system (4, 3, 1) and the value 6. The algorithm wrongly outputs (1, 0, 2), while the correct solution is (0, 2, 0). It uses three coins, 4 + 1 + 1 = 6, when it could've used just two by doing 3 + 3 = 6.

The Greedy Algorithm

The greedy algorithm begins with the coins of the largest type in the system and tries to take as many of them as possible, without going over the target amount t, then it goes to the next coin and does the same for the remaining amount. In pseudocode, we might do something like this:

1 function makeChangeGreedy(C, t)
2 k \( \leftarrow \) 1, a \( \leftarrow \) 0
3 R \( \leftarrow \) zeros(length(C))
4 while a < t
5 \( R_k \) \(\leftarrow\) \( \lfloor (t - a) / c_k \rfloor \)
6 a \(\leftarrow\) a \(+\) \( R_k c_k \)
7 k \( \leftarrow \) k + 1
8 return R
A greedy algorithm to solve the change-making problem that runs in \( O(n) \) time, where n is the number of elements in C, the coin system.
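For reference, a direct translation of this pseudocode into Python might look like the sketch below; the function name and the plain-list representation of the coin system are choices made only for illustration.

def make_change_greedy(coins, t):
    # coins is given in decreasing order and ends in 1; returns the representation R.
    r = [0] * len(coins)
    remaining = t
    for k, c in enumerate(coins):
        r[k], remaining = divmod(remaining, c)  # take as many coins of value c as possible
        if remaining == 0:
            break
    return r

# e.g. the British coins in pence: £3.80 comes back as £2 + £1 + 50p + 20p + 10p
# make_change_greedy([200, 100, 50, 20, 10, 5, 2, 1], 380) == [1, 1, 1, 1, 1, 0, 0, 0]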

Since the last number in C is always 1, the algorithm is guaranteed to return a result, although it might not be the optimal one. Let us formally describe the kind of solution it returns. First, note that the algorithm prioritizes larger coins. Can there be a representation \( V = (v_1, v_2, \dots, v_n) \) that is lexicographically larger than the representation \( W = (w_1, w_2, \dots, w_n) \) returned by the algorithm? By lexicographically larger we mean that there is some i such that \( v_j = w_j \) for all \( j < i \) and that \( v_i > w_i \).

We are going to prove by contradiction that this can't be the case. Note that, at some iteration of the loop on line 4, it must be true that \( R_j = w_j \) for all \( j < i \) and that \( k = i \). Since we assumed that W is returned, \( R_i \) must now be set to \( w_i \) on line 5. But what is \( \lfloor (t - a) / c_k \rfloor \)? It is how many times we can fit \( c_k \) in \( t - a \). Note that at this point we have that \[ \begin{align} a &= c_1 w_1 + c_2 w_2 + \dots + c_{i - 1} w_{i - 1} \\ &= c_1 v_1 + c_2 v_2 + \dots + c_{i - 1} v_{i - 1}. \end{align} \] In particular, since V is a representation of t, we have that \[ \begin{align} t - a &= t - c_1 v_1 - c_2 v_2 - \dots - c_{i - 1} v_{i - 1} \\ &= c_i v_i + c_{i + 1} v_{i + 1} + \dots + c_n v_n. \end{align} \] Remember that \( k = i \), thus \( c_k \) fits in \( t - a \) at least \( v_i \) times. But \( v_i > w_i \), contradicting the fact that the algorithm returns W.

So we proved that the representation our algorithm outputs is the lexicographically greatest representation of t. We are going to call this representation \( G(C, t) \), for short. At the same time, we are going to define the representation \( M(C, t) \) as the lexicographically greatest representation of minimal length, that is, an optimal solution with respect to the number of coins used. By definition, when \( G(C, t) = M(C, t) \) for all t, i.e., when the greedy algorithm returns the minimal representation for every amount, we say that the coin system C is canonical.

In what follows, we shall attempt to understand when exactly it is the case that \( G(C, t) = M(C, t) \) for all t. To that end, we shall devise a polynomial-time method to check whether a coin system is canonical.

When Is a Coin System Canonical?

Note that since \( G(C, t) = M(C, t) \) for all t for a canonical system, we also have that \( \lvert G(C, t) \rvert\) = \(\lvert M(C, t) \rvert \) for all t. To prove that a system is not canonical, it suffices to find some w such that \( \lvert G(C, w) \rvert > \lvert M(C, w) \rvert \). Unfortunately, to prove that the system is canonical, we can't check all possible values for w. There are infinitely many of them. Instead, we devise a theorem—a theorem that tells us that it is enough to check some specific values for w, and if we find no counterexample among them, then the system is canonical.

First, let us build some terminology. For a choice of system C of n coins, we are going to call an n-vector V greedy if \( V = G(C, V \cdot C) \). Similarly, we call V minimal if \( V \) = \( M(C, V \cdot C) \). In short: \[ \begin{align} V \text{ is greedy} &\Leftrightarrow V = G(C, V \cdot C), \\ V \text{ is minimal} &\Leftrightarrow V = M(C, V \cdot C). \end{align} \] Further, we are going to use the usual order comparison signs between representations to denote lexicographical order: \[ \begin{align} V &< W \Leftrightarrow W \text{ is lexicographically greater than } V, \\ V &\le W \Leftrightarrow W \text{ is equal or lexicographically greater than } V. \end{align} \] Besides this lexicographical relation, we also introduce an element-wise order comparison of the two vectors: \[ \begin{align} V &\subset W \Leftrightarrow V_i < W_i, \\ V &\subseteq W \Leftrightarrow V_i \le W_i, \end{align} \] where the comparison must hold for every i from 1 to n. This relation has an interesting property, which we are going to use later. Let us call it Lemma 1, and split it into two parts.
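Before stating the lemma, a quick illustration of the difference between the two orders, using the system \( C = (4, 3, 1) \) from before: \( (1, 0, 2) > (0, 2, 0) \) lexicographically, because the first position already decides the comparison, yet neither vector is element-wise contained in the other; on the other hand, \( (0, 1, 0) \subseteq (0, 2, 0) \), and the two are also comparable lexicographically.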

Lemma 1. Let U and V be some representations under the same coin system.

  (a) If V is greedy and \( U \subseteq V \), then U is greedy too.
  (b) If V is minimal and \( U \subseteq V \), then U is minimal too.

Proof. To prove (a), take any representation X of \( U \cdot C \). We have that \[ X \cdot C = U \cdot C. \] Since \( U \subseteq V \), we have that the elements of \( V - U \) are all non-negative. Thus, because of the linearity of the dot product, we may add \( V - U \) to both sides: \[ (V - U + X) \cdot C = V \cdot C. \] Notice that, since V is greedy, it must be lexicographically equal to or greater than any other representation of \( V \cdot C \), so \[ V - U + X \le V, \] which is the same as saying \[ X \le U, \] for any representation X of \( U \cdot C \). This means that U is the lexicographically greatest representation of \( U \cdot C \). In short, U is greedy. It is important to note that this last step was possible because lexicographical order is preserved under addition. We were able to add \( U - V \) to both sides because in doing so, the lexicographical relation between the left and the right-hand sides does not change.

In the proof of (a) we began with an arbitrary representation X of \( U \cdot C \) and showed that \( X \le U \). We can prove (b) in a similar manner, but we need a relation other than \( \le \). We need a relation \( \sqsubseteq \) such that when \( X \sqsubseteq U \) for any representation X of \( U \cdot C \), we can say that U is minimal. Note that, by definition, the minimal representation is the lexicographically greatest representation of minimal length. Thus, we may define \( \sqsubseteq \) as follows: \[ A \sqsubseteq B \Leftrightarrow \lvert A \rvert > \lvert B \rvert \text{ or } ( \lvert A \rvert = \lvert B \rvert \text{ and } A \le B). \] It turns out that to prove (b), we may use the same arguments we used to prove (a). We just have to replace \( \le \) with \( \sqsubseteq \). The only thing to note is that we used the fact that the lexicographical order is preserved under addition. Is the minimality order preserved under addition? To see that it is, note that if we add D to a representation A, we have that \( \lvert A + D \rvert = \lvert A \rvert + \lvert D \rvert \), assuming that \( A + D \) is non-negative. The second condition for \( \sqsubseteq \) follows from the fact that lexicographical order is preserved under addition. □

Let us now, given a coin system C, attempt to build the set of possible counterexamples. As stated above, if we can't find a counterexample among the elements of this set, then there is no counterexample at all, and therefore the system is canonical.

Let's suppose, for a moment, that w is the smallest counterexample. We have that \( \lvert M(C, w) \rvert < \lvert G(C, w) \rvert \). What can we say about \( M(C, w) \) and \( G(C, w)?\) Well, for starters, they can't both be non-zero in the same position. To see why, note that if they both had a non-zero entry in some position k, we could subtract 1 from that position in each of them to find two new representations, this time of \( w - c_k.\) By Lemma 1, the first would still be minimal and the second still greedy, so we would have that \[ \lvert M(C, w - c_k) \rvert < \lvert G(C, w - c_k) \rvert, \] which would contradict the fact that w is the smallest counterexample, since \( w - c_k \) would be an even smaller counterexample.

So we know that \( G(C, w) \) and \( M(C, w) \) have disjoint sets of non-zero positions: wherever one of them is non-zero, the other must be zero.

Before we make our main argument, let us make one more observation. Let i be the first position of \( M(C, w) \) that is non-zero. Since the two representations share no non-zero position, \( G(C, w) \) must be zero in position i, and because \( M(C, w) < G(C, w) \), there must be a position \( i' < i \) for which \( G(C, w) \) is non-zero. This also means that \( i > 1 \), i.e. \( M(C, w) \) begins with one or more zeros.

We now proceed as follows. We take a look at \( G(C, c_{i - 1} - 1),\) where i is the first non-zero position of \( M(C, w).\) We are going to find a relation between \( G(C, c_{i - 1} - 1) \) and \( M(C, w) \) that will allow us to compute \( M(C, w) \) directly from \( G(C, c_{i - 1} - 1) \). Then, we'll just compute \( G(C, c_{i - 1} - 1) \) for all \( 1 < i \le n \) and check whether any of the resulting candidates for w is a counterexample. If none of them is, it must mean that there is no smallest counterexample, and thus there is no counterexample at all, and thus the system C is canonical.

Theorem 1. For a given non-canonical coin system C, let \( M(C, w) \) be the minimal representation of the smallest counterexample w. Let i and j be the first and last non-zero positions of this representation, respectively. It is true that \( M(C, w) \) agrees with \( G(C, c_{i - 1} - 1) \) in all positions less than j, is one greater in position j, and zero in all positions after that.

Proof. This is a bold statement. How does one go about proving it? Well, for the easy part, notice that all positions that come after position j in \( M(C, w) \) are 0. This is by the definition of j. For the more complicated part, let's ask ourselves: with the representation of which number does \( M(C, w) \) agree in the first \( j - 1 \) positions? There are many, but an interesting one is \( w - c_j \). Notice that since \( w - c_j < w \), the minimal and greedy representations of \( w - c_j \) are the same. Subtracting \( c_j \) from w corresponds to subtracting 1 from the jth position in \( M(C, w),\) which, by Lemma 1, still yields a minimal representation. In short, \( M(C, w - c_j) \) differs from \( M(C, w) \) in only one position, in which it is one less. If we could manage to show that \[ M(C, w - c_j) \le G(C, c_{i - 1} - 1) < M(C, w), \] our job would be done. This is because the first \( j - 1 \) positions in \( G(C, c_{i - 1} - 1) \) would have to agree with the first \( j - 1 \) positions in \( M(C, w), \) since we know that they do agree between \( M(C, w - c_j) \) and \( M(C, w), \) and because the jth position in \( G(C, c_{i - 1} - 1) \) could only be one less than the jth position in \( M(C, w). \) It cannot be equal, because then \( G(C, c_{i - 1} - 1) \) would agree with \( M(C, w) \) up to position j and, since \( M(C, w) \) is zero everywhere after j, it could not be lexicographically smaller than it.

We are going to prove each of the two inequalities separately. Notice that the greedy representation of \( w - c_j \) can't use any coin as large as \( c_{i - 1} \). This is because the greedy representation of \( w - c_j \) is also its minimal representation, which, as we noted before, differs from \( M(C, w) \) in just one position, and \( M(C, w) \) uses no coin larger than \( c_i \). This means that \( w - c_j < c_{i - 1}: \) had \( w - c_j \) been equal to or larger than \( c_{i - 1} \), its greedy representation would necessarily have included a coin at least as large as \( c_{i - 1} \).

So we have that \( w - c_j < c_{i - 1} \), but how is this important? It means that \[ w - c_j \le c_{i - 1} - 1.\] This, in turn, means that \[ G(C, w - c_j) = M(C, w - c_j) \le G(C, c_{i - 1} - 1). \] Thus the first inequality is proven. For the second one, notice that the greedy representation of \( c_{i - 1} - 1 \) can't use a coin larger than \( c_i \), but also that \( c_{i - 1} - 1 \ge c_i, \) so it must use the coin \( c_i \). This is the same as saying that the ith position is non-zero in \( G(C, c_{i - 1} - 1) \). If we reduce it by 1 we get \( G(C, c_{i - 1} - 1 - c_i), \) the greedy representation of \( c_{i - 1} - 1 - c_i. \) At the same time, we also have that \( c_{i - 1} - 1 < w, \) since we know the greedy representation of w to use a coin at least as large as \( c_{i - 1}.\) By subtracting \( c_i \) from both sides of this inequality, it follows that \( c_{i - 1} - 1 - c_i < w - c_i. \) Remember that \( M(C, w) \) also has a non-zero entry in position i, and thus, by Lemma 1, if we decrease it by 1 we get the minimal representation of \( w - c_i, \) which must also be its greedy representation. This results in the following relation: \[ G(C, c_{i - 1} - 1 - c_i) < G(C, w - c_i). \] Finally, since lexicographical order is preserved under addition, we may re-add 1 to the ith position of both representations, obtaining the second inequality: \[ G(C, c_{i - 1} - 1) < M(C, w). \] Be careful here: by adding one to the ith position of \( G(C, w - c_i) \) we don't get \( G(C, w), \) but \( M(C, w). \) Minimality and greediness are not preserved under addition. Lemma 1 says that we can decrease numbers in a greedy or minimal representation and obtain another valid greedy or minimal representation, respectively. It does not say we can increase them with the same outcome.

Finally, to reiterate, we just proved that \[ M(C, w - c_j) \le G(C, c_{i - 1} - 1) < M(C, w). \] This relation forces \( G(C, c_{i - 1} - 1) \) to agree with \( M(C, w) \) in the first \( j - 1 \) positions, and to be exactly one less in position j. By definition, in \( M(C, w) \) all the positions that come after j are 0. □

Next, we are going to implement the checker. Theorem 1 tells us that, for a given system C, we may check all combinations of \( 1 < i \le j \le n \) and compute a counterexample candidate by taking \( G(C, c_{i - 1} - 1) \), increasing its jth position by 1, and setting all the following positions to 0. There are \( O(n^2) \) possibilities for i and j. Once we choose one, the computation of the greedy representation of \( c_{i - 1} - 1 \) is done in \( O(n) \) time. Consequently, our checker will run in \( O(n^3) \) time.

Implementing a Checker

As proven in the previous section, it suffices to check only \( O(n^2) \) candidates for a counterexample. If there is no counterexample among them, we know there are no counterexamples at all and therefore the system is canonical.

1 function checkCanonical(C)
2 n \( \leftarrow \) length(C)
3 for i from 2 to n
4 for j from i to n
5 M \( \leftarrow \) makeChangeGreedy(C, \( C_{i - 1} - 1 \))
6 \( M_j \leftarrow M_j + 1 \)
7 for k from j + 1 to n
8 \( M_k \leftarrow 0 \)
9 w \( \leftarrow M \cdot C\)
10 if \( \lvert \) makeChangeGreedy(C, w) \( \rvert > \lvert M \rvert \)
11 return false
12 return true
An algorithm to check whether a coin system C is canonical that runs in \( O(n^3) \) time, where n is the number of elements in C.
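Translated into Python, and reusing the make_change_greedy sketch from above, the checker might look as follows; the function name and the list-based representation are again only illustrative choices.

def check_canonical(coins):
    # Returns True if the greedy algorithm is optimal for every amount.
    # Assumes coins is strictly decreasing and ends in 1.
    n = len(coins)
    for i in range(1, n):           # the theorem's position i, zero-indexed here
        for j in range(i, n):       # position j, from i to the end
            m = make_change_greedy(coins, coins[i - 1] - 1)
            m[j] += 1               # one greater in position j ...
            for k in range(j + 1, n):
                m[k] = 0            # ... and zero in every position after it
            w = sum(r * c for r, c in zip(m, coins))
            if sum(make_change_greedy(coins, w)) > sum(m):
                return False        # greedy uses more coins than m: not canonical
    return True

# check_canonical([4, 3, 1]) == False
# check_canonical([200, 100, 50, 20, 10, 5, 2, 1]) == True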

So we have a quick way to check whether a system is canonical and a quick way to make change if it is. What about when the system is not canonical? As we said in the introduction, we don't know of any algorithm that makes change in polynomial time in the number of different coins in the system, but that doesn't mean we can't solve the problem.

Non-Canonical Coin Systems

The change-making problem, in the general case, is part of a set of problems called NP-hard. Let us briefly examine what that means.

The Computational Complexity of the Change-Making Problem

In complexity theory, we usually sort problems into a few categories. Many of them are in the set P, which stands for polynomial time. These are the decision problems that your day-to-day computer may solve in polynomial time with respect to the size of the input. There is another set of problems, called NP. These are the decision problems which your computer may not be able to solve in polynomial time, but for which, when the answer to the problem is yes, it can verify a proposed solution in polynomial time. NP stands for non-deterministic polynomial time; the reason behind the name is that those problems may be solved in polynomial time by a special kind of theoretical machine: a non-deterministic Turing machine. In theory, this machine is able to take multiple different steps at the same time, essentially being in multiple states at once. It is not possible to build one, and it's not the same as a quantum computer. This is in contrast to the deterministic Turing machine, which is more like your day-to-day computer. Any problem in P is also in NP, so we have P \( \subseteq \) NP. In fact, it's still an open question whether NP \( \subseteq \) P as well, i.e., whether P is the same as NP.

Inside NP, there exists a special kind of problem, called NP-complete; these are the hardest problems in the set. When we say one problem is harder than another, what we mean is that if we can solve the first, harder, problem, we automatically solve the other one (but not necessarily the other way around). In complexity theory, this is what we call a reduction. NP-complete problems are solvable in polynomial time by a non-deterministic Turing machine, but are not known to be solvable in polynomial time by a deterministic one. Notice that since, by definition, a problem in NP-complete is at least as hard as every problem in NP, it is true that all NP-complete problems can be reduced to one another.

There is yet one more interesting set of problems, called NP-hard. These problems are not necessarily solvable in polynomial time, not even by a non-deterministic Turing machine. They are as hard as or harder than the NP-complete problems. This is the set where we find the change-making problem.

To be fair, there is a discussion to be had here. All problems in NP are decision problems—problems that can only have yes or no as answers. The problem of deciding whether some amount of money can be represented at all in a given coin system (without our assumption that a coin of value 1 exists) is indeed NP-complete. The problem of finding the optimal representation is NP-hard, but it is not in NP, since it is not a decision problem. Problems in NP-hard need not be decision problems.

Making Change With Dynamic Programming

There are many ways to apply dynamic programming to the problem of making change. For brevity, we shall only compute the minimal number of coins to be used, but all algorithms can easily be extended to return the minimal representation along with its size.

Let us assume this time that \( c_1 < c_2 < \dots < c_n \) and that \( c_1 = 1.\) For starters, what would be the intuitive way to find the optimal representation for a target value t? We might devise the following recursive function \[ f(n, t) = \left\{ \begin{array}{ll} \text{min}(1 + f(n, t - c_n), f(n - 1, t)) & c_n < t \text{ and } n > 1, \\ 1 + f(n, t - c_n) & c_n < t \text{ and } n = 1, \\ f(n - 1, t) & c_n > t, \\ 1 & c_n = t. \\ \end{array} \right. \] In brief, this function works by looking at each coin in decreasing order and taking the better of two options: use one more instance of the coin, or skip it entirely. The problem is that it computes f for the same inputs multiple times. The first thing that comes to mind is to use memoization. With memoization, once we compute the value of f for some inputs, we store it in a table so we can retrieve it at a later time when we see those same inputs again, instead of having to recompute it. In this way, we reduce the asymptotic running time from exponential to \( O(nt). \)
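To make this concrete, here is a small Python sketch of the memoized recursion; the function names are made up for illustration, and the coins are assumed to be given in increasing order with a 1-valued coin first, as in the text.

from functools import lru_cache

def min_coins(coins, t):
    # Size of the minimal representation of t; coins sorted increasingly, coins[0] == 1.
    @lru_cache(maxsize=None)
    def f(n, amount):
        c = coins[n - 1]
        if c == amount:
            return 1
        if c > amount:
            return f(n - 1, amount)
        if n == 1:  # only the 1-valued coin remains
            return 1 + f(n, amount - c)
        return min(1 + f(n, amount - c), f(n - 1, amount))
    return f(len(coins), t)

# min_coins((1, 3, 4), 6) == 2

For large amounts the recursion depth becomes a practical concern in Python, which is one more reason to prefer the bottom-up formulation discussed next.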

Although recursive dynamic programming with memoization does work, it is sometimes better, from an implementation standpoint, to solve the problem from the bottom up.

Imagine the call dependency graph for \( f(n, t).\) In the recursive approach, we compute the nodes in a top-down manner, beginning with the root. For example, to compute \( f(3, 6), \) we move, in the figure below, in the direction of the arrows. However, because of the way that f was defined, it's easy to figure out what the leaf nodes are and compute those first. These are the cases where f does not make any other recursive calls, i.e. when \( c_n = t. \) After that, we compute the ones that can be computed directly from the leaves, and then the nodes that can be computed from those, and so on. We continue in this manner until we reach the root. Basically, we go against the dependency arrows.

Call dependency graph for f when \( C = (1, 3, 4). \) Leaves are highlighted in red. Arrows represent dependencies. For example, to compute \( f(2, 6), \) we first need to compute \( f(1, 6) \) and \( f(2, 3).\)

Is this helpful from a theoretical standpoint? Not really. The asymptotic time and space measurements remain the same. In practice, though, it does avoid the overhead of recursion.

In pseudocode, this approach to dynamic programming might look something like the function below.

1 function makeChangeDp(C, t)
2 n \( \leftarrow \) length(C)
3 M \( \leftarrow \) matrix(t, n, 0) // a t by n matrix initialized with zeros
4 for k from 1 to n
5 for x from 1 to t
6 if \( c_k = x \)
7 \( M_{x, k} \leftarrow \) 1
8 else if \( c_k > x\)
9 \( M_{x, k} \leftarrow M_{x, k - 1} \)
10 else if \( c_k < x \)
11 if k > 1
12 \( M_{x, k} \leftarrow \) min(\(M_{x - c_k, k} + 1, M_{x, k - 1}\))
13 else if k = 1
14 \( M_{x, k} \leftarrow \) \(M_{x - c_k, k} + 1\)
15 return \( M_{t, n} \)
A dynamic programming algorithm that computes the size of the minimal representation of an amount t in a coin system C. It runs in \( O(tn) \) time.
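The same table-filling idea, written out as a Python sketch; indices are zero-based here, unlike in the pseudocode above, and the coins are again assumed to be increasing with a 1-valued coin first.

def min_coins_dp(coins, t):
    # M[x][k]: fewest coins for amount x using only the first k + 1 coin types.
    n = len(coins)
    M = [[0] * n for _ in range(t + 1)]
    for k in range(n):
        c = coins[k]
        for x in range(1, t + 1):
            if c == x:
                M[x][k] = 1
            elif c > x:
                M[x][k] = M[x][k - 1]  # never reached for k == 0, since coins[0] == 1
            else:  # c < x
                use_it = M[x - c][k] + 1
                M[x][k] = use_it if k == 0 else min(use_it, M[x][k - 1])
    return M[t][n - 1]

# min_coins_dp([1, 3, 4], 6) == 2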

This dynamic programming approach is an elegant solution. It is simple to implement and gives an optimal solution for any input. It runs in \( O(tn) \) time, and can be classified as what we call a pseudo-polynomial-time algorithm. Can we do better?

Making Change With Polynomials

Let us revisit the case where \( C = (1, 3, 4). \) What amounts of money can we represent with only one coin? Well, that's simple: 1, 3, and 4. What amounts can we represent with one or two coins? Well, we still have the amounts which can be represented with one coin, to which we add the amounts which we can represent with exactly two coins: 1 + 1 = 2, 1 + 4 = 5, 3 + 3 = 6, 3 + 4 = 7, 4 + 4 = 8. Note that we skipped combinations such as 1 + 3 = 4 because we already have a 4-coin.

Generally speaking, with two coins or less we can represent the amount equal to any coin \( c_i, \) or the amount equal to any combination of coins \( c_i + c_j. \) For three coins or less, we have everything that we had for two coins, to which we add any combination \( c_i + c_j + c_k. \) Where else do these combinations arise? If you thought polynomial multiplication, you are correct.

Let's take a look at the polynomial \[ P(x) = 1 + x + x^3 + x^4. \] We added together terms of the form \( x^{c_{i}} \), plus an extra \( x^0 = 1 \), which corresponds to some sort of a value-less coin. If we write down \( P^2(x), \) we can see all the combinations arise as exponents: \[ P^2(x) = 1 + 2 x + x^2 + 2 x^3 + 4 x^4 + 2 x^5 + x^6 + 2 x^7 + x^8 \] There's also an added perk: the coefficients tell us in how many ways we can achieve the amount represented by the exponent, assuming that order matters and we use the value-less coin as well. We can see there are two ways to represent 7 = 3 + 4 = 4 + 3. We also see there are four ways to represent 4 = 0 + 4 = 1 + 3 = 3 + 1 = 4 + 0.
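We can double-check this expansion numerically; the snippet below, using numpy purely for illustration, squares P by convolving its coefficient list with itself (more on why that is the right operation in a later section).

import numpy as np

# Coefficients of P(x) = 1 + x + x^3 + x^4, lowest degree first.
p = np.array([1, 1, 0, 1, 1])

# Squaring the polynomial amounts to convolving its coefficient list with itself.
print(np.convolve(p, p))
# [1 2 1 2 4 2 1 2 1], i.e. 1 + 2x + x^2 + 2x^3 + 4x^4 + 2x^5 + x^6 + 2x^7 + x^8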

Why does this happen? The idea is that when we multiply two arbitrary polynomials P and R, a term \( x^k \) of P gets multiplied by each term of R, giving rise to terms \( x^{k + d_1},\) \( \dots, \) \(x^{k + d_n}, \) where \( d_1, \dots, d_n \) are exponents of all terms in R with a non-zero coefficient.

How can we use this information? For starters, we can use it to answer the question “what amounts can we represent with at most k coins?” We compute \( P^k(x) \) and see what terms arise. We can also answer the question “can t be represented with at most k coins?” by checking whether some multiple of \( x^t \) is among those terms. The original question, “what is the minimum number of coins needed to represent t?”, can be answered by sequentially computing powers of \( P(x) \) until we find one in which \( x^t \) has a non-zero coefficient. This, of course, would be the naïve way.

In what follows, we are going to abandon the terminology of coins and change, and instead focus on the following question: given a polynomial \( P(x) = 1 + x^{c_1} + x^{c_2} + \dots + x^{c_n} \), where \( c_1 = 1 \), and a number t, what is the smallest k such that \( x^t \) has a non-zero coefficient in \( P^k(x)? \) This is, as we discussed above, equivalent to our initial change-making problem. Note that we keep the assumption that one coin has value 1. This is, again, to ensure that any value can be represented.

Yes, we could sequentially compute powers of \( P(x) \) until we find a good one, but there are better ways, such as binary search or the lower-bound-finding method, the latter of which we shall present now. Let \( k_0 \) be the answer we are looking for: the smallest k such that \( x^t \) has a non-zero coefficient in \( P^k(x). \) We know that \( k_0, \) no matter what it is, can be written as \[ k_0 = \sum_{i = 0}^{m} b_i 2^i, \] where \( b_i \in \{ 0, 1 \} \) for all i. We just expressed \( k_0 \) as a sum of powers of 2, which is more or less its binary representation. You might be more used to this notation: \[ k_0 = (b_m b_{m - 1} \dots b_1 b_0)_2. \] Knowing this, we may write \[ P^{k_0}(x) = \prod_{i = 0}^{m} P^{b_i 2^i}(x). \] Now, to actually find \( k_0, \) we begin by computing \( m, \) which we take to be the smallest number such that \( P^{2^m} \) does contain a multiple of \( x^t; \) this guarantees that \( k_0 \le 2^m. \) It can be found by repeatedly squaring P until the result contains a multiple of \( x^t. \) Once we know m, we continue to compute the bits of \( k_0 \) in decreasing order of significance.

Once we know \( k_0, \) we have our answer. A method that implements this idea may be found below.

1 function findK0(P, t)
2 m \( \leftarrow \) 0, R \( \leftarrow \) P
3 while R does not contain a multiple of \( x^t \)
4 R \( \leftarrow R^2 \)
5 m \( \leftarrow \) m + 1
6
7 \( k_0 \leftarrow 0 \)
8 for i from m down to 0
9 if \( P^{k_0 + 2^i - 1} \) does not contain a multiple of \( x^t \)
10 \( k_0 \leftarrow k_0 + 2^i \)
11 return \( k_0 \)
An algorithm that solves the change-making problem by multiplying polynomials.
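A Python sketch of the whole procedure is given below. The polynomial is a plain list of coefficients, poly_mul is a naive multiplication truncated at degree t, and all the names are invented for illustration; a real implementation would cache the computed powers of P and use a faster multiplication, as discussed next.

def poly_mul(a, b, cap):
    # Multiply two coefficient lists, keeping only exponents up to cap.
    out = [0] * min(len(a) + len(b) - 1, cap + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j <= cap:
                out[i + j] += ai * bj
    return out

def poly_pow(p, k, cap):
    # Compute p(x)^k, truncated at degree cap, by repeated squaring.
    result = [1]
    base = p
    while k > 0:
        if k % 2 == 1:
            result = poly_mul(result, base, cap)
        base = poly_mul(base, base, cap)
        k //= 2
    return result

def contains(p, t):
    # Does x^t appear with a non-zero coefficient?
    return t < len(p) and p[t] != 0

def find_k0(coins, t):
    # Fewest coins needed for amount t, via powers of P(x) = 1 + sum of x^c over the coins.
    p = [0] * (max(coins) + 1)
    p[0] = 1
    for c in coins:
        p[c] = 1
    m, r = 0, p                       # find m: the smallest m with x^t present in P^(2^m)
    while not contains(r, t):
        r = poly_mul(r, r, t)
        m += 1
    k0 = 0                            # set the bits of k0, most significant first
    for i in range(m, -1, -1):
        if not contains(poly_pow(p, k0 + 2 ** i - 1, t), t):
            k0 += 2 ** i
    return k0

# find_k0([1, 3, 4], 6) == 2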

You may have noticed that, to compute \( k_0, \) we use a greedy approach similar to the one we used for making change in a canonical coin system. We start from the highest power of 2 that could still fit in \( k_0 \) and keep adding smaller powers whenever they fit.

The running time of this algorithm is dependent on the way we do the polynomial multiplication. Also, in practice, we would not compute the powers of P each time. Instead, we would make use of caching.

Multiplying Polynomials With Convolutions and the Fourier Transform

There are many ways to multiply polynomials. Let us examine one well-known method of doing it. We begin by defining two polynomials: \[ P(x) = \sum_{i = 0}^{m} p_i x^i \quad \text{ and } \quad R(x) = \sum_{i = 0}^{n} r_i x^i. \] With a bit of algebraic manipulation, their product is \[ \begin{align} P(x)R(x) &= \sum_{i = 0}^{m} \sum_{j = 0}^{n} p_i r_j x^{i + j} \\ &= \sum_{i = 0}^{m + n} \left( \sum_{j} p_{j} r_{i - j} \right) x^i, \end{align} \] where in the second sum j takes all values such that \( 0 \le j \le m \) and \( 0 \le i - j \le n. \) The term inside the parentheses is what we call the discrete convolution of the coefficients. Both lists of coefficients can be seen as discrete functions. The topic of convolutions is vast, and of great importance in mathematics and signal processing.

Aperiodic discrete convolutions do indeed arise when multiplying polynomials, and while this is an important fact, it's not the best one to lead with when first learning about convolutions. The topic is too large to be presented in this article. We are going to use this fact purely for practical reasons here, but to do so, we need to bring the Fourier transform into the spotlight. The Fourier transform is sadly yet another concept outside the scope of this article. Briefly, it is a method of decomposing a discrete or continuous function into its constituent frequencies. We are interested in the discrete case mainly because we can quickly compute the Fourier transform of a discrete function via a fast Fourier transform algorithm. To make use of this, we employ a theorem that says that for two functions f and g, we have that: \[ f * g = \mathscr{F}^{-1}\{ \mathscr{F}\{f\} \cdot \mathscr{F}\{g\}\}, \] where \( f * g \) is the convolution of the functions f and g, \( \mathscr{F} \) is the Fourier transform, and \( \mathscr{F}^{-1} \) is its inverse counterpart.
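In Python, the recipe (transform, multiply point-wise, transform back) might look like the sketch below; numpy's FFT routines are used, and the result is rounded back to integers since our coefficients are integral.

import numpy as np

def poly_mul_fft(a, b):
    # Multiply two integer coefficient lists via the convolution theorem.
    size = len(a) + len(b) - 1
    fa = np.fft.rfft(a, size)              # forward transforms, zero-padded to the product's length
    fb = np.fft.rfft(b, size)
    product = np.fft.irfft(fa * fb, size)  # point-wise multiplication, then the inverse transform
    return np.rint(product).astype(int)    # round away floating-point noise

# poly_mul_fft([1, 1, 0, 1, 1], [1, 1, 0, 1, 1]) -> [1 2 1 2 4 2 1 2 1]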

To compute the kth power of a polynomial \( P(x) \) we use, along with the convolution of coefficients via the Fourier transform, the idea of fast exponentiation: \[ P^k(x) = \left\{ \begin{array}{ll} \left( P^{k / 2}(x) \right)^2 & k \text{ is even} , \\ P(x) \cdot P^{k - 1}(x) & k \text{ is odd}. \end{array} \right. \] These are only some of the improvements we might make to the basic algorithm presented in the pseudocode above. There are many more optimizations, and also completely different approaches, that one may use.

Conclusion

I hope this served you well as an introduction to the problem of change-making. Along with others, such as the subset sum problem, it belongs to a family of problems called knapsack problems. Mathematicians and computer scientists have been studying them extensively for more than one hundred years now. Their applications span low-level computing, networking, stock trading, filling your backpack in your favourite RPG and, of course, the problem of giving you back the £3.80 that the vending machine owes you, along with your pack of Oreo cookies—in as few coins as possible, of course.

See Also

As I mentioned, the change-making problem and its siblings are being studied all the time because of their many and diverse applications. Let me point you in the direction of some of the works I used in the writing of this article.

  • On the Change-Making Problem by Timothy M. Chan and Qizheng He, 2020. We presented a method that makes change in \( O(t \log^3 t) \) time; this paper lowers that to \( O( t \log t ) \) by means of a randomized algorithm.
  • A Polynomial-time Algorithm for the Change-Making Problem by David Pearson, 1994. This is the paper that first described a polynomial time algorithm to check whether a coin system is canonical.
  • Knapsack Problems by Hans Kellerer, Ulrich Pferschy, and David Pisinger, published by Springer in 2004. Since 2004, new research has changed the picture quite significantly in the space of knapsack problems. Nevertheless, this book is a very good introduction and overview of many of those problems and their applications.