Combinatorial number system
In mathematics, and in particular in combinatorics, the combinatorial number system of degree k (for some positive integer k), also referred to as combinadics, is a correspondence between natural numbers (taken to include 0) N and k-combinations, represented as strictly decreasing sequences ck > ... > c2 > c1 ≥ 0. Since the latter are strings of numbers, one can view this as a kind of numeral system for representing N, although the main utility is representing a k-combination by N rather than the other way around. Distinct numbers correspond to distinct k-combinations, and produce them in lexicographic order; moreover the numbers less than correspond to all k-combinations of { 0, 1, ..., n − 1}. The correspondence does not depend on the size n of the set that the k-combinations are taken from, so it can be interpreted as a map from N to the k-combinations taken from N; in this view the correspondence is a bijection.
The number N corresponding to (ck,...,c2,c1) is given by
The fact that a unique sequence so corresponds to any number N was observed by D. H. Lehmer.[1] Indeed a greedy algorithm finds the k-combination corresponding to N: take ck maximal with , then take ck − 1 maximal with , and so forth. Finding the number N, using the formula above, from the k-combination (ck,...,c2,c1) is also known as "ranking", and the opposite operation (given by the greedy algorithm) as "unranking"; these operations are known by those names in most computer algebra systems, and in computational mathematics.[2][3]
The originally used term "combinatorial representation of integers" is shortened to "combinatorial number system" by Knuth,[4] who also gives a much older reference;[5] the term "combinadic" is introduced by James McCaffrey [6] (without reference to previous terminology or work).
Unlike the factorial number system, the combinatorial number system of degree k is not a mixed radix system: the part of the number N represented by a "digit" ci is not obtained from it by simply multiplying by a place value.
The main application of the combinatorial number system is that it allows rapid computation of the k-combination that is at a given position in the lexicographic ordering, without having to explicitly list the k-combinations preceding it; this allows for instance random generation of k-combinations of a given set. Enumeration of k-combinations has many applications, among which software testing, sampling, quality control, and the analysis of lottery games.
Ordering combinations
A k-combination of a set S is a subset of S with k (distinct) elements. The main purpose of the combinatorial number system is to provide a representation, each by a single number, of all possible k-combinations of a set S of n elements. Choosing, for any n, {0, 1, ..., n − 1} as such a set, it can be arranged that the representation of a given k-combination C is independent of the value of n (although n must of course be sufficiently large); in other words considering C as a subset of a larger set by increasing n will not change the number that represents C. Thus for the combinatorial number system one just considers C as a k-combination of the set N of all natural numbers, without explicitly mentioning n.
In order to ensure that the numbers representing the k-combinations of {0, 1, ..., n − 1} are less than those representing k-combinations not contained in {0, 1, ..., n − 1}, the k-combinations must be ordered in such a way that their largest elements are compared first. The most natural ordering that has this property is lexicographic ordering of the decreasing sequence of their elements. So comparing the 5-combinations C = {0,3,4,6,9} and C' = {0,1,3,7,9}, one has that C comes before C', since they have the same largest part 9, but the next largest part 6 of C is less than the next largest part 7 of C'; the sequences compared lexicographically are (9,6,4,3,0) and (9,7,3,1,0). Another way to describe this ordering is view combinations as describing the k raised bits in the binary representation of a number, so that C = {c1,...,ck} describes the number
(this associates distinct numbers to all finite sets of natural numbers); then comparison of k-combinations can be done by comparing the associated binary numbers. In the example C and C' correspond to numbers 10010110012 = 60110 and 10100010112 = 65110, which again shows that C comes before C'. This number is not however the one one wants to represent the k-combination with, since many binary numbers have a number of raised bits different from k; one wants to find the relative position of C in the ordered list of (only) k-combinations.
Place of a combination in the ordering
The number associated in the combinatorial number system of degree k to a k-combination C is the number of k-combinations strictly less than C in the given ordering. This number can be computed from C = { ck, ..., c2, c1 } with ck > ... > c2 > c1 as follows. From the definition of the ordering it follows that for each k-combination S strictly less than C, there is a unique index i such that ci is absent from S, while ck, ..., ci+1 are present in S, and no other value larger than ci is. One can therefore group those k-combinations S according to the possible values 1, 2, ..., k of i, and count each group separately. For a given value of i one must include ck, ..., ci+1 in S, and the remaining i elements of S must be chosen from the ci non-negative integers strictly less than ci; moreover any such choice will result in a k-combinations S strictly less than C. The number of possible choices is , which is therefore the number of combinations in group i; the total number of k-combinations strictly less than C then is
and this is the index (starting from 0) of C in the ordered list of k-combinations. Obviously there is for every N ∈ N exactly one k-combination at index N in the list (supposing k ≥ 1, since the list is then infinite), so the above argument proves that every N can be written in exactly one way as a sum of k binomial coefficients of the given form.
Finding the k-combination for a given number
The given formula allows finding the place in the lexicographic ordering of a given k-combination immediately. The reverse process of finding the k-combination at a given place N requires somewhat more work, but is straightforward nonetheless. By the definition of the lexicographic ordering, two k-combinations that differ in their largest element ck will be ordered according to the comparison of those largest elements, from which it follows that all combinations with a fixed value of their largest element are contiguous in the list. Moreover the smallest combination with ck as largest element is , and it has ci = i − 1 for all i < k (for this combination all terms in the expression except are zero). Therefore ck is the largest number such that . If k > 1 the remaining elements of the k-combination form the k − 1-combination corresponding to the number in the combinatorial number system of degree k − 1, and can therefore be found by continuing in the same way for and k − 1 instead of N and k.
Example
Suppose one wants to determine the 5-combination at position 72. The successive values of for n = 4, 5, 6, ... are 0, 1, 6, 21, 56, 126, 252, ..., of which the largest one not exceeding 72 is 56, for n = 8. Therefore c5 = 8, and the remaining elements form the 4-combination at position 72 − 56 = 16. The successive values of for n = 3, 4, 5, ... are 0, 1, 5, 15, 35, ..., of which the largest one not exceeding 16 is 15, for n = 6, so c4 = 6. Continuing similarly to search for a 3-combination at position 16 − 15 = 1 one finds c3 = 3, which uses up the final unit; this establishes , and the remaining values ci will be the maximal ones with , namely ci = i − 1. Thus we have found the 5-combination {8, 6, 3, 1, 0}.
National Lottery example in Excel
For each of the lottery combinations c1 < c2 < c3 < c4 < c5 < c6 < , there is a list number N between 0 and which can be found by adding
Suppose you want to find a position of a British National Lottery result 3,6,15,17,18,35 in a list of possible results. Excel function COMBIN(49,6) would show that number of results is 13983816. Now if you put numbers 3,6,15,17,18,35 each in its cell and formulas COMBIN(49-3,6), COMBIN(49-6,5), COMBIN(49-15,4), COMBIN(49-17,3), COMBIN(49-18,2), 49−35 under each of them. Use cell references instead of actual values, the actual values are provided for readability. You would get a row of numbers of 9366819,962598,46376,4960,465,14. Adding these would show particular position 10381232. Note that you do not need use formula COMBIN(49-35,1) to get 14. You can have it by subtraction 49-35 alone. Also the COMBIN function does not return 0 in case 49-X becomes less than 6. You'd need to use IF with ISNUMBER function to convert #NUM! into 0.
Now the reverse engineering is a bit trickier. You could use 49 IF statements in one cell or use solver to find maximum argument for COMBIN result to be less or equal than position number. Instead let's use a table of 6 by 49 and use MATCH function where resulting row number would be the argument and a ball number. If you make column headings from 6 to 1 (B1:G1) in descending order and row labels of 1 to 49 (A2:A50) in ascending order (vertically ascending in Excel means numbers growing from top to bottom). Then if you fill up the table with formula COMBIN($A2,B$1) starting from left top corner. $ signs will make sure that index values are always taken from heading row and label column. Replace #NUM! with zeros. You should get something like this:
6 5 4 3 2 1 1 0 0 0 0 0 1 2 0 0 0 0 1 2 3 0 0 0 1 3 3 4 0 0 1 4 6 4 5 0 1 5 10 10 5 6 1 6 15 20 15 6 7 7 21 35 35 21 7 8 28 56 70 56 28 8 9 84 126 126 84 36 9 10 210 252 210 120 45 10 11 462 462 330 165 55 11 12 924 792 495 220 66 12 ... 49 13983816 1906884 211876 18424 1176 49
Now if you create a new row of six cells and fill them with formulas which would find the largest values in each column which are less than or equal to position number in question: The first cell with =INDEX(B2:B50,MATCH(10381232,B2:B50)), the rest of the cells with
INDEX(C2:C50,MATCH(10381232-SUM(of previous cells),C2:C50)) ... INDEX(G2:G50,MATCH(10381232-SUM(of previous cells),G2:G50))
This would present you with a row of values you have already seen 9366819,962598,46376,4960,465,14 Now in a next row first cell write =49-MATCH(10381232,B2:B50) and similarly
=49-MATCH(10381232-9366819,C2:C50) ... =49-MATCH(10381232-9366819-962598-46376-4960-465,G2:G50)
Again use the references to cells instead of actual values. You should be presented with original ball numbers of 3,6,15,17,18,35.
Now you can generate a fresh Lottery number combination from single =randbetween(1,combin(49,6)) or you could look at the list position numbers of earlier results to see if there is a trend.
Applications
One could use the combinatorial number system to list or traverse all k-combinations of a given finite set, but this is a very inefficient way to do that. Indeed, given some k-combination it is much easier to find the next combination in lexicographic ordering directly than to convert a number to a k-combination by the method indicated above. To find the next combination, find the smallest i ≥ 2 for which ci ≥ ci−1+2 (if there is no such i, take i = k+1); then increase ci−1 by one and set all cj with j < i − 1 to their minimal value j − 1. If the k-combination is represented by its characteristic vector, that is as a binary value with k bits 1, then the next such value can be computed without any loop using bitwise arithmetic: the following C++function will advance x to that value or return false:
// find next k-combination
bool next_combination(unsigned long& x) // assume x has form x'01^a10^b in binary
{
unsigned long u = x & -x; // extract rightmost bit 1; u = 0'00^a10^b
unsigned long v = u + x; // set last non-trailing bit 0, and clear to the right; v=x'10^a00^b
if (v==0) // then overflow in v, or x==0
return false; // signal that next k-combination cannot be represented
x = v +(((v^x)/u)>>2); // v^x = 0'11^a10^b, (v^x)/u = 0'0^b1^{a+2}, and x ← x'100^b1^a
return true; // successful completion
}
This is called Gosper's hack;[7] corresponding assembly code was described as item 175 in HAKMEM.
On the other hand the possibility to directly generate the k-combination at index N has useful applications. Notably, it allows generating a random k-combination of an n-element set using a random integer N with , simply by converting that number to the corresponding k-combination. If a computer program needs to maintain a table with information about every k-combination of a given finite set, the computation of the index N associated with a combination will allow the table to be accessed without searching.
See also
- Factorial number system (also called factoradics)
- Primorial number system
References
- ↑ Applied Combinatorial Mathematics, Ed. E. F. Beckenbach (1964), pp.27−30.
- ↑ http://www.site.uottawa.ca/~lucia/courses/5165-09/GenCombObj.pdf
- ↑ http://www.sagemath.org/doc/reference/sage/combinat/subset.html
- ↑ Knuth, D. E. (2005), "Generating All Combinations and Partitions", The Art of Computer Programming, 4, Fascicle 3, Addison-Wesley, pp. 5−6, ISBN 0-201-85394-9.
- ↑ Pascal, Ernesto (1887), Giornale di Matematiche, 25, pp. 45−49
- ↑ McCaffrey, James (2004), Generating the mth Lexicographical Element of a Mathematical Combination, Microsoft Developer Network
- ↑ Knuth, D. E. (2009), "Bitwise tricks and techniques", The Art of Computer Programming, 4, Fascicle 1, Addison-Wesley, p. 54, ISBN 0-321-58050-8