Counting and Randomly Generating k-Ary Trees

k-ary trees are one of the most basic data structures in Computer Science. A new method is presented to determine how many there are with n nodes. This method gives additional insight into their structure and provides a new algorithm to efficiently generate such a tree randomly.


Introduction
The number, $b_{n,k}$, of k-ary trees with n nodes is well known and given in [1] as $b_{n,k} = \frac{C(kn, n)}{(k-1)n + 1}$, where $C(n, k)$ denotes the number of ways to choose k places from n places, which is $\frac{n!}{k!\,(n-k)!}$. This paper generalizes the results from [2] on binary trees with n nodes to k-ary trees with n nodes by providing a simple direct approach to finding $b_{n,k}$ and a new method to generate a random k-ary tree with n nodes efficiently. The direct approach to finding $b_{n,k}$ relies on the detailed structure of the trees developed here, rather than on the standard recursive description of the tree and the solution of the resulting recurrence relations. Another approach to random generation is given in [3]. The enumeration of k-ary trees is done in [4]. The generation of binary and k-ary trees has been and continues to be of interest [5].
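As a quick numerical check of the closed form above, the following Python sketch evaluates $b_{n,k}$ directly. The function names are illustrative and do not come from the paper; `count_via_sequences` uses the algebraically equivalent form $C(kn, n-1)/n$.

```python
from math import comb


def count_kary_trees(n: int, k: int) -> int:
    """Number of k-ary trees with n nodes: C(kn, n) / ((k - 1)n + 1)."""
    return comb(k * n, n) // ((k - 1) * n + 1)


def count_via_sequences(n: int, k: int) -> int:
    """The same count written as C(kn, n - 1) / n."""
    return comb(k * n, n - 1) // n


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_kary_trees(n, 2), count_kary_trees(n, 3))
```

For k = 2 the values are the Catalan numbers, which gives an easy sanity check on small inputs.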

Construction Procedure
The unique k-ary tree that generated this sequence of tuples can easily be constructed by processing the tuples from left to right, effectively building the tree as it is traversed in preorder. This procedure builds the tree as shown below:
[Figure: spines 1 to 7 of the example tree.]
We will call these spines 1 to 7 of the tree, where spine i contains nodes 1 to i and their branches. The number of unused branches in spine i is the number of branches minus (i − 1).
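The left-to-right construction can be sketched in Python as follows. This is a minimal sketch under an assumed encoding: each node is a k-tuple of 0/1 entries, with entry j equal to 1 when branch j leads to a child. The name `build_tree` and the `children` table it returns are illustrative choices, not the paper's notation.

```python
def build_tree(seq):
    """Build a k-ary tree from 0/1 k-tuples listed in preorder.

    Returns `children`, where children[i][j] is the index of the node on
    branch j of node i, or None if that subtree is null.
    Raises ValueError if the construction cannot be completed.
    """
    children = [[None] * len(tup) for tup in seq]
    pending = []  # stack of (parent, branch) slots still waiting for a child
    for i, tup in enumerate(seq):
        if i > 0:
            if not pending:
                raise ValueError(f"no branch available for node {i + 1}")
            parent, branch = pending.pop()  # first available branch in preorder
            children[parent][branch] = i
        # Push the rightmost branch first so the leftmost is popped first.
        for j in reversed(range(len(tup))):
            if tup[j]:
                pending.append((i, j))
    if pending:
        raise ValueError("some branches were never used")
    return children


if __name__ == "__main__":
    print(build_tree([(1, 1), (0, 0), (0, 0)]))
```

The stack holds exactly the unused branches of the current spine, so its top is the first available branch met in preorder from the most recently added node.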

Valid Sequences
Each node of an n node k-ary tree, except for the root, has a unique branch coming into it. Since each branch corresponds to a unique 1 in the tree's n node k-ary sequence, there must be n − 1 1's in the sequence. Each of the tree's null subtrees corresponds to a unique one of the (k − 1)n + 1 0's. There are $C(kn, n-1)$ of these sequences of n k-tuples, one for each way the n − 1 1's can be assigned to the kn places. Not all of these allow our procedure to construct an n node k-ary tree. Those that do we call valid sequences and the others invalid.
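The counts of 1's and 0's can be checked on a small tree. The sketch below assumes a tree is stored as a nested k-tuple of subtrees with `None` for a null subtree; that representation, the name `encode`, and the five-node binary tree are all illustrative rather than taken from the paper.

```python
def encode(tree):
    """Return the preorder list of 0/1 k-tuples for a nested-tuple tree."""
    seq = []
    stack = [tree]
    while stack:
        node = stack.pop()
        seq.append(tuple(0 if child is None else 1 for child in node))
        # Push children right to left so the leftmost is visited next.
        stack.extend(child for child in reversed(node) if child is not None)
    return seq


if __name__ == "__main__":
    leaf = (None, None)
    tree = ((leaf, None), (None, leaf))  # a 5-node binary tree (k = 2)
    seq = encode(tree)
    n, k = len(seq), 2
    ones = sum(map(sum, seq))
    print(seq)
    print(ones == n - 1, n * k - ones == (k - 1) * n + 1)
```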

A Look Ahead
The approach here is to confirm two facts. First, that our construction procedure to generate a tree from a valid sequence establishes a 1-1 correspondence between n node k-ary trees and valid sequences.
Secondly, every invalid sequence is one of the n − 1 distinct rotations generated from a unique valid sequence, and each rotation is also distinct from the valid sequence. Thus, each invalid sequence can be associated with a unique tree. Since the number of valid sequences plus the number of invalid sequences equals the total number of sequences, it must then be the case that
$$b_{n,k} + (n-1)\,b_{n,k} = C(kn, n-1).$$
Solving for $b_{n,k}$ we obtain
$$b_{n,k} = \frac{C(kn, n-1)}{n}.$$
Rotation i of a valid sequence is obtained by shifting its first i tuples from the front to the rear of the sequence. For our example, rotation 3 is:
[Figure: rotation 3 of the example sequence.]
Applying our construction algorithm to rotation 3 produces:
[Figure: the trees produced from rotation 3.]
Notice that the last, incomplete tree is spine 3 of the original tree. If the first subtree is added to the first available branch of the spine and the second to the second available branch, the original tree is obtained. We will see that our construction applied to rotation i of a tree will always produce r subtrees followed by spine i, and that adding subtree j to the jth available branch for 1 ≤ j ≤ r produces the original tree.
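The behaviour of the construction on a rotation can be illustrated with a short Python sketch. The five-node binary sequence used here is a hypothetical example, not the paper's seven-node one, and the names `rotate` and `split_into_pieces` are illustrative. The sketch starts a new tree whenever no branch is free, which is one way of reading "applying the construction" to an invalid sequence.

```python
def rotate(seq, i):
    """Rotation i: move the first i tuples from the front to the rear."""
    return seq[i:] + seq[:i]


def split_into_pieces(seq):
    """Scan left to right, starting a new tree whenever no branch is free.

    Returns (pieces, unused): the consecutive runs of tuples that were
    built together, and the number of unused branches in the last run.
    """
    pieces, start, free = [], 0, 0
    for i, tup in enumerate(seq):
        if i > start:
            if free == 0:          # previous tree is complete
                pieces.append(seq[start:i])
                start = i
            else:                  # attach node to the first free branch
                free -= 1
        free += sum(tup)
    pieces.append(seq[start:])
    return pieces, free


if __name__ == "__main__":
    valid = [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]
    pieces, unused = split_into_pieces(rotate(valid, 2))
    print(pieces)   # two complete subtrees, then spine 2
    print(unused)   # unused branches left in the spine
```

In this example every rotation i ends with the first i tuples of the valid sequence, and the number of unused branches equals the number of complete subtrees that precede the spine.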

Excess Sequences and Valid Sequences Are the Same
Let $N_i$ be the number of 1's in the first i k-tuples of any n node k-tuple sequence. If $N_i > i - 1$ for each $1 \le i < n$, and $N_n = n - 1$, we say the sequence is an excess sequence.
In general, spine i + 1 is produced from spine i when node i + 1 is processed and added to spine i. When node i + 1 is processed it becomes the child of the first available branch encountered in the preorder traversal starting from node i. This branch must be available since the first i nodes used the first i − 1 available branches encountered and $N_i$ was greater than i − 1. Since this is true for each i < n − 1, the first n − 1 spines can be built. Since $N_{n-1} > n - 2$, a branch is available for node n to be added to spine n − 1. However, since $N_n = n - 1$, all the branches will then have been used and a tree has been constructed. This shows that an excess sequence is a valid sequence.
Conversely, a valid sequence must have an unused branch to which node i + 1 is added in the construction procedure. The first i nodes used i − 1 of the branches, so $N_i$ must have been greater than i − 1 for each i < n. When i is n, $N_n = n - 1$, so after the nth node is added all the n − 1 branches have been used. Consequently, a valid sequence is an excess sequence.
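The excess condition is easy to test with running sums. The sketch below is one possible Python rendering; `prefix_ones` and `is_excess` are illustrative names, and the sequences are lists of 0/1 k-tuples as assumed throughout.

```python
from itertools import accumulate


def prefix_ones(seq):
    """Return [N_1, ..., N_n], where N_i counts the 1's in the first i tuples."""
    return list(accumulate(sum(tup) for tup in seq))


def is_excess(seq):
    """True if N_i > i - 1 for every i < n and N_n = n - 1."""
    n = len(seq)
    counts = prefix_ones(seq)
    return (
        all(counts[i - 1] > i - 1 for i in range(1, n))
        and counts[n - 1] == n - 1
    )


if __name__ == "__main__":
    print(is_excess([(1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]))
    print(is_excess([(0, 0), (0, 1), (0, 0), (1, 1), (1, 0)]))
```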

Every Invalid Sequence Is One of the n − 1 Rotations of a Valid Sequence
If a sequence, s, is not an excess sequence it must be an invalid sequence. All n node k-ary sequences satisfy $N_n = n - 1$, so for s to fail to be an excess sequence there must be a smallest $n_1 < n$ such that $N_{n_1} = n_1 - 1$. Thus, the first $n_1$ tuples of s must be an excess sequence and so represent an $n_1$ node k-ary tree. In fact, the sequence must consist of just r consecutive excess sequences of lengths $n_1, n_2, \ldots, n_r$ followed by a last sequence of length $n - (n_1 + n_2 + \cdots + n_r)$.
Each of the first r sequences then has $n_i - 1$ 1's, 1 ≤ i ≤ r, and represents an $n_i$ node tree. The last sequence, L, must have $N = n - (n_1 + n_2 + \cdots + n_r)$ tuples containing $N - 1 + r$ 1's, and we will refer to its nodes as node 1 to N. Now $n_1 + n_2 + \cdots + n_r$ cannot be n since, if it were, r must be 1 and $n_1$ would contain all the nodes, so s would be an excess sequence rather than the original invalid sequence. Also, r cannot be n, otherwise each $n_i$, 1 ≤ i ≤ r, must be 1 so $n_1 + n_2 + \cdots + n_r$ would be n, which we know cannot happen.
Lemma. The construction procedure applied to a last sequence L produces spine N of a tree with r unused branches.
Proof. Number the nodes of L from 1 to N and let $t_i$ be the number of 1's in tuple i, 1 ≤ i ≤ N. After processing the first i nodes of L, the construction procedure creates spine i of a tree with $(t_1 + t_2 + \cdots + t_i) - (i - 1)$ unused branches, where $M_i = t_1 + t_2 + \cdots + t_i$ is the number of 1's in spine i. As long as $M_i$ is greater than i − 1, node i + 1 can be added to the spine. If any $M_i$ became equal to i − 1, then the first i tuples of L would form a head of L that is an excess sequence. Since L cannot have such a head, this cannot happen. So, spine N is produced and has $M_N - (N - 1) = r$ unused branches.
Each of the $n_j$ sequences represents a subtree, 1 ≤ j ≤ r, and the first node of the jth subtree can be made the node into which the jth unused branch in spine N comes. This effectively "hangs" the jth subtree from the jth unused branch, in preorder order. This means that the original invalid sequence was rotation N of the tree just created. Hence, every invalid sequence is one of the n − 1 rotations of a valid sequence.
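Because the subtrees are hung in preorder order, the valid sequence is simply the last sequence followed by the subtree sequences. The following Python sketch recovers it in a single pass. It assumes the input holds exactly n − 1 1's, and `to_valid` is an illustrative name, not the paper's.

```python
def to_valid(seq):
    """Return the valid sequence of which `seq` is a rotation.

    Assumes `seq` is a list of n 0/1 k-tuples containing exactly n - 1 ones.
    """
    start = 0  # index where the current (possibly incomplete) tree begins
    free = 0   # unused branches in the current tree
    for i, tup in enumerate(seq):
        if i > start:
            if free == 0:
                start = i      # previous tree was complete; begin a new one
            else:
                free -= 1      # node i uses the first available branch
        free += sum(tup)
    # seq[start:] is the last sequence L; seq[:start] holds the r subtrees.
    return seq[start:] + seq[:start]


if __name__ == "__main__":
    invalid = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 0)]
    print(to_valid(invalid))
```

The scan touches each of the kn entries once, in line with the O(nk) bound stated in the Conclusions.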

There Is a One-to-One Correspondence between n Node k-Ary Trees and Valid Sequences
Spine i is generated by rotation i for 1 ≤ i < n and these spines are all distinct.
Consequently, the n − 1 rotations that generated them must be distinct. A valid sequence generates spine n, which is the tree itself, so each tree's spine n must be distinct. This establishes a one-to-one correspondence between n node k-ary trees and valid sequences.
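For small n and k the two facts can be confirmed by brute force. The sketch below enumerates every sequence with n − 1 1's, groups them by rotation, and checks that each group has n distinct members of which exactly one is valid. All names are illustrative, and validity is tested by simulating the construction rather than by building trees.

```python
from itertools import combinations
from math import comb


def all_sequences(n, k):
    """Yield every sequence of n 0/1 k-tuples with exactly n - 1 ones."""
    for ones in combinations(range(k * n), n - 1):
        chosen = set(ones)
        yield tuple(
            tuple(1 if i * k + j in chosen else 0 for j in range(k))
            for i in range(n)
        )


def is_valid(seq):
    """True if every node after the first finds a free branch."""
    free = 0
    for i, tup in enumerate(seq):
        if i > 0:
            if free == 0:
                return False
            free -= 1
        free += sum(tup)
    return free == 0


def check_correspondence(n, k):
    """Verify the rotation classes and return the number of valid sequences."""
    seqs = set(all_sequences(n, k))
    valid = [s for s in seqs if is_valid(s)]
    covered = set()
    for v in valid:
        rotations = {v[i:] + v[:i] for i in range(n)}
        assert len(rotations) == n                  # rotations are distinct
        assert sum(map(is_valid, rotations)) == 1   # only the original is valid
        assert covered.isdisjoint(rotations)        # classes do not overlap
        covered |= rotations
    assert covered == seqs                          # every sequence is reached
    return len(valid)


if __name__ == "__main__":
    for n, k in [(3, 2), (4, 3), (5, 2)]:
        print(n, k, check_correspondence(n, k), comb(k * n, n - 1) // n)
```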

Conclusions
This confirms the two facts referred to earlier. The procedure described in the lemma allows us to construct the n node k-ary tree corresponding to any n node k-ary sequence and to do it in O(nk) time.
To generate an n node k-ary tree at random, merely modify the algorithm in [9] so n − 1 integers are selected in step 1, let them determine where the n − 1 1's are placed in the n k-tuples to give an n node k-ary sequence, and use our construction specified in the lemma to find the unique tree it produces. Since all the sequences are equally likely to be produced in step 2 and each tree will be produced by an equal number of sequences, this modification generates an n node k-ary tree at random. It takes O(nk) time and uses integers no larger than kn.
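The generation steps can be sketched end to end in Python. The selection algorithm of [9] is not reproduced here; this sketch substitutes the standard library's `random.sample` to choose the n − 1 positions, which is an assumption of the example rather than the paper's method. The function name and the `children` table are likewise illustrative.

```python
import random


def random_kary_tree(n, k, rng=random):
    """Return a random k-ary tree with n nodes as a children table.

    children[i][j] is the index of the node on branch j of node i, or None.
    """
    # Step 1: place n - 1 ones among the kn positions.
    flat = [0] * (k * n)
    for pos in rng.sample(range(k * n), n - 1):
        flat[pos] = 1
    seq = [tuple(flat[i * k:(i + 1) * k]) for i in range(n)]

    # Step 2: find where the last (incomplete) tree starts and rotate it
    # to the front, giving the valid sequence for this rotation class.
    start = free = 0
    for i, tup in enumerate(seq):
        if i > start:
            if free == 0:
                start = i
            else:
                free -= 1
        free += sum(tup)
    seq = seq[start:] + seq[:start]

    # Step 3: build the tree from the valid sequence in preorder.
    children = [[None] * k for _ in range(n)]
    pending = []  # unused branches, leftmost on top
    for i, tup in enumerate(seq):
        if i > 0:
            parent, branch = pending.pop()
            children[parent][branch] = i
        pending.extend((i, j) for j in reversed(range(k)) if tup[j])
    return children


if __name__ == "__main__":
    print(random_kary_tree(6, 3, random.Random(2024)))
```

Every tree arises from exactly n of the equally likely sequences, so no rejection step is needed, and the positions drawn are integers below kn.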

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.