1. Introduction
Classically, a word over an alphabet Ω is a finite sequence
of letters taken from Ω and a run in a word
is a maximal subsequence of consecutive identical letters in
, see, for example, Feller [1], or Mood [2]. A slightly different point of view consists of interpreting a word over Ω as the result of the replacement of each element of a linear digraph by an arbitrary letter of Ω. A run in a word is then interpreted as a maximal linear sub-digraph whose letters are all identical. Here are four runs,
, in a 9-letter word over the alphabet
) obtained under this interpretation (see below).
(1)
Using this point of view, words can now be easily “structured” in various ways. For example, starting from the digraph of a circular permutation (instead of a linear digraph, as above) gives rize to the concept of a circular word studied probabilistically in the basic paper of Koutras et al. [3]. In a similar way, any combinatorial species F, in the sense of Joyal [4], leads to a corresponding notion of F-structured word (or F-word, for short). Technically speaking, a F-word of length n over an alphabet Ω is an orbit of the following natural action of the symmetric group
,
(2)
defined by
where
is the set of F-structures on
,
is the result of relabelling the
-structure
along the permutation
, and
is an ordinary word of length
over the alphabet Ω. Figure 1 illustrates this action for
,
the left drawing shows the ordered pair
where
is a graph on [5] and
, the
-th letter of the word
, is written just under its associated vertex
in the graph, for
; while the right drawing shows the resulting
under the action of
.
It is important to note that under this action, the labels
of the underlying set of the structure
are permuted while each letter is kept in a fixed position. This is a general fact and it constitutes the reason why the orbits of action (2) can be though of as unlabelled F-structures whose nodes are replaced (or filled) by letters of the alphabet, that is, F-words. Summing the orbits of (2) over
, gives rize to the concept of an analytic functor in the sense of Joyal [5],
(1) |
of the category of sets into itself, sending each set Ω (= alphabet) to the set of all F-words over Ω.
There exist special classes of F-words, for which a natural notion of “run” can be defined. This is the case when F is a graphical species, whose structures are of the form of a simple graph possibly equipped with extra structures. Standard examples of graphical species include: the species of digraphs, of circular permutations, of tree-like structures (e.g., when F is the species of
-enriched rooted trees or the species of R-enriched trees in the sense of Labelle [6]), etc.
Definition 1 Let F be a graphical species and
be a F-word over an alphabet Ω. A run in
is a maximal connected subgraph of
having identical letters.
This definition makes sense since the relabellings of F-structures under the action (2) is letter-invariant. Figure 2 shows the runs in a circular word and in a tree-like word over the alphabet
.
The study of runs in graphical words appears far from being trivial, under such a general setting. For example, a simple graph word whose runs are all singletons is equivalent to a proper coloring of the graph. So that, even in this special case, the whole theory of chromatic polynomials must be used. In the present text, our run analysis will be concentrated on classes of arborescent words.
In Section 2, we study various types of runs in Fwords over an alphabet of k letters, where F is the graphical species of R-enriched rooted trees, called here R-arborescences, for short. To do so, we make use of functional equations on multisort species and introduce a special 2-sort species, that we call a runs selector species, to classify the types of runs under study. Various enu-
![](https://www.scirp.org/html/2-1200127\3b1f92d7-e85d-45ee-b985-77901d9ef6b3.jpg)
Figure 2. Four runs in a circular word and five runs in a tree-like word.
merative results about runs (e.g., number of F-words whose runs are all of a given type, distribution of the biggest run, etc) then follow by making use of appropriate underlying cycle index series. Section 3 illustrates the theory of R-arborescent words by analysing specific examples and their associated series. Section 4 indicates some extensions to be developed in the future. For an introduction to species, the reader can consult the basic paper of Joyal [4] or the book by Bergeron, Labelle, and Leroux [7].
2. Runs in Enriched Arborescent Words
An arborescence is a directed rooted tree in which all edges point away from the root. Let
be a given species of structures having at least one structure on the empty set. Recall that an
-arborescence is an arborescence in which the (possibly empty) set of direct successors of each vertex is equipped with an
- structure. Figure 3(a) shows an
-arborescence, where the black dots represent the elements of its underlying set. The small arc at each vertex represents the
-structure placed on the set of successors of the vertex. The species
of
-arborescences is characterized by the combinatorial equation,
(3)
and, depending on the choice of the “enriching” species,
, various species of tree-like structures are obtained. Now, take a
-letter alphabet
and associate a sort
of singletons to letter
, for
. The structures belonging to the
-sort species
are
-arborescences whose vertices are of arbitrary sorts taken among the
’s. The first step in our run analysis is to classify these multisort structures according the runs of sorts they contain. From (3), we obviously have,
(4)
where
is the species of the
-structures having a root of sort
. Note that this root of sort
is part of a run of vertices of sort
. To reflect this fact, we now give an alternate expression for
in terms of an auxiliary two-sort species,
, where
is an extra sort of singletons.
Lemma 2 Let
be the two-sort species defined by
(5)
Then, the species
satisfy the combinatorial equations,
(6)
Proof. Figure 3(b) shows a
-structure, where singletons of sort
(resp.,
) are drawn as black dots (resp., white triangles). The result follows from the fact that an
-structure is a
-structure in which the blackdots are replaced by singletons of sort
and the white triangles are replaced by arbitrary
-structures,
, (see Figure 4(a), where
,
,
-singletons are white dots,
-singletons are black dots,
-singletons are gray dots).
Definition 3 Any subspecies
is called a runs selector species. The species
![](https://www.scirp.org/html/2-1200127\b89b5104-761e-4cb0-ba30-af4a100bb819.jpg)
of
-arborescences having only runs of sorts of type
is defined by
(7)
An
-arborescent word having only runs of type
over the alphabet
is the result of the replacement of each vertex of sort
in a
-structure, by the letter
, for
.
It is worthwhile to note that a runs selector species
does not only select the shape of the runs under interest but also specifies how these runs should be interconnected. For example•
selects unrestricted runs (see Figure 3(b))•
selects linear runs (see Figure 5(a))•
selects linear runs of length
•
selects singleton runs (see Figure 5(b))•
selects maximal runs (see Figure 5(c)), etc.
The main interest in Lemma 2 and Definition 3 is that they give rise to a variety of enumerative results about runs of various types in
-arborescent words by the introduction of special weight counters in (5)-(7). For example, taking
, where
is the species of cyclic permutations, Equation (3) takes the form
whose solution is the species
of
(a) (b)
Figure 3. (a)
-arborescence; (b)
-structure.
(a) (b)
Figure 4. (a)
-structure; (b) Mobile word with linear runs.
(a) (b) (c)
Figure 5. Selectors for (a) Linear runs; (b) Singleton runs; and (c) Maximal runs.
mobiles in the terminology of Bergeron, Labelle, and Leroux [7]. The reason is the following. Putting the root on top, the descendents of each node can be thought as turning around along horizontal circles. In this case,
-words are called mobile words. Figure 4(b) illustrates a mobile word on a 2-letter alphabet
having only linear (type (ii)) runs.
The introduction of a run counter
and the simultaneous substitutions
, in (5)-(7), where
is a letter counter lead to the following general proposition.
Proposition 4
Let
be the species of R-arborescences and
be a runs selector species. Give a weight
to each
-word with
runs and let
be the total weight of
-words of length
having only runs of type
over the
-letter alphabet
. Then the generating series
![](https://www.scirp.org/html/2-1200127\60c5fec2-06eb-4b02-a0d4-b2007692a437.jpg)
is given by,
(8)
where
is the cycle index series of
. In the special case,
, corresponding to unrestricted runs, the generating series
is explicitely given by,
(9)
where
is the cycle index series of
.
Proof. The series
is the total weight of all unlabelled
-structures in which each node is replaced by the variable counter
and each run is given a weight
. Now, consider the species
![](https://www.scirp.org/html/2-1200127\cb105f94-55f2-4cbf-9ea2-837aa81419d3.jpg)
obtained by the substitutions
,
, in the species
weighted as above. Then, from Pólya Theory in the context of species,
, where
is the cycle index series of
. By symmetry, all the species
,
, are equal. So that they can all be identified with a single species,
, say. By (4), taking into account the run counter,
,
(10)
(11)
Define the species
by
.
Then,
, and the recursive scheme (8) follows by taking the cycle index series of both sides of this last equation. Now, consider the special choice,
. Then, using the fact that
we have,
(12)
Hence,
![](https://www.scirp.org/html/2-1200127\3f6c42da-15c1-4635-8c9a-16f10921039d.jpg)
This implies that the species
![](https://www.scirp.org/html/2-1200127\fb4539d9-f1cc-4957-88b7-c830ceb94fd6.jpg)
satisfies the equation
![](https://www.scirp.org/html/2-1200127\2c2ac161-5586-4e05-a682-ee7920e1bb48.jpg)
Since
, we must have
![](https://www.scirp.org/html/2-1200127\4d7506cd-4042-4587-b08b-5dc7364e3a8c.jpg)
by the implicit species theorem [4]. So that,
and (9) follows by taking the cycle index series on both sides of this combinatorial equation.
The following corollary is immediate from Proposition 4, via the substitution
.
Corollary 5 Let
be the species of
arborescences and
be a runs selector species. Let
be the number of
-words of length
having only runs of type
over the
- letter alphabet
. Then the generating series,
, is given by,
(13)
In the special case,
, corresponding to unrestricted runs, the series
is explicitely given by,
(14)
The use of cycle index series above comes from the fact that R-enriched arborescences have non-trivial automorphisms, in general. This occurs when R-structures have non-trivial automorphisms. In the asymmetric case, however, that is when the automorphism group of each
-structure is trivial, it is well-known that cycle index series reduce to exponential generating series.
Corollary 6 If the species,
, is asymmetric, then
and
with
(15)
where
is the exponential generating series of
. Moreover, if
, then
(16)
where
is the exponential generating series of
.
Various computational general methods exist to obtain successive approximations (or explicit expressions, sometimes) for
and
. Simple iteration is the most obvious one, but more sophisticated methods include Good-Lagrange inversion, the use of eclosion operators and Newton-Raphson iteration adapted to cycle index series (see Labelle [8,9]). There is not enough space here to describe these methods in detail in the present context. Instead, we have chosen to illustrate our analysis of runs by focusing on specific examples.
3. Specific Examples
3.1. Runs in Ordinary Words
With
, Equation (3) takes the form
whose solution is the asymmetric species
of non empty linear orders. In this case,
-words are simply (ordinary non empty) words and Equation (5) takes the form
whose solution is
(17)
• In the case
, of unrestricted runs, (16) immediately gives,
(18)
• Take the runs selector
![](https://www.scirp.org/html/2-1200127\7984b520-615e-4c68-aae7-803742fd3e89.jpg)
of linear runs of length
. Then recurrences (15) take the form,
(19)
Solving for
, we get the well known g.f.’s of words whose runs are of size
,
(20)
Note that taking
in (20) can be called the Fibonacci case since gives,
(21)
where
is the
-th Fibonacci number. This corresponds to twice the well known fact that a sequence of
adjacent squares can be tiled by monominos or dominos in
ways. Taking
in (20) and substracting gives the g.f. of words whose maximal runs are of size
,
(22)
from which one can compute or estimate the expected size of the largest run (see, for example, Erdos and Revesz [10], Guibas and Odlyzko [11], Kong [12]).
• Take any subset
and the selector
of runs of length in
. Then,
(23)
3.2. Runs in Binary Arborescent Words
Planar case. With
, where
is the species of linearly ordered 2-point sets, Equation (3) takes the form
whose solution is the asymmetric species
of binary plane arborescences. Accordingly,
-words are binary plane arborescent words and (5) takes the form
.
• Take the linear runs selector
![](https://www.scirp.org/html/2-1200127\c3f18819-b859-4816-bf08-6be8cf735549.jpg)
corresponding to (ii) above. Then by (16), the g.f.’s of plane binary arborescent words having only linear runs are given by,
(24)
Coefficient extractions give, for
odd,
![](https://www.scirp.org/html/2-1200127\b9b60818-f7bb-4872-8da1-c96b845fcf1c.jpg)
where
, are the Catalan numbers.
• The Fibonacci case, k = 2, m = 2, corresponds, in the present context, to case (iii) above with a 2-letter alphabet. Here,
. So that,
(25)
The first terms of
and of
are given by,
![](https://www.scirp.org/html/2-1200127\63e83fbd-c440-42ac-a94a-ee7571c8d5ec.jpg)
![](https://www.scirp.org/html/2-1200127\deea87ac-af78-4da7-aa13-4e2159107a8d.jpg)
Non-planar case. With
, where
is the species of 2-element sets, Equation (3) takes the form
whose solution is the species
of (free) binary arborescences. Accordingly,
-words are binary arborescent words and (5) takes the form
. Note that in this case, Binstructures have non trivial symmetries in general, so that cycle-index series must be used instead of exponential generating series.
• Take the linear runs selector
![](https://www.scirp.org/html/2-1200127\3eb3950a-76cf-44f9-8158-0e823fc8e9b6.jpg)
corresponding to (ii) above. Then
![](https://www.scirp.org/html/2-1200127\7cdb81bf-35cb-405d-ab48-ef9723117d62.jpg)
and by (8),
where,
(26)
For example, the first terms of
are given by• ![](https://www.scirp.org/html/2-1200127\4fdaa6d6-7e00-4683-a0d8-70f21c135e7a.jpg)
• In the Fibonacci case, k = 2, m = 2, the runs selector (iii) above is given by
.
So that,
(27)
The first terms of
and of
are given by,
![](https://www.scirp.org/html/2-1200127\ab5bc9fa-873c-40cd-a7dc-9fd7a740a5f5.jpg)
![](https://www.scirp.org/html/2-1200127\639c9a1d-e633-4a7c-bec7-8a124c8c0ce8.jpg)
3.3. Runs in Cayley Arborescent Words
With
, where
is the species of finite sets, Equation (3) takes the form
whose solution is the species
of standard Cayley arborescences. In this case,
-words are Cayley arborescent words and equation (5) takes the form
. Since
, this equation can be rewritten as
. Hence, from the unicity of solution in the implicit species theorem, we have,
(28)
• Take the runs selector
. Then since
, (9) and (14) can be rewritten as,
(29)
For example, the first terms of
and
are given by• ![](https://www.scirp.org/html/2-1200127\be828faf-f7cd-4a57-b51f-2ca54444d4e7.jpg)
• Take the linear runs selector
corresponding to (ii) above. Then,
(30)
More generally, the runs selector
![](https://www.scirp.org/html/2-1200127\02d73be4-efb2-4a89-be32-7053a6126ae9.jpg)
corresponds to linear runs of size
. Here,
(31)
• For Cayley arborescences, the Fibonacci case, k = 2, m = 2, corresponds to the runs selector
. So that,
(32)
• For singleton runs (case (iv) above),
. So that,
(33)
• For maximal runs (case (v) above),
, since
. So that,
(34)
where
is the ordinary generating series of unlabelled Cayley arborescences.
3.4. Runs in Mobile Words
Recall that with
, where
is the species of cyclic permutations, Equation (3) takes the form
whose solution is the species
of mobiles. In this case,
-words are called mobile words (see Figure 4(b)). Since
the above case (ii) of linear runs, for example, corresponds to the runs selector,
(35)
and the generating series,
, of mobile words having only linear runs, weighted by the run counter
is given by,
(36)
since
, where
is the Euler function. For example, the first terms of
, for a 2-letter alphabet
are given by,
![](https://www.scirp.org/html/2-1200127\c310cd63-8a74-4479-a31b-cb0f653dafab.jpg)
4. Some Extensions
The present approach to runs in structured words can be extended in various directions. For example, a probabilistic analysis of runs can be made by adding a probability of occurence
to letter
in the alphabet
. Runs in other classes of graphical words can also be considered. For example, the class of
- enriched tree words is a tractable one. Here, the species
of
-enriched arborescences is replaced by the species
of
-enriched trees. This case needs a more complex analysis since extra symmetries of the structures have to be taken into account (the structures being unrooted). The (enriched) dissymmetry formula,
(37)
which expresses the species
of
-enriched trees in terms of the species
of
-enriched arborescences (see Bergeron, Labelle, and Leroux [7]), is the tool to be used for the analysis of the multisort species
leading to an analogue of Proposition 4 for
-enriched tree words. These extensions will be developed elsewere.
5. Acknowledgements
The authors would like to thank Jérôme Tremblay for his LaTeX and Maple help.