1. Introduction
We continue our study of the combinatorial properties of trinucleotide circular codes. A trinucleotide is a word of three letters (triletter) on the genetic alphabet. The set of 64 trinucleotides is a code in the sense of language theory, more precisely a uniform code, but not a circular code [1,2]. In order to have an intuitive meaning of these notions, codes are written on a straight line while circular codes are written on a circle, but, in both cases, unique decipherability is required.
Comma free codes, a very particular case of circular codes, have been studied for a long time, e.g. [3-5]. After the discovery of a circular code in genes with important properties [6], circular codes are mathematical objects studied in combinatorics, theoretical computer science and theoretical biology, e.g. [7-23].
There are 528 self-complementary circular codes of 20 trinucleotides [6,24,25] and, as proved here, they are naturally partitioned into two quite symmetric classes.
Let be the four trinucleotides with identical nucleotides. In this paper, we study some particular partitions of. Indeed, each circular code can be associated with two other subsets and of simply by operating two circular permutations of one letter and two letters on the trinucleotides of. Then, we prove our main result, i.e. a circular code is self-complementary if and only if the remaining two classes are complement of each other. Furthermore, we also show that a subset of is a circular code if and only if the set consisting of all its complements is a circular code.
As a consequence of these results, we also prove that if a circular code is self-complementary then either both its two conjugated classes are circular codes or none is a circular code.
In Section 2, we give the necessary definitions and a characterization for a set of trinucleotides to be a circular code. In Section 3, we give the results, mainly expressed by Proposition 7 and Proposition 8.
2. Definitions
The classical notions of alphabet, empty word, length, factor, proper factor, prefix, proper prefix, suffix, proper suffix, lexicographical order, etc. are those of [1]. Let denote the genetic alphabet, lexicographically ordered with. We use the following notation:
• (respectively) is the set of words (respectively non-empty words) over;
• is the set of the 16 words of length 2 (diletters or dinucleotides);
• is the set of the 64 words of length 3 (triletters or trinucleotides).
We now recall two important genetic maps, the definitions of code and circular code, and the property of C3- self-complementarity for a circular code, in particular [1,6,17,24,25].
Definition 1. The complementarity map: is defined by, , and, and by for all, e.g.,.
The map on words is naturally extended to a word set: its complementary trinucleotide set is obtained by applying the complementarity map to all the trinucleotides of.
Definition 2. The circular permutation map: permutes circularly each trinucleotide as follows.
The map on words is also naturally extended to a word set: its permuted trinucleotide set is obtained by applying the circular permutation map to all the trinucleotides of. We shortly write for.
Definition 3. A set of words is a code if, for each, , the condition implies and for.
Definition 4. A trinucleotide code is circular if, for each, , , , the conditions and imply, (empty word) and for .
Definition 5. A trinucleotide code is self-complementary if, for each,.
Definition 6. If is a subset of, we denote by the permuted trinucleotide set and by the permuted trinucleotide set and we call and the conjugated classes of.
Definition 7. A trinucleotide circular code is - self-complementary if, and are circular codes satisfying the following properties: (self-complementary), (and).
We have proved that there are exactly 528 self-complementary trinucleotide circular codes having 20 elements [6,24,25].
The concept of necklace was introduced by Pirillo [17] in order to characterize the circular codes for an efficient algorithm development. Let be letters in, diletters in and an integer.
Definition 8. Letter Diletter Continued Necklace (LDCN): We say that the ordered sequence is an for a subset if
and .
Any trinucleotide set is a code (more precisely, a uniform code [1]) but only few of them are circular codes. We have the following proposition.
Proposition 1. [17] Let be a trinucleotide code. The following conditions are equivalent:
1) is a circular code;
2) has no 5LDCN.
The figure below explains the notion of 5LDCN.
3. Results
Proposition 2. If is a trinucleotide circular code having 20 elements and and are its two conjugated classes then, and constitute a partition of.
Proof. It is enough to prove that . Suppose that the trinucleotide belongs both to the classes and. Then and are both in class. As no two conjugated trinucleotides can belong to a circular code, we are in contradiction. Suppose that the trinucleotide belongs both to the classes and. Then and are both in class. As no two conjugated trinucleotides can belong to a circular code, we are in contradiction. Suppose that the trinucleotide belongs both to the classes and. Then and are both in class. As no two conjugated trinucleotides can belong to a circular code, we are in contradiction. So,.
Proposition 3. The class of self-complementary circular codes with both and in the class of circular codes is non-empty.
Proof. Consider, for example, the following set of 20 trinucleotides
It is enough to prove that is a self-complementary circular code and that its two conjugated classes and are also circular codes.
is a self-complementary circular code.
is self-complementary. Obvious by inspection.
is a circular code. We use Proposition 1 [17]. By way of contradiction, suppose that admits a 5LDCN. As can be, , or, it is enough to prove that each choice leads to a contradiction.
1) If then there is no possible as is not a suffix of any trinucleotide of, contradiction.
2) If, there are three possible:
• if (a) or (b) then (c) but there is no possible as is not a prefix of any trinucleotide of, contradiction
• if (d), there is a contradiction as no trinucleotide of has a prefix.
3) If, there are six possible:
• if or, contradiction (a) and (b)
• if then, contradiction (c)
• if or then or:
if, there are three possible: if or then, similarly to (c), contradiction, and if, similarly to (d), contradiction
if, contradiction (c)
• if, contradiction (d).
4) If, similarly to (c), contradiction.
As, for each letter, we cannot complete the assumed 5LDCN for, we are in contradiction. Hence, is a circular code.
is a circular code. We have to prove that
is a circular code. By way of contradiction, assume that admits a 5LDCN.
1) If, there are four possible:, , and, but no possible, contradiction.
2) If, there are three possible:, and, but no possible, contradiction.
3) If, there are six possible:, and, and the cases, and already seen, but no possible, contradiction.
4) If, there is no possible, contradiction.
Hence, is also a circular code.
is a circular code. Finally, we have to prove that
is a circular code. By way of contradiction, assume that admits a 5LDCN.
1) If, there is no possible, contradiction.
2) If, there are six possible:, , , , and, but no possible, contradiction.
3) If, there are three possible:, and which are cases already seen, contradiction.
4) If, there are four possible:, , and, but no possible, contradiction.
Hence, as and, is also a circular code.
Proposition 4. The class of self-complementary circular codes having 20 elements with neither nor in the class of circular codes is non-empty.
Proof. Consider, for example, the following set of 20 trinucleotides
It is enough to prove that is a self-complementary circular code and that neither its conjugated class nor its conjugated class are circular codes.
is a self-complementary circular code.
is self-complementary. Obvious by inspection.
is a circular code. We use Proposition 1 [17]. By way of contradiction, assume that admits a 5LDCN.
1) If then there is one possible but no possible, contradiction.
2) If, there are two possible:
• if then (a) and (b) but there is no possible, contradiction
• if (c) then there is no possible, contradiction.
3) If we have seven possible:
• if then or:
if (d) then or:
- if then and but there is no possible, contradiction
- if then there is no possible, contradiction
if, contradiction (a)
• if, similarly to (b), contradiction
• if, or then, contradiction (a)
• if then or, contradiction (a) and (d)
• if, contradiction (c).
4) If, similarly to (a), contradiction.
Hence, is a circular code.
is not a circular code. We have
We use a technique developed in [23]. Observe that contains So,
is a 5LDCN for this 4-element subset of and, a fortiori, for itself which, consequently, is not a circular code.
is not a circular code. We have
We again use a technique developed in [23]. Remark that contains. So,
is a 5LDCN for this 4-element subset of and, a fortiori, for itself which, consequently, is not a circular code.
We need the propositions hereafter and, in particular the following one which states a general property of the involutional antiisomorphisms such as the complementary map.
Proposition 5. A subset of is a circular code if and only if is a circular code.
Proof. Suppose, first, that is not a circular code and that is a circular code. So has a 5LDCN. This means that there are 13 nucleotides, say
such that the trinucleotides
and
Now, consider the sequence
All the following trinucleotides belong to:
and
as they are the complement of trinucleotides in. So, admits a 5LDCN and it cannot be a circular code. Contradiction.
The case is a circular code and is not a circular code is similar.
Proposition 6. Let be a self-complementary subset of. If is partitioned into three classes such that two of them are the complement of each other then necessarily the third one is self-complementary.
Proof. Let, and be the three classes of an arbitrary partition of and suppose that and are complementary, i.e. and satisfy. Let be a trinucleotide of. We claim that. Indeed, in the opposite case, should not be the complement of because. We also claim that. Indeed, in the opposite case, should not be the complement of because. It remains the case. So, is self-complementary.
Remark 1. Clearly, if, and constitute an arbitrary partition of then the self-complementarity of is not enough to ensure that and are complementary of each other. This remark is again true if, in addition, is a self-complementary circular code having 20 elements. Indeed in this case, it is easy to make a partition in two classes and that are not complementary of each other. Any case, if we consider the partition of in the three classes given by a self-complementary trinucleotide circular code having 20 elements and by its two conjugated classes and then the necessary and sufficient condition holds (Proposition 7 below).
Proposition 7. A trinucleotide circular code having 20 elements is self-complementary if and only if and are complement of each other.
Proof if part. It is a trivial consequence of Proposition 6.
Only if part. Suppose that is self-complementary and consider the partition, and of. Suppose that the trinucleotide, say, belongs to. Then, also
.
We have
and
.
As is a generic trinucleotide of and as
and
then is the complement of.
As a consequence, we have the following proposition.
Proposition 8. If a trinucleotide circular code having 20 elements is self-complementary then either 1) and are both circular codes or 2) and are not circular codes (both have a necklace).
Proof. We have four possibilities:
is a circular code and is a circular code;
is a circular code and is not a circular code;
is not a circular code and is a circular code;
is not a circular code and is not a circular code.
Now, by applying Propositions 3 and 4, we have that the first and the last possibilities can be effectively realized.
Suppose that, by way of contradiction, the second possibility is realized. So, is a circular code. By Proposition 7, we have. So, by Proposition 5, must also be a circular code. Contradiction.
Suppose that, by way of contradiction, the third possibility is realized. So, is a circular code. By Proposition 7, we have. So, by Proposition 5, must also be a circular code. Contradiction.
So, only the first and the last possibilities can occur.
Hence, our proposition holds.
Proposition 9. The 528 self-complementary circular codes having 20 elements are partitioned into two classes: one class contains codes with the two permuted sets and which are both circular codes while the other class contains codes with the two permuted sets and which both are not circular codes.
Proof. It is enough to apply Proposition 8 to each of the 528 trinucleotide circular codes having 20 elements.
4. Acknowledgements
We thank Jacques Justin for his advices. The second author thanks the Dipartimento di matematica U. Dini for giving him a friendly hospitality.