The Development of Gordon Life Science Institute: Its Driving Force and Accomplishments

Established in 2004, the Gordon Life Science Institute is the first Internet Research Institute in the world. It is a non-profit institute, a gift to science. Scientists who truly love science more than anything else and have shown remarkable creativity in science can become members of the Institute. Their driving force is not funding but the firm belief that scientists will do much better science if they do not have to spend a lot of time on funding applications, and that great scientific findings in history were often made by those who had no funding at all but were driven by profound imagination and curiosity. This review also summarizes the accomplishments of the Gordon Life Science Institute and its future perspective.


INTRODUCTION
The Gordon Life Science Institute was founded by Professor Dr. Kuo-Chen Chou right after he retired from Pfizer Global Research and Development in 2003. Its birthplace was San Diego, California, USA. Its mission is to develop and apply new mathematical tools and physical concepts for understanding biological phenomena. For a brief account of its history and philosophy, see https://gordonlifescience.org/GordonLifeScience.html.
The Institute's name reflects an interesting historical story. After the Cultural Revolution, when China began to open its doors, the founder was invited by Professor Sture Forsén, then Chairman of the Nobel Prize Committee, to work at the Chemical Center of Lund University as a Visiting Professor. To make his name easier for Swedes to pronounce, Professor Chou used "Gordon" as his name in Sweden. About a quarter of a century later, the same name was given to the Institute, implying that "Reform and Opening" and "Free Communication" can stimulate great creativity.
The current liaison site of the Gordon Life Science Institute is in Boston, Massachusetts, USA (e-mail: gls@gordonlifescience.org).
Open Access Natural Science

TASKS AND CAMPUS
As an Internet Institute, it has no physical boundaries. Its members do not have to work on the same campus or in the same building. Located around the world, they collaborate freely, exchanging ideas, information, and findings through various kinds of "Online Communication". In a sense, this is also a very efficient way to reduce the risk of infection by the "New Coronavirus" that endangered the entire world during 2019-2020. During that pandemic, many research institutes and universities around the world were forced to close. In contrast, seven powerful predictors (e.g., "pLoc_Deep-mEuk", "pLoc_Deep-mHum", and "pLoc_Deep-mVirus") were developed at the Gordon Life Science Institute and have become useful tools in the fight against the coronavirus. Furthermore, its members can focus completely on science without having to deal with obtaining visas or paying relocation expenses, among many other burdens.
The Gordon Life Science Institute is a non-profit organization. It is a gift to humanity and to science. Its founding principle is to pursue excellence in science: anyone who has proved his/her creativity in science can become a member regardless of his/her age, occupation, or nationality. Accordingly, the Institute provides an ideal community for scientists who truly love science more than anything else.
Members of the Institute firmly believe that science will be more fantastic if scientists do not have to spend a lot of time on funding applications, and that great scientific findings in history were often made by those who had no funding but were driven by profound imagination and curiosity. They fully concur with the famous statement of Albert Einstein, "Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world, stimulating progress, giving birth to evolution". Listed below are just some representative works produced by the Gordon Life Science Institute.

Special PseAAC Has Been Extended to the General One
With the avalanche of biological sequences in the post-genomic age, one of the most critical problems in computational biology is how to formulate a biological sequence with a vector or discrete model while still keeping considerable sequence-order information or key pattern characteristics. This is because nearly all existing machine-learning algorithms can handle only vectors, as elaborated in a comprehensive review [1]. However, a vector defined in a discrete model may completely lose all the sequence-pattern information. To avoid this loss for proteins, the pseudo amino acid composition [2], or PseAAC [3], was proposed. Since then, it has been widely used in nearly all areas of computational proteomics.
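As a rough illustration of how PseAAC keeps sequence-order information inside a fixed-length vector, the sketch below computes a type-1 (parallel-correlation) PseAAC vector as it is commonly described: 20 normalized amino acid frequencies followed by λ sequence-order correlation factors, balanced by a weight factor w. The original formulation averages over three standardized properties (hydrophobicity, hydrophilicity, and side-chain mass). Here the caller passes in the property tables. The defaults lam=3 and w=0.05 are assumptions, and the function names are hypothetical. It is a sketch, not a reference implementation.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical order


def standardize(prop):
    """Rescale a property table to zero mean and unit (population) std over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def correlation(r1, r2, props):
    """Theta(R_i, R_j): mean squared difference of the standardized properties."""
    return sum((p[r2] - p[r1]) ** 2 for p in props) / len(props)


def pseaac(seq, raw_props, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional type-1 PseAAC vector for a protein sequence."""
    seq = seq.upper()
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in raw_props]
    n = len(seq)
    # Conventional amino acid composition: normalized occurrence frequencies.
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # theta_j averages the correlation between residues j positions apart (tiers 1..lam).
    thetas = [mean(correlation(seq[i], seq[i + j], props) for i in range(n - j))
              for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

You can supply any numeric tables keyed by the 20 one-letter codes. For a sequence made of a single residue type, such as "AAAAA", all correlation factors vanish and the vector reduces to the plain composition.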

Extension of PseAAC to PseKNC
Encouraged by the success of PseAAC for protein/peptide sequences, the concept of PseKNC (Pseudo K-tuple Nucleotide Composition) [72] was developed for generating various feature vectors for DNA/RNA sequences, which have proved very useful as well [73]. In particular, a very powerful web server called "Pse-in-One" [74] was established in 2015 and later updated as "Pse-in-One2.0" [75]; both can generate any desired feature vectors for protein/peptide and DNA/RNA sequences according to the needs of users' studies.
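The conventional part of a PseKNC vector is the normalized frequency of every k-tuple (k consecutive nucleotides). The minimal sketch below computes only that part. It omits the λ correlation factors that PseKNC appends, which are derived from physicochemical properties of the nucleotide tuples. The function name is hypothetical and is not part of Pse-in-One.

```python
from itertools import product


def ktuple_composition(seq, k=2, alphabet="ACGT"):
    """Normalized frequencies of all len(alphabet)**k overlapping k-tuples, in lexicographic order."""
    seq = seq.upper()
    tuples = ["".join(t) for t in product(alphabet, repeat=k)]
    counts = dict.fromkeys(tuples, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other non-alphabet symbols
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("no valid k-tuples in sequence")
    return [counts[t] / total for t in tuples]
```

For RNA, pass alphabet="ACGU". The vector has 4**k entries, so k=2 gives the 16 dinucleotide frequencies.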

Distorted Key Theory for Peptide Drugs
According to Fischer's "lock and key" model [76], Koshland's "induced fit" theory [76], and the "rack mechanism" [77], the prerequisite for a peptide to be cleaved by a disease-causing enzyme is that it fits well and binds tightly to the enzyme's active site (Figure 1). However, once its scissile bond has been modified by a simple chemical procedure, such a peptide can no longer be cleaved by the enzyme, yet it can still bind tightly to the active site. The distorted key theory is illustrated in Figure 2. Panel (a) shows an effective binding of a cleavable peptide to the active site of HIV protease. Panel (b) shows that the peptide has become non-cleavable after its scissile bond is modified, although it can still bind to the active site. Such a modified peptide, or "distorted key", automatically becomes an inhibitor candidate against HIV protease. Even for non-peptide inhibitors, information derived from cleavable peptides can provide useful insights into the key binding groups and the fitting conformation in the active-site microenvironment. In addition, peptide drugs usually show no toxicity in vivo at physiological concentrations [78]. For more discussion of the distorted key theory, see a comprehensive review [79]. Motivated by this theory, many investigators have developed methods for predicting the sites at which disease-causing enzymes cleave proteins (see, e.g., [80]). Furthermore, a web server called "HIVcleave" [81] has been established for predicting HIV protease cleavage sites in proteins. It is available at http://chou.med.harvard.edu/bioinf/HIV/.
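Cleavage-site predictors of the kind mentioned above typically score each candidate scissile bond together with its neighbouring residues. The sketch below only enumerates candidates. It assumes the octapeptide window P4-P3-P2-P1-P1'-P2'-P3'-P4' commonly used for HIV protease, with the scissile bond between P1 and P1'. It does not reproduce HIVcleave's scoring, and the function name is hypothetical.

```python
def scissile_bond_windows(seq, half=4):
    """List (p1_prime_index, window) for each peptide bond with a full P4..P4' context.

    The candidate scissile bond lies between window[half - 1] (P1) and window[half] (P1'),
    i.e. between seq[p1_prime_index - 1] and seq[p1_prime_index] (0-based indices).
    """
    seq = seq.upper()
    size = 2 * half
    return [(i + half, seq[i:i + size]) for i in range(len(seq) - size + 1)]
```

For the octapeptide SQNYPIVQ (the HIV-1 matrix/capsid junction), the single window has Y as P1 and P as P1'. A predictor would then assign each window a cleavable/non-cleavable score.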

Introduction of Wenxiang Diagram
Using graphic approaches to study biological and medical systems can provide an intuitive vision and useful insights for analyzing the complicated relations therein, as indicated by many previous studies on a series of important biological topics (see, e.g., [82]). The "wenxiang" diagram (Figure 3) [83,84] is a special kind of graphical approach that is very useful for in-depth study of protein-protein interaction mechanisms [85,86]. The wenxiang diagram has also been used to study drug-metabolism systems [87]. The name "wenxiang" comes from its resemblance to the Chinese wenxiang (蚊香), a coil-like incense widely used in China to repel mosquitoes. In a wenxiang graph each residue is represented by a circle with a letter indicating its code. A hydrophobic residue is denoted by a filled circle with a white code symbol, a hydrophilic residue by an open circle with a black code symbol, and an invalid residue by a yellow-filled circle.
Figure 1. A schematic illustration showing a peptide fitting well and binding tightly to the enzyme's active site before it is cleaved by the enzyme. Adapted from [79] with permission.
Figure 2. Schematic drawing illustrating the "Distorted Key" theory. Panel (a) shows an effective binding of a cleavable peptide to the active site of a disease-causing enzyme. Panel (b) shows that the same peptide has become non-cleavable after its scissile bond is modified, although it can still bind to the active site. Such a modified peptide, or "distorted key", automatically becomes an inhibitor candidate against the disease-causing enzyme. Adapted from [79] with permission.
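The residue encoding described for wenxiang graphs can be sketched as follows. The 100° rotation per residue is the standard α-helix value used in helical-wheel projections. The linearly growing spiral radius, the particular set of hydrophobic residues, and all names are illustrative assumptions, not the published drawing procedure. The text does not specify the letter colour for invalid residues, so the sketch returns None for it.

```python
import math

VALID = set("ACDEFGHIKLMNPQRSTVWY")
HYDROPHOBIC = set("AVLIMFWC")  # illustrative assumption; hydrophobicity scales differ


def residue_style(code):
    """Return (circle_fill, letter_colour) following the wenxiang colour convention."""
    if code not in VALID:
        return ("yellow", None)  # invalid residue: yellow-filled circle; letter colour unspecified
    if code in HYDROPHOBIC:
        return ("filled", "white")  # hydrophobic: filled circle, white code letter
    return ("open", "black")  # hydrophilic: open circle, black code letter


def wenxiang_layout(seq, start_radius=1.0, step=0.1, turn_deg=100.0):
    """Place residues on an outward spiral; returns (code, x, y, style) tuples."""
    points = []
    for i, code in enumerate(seq.upper()):
        angle = math.radians(i * turn_deg)  # alpha-helix: 100 degrees per residue
        radius = start_radius + i * step  # hypothetical linear growth of the coil
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        points.append((code, x, y, residue_style(code)))
    return points
```

Any plotting library could then draw one circle per returned point.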

Predictors for Multi-Label Systems
Information about a protein's subcellular localization is indispensable for revealing its biological function. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. Before 2007, most efforts in this regard focused on the single-label system, assuming that each constituent protein in a cell had one, and only one, subcellular location (see, e.g., [88][89][90][91][92]). However, as more experimental data became available, it was found that many proteins may simultaneously occur in, or move between, two or more location sites in a cell, and hence need multiple labels. Proteins with multiple locations are also called multiplex proteins [93,94], which are often special targets for drug development (see, e.g., [94][95][96]). To measure the performance of a predictor for multi-label systems, we need two kinds of metrics: one for the accuracy of global prediction and the other for the accuracy of local prediction [97]. As a showcase, let us consider the multi-label predictor pLoc_bal-mVirus [98], which was developed for predicting six subcellular locations (Figure 4) of virus proteins.
1) Open http://www.jci-bioinfo.cn/pLoc_bal-mVirus/ to see the top page of the predictor on your computer screen (Figure 5).
2) Type or copy/paste the sequences of the query virus proteins into the input box at the center of Figure 5. The input sequences should be in FASTA format; click the Example button right above the input box to see example sequences in FASTA format.
3) Click the Submit button to see the predicted results. For example, if you use the four protein sequences in the Example window as input, a new screen (Figure 5) will appear after about 10 seconds.
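Because the server expects FASTA input, it can help to check query sequences locally before pasting them in. The minimal parser below (function name hypothetical) splits FASTA text into (header, sequence) pairs. Each record starts with a line beginning with ">", and the sequence may span several lines. The parser does not submit anything to the web server, and any sequences used to try it can be short placeholders.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header line")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```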
Its upper part lists the names of the six subcellular locations covered by the current predictor, numbered (1) to (6). Its lower part shows the predicted results. The query protein "P01115" of example-1 corresponds to "2", meaning that it belongs to "Host cell membrane" only. The query protein "P03495" of example-2 corresponds to "4, 5", meaning that it belongs to both "Host cytoplasm" and "Host nucleus". The query protein "P89873" of example-3 corresponds to "4, 5, 6", meaning that it belongs to "Host cytoplasm", "Host nucleus", and "Secreted". All of these results are consistent with experimental observations.
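To make the global/local distinction concrete, the sketch below scores predicted label sets such as "4, 5" against true ones. For the global side it uses the five metrics commonly used in Chou's multi-label work (aiming, coverage, accuracy, absolute true, absolute false). Whether [97] uses exactly these definitions is an assumption. The local measure, recall within a single location, is also an assumption. Label sets are written as Python sets of location numbers, and the function names are hypothetical.

```python
def multilabel_metrics(true_sets, pred_sets, num_labels):
    """Global metrics over N samples; each sample has a set of true and predicted labels."""
    n = len(true_sets)
    if n == 0 or n != len(pred_sets):
        raise ValueError("need the same, non-zero number of true and predicted label sets")
    totals = dict.fromkeys(
        ["aiming", "coverage", "accuracy", "absolute_true", "absolute_false"], 0.0)
    for t, p in zip(true_sets, pred_sets):
        inter, union = len(t & p), len(t | p)
        totals["aiming"] += inter / len(p) if p else 0.0  # precision-like; empty prediction scores 0
        totals["coverage"] += inter / len(t) if t else 0.0  # recall-like
        totals["accuracy"] += inter / union if union else 1.0  # overlap of the two sets
        totals["absolute_true"] += 1.0 if t == p else 0.0  # exact match of the whole set
        totals["absolute_false"] += (union - inter) / num_labels  # Hamming-loss-like
    return {name: value / n for name, value in totals.items()}


def local_accuracy(true_sets, pred_sets, label):
    """Assumed local measure: share of samples truly at `label` that were predicted there."""
    carriers = [p for t, p in zip(true_sets, pred_sets) if label in t]
    if not carriers:
        return float("nan")
    return sum(label in p for p in carriers) / len(carriers)
```

With the three label sets reported above ({2}, {4, 5}, {4, 5, 6}) used as both truth and prediction, every global metric is 1 except absolute false, which is 0. A hypothetical prediction of {4} for example-2 would instead lower coverage, accuracy, and absolute true while leaving aiming at 1.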

Five-Steps Rule
The Institute was the birthplace of the well-known 5-steps rule [71], which has been used in nearly all areas of computational biology (see, e.g., [5,11,13,16,28,32,99-111]), in materials science [112], and even in commercial applications (e.g., bank account systems). The only difference among these applications lies in how the statistical samples or events are formulated with an effective mathematical expression that truly reflects their intrinsic correlation with the target to be predicted. This is similar to many machine-learning algorithms, which can be used in nearly all areas of statistical analysis.
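The 5-steps rule is usually stated as follows:
1) Construct a valid benchmark dataset.
2) Formulate the samples with an effective mathematical expression.
3) Introduce or develop a powerful algorithm.
4) Perform proper cross-validation tests to evaluate the anticipated accuracy objectively.
5) Establish a user-friendly web server.

The toy sketch below walks through steps 1, 3, and 4. It uses hypothetical two-dimensional samples standing in for step-2 feature vectors, a 1-nearest-neighbour classifier, and a jackknife (leave-one-out) test. Step 5 is out of scope, and all names and data are illustrative only.

```python
from math import dist

# Step 1 (benchmark dataset): hypothetical labelled samples. In practice step 2 would
# turn raw sequences into such feature vectors, e.g. with PseAAC or PseKNC.
BENCHMARK = [
    ((0.90, 0.10), "class_A"), ((0.80, 0.20), "class_A"), ((0.85, 0.15), "class_A"),
    ((0.10, 0.90), "class_B"), ((0.20, 0.80), "class_B"), ((0.15, 0.95), "class_B"),
]


def nearest_neighbour(train, x):
    """Step 3 (algorithm): label of the closest training sample by Euclidean distance."""
    return min(train, key=lambda sample: dist(sample[0], x))[1]


def jackknife_accuracy(samples, classify):
    """Step 4 (validation): predict each sample from all the others and count hits."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += classify(train, x) == label
    return hits / len(samples)
```

Swapping in a different step-2 encoding or step-3 algorithm leaves the validation code unchanged, which mirrors the point above that the rule carries across fields.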
Working in an institute filled with this kind of philosophy and atmosphere, scientists are more likely to be inspired by masterpiece papers such as those of Sture Forsén, then Chairman of the Nobel Prize Committee (see, e.g., [113,114]), which can make them substantially more creative and productive.

CONCLUSION AND PERSPECTIVE
In contrast to conventional institutes, the Gordon Life Science Institute has the following unique advantages: it can 1) attract scientists who truly love science more than anything else; 2) maximize their creativity in science and minimize the distraction caused by relocation and the many tedious follow-up matters; 3) provide them with an ideal environment to focus completely on doing science; 4) drive their motivation through profound imagination and curiosity; and 5) create a rich scientific atmosphere in which they produce more truthful, wonderful, and awesome scientific results.
It is therefore not surprising that five members of the Gordon Life Science Institute have been selected by Clarivate Analytics as Highly Cited Researchers (HCR) (see Section 3). In terms of the number of HCRs per member, this places the Gordon Life Science Institute ahead of the Broad Institute of MIT and Harvard, USA, and at the very top in the world.
More significant accomplishments are anticipated from the Gordon Life Science Institute in the years to come, as indicated by a series of very recent papers [98, …].

ETHICAL APPROVAL STATEMENT
This article does not contain any studies with human or animal participants.