Gordon Life Science Institute and Its Impacts on Computational Biology and Drug Development

Gordon Life Science Institute is the first Internet Research Institute ever established in the world. It is a non-profit institute. Scientists who truly dedicate themselves to science and love science more than anything else can become its members. In this friendly, open-door Institute, they can maximize the time and energy they devote to scientific creativity. They also believe that science would be more truthful and wonderful if scientists did not have to spend a lot of time on funding applications, and that great scientific findings and creations in history were often made by those who were least supported or funded but were driven by imagination and curiosity. This review article recollects the establishment and development of the Institute, as well as its philosophy and accomplishments. In particular, its outcomes and by-products have covered the following five very hot topics in bioinformatics and drug development: 1) PseAAC and PseKNC; 2) Distorted key theory; 3) Wenxiang diagram; 4) Multi-label system prediction; 5) 5-steps rule. Their impacts on proteomics and genomics, as well as on drug development, are substantial and remarkable.


INTRODUCTION
The Gordon Life Science Institute was established in 2003 in San Diego, California, USA. Its mission is to develop and apply new mathematical tools and physical concepts for understanding biological phenomena. For a brief account of its history and philosophy, see https://gordonlifescience.org/GordonLifeScience.html.
The Institute is a newly emerging academic organization in the Age of Information and Internet, founded by Professor Dr. Kuo-Chen Chou right after he retired from Pfizer Global Research and Development in 2003.
The Institute's name reflects an interesting piece of history. After the Cultural Revolution, when China started to open its door, the founder was invited by Professor Sture Forsén, the then Chairman of the Nobel Prize Committee, to work at the Chemical Center of Lund University as a Visiting Professor. To make his name easier for Swedish people to pronounce, Professor Chou used "Gordon" as his name in Sweden. About a quarter of a century later, the same name was used for the Institute, signifying that "Reform and Opening" and "Free Communication" can stimulate a great deal of creativity.
The current liaison site of the Gordon Life Science Institute is in Boston, Massachusetts, USA (gls@gordonlifescience.org).

MISSION AND ORGANIZATION
The Institute has no physical boundaries. Its members do not have to work in the same building or on the same campus. Distributed over different countries of the world (Figure 1), they freely collaborate, exchange ideas, and share information and findings via a variety of modern communication methods. This flexible arrangement allows the members to focus completely on science without having to cope with obtaining visas, paying relocation expenses, and many other such burdens.
The Gordon Life Science Institute is a non-profit organization. It is a gift to science and human beings. Its founding principle is to pursue excellence in science: anyone who has proved his/her creativity in science can become a member regardless of age, occupation, or nationality. Accordingly, the Institute provides an ideal community for scientists who truly dedicate themselves to science and love science more than anything else. In this friendly, open-door Institute, these scientists can maximize the time and energy they devote to scientific creativity.
Members of the Institute believe that science would be more truthful and wonderful if scientists did not have to spend a lot of time on funding applications. They also note that great scientific findings and creations in history were often made by those who were least supported or funded but were driven by imagination and curiosity. As pointed out by Albert Einstein, "Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world, stimulating progress, giving birth to evolution".
Figure caption: A schematic illustration to show a peptide in good fitting and tightly binding with the enzyme's active site before it is cleaved by the latter. Adapted from [301] with permission.
Figure 3. Schematic drawing to illustrate the "Distorted Key" theory, where panel (a) shows an effective binding of a cleavable peptide to the active site of a disease-causing enzyme, while panel (b) shows that the same peptide has become a non-cleavable one after its scissile bond is modified, although it can still bind to the active site. Such a modified peptide, or "distorted key", will automatically become an inhibitor candidate against the disease-causing enzyme. Adapted from [301] with permission.

Introduction of Wenxiang Diagram
Using graphic approaches to study biological and medical systems can provide an intuitive vision and useful insights for analyzing the complicated relations therein, as indicated by many previous studies on a series of important biological topics (see, e.g., [308]). The "wenxiang" diagram (Figure 4) [309,310] is a special kind of graphical approach, which is very useful for in-depth study of protein-protein interaction mechanisms [311,312]. The wenxiang diagram has also been used to study drug-metabolism systems [313]. The name "wenxiang" comes from the fact that its shape looks quite like the Chinese wenxiang (蚊香), a coil-like incense widely used in China to repel mosquitos. In a wenxiang graph each residue is represented by a circle with a letter to indicate its code: a hydrophobic residue is denoted by a filled circle with a white code symbol, a hydrophilic residue is denoted by an open circle with a black code symbol, whereas an invalid residue is denoted by a yellow-filled circle.
Figure 4. Schematic drawing to show the "wenxiang diagram". Adapted from [309] with permission.
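The marker convention described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of [309,310]: the hydrophobic residue set, the 100 degrees-per-residue step (the usual α-helix value of 3.6 residues per turn), the linearly growing radius, and all function and parameter names are assumptions introduced here. Only the three marker styles are taken from the text.

```python
import math
from dataclasses import dataclass

# The 20 standard one-letter amino acid codes; anything else is treated as invalid.
VALID = set("ACDEFGHIKLMNPQRSTVWY")
# ASSUMPTION: one common choice of hydrophobic residues, not taken from the source.
HYDROPHOBIC = set("AFILMVW")


@dataclass
class ResidueMarker:
    code: str        # one-letter residue code
    x: float         # position on the coil
    y: float
    fill: str        # "filled", "open" or "yellow"
    text_color: str  # color of the code letter


def marker_style(code):
    """Return (fill, text_color) following the convention described in the text."""
    if code not in VALID:
        # Letter color for invalid residues is not specified in the text (assumed black).
        return "yellow", "black"
    if code in HYDROPHOBIC:
        return "filled", "white"
    return "open", "black"


def wenxiang_layout(sequence, degrees_per_residue=100.0, r0=1.0, dr=0.1):
    """Place residues on an outward spiral (hypothetical geometry parameters)."""
    markers = []
    for i, code in enumerate(sequence.upper()):
        theta = math.radians(i * degrees_per_residue)
        radius = r0 + dr * i  # radius grows with residue index, giving the coil shape
        fill, text_color = marker_style(code)
        markers.append(ResidueMarker(code, radius * math.cos(theta),
                                     radius * math.sin(theta), fill, text_color))
    return markers


if __name__ == "__main__":
    for m in wenxiang_layout("LKEALXK"):
        print(f"{m.code}  ({m.x:+.2f}, {m.y:+.2f})  {m.fill}/{m.text_color}")
```

The returned markers could be passed to any plotting library; keeping the layout separate from the drawing makes it easy to change the assumed geometry.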

Predictors for Multi-Label Systems
Information on the subcellular localization of a protein is indispensable for revealing its biological function. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. Before 2007, most efforts in this regard were focused on the single-label system by assuming that each of the constituent proteins in a cell had one, and only one, subcellular location (see, e.g., [314-318]). However, as more experimental data were uncovered, it was found that many proteins may simultaneously occur at, or move between, two or more location sites in a cell and hence need multiple labels to mark them. Proteins with multiple locations are also called multiplex proteins [319,320], which are often special targets for drug development [320-326]. Therefore, how to deal with this kind of multi-label system is a critical challenge. To take on the challenge, the Institute has developed the following four series of predictors: 1) [320, 327-333]; 2) [334-339]; 3) [203, 204, 215, 224-226, 340]; 4) [227-230, 254, 265, 266]. All these predictors have yielded very high success rates, both globally and locally, as summarized in a comprehensive review paper [341]. In studying multi-label systems, we need two kinds of metrics to measure the performance quality of a predictor: one is for the accuracy of global prediction and the other for the accuracy of local prediction [342].
As a showcase, let us consider the multi-label predictor pLoc_bal-mHum [229], which was developed for studying the 14 organelles or subcellular locations (Figure 5) in a human cell.
1) Click the link at http://www.jci-bioinfo.cn/pLoc_bal-mHum/ and you will see the top page of the predictor on your computer screen (Figure 6).
2) You can either type or copy/paste the sequences of query human proteins into the input box at the center of Figure 6. The input sequences should be in the FASTA format. You can click the Example button right above the input box to see sequences in FASTA format.
3) Click on the Submit button to see the predicted result; e.g., if you use the four protein sequences in the Example window as the input, after 10 seconds or so, you will see a new screen (Figure 7). On its upper part are listed the names of the subcellular locations, numbered from (1) to (14), covered by the current predictor. On its lower part are the predicted results: the query protein "O15382" of example-1 corresponds to "10", meaning it belongs to "Mitochondrion" only; the query protein "P08962" of example-2 corresponds to "8, 13", meaning it belongs to "Lysosome" and "Plasma membrane"; the query protein "P12272" of example-3 corresponds to "2, 6, 11", meaning it belongs to "Cytoplasm", "Extracellular", and "Nucleus". All these results are perfectly consistent with experimental observations.
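To make the idea of scoring a multi-label prediction concrete, the following Python sketch treats each protein's result as a set of location indices and computes five set-based scores that are commonly used for multi-label systems (aiming, coverage, accuracy, absolute true, absolute false). Whether these coincide exactly with the global metrics of [342] is an assumption, and the function names are introduced here for illustration. The index-to-name table contains only the six locations named in the text; the second prediction list is a hypothetical error case, not a reported result.

```python
# Location names quoted in the text; the other 8 of the 14 indices are not listed there.
LOCATIONS = {2: "Cytoplasm", 6: "Extracellular", 8: "Lysosome",
             10: "Mitochondrion", 11: "Nucleus", 13: "Plasma membrane"}


def decode(labels):
    """Translate a set of location indices into names (unknown indices kept as numbers)."""
    return [LOCATIONS.get(i, f"location {i}") for i in sorted(labels)]


def multilabel_metrics(predicted, observed, n_labels):
    """Average five set-based scores over all samples (non-empty input assumed)."""
    n = len(predicted)
    totals = {"aiming": 0.0, "coverage": 0.0, "accuracy": 0.0,
              "absolute_true": 0.0, "absolute_false": 0.0}
    for p, o in zip(predicted, observed):
        inter, union = len(p & o), len(p | o)
        totals["aiming"] += inter / len(p) if p else 0.0    # hits among predicted labels
        totals["coverage"] += inter / len(o) if o else 0.0  # hits among observed labels
        totals["accuracy"] += inter / union if union else 1.0
        totals["absolute_true"] += 1.0 if p == o else 0.0   # exact match only
        totals["absolute_false"] += (union - inter) / n_labels
    return {name: value / n for name, value in totals.items()}


if __name__ == "__main__":
    # The three example results quoted in the text (O15382, P08962, P12272).
    observed = [{10}, {8, 13}, {2, 6, 11}]
    for labels in observed:
        print(decode(labels))
    # The text states the predictions agree with experiment, so every score is ideal.
    print(multilabel_metrics(observed, observed, n_labels=14))
    # HYPOTHETICAL error case: one location is missed for the second protein.
    print(multilabel_metrics([{10}, {8}, {2, 6, 11}], observed, n_labels=14))
```

Working with sets rather than single class labels is what lets a partially correct prediction, such as finding one of two locations, receive partial credit.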

Five-Steps Rule
The Institute was the birthplace of the famous 5-steps rule [278], which has been used in nearly all areas of computational biology [203, 204, 215, 224-230, 233, 251, 254-256, 259-261, 264, 265, 283, 285, 294, 340, 341, 343-382], in material science [383], and even in commercial science (e.g., bank account systems). The only difference among these applications is how to formulate the statistical samples or events with an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be predicted. In this respect the rule resembles many machine-learning algorithms, which can be widely used in nearly all areas of statistical analysis.
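As a simple illustration of what "formulating a sample with a mathematical expression" can mean for a protein, the sketch below converts a sequence of any length into a fixed 20-dimensional vector of amino acid frequencies. This plain amino acid composition is chosen here only as an example; it is not claimed to be the formulation prescribed by [278], and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def amino_acid_composition(sequence):
    """Return the 20 normalized occurrence frequencies of a protein sequence.

    Non-standard letters are ignored, so the frequencies always sum to 1.
    """
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acid")
    return [count / total for count in counts]


if __name__ == "__main__":
    vector = amino_acid_composition("MKVLAAGIVALL")  # arbitrary toy sequence
    for aa, freq in zip(AMINO_ACIDS, vector):
        if freq > 0:
            print(f"{aa}: {freq:.3f}")
```

Such a vector has the same length for every protein and can therefore be passed to a standard classifier, but it discards all sequence-order information.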
Working in an Institute filled with this kind of philosophy and atmosphere, the scientists are more likely to be stimulated by the eight pioneering papers from the then Chairman of the Nobel Prize Committee, Sture Forsén [384-391], and by many of their follow-up papers [172, 189, 310, 311, …], which in turn drives them to be substantially more creative and productive.

CONCLUSION AND PERSPECTIVE
In comparison with conventional institutes, the Gordon Life Science Institute has the following unique advantages: it can 1) attract scientists who truly love science more than anything else; 2) maximize their creativity in science and minimize the distraction or disturbance caused by relocation and the tedious matters that follow it; 3) provide them with an ideal environment to focus completely on doing science; 4) drive their motivation by insightful imagination and intriguing curiosity; and 5) create an atmosphere that guides their scientific results toward being more truthful, fantastic, and wonderful.
Accordingly, it is not surprising that a total of five members of the Gordon Life Science Institute have been selected by Clarivate Analytics as Highly Cited Researchers (HCR; see Section 3), indicating that, in terms of the ratio of HCRs per member, the Gordon Life Science Institute has already exceeded the Broad Institute of MIT and Harvard, USA, becoming the top in the world.