The Ethical Implications for Humans in Light of the Poor Predictive Value of Animal Models

The notion that animals could be used as predictive models in science has been influenced by relatively recent developments in complexity science, evolutionary and developmental biology, and genetics. These developments, combined with empirical evidence that has led scientists in drug development to acknowledge that a new, nonanimal model is needed, have yielded a theory (not a hypothesis) that explains why animals function well as models for humans at lower levels of organization but are unable to predict outcomes at higher levels of organization. Trans-Species Modeling Theory (TSMT) places the empirical evidence in the context of a scientific theory and thus, from a scientific perspective, the issue of where animals can and cannot be used in science has arguably been settled. Yet some in various areas of science or science-related fields continue to demand that more evidence be offered before the use of animal models in medical research and testing is abandoned on scientific grounds. In this article, I examine TSMT, the empirical evidence surrounding the use of animal models, and the opinions of experts. I contrast these facts with the opinions and positions of those who have a direct or indirect vested interest, financial or otherwise, in animal models. I then discuss the ethical implications regarding research constructed to find cures and treatments for humans.


Introduction
Before I analyze the use of animals in scientific research and testing, I need to delineate the areas of science where animals are currently used. As shown in Table 1, animals are used in various areas of science.
Category 1, "Animals are used as predictive models of humans for research into such diseases as cancer and AIDS," and category 2, "Animals are used as predictive models of humans for testing drugs or other chemicals," are both examples of using animals as predictive models for humans. This implies that such models have a high predictive value for human response. In other words, if the drug kills the animal model, animal modelers assume it will kill you, and vice versa. Categories 3-9 make no such claim. It has been my contention that while animal models are successfully used per categories 3-9, they cannot be used to predict human response to drugs and disease because their predictive value is so low [1]-[20]. I have used the formulas for sensitivity and positive predictive value (PPV) to support this contention, as most of the available literature has not presented data sufficient to calculate specificity and negative predictive value (NPV) or likelihood ratios (LRs). NPV and specificity were available from some studies, and in those cases I used those values, but most studies simply did not provide the needed data. Table 2 shows how these values are calculated. On other occasions, the studies listed a cruder evaluation of predictive values, but the crude value was sufficient to conclude that animal models failed in terms of predictive value.
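The quantities named above all derive from a standard 2x2 table that cross-classifies the animal result against the human outcome. The following Python sketch shows the standard textbook definitions; the class name `TwoByTwo` and the counts in `example` are hypothetical, chosen only for illustration, and are not taken from any study cited in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts comparing an animal test result with the human outcome."""
    tp: int  # animal positive, human positive (true positive)
    fp: int  # animal positive, human negative (false positive)
    fn: int  # animal negative, human positive (false negative)
    tn: int  # animal negative, human negative (true negative)

    def sensitivity(self) -> float:
        # Fraction of human positives that the animal model flagged.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of human negatives that the animal model cleared.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Probability the human outcome is positive given a positive animal result.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Probability the human outcome is negative given a negative animal result.
        return self.tn / (self.tn + self.fn)

    def lr_positive(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity).
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity.
        return (1 - self.sensitivity()) / self.specificity()


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=30, fp=20, fn=10, tn=40)
```

Sensitivity and PPV need only the true-positive, false-positive, and false-negative counts, whereas specificity, NPV, and both likelihood ratios also require the true-negative count. This is why studies that omit true negatives permit the first two calculations but not the others.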
Table 1. Nine categories of animal use in science and research [19].
1 Animals are used as predictive models of humans for research into such diseases as cancer and AIDS.
2 Animals are used as predictive models of humans for testing drugs or other chemicals.
3 Animals are used as "spare parts", such as when a person receives an aortic valve from a pig.
4 Animals are used as bioreactors or factories, such as for the production of insulin or monoclonal antibodies, or to maintain the supply of a virus.
5 Animals and animal tissues are used to study basic physiological principles.
6 Animals are used in education to educate and train medical students and to teach basic principles of anatomy in high school biology classes.
7 Animals are used as a modality for ideas or as a heuristic device, which is a component of basic science research.
8 Animals are used in research designed to benefit other animals of the same species or breed.
9 Animals are used in research in order to gain knowledge for knowledge's sake.
People have opposed the use of animals in testing, research, and science in general for millennia. Many people who experimented on animals or wrote about the history of such referred to the practice as vivisection, meaning cutting into the living [21]-[25]. Schiller states: Vivisection was an ancient tradition and its roots went back to dissection. Its original aim was living anatomy. Herophilus and Erasistratus of Alexandria are known to have practised it in the third century, B.C. According to Celsus they even used human subjects. In the 18th century Maupertuis attempted to justify the vivisection of criminals for the benefit of mankind. Galen was the real promoter of the method and he attempted to establish animal vivisection as the foundation of physiology [21].
Claude Bernard, the father of modern-day animal modeling, referred to what he did as vivisection ([26], p. 19, 104). Darwin also referred to the practice as vivisection: "In the agony of death a dog has been known to caress his master, and every one has heard of the dog suffering under vivisection, who licked the hand of the operator; this man, unless the operation was fully justified by an increase of knowledge, or unless he had a heart of stone, must have felt remorse to the last hour of his life" ([27], p. 90). Only recently has the term been disowned by the vivisection community.
The opposition to vivisection increased dramatically in the mid-1800s with the popularization of the practice by Bernard. Bernard was a French physiologist and conducted open demonstrations of vivisection along with public experiments involving vivisection. Bernard's wife and two daughters formed one of the first French anti-vivisection societies after they found that Claude had vivisected the family dog.
Bernard was a strict causal determinist, meaning that if X caused Y in a monkey it would also cause Y in a human. Bernard states: "Physiologists ... deal with just one thing, the properties of living matter and the mechanism of life, in whatever form it shows itself. For them genus, species and class no longer exist. There are only living beings; and if they choose one of them for study, that is usually for convenience in experimentation" ([26], p. 111).
Bernard continues: Now the vital units, being of like nature in all living beings, are subject to the same organic laws. They develop, live, become diseased and die under influences necessarily of like nature, though manifested by infinitely varying mechanisms. A poison or a morbid condition, acting on a definite histological unit, should attack it in like circumstances in all animals furnished with it; otherwise these units would cease to be of like nature; and if we went on considering as of like nature units reacting in different or opposite ways under the influence of normal or pathological vital reagents, we should not only deny science in general, but also bring into zoology confusion and darkness... Experiments on animals, with deleterious substances or in harmful circumstances, are very useful and entirely conclusive for the toxicity and hygiene of man. Investigations of medicinal or of toxic substances also are wholly applicable to man from the therapeutic point of view; for as I have shown, the effects of these substances are the same on man as on animals... ([26], pp. 124-125).
Clearly, Bernard is stating that any reaction observed in a mouse or dog will also be seen in humans. This was, and still is in some quarters, the very high predictive value that animal modelers assigned their model. As I will discuss, the fields of evolutionary biology and complexity theory, along with empirical evidence, have proven Bernard's deterministic view naïve.
One reason for Bernard's deterministic view was that he was a creationist, and this aspect of Bernard's beliefs is not unimportant. Bernard was not a creationist in the current sense of the word, but he did reject evolution. LaFollette and Shanks write: Moreover, certain types of creationism may involve a commitment to the interchangeability of species. Those who think that all creatures are products of a designer would likely assume, on grounds of ontological simplicity, that the designer took the same basic stock of parts and re-arranged them to produce different species. Certainly this was one response to the discovery of homologous structures by 19th century comparative anatomists: what Darwin would see as evidence of descent with modification, creationists were apt to see as evidence of a designer's variations on a basic common blueprint. According to creationists, the main difference between men and animals was merely that the designer added an extra ingredient: a soul. But the basic body parts remained constant. Under these assumptions, if we knew how a rat's liver functioned, we would likewise know how a human liver functioned (once we had adjusted for differences in size and weight) [28].
People in Bernard's day thought human parts and animal parts were more or less interchangeable. Humans had souls and were sentient; otherwise animals and humans were identical. This view of creationism explains, at least in part, why Leonard Bailey transplanted the heart of a baboon into Baby Fae ([29], pp. 162-163).
The controversy regarding vivisection initially centered on ethics and compassion, but some anti-vivisectionists began criticizing vivisection on scientific grounds. All in all, most of those science-oriented criticisms turned out to be wrong. For example, the Germ Theory of Disease was just being developed and various diseases were being discovered that ultimately were found to be caused by germs such as bacteria. Humans and animals can both be infected with various "germs" and in some cases the responses and treatments are similar. (See reference [15] on why this is the case.) So it should come as no surprise that a superficial study of infectious diseases resulted in the conclusion that animal models had predictive value for medical science. Had cancer or coronary artery disease been the primary killer of the 19th century, vivisection would probably have struggled and eventually been abandoned by science and society alike. Regardless, a superficial examination (or an examination on the gross level, that is, the level seen with the naked eye) revealed many similarities between species. This was the era in which vivisection thrived.
Just as was the case in the 1800s, some people today reject vaccines, the Germ Theory of Disease, and science in general, in part because, from their perspective, all of this is associated with vivisection. Anti-vivisectionists of the 1800s over-reacted in criticizing the science of vivisection. For example, one anti-vivisectionist surgeon stated that performing surgery on animal tissues had made him unfit to work with human tissues. This was absurd. One ties sutures, controls bleeding, and handles the tissues in more or less the same fashion whether the patient is a dog or a human. Moreover, many anti-vivisectionists claimed that knowledge could never come from the vivisection lab because knowledge would not allow itself to come from such evil. Along the same lines, the anti-vivisectionist and theosophist Anna Kingsford thought she had killed Claude Bernard by throwing her psychic energy at him. Theosophy was popular in 19th-century Europe and an anti-science mentality fit into this worldview.
In conclusion, early vivisectors learned much about the things that mammals and humans have in common. Most of that could probably have been learned from human-based study, but that is a topic for another time. In any event, there is no doubt that the fundamentals of mammalian anatomy and physiology were discovered, or could have been discovered, from vivisection.
By 1900, however, vivisection had a track record and some of the conclusions from the animal lab had been shown false. From the early 1900s to about 2000, the anti-vivisection community largely pointed out where vivisection studies failed and justified their position that vivisection was scientifically flawed on the basis of these failures. Such was not a bad position. In science the burden of proof is on the claimant (in this case the vivisection community that claims animal models have predictive value; more on that momentarily). If one can show enough examples of failure, when the practice in question is claimed to be predictive, or in cases when the response is claimed to always be identical to humans, then these examples count. Examples in the form of supported case reports can invalidate a scientific claim, while case reports alone cannot prove a scientific claim. Proof requires data, usually in the form of controlled studies.
The position of the anti-vivisectionists was also reasonable compared to the claims of the vivisection community. If the vivisection community had claimed that animal models helped them form hypotheses for testing in humans, the anti-vivisectionists of the 20th century would have had little to criticize from a scientific perspective. Failures are to be expected for hypotheses. But the vivisection community over-reached in their claims just as the anti-vivisection community of the 19th century had over-reacted with theirs. William Osler's statement, in a 1907 address to the Congress of American Physicians and Surgeons, implies that animal models are of predictive value for human response to drugs and disease: The limits of justifiable experimentation upon our fellow creatures are well and clearly defined. The final test of every new procedure, medical or surgical must be made on man, but never before it has been tried on animals.... For man absolute safety and full consent are the conditions which make such tests allowable. We have no right to use patients entrusted to our care for the purpose of experimentation unless direct benefit to the individual is likely to follow. Once this limit is transgressed the sacred cord which binds physician and patient snaps instantly [30].
Statements and positions illustrated by the above would discomfit vivisectors as more and more differences in response to drugs and disease surfaced between species. By the last few decades of the 20th century, there were vast numbers of studies and examples demonstrating that animal models clearly failed in terms of predicting human response to drugs and disease. From a scientific perspective, however, anti-vivisectionists still lacked one important piece of the puzzle. Vivisectors frequently acknowledged that animal model X did not in fact fulfill the criteria as a good predictor of human response. However, they went on to assure society that there was another animal model being sought or invented that would succeed where animal model X had failed. (Today, this argument takes the form of genetically modified animals.) Anti-vivisectionists had no science-based counter to the argument that a better animal model was possible and hence should be sought. They needed a scientific theory that would account for the failures and successes of animal models and that would also answer the question of whether trans-species extrapolation would ever be possible. The empirical evidence proved that current animal models had failed, but this did not prove the paradigm was destined to fail.
In the mid-20th century, evolutionary biologists started advising caution to their vivisectionist colleagues regarding the latter's claims for animal models [31]. Even then, the better evolutionary biologists knew that the odds were against animal models in general in terms of having predictive value for human response to drugs and disease. There were just too many differences between species. These differences were not inconsequential; these differences were the reason there were different species in the first place. Vivisectors largely ignored the advice.
Also in the mid-20th century, two new fields of physics were being developed. Chaos and complexity arguably date back to the turn of the 20th century, but the real work began in the 1950s and 1960s. Today, chaos is considered a division of complexity studies and both have revolutionized physics. Animals and humans are examples of evolved, complex systems. That statement summarizes the problems with animal models. Complex systems are highly dependent upon initial conditions, and the reason we have different species is that the initial conditions (genetic make-up) change (in the form of mutations, changes in gene regulation, and so forth). The fact that animals and humans are evolved, complex systems, along with some knowledge from genetics, means that animal models will never be of predictive value for human response to drugs and disease. No matter how many genes one adds or deletes, the background genes will differ among species, as will other initial conditions and perhaps even emergent phenomena.
Not everyone agrees, however, that knowledge from the fields of evolutionary biology and complexity science has added to the empirical data adequately to justify, as of, say, November 2013, abandoning animal models for drug and disease response. I will now analyze the position of three such people, reported in their December 2013 article [32].
Jarrod Bailey, Michelle Thew, and Michael Balls published an article titled "An Analysis of the Use of Dogs in Predicting Human Toxicology and Drug Safety" [32], which appeared online in December 2013 and analyzed the predictive value of using dogs in drug testing for toxicity. A dataset of 2366 drugs from a drug company was made available to them and they used it to calculate likelihood ratios (LRs) for toxicity. This is good research, which supports numerous previous studies. Unfortunately, Bailey et al. discounted the value of previous studies where positive predictive value (PPV) was or could have been calculated. Additionally, they ignored the importance of theory when evaluating a practice like animal modeling. In this article, I will review the literature prior to the Bailey et al. article (prior to November 2013) and attempt to place the contribution of their paper in context, as well as defend the importance of theory in science. I begin by exploring our current knowledge of evolutionary biology and complex systems.

What Was Known Regarding the Predictive Value of Animal Models Prior to the Publication by Bailey et al.?

Evolved Complex Systems
Humans and animals are examples of evolved, complex systems. As I have addressed this concept many times [2]-[7], [10]-[18], [20], I will here briefly describe the characteristics of such systems. Systems in general can be classified as simple, chaotic, or complex. Characteristics of complex systems include the following:
• Complex systems are adaptive. They interact with and adapt to their environment. Evolution is an example of this adaptive quality.
• Complex systems have chaotic subsystems. Chaos is a discipline of complexity science and is best known for dependence on initial conditions (the butterfly effect) and the fact that even though the systems are deterministic, they are not predictable. Perhaps the best-known example of dependence on initial conditions is the graph from Lorenz that was first associated with chaotic systems (Figure 1).
• Complex systems are highly dependent on initial conditions.
• Complex systems manifest emergent properties. Even complete knowledge of all the components would be insufficient to discover the emergent properties of the system. This means that biological complex systems cannot be fully understood by reductionism alone. This has led to the field of study known as systems biology.
• Complex systems have feedback loops.
• Complex systems demonstrate a hierarchy of organization. These levels of organization are important because, as we discussed in "Animal Models and Conserved Processes" [12], animal models have predictive value at lower levels of organization. For example, fundamental particles act the same regardless of where in the complex system those particles are. Likewise, the laws of physics apply grossly to animals and humans equally: gravity will act equally on a frog and a human when dropped out of an airplane. The force of impact will also result in similar outcomes.
• Complex systems are composed of many components.
• Some of the components can be organized into modules.
• Complex systems are non-Gaussian. They do not necessarily demonstrate the typical bell curve of distribution.
• Complex systems exhibit nonlinearity in response to perturbations. Small changes in input may lead to large changes in output, while large changes in input may lead to small or no changes in output. At one time, a perturbation might elicit a large response but at another time no response. Furthermore, two seemingly identical complex systems might respond oppositely to the same perturbation.
• Complex systems are thought to be nonsimulable. This is one reason it is so difficult to predict outcomes in humans that have been exposed to drugs or disease.
• Complex systems demonstrate redundancy. The loss of a component does not necessarily mean the entire system is incapacitated.
• Complex systems demonstrate robustness. The system is resistant to change, in part because of the redundancy of the system.
• Complex systems demonstrate self-organization.
• The whole is greater than the sum of parts of a complex system. This also limits what can be learned about a complex system through the use of reductionism.
(For more on complexity see [33]-[47].)
The above characteristics must be considered in light of the fact that animals and humans evolved. Basically, evolution changes the initial conditions, the genetic makeup, of an organism, and complex systems are highly dependent on initial conditions. Perhaps the best example of small differences in initial conditions resulting in very different outcomes comes not from animal-human comparisons but from intra-human variability in responses to drugs and disease. The area of research known as personalized medicine stems from these differences [48]-[70].
Physicians have known for decades that great variability exists between men and women [71]-[82], and among ethnicities [65] [76] [81] [83]-[95] in terms of response to drugs and disease. It was long suspected and recently confirmed that monozygotic twins differ in disease susceptibility and drug response [96]-[114]. These dramatic differences, such as when only one twin contracts schizophrenia or multiple sclerosis, are due to very small differences in genetic makeup, that is, differences in initial conditions. When the consequences of very small changes in human genetic makeup are considered (such as single nucleotide polymorphisms, copy number variants, the effect of gene deletion, differences in genetic regulation and expression, differences in gene and protein networks, alternative splicing, background and modifier genes, pleiotropy, and mutations in general), it should come as no surprise that the even greater genetic differences between species will result in dramatically different outcomes to perturbations like diseases and drugs. More on this momentarily.
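Dependence on initial conditions can be illustrated numerically. The sketch below does not use the Lorenz system mentioned above; as a simpler stand-in it uses the logistic map, a standard textbook example of deterministic chaos. It is a toy illustration of the mathematical property only, not a model of any biological system, and the function name, starting values, and step count are assumptions chosen for illustration.

```python
def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 60) -> list[float]:
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion (illustrative values).
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Absolute gap between the two trajectories at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]

# First step at which the trajectories differ by more than 0.1, if any.
first_divergence = next((i for i, g in enumerate(gaps) if g > 0.1), None)
```

Each trajectory is fully deterministic, since rerunning it from the same starting value reproduces it exactly, yet the gap between the two runs grows from a negligible initial difference to the same order as the values themselves. This is the sense in which a deterministic system can still be unpredictable in practice.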

Empirical Evidence
Numerous studies have compared the outcomes from drugs or disease in humans with outcomes in various animal species and strains. (The following references compose a very partial list: [115]-[139].) Moreover, there is a history of failures in specific areas such as HIV vaccine research [7] [140]-[142] and neuroprotection [143]-[153]. Historically, there is also a plethora of failures, such as the response to poliovirus [154]-[156]. The instances in which animal models have responded similarly have either been when evaluating efficacy for treating a third complex system, such as was the case for anti-bacterials [15], or after a long series of failures when a model was discovered that responded similarly to humans but only in this one case [6] [15]. Such a success rate does not yield numbers consistent with even a moderate predictive value.

Opinions of Experts
While the argument from authority, isolated from other factors, is an example of fallacious reasoning, the opinions of experts, especially when there is consensus, should be considered. A consensus exists among scientists inside and outside of drug development that animal models have no predictive value [2]-[4] [...] has not yet led to the introduction of truly novel pharmacological approaches to the treatment of central nervous system disorders. This situation has been partly attributed to the difficulty of predicting efficacy in patients based on results from preclinical studies. Few would dispute the need to move away from the concept of modeling CNS diseases in their entirety using animals. However, the current emphasis on specific dimensions of psychopathology that can be objectively assessed in both clinical populations and animal models has not yet provided concrete examples of successful preclinical-clinical translation in CNS drug discovery [200].
The FDA has acknowledged the need to make toxicology science-based [247], with FDA Commissioner Margaret Hamburg echoing MacDonald and Robertson, stating: "Most of the toxicology tools used for regulatory assessment rely on high-dose animal studies and default extrapolation procedures, and have remained relatively unchanged for decades, despite the scientific revolutions of the past half-century" [248]. Elias Zerhouni, former director of NIH and current head of R&D at Sanofi, was quoted in the June 25, 2012 issue of Forbes as saying: "R&D in pharma has been isolating itself for 20 years, thinking that animal models would be enough and highly predictive, and I think I want to just bring back the discipline of outstanding translational science, which means understand the disease in humans before I even touch a patient." Zerhouni was also quoted in NIH Record in 2013 as stating the following: "We have moved away from studying human disease in humans," [Zerhouni] lamented. "We all drank the Kool-Aid on that one, me included." With the ability to knock in or knock out any gene in a mouse (which "can't sue us," Zerhouni quipped), researchers have over-relied on animal data. "The problem is that it hasn't worked, and it's time we stopped dancing around the problem… We need to refocus and adapt new methodologies for use in humans to understand disease biology in humans" [250].
Examples like the above can easily be multiplied, and numerous examples from the second half of the 20th century could be listed. The notion that animal models lack predictive value is not new to the 21st century. When the opinions of experts are combined with the empirical evidence, evolutionary biology, and complexity science, a compelling case exists against using animal models for their predictive value. I now turn to combining these areas into a theory of science.

Trans-Species Modeling Theory
The term theory is commonly misrepresented by scientists and nonscientists alike; therefore, the following two explanations are appropriate. The National Academy of Sciences (USA) explains theory as follows: In everyday usage, "theory" often refers to a hunch or a speculation. When people say, "I have a theory about why that happened," they are often drawing a conclusion based on fragmentary or inconclusive evidence. The formal scientific definition of theory is quite different from the everyday meaning of the word. It refers to a comprehensive explanation of some aspect of nature that is supported by a vast body of evidence. Many scientific theories are so well established that no new evidence is likely to alter them substantially. One of the most useful properties of scientific theories is that they can be used to make predictions about natural events or phenomena that have not yet been observed ([253], p. 11).
The American Association for the Advancement of Science (AAAS) states: In detective novels, a "theory" is little more than an educated guess, often based on a few circumstantial facts. In science, the word "theory" means much more. A scientific theory is a well-substantiated explanation of some aspect of the natural world, based on a body of facts that have been repeatedly confirmed through observation and experiment. Such fact-supported theories are not "guesses" but reliable accounts of the real world. The theory of biological evolution is more than "just a theory." It is as factual an explanation of the universe as the atomic theory of matter or the germ theory of disease. Our understanding of gravity is still a work in progress. But the phenomenon of gravity, like evolution, is an accepted fact [254].
Trans-Species Modeling Theory (TSMT) states: "While trans-species extrapolation is possible when perturbations concern lower levels of organization or when studying morphology and function on the gross level, one evolved, complex system will not be of predictive value for another when the perturbation affects higher levels of organization" [16]. TSMT was formulated by the author and is not yet a universally accepted theory like the Theory of Evolution and the Germ Theory of Disease. However, the basis, or components, of TSMT are universally accepted. TSMT is based on the Theory of Evolution and Complexity Theory (or complexity science) and has been tested against the empirical evidence comparing animal and human outcomes to perturbations.
In the early days of animal-based research, the researchers discovered that the general morphology and function of organs was the same in species with a recent common ancestor. For instance, the pancreas is involved in regulating sugar levels in the body in mammals. Evolution predicts this to be the case: general morphology will be very similar the closer two species are to their common ancestor. For example, humans and chimpanzees look more like each other than either looks like a fish. But evolution also predicts that on a finer level of examination, the details regarding how an organ functions and responds to perturbations will vary. For example, HIV infects both humans and chimps but the response to the infection is dramatically different: a mild cold versus death if untreated. Likewise, the heart pumps blood in pigs and humans but only humans suffer myocardial infarctions due to intra-coronary plaque. Superficial similarities do not imply the same disease or the same mechanism. Trichotillomania, the pulling out of one's hair, in humans responds to behavioral modification and anti-depressants. When cats do the same it is usually due to allergies, for example flea allergy, and is treated by eliminating the fleas or by desensitization to the allergen. Humans and other mammals have brains and humans are naturally predisposed to experience cerebral ischemia (strokes). Cerebral ischemia can be induced in animals and numerous drugs have been shown efficacious in preventing long-term brain damage in animals. Such drugs have consistently been shown inefficacious in humans.
Greek and Hansen, discussing how TSMT fulfills Popper's criteria of a scientific theory [255], state:
1) It is supported by a vast amount of evidence. Where direct comparisons are possible, such as with drug toxicity, efficacy, bioavailability, and so forth, definitive evidence exists to support the concerns raised by trans-species modeling theory... [For references see original.] If two evolved CASs [complex adaptive systems] are not predictive for each other in these areas, what changes in evolution would account for their being predictive in other areas such as HIV, ALS, and cancer? Indeed the empirical evidence in these areas agrees with that from the drug development literature.
2) The prediction that such systems should not be of predictive value contained risk in that evolution has followed common pathways and therefore many similarities should and do exist. Based on TSMT, we should expect agreement among species when the perturbation affects only the lower levels of the hierarchy of organization (for example, the laws of physics affect all mammals equally). However, as one moves into the higher levels of organization the perturbations should be expected to result in varying responses both qualitatively as well as quantitatively... [See original for references.] Empirically, we find those two predictions are fulfilled.
3) It prohibits animal models from predictive value at higher levels of organization and this has been quantified.
4) It is refutable. For example, a species or strain that correlated with human data regarding known teratogens >95% of the time and that correctly predicted novel teratogens a similar percentage of the time would falsify the theory.
5) It has been tested many times in disciplines ranging from infectious diseases, cancer, toxicity, neurology, and drug efficacy to teratogenicity.
6) Confirming evidence has come from many and varied disciplines involved in animal use: research on heart disease, sepsis, trauma, and anesthesiology.
7) The theory is straightforward and has no ad hoc features [16].
TSMT summarizes the current scientific knowledge regarding when animal models have predictive value and when they do not. (Please see reference [16] for more on TSMT.) But it sometimes takes theories in science decades to be accepted by other scientists and society. For example, Climate Change Theory, also known as global warming, is vigorously opposed by some [256] [257]. It also took time for the Germ Theory of Disease to be accepted [258] [259].

Animal Modelers Claim a High Predictive Value for Animal Models
Claude Bernard's 19th-century position that animal models were of high predictive value has not changed among current animal modelers. Consider the following from Gad:
Biomedical sciences' use of animals as models [is to] help understand and predict responses in humans, in toxicology and pharmacology by and large animals have worked exceptionally well as predictive models for humans. Animals have been used as models for centuries to predict what chemicals and environmental factors would do to humans. The use of animals as predictors of potential ill effects has grown since that time. If we correctly identify toxic agents (using animals and other predictive model systems) in advance of a product or agent being introduced into the marketplace or environment, generally it will not be introduced. The use of thalidomide, a sedative-hypnotic agent, led to some 10,000 deformed children being born in Europe. This in turn led directly to the 1962 revision of the Food, Drug and Cosmetic Act, requiring more stringent testing. Current testing procedures (or even those at the time in the United States, where the drug was never approved for human use) would have identified the hazard and prevented this tragedy [260].
(See reference number [6] for a rebuttal of Gad's sentiment regarding thalidomide.) Gad is representative of the community of scientists, politicians, and spokespeople that have a vested interest in animal modeling, as typified by the following statements. David Willetts, Science Minister of the UK, states: "The Government is committed to working to reduce the use of animals in scientific research, but we do recognise that there remains a strong scientific case for the careful regulated use of animals in scientific research and that this does play a role in ensuring new medicines are safe and effective" [261].

Hart et al. state:
Nonhuman primates (NHPs) are important models in preclinical research enabling understanding of pathogenic mechanisms in human disease that readily translate into therapy development. Marmoset colonies are outbred reflecting the genetic heterogeneity of the human population, although differences exist compared with humans. Marmoset disease models are appropriately complex and their use requires in-depth knowledge of marmoset biology and optimal laboratory management [262].
Cheng states: "Animal tests are necessary for some research, such as testing drugs for toxicity. It would be, in my opinion, improper to release drugs for human use without animal testing" [263]. Vassar states: "Chronic dosing in mice and monkeys is necessary to show the efficacy and safety of the antibody before it's taken into humans" [264]. Rigmor Thorstensson, Head of the Department of Virology, Immunology and Vaccinology, at SMI in Sweden wrote in an article titled "Medical research on apes is no ethical problem for me" [265]:
The ethical reasons against animal testing must be weighed against the evidence that more and more people across the globe can have access to effective drugs and vaccines. If these were tested in clinical trials without first undergoing animal testing large numbers of people risking their lives in such studies and the development could also be delayed catastrophic. For me it is no ethical problem of using monkeys in experiments, it is the only way to produce an effective vaccine against the major global infectious diseases, HIV, tuberculosis and malaria [265].
On March 25, 2011, a letter from Andrew B. Rudczynski, Yale University's associate vice president for research administration, was published in the New Haven Register, which contained the following: "Contrary to claims in a letter to the editor, the basic research model used by Yale University and its peer institutions is scientifically valid and predictive of human disease" [266]. (For more on the use of animals in basic research see reference [3].) Hau states: "A third important group of animal models is employed as predictive models. These models are used with the aim of discovering and quantifying the impact of a treatment, whether this is to cure a disease or to assess toxicity of a chemical compound" [267]. The Committee on Applications of Toxicogenomic Technologies to Predictive Toxicology and Risk Assessment stated in 2007:
Animal models offer important experimental research opportunities to understand how genetic factors influence differential response to toxicologic agents. Animal models are advantageous as a first line of research because they are less expensive, less difficult, and less time-consuming than human studies. In addition, animal studies can address questions that are almost insurmountable in human studies, such as questions about sporadic effects or effects that cannot be adequately examined for sex linkage because of sex bias in employment ([268], p. 123).
The above could be easily multiplied and firmly establishes that animal modelers assign high predictive value to animal models. Contrast the above with the examples of statements from scientists involved in both drug development and animal modeling from Section 2.3. The contrast between the two should be kept in mind when evaluating the following statements from Bailey et al. In the remainder of Section 3, I will quote from Bailey et al. [32] and compare and contrast TSMT with the authors' statements.
In previous research into the reliability of animal models as predictors of toxicity in humans, some authors (e.g. 9) have focused on the sensitivity, expressed as the "true positive concordance rate", or the so-called Positive Predictive Value (PPV), given by a/(a + b) [see Table 2], which reflects the probability that human toxicity was correctly identified by the animal model, given that toxicity was observed in the animal model (e.g. 12). However, neither of these metrics is suitable for the role of assessing the evidential weight provided by any toxicity test... The analysis presented here is urgently required, to support informed debate about the worth of animal models in preclinical testing. It is acknowledged among some stakeholders (if not universally among all stakeholders) that assessment of the scientific value of animal data in drug development is necessary, has been scarce, and has been thwarted for decades by the unavailability of relevant data for analysis (e.g. 14). Nevertheless, primarily due to concerns over privacy and commercial interests, data sharing and making data available continue to be resisted, in spite of assurances to the contrary from industry (14).

Where Does the
Reference [12] in the above quotation is "Systematic Reviews of Animal Models: Methodology versus Epistemology" by Greek and Menache [17]. In that article, as well as in many others, Greek et al. explain why PPV is useful in determining the predictive value of animal models. But, before I go into detail regarding PPV, let's take a closer look at the language of Bailey et al. The phrase "the so-called Positive Predictive Value (PPV)" is disingenuous as PPV is a routinely used calculation. Prefacing it with "so-called" is an example of poisoning the well, as "so-called" immediately places the legitimacy of PPV into doubt.
Bailey et al. make clearer their position that without the LRs calculated in their article, no definitive data existed on the predictive value of dogs in toxicity testing: "However, only limited evaluations of the reliability of the canine model for this purpose have been conducted, chiefly due to the difficulty of accessing relevant data, most of which are unpublished and proprietary to pharmaceutical companies" [32]. While it is true that volumes of proprietary drug company data exist, there have been exceptions to the rule that none has been made available. Data from Phase I, II, and III trials, in the form of attrition rates [130] [135] [136], reflects directly on the animal models in terms of efficacy and safety, as every drug was both safe and effective in some animal species or strain and the drug development community acknowledges that animal models are the primary method used to evaluate safety and efficacy [129]-[131] [135]-[137] [164] [173] [187] [197] [208] [228] [269]-[271]. As Bailey et al. point out, most of the time the dog and a rodent species were used in these studies. Given the extremely high attrition rates (90% to 95%), as Bailey et al. also allude to, one could logically, as well as scientifically, conclude that the dog has no predictive value for human response.
Furthermore, direct human-to-animal comparisons have been made for efficacy and toxicity and the predictive values found to be very low [115] [117]-[123] [126] [128] [132] [138]. (Also see references in Section 2.2.) Many of these studies measured toxicity as a single entity. Moreover, they counted as a positive any animal that exhibited the same toxicity as a human. This is a very generous interpretation of the predictive value in terms of toxicity and animal models. If these toxicities had been separated by species, the total number of positive hits per species would have been much less and thus the PPV less. Therefore, even with unsophisticated studies such as some of the above, the PPV was low and would have been lower had better methods been used. Sophisticated measurements are not always required in order to ascertain facts. Very rarely are people studied as they jump out of an airplane without a parachute and they almost never have monitors in place such as EKGs, pulse oximetry, end tidal carbon dioxide detectors, and pulmonary artery catheters. Yet, scientific consensus remains that jumping out of an airplane flying over 3000 feet above ground level is very likely lethal. Centuries of observing that animal models fail to inform scientists regarding what a drug or disease will do in humans has provided data similar to the parachute example. Higher math or statistics is not needed.
Agencies such as the FDA and scientists in drug development have also been very vocal that animal tests in general offer little in terms of predictive value for efficacy or toxicity [129]-[131] [135]-[137] [164] [173] [187] [197] [208] [228] [269]-[271]. Finally, the physiological properties that lead to efficacy and toxicity (absorption, distribution, metabolism, and elimination) have also been studied and predictive value shown to be nonexistent (see Figure 2) [116] [125] [127] [160]. Some of these scientists studied dogs specifically [124]. Other studies have revealed variation in genetic response to the diseases that drugs are designed to treat (see Figure 3) [139]. (Consilience, evidence from fields other than the exact one under question, is important in science. See reference [16] for more on the Seok [139] study.) Based on such empirical evidence, the lack of predictive value for animal models in general is well accepted in the scientific community [2]-[4] [6] [10] [19] [20] [117] [119] [120] [125] [127]-[130] [132] [135] [136] [138] [139] [157]-[240]. This is further confirmed by the fact that roughly 50% of clinical trials are never published, in part due to the fact that the trials failed because the animal data was so misleading [272]. The noteworthiness of Bailey et al. is that they obtained access to the data from a drug company. That was an accomplishment in that no one else had obtained such access to such a large number of drugs, but it does not lessen the importance of all the empirical evidence before the publication of their paper or demand that society call into question the conclusions drawn from such studies.
Finally, thalidomide was deemed dangerous by physicians because of three patients [273]. The use of animals in drug development affects far more than three patients and has far more evidence against it than thalidomide did at the time it was removed from the market. (For more on thalidomide see [6].) Granted, this analogy is not perfect; nevertheless large numbers and sophisticated analyses are not always needed to form conclusions.
Bailey et al. continue:
Those evaluations that have been conducted have usually employed "concordance" metrics (e.g. 9), which various authors have interpreted as the true positive rate ("sensitivity") or the Positive Predictive Value (PPV). While these metrics are appropriate for assessing the reliability of a diagnostic test for a specific disorder (e.g. HIV infection), the insights they provide depend critically on the question being asked of the diagnostic test. However, they are not appropriate for assessing the salient question at issue with animal models, which is whether or not they contribute significant weight to the evidence for or against the toxicity of a given compound in humans.
Bailey et al. are conflating an accepted scientific formula that measures positive predictive value with a different formula that yields sensitivity. They are also confusing the issue further by introducing concordance, used by Olson et al. [274] as "true positive concordance rate", a term invented by Olson et al. (Shanks and I address this in Animal Models in Light of Evolution [20].) Sensitivity, along with specificity, PPV, and NPV, has been used for decades in many areas of science and industry to measure the predictive ability of a range of methods, including diagnostic tests in medicine. PPV is not confined to medical tests, however. It can be used to assess the predictive ability of dogs to sniff out drugs or explosives in airports, cadaver dogs' ability to sniff out chemicals associated with the post-mortem state [275], the predictive ability of fortune tellers and psychics, whether death certificates accurately reflect cause of death in workplace homicide victims [276], in the jet-manufacturing industry [277], retrospective analysis of occupational exposure [278], in industry in general [279], business in general [280], ergonomics [281], customer behavior analysis [282], nutrition science [283], in computer science in general [284]-[287], for testing programs to catch spam [288], and to test whether a sobriety test has a high enough predictive value to evaluate driver behavior [289]. Indeed these simple statistics have been used to evaluate virtually any model, test, practice, method, or entity in any area.
For example, Que et al. state the following regarding an algorithm for Biosurveillance and Biosecurity:
In this paper, we propose a Z-Score Based Multi-level Spatial Clustering (ZMSC) algorithm for the early detection of emerging disease outbreaks. Using semi-synthetic data for algorithm evaluation, we compared ZMSC with the Wavelet Anomaly Detector, a temporal algorithm, and two spatial clustering algorithms: Kulldorff's spatial scan statistic and Bayesian spatial scan statistic. ROC curve analysis shows that ZMSC has better discriminatory ability than the three compared algorithms. ZMSC demonstrated significant computational efficiency: 1000× times faster than both spatial algorithms. Finally, ZMSC had the highest cluster positive predictive values of all the algorithms. However, ZMSC showed a 0.5-1 day average delay in detection when the false alarm rate was lower than one false alarm for every five days. We conclude that the ZMSC algorithm improves current methods of spatial cluster detection by offering better discriminatory ability, faster performance and more exact cluster identification [290].

Similarly, Almeda et al. state:
Several computational systems which depend on the precise location of the eyes have been developed in the last decades. Aware of this need, we propose a method for automatic detection of eyes in images of human faces using four geostatistical functions (semivariogram, semimadogram, covariogram and correlogram) and support vector machines. The method was tested using the ORL human face database, which contains 400 images grouped in 40 persons, each having 10 different expressions [291].
PPVs ranged from 83% to 94% depending on the function used. Likewise, in pattern recognition and information retrieval, the positive predictive value (there called precision) measures the fraction of retrieved cases that are relevant. LRs must be combined with prevalence, as the pre-test probability, in order to evaluate diagnostic tests, and there is no prevalence to measure in some situations where PPV is used. PPV can also be derived from Bayesian analysis [292] and, as Bayesian analysis is not confined to medical testing, neither is PPV.
The claim that PPV cannot be used to determine "whether or not they [animal models] contribute significant weight to the evidence for or against the toxicity of a given compound in humans" suggests a fundamental misunderstanding of what PPV is. PPV is a probability and this probability can be applied to situations included in the original calculation of PPV. For example, if human hepatotoxicity was evaluated using mongrel dogs and the PPV found to be 0.25, then one could assess a new chemical's probability of producing hepatotoxicity in humans, when hepatotoxicity was seen in mongrel dogs, as being 0.25. That is what PPV is used for. When predictive values are that low, one must conclude that the model is not capable of judging hepatotoxicity. Granted, NPV cannot be calculated without the false-negative rate being known, but TSMT addresses why one evolved complex system should even be considered to be capable of having predictive value, positive or negative, for another evolved complex system at higher levels of organization. TSMT combined with PPVs and the empirical evidence was sufficient to determine that animal testing for toxicity lacked predictive value and that this would not change with more genetic modifications to animals.
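As an illustration of the calculation described above, the following minimal sketch computes sensitivity, specificity, PPV, and NPV from a two-by-two table. The cell letters a, b, c, d are assumed to follow the usual layout (a = true positives, b = false positives, c = false negatives, d = true negatives), which agrees with the a/(a + b) expression for PPV quoted earlier. The counts are hypothetical, chosen only so that the PPV equals the 0.25 used in the mongrel dog example; they are not data from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Animal test result versus human outcome (assumed a/b/c/d layout)."""

    tp: int  # a: toxic in the animal model, toxic in humans
    fp: int  # b: toxic in the animal model, not toxic in humans
    fn: int  # c: not toxic in the animal model, toxic in humans
    tn: int  # d: not toxic in the animal model, not toxic in humans

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Probability of human toxicity given a positive animal result.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Probability of no human toxicity given a negative animal result.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, chosen only so that PPV = 10 / (10 + 30) = 0.25.
dog_hepatotoxicity = TwoByTwo(tp=10, fp=30, fn=5, tn=55)

if __name__ == "__main__":
    print(f"sensitivity = {dog_hepatotoxicity.sensitivity():.2f}")
    print(f"specificity = {dog_hepatotoxicity.specificity():.2f}")
    print(f"PPV         = {dog_hepatotoxicity.ppv():.2f}")
    print(f"NPV         = {dog_hepatotoxicity.npv():.2f}")
```

Note that PPV needs only the a and b cells, whereas NPV and specificity need the c and d cells, which is why the latter cannot be computed when studies do not report negative animal results.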

Bailey et al. continue:
The case of the PPV is more subtle. This metric is a measure of the probability that human toxicity will be correctly identified, given that the animal model detected toxicity. As such, PPVs are conditional probabilities, the condition being the preexistence of a positive animal test result. This makes PPVs dependent on the prevalence of toxicity in compounds, and thus an inappropriate measure of the reliability of the test with any specific compound (e.g. 10, 13).
Prevalence is indeed important in determining predictive ability of diagnostic tests. Grimes and Schulz state: "Likelihood ratios can refine clinical diagnosis on the basis of signs and symptoms; however, they are underused for patients' care. A likelihood ratio is the percentage of ill people with a given test result divided by the percentage of well individuals with the same result" [293]. This is a good definition. Note that prevalence is defined by those taking the test and not by a survey of the population at large. This can skew the real numbers and percentages. For example, the prevalence of appendicitis in the general population is lower than the prevalence in the emergency room among patients with right lower quadrant pain and rebound tenderness.
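The definition quoted above translates directly into arithmetic. The sketch below is illustrative only: the function name and the counts are hypothetical, and it assumes at least one well individual tests positive and at least one tests negative, so that neither denominator is zero.

```python
def likelihood_ratios(ill_pos: int, ill_total: int,
                      well_pos: int, well_total: int) -> tuple[float, float]:
    """Return (LR+, LR-) from counts of positive results among ill and well people.

    LR+ = share of ill people testing positive / share of well people testing positive
    LR- = share of ill people testing negative / share of well people testing negative
    """
    ill_pos_rate = ill_pos / ill_total      # sensitivity
    well_pos_rate = well_pos / well_total   # 1 - specificity
    lr_pos = ill_pos_rate / well_pos_rate
    lr_neg = (1 - ill_pos_rate) / (1 - well_pos_rate)
    return lr_pos, lr_neg


if __name__ == "__main__":
    # Hypothetical counts: 90 of 100 ill people and 20 of 100 well people test positive.
    lr_pos, lr_neg = likelihood_ratios(90, 100, 20, 100)
    print(f"LR+ = {lr_pos:.3f}, LR- = {lr_neg:.3f}")
```

Both ratios require the well group's results, that is, the false-positive and true-negative cells, which is the data most published animal-human comparisons do not report.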
Also note that prevalence, as used to calculate LR, can only be determined retrospectively after we know which patients really were ill and which were not. To make this definition specific to our analysis of animal models, the true prevalence of the side effect, in terms of toxicity, or of the effect (efficacy) can only be ascertained after administering the drug to thousands of humans. I will address why thousands, if not tens of thousands, are needed momentarily. For now, we need to understand that in terms of screening drugs for efficacy and toxicity in animals, the prevalence of these factors is not known in humans; hence those evaluating a new drug will not have that data available. This is highly significant for those evaluating new drugs for safety. The prevalence of a disease is known while the prevalence of a side effect of a new drug is not. This is yet another difference between evaluating diagnostic tests and the efficacy and safety of new drugs.
Altman and Bland state:
The whole point of a diagnostic test is to use it to make a diagnosis, so we need to know the probability that the test will give the correct diagnosis. The sensitivity and specificity do not give us this information. Instead we must approach the data from the direction of the test results, using predictive values [294].
Altman and Bland then describe the use of prevalence when evaluating diagnostic tests. They then state: "If the prevalence of the disease is very low, the positive predictive value will not be close to 1 even if both the sensitivity and specificity are high. Thus in screening the general population it is inevitable that many people with positive test results will be false positives" [294]. But, as we have seen, PPV is used in many areas, including manufacturing and ergonomics, where prevalence is either immaterial or unknown. The point is that PPV has value in specific areas even without knowledge of prevalence or where the concept of prevalence is not applicable. Moreover, using animal models for testing safety is not an example of a diagnostic test, hence the rules for application are different.
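The Altman and Bland statement quoted above can be checked numerically with Bayes' theorem. The sketch below is a minimal illustration for the diagnostic-test setting they describe; the function names and the sensitivity and specificity of 0.95 are assumptions made for the example, not values from any study. It also shows the equivalent odds form, in which a positive likelihood ratio is combined with prevalence as the pre-test probability.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """PPV via Bayes' theorem for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def ppv_from_lr(lr_pos: float, prevalence: float) -> float:
    """Same quantity in odds form: post-test odds = pre-test odds * LR+."""
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * lr_pos
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    sens, spec = 0.95, 0.95  # hypothetical, deliberately high
    for prev in (0.5, 0.1, 0.01, 0.001):
        ppv = ppv_from_prevalence(sens, spec, prev)
        print(f"prevalence = {prev:<6} PPV = {ppv:.3f}")
```

With these assumed inputs the PPV falls from 0.95 at a prevalence of 0.5 to under 0.02 at a prevalence of 0.001, which is the screening effect Altman and Bland describe.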
Bailey et al. continue: "Thus, any appropriate metric of the evidential value of animal models requires knowledge of both the sensitivity and the specificity of the model. This, in turn, implies that the appropriate metrics for the evidential weight provided by an animal model are LRs (e.g. 13)."
The following are definitions of evidential value:
Value of those records which are necessary to provide an authentic and adequate evidence of an organization's actions, functioning, policies, and/or structure. Evidential value relates to the document's creation and not necessarily to its content or informational about the activities, functions, and origins of its creator [295].
Value of records given as or in support of evidence, based on the certainty of the records origins. The value here is not in the record content. This certainty is essential for authentic and adequate evidence of an entity's actions, functioning, policies, and/or structure [296].
Evidential value appears to be used primarily by archivists and in forensic science, business, law, and history. As of January 2014, PubMed contained 126 articles with the phrase evidential value. Of the five freely available, all were related to forensic science and most of the other 121 also appeared to be related to forensic science. None of the articles addressed PPV and only one mentioned LR. Clearly, the articles in PubMed do not use the phrase evidential value very frequently and only very rarely associate it with LR. I am not sure what Bailey et al. are referring to when they use the phrase evidential value. Moreover, the way animal models are used in drug development is more on the order of digital than analog. If certain toxicities are seen in animal models then it is unlikely that the drug will continue to be developed. Thus, even if we assume that what Bailey et al. mean by evidential value is more along the non-dictionary use of the phrase (consistent with "Can we trust this test?") we still must assess whether the animal model offers predictive value for drug development.
In order to illustrate the differences between +LRs (PLR) and PPV, Bailey et al. include a graph: "The inappropriate nature of PPVs is demonstrated in Figure 2 [shown below as Figure 4], which shows a scatter plot of 'ranked' PPVs against equivalent ranked PLRs. Each PPV and PLR was ranked according to its value for each of the 436 classifications of effects, and these ranks were plotted against each other." The graph does indeed show a shotgun blast pattern, or scattergram, indicating no correlation between the rankings of the drugs using the two values. This is quite impressive at first glance but upon closer examination the graph is shown to be of no real value. The graph does not compare PPV with LR, which is what one would expect from the article. Rather it compares where the drugs studied ranked with respect to the other drugs using the two methods. Whether a drug has a PPV of 0.5 or 0.01, along with corresponding LRs, determined the ranking of the drug. But in real life PPVs of 0.01 or 0.5 both reveal a test that offers no predictive value, regardless of where that value ranks compared to the LR+. The ranking system as used appears to be an attempt to mislead the reader into thinking that rank, as opposed to low PPV, is related to the success or failure of the method in general. Statistics offers many ways to mislead and graphs offer yet another way. The fact that Bailey et al. chose to employ this graph suggests they might have an agenda other than simply reporting likelihood ratios from drug data. (See Figure 2 for an example of a graph that is also a scattergram but that is valuable because it compares the relevant values directly.)
Figure 4. "PPVs and PLRs for all 436 results were ordered according to their value, with the highest ranking first and the lowest last. For each BMO and tissue effect, the corresponding PPV and PLR rank were plotted against each other. If a perfect correlation exists, all points should lie on the line, where, for example, the 10th, 50th, and 100th highest PPV value would also be the 10th, 50th, and 100th highest PLR values. However, the significant scatter of the data points demonstrates that little correlation exists between PPV and PLR. For example: the 20th highest PPV ranks only 404/436 for PLR, whereas the 30th highest PLR ranks only 406/436 for PPV" [32].
The situation of using LRs to evaluate animal models as opposed to PPV is similar to using a CT scan of the chest as opposed to a chest x-ray (CXR) to diagnose a pneumothorax. Chest x-rays are readily available and less expensive than a chest CT; however, the chest CT is the gold standard for diagnosing a pneumothorax. Nevertheless, a vast majority of pneumothoraxes are diagnosed with CXR. The reason for this is simply that the CXR, a quicker, easier, less expensive test, is all that is usually needed to diagnose a clinically relevant pneumothorax. One does not support the diagnosis from CXR with a CT scan. Likewise, calculating LRs is currently the gold standard in terms of diagnostic tests, but again animal toxicity testing is not a diagnostic test and even if it were there exists historical data that is sufficient to rule out animal tests as offering predictive value for human response. Additionally, a series of very low PPVs combined with theory, other data, and the opinions of experts is adequate for revealing such a profound lack of predictive value that drug development should abandon animal models. Indeed, this is exactly what we are seeing. As the quotes from experts in Section 2.3 revealed, the pharmaceutical industry is changing. Bailey et al.'s success was in obtaining the data from a pharmaceutical company that allowed the calculation of specificity and LRs. Bailey et al. did not reveal a conceptually new conclusion that could only have been discovered with their data.
The pharmaceutical industry understands the above and has been searching for viable methods to test for efficacy and toxicity for over a decade. Clearly Bailey et al. offer nothing new to the people who are actually doing the work of drug development. Why then do Bailey et al. make such claims? The people actively defending the status quo regarding toxicity testing are the contract research organizations that are paid to perform the animal testing. Also rejecting the data that proves toxicity testing has never been effective are various animal protection organizations whose livelihood depends on animal testing and their supposed opposition to it. If animal testing were shown to have been unnecessary for the last two to three decades, when animal protection organizations were saying it was necessary, they would be completely discredited and possibly in violation of national laws.

Animal Models in Light of Personalized Medicine
Sensitivity, specificity, positive predictive value, and negative predictive value vary with the genetic makeup of the population. Prevalence, regardless of how it is measured, does not take into account genetic variation. This is one reason why, as I stated above, thousands if not tens of thousands of people are needed in order to truly evaluate a drug. This is why estimates of prevalence based on clinical trials are very misleading. Even when one thousand people are tested prior to marketing the drug, side effects that are rare, such as was the case with Vioxx, will not necessarily be seen. But such side effects are important enough to necessitate the drug's withdrawal from the market. The solution to this is not in vitro tests per se or in silico tests per se, but rather gene-based tests and gene-based prescribing. Animal models do not figure into the solution in any fashion. Drug development must be human-based and that is what the industry is aiming for.
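The arithmetic behind the need for thousands, if not tens of thousands, of participants can be sketched as follows. The incidence used is hypothetical (it is not the actual rate for Vioxx or any other drug), and the calculation assumes each participant experiences the side effect independently with the same probability, which ignores the genetic variation discussed above and is therefore only a rough guide.

```python
import math


def prob_at_least_one(incidence: float, n: int) -> float:
    """Probability of observing at least one case among n participants."""
    return 1 - (1 - incidence) ** n


def participants_needed(incidence: float, confidence: float = 0.95) -> int:
    """Smallest n for which at least one case is seen with the given probability."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - incidence))


if __name__ == "__main__":
    incidence = 1 / 10_000  # hypothetical: one affected person per 10,000 treated
    for n in (1_000, 10_000, 30_000):
        p = prob_at_least_one(incidence, n)
        print(f"n = {n:>6}: P(at least one case) = {p:.3f}")
    print("participants needed for 95% probability:",
          participants_needed(incidence))
```

Under these assumptions a trial of one thousand people has less than a one in ten chance of showing even a single case, while roughly thirty thousand people are needed before a case is likely to appear at all.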
As I previously stated, physicians and scientists have long known that individual patients respond differently to drugs and that one reason for this is genetic variation. The fields of pharmacogenetics and personalized medicine are based on this. For example, the enzyme CYP2D6 converts the commonly prescribed pain killer codeine to morphine. In 2005, a new mother was prescribed codeine for postpartum pain. There is nothing unusual about this so far. Unfortunately, the woman had several copies of the gene that makes CYP2D6 and therefore she metabolized almost all the codeine, converting it to morphine. These very high morphine levels did not adversely affect her but the morphine was passed along to the baby through the woman's breast milk. When the baby was brought to the ER, 12 days after birth, he had gray skin and died the next day. Ultimately the concentration of morphine in the baby's blood was discovered to be 30 times higher than anticipated [49].
We now understand that there are important differences in genetic makeup among ethnicities [65] [76] [81] [83]-[95]. Among cigarette smokers, African Americans and Native Hawaiians are more susceptible to lung cancer than whites, Japanese Americans, and Latinos [85]. Acute lymphoblastic leukemia (ALL) is a common childhood cancer and affects ethnicities in varying frequencies. Hispanic children are more likely to contract ALL than Caucasian or African children. Variants in four genes, ARID5B, IKZF1, CEBPE, and CDKN2A/2B, have been identified that appear to be responsible for some cases of ALL and more than 5 copies of these genes are associated with ALL [297]. (The four genes can be inherited from both parents thus a total of eight copies of the variants are possible.) A recent study [298] revealed that: "African American women coinfected with human immunodeficiency virus (HIV) and hepatitis C virus (HCV) are less likely to die from liver disease than Caucasian or Hispanic women." Differences exist between the sexes [71]-[81], and even between monozygotic twins [96]-[114]. Individuals of the same species or strain also differ in ways that affect disease and drug response [61] [190] [201] [240] [299]-[316].
Diseases have even been differentiated because of genetics. By studying tissues from human cancer patients, researchers discovered that stomach cancer is actually two different diseases and response to therapy depends on the genome of the cancer [59]. An individual's cancer is not one cancer but many, varying in genetics [317]. This is important, but perhaps even more important is that some of the cancer cells have genes turned on such that they can implant into other tissues more easily. A study examined circulating tumor cells (CTCs), cells that are circulating in the bloodstream and that were derived from the original cancer. The study revealed marked genetic variation among the cancer cells [62]. This in part explains drug-resistant cancer, why patients respond so differently to treatments, and why different treatments may be needed in the same patient. It also, again, supports the notion that animal models are never going to be predictive modalities for human response to drugs and disease. This has major implications for treatment. Pharmacogenomics seeks to match a drug to the patient with the genotype that maximizes effects and minimizes side effects [48] [49] [52] [57] [58] [64]- [70]. See Figure 5 [70]. The current emphasis on personalized medicine makes clear the need for drug development to go from the blockbuster to the niche-buster (see Figure 6 [318]). Drug testing needs to be personalized and, as I have suggested [14], this could follow the current format of microdosing but be expanded to include pharmacodynamics.

(Table fragment: ryanodine receptor, malignant hyperthermia; SLCO1B1, myopathies/rhabdomyolysis.)

TSMT and Bayes' Theorem
Bailey et al. state: Our results therefore have important implications for the value of the dog in predicting human toxicity, and suggest that alternative methods are urgently required.... We have, for the first time, addressed the salient question of contribution of evidential weight for or against the toxicity of a given compound in humans by data from dog tests, by using the appropriate metrics of LRs. Furthermore, we have applied the apposite LRs to a dataset of unprecedented scale, to critically question the value of the use of the dog as a preclinical species in the testing of new pharmaceuticals.
If one has all the relevant data, LRs are better than PPV alone for calculating the accuracy of diagnostic tests. However, a search of PubMed for "likelihood ratio" returns 6298 hits, but searching for "positive predictive value," I obtained 25,128 hits. PPV has been successfully used historically and is currently being successfully used in many areas. As Bailey et al. mention, the exact question one is asking must be considered when choosing statistical methods and animal testing is not a diagnostic test. I would add that evidence from other fields (consilience) is important in determining the question as well as framing the answer. Moreover, when the predictive value has been calculated in the form of PPV and found to be as low as it is for animal modeling, it is effectively impossible that LRs will contradict this. LRs have an important place in science and medicine but in light of TSMT, expert opinions, and the data mentioned in previous sections, the Bailey et al. paper offers nothing new. It supports TSMT, the opinions of experts, and the previous data, and for that alone the paper has value, but the statement that "The analysis presented here is urgently required," is a gross exaggeration.
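To make the comparison between PPV and LRs concrete, the following is a minimal sketch of how both are computed from the same binary classification table. The class name, function names, and counts are hypothetical and chosen only for illustration; they are not data from Bailey et al. or from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 table: animal-test result versus human outcome."""
    tp: int  # test positive, human outcome positive
    fp: int  # test positive, human outcome negative
    fn: int  # test negative, human outcome positive
    tn: int  # test negative, human outcome negative


def sensitivity(t: TwoByTwo) -> float:
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    return t.tn / (t.tn + t.fp)


def ppv(t: TwoByTwo) -> float:
    return t.tp / (t.tp + t.fp)


def npv(t: TwoByTwo) -> float:
    return t.tn / (t.tn + t.fn)


def positive_lr(t: TwoByTwo) -> float:
    # LR+ = sensitivity / (1 - specificity)
    return sensitivity(t) / (1.0 - specificity(t))


def negative_lr(t: TwoByTwo) -> float:
    # LR- = (1 - sensitivity) / specificity
    return (1.0 - sensitivity(t)) / specificity(t)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = TwoByTwo(tp=30, fp=70, fn=20, tn=80)
    for name, fn in [("sensitivity", sensitivity), ("specificity", specificity),
                     ("PPV", ppv), ("NPV", npv),
                     ("LR+", positive_lr), ("LR-", negative_lr)]:
        print(f"{name}: {fn(example):.3f}")
```

Note that sensitivity and PPV need only three of the four cells, whereas specificity, NPV, and both LRs also require the true-negative count. The sketch does not guard against empty rows or columns, which would cause a division by zero.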
TSMT is a theory that explains why animal models fail. Empiricism is an important aspect of science. But we do not have LRs for jumping out of an airplane with a parachute vs. without a parachute, for treating severe bacterial infections with anti-bacterials vs. without anti-bacterials, or for whether a specific living organism is the result of special creation or evolution. LRs are a better way to determine the value of a diagnostic test but not everything requires LRs. If one has a law or theory, then using LRs to judge a particular instance covered under the law or theory is not necessary. We do not evaluate every species to see if it was placed here by special creation or if it evolved.
No research is perfect, the paper describing the research is not perfect, and rarely are perfect statistical methods used or even available to analyze the data. LRs are better than solely using PPV to analyze diagnostic tests. Evidence-Based Medicine (EBM) is one reason LRs are popular and EBM also employs systematic reviews and meta-analysis. But even systematic reviews and meta-analyses can arrive at the wrong answer. For example, a meta-analysis by the Cochrane Group reported that albumin increased deaths in certain patient groups [319]. However, a large study in Australia later revealed no such effects [320].
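One reason, among many others, that scientific studies are later shown wrong or misleading is that the scientists do not take into account prior probability (or prior plausibility) [321]- [324]. What this means is that scientists should be using Bayesian analysis whenever possible [322] [325]- [327]. Bayes' theorem is as follows.

$$P(h \mid e.b) = \frac{P(h \mid b) \times P(e \mid h.b)}{\left[P(h \mid b) \times P(e \mid h.b)\right] + \left[P(\sim h \mid b) \times P(e \mid \sim h.b)\right]}$$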
Carrier defines the terms in Bayes' theorem as follows:
P = Probability (epistemic probability = the probability that something stated is true).
h = hypothesis being tested.
~h = all other hypotheses that could explain the same evidence (if h is false).
e = all the evidence directly relevant to the truth of h (e includes both what is observed and what is not observed).
b = total background knowledge (all available personal and human knowledge about anything and everything, from physics to history).
P(h|e.b) = the probability that a hypothesis (h) is true given all the available evidence (e) and all our background knowledge (b).
P(h|b) = the prior probability that h is true = the probability that our hypothesis would be true given only our background knowledge (i.e. if we knew nothing about e).
P(e|h.b) = the consequent probability of the evidence (given h and b) = the probability that all the evidence we have would exist (or something comparable to it would exist) if the hypothesis (and background knowledge) is true.
P(~h|b) = 1 - P(h|b) = the prior probability that h is false = the sum of the prior probabilities of all alternative explanations of the same evidence (e.g. if there is only one viable alternative, this means the prior probability of all other theories is vanishingly small, i.e. substantially less than 1%, so that P(~h|b) is the prior probability of the one viable competing hypothesis; if there are many viable competing hypotheses, they can be subsumed under one group category (~h), or treated independently by expanding the equation, e.
P(e|~h.b) = the consequent probability of the evidence if b is true but h is false = the probability that all the evidence we have would exist (or something comparable to it would exist) if the hypothesis we are testing is false, but all our background knowledge is still true. This also equals the posterior probability of the evidence if some hypothesis other than h is true, and if there is more than one viable contender, you can include each competing hypothesis independently (per above) or subsume them all under one group category (~h) [328].
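The role of the prior and the two consequent probabilities can be illustrated with a short sketch. It assumes the standard two-hypothesis form of Bayes' theorem that the definitions above describe, P(h|e.b) = P(h|b)P(e|h.b) / [P(h|b)P(e|h.b) + P(~h|b)P(e|~h.b)]. The function name and all numerical inputs are hypothetical; they are not estimates taken from the literature on animal models.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(h|e.b) for a hypothesis h and its complement ~h.

    prior_h          -- P(h|b), prior probability of h
    p_e_given_h      -- P(e|h.b), consequent probability of the evidence if h is true
    p_e_given_not_h  -- P(e|~h.b), consequent probability of the evidence if h is false
    """
    for p in (prior_h, p_e_given_h, p_e_given_not_h):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    prior_not_h = 1.0 - prior_h              # P(~h|b) = 1 - P(h|b)
    numerator = prior_h * p_e_given_h
    denominator = numerator + prior_not_h * p_e_given_not_h
    if denominator == 0.0:
        raise ValueError("the evidence has zero probability under both h and ~h")
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative inputs only: an even prior, with evidence that is
    # nine times more expected if h is false than if h is true.
    print(posterior(prior_h=0.5, p_e_given_h=0.1, p_e_given_not_h=0.9))
```

When the evidence is equally probable under h and ~h, the posterior equals the prior, which is one way of seeing why both consequent probabilities, and not only the data favorable to a hypothesis, must be considered.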
In developing TSMT, as described above, I attempted to take into account the variables in Bayes' theorem. Bailey et al. have increased the amount of evidence available and for that they are to be commended. But Bayes' theorem has many other factors to consider. Explaining the following from animal modeling, if the opposite hypothesis is true (that animal models are of predictive value), is impossible at this time:
• Approximately 100 HIV-like vaccines have been efficacious in animals but none in humans [329].
• Hundreds, and possibly over one thousand, neuroprotectant drugs have been efficacious in animals but none in humans [149] [152] [188] [330]- [332].
• The fact that humans respond so differently to the same drugs and diseases and have different disease susceptibilities. (See Section 3.3 for references.)
Moreover, the total background knowledge, in the form of complexity and evolutionary biology, is sufficient to abandon animal-based testing and research designed to take advantage of the supposed predictive value of animal models.
As more segments of the scientific community analyze TSMT, more supporting evidence should appear from disciplines such as evolutionary and developmental biology, comparative anatomy, comparative medicine, and mathematics. This may have implications for personalized medicine as well as other apparently disparate fields of science.

Conclusions and Ethical Implications
Bailey et al. are to be complimented for introducing more data regarding dog models of toxicity. As I state, the study supports the current literature with an evaluation that is arguably better than has ever been published. But the study does not break new ground conceptually and the claim by the authors that it does, reinforced by the tone of the paper along with the accompanying graph, suggests the authors had an agenda when commenting on previous studies. Moreover, animal tests are not diagnostic tests, hence different rules apply for evaluating how useful PPV is under the circumstances. The fact that Bailey et al. ignored empirical evidence, theory, and the opinions of experts was also concerning. Ioannidis states in his article "Why Most Published Research Findings Are False" that "The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true" [324]. I would extend this concept to the comments in a research article that are unrelated to the research itself.
There has been sufficient empirical evidence to abandon the use of all animals in testing and research since the middle of the 20th century; the exact date is immaterial. Furthermore, prior to November of 2013, there was also a theory in the form of evolutionary biology and complex systems. TSMT was published online in June 2013 but the basis for it had been known for a very long time. Additionally, there were concerns from the philosophy of science community, in terms of modeling, that dated back to the early 1990s [336]- [338] and concerns from the evolutionary biology community dating back to the 1940s [31].
There are ethical concerns regarding the use of animals in research and testing both from the animals' perspective as well as from the perspective of humans. The animal costs are obvious: suffering and death. But the human-based concerns are the same. Patients suffer and die because of animal-based research and testing. This occurs in three ways: First, in the form of animal models offering no predictive value for human responses and consequently patients taking medications that are ineffective and/or harmful. Second, the money spent on animal-based research and testing could have been put to more dependable and thus useful research or testing methods. The most reliable estimate for the percentage of funding that goes to animal-based research is from the NIH in 1985 which calculated that around 50% went to fund animal models [339]. Relatedly, in 1964, John R. Platt wrote the classic paper Strong Inference [340]. In it, Platt anticipated some of the points I have presented in this article: "We speak piously of taking measurements and making small studies that will 'add another brick to the temple of science.' Most such bricks just lie around the brickyard" [340].
Third, in contrast to NIH funding, the cost of animal testing in drug development is minimal. The real cost comes in the form of bad drugs that make it through clinical trials only to be pulled from the market or shelved prior to going to market [216] [228] [229] [341] [342]. There is also the cost of good drugs that are lost because of animal testing [159] [166] [184] [197] [223] [343]- [345]. This costs patients in that an otherwise effective treatment is not forthcoming, hence more suffering and death. The US National Cancer Institute acknowledges that society may have lost cures for cancer because of misleading animal studies [166]. Moreover, the drugs that do eventually go to market are more expensive for patients because those drugs must cover the cost of developing the failed ones. It also costs pharmaceutical companies profits that would have, at least partially, gone back into research.
Considering the above, suggesting, as Bailey et al. do, that only recently has enough evidence existed to abandon the practice of using animals in general and dogs in particular in toxicity testing is not only scientifically unsustainable but unethical.
The reasons the animal model continues to be accepted by society in general are not unique to this situation, and can be explained by the following:
1) Animal use is now entrenched in society and it is very difficult to change traditions.
2) Along the same lines, animal use is ingrained in institutions of higher learning. Promotions and salaries are frequently tied to factors related to animal-based research and the hierarchy of power is related to the money that animal-based research generates.
3) Many scientists have dedicated their careers to animal-based research and have an emotional interest in the process.
4) Billions of US dollars are spent annually on animal-based research and testing. This money generates special interest groups that have power in the political system.
5) Society in general is not knowledgeable enough in science or medical science to discover the flaws of animal models.
6) Any discussion of the ethics of using animals in research and testing usually revolves around the pain and suffering of animals. The human implications need to be included in these conversations.
Biomedical research needs to fully embrace evolutionary biology and complexity theory and move beyond the vestiges of a creation-based research program. TSMT is one step in this process.
These conclusions should be communicated to society as: 1) society has ethical concerns regarding animal-based research; 2) these conclusions have important implications in light of what type of research is currently being funded (animal-based) and what is not (clinical research and research leading to better technology); 3) what is funded influences which disciplines young scientists consider for their careers; 4) the legal requirements for animal testing must be changed as they impede progress and drive up costs without providing a safer drug supply as the US Congress mandated.

Figure 1. Small changes in a variable, three places beyond the decimal point, in Lorenz' computer program produced very different results (red line) from the original (black line). (Graph is not the original but a likeness by the author.)

time [6] [10][19] [20] [117][119] [120][125] [127]-[130] [132] [135] [136] [138] [139] [157]-[250]. This stands in sharp contrast to scientists, spokespeople, and politicians who have a vested interest, directly or indirectly, be it financial or emotional, in animal modeling. Consider the following examples from scientists representing the consensus. In a 2009 article, Markou et al. state: Despite great advances in basic neuroscience knowledge, the improved understanding of brain functioning

Figure 2. Comparison of oral bioavailability among three species. Data from reference [160], graph by author.

Figure 4. "Scatter plot illustrating the lack of correlation of PPVs and PLRs of biomedical observations (BMOs) and tissue effects in humans and dogs" [32].

Figure 5. Diseases are composed of different effects hence various drugs are required for treating the appropriate effect [318].

Figure 6. Pharmacogenetics seeks to predict drug response in the individual [70].

Table 2. Binary classification and formulas for calculating predictive values of modalities such as animal-based research.

Bailey et al. Study Rank in Importance for Proving Animal Models Have No Predictive Value?
Bailey et al. sought to "estimate the evidential weight provided by canine data to the probability that a new drug may be toxic to humans..." In order to accomplish this, Bailey et al. "calculated Likelihood Ratios (LRs) for an extensive dataset of 2366 drugs with both animal and human data..." Bailey et al. appear to be making three claims:
1) Their paper establishes the lack of predictive value for canine models as used in toxicity testing in drug development for the first time.
2) Positive Predictive Value (PPV) cannot be used to assess a model or test's real predictive value.
3) Similar studies are needed for each test and research project in which animals are used.
Consider the following statements from Bailey et al.: