Translation in Data Mining to Advance Personalized Medicine for Health Equity

Personalized medicine is the development of "tailored" therapies that combine traditional medical approaches with the patient's unique genetic profile and the environmental basis of the disease. These individualized strategies encompass disease prevention and diagnosis, as well as treatment. Today's healthcare workforce is faced with massive amounts of patient- and disease-related data. When mined effectively, these data will help produce more efficient and effective diagnoses and treatment, leading to better prognoses for patients at both the individual and population level. Designing preventive and therapeutic interventions for those patients who will benefit most, while minimizing side effects and controlling healthcare costs, requires bringing diverse data sources together in an analytic paradigm. The development and application of personalized medicine is largely facilitated, perhaps even driven, by the analysis of "big data". For example, the availability of clinical data warehouses is a significant resource for clinicians in practicing personalized medicine. These "big data" repositories can be queried by clinicians, using specific questions, with data used to gain an understanding of challenges in patient care and treatment. Health informaticians are critical partners in data analytics, including the use of technological infrastructures and predictive data mining strategies to access data from multiple sources, assisting clinicians' interpretation of data and development of personalized, targeted therapy recommendations.
In this paper, we look at the concept of personalized medicine, offering perspectives on four important, influential topics: 1) the availability of "big data" and the role of biomedical informatics in personalized medicine, 2) the need for interdisciplinary teams in the development and evaluation of personalized therapeutic approaches, and 3) the impact of electronic medical record systems and clinical data warehouses on the field of personalized medicine. In closing, we present our fourth perspective, an overview of some of the ethical concerns related to personalized medicine and health equity.


Introduction
In 2015, US President Barack Obama announced the Precision Medicine Initiative to focus attention and resources on the development of innovative technologies and infrastructure designed to improve individual and societal health and the treatment of disease [1]. The President's Council of Advisors on Science and Technology (PCAST) defines personalized medicine (also known as precision medicine) as the ability to tailor "medical treatment to the individual characteristics of the patient" [2]. Personalized medicine and precision medicine are terms that are often used interchangeably. There is no consensus on which term is more accurate, although the US National Research Council uses the term precision medicine. Some researchers prefer this term because they feel it more accurately represents the fact that therapies are recommended based on an individual patient's characteristics as related to a similar population, rather than constructed solely on the basis of individual characteristics [3]. The basis of personalized medicine is delivering information to the physician on the genetic, environmental, and lifestyle differences of an individual patient. The objective is to more effectively target the patient's physiological and pharmacological responses to a specific treatment. Personalized medicine enables physicians and other care providers to treat individuals as unique, using observations from the individual patient as well as harnessing the power of aggregated data drawn from millions of other patients [4]. In so doing, "preventive or therapeutic interventions can then be concentrated on those who will benefit, sparing expense and side effects for those who will not" [2]. This ideal requires bringing together diverse data sources into an analytic paradigm.
Data used in creating a personalized medicine treatment plan are already routinely collected in day-to-day healthcare settings, for example, primary and specialty care offices, laboratories, imaging centers, pharmacies, and hospitals. Data can include patient sociodemographic and environmental information, family histories, personal monitoring device feedback, images and imaging results, phenotypes and genotypes, molecular and cellular data, disease traits, and treatment responses. Other healthcare and health-related data that are increasingly available include clinical trial data, claims data from health insurance providers, and research study data. Data, such as those available in electronic medical records (EMRs), will be in both structured and unstructured formats [4] [5]. Data from large projects, such as the Human Genome Project, have been used to highlight the importance of single nucleotide polymorphisms in human genetic variability and their association with disease and their role in understanding complex disease etiology [2] [6]. In short, there is now a superabundance of data which may be mined and analyzed to identify and support better clinical decisions as well as new treatment discoveries.
The principal difference in providing good clinical care today and into the future depends not only on mining "big data" but also on the "speed of change": how quickly these changes can be implemented and how quickly these data (dynamic, static, structured, and unstructured) can be put to use. Implementation will require a complex and advanced technological infrastructure that will allow data from many different sources and diagnostic disciplines to be integrated and presented to practitioners in ways that facilitate the interpretation of data and the development of personalized interventions [7]. For example, personalized medicine reports may provide physicians with suggestions for pharmacological or biological interventions, for targeting specific cells, or for selecting interactive patient education that enables patients and providers to develop customized treatment plans for disease management [8]. The development of systems that provide targeted outputs must draw on a variety of tools, including biologic, technical, and computational strategies. These leveraged approaches support a greater understanding of the complexity and interrelated nature of disease, environment, and the person [4].

Perspectives: Bioinformatics and "Big Data"
These data, collectively recognized as "big data", are keys that can unlock doors to the potential of personalized medicine. These large, complex datasets will house structured data (e.g., numerical data for height, weight, blood pressure, glucose levels; categorical data such as yes/no for marital status; etc.) as well as unstructured data (e.g., physician notes). The systems that house them will need to be able to effectively aggregate and normalize data from across data sources (e.g., genomic data, patient data from EMRs, radiologic images, etc.). Systems such as those built on the "informatics for integrating biology and the bedside (i2b2)" framework are particularly useful and effective for integrating clinical and research data that may be used by investigators [9]. These clinical data repositories (i.e., databases) are often de-identified, and in many cases anonymized, allowing researchers to generate hypotheses and identify cohorts for study. These "big data" repositories can be queried by clinicians, using specific questions, with data used to gain an understanding of challenges in patient care and treatment.
The systematic mining of data in research has already been shown to benefit individuals and populations in a variety of diseases. In particular, cancer research has benefited greatly from the use of aggregated data to help assess which individuals may be more prone to developing cancers and which types of cancers could affect them. For example, research in breast and ovarian cancer led to the discovery of mutations in BRCA1 and BRCA2 which predispose women to these cancers [2]. Based on family and personal histories, physicians may now order testing for BRCA1 and/or BRCA2 for their patients and, based on the results, assist them in making decisions to mitigate their risk of developing breast or ovarian cancer [2]. Medications have also been developed using "big data," from chemotherapy designed to target specific cancers to pharmacogenetic dosing algorithms for warfarin (an anticoagulant used to prevent heart attacks, strokes, and blood clots) [5]. There have also been strides in personalizing treatment for the onset of type 2 diabetes in children and young adults using genetic analyses. "Specific genetic variants that affect pancreatic beta cell function" have been discovered and are used to "add clarity to diagnosis" and treatment of this condition [2].
All these data have made pharmacogenomics the area of molecular diagnostics that has expanded the most with system-guided support. Such evidence has reduced the incidence of adverse events by checking for genotypes that are susceptible to particular drugs [10]. DNA testing for drug biomarkers is one area of data mining with further growth potential because of its pharmacogenetic applications in drug selection, drug dosing, and drug interactions. For example, the following inherited determinants of drug response have helped significantly in the treatment of disease: 1) Drug Metabolizing Enzymes: Genotyping assays for allelic variants affecting CYP450 enzyme activity help to distinguish Poor Metabolizers (PMs) from Extensive Metabolizers. One of the first significant examples involved CYP2D6: PMs showed a 10-fold higher exposure and a 5-fold higher peak concentration for a given dose of atomoxetine HCl (Strattera), a norepinephrine (NE) reuptake inhibitor and the first FDA-approved non-stimulant drug for attention deficit hyperactivity disorder, compared with Extensive Metabolizers [11]; 2) Drug Targets: Specific disease-associated genes, allelic variants, or gene products. A very well-known example is the HER-2/neu oncogene and Herceptin for breast cancer [12]; 3) Transporters: ATP-binding proteins. An example is P-glycoprotein (responsible gene PGY-1), which blocks absorption in the gut, acts as a gatekeeper for later cytochrome P450 (CYP3A4) actions, and is involved in multidrug resistance in tumors [13]; and 4) DNA Processes: for example, methylation patterns on DNA, where methylation of the methyl guanine methyl transferase (MGMT) gene promoter alters response to treatment with alkylating agents [14].
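The genotype check described above can be illustrated with a minimal sketch. The lookup table, the phenotype labels, and the function name are assumptions made for illustration; the single table entry simply mirrors the CYP2D6 and atomoxetine example in the text and is not clinical guidance. A real system would draw on curated pharmacogenomic guidelines rather than a hand-written table.

```python
# Hypothetical lookup: (gene, metabolizer phenotype) -> drugs to flag for review.
# The one entry mirrors the CYP2D6/atomoxetine example in the text.
# Illustrative only; not clinical guidance.
REVIEW_TABLE = {
    ("CYP2D6", "poor"): {"atomoxetine"},
}


def drugs_needing_review(phenotypes, prescribed):
    """Return the prescribed drugs that should be reviewed for this patient.

    phenotypes: dict mapping gene name -> metabolizer phenotype (e.g. "poor")
    prescribed: iterable of drug names as written on the order
    """
    flagged = set()
    for gene, phenotype in phenotypes.items():
        # Collect every drug associated with this gene/phenotype pair.
        flagged |= REVIEW_TABLE.get((gene, phenotype), set())
    # Compare case-insensitively, but report names as they were prescribed.
    return sorted(d for d in set(prescribed) if d.lower() in flagged)
```

Under these assumptions, calling `drugs_needing_review({"CYP2D6": "poor"}, ["Atomoxetine", "warfarin"])` flags only the atomoxetine order, while the same prescriptions for an extensive metabolizer flag nothing.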
An illustrative use of clinical data repositories to advance personalized medicine is often seen in the area of cancer research, such as Kathy Halamka's case reported by Strickland [15]. Diagnosed at the age of 49 with Stage III breast cancer, Kathy was faced with the conservative, standard-of-care protocol of mastectomy followed by chemotherapy. Because she was a patient of Beth Israel Deaconess Medical Center, Kathy's medical team was able to use the i2b2 platform to query a clinical data repository comprising data from the EMRs of five Harvard-affiliated hospitals. By searching for patients with certain characteristics (e.g., 50-year-old Asian females, stage III breast cancer, medications, outcomes), the medical team was able to identify variables which indicated that a different treatment protocol was more appropriate for Kathy. Choosing not to have surgery as a first treatment step, Kathy began with chemotherapy drugs that targeted the estrogen-sensitive tumor cells. By the completion of chemotherapy, the tumor was no longer visible on radiographs. Her treatment ended with a lumpectomy and a continuation of estrogen-blocking medication.
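The kind of cohort query described in this case can be sketched in a few lines. This is not the i2b2 query interface; it is a plain-Python illustration in which the record fields, the sample records, and both function names are invented. It selects de-identified patients who resemble the index patient and then compares response rates by first treatment.

```python
from collections import defaultdict

# Hypothetical de-identified records; every field name and value is invented.
RECORDS = [
    {"age": 48, "sex": "F", "stage": "III", "first_treatment": "surgery", "responded": True},
    {"age": 51, "sex": "F", "stage": "III", "first_treatment": "chemotherapy", "responded": True},
    {"age": 50, "sex": "F", "stage": "III", "first_treatment": "chemotherapy", "responded": True},
    {"age": 52, "sex": "F", "stage": "III", "first_treatment": "surgery", "responded": False},
    {"age": 67, "sex": "F", "stage": "I", "first_treatment": "surgery", "responded": True},
]


def select_cohort(records, min_age, max_age, sex, stage):
    """Keep only the records that match the characteristics of interest."""
    return [
        r for r in records
        if min_age <= r["age"] <= max_age and r["sex"] == sex and r["stage"] == stage
    ]


def response_rate_by_treatment(cohort):
    """Return the fraction of responders for each first-line treatment."""
    counts = defaultdict(lambda: [0, 0])  # treatment -> [responders, total]
    for r in cohort:
        counts[r["first_treatment"]][1] += 1
        if r["responded"]:
            counts[r["first_treatment"]][0] += 1
    return {t: responders / total for t, (responders, total) in counts.items()}


cohort = select_cohort(RECORDS, min_age=45, max_age=55, sex="F", stage="III")
rates = response_rate_by_treatment(cohort)
```

Further characteristics from the case, such as ethnicity or medications, would be added as extra filter conditions. A summary like this is hypothesis-generating: it points the care team toward options worth discussing and does not replace clinical judgment.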

A New Paradigm in Medical and Health Professions Education: Team Science in Biomedical Informatics
In addition to the "grand challenge" of developing a systems approach in data mining to integrating "multiscale biological information into predictive and actionable models" [4], there are many other challenges to the advancement of personalized medicine. For example, it has been suggested that training for physicians should shift "away from the current discipline-specific model to a vertically integrated nodes-and-connections framework" to give future physicians a more holistic view of biological processes [2] [4]. The sheer volume of information, and its rapid growth, will require such changes, and new clinical decision support tools must be developed to support physicians in their work [8] [16]. Further educational outreach will be necessary for all principal stakeholders in the healthcare industry, including patients, healthcare organizations, pharmaceutical companies, and those who will develop research careers. These biomedical problems will best be solved by multidisciplinary "team science" groups, working through challenges using the diverse lenses of clinical, healthcare, and informatics professionals [2] [4]. The incoming health workforce has been slowly transforming the way health care is delivered, mainly through technological advances, innovative approaches, and the need to make faster and better decisions in prevention, diagnosis, prognosis, and treatment [17]. Experience has shown that a solution to a challenge is delivered faster and more reliably when many minds work together in a team with a common goal. Academic institutions have responded by providing healthcare professionals with education and training in the skills necessary to work effectively in multidisciplinary teams [18].
Delivering innovative health solutions at a faster pace, while providing the highest level of quality, requires that diverse, different and complementary health disciplines join expertise and establish clear and efficient patterns of communication [19]. There is an urgent need to integrate the experts in bioinformatics and technology (i.e., biomedical and health informaticians) with their end users (e.g., practitioners, researchers and health professionals), who will work with their data on a day-to-day basis. Communication among disciplines, by the use of technology, supports these stakeholders becoming partners for better health. These teams create a common pathway that allows them to network and understand each other's role in delivering targeted therapies.

Role of Electronic Medical Records in Translational Medicine
Translational medicine is a multidisciplinary form of science that bridges gaps that often exist between basic science, applied science, and clinical practice, ultimately leading to more meaningful health outcomes at the individual as well as the population level. "Translation" is driven by the development of diagnostic tools, data analysis tools, medicines, policies and procedures, and education. The "gold standard" for translational medicine is that new products and therapies are incorporated into clinical practice as beneficial and may become part of accepted standards of care.
However, the use of new tests, implementation of diagnostic tools, approval of new medicines, and forward-thinking development of education, policies, and procedures are limited by the speed and rigor of independent reviews for reliability and accuracy. Translational medicine has assisted in accelerating the movement of research into medical products and therapies, supported by funding for regulatory science. Westfall, Mold and Fagnan reported that it takes an average of 17 years for research evidence to reach clinical practice [20]. This lag has generally been applied to all forms of treatment, including pharmaceutical medications, therapeutic modalities, and behavior change interventions. As the burden of chronic disease continues to increase across the globe, long wait times for more effective treatments to reach clinical practice are untenable. Case in point, over the past century, advances in science and medicine have led to a dramatic increase in the average life expectancy in the United States (US). A child born in the US in 1900 rarely lived beyond the age of 50; by contrast, a child born in the US in 2000 had a life expectancy of 76 years [21]. With increased life expectancy, adults are living longer with a greater burden of chronic disease and the possibility of disabilities, such as neuropathy, a comorbidity of diabetes. In this paradigm, the US can expect to continue to spend between 70% and 75% of healthcare expenditures on the treatment of chronic diseases [22]. Translational medicine can shorten the time from "bench to bedside" by focusing research on various aspects of clinical testing, such as biomarkers, new assay methods, and the role of genetics as a mediator of response to drug treatment.
As the translational medicine pathway increasingly leads to more effective and quicker delivery of products and therapies to clinical practice, clinicians have been faced with the implementation of information technology in the clinical care environment. With the adoption and implementation of electronic medical record (EMR) systems, providers are now faced with a plethora of patient information. These digital healthcare infrastructures are designed to improve how healthcare is delivered and the quality of healthcare provided, i.e., to improve patient healthcare outcomes.
The concept of electronic health (medical) records was first introduced by Lawrence L. Weed in the 1960s. A physician, Weed believed that an automated system that would reorganize and improve access to patient information housed in medical records would improve patient care. His work was the nucleus for the PROMIS project at the University of Vermont in 1967 [23]. Project objectives were to develop an automated system that would provide physicians with timely and sequential information for a patient, enabling the rapid collection of data which could be analyzed and used for endeavors ranging from epidemiological studies to business audits. During this same time period, Mayo Clinic began work on developing an electronic medical record system as well. The 1970s and 1980s saw rapid development of early models of electronic medical record systems. Early EMR development was often driven by academic and research institutions. In addition to the PROMIS system from the University of Vermont, Harvard University developed the COSTAR EMR system for ambulatory care environments and Duke University pioneered "The Medical Record" as an early in-patient care EMR.
In 1972, the Regenstrief Institute developed the Regenstrief Medical Record System, a physician-designed, integrated in-patient and out-patient information system, implemented in the Wishard Diabetes Clinic [24]. In 2004, President George W. Bush released The President's Information Technology Plan [25], which set forward a bold plan to ensure that most Americans had an electronic health record within the next decade. This national commitment to the use of health information technology was presented as an innovation that would reduce medical errors, control the rising costs of healthcare, and improve the quality of healthcare. The adoption of EMRs was stimulated by funding provided under the Health Information Technology for Economic and Clinical Health (HITECH) Act, a component of the American Recovery and Reinvestment Act of 2009 [26] under President Barack Obama. Not only did the HITECH Act promote adoption of EMRs, but it also enacted sets of "meaningful use" guidelines to help providers realize improvements in care [27]. By 2013, 94% of acute care hospitals were using EMR systems that were certified as meeting federal requirements for Meaningful Use objectives. These facilities were increasingly using EMRs that were considered comprehensive systems: those that allowed providers to capture electronic clinical information (e.g., patient demographics, physician notes, medication lists, discharge summaries, etc.); supported computerized provider order entry (CPOE) (e.g., lab reports, medications, radiology tests, nursing orders, etc.); supported results management (e.g., viewing of lab and radiology reports, radiology images, diagnostic images and results, etc.); and provided decision support (e.g., clinical guidelines, clinical reminders, drug allergy results, drug-drug interactions, drug dosing support, etc.) [28].
Each EMR system, in essence, is a small data repository of those patients receiving care from providers operating under a given healthcare system (i.e., hospital, clinic, etc.). Alone, these data repositories offer clinicians important insights into patient care, opening the door to personalized medicine. When combined with other data sources, resulting in large, "big data" repositories, the opportunity to further personalized medicine is increased exponentially.

Personalized Medicine and Health Equity
Hamburg and Collins commented that the investments made in basic science, particularly the Human Genome Project, laid the foundation for translational medicine and opened the door to the development of personalized medicine [29]. With the complete mapping of the human genome, researchers have identified genetic variations that contribute to disease. In turn, scientists have actively sought and continue to develop diagnostic tests and treatment modalities that improve an individual patient's response to therapy targeted toward their unique genetic profile. In this way, translational medicine can also provide a pathway to improve health equity across traditional barriers such as socioeconomic status, race/ethnicity, sex/gender, and geographical location [30].
Yet personalized medicine and its impact on health equity are not without concerns. Health disparities have often been linked to disparities in education, race, and income. In 2002, the Institute of Medicine (IOM) Committee on Understanding and Eliminating Racial and Ethnic Disparities in Health Care released its report on the quality chasm that existed between racial and ethnic minorities and the quality of healthcare they received [31]. Advocates of personalized medicine say that the healthcare system will realize lower total costs of care as a result of the efficiency in clinical care that personalized therapies bring. These savings could then be used for the care of uninsured and/or low-income patients.
Yet there are concerns that the benefits of personalized medicine will go disproportionately to patients who have higher incomes, and who are often not members of racial and/or ethnic minority groups, thus widening the health disparities healthcare now struggles with. Three strategies have been proposed to proactively optimize the benefits of personalized medicine and lessen the potential worsening of health disparities [32]. The foundation of these strategies is that health equity be included as a cornerstone in the development of personalized therapeutics. A brief summary of the strategies proposed by Ward [32] is: 1) Data must be collected in a systematic way on access to and use of personalized therapies, including how use varies among patients of different socioeconomic status. This would include patients who are eligible for these therapies but are not yet users. A wide variety of data sources should be included, such as existing databases and registries, including those maintained at the state and federal level, as well as partners such as pharmaceutical companies and patient advocacy groups. Access to these multiple sources of data will require the development of public-private partnerships. These partnerships will need not only to support the extraction and translation of data out of information systems for use in developing personalized therapies, but also to provide feedback loops that inform providers about utilization of these therapies and their success.
2) Dissemination models for developed therapies should include outreach to patients who are not likely to have access to medications and healthcare, e.g., uninsured patients. Strategies should serve to open access to patients who have been historically disadvantaged in access to healthcare and medical treatment without putting previously non-disadvantaged groups at risk for reduced access to personalized treatments.
3) Policies should foster the equitable development of, and access to, medications, therapies, and devices that would be used in personalized medicine. Ward proposes that linking these developments to reimbursement and patent protections may establish a more level playing field for developers such as pharmaceutical companies and may help ensure that all patients are eligible for personalized medicine approaches.
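The first strategy above, systematic data collection on how use of personalized therapies varies by socioeconomic status among eligible patients, can be sketched as a simple utilization rate per group. The field names (`ses_group`, `eligible`, `received_therapy`) and the function name are hypothetical and are not taken from Ward's proposal or from any existing registry.

```python
def utilization_by_group(patients, group_key="ses_group"):
    """Return the share of eligible patients in each group who received therapy.

    patients: iterable of dicts with hypothetical fields
        group_key (e.g. "ses_group"), "eligible", and "received_therapy".
    Ineligible patients are excluded, so the denominator includes patients
    who are eligible but are not yet users.
    """
    eligible = {}
    users = {}
    for p in patients:
        if not p["eligible"]:
            continue
        group = p[group_key]
        eligible[group] = eligible.get(group, 0) + 1
        if p["received_therapy"]:
            users[group] = users.get(group, 0) + 1
    return {group: users.get(group, 0) / n for group, n in eligible.items()}
```

A persistent gap between groups in this rate is the kind of signal the proposed feedback loops would return to providers and partners.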

Conclusions
Through this review, we aim to provide an overview of personalized medicine, offering perspectives on four important, influential topics: 1) the availability of "big data" and the role of biomedical informatics in personalized medicine, 2) the need for interdisciplinary teams in the development and evaluation of personalized therapeutic approaches, 3) the impact of electronic medical record systems and clinical data warehouses on the field of personalized medicine, and 4) strategies that will increase the impact of personalized medicine as a tool to promote health equity. Ongoing challenges to the translation of biomedical information into applied clinical practice of personalized medicine include reclassification of disease states, scientific challenges such as determining which genetic markers have the most clinical significance, and policy regulation of genetic testing. Health informaticians are critical partners in overcoming these challenges, working with clinicians and researchers to identify and implement data analytics, including the development of predictive data mining strategies, to enable the use of clinical data repositories for personalized therapies. Some have expressed concerns that personalized medicine may more significantly benefit those patients who have greater access to care, further deepening existing health disparities and health inequities. However, there is a significant opportunity for personalized medicine to increase treatment accuracy and decrease social and scientific bias.