A Case-Based Reasoning System for Aiding Physicians in Decision Making

Physicians gather a vast amount of information about patients’ medical procedures, treatments, insurance coverage, and other clinical data. Such information is crucial in formulating diagnoses or treatment plans for patients with similar traits. A Case-Based Reasoning (CBR) system has been developed to address the effective organization and retrieval of vital patient information to aid physicians in making decisions. Integers are used to uniquely represent various medical procedures, treatments, etc. In this research, a new algorithm is presented to retrieve suitable cases to recommend to physicians. The system is tested in a simulated environment, and the results indicate that the system can adapt to changes, such as new medical procedures or treatments, that take place in the medical field.


Introduction
Treatment facilities accumulate an enormous amount of patient data every single day. Such data includes but is not limited to 1) the nature of illnesses, 2) medical procedures patients underwent, 3) insurance coverage the patients had at various times, 4) time gaps between various events such as the time gap between visits to physicians, and 5) other details such as age, gender, date, and race. The utilization of this information is vital in aiding physicians in determining a successful diagnosis or treatment of patients with similar circumstances. Also, the type of insurance coverage a patient has determines which treatment procedures are permitted. When treating a new patient for an illness, the experience in treating a similar patient with the same or similar insurance coverage plays a vital role in deciding the best or allowable treatment plan. Similarly, the ethnicity of a patient is important in the diagnosis or treatment because some genetic conditions are common among ethnic groups [1]. Therefore, the collection of such information from various patients is vital in the development of a system that can aid the physicians in diagnosing or treating the incoming patients. The storage and retrieval capability of the traditional database structure [2] is not effective for this purpose.
Artificial Intelligence (AI) deals with the development of systems that mimic the behavior of humans, and it can play a vital role in aiding physicians in decision making. Case-Based Reasoning (CBR) [3], a sub-field of AI, deals with using past cases in solving similar problems. In CBR, previous cases (i.e., solutions to previous problems) are organized in the system, with suitable indexes, for possible future retrieval and usage. For a given new problem, the system checks existing cases and selects a suitable case (or parts of cases) to adapt. The adapted solution will be modified to match the requirements of the new problem. The modified solution will be tested and, if necessary, revised to make sure it is successfully solving the new problem at hand. The confirmed solution will be stored, with suitable indexes, in the case-base. This way, a CBR system improves its knowledge over time. The CBR cycle can be depicted as shown in Figure 1. CBR has been successfully applied to solving problems in many fields including law [4], design [5], e-commerce [6], and agriculture [7] because these fields are highly dependent on using past experiences to solve new problems.
There is no CBR system that can capture details on the symptoms, diseases, time gaps, medications, dates, insurance coverage, age, sex, and race of various patients and apply this knowledge to new patients who need medical attention. The goal of this research is to develop such a system. This research is a revision and extension of the earlier research of the primary author [8]. As explained later, the algorithm presented in this research is substantially different from the one presented in the earlier research. While considering all the features in finding a match with the current patient's scenario, the algorithm presented in this research drops (i.e., deselects) some features if no match is found with previous cases. Furthermore, in this system, physicians can also drop features that are of no use in the current scenario. The system presented in the earlier research [8] supports neither of these.

Case-Based Reasoning in Medicine
Plenty of research has been done on successfully applying CBR for the medical domain. Gierl and Stengel-Rutkowski [9] presented a CBR system for diagnosing dysmorphic syndromes. Macura and Macura [10] developed a classification system for radiology images by indexing the images according to their radiologic content. Perner [11] presented a CBR system for image segmentation that can adapt to changes in image qualities and environmental conditions. Schmidt et al. [12] used CBR for trend prognoses for the monitoring of kidney function in an intensive care unit setting. Marling and Whitehouse [13] presented a prototype that prescribed neuroleptic medicines to patients with Alzheimer's disease with behavioral problems. A CBR system for therapy support for endocrinology was presented by Vorobieva et al. [14].
CBR has been used to identify meaningful groups of people with attention-deficit hyperactivity disorder [15]. CBR has also been used to support clinical researchers in identifying inborn metabolic defects [16]. A CBR approach to diabetes management for Type 1 diabetic patients who are on insulin pump therapy was presented by Marling et al. [17]. Lin [18] used CBR and classification and regression tree techniques to increase the accuracy of liver disease diagnosis. Chuang [19] applied CBR and other machine learning techniques to enhance the efficiency of diagnosing liver disease. CBR has also been used to describe a physician's expertise, intuition, and experience while treating patients who have thyroid cancer [20]. Nasiri et al. [21] used CBR to analyze images and text from patient health records. A knowledge support system for asthma care services, using CBR, was presented by Tyagi and Singh [22]. Several research papers on CBR in health sciences were presented at the workshop on Case-Based Reasoning in the Health Sciences that was part of the Twenty-Third International Conference on Case-Based Reasoning [23]. Pesl et al. [24] used CBR to enhance insulin bolus calculators for people who have Type 1 diabetes. Lamy et al. [25] presented a CBR system for the breast cancer domain in which the system can visually present the similarities between a query and similar cases in the case-base.

Details of the System
In this research effort, the following facts are collected about each patient: symptoms; diseases (comorbid conditions); treatments (which include medical procedures and medications); age (i.e., the age of the patient at the time an event such as a symptom or treatment took place); time gap (for example, the time gap between two successive visits to the physician, or between the identification and treatment of a disease); date (i.e., the date on which an event such as a symptom or treatment took place); insurance coverage (i.e., the insurance coverage at the time an event such as a treatment took place); gender; race; and the identity of the patient.
In this system, a date is represented using 2 digits for the month, followed by 2 digits for the day, followed by 4 digits for the year. Age is represented as a decimal number from 0 to 150 with a maximum of 2 decimal digits. For example, if a person's age is 30 years and 6 months, then it is represented as 30.50. The time gap between events is also represented as a decimal number from 0 to 150, and it is calculated and represented in the same way as age. Gender is represented as 0 for male, 1 for female, and 2 for others. Symptoms are represented as integers ranging from 151 to 10,000, each integer representing a unique symptom. For example, 151 represents fever, 152 represents headache, and so on. Therefore, a total of 9850 (i.e., 10,000 minus 151, plus 1) different symptoms can be uniquely represented in this system. Diseases are represented as integers ranging from 10,001 to 100,000, thereby allowing the system to represent 90,000 different diseases. Treatments are represented as integers ranging from 100,001 to 200,000, thereby allowing the system to represent 100,000 different treatments. Insurance coverage is represented as integers ranging from 1 to 1000, where each number represents a specific coverage plan.
Each patient is identified by a unique identification number ranging from 1 to 1,000,000. A returning patient keeps the identification number that was assigned when that patient visited the physician for the very first time.
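The encoding described above can be illustrated with a minimal Python sketch. The function names (item_type, encode_date, encode_age) are hypothetical and do not come from the system itself. The sketch also assumes that a primary item can be told apart by its numeric range alone, which works because time gaps (0 to 150) and the three integer ranges do not overlap.

```python
from datetime import date


def item_type(value: float) -> str:
    """Classify a primary item by the numeric range it falls in."""
    if 0 <= value <= 150:
        return "time gap"      # decimal number of years
    if 151 <= value <= 10_000:
        return "symptom"
    if 10_001 <= value <= 100_000:
        return "disease"
    if 100_001 <= value <= 200_000:
        return "treatment"
    raise ValueError(f"{value} is outside every primary-item range")


def encode_date(d: date) -> str:
    """Two-digit month, two-digit day, four-digit year."""
    return d.strftime("%m%d%Y")


def encode_age(years: int, months: int) -> float:
    """Age in years as a decimal with at most two decimal digits."""
    return round(years + months / 12, 2)
```

For instance, encode_age(30, 6) gives 30.5, matching the example of a patient aged 30 years and 6 months. Associate items such as gender and insurance coverage reuse small integers, so in this sketch they would have to be stored in separate fields and not classified by range.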
A case (or plan) consists of a sequence of states as shown in Figure 2. In that figure, S1, S2, S3, S4, and S5 represent the states.
A "primary item" is one of the following: symptom, disease, treatment, or time gap. Each state contains one primary item or a set of primary items. For example, in Figure 3, State S3 contains one primary item, 10,002, which represents a disease. A state Si would precede another state Sj in a plan if the primary item in Si occurred before the primary item in Sj. In Figure 2, S2 precedes S3 because the primary item in S2 occurred before the primary item in S3. Note that in Figure 2, the primary items are not shown. Each state may also contain a set of primary items, instead of a single primary item. If two or more primary items of the same type occur on the same day then the set of those primary items will represent the primary item of the state created. For example, if two or more treatments are performed on the same day, the set of those different treatments will be the primary item of the state created.
If more than one type of primary items (for example, one or more symptoms and one or more treatments) are identified or performed on the same day then the states are created in the following order: the state containing symptom (or set of symptoms) precedes the state containing disease (or set of diseases), which in turn precedes the state containing treatment (or set of treatments). For example, in Figure 3, Symptoms 151 and 157 appeared on the same day, where the primary item of state S2 is the set consisting of these two symptoms. Disease 10,002 was identified on the same day, thereby causing State S3 to be created with 10,002 as its primary item. Treatments 100,001 and 100,004 were performed on the same day. Therefore, the set {100,001, 100,004} represents the primary item of the State S4 in Figure 3. As another example, Figure 4 represents the following facts: Symptoms 151 and 157 appeared on the same day; after 0.5 years, disease 10,002 was identified; after another 0.04 years, treatments 100,001 and 100,004 were performed on the same day.
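The ordering rule for same-day items can be sketched as follows. This is only an illustration under stated assumptions: events are given as hypothetical (day, type, code) tuples, the day is any sortable value, and time-gap states between different days are left out for brevity.

```python
from itertools import groupby

# Same-day ordering described above: symptoms, then diseases, then treatments.
TYPE_ORDER = {"symptom": 0, "disease": 1, "treatment": 2}


def build_states(events):
    """Group (day, item_type, code) events into an ordered list of states.

    Each state is a frozenset holding all codes of one type seen on one day.
    """
    ordered = sorted(events, key=lambda e: (e[0], TYPE_ORDER[e[1]]))
    states = []
    for _, group in groupby(ordered, key=lambda e: (e[0], e[1])):
        states.append(frozenset(code for _, _, code in group))
    return states
```

Applied to the same-day events of the Figure 3 example (symptoms 151 and 157, disease 10,002, treatments 100,001 and 100,004), the sketch yields three states in the order symptom set, disease, treatment set.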
Other items, such as age, date, race, insurance coverage, and the identity of the patient, are called "associate items" in this research. These "associate items" are present in each of the states, and they are not shown in Figures 2-4. In addition to the associate items, each state in a plan is assigned a unique integer. The successive states in a plan are assigned successive integers starting from 1 for the first state, 2 for the second state, and so on. In this system, there is exactly one plan for each patient, and the plan's length (i.e., its number of states) increases each time the patient visits the physician with a complaint. Each plan is organized as sub-plans, as in the previous research reported in [8].
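One possible in-memory layout for states and plans is sketched below. The class and field names are hypothetical. Storing gender next to the listed associate items, and encoding race as an integer, are assumptions made only for this illustration. The organization of plans into sub-plans follows [8] and is not modelled here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union

# A primary item is either a time gap (float) or a set of integer codes.
PrimaryItem = Union[float, FrozenSet[int]]


@dataclass(frozen=True)
class State:
    number: int            # 1 for the first state, 2 for the second, ...
    primary: PrimaryItem
    age: float
    date: str              # month, day, year digits, e.g. "10212019"
    gender: int            # assumption: kept with the associate items
    race: int              # assumption: integer code for race
    insurance: int
    patient_id: int


@dataclass
class Plan:
    patient_id: int
    states: List[State] = field(default_factory=list)

    def add_state(self, primary, *, age, date, gender, race, insurance):
        """Append a state, assigning it the next successive integer."""
        state = State(len(self.states) + 1, primary, age, date,
                      gender, race, insurance, self.patient_id)
        self.states.append(state)
        return state
```

Because there is exactly one plan per patient, each new visit of a returning patient would call add_state on that patient's existing plan.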
For a patient looking for medical treatment, the system checks if the patient is new. If the patient is new then the patient's complaint is entered into the system to begin a new plan for that patient. After entering, if there are at least two states in the plan for that patient, the system works as explained in the algorithm below to provide recommendations taken from similar cases in the case-base. If the patient is not new, then the current complaint will be entered into the system by adding at least one new state to the existing plan for that patient. Then the system works as explained in the algorithm below to provide recommendations taken from similar cases.

Details of the Algorithm
The algorithm starts by considering the following: 1) all states in the plan for the current patient and 2) all the primary and associate items. Initially, a variable i is set to the total number of states in the plan for the current patient. Item-Set represents the set of all primary and associate items. The items that can be dropped are provided in a list called the Priority-List. If time gap is not an element of Item-Set then the state(s) that contain a time gap will be removed from the current patient's most recent states. Another variable j is set to the total length of the resulting plan for the current patient. If j is greater than or equal to 2, that is, if there are at least two states in the plan, then the following will be done; otherwise no recommendation will be made by the system.
The system retrieves plans from its case-base and puts them in a set C, which is called the conflict set. Each plan in C must fulfill the following two requirements: 1) the length of that plan is at least (j + 1) and 2) for each of the first j states (counting from left to right, i.e., from least recent to most recent, in the plan), the primary item and those associate item(s) of the state that belong to the Item-Set match the primary item and the corresponding associate item(s) in the corresponding state of the plan for the current patient. If a primary item is a set, such as a set of symptoms, then two primary items match when the sets are equal. If an item (either primary or associate) is not a set, then two items match when all of their corresponding fields are numerically equal. If C is empty then the algorithm repeats the above process after doing the following two operations: 1) delete from the Item-Set the leftmost item of the Priority-List, then 2) remove that leftmost item from the Priority-List.
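The two requirements for membership in the conflict set can be sketched as below. This is a simplified illustration and not the system's implementation: each state is a hypothetical dictionary with a "primary" entry plus associate entries, and the parameter associate_items stands for the associate part of the Item-Set.

```python
def states_match(a, b, associate_items):
    """States match when the primary item and every selected associate
    item are equal (set equality for sets, numeric equality otherwise)."""
    if a["primary"] != b["primary"]:
        return False
    return all(a.get(name) == b.get(name) for name in associate_items)


def conflict_set(case_base, current, associate_items):
    """Return the plans whose first j states match the current plan."""
    j = len(current)
    return [
        plan for plan in case_base
        if len(plan) >= j + 1                        # requirement 1
        and all(states_match(s, c, associate_items)  # requirement 2
                for s, c in zip(plan, current))
    ]
```

Dropping an associate item from the Item-Set corresponds to removing its name from associate_items, which can only keep the conflict set the same size or enlarge it.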
If there are no more items to delete from the Priority-List and the value of i is 2 and C is empty, the system will not recommend anything. This is because there will be no sequence of states in the plan for the current patient if the value of i is further reduced. Else, the system reduces the value of i by 1 then repeats the whole process.
If C is not empty at the end of a process repetition cycle then the primary item in the next state (i.e., in the (j + 1)th state), which is not a time gap and is present in the majority of the sub-plans in C, will be recommended. If there is a tie among such primary items then the system recommends the one which is in the most recent state. If there is still a tie, then those competing primary items will be recommended as different options.
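The selection rule, including both tie-breaks, can be sketched as follows. The sketch makes several assumptions: each sub-plan is a list of hypothetical (primary item, date) pairs, a time gap is stored as a float, "majority" is read as "most frequent", and "most recent state" is decided by comparing state dates.

```python
from collections import Counter
from datetime import date


def recommend(conflict, j):
    """Pick the primary item(s) of the (j + 1)th state to recommend."""
    # Index j is the (j + 1)th state; time gaps are never recommended.
    candidates = [plan[j] for plan in conflict
                  if not isinstance(plan[j][0], float)]
    if not candidates:
        return []
    counts = Counter(primary for primary, _ in candidates)
    top = max(counts.values())
    tied = {p for p, n in counts.items() if n == top}
    if len(tied) > 1:
        # First tie-break: keep the item seen in the most recent state.
        latest = {p: max(d for q, d in candidates if q == p) for p in tied}
        newest = max(latest.values())
        tied = {p for p in tied if latest[p] == newest}
    # Any remaining ties are all returned as alternative options.
    return sorted(tied, key=sorted)
```

An empty result corresponds to the case in which every (j + 1)th state in C is a time gap, so nothing can be recommended from this conflict set.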
If the primary item in the next state (i.e., in the (j + 1)th state) in each of the sub-plans in C is time gap then the whole process will be repeated after reducing the value of i by 1 if i is greater than 2. This is because asking the patient to wait for some time, as specified in the time gap, is not realistic. If i is equal to 2 then no recommendation will be made.
3) Si, ... , S1 are respectively the i most recent states in the plan of the current patient. Here, S1 is the most recent and Si is the least recent among all;
P ← A plan made up of the states Si, … , S1 in the given order;
If time gap is not an item in Item-Set then, from P, remove all those states which contain time gap as their primary item, and call that resulting plan as P;
j ← Length of P;
If j < 2 then do not recommend anything and then exit the process else do the following:
C ← The set of all those existing sub-plans in the system which fulfill the following 2 conditions (if time gap is not an item in Item-Set then any state that contains time gap as its primary item will be ignored):
a) The sub-plan length is at least (j + 1);
b) The primary item and the associate item(s), specified in the Item-Set, in each of the first j adjacent states of that sub-plan, respectively match with the primary item and the corresponding associate item(s) of the corresponding state in P;
If C is empty and Priority-List has no items and if:
a) i > 2 then i ← (i − 1) and then go to Step 2;
b) i = 2 then do not recommend anything and then exit the process;
If C is not empty, then go to Step 5;
/* DETERMINE THE WINNER AND RECOMMEND */
5) Select the primary item, which is not a time gap and is in the (j + 1)th state and is also present in the majority of the sub-plans of C, and recommend it.
If there is more than one such primary item then recommend the one that is in the most recent state. Again, if there is more than one such primary item, then recommend those competing primary items as different options.
If the primary item in the (j + 1)th state of each of the sub-plans in C is time gap and if:
a) i > 2 then i ← (i − 1) and then go to Step 2;
b) i = 2 then do not recommend anything and then exit the process.
The physician can also manually drop/ignore either "time gap" (which is a primary item) or one or more associate items from consideration. If time gap is dropped from consideration, either by the system (as specified in the above algorithm) or manually by the physician, then any state(s) consisting of time gap will be ignored/removed from all the plans and sub-plans as if that state(s) never existed in those plans and sub-plans. If an associate item is dropped from consideration, either by the system (as specified in the above algorithm) or manually by the physician, then the system will ignore that associate item from all considerations (while making recommendations) as if that associate item never existed.
Dropping of items from consideration is an added system feature that does not exist in the earlier research reported in [8].
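The overall control flow (shrinking i, dropping items from the Item-Set, and ignoring time-gap states) can be sketched as below. This is a simplified reading of the algorithm and not the system's implementation. States are hypothetical dictionaries, a time gap is a float primary item, the Priority-List is not reset when i is reduced, and the tie-break on the most recent state is omitted, so all most-frequent items are returned.

```python
from collections import Counter


def is_gap(state):
    return isinstance(state["primary"], float)


def visible(plan, item_set):
    """Hide time-gap states once "time gap" has left the Item-Set."""
    if "time gap" in item_set:
        return list(plan)
    return [s for s in plan if not is_gap(s)]


def matches(a, b, item_set):
    keys = ["primary"] + [k for k in item_set if k != "time gap"]
    return all(a.get(k) == b.get(k) for k in keys)


def recommend(patient_plan, sub_plans, item_set, priority_list):
    item_set, priority = set(item_set), list(priority_list)
    i = len(patient_plan)
    while i >= 2:
        p = visible(patient_plan[-i:], item_set)  # i most recent states
        j = len(p)
        if j < 2:
            return []
        c = []                                    # the conflict set
        for sub in sub_plans:
            s = visible(sub, item_set)
            if len(s) >= j + 1 and all(
                    matches(x, y, item_set) for x, y in zip(s, p)):
                c.append(s)
        if not c and priority:
            # Drop the leftmost Priority-List item and try again.
            item_set.discard(priority.pop(0))
            continue
        nxt = [s[j]["primary"] for s in c if not is_gap(s[j])]
        if nxt:
            counts = Counter(nxt)
            top = max(counts.values())
            return sorted((x for x, n in counts.items() if n == top),
                          key=sorted)
        i -= 1    # C empty, or every next state is a time gap
    return []
```

A physician manually dropping an item would correspond, in this sketch, to passing an item_set that already lacks that item.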
Example: Consider the following sample scenario. A patient's plan is: 151, 0.5, 10,002, 100,001, 0.44, 155. Associate items are not shown here. This plan conveys that the patient had a symptom (whose identity is 151), that after a gap of 0.5 years a disease (whose identity is 10,002) was identified, and that a treatment (whose identity is 100,001) was then applied. After a further gap of 0.44 years, another symptom (whose identity is 155) was identified. No exact match for this plan was found in the system. Therefore, based on the above algorithm, the system dropped "time gap" from consideration. That is, the time gaps 0.5 and 0.44 were ignored, leaving the four states 151, 10,002, 100,001, 155 (so j = 4). In this situation, the system alerted the physician that the patient may have the disease whose identity is 10,008, because it is present in a majority of the plans in the conflict set and is in the 5th, that is (j + 1)th, state as specified by the above algorithm.
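The scenario can be replayed with a few lines of Python. The patient's plan is taken from the example, but the earlier plans in case_base are invented purely for illustration, and associate items are ignored.

```python
from collections import Counter


def without_gaps(plan):
    """Remove time gaps (decimals from 0 to 150) from a list of codes."""
    return [item for item in plan if item > 150]


patient = [151, 0.5, 10_002, 100_001, 0.44, 155]

# Hypothetical earlier plans; only their shape matters here.
case_base = [
    [151, 0.3, 10_002, 100_001, 155, 10_008],
    [151, 10_002, 0.2, 100_001, 155, 0.1, 10_008],
    [151, 10_002, 100_001, 155, 10_011],
    [152, 10_002, 100_001, 155, 10_008],
]

p = without_gaps(patient)   # [151, 10002, 100001, 155]
j = len(p)                  # 4 states remain
conflict = [q for q in map(without_gaps, case_base)
            if len(q) >= j + 1 and q[:j] == p]
winner, votes = Counter(q[j] for q in conflict).most_common(1)[0]
```

With this invented case-base, three plans enter the conflict set and the disease 10,008 occupies the 5th state in two of them, so it is the item that would be brought to the physician's attention.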

Adaptability to Change
Like most other fields, changes do happen in the medical field, sometimes more quickly than in other fields. For example, the best medicine or a procedure yesterday may not be the best option today due to the discovery of negative side-effects. In such situations, the old medicine or procedure is no longer good to recommend or perform. Therefore, unless the system adapts to such changes, it will soon be outdated and useless. The CBR system is designed to adapt to such changes automatically or manually.
In manual mode, the physician can block certain items from influencing the system in its recommendations. For example, if the physician notices that a drug is found to be no longer effective or has a negative side-effect, the physician can exclude it by specifying its name. Once the name is specified, the algorithm will not recommend that drug even if it stands at the top position in the recommendation process; instead, the algorithm will recommend the next best available choice. This manual mode is another added system feature that does not exist in the earlier research reported in [8].
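A minimal sketch of the manual mode is given below. The name-to-code table and both function names are hypothetical, since the text does not specify how names are mapped to the integer codes.

```python
# Hypothetical lookup from a drug name to its integer treatment code.
DRUG_CODES = {"drug A": 100_001, "drug B": 100_004}


def block_by_name(blocked, name):
    """Record that the physician has excluded the named drug."""
    blocked.add(DRUG_CODES[name])


def best_allowed(ranked, blocked):
    """Return the highest-ranked candidate that is not blocked.

    ranked lists candidate codes from best to worst; None means that
    every candidate has been blocked.
    """
    return next((item for item in ranked if item not in blocked), None)
```

If the top-ranked treatment has been blocked, best_allowed falls through to the next candidate, which corresponds to recommending the next best available choice.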
In automatic mode, once the acceptance rate of the system's recommendations starts falling monotonically (i.e., physicians are increasingly rejecting those recommendations) and the failure rate reaches a predefined threshold, which is currently set to 50%, the system stops using any sub-plan whose starting date (i.e., the date on which the primary item in the first state of that sub-plan occurred) is before the start date of that monotonic fall. This way, the system prevents ineffective "old" plans from influencing the future recommendation process.
The system was tested for its operational performance, with regard to its adaptability to change, in a simulated environment with 300 different time gaps, 550 different symptoms, 755 different diseases, 1000 different treatments, 25 different insurance coverage plans, 10,000 different dates, 1000 different ages, 3 different genders, 8000 different plans, and 5 different races. The results can be seen in Figure 5.
In Figure 5, the number of plans is shown on the X-axis and the acceptance rates of the recommendations made by the system are shown on the Y-axis. The acceptance rate started falling monotonically when the number of plans was 400 (the corresponding date was 10/21/2019, which is not shown in the figure) and fell below the threshold value when the number of plans was 650. After the system automatically stopped using the plans dated before 10/21/2019, its acceptance rate improved.
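The automatic mode can be sketched as follows. Several details are assumptions and are not taken from the system: the acceptance rate is sampled as a chronological list of hypothetical (date, rate) pairs, the failure rate is taken to be 1 minus the acceptance rate, and the fall is taken to start at the last sample before the rates began to decrease.

```python
from datetime import date


def cutoff_date(history, failure_threshold=0.5):
    """Return the start date of the current monotonic fall, or None.

    history is a chronological list of (date, acceptance_rate) samples.
    """
    if not history or 1 - history[-1][1] < failure_threshold:
        return None                   # threshold not reached yet
    start = len(history) - 1
    # Walk back while each earlier sample is strictly higher.
    while start > 0 and history[start - 1][1] > history[start][1]:
        start -= 1
    return history[start][0]


def active_sub_plans(sub_plans, cutoff):
    """Keep sub-plans that do not start before the cutoff date.

    sub_plans is a list of (start_date, sub_plan) pairs.
    """
    if cutoff is None:
        return [plan for _, plan in sub_plans]
    return [plan for start, plan in sub_plans if start >= cutoff]
```

Once a cutoff date is found, only the sub-plans returned by active_sub_plans would take part in later retrievals.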

Conclusions
A new system has been developed to aid physicians in making decisions when treating patients. This research is an extended and revised version of the primary author's earlier research [8]. The system is based on CBR and organizes the information related to earlier patients in the form of cases, which are then used to address new patients' medical issues.
The system needs to be tested in a real-world environment. At present, the system treats several items as binary: for example, whether a patient has fever or not, or has pain or not. However, items such as fever or pain can occur at various levels, for example, severe fever or light fever. Also, certain diseases and/or symptoms are related. For example, if a patient has a viral infection, then that patient may have fever due to that viral infection. Presently, the system does not represent those relationships. Finally, an item such as a medical operation can itself be treated as a case involving several steps/states but, in this system, it is represented as a single item. If such items are represented as cases, then how to integrate those cases with the cases the system currently has is another issue that needs to be addressed.