Correlation of Asthma Symptoms with Prevalence of Indoor NO₂ Concentration in Kuwait

The research literature provides strong evidence that the characteristics of buildings and their indoor environments influence the prevalence of several adverse health effects. Kuwait is considered one of the countries with the harshest weather conditions, and Kuwaitis are estimated to spend most of their time indoors. Indoor environmental quality should therefore be taken seriously, since indoor allergens and irritants can play a significant role in determining the health of household members. In this research we propose to profile the synergistic interaction between morbidity differentials and air quality in residential areas of Kuwait. The objective of this project is to investigate the relation between indoor air quality and asthma symptoms. Data mining techniques are employed to discover correlations between indoor air quality measures and asthma symptoms and triggers. The main trigger considered in this research is the concentration of nitrogen dioxide (NO₂); other triggers investigated include dust mites and smoking, among others.


Introduction
Collecting real environmental data about Kuwait is considered one of the most important tasks in developing a suitable environment. This is especially true when such data can be used to solve actual problems related to population health. Asthma is considered one of the common symptoms of Sick Building Syndrome. Kuwait is considered one of the countries with the harshest weather conditions, and Kuwaitis are estimated to spend most of their time indoors. Indoor environmental quality should be taken seriously, since indoor allergens and irritants can play a significant role in determining the health of household members. It is important to recognize potential asthma triggers in the indoor environment and to reduce exposure to them. Some of the most common indoor asthma triggers include secondhand smoke, dust mites, mold, cockroaches and other pests, household pets, and combustion byproducts. Indoors, nitrogen dioxide (NO₂) can be a byproduct of fuel-burning appliances such as gas stoves, gas or oil furnaces, fireplaces, and gas space heaters. Outdoors, nitrogen dioxide can be produced by vehicles with internal and external combustion engines, desalination plants, recycling centers, waste-burning facilities and furnaces, factories, and petrochemical facilities. Advanced data mining techniques are used to process the collected data and to extract useful, hidden rules and relationships from it.
Data mining has been used successfully in many situations where better insight is needed to make better decisions. In this project, we used the data mining software Weka, and the data mining algorithms it provides, to determine whether nitrogen dioxide affects asthma patients.

Literature Review about Asthma
According to [1], asthma is a multifactorial disease that is likely to result from interactions between a genetically determined predisposition to allergic diseases and environmental factors that enhance allergic inflammation and target that inflammation to the lower airway.
In [2], the Expert Panel of the National Asthma Education and Prevention Program defined asthma as a chronic inflammatory disorder of the airways in which many cells and cellular elements play a role, in particular mast cells, eosinophils, T-lymphocytes, neutrophils, and epithelial cells.
As reported in [3], asthma may present in all age groups, but most studies suggest that in the majority of patients it presents before puberty. Asthma is the most common chronic medical condition affecting children. The increase in cases of childhood asthma is of critical health concern because the onset of asthma in children is particularly debilitating. Although symptoms decrease in adulthood for some children, approximately 50% continue to be affected throughout their lives. The effects of asthma on the social role functioning of children and adolescents, including their ability to play, participate in school activities, and build meaningful social and family relationships, are important to consider when accounting for the overall burden of this disease.
Studies [4][5][6][7][8][9][10][11] show that people spend approximately 90% of their time indoors, where the levels of some pollutants are often higher than outdoors. Indoor pollutants that can trigger asthma include house dust, environmental tobacco smoke, pet dander, incense, and molds.

Asthma and Studies Done in Kuwait
A number of studies on respiratory problems were carried out in Kuwait in the aftermath of the burning of Kuwait's oil wells after the first Gulf War. Subsequently, Kuwait also participated in the International Study of Asthma and Allergies in Childhood [12][13][14][15]. Recently, there has been increased interest in the effect of indoor allergens on asthma, and studies have been carried out on the effect of molds and pets on asthma patients.

Asthma Triggers
Some of the most common indoor asthma triggers include secondhand smoke, dust mites, mold, cockroaches and other pests, household pets, and combustion byproducts. Each trigger is briefly described below.

1) Secondhand Smoke
Secondhand smoke is a mixture of smoke from the burning end of a cigarette, pipe or cigar and the smoke exhaled by the smoker that is often found in homes and cars where smoking is allowed.

2) Dust Mites
Dust mites are too small to be seen, but can be found in almost every home in mattresses and bedding materials, carpets, upholstered furniture, stuffed toys and curtains.

3) Mold
Mold can grow indoors when mold spores land on wet or damp surfaces. In the home, mold is most commonly found in the bathroom, kitchen and basement.

4) Cockroaches and Other Pests
Cockroach body parts, secretions and droppings, and the urine, droppings and saliva of pests, such as rodents, are often found in areas where food and water are present.

5) Warm-Blooded Pets (Such as Cats and Dogs)
Pet skin flakes (dander), urine and saliva can be found in homes where pets are allowed inside.

6) Nitrogen Dioxide
Nitrogen dioxide can be a byproduct of indoor fuel-burning appliances, such as gas stoves, gas or oil furnaces, fireplaces, wood stoves, and unvented kerosene or gas space heaters. NO₂ is an odorless gas that can irritate the eyes, nose and throat and cause shortness of breath.
In people with asthma, exposure to low levels of NO₂ may cause increased bronchial reactivity, and it may make young children more susceptible to respiratory infections. Long-term exposure to high levels of NO₂ can lead to chronic bronchitis.
The U.S. Environmental Protection Agency (EPA) uses its Air Quality Index (AQI) to provide general information to the public about air quality and associated health effects. An AQI value of 100 for any pollutant corresponds to the level needed to violate the federal health standard for that pollutant.
For nitrogen dioxide, an AQI of 100 corresponds to 0.053 parts per million (averaged over one year), the current federal standard. Short-term health effects of NO₂ do not occur until index values are above 200; therefore, an AQI value is not calculated below 201 for NO₂. An index value of 201 for NO₂ corresponds to an NO₂ level of 0.65 parts per million (averaged over one hour). Table 1 gives the EPA Air Quality Index values, levels of health concern and cautionary statements for NO₂.

Table 1. EPA Air Quality Index for NO₂: levels of health concern and cautionary statements
201–300 (Very Unhealthy): Children and people with respiratory disease, such as asthma, should limit heavy outdoor exertion.
301–500 (Hazardous): Children and people with respiratory disease, such as asthma, should limit moderate or heavy outdoor exertion.
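To make the correspondence between concentrations and index values concrete, the sketch below applies the piecewise-linear interpolation the EPA uses to turn a concentration into an index value inside one breakpoint segment. Only the 0.65 ppm to 201 correspondence comes from the text above; the upper breakpoint used in the usage line (UPPER_PPM) is a hypothetical placeholder, not an EPA value.

```python
def aqi_segment(c, c_lo, c_hi, i_lo, i_hi):
    """Linear interpolation of an AQI value inside one breakpoint segment.

    c          -- measured concentration, in the same units and averaging
                  time as the breakpoints
    c_lo, c_hi -- concentration breakpoints that bracket c
    i_lo, i_hi -- index values assigned to c_lo and c_hi
    """
    if not c_lo <= c <= c_hi:
        raise ValueError("concentration lies outside this breakpoint segment")
    index = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
    return round(index)  # report the index as a whole number


# Usage: only 0.65 ppm -> 201 is taken from the text; UPPER_PPM is a
# hypothetical placeholder for the segment's upper concentration breakpoint.
UPPER_PPM = 1.0
print(aqi_segment(0.65, 0.65, UPPER_PPM, 201, 300))
```

Real breakpoint values should be taken from the EPA's published AQI breakpoint table for the relevant averaging time.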

Waikato Environment for Knowledge Analysis (Weka)
The Weka workbench is a widely used research tool consisting of state-of-the-art machine learning algorithms and data processing tools. It is flexible, allowing a variety of methods to be applied to datasets easily. Weka is developed at the University of Waikato in New Zealand; the name stands for the Waikato Environment for Knowledge Analysis. The system is written in Java, an object-oriented programming language that is widely available for all major computer platforms, and Weka has been tested under Linux, Windows, and Macintosh operating systems. Java allows Weka to provide a uniform interface to many different learning algorithms, along with methods for pre- and post-processing and for evaluating the results of learning schemes on any given dataset. Weka can be used at several different levels. It provides implementations of state-of-the-art learning algorithms, and it also includes a variety of tools for transforming datasets. With Weka, a user can preprocess a dataset, feed it into a learning scheme, and analyze the resulting classifier and its performance, all without writing any program code. The most important resource for navigating the software is the online documentation, which is generated automatically from the source code and concisely reflects its structure. It is very helpful because it is the only complete list of available algorithms and it is always up to date.
One way of using the workbench is to apply a learning method to a dataset and analyze its output to extract information about the data. Another is to apply several learners and compare their performance in order to choose one for prediction. Weka contains methods such as classification, clustering, association rule mining and attribute selection.

Scope of the Problem
This project aims to identify the correlation between the concentration of nitrogen dioxide and asthma symptoms in Kuwait. To achieve this goal, a number of steps were performed. A set of real data from asthma patients in Kuwait was collected. The data was cleaned and errors were eliminated to retain only the interesting and relevant data. This data was then analyzed using data mining techniques to test for an effect of nitrogen dioxide on asthma patients.

Sample Size Determination
If x̄ is used as an estimate of μ, we can be 100(1 − α)% confident that the error E = |x̄ − μ| will not exceed a specified amount when the sample size is

$$ n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^{2}, $$

rounded up to the next whole number, where z_{α/2} is the standard normal value with an upper-tail area of α/2 and σ is the population standard deviation (STD). The smaller the STD, the smaller the required sample and the narrower the confidence interval. Research has shown that it is seldom necessary to sample more than 10% of the population to obtain adequate confidence (if the population is above 1000). It also indicates that confidence intervals narrow sharply as very small samples are increased, up to about 100 respondents [17,18].
This means that the required sample size is practically independent of the size of the population when the population is many times larger than the sample. Hence, we targeted 80 houses distributed throughout Kuwait.
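The sample-size formula n = (z_{α/2}·σ/E)² can be evaluated directly, as in the minimal sketch below. It assumes σ is known or estimated beforehand (for example from a pilot study); the σ and E values in the usage line are illustrative only and are not the values used to choose the 80 houses.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, error, confidence=0.95):
    """Smallest whole n with n >= (z_{alpha/2} * sigma / E)^2.

    sigma      -- (estimated) population standard deviation of the measure
    error      -- largest acceptable error E = |x_bar - mu|, same units as sigma
    confidence -- 1 - alpha
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value z_{alpha/2}
    return math.ceil((z * sigma / error) ** 2)


# Hypothetical inputs: sigma = 1.0 and E = 0.2 in the same (arbitrary) units.
print(required_sample_size(1.0, 0.2))
```

Because n depends only on z, σ and E, the population size never enters the calculation, which is the point made in the paragraph above.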

Implementation of Asthma Project
The following are the phases of the project implementation.

Phase I: Data Collection
The first phase of this project was data collection. A target of 40 asthma patients and 40 non-patients distributed throughout Kuwait was set, and a 7-page questionnaire was designed in Arabic to collect data about the habits and indoor living conditions of the participants. To collect this data, we needed the addresses of 40 patients and 40 non-patients. An official letter was sent from Kuwait University to the Ministry of Health requesting residential data of asthma patients, but no response was received for several months. We therefore decided to collect data directly from patients in the form of a survey, for which we needed 80 persons who voluntarily agreed to take part in the study. To collect the residential addresses of participants, the following three methods were used.

1) Data Collection through Private Hospitals and Clinics:
Twelve private hospitals and clinics dealing with asthma patients and located across different areas of Kuwait were identified. Official letters giving details of the survey were prepared and sent to these medical centers by fax as well as by local courier, and were followed up by phone calls to explain the survey further. A few clinics responded positively. A consent form was designed and sent to these clinics to be filled in by the patients. This form stated that the patients agreed to take part in the survey voluntarily and would allow university personnel to enter their homes to measure indoor NO₂ levels. It was also used to record the patients' addresses and phone numbers.

2) Data Collection through Website:
To collect data directly from patients, a website was created giving details of the survey, requesting voluntary participation, and allowing participants to enter their residential addresses.
3) Data Collection through Personal Contacts:
Three volunteers were recruited among the university students. They used personal contacts and word-of-mouth publicity to motivate patients to take part in the survey.
Of the three methods, the third generated the most interest; the response to the first two was very low. This was reportedly due to patients' reluctance to allow strangers inside their houses to measure gas levels. After continuous and sustained effort, data were collected from the residences of 80 persons.
A total of four monitors were used to measure NO₂ levels inside the residences: two Q-RAE PLUS four-gas monitors and two iTX multi-gas monitors.

Phase II: Data Cleansing
In the data cleansing process, we excluded attributes whose data was missing from most of the survey forms. The attributes selected after cleaning the data are NO₂, Carpet, Smokers, Incense, Kitchen Inside Home, Heat Kind, Coal Kind, Nearest Industrial Area and Nearest_distance_MainRoad. A total of 80 survey forms were obtained, 40 from asthma households and 40 from non-asthma households. Table 2 summarizes the survey forms.
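The exclusion rule described above, dropping attributes that are missing from most survey forms, can be sketched as follows. Reading "most" as "more than half" is an assumption, as is representing each form as a dictionary with None for an unanswered item; the attribute name PetType in the usage is hypothetical.

```python
def drop_sparse_attributes(forms, max_missing_fraction=0.5):
    """Remove attributes that are missing in too many survey forms.

    forms -- list of dicts, one per survey form; None marks a missing answer
    Returns (kept_attribute_names, cleaned_forms).
    """
    if not forms:
        return [], []
    # Attribute names in order of first appearance across all forms.
    attributes = list(dict.fromkeys(name for form in forms for name in form))
    kept = []
    for name in attributes:
        missing = sum(1 for form in forms if form.get(name) is None)
        if missing / len(forms) <= max_missing_fraction:
            kept.append(name)
    cleaned = [{name: form.get(name) for name in kept} for form in forms]
    return kept, cleaned


# Hypothetical example: "PetType" is unanswered in two of three forms.
forms = [
    {"Carpet": "yes", "PetType": None},
    {"Carpet": "no", "PetType": None},
    {"Carpet": "yes", "PetType": "cat"},
]
kept, cleaned = drop_sparse_attributes(forms)
print(kept)
```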

Phase III: Data Processing
To process the data, we used the Weka workbench, a common tool consisting of state-of-the-art machine learning algorithms and data processing tools.
Of the methods Weka contains (classification, clustering, association rule mining and attribute selection), the ones of interest to this project are classification and attribute selection [19].
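Weka reads data in its ARFF text format (it also has loaders for other formats such as CSV). The minimal sketch below writes cleaned survey records as ARFF. The relation name, attribute names, nominal values and numbers in the usage are hypothetical, and the writer assumes names and values contain no spaces, commas or quotes, which ARFF would otherwise require quoting.

```python
def to_arff(relation, attributes, rows):
    """Render rows as a Weka ARFF string.

    attributes -- list of (name, values) pairs; values is a list of nominal
                  labels, or the string "numeric" for a numeric attribute
    rows       -- sequences in the same order as attributes; None becomes '?'
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        if values == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        # '?' is ARFF's marker for a missing value.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical records: one NO2 reading, one carpet answer, and the class label.
text = to_arff(
    "indoor_asthma",
    [("NO2_ppm", "numeric"), ("Carpet", ["yes", "no"]), ("class", ["asthma", "non_asthma"])],
    [[0.02, "yes", "asthma"], [None, "no", "non_asthma"]],
)
print(text)
```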

Classification
In this section, we outline different classification methods we used.

Classifier Subcategories
Classification allows the user to select a learning algorithm, also called a classifier, to be applied to the dataset.
The output can then be analyzed to learn more about the nature of the data. Weka provides a large collection of learning algorithms for users to choose from. The subcategories we selected for testing are Bayes, Function, Trees, and Rules.
All of the algorithms were compared to find which one gives the best success percentage in each subcategory, and these success percentages were then used to evaluate the different attributes in this project.
Along with classification, attribute selection is also used. Attribute selection chooses the set of attributes that most influence the decision-making process. It operates by searching the space of attribute subsets and evaluating each candidate subset. In Weka, an attribute evaluator must be specified along with a search method. The attribute evaluator used is Correlation-based Feature Selection, which is provided in Weka under the name CfsSubsetEval. This algorithm assesses the predictive ability of each attribute and its relationship with the other attributes. It then picks the attributes that best identify the different classes and are least correlated with one another. The search method used is Genetic Search, a simple genetic algorithm controlled by parameters such as population size and crossover probability.
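Correlation-based Feature Selection scores a candidate subset S of k attributes with Hall's merit, Merit_S = k·r̄_cf / √(k + k(k−1)·r̄_ff), where r̄_cf is the mean attribute-class correlation and r̄_ff the mean attribute-attribute correlation. The sketch below takes the correlations as given (Weka estimates them from the data) and, in place of Genetic Search, scores every subset exhaustively, which is only practical for a handful of attributes. The attribute names and correlation values in the usage are invented.

```python
import math
from itertools import combinations


def cfs_merit(subset, r_cf, r_ff):
    """Hall's CFS merit of an attribute subset.

    r_cf -- dict: attribute -> correlation with the class
    r_ff -- dict: frozenset({a, b}) -> correlation between attributes a and b
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[a] for a in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def best_subset(attributes, r_cf, r_ff):
    """Exhaustive search over all non-empty subsets (a stand-in for
    Weka's GeneticSearch or BestFirst, which avoid enumerating them all)."""
    best, best_merit = (), -math.inf
    for k in range(1, len(attributes) + 1):
        for subset in combinations(attributes, k):
            merit = cfs_merit(subset, r_cf, r_ff)
            if merit > best_merit:
                best, best_merit = subset, merit
    return best, best_merit


# Invented correlations: B is largely redundant with A, and C is weak.
r_cf = {"A": 0.6, "B": 0.5, "C": 0.1}
r_ff = {frozenset({"A", "B"}): 0.9, frozenset({"A", "C"}): 0.0, frozenset({"B", "C"}): 0.0}
print(best_subset(["A", "B", "C"], r_cf, r_ff))
```

The denominator grows with inter-attribute correlation, so adding a redundant attribute (B here) lowers the merit even though B is individually predictive.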

Test Options
To evaluate the performance of each classifier, the percentage of correctly classified instances was used. Weka provides a number of test options; the two used here are percentage split and cross-validation. Percentage split divides the data into a training set and a separate, independent test set, with the size of each set determined by the percentage the user enters. A 90% percentage split is used in this project, which means that 90% of the data is used for training and 10% for testing. Cross-validation instead reserves part of the data for testing and uses the rest for training, repeating this over a fixed number of folds that the user enters beforehand. For example, if 10 is entered, the data is divided into 10 equal groups; one group is used for testing while the rest are used for training. The groups take turns as the test set so that every group is used for testing once. The learning procedure is therefore carried out 10 times, and the average of the results is calculated.
Different random seed values are also used with both strategies. The random seed is used to randomize the order of the dataset before it is split into training and test sets.
In this project, both procedures were used: the 90% percentage split and 10-fold cross-validation. Each algorithm was run for 10 iterations and the average was taken.
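The two test options and the 10-run averaging can be sketched with the Python standard library. This illustrates the procedures rather than reproducing Weka's code: Weka additionally stratifies cross-validation folds by class, which this sketch does not, and the evaluate callback is a hypothetical placeholder for training and testing a classifier.

```python
import random


def percentage_split(n, train_percent, seed):
    """Shuffle indices 0..n-1 with a seeded RNG, then cut off the training share."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_percent / 100)
    return idx[:n_train], idx[n_train:]


def k_fold(n, k, seed):
    """Yield (train, test) index lists; each index is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def mean_over_seeds(evaluate, n_runs=10):
    """Average evaluate(seed) over seeds 1..n_runs.

    evaluate is a hypothetical callback that trains and tests a classifier
    with one of the splits above and returns the fraction classified correctly.
    """
    scores = [evaluate(seed) for seed in range(1, n_runs + 1)]
    return sum(scores) / len(scores)


# 80 instances, as in the study: a 90% split leaves 72 for training, 8 for testing.
train, test = percentage_split(80, 90, seed=1)
print(len(train), len(test))
```

With only 8 test instances per split, a single misclassification moves the accuracy by 12.5 percentage points, which is one reason to average over several seeds.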

Test Details
Classification was performed on the attributes obtained after cleaning the data, using the four classifier subcategories mentioned earlier. Classification was first performed on all of these attributes to compare the different algorithms in each subcategory and to find which algorithm gives the best success percentage in each. In the next phase, each trigger was tested on its own to check its success percentage under the chosen algorithms. Each trigger was also tested together with nitrogen dioxide to check the effect of nitrogen dioxide on that trigger.

Classifier Test
The tests were performed on data from 80 persons, 40 asthma patients and 40 non-asthma patients, with the following attributes: (NO₂, Carpet, Smokers, Incense, Kitchen Inside Home, Heat Kind, Coal Kind, Nearest Industrial Area, Nearest distance Main Road). As mentioned earlier, the tests were applied using 10-fold cross-validation and a 90% percentage split. The data of the 80 persons was fed into the classifiers, and the algorithms from the four classifier subcategories were tested on all the attributes.
Comparing the performance of the algorithms in the four subcategories under both 10-fold cross-validation and the 90% percentage split, the results in Table 3 show that the best-performing algorithm is Naive Bayes Simple in the Bayes category, the Radial Basis Function Network (RBF Network) in the Function category, the Alternating Decision Tree (AD Tree) in the Trees category, and Nearest Neighbor Generalized Exemplars (NNge) in the Rules category.
The results in Table 4 show that nitrogen dioxide has an effect on the other triggers: the success percentage of every attribute in this project is higher when the attribute is combined with nitrogen dioxide. This indicates the importance of the effect of nitrogen dioxide concentration on asthma symptoms.
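To illustrate the comparison between an attribute alone and the same attribute combined with NO₂, here is a minimal categorical Naive Bayes with Laplace smoothing, evaluated by leave-one-out. It is a simplified stand-in, not Weka's NaiveBayesSimple, and leave-one-out replaces the study's cross-validation for brevity. It assumes NO₂ has been discretized into levels such as "high" and "low", a hypothetical preprocessing step. The six records in the usage are invented to exercise the code and are not study data.

```python
from collections import Counter, defaultdict


def train_nb(rows, attrs, label):
    """Count classes and, per class, the values of each attribute."""
    class_counts = Counter(r[label] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> observed values
    for r in rows:
        for a in attrs:
            value_counts[(a, r[label])][r[a]] += 1
            domains[a].add(r[a])
    return class_counts, value_counts, domains


def predict_nb(model, row, attrs):
    """Pick the class with the largest Laplace-smoothed Naive Bayes score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())

    def score(c):
        p = class_counts[c] / total
        for a in attrs:
            p *= (value_counts[(a, c)][row[a]] + 1) / (class_counts[c] + len(domains[a]))
        return p

    return max(class_counts, key=score)


def loo_accuracy(rows, attrs, label):
    """Leave-one-out accuracy: train on all rows but one, test on that row."""
    correct = 0
    for i, held_out in enumerate(rows):
        model = train_nb(rows[:i] + rows[i + 1:], attrs, label)
        correct += predict_nb(model, held_out, attrs) == held_out[label]
    return correct / len(rows)


# Invented toy records (not study data).
rows = [
    {"NO2": "high", "Carpet": "yes", "class": "asthma"},
    {"NO2": "high", "Carpet": "no", "class": "asthma"},
    {"NO2": "high", "Carpet": "yes", "class": "asthma"},
    {"NO2": "low", "Carpet": "yes", "class": "none"},
    {"NO2": "low", "Carpet": "no", "class": "none"},
    {"NO2": "low", "Carpet": "no", "class": "none"},
]
print(loo_accuracy(rows, ["Carpet"], "class"), loo_accuracy(rows, ["NO2", "Carpet"], "class"))
```

The toy data is constructed so that NO₂ separates the classes, so it only demonstrates the mechanics of the comparison, not the size of any real effect.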

Classification Test on Selected Attributes
Classification tests were then performed on a selected set of attributes after removing three attributes from the list: Nearest Industrial Area, Nearest Main Road, and Smokers. We believe that Nearest Industrial Area and Nearest Main Road affect asthma symptoms, but they were removed because, in the collected data, the nearest industrial areas and main roads are far from the houses of the asthma patients. Similarly, we believe that smoking affects asthma patients, but the Smokers attribute was removed because most of the questionnaires collected were from non-smoking patients. The classification tests were performed on the following attributes: (NO₂, Carpet, Incense, KitchenInsideHome, Coal_Kind). The results of the 10-iteration tests using 10-fold cross-validation and a 90% percentage split are summarized in Table 5.
Comparing the results, the best success percentage, 95.75%, is obtained with the RBF Network algorithm.

Attribute Selection
Attribute selection is also used to choose the set of attributes that most influences the classification process. As mentioned earlier, the attribute evaluator used is Correlation-based Feature Selection (CfsSubsetEval). The search methods used are Genetic Search and Best First. The attribute selected by CfsSubsetEval with both Genetic Search and Best First is nitrogen dioxide.

Conclusions
The goal of this project was to investigate the relation between indoor concentrations of nitrogen dioxide and asthma symptoms, and this goal has been achieved. The classifier tests performed on four subcategories (Bayes, Function, Trees, and Rules) selected Naive Bayes Simple, RBF Network, AD Tree, and NNge as the classifier algorithms yielding the highest classification correctness in this project. Using 10-fold cross-validation, the classification correctness of Naive Bayes Simple, RBF Network, AD Tree, and NNge was 77.42%, 83.16%, 71.45% and 82.86%, respectively. These percentages were obtained using all the attributes remaining after the data cleaning process.
A number of classification tests were performed using each attribute individually and then using each attribute combined with nitrogen dioxide, to compare the success percentages. Every attribute combined with nitrogen dioxide yielded a higher success percentage than the same attribute alone.
In the next phase of tests, some attributes were removed and further classification tests were performed on the remaining attributes. The RBF Network performed best among the algorithms, with a success percentage of 95.75%. This result supports our hypothesis that there is a relation between indoor concentrations of nitrogen dioxide and asthma symptoms. By presenting this evidence of the negative effect of such nitrogen dioxide concentrations in Kuwait's environment, we hope to raise awareness of this issue so that suitable action can be taken.