
The data set contains 500 examples of class 1 and 268 of class 2.
This data set is extracted from a larger database originally owned by the National Institute of Diabetes and
Digestive and Kidney Diseases. The purpose of the study
is to investigate the relationship between the diabetes
diagnostic result and a list of variables that represent
physiological measurements and medical attributes. The
data set in the UCI repository contains 768 observations
and 9 variables, with no missing values reported. However, as some researchers point out, there are a number of impossible values, such as a body mass index of 0 and a plasma glucose concentration of 0. Furthermore, one attribute (2-hour serum insulin) contains almost 50% impossible values. To keep the sample size reasonably large, this attribute is removed from the analysis (see the sketch after the attribute list below). There are 236 observations that have at least one impossible value of glucose, blood pressure, triceps skin fold thickness, or body mass index.
There are nine variables in this data set, including the binary response variable; all other attributes are numeric-valued. The attributes are given below:
1) Number of times pregnant
2) Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3) Diastolic blood pressure (mm Hg)
4) Triceps skin fold thickness (mm)
5) 2-hour serum insulin (mu U/ml)
6) Body mass index (weight in kg/(height in m)^2)
7) Diabetes pedigree function
8) Age (years)
9) Class variable (0 or 1)
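As a concrete illustration of the preprocessing step described above, the following minimal sketch uses Weka's Java API to drop the serum-insulin attribute; the file name diabetes.arff and the 1-based attribute index are assumptions, not part of the original study's code.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class PiddPreprocess {
    public static void main(String[] args) throws Exception {
        // Load the Pima Indians Diabetes data (file name is an assumption).
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Drop attribute 5 (2-hour serum insulin), which contains
        // almost 50% impossible values.
        Remove remove = new Remove();
        remove.setAttributeIndices("5"); // 1-based index in the ARFF header
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);

        System.out.println("Attributes after removal: " + reduced.numAttributes());
    }
}
```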
4. Methodology
We use several classification techniques in this research. These techniques, together with their running parameters, are given below:
4.1. Multilayer Perceptron
Multilayer perceptron (MLP) [11] is one of the most
commonly used neural network classification algorithms.
The architecture used for the MLP during simulations with the PIDD data set consisted of a three-layer feed-forward neural network: one input, one hidden, and one output layer. The selected parameters for the model are: learningRate = 0.3/0.15; momentum = 0.2; randomSeed = 0; validationThreshold = 20; number of epochs = 500.
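A minimal sketch of this configuration with Weka's Java API follows; the hidden-layer spec "a" and the choice of the 0.3 learning-rate variant are assumptions (the paper reports two rates, 0.3 and 0.15).

```java
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MlpRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff"); // file name is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setLearningRate(0.3);       // the paper also reports runs with 0.15
        mlp.setMomentum(0.2);
        mlp.setSeed(0);                 // randomSeed = 0
        mlp.setValidationThreshold(20);
        mlp.setTrainingTime(500);       // number of epochs
        mlp.setHiddenLayers("a");       // one hidden layer; "a" = (attribs + classes) / 2 (assumption)

        mlp.buildClassifier(data);
    }
}
```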
4.2. BayesNet
BayesNet [12] learns Bayesian networks under the presumptions that attributes are nominal (numeric ones are pre-discretized) and that there are no missing values (any such values are replaced globally). Learning proceeds in two parts: searching for a network structure and estimating the conditional probability tables of the network. In this study we run BayesNet with the SimpleEstimator and the K2 search algorithm, without using ADTree. K2 is a greedy search algorithm that works as follows. Suppose we know a total ordering of the nodes. Initially, each node has no parents. The algorithm then incrementally adds the parent whose addition most increases the score of the resulting structure. When no addition of a single parent can increase the score, it stops adding parents to the node. Since an ordering of the nodes is known beforehand, the search space under this constraint is much smaller than the entire space, and we do not need to check for cycles: the total ordering guarantees that there is no cycle in the deduced structures. Furthermore, under appropriate assumptions, we can choose the parents for each node independently.
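A minimal sketch of this setup with Weka's Java API, assuming default K2 settings (the file name is again an assumption):

```java
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.net.estimate.SimpleEstimator;
import weka.classifiers.bayes.net.search.local.K2;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BayesNetRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff"); // file name is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        BayesNet bn = new BayesNet();
        bn.setUseADTree(false);                 // run without ADTree, as in the study

        bn.setSearchAlgorithm(new K2());        // greedy search over a fixed node ordering
        bn.setEstimator(new SimpleEstimator()); // estimates the conditional probability tables

        bn.buildClassifier(data);
    }
}
```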
4.3. Naïve Bayes
The Naïve Bayes [12] classifier provides a simple approach, with clear semantics, to representing and learning probabilistic knowledge. It is termed naïve because it relies on two important simplifying assumptions: that the predictive attributes are conditionally independent given the class, and that no hidden or latent attributes influence the prediction process.
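Concretely, the conditional-independence assumption lets the class posterior factorize as

\[ P(C \mid x_1, \ldots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C), \]

and the classifier predicts the class C that maximizes this product.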
4.4. J48graft (C4.5 Decision Tree Revision 8)
The C4.5 algorithm, developed by Quinlan [13], is perhaps the most popular tree classifier to date. The WEKA classifier package has its own version of C4.5, known as J48 or J48graft. For this study, the C4.5 classifier is used in the TANAGRA platform and the J48graft classifier in the WEKA platform. J48graft is an optimized implementation of C4.5 rev. 8. J48graft is run in this study with the parameters: confidenceFactor = 0.25; minNumObj = 2; subtreeRaising = True; unpruned = False. C4.5 is run in this study with the parameters: min size of leaves = 5; confidence level for pessimistic pruning = 0.25. The final decision tree built by the algorithm is depicted in Figure 1.
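A minimal sketch of the J48graft configuration, assuming a Weka 3.6-era distribution where J48graft ships in weka.classifiers.trees and exposes the same setter names as J48 (an assumption):

```java
import weka.classifiers.trees.J48graft;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48graftRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff"); // file name is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        J48graft tree = new J48graft();
        tree.setConfidenceFactor(0.25f); // pruning confidence
        tree.setMinNumObj(2);            // minimum instances per leaf
        tree.setSubtreeRaising(true);
        tree.setUnpruned(false);

        tree.buildClassifier(data);
        System.out.println(tree);        // prints the induced decision tree
    }
}
```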
4.5. Fuzzy Lattice Reasoning (FLR)
The Fuzzy Lattice Reasoning (FLR) classifier is presented for inducing descriptive, decision-making knowledge (rules) in a mathematical lattice data domain, including the space R^N. Tunable generalization is possible based on non-linear (sigmoid) positive valuation functions; moreover, the FLR classifier can deal with missing data. Learning is carried out both incrementally and fast by computing disjunctions of join-lattice interval conjunctions, where a join-lattice interval conjunction corresponds to a hyperbox in R^N. In this study we evaluated the FLR classifier in WEKA with the parameters: Rhoa = 0.5; number of rules = 2.
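A sketch of this run, assuming the FLR classifier distributed with Weka (the class location weka.classifiers.misc.FLR and the -R option for Rhoa are assumptions based on the contributed package, not details given in the paper):

```java
import weka.classifiers.misc.FLR; // from Weka's FLR package (assumption)
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FlrRun {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff"); // file name is an assumption
        data.setClassIndex(data.numAttributes() - 1);

        FLR flr = new FLR();
        // -R sets the Rhoa vigilance parameter (assumed option name).
        flr.setOptions(new String[] { "-R", "0.5" });

        flr.buildClassifier(data); // induces hyperbox rules incrementally
        System.out.println(flr);   // prints the induced rules
    }
}
```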