Extraordinary Potential of High Technologies Applications: A Literature Review and a Model of Assessment of Head and Neck Squamous Cell Carcinoma (HNSCC) Prognosis

Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cause of cancer mortality in the world and the fifth most commonly occurring cancer (Siegel, R. 2014). Over the last few decades, interest has grown in the emerging data from both tumor biology and multimodality treatment in HNSCC. The large number of new markers must be managed with bioinformatics systems that process and correlate clinical and molecular data. Data mining algorithms are a promising tool for this purpose in medicine. We used this technology to correlate blood test results with clinical outcome in 120 patients treated with chemoradiation for locally advanced HNSCC. We found no significant correlation, owing to the small sample size, but the results show the potential of this tool.


Introduction
1. Data Mining
The term "data mining" usually refers to a set of algorithms that discover hidden knowledge in very large amounts of heterogeneous data and group the data into categories. Data mining was originally developed in the economic field to support managers in their decisions, but it was progressively introduced into other fields; over the last ten years its application in medicine has increased considerably, in particular for processing signals such as EEG and ECG [1] [2]. The main objective of this paper is to explain the data mining tool and to provide an example of its application in clinical practice. We also provide a brief review of data mining applications in clinical practice.
In this study we looked for a correlation between clinical outcome (tumor progression) and blood tests (white blood cell count, WBC; C-reactive protein, PCR). To this end, we applied data mining algorithms to the blood test results. This tool should not be considered a definitive method for detecting progression, but it may play a prognostic role by processing several variables jointly. Blood tests, imaging and biomarkers are normally used to evaluate a patient's disease status. In addition, recent data suggest that high levels of inflammatory markers indicate a high probability of progression.

Medical Data
Head and neck carcinoma (HNC) is the sixth most common cancer worldwide [3].
Despite recent advances in the diagnosis and treatment of head and neck squamous cell carcinoma (HNSCC), there has been little evidence of improvement in 5-year survival rates over the last few decades [4]. The most important risk factors are heavy alcohol consumption, smoking and human papillomavirus (HPV) infection; the last two are also prognostic factors [5]. Other common prognostic factors include T and N stage, synchronous multiple primary cancers, patient performance status and age [6].
Correlations with blood test values have not been reported, although PCR and inflammation are now well known to contribute to both pathogenesis and toxic deaths.
The goal of this paper is to provide an example of a data mining application in daily clinical practice, in patients with HNSCC treated with chemoradiation (CRT) or bio-radiation (bio-RT) at the S. Croce General Hospital in 2010 and 2011.

Methods
We analyzed the blood test results of 120 patients, all treated with chemoradiation or bio-radiation at the S. Croce General Hospital in 2010 and 2011 in daily clinical practice. For each patient we analyzed white blood cell count (WBC), hemoglobin (HB), PCR and lactate values before, during and after treatment.
The first steps of this work were loading and cleaning the original data. The original data were stored in an Excel file with 57 columns containing information about patients, treatments, progression and exams, not all of which was useful for this analysis. We first saved the Excel file as CSV (comma-separated values), a simpler format to manage and use in database contexts. The second step was to create a tablespace and a user on an Oracle Database schema (Oracle Database Express Edition 11g) and to load all data, with the same structure as the original file, into a table called "tmpdata" using the PL/SQL Developer text data import tool. Two new tables were then created: one storing patient information such as name, surname and birth date, and a second storing exams. In the patient table a unique code called "id" was generated for each patient; this code is used in the exam table to link exams to a specific patient without using his or her personal data. The exam table was populated from tmpdata after cleaning the exam data (for example, "S-LDH", "Sldh" and "SLDH" were treated as the same exam), and the exam age was calculated as the exam date minus the patient's birth date.
Patients' names and surnames are not needed by the mining processor, so this information could be safely removed without negatively affecting the analysis; removing it also speeds up the algorithm, which has fewer fields to process. Another important step is to add pre-calculated data: for example, if the number of days after an event is relevant to the analysis, this value should be calculated before processing, so that the processor has an additional informative field to support its decisions.
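In the study these steps were carried out in Oracle with PL/SQL Developer. Purely as an illustration, the same cleaning logic can be sketched in Python with the standard library. The CSV column names (name, surname, birth_date, exam_type, exam_date, exam_result, treatment_end), the ISO date format, the normalization rule (uppercase, then strip non-alphanumeric characters) and the CANONICAL_EXAMS lookup are hypothetical assumptions, not details of the original pipeline; treatment_end stands in for whatever reference event a "days after an event" field would be computed from.

```python
import csv
import io
import re
from datetime import date

# Hypothetical lookup from a normalized key to the canonical exam name.
CANONICAL_EXAMS = {"SLDH": "S-LDH"}


def normalize_exam(raw: str) -> str:
    """Map spelling variants such as "S-LDH", "Sldh" and "SLDH" to one name."""
    key = re.sub(r"[^A-Z0-9]", "", raw.upper())
    return CANONICAL_EXAMS.get(key, key)


def years_between(start: date, end: date) -> float:
    """Approximate age in years: exam date minus birth date."""
    return (end - start).days / 365.25


def prepare(csv_text: str):
    """Split flat CSV rows into a patient id table and an anonymized exam table."""
    patients = {}  # (name, surname, birth_date) -> generated numeric id
    exams = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["name"], row["surname"], row["birth_date"])
        patient_id = patients.setdefault(key, len(patients) + 1)
        birth = date.fromisoformat(row["birth_date"])
        exam_day = date.fromisoformat(row["exam_date"])
        event_day = date.fromisoformat(row["treatment_end"])  # assumed reference event
        exams.append({
            "patient_code": patient_id,  # replaces name and surname
            "exam_age": round(years_between(birth, exam_day), 1),
            "exam_type": normalize_exam(row["exam_type"]),
            "exam_result": float(row["exam_result"]),
            # Pre-calculated field: days elapsed since the reference event.
            "days_after_event": (exam_day - event_day).days,
        })
    return patients, exams
```

Keeping the name-to-id mapping separate from the exam records mirrors the two-table design: the exam list can be handed to a mining step without any personal data.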

Creating a Model of Data
Data mining algorithms group data into a limited set of groups called "classes". The basic rules are: each element belongs to exactly one class, and elements in the same class are similar to each other and different from elements of other classes [6]. A classification algorithm analyzes the attributes of an object (every data element is an object, for example a patient with his or her exams) and decides the class of that element. The core of data mining is the creation of a data model: a decision model used by the mining process to choose the class into which new elements are put. There are several algorithms for generating models; one of the most popular is the "decision tree". It has three kinds of element: decision nodes, branches and leaves. The resulting model resembles a tree, with an initial node (often called the root) that has two or more branches; each branch leads either to a decision node with further branches or to a leaf, which is an end point. The most important parts are the decision nodes: every decision node has a set of binary rules such as "greater than…" or "equal to". To generate this model, the algorithm needs some data whose class is known (at least one element for each class); this set of data is called the "training set". An example of data and of the generated model was given (Data; Decision tree). Based on the generated model, the type of object is irrelevant. This is a two-level tree.
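The decision-tree idea can be sketched as a minimal CART-style learner in Python. This is an illustrative assumption, not the software used in the study: each decision node holds one binary test of the form "feature <= threshold", thresholds are chosen by Gini impurity, and depth is capped at two levels to match the two-level example. Feature names such as WBC and PCR are used only as example keys.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # end point: the class assigned to records reaching this leaf


@dataclass
class Node:
    feature: str       # attribute tested at this decision node
    threshold: float   # binary rule: feature <= threshold goes left
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def gini(labels: list[str]) -> float:
    """Gini impurity: 0 when all labels are equal."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (weighted impurity, feature, threshold) of the best binary test."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint keeps both sides non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, features, depth=0, max_depth=2) -> Tree:
    """Grow a tree from a training set whose classes are known."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return Leaf(majority)
    split = best_split(rows, labels, features)
    if split is None:  # all records identical on every feature
        return Leaf(majority)
    _, feature, threshold = split
    left_idx = [i for i, r in enumerate(rows) if r[feature] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[feature] > threshold]

    def subtree(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          features, depth + 1, max_depth)

    return Node(feature, threshold, subtree(left_idx), subtree(right_idx))


def classify(tree: Tree, record) -> str:
    """Follow the branches from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if record[tree.feature] <= tree.threshold else tree.right
    return tree.label
```

Real projects would normally use an established library implementation; the hand-written version only makes the node, branch and leaf structure visible.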

Testing the Generated Model
To generate a model, at least one third of the data is usually submitted to the algorithm, and the remaining data are used to test the model. The training data (one third) also include the associated class; the remaining data, often called the "test set", contain this information too, but it is not submitted to the algorithm. The algorithm takes the test set and the previously generated model and returns a class for each record; the automatically assigned classes are then compared with the real ones. If the confidence is better than 90%, the model is usable; otherwise the model is regenerated using another training set (training records are chosen at random from the full data, and the remaining records become the test set).
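The split, test and retry procedure can be sketched in Python as follows. The one-third training fraction and the 90% threshold come from the text; the decision stump (a one-level decision tree) used as the model, the fixed random seed and the max_attempts cap are illustrative assumptions. Accuracy on the test set plays the role of the "confidence" in the text.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """One-level decision tree: the single threshold test with fewest training errors."""
    majority = Counter(labels).most_common(1)[0][0]
    # Start from a constant model that predicts the majority class everywhere.
    best = (sum(y != majority for y in labels), None, 0.0, majority, majority)
    for f in rows[0]:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_label = Counter(left).most_common(1)[0][0]
            r_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
            if errors < best[0]:
                best = (errors, f, t, l_label, r_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict(model, row):
    feature, threshold, left_label, right_label = model
    if feature is None:  # constant model: no useful split was found
        return left_label
    return left_label if row[feature] <= threshold else right_label


def fit_with_holdout(rows, labels, train_fraction=1 / 3, min_accuracy=0.9,
                     max_attempts=20, seed=0):
    """Train on a random fraction, test on the rest, retry until accurate enough."""
    n_train = max(1, round(len(rows) * train_fraction))
    if n_train >= len(rows):
        raise ValueError("need records left over for the test set")
    rng = random.Random(seed)
    accuracy = 0.0
    for attempt in range(1, max_attempts + 1):
        order = list(range(len(rows)))
        rng.shuffle(order)
        train, test = order[:n_train], order[n_train:]
        model = train_stump([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        accuracy = hits / len(test)
        if accuracy >= min_accuracy:
            return model, accuracy, attempt
    return None, accuracy, max_attempts  # no usable model found
```

Returning the attempt count alongside the model makes it easy to see how often a fresh random split was needed before the threshold was met.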

Applying the Generated Model to New or Complete Data
Once the model has been created and tested and is considered stable (confidence equal to or better than 90%), it can be applied to the full data set and to new data to assign the corresponding class. For example, a model that distinguishes healthy from non-healthy patients can be used to determine the health status of a newly submitted patient. A model is never perfect, so it is good practice to update it periodically with new data.
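Applying a validated model to new records and refitting it periodically can be sketched as below. A nearest-centroid classifier stands in for the real model purely to keep the example short, and the refit_every policy is a hypothetical choice; the text only says the model should be updated periodically with new data.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class NearestCentroidModel:
    """Stand-in model: assigns a record to the class whose mean is closest."""
    features: tuple[str, ...]
    centroids: dict = field(default_factory=dict)

    def fit(self, rows, labels):
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append([row[f] for f in self.features])
        self.centroids = {
            label: tuple(sum(col) / len(col) for col in zip(*vectors))
            for label, vectors in grouped.items()
        }
        return self

    def predict(self, row):
        point = [row[f] for f in self.features]
        return min(self.centroids, key=lambda label: dist(point, self.centroids[label]))


@dataclass
class PeriodicallyUpdatedModel:
    """Classifies new records and refits after every `refit_every` labeled cases."""
    model: NearestCentroidModel
    rows: list = field(default_factory=list)    # all labeled data seen so far
    labels: list = field(default_factory=list)
    refit_every: int = 10
    _pending: int = field(default=0, init=False)

    def classify(self, row):
        return self.model.predict(row)

    def add_labeled_case(self, row, label):
        self.rows.append(row)
        self.labels.append(label)
        self._pending += 1
        if self._pending >= self.refit_every:
            self.model.fit(self.rows, self.labels)  # retrain on old + new data
            self._pending = 0
```

Refitting in batches rather than after every case keeps the model stable between updates, which matches the idea of a model that is applied as-is and only periodically revised.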

Clustering
So far we have discussed classification, but there is another important group of data mining algorithms called "cluster algorithms". Their main aim is to discover classes automatically and to assign data to them. They analyze a data set without class labels and put similar data into the same class; a new class is created when a data element is very different from the elements of the existing classes, and the process is repeated until the classification is stable.
The core of the algorithm is the "distance function", which takes two data elements and returns a value representing the distance between them, in other words how different the two objects are. Clustering is used when the classes are not known in advance; for example, the purchases of credit card users could be analyzed to group the users into categories.
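The clustering loop described here can be read as a threshold-based ("leader") clustering with a Euclidean distance function. The sketch below is one possible interpretation under that assumption, not the specific algorithm referred to in the text; max_distance is a hypothetical parameter deciding when a point is different enough to open a new class. Convergence is not guaranteed in general, so max_iter caps the number of passes.

```python
from math import dist  # Euclidean distance between two points


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(coords) for coords in zip(*points))


def threshold_clustering(points, max_distance, max_iter=100):
    """Put each point in the nearest cluster unless every cluster center is
    farther than max_distance, in which case the point starts a new cluster.
    Centers are recomputed and the pass repeated until labels stop changing."""
    centers = []
    assignment = None
    for _ in range(max_iter):
        labels = []
        for p in points:
            if centers:
                nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                if dist(p, centers[nearest]) <= max_distance:
                    labels.append(nearest)
                    continue
            centers.append(tuple(p))  # very different point: open a new class
            labels.append(len(centers) - 1)
        # Drop clusters that lost all members and renumber the rest 0..k-1.
        used = sorted(set(labels))
        renumber = {old: new for new, old in enumerate(used)}
        labels = [renumber[j] for j in labels]
        centers = [centroid([p for p, j in zip(points, labels) if j == k])
                   for k in range(len(used))]
        if labels == assignment:  # stable classification reached
            break
        assignment = labels
    return assignment, centers
```

Swapping dist for another function (for example one that weights clinical variables differently) changes what "similar" means without touching the rest of the loop, which is why the distance function is the core design choice.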

Data Mining Clinical Application
Several data mining approaches are routinely used in research; these include dose-volume metrics, equivalent uniform dose, mechanistic Poisson models, and model-building methods based on statistical regression and machine learning techniques. Their application in daily clinical practice could reduce the time needed to extract information from biomarkers, physical or genetic variables [7] [8].
From a brief review of the English-language literature on cancer patients, we concluded that automated software analysis will significantly reduce the overall time required to complete daily biological, radiological or physics studies (such as dose-volume studies in radiotherapy, microarray analyses and genetic processing). Many tools are available for the automated digital acquisition of images of the spots on a microarray slide.

Conclusions
This study provides an example of future applications of high technology in oncology. In the era of microarrays and personalized medicine, these instruments are fundamental. Furthermore, just as the clinical management of HNC patients is well recognized to require a multidisciplinary team (including ENT surgeons, radiation oncologists, medical oncologists and speech-language specialists), a future global approach cannot work without close cooperation between high-technology (HT) engineers and biologists.
No correlation was found between elevated or reduced blood test values and clinical outcome. The data set is too small to be interpreted, but our analyses show the potential of this tool for evaluating correlations across a huge number of records.
The exam table contains the following columns: Patient_code: the id of the patient, linked to the patient table; Age: the age of the patient when his or her information was first recorded; Exam_age: the age of the patient on the day of the exam; Exam_type: the type of exam (for example, S-LDH); Exam_result: the value of the exam; Target_value: a value indicating whether the patient was in progression at the date of that exam (initially empty).
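The exam table can be mirrored by a small Python record type for use outside the database. The field types are assumptions (the original table lived in Oracle), and target_value is modeled as an optional flag that stays None until the progression status is filled in; the helper unlabeled is a hypothetical convenience, not part of the original schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExamRecord:
    """One row of the exam table (column names follow the list above)."""
    patient_code: int      # id linking to the patient table, no personal data
    age: float             # patient's age when first recorded
    exam_age: float        # patient's age on the day of the exam
    exam_type: str         # e.g. "S-LDH"
    exam_result: float     # measured value of the exam
    target_value: Optional[bool] = None  # progression flag, initially empty


def unlabeled(records):
    """Records whose progression flag has not been filled in yet."""
    return [r for r in records if r.target_value is None]
```

Records with target_value set form a training set in the sense described earlier; the remaining ones are the records a trained model would classify.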