A Survey on Utilization of the Machine Learning Algorithms for the Prediction of Erythemato Squamous Diseases

Abstract: With the advent of ozone depletion, ultraviolet radiation has become a major cause of many skin diseases, some of which lead to skin cancer. Early detection of skin cancer is important to avoid loss of life, and white-skinned people are especially affected. People of Asian and African descent are less affected because of the melanin in their skin. Americans are directly and more widely affected by ozone depletion, and as a result Erythemato Squamous Disease (ESD) is predominant among their skin diseases. Owing to technological advances, a large amount of data has been accumulated. Information is hidden in these raw data, and methodologies and technologies such as data mining, neural networks, fuzzy systems, and genetic and evolutionary computing can be used to extract patterns from them. Guvenir, et al. [1] studied ESDs and, in 1998, contributed the dermatology dataset: data on 366 patients with 34 features, consisting of clinical and histopathological data. The data come from the School of Medicine at Gazi University and the Department of Computer Science at Bilkent University, Turkey, and are available at http://archive.ics.uci.edu/ml/datasets/Dermatology. This survey gives a brief description of the contributions made in the field of ESDs in chronological order from 1998 to 2013. In this paper we survey the various machine learning algorithms that deal with ESDs.


INTRODUCTION
The detection of ESDs is a difficult, Herculean task because these diseases share clinical and histopathological features with only very minor differences. Due to ozone depletion, the sun's ultraviolet radiation causes many skin diseases, and the predominant skin disease is ESD. Many factors influence skin diseases, such as increased bacterial involvement, climatic conditions (dampness or humidity, dryness), greater exposure to the ultraviolet radiation in sunlight, fungal involvement, food habits, allergy to gases and chemicals, external infections, dead skin, dust, unwanted secretions, oral involvement, etc. The ESD dataset consists of data on 366 patients with 34 features, consisting of clinical and histopathological data, and is available at http://archive.ics.uci.edu/ml/datasets/Dermatology. The clinical features can be easily identified visually, but for the histopathological features a biopsy of the patient's sample is needed. The features in the dataset take values from 0 to 3, except for the age feature. The value '0' indicates that the feature does not occur and '3' indicates the highest occurrence; the intermediate values represent the severity of the feature. There are 12 clinical features and 22 histopathological features. The value of the family history feature is either 0 or 1, and this feature is not given much weight. A missing parameter is either replaced by the median value or the record is omitted.

In this paper, section-2 discusses the various machine learning algorithms used to determine the ESDs. The performance analysis of the various machine learning algorithms is given in section-3. Finally, the conclusion is given in section-4, followed by the references.
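The dataset encoding described above (graded features from 0 to 3, a binary family history feature, and missing values that are either replaced by the median or omitted) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any surveyed paper. It assumes a comma-separated record layout in which age is the 34th attribute, family history is the 11th, the class label comes last, and a missing value is written as "?". The helper names and the sample records are hypothetical and are not real patient data.

```python
from statistics import median

AGE_INDEX = 33             # assumption: age is the 34th attribute (0-based index 33)
FAMILY_HISTORY_INDEX = 10  # assumption: family history is the 11th attribute
MISSING = "?"              # assumption: missing values are written as "?"


def parse_record(line):
    """Split one comma-separated record into (features, label)."""
    fields = line.strip().split(",")
    features = [None if f == MISSING else int(f) for f in fields[:-1]]
    return features, int(fields[-1])


def impute_median_age(records):
    """Return new records with missing ages replaced by the median known age."""
    known_ages = [f[AGE_INDEX] for f, _ in records if f[AGE_INDEX] is not None]
    fill = median(known_ages)
    imputed = []
    for features, label in records:
        features = list(features)  # copy so the input records stay untouched
        if features[AGE_INDEX] is None:
            features[AGE_INDEX] = fill
        imputed.append((features, label))
    return imputed


def drop_missing(records):
    """Alternative strategy: omit every record that has a missing value."""
    return [(f, y) for f, y in records if None not in f]


def make_line(grade, age, label):
    """Build one hypothetical record: 33 graded features, then age, then class."""
    graded = [str(grade)] * 33
    graded[FAMILY_HISTORY_INDEX] = "0"  # family history is binary (0 or 1)
    return ",".join(graded + [age, str(label)])


# Hypothetical placeholder records, NOT real patient data.
sample_lines = [
    make_line(2, "20", 1),
    make_line(0, "40", 2),
    make_line(3, "60", 3),
    make_line(1, MISSING, 1),  # age is missing in this record
]

records = [parse_record(line) for line in sample_lines]
imputed = impute_median_age(records)  # strategy 1: fill with the median age
complete = drop_missing(records)      # strategy 2: omit incomplete records

print(imputed[3][0][AGE_INDEX])  # age filled in for the fourth record
print(len(complete))             # number of records kept after omission
```

Median imputation keeps all the records at the cost of introducing an estimated age, while omission keeps only observed values but shrinks an already small dataset, so the choice between them is a trade-off rather than a fixed rule.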

Prediction of ESDs using Machine Learning Algorithms: Some of the important machine learning algorithms and their analysis are given below:

Guvenir, et al. [1] developed a new classification algorithm, namely VFI5, and applied it to the differential diagnosis of ESDs. It has short training and classification times, and the algorithm proved robust to noisy training instances and missing feature values. Missing feature values are ignored in both training and test instances. Guvenir, et al. [2] proposed a Graphical User Interface (GUI) tool for diagnosing ESDs, which is based on three classification algorithms, namely the Nearest Neighbor Classifier (NNC), the Naive Bayesian Classifier using Normal Distribution (NBCND) and VFI5. The same team [2] has also developed an expert system based GUI for diagnosing ESDs.

Shenghuo Zhu, et al. [3] aimed at studying and identifying the pattern distribution and intrinsic correlations in large data sets by partitioning the data points into similarity clusters, and developed a new algorithm called CoFD, a non-distance based clustering algorithm for high dimensional spaces.

Castellano, et al. [4] proposed a multistep learning strategy called KERNEL (Knowledge Extraction and Refinement by Neural Learning). The data set is split into 10 subsets, of which nine are used for training and one for testing. The age feature is removed because it contains missing values and because the results vary with its inclusion.

Übeyli, et al. [5] proposed a new approach for estimating the ESDs based on an Adaptive Neuro-Fuzzy Inference System (ANFIS). To improve accuracy, they used seven classifiers instead of six. The seventh classifier takes the outputs of the six classifiers, one for each of the six ESDs, as inputs and identifies the exact disease. The proposed approach combines fuzzy input values with neural network capabilities, and the proposed system achieved higher accuracy than a simple neural network model. The classification accuracy of this model is raised to 95.5%.

Fabien Moutarde, et al. [6] proposed an automated "flood-fill segmentation" method, U*F, which is a new method for clustering the U*-matrix of a Self Organizing Map (SOM) after training. They found that the U*F method performs better for a wide set of critical dataset types. They also showed that, for U*F, the computation time of the SOM phase is negligible and that it does not require a priori knowledge of the number of clusters, making it a real "cluster-mining" algorithm. The "cluster-mining" algorithm was not able to distinguish two of the actual categories. Again, they showed that the U*F clustering method is an alternative to other clustering algorithms such as single-linkage, K-means, Ward and so on.

Radwan E. Abdel-Aal, et al. [7] proposed that the divide and conquer principle can be used effectively for differential diagnosis in dermatology by decomposing the problem into simpler sub-problems, each of which is solved separately. In the same year, Loris Nanni [8] proposed an ensemble of SVMs based on Random Subspace (RS) and feature selection, and applied it to ESDs. The results showed that the average predictive accuracy obtained by a "stand-alone" SVM or by an RS ensemble of SVMs is lower than that of the proposed method. The classification accuracy is raised from 97.22% to 98.3%.

Abdulrahman Y. Al-Zoman, et al. [9] carried out a retrospective study of major skin diseases in the central region of Saudi Arabia over the period 2001-2005. The patients' details were collected from Riyadh Military Hospital; out of 58450 cases, women (58.38%) were more affected than men (41.62%) [10-11]. Most of the diagnoses were made by clinical methods, and the largest number of affected patients fell in the age group of 41-50 years. ESDs accounted for only 4.61% of cases and, out of the six diseases, Psoriasis was predominant at 2.47%. The most common disease in the eczema/dermatitis group was seborrheic dermatitis (29.76%). By origin, 98.01% of the patients were Arab and 1.99% were non-Arab. Psoriasis accounted for 53.4% and Lichen Planus for 24.7%. A comparison of the percentages of incidence of some common skin diseases in Riyadh and other regions of Saudi Arabia is given in table 1. White-skinned persons are widely affected by all forms of skin tumors, as shown in table 2. Skin diseases do not affect people according to age, as shown in table 3.

Table 1: The comparison of the percentage of the incidences for some common skin diseases in Riyadh and other regions of Saudi Arabia [9]

Table 3: Distribution of age on the basis of sex [9]

Samy S. Abu Naser et al. [12] proposed an expert system using Artificial Intelligence (AI) to help dermatologists diagnose some skin diseases with the help of an inference engine and a knowledge base. The proposed expert system is not for one specialized disease; it can be used to diagnose nine skin diseases.

Übeyli, et al. [13 and 27] proposed the use of a Combined Neural Networks (CNNs) model for the diagnosis of Erythemato Squamous diseases; Multilayer Perceptron Neural Networks (MLPNNs) were also tested. The network is trained using the Levenberg-Marquardt algorithm.

Kenneth Revett, et al. [14] proposed the Rough Sets data-mining technique, and the classification accuracy is high, with values of more than 98% in some instances. The classification results show that the clinical features yield only 50-60% accuracy when the age feature is added, while the histopathological features yield only about 85%.

Übeyli [15] proposed a new approach for the detection of ESDs based upon K-means clustering, a data mining method. The proposed methodology is used only for classification; only 33 features and only five diseases are considered.

Juanying Xie, et al. [16-17] developed a model based upon SVM with IFSFFS (Improved F-score and Sequential Forward Floating Search), which selects the optimal feature subset by combining the advantages of wrappers and filters. Finally, the classification accuracy is evaluated with 14 features, which include both histopathological and clinical features.

[...] method, namely IGSBFS (Information Gain and Sequential Backward Floating Search), which combines the advantages of filters and wrappers to select the optimal feature subset from the original feature set based on a diagnostic model of Naïve Bayes. The classification accuracy of the proposed method is 98.9% with only 10 features.

Akın Özçift, et al. [22] suggested a Genetic Algorithm (GA) based FS algorithm combined in parallel with a BN classifier, based on GA wrapped Bayesian Network (BN) Feature Selection (FS). The accuracy of the BN algorithm is increased by the GA based heuristic search under 10-fold cross-validation, and the classification accuracy is 99.20%. The accuracy of the FELM based approach reaches 94% before data preprocessing and increases to 99.02% after data preprocessing. Moreover, its total computational time is less than 1 second, whereas the average computational time for other machine learning algorithms is 124 seconds. Again, Badrinath et al. [26] developed AdaBoost and Hybrid classifier methodologies along with Apriori and ARs data preprocessing. In this case, the classification accuracy is increased to 99.26%, but the computational time is much higher than that of FELM.

Performance Analysis: Some of the important machine learning algorithms listed above and their performance measures are given in Table 4. Table 4 shows that the accuracy varies from 89% to slightly above 99%. Clearly, the accuracy reaches 93% before data preprocessing and rises above 99% when data preprocessing is used.

CONCLUSION
In this paper, machine learning algorithms used in determining the ESDs have been discussed. Data drawn from medical and skin related information involve numerous and complex datasets. Data preprocessing helps to reduce the dimensionality of the given datasets considerably and hence indirectly helps to reduce the time complexity. Since the use of machine learning algorithms has grown rapidly over the last decade and a half in many medical and skin related problems, we have consolidated some of the important works on machine learning algorithms for the prediction of ESDs. The performance analysis in terms of the accuracy of diagnosis of the ESDs is presented in table 4.

REFERENCES